ViPR SRM: The alerting module hangs
Summary: The alerting module hangs in SRM
Symptoms
You observed more than 900 connections to port 2013 (some of them dynamic) in various states.
SRM version 4.0.1 vApp
Issues found during the WebEx session:
WARNING [2017-04-10 03:49:42 EDT] RawValueDecoder::decode(): Invalid raw value rejected
INFO [2017-04-10 03:49:42 EDT] ChannelNegotiationProtocol::transform(): [id: 0xbff120bb, //xx.xxx.xxx.xxx:46555 => //xx.xxx.xxx.xxx:2010] Initializing channel with no capabilities
INFO [2017-04-10 03:49:49 EDT] SocketSource::disconnect(): Dropping connection to /xx.xxx.xxx.xxx.
INFO [2017-04-10 03:49:49 EDT] SocketSource::connect(): Accepted incoming connection from /xx.xxx.xxx.xxx.
WARNING [2017-04-10 03:49:49 EDT] SocketSource$DataReaderWorker::run(): An incoming event could not be processed: com.watch4net.events.common.serialization.SerializationException: Malformed occurrence keys in event: HEAD / HTTP/1.0
WARNING [2017-04-10 03:49:49 EDT] SocketSource$DataReaderWorker::run(): An incoming event could not be processed: com.watch4net.events.common.serialization.SerializationException: Malformed definition keys in event: 
INFO [2017-04-10 03:49:50 EDT] BasicMessagesLoggingHandler::channelInactive(): [id: 0xbff120bb, /xx.xxx.xxx.xxx:46555 :> //xx.xxx.xxx.xxx:2010] Communication channel is now inactive/closed
INFO [2017-04-10 03:49:50 EDT] BasicMessagesLoggingHandler::channelActive(): [id: 0xf28e525d, //xx.xxx.xxx.xxx:47531 => //xx.xxx.xxx.xxx:2010] Communication channel is now active
SEVERE [2017-04-10 03:49:50 EDT] ApplicationDataForwarder::unhandledExceptionCaught(): An unhandled error occured on channel [id: 0xf28e525d, //xx.xxx.xxx.xxx:47531 => //xx.xxx.xxx.xxx:2010]
com.emc.watch4net.socket.communicator.handler.rawvalue.RawValueDecoder$InvalidRawValueExcepti
An incoming event could not be processed: com.watch4net.events.common.serialization.SerializationException: Malformed definition keys in event /xx.xxx.xxx.xxx
pbe = Name: hostname.net Address: /xx.xxx.xxx.xxx
WARNING [2017-04-12 05:14:26 EDT] SocketSource$DataReaderWorker::run(): An incoming event could not be processed: com.watch4net.events.common.serialization.SerializationException: Malformed occurence keys in event: [1]ClientHello
WARNING [2017-04-12 05:14:26 EDT] SocketSource$DataReaderWorker::run(): An incoming event could not be processed: com.watch4net.events.common.serialization.SerializationException: Malformed occurence keys in event: EndMessage
INFO [2017-04-12 05:14:34 EDT] SocketSource::disconnect(): Dropping connection to 10.xxx.xxx.xxx.
INFO [2017-04-12 05:14:34 EDT] SocketSource::connect(): Accepted incoming connection from 10.xxx.xxx.xxx.
WARNING [2017-04-12 05:15:46 EDT] SocketSource$DataReaderWorker::run(): An incoming event could not be processed: com.watch4net.events.common.serialization.SerializationException: Malformed occurence keys in event: 
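One way to confirm this symptom on a host is to count the connections involving port 2013 by TCP state. The minimal Python sketch below assumes Linux `netstat -anto`-style output like the OPEN CONNECTIONS excerpt in this article (proto, Recv-Q, Send-Q, local address, foreign address, state, timer); the function name `count_port_states` is hypothetical and is not part of SRM.

```python
from collections import Counter


def count_port_states(netstat_output: str, port: int = 2013) -> Counter:
    """Count TCP states of the connections whose local or foreign port is `port`.

    Assumes Linux `netstat -anto`-style lines:
    proto recv-q send-q local-address foreign-address state timer...
    """
    states: Counter = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not a TCP line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, foreign_addr, state = fields[3], fields[4], fields[5]
        if local_addr.endswith(suffix) or foreign_addr.endswith(suffix):
            states[state] += 1
    return states


if __name__ == "__main__":
    # Three lines taken from the OPEN CONNECTIONS excerpt in this article.
    sample = """\
tcp 96848 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:35817 ESTABLISHED off (0.00/0/0)
tcp 30713 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:50174 CLOSE_WAIT off (0.00/0/0)
tcp 0 50945 1xx.xxx.xxx.xxx:50364 xx.xxx.xxx.xxx:2013 FIN_WAIT1 unkn-4 (2.26/0/0)
"""
    for state, count in count_port_states(sample).most_common():
        print(state, count)
```

A total well above the usual number of collectors, with many CLOSE_WAIT or FIN_WAIT1 entries, matches the pattern described above.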
Cause
The bunit-group.csv file is not valid.
fe = xx.xxx.xxx.xxx Name: jxqpstgsrmfe01.onefiserv.net Address: xx.xxx.xxx.xxx
OPEN CONNECTIONS:
tcp 96848 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:35817 ESTABLISHED off (0.00/0/0)
tcp 96848 0 1xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:55989 ESTABLISHED off (0.00/0/0)
tcp 96848 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:60556 ESTABLISHED off (0.00/0/0)
tcp 30713 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:50174 CLOSE_WAIT off (0.00/0/0)
tcp 43047 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:38699 ESTABLISHED off (0.00/0/0)
tcp 22574 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:58961 CLOSE_WAIT off (0.00/0/0)
tcp 98467 0 xx.xxx.xxx.xxx:2013 xx.xxx.xxx.xxx:43472 ESTABLISHED off (0.00/0/0)
tcp 0 50945 1xx.xxx.xxx.xxx:50364 xx.xxx.xxx.xxx:2013 FIN_WAIT1 unkn-4 (2.26/0/0)
Java Heap:
apg 32467 1 5 Apr04 ? 12:02:19 /opt/APG/Java/Sun-JRE/8.0.102/bin/java -Xms256m -Xmx2048m -javaagent:/opt/APG/bin/.runtime/service/1.10u4/apg-bootstrap-agent.jar -Djava.rmi.server.hostname=jxqpstgsrmbe01.onefiserv.net -Djava.util.logging.config.file=conf/alerting.logging.properties -Dcom.watch4net.utils.jmx.agent.config.file=conf/w4n-agent.properties -Dcom.watch4net.utils.jmx.agent.host=jxqpstgsrmbe01.onefiserv.net -javaagent:lib/w4n-jmx-agent.jar -cp /opt/APG/bin/.runtime/service/1.10u4/apg-service-bootstrap.jar:lib/* com.watch4net.apg.module.plugin.service.Bootstrap com.watch4net.alerting.engine.AlertingEngine main start
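The Java Heap line above is a `ps` listing of the alerting engine (com.watch4net.alerting.engine.AlertingEngine) running with -Xms256m -Xmx2048m. As a sketch, assuming you have that command line as a string, the heap settings can be extracted as follows; `jvm_heap_bytes` is a hypothetical helper, and only the k, m and g size suffixes are handled.

```python
import re

# Multipliers for the size suffixes handled here (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# Matches a standalone -Xms<size> or -Xmx<size> token, e.g. "-Xmx2048m".
_HEAP_FLAG = re.compile(r"(?<!\S)-(Xms|Xmx)(\d+)([kKmMgG]?)(?!\S)")


def jvm_heap_bytes(cmdline: str) -> dict:
    """Return {"Xms": bytes, "Xmx": bytes} for the heap flags found in cmdline.

    If a flag appears more than once, the last occurrence is kept, since later
    JVM options normally override earlier ones.
    """
    heap = {}
    for flag, digits, unit in _HEAP_FLAG.findall(cmdline):
        heap[flag] = int(digits) * _UNITS[unit.lower()]
    return heap


if __name__ == "__main__":
    cmd = ("/opt/APG/Java/Sun-JRE/8.0.102/bin/java -Xms256m -Xmx2048m "
           "-cp lib/* com.watch4net.alerting.engine.AlertingEngine main start")
    for flag, size in jvm_heap_bytes(cmd).items():
        print(f"-{flag}: {size // 1024 ** 2} MiB")
```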
Resolution
The workaround is as follows:
Add the workaround to the apg.properties file in /opt/APG/bin on the Primary Backend, the Additional Backend and, if possible, all collectors. There is no way to tell which collector sends the data that exceeds the frame length; although the pointer is on localhost, it is good practice to add the setting on every collector host. (Note: Back up the apg.properties file first.)
Add the following line at the end of the /opt/APG/bin/apg.properties file on each SRM host (Primary BE, Additional BE and collectors):
restricted.reader.line.size=50000
Then restart the alerting back-end services.
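The backup-and-append steps above can also be scripted on each host. This is a minimal Python sketch, not an SRM tool: it assumes apg.properties uses plain key=value lines, writes the backup as a hypothetical apg.properties.bak next to the original, and leaves the file untouched if restricted.reader.line.size is already set. The function name `add_line_size_property` is hypothetical.

```python
import shutil
from pathlib import Path

PROPERTY_KEY = "restricted.reader.line.size"
PROPERTY_LINE = f"{PROPERTY_KEY}=50000"


def add_line_size_property(path: str = "/opt/APG/bin/apg.properties") -> bool:
    """Back up apg.properties, then append the workaround line if it is missing.

    Returns True if the file was changed, False if the key was already present.
    """
    props = Path(path)
    # .properties files are traditionally ISO-8859-1; latin-1 round-trips any byte.
    text = props.read_text(encoding="latin-1")

    # Leave the file alone if the key is already set (safe to re-run).
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == PROPERTY_KEY:
            return False

    # Back up the original first, as recommended above.
    shutil.copy2(props, props.with_name(props.name + ".bak"))

    # Append at the end of the file, starting on a fresh line if needed.
    prefix = "\n" if text and not text.endswith("\n") else ""
    with props.open("a", encoding="latin-1") as fh:
        fh.write(prefix + PROPERTY_LINE + "\n")
    return True
```

Calling `add_line_size_property()` with no argument targets the default /opt/APG/bin/apg.properties path; the alerting back-end services still need to be restarted afterwards for the setting to take effect.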