VPLEX: Management server experiences high RAM and internal disk space usage

Summary: A management server running without an external AMQP event consumer may experience high RAM and internal disk space usage.


Symptoms



  • A management server running without an external AMQP event consumer may experience high RAM and internal disk space usage.
  • A system with high memory consumption shows higher than normal latency when executing CLI or GUI commands.
  • A system that runs out of internal disk space on the root partition can no longer write to that partition. VPLEX continues to write logs to other partitions, but several Linux services use the root partition and cannot log further events.

Symptom 1:
A large amount of RAM is being used by rabbitmq.
 
service@ManagementServer:~> top
 
top - 13:17:26 up 103 days, 13 min, 20 users,  load average: 0.28, 0.34, 0.36
Tasks: 201 total,   1 running, 200 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.3%us,  0.9%sy,  0.0%ni, 85.0%id,  1.5%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3920396k total,  3448376k used,   472020k free,    14752k buffers
Swap:  8388604k total,   413608k used,  7974996k free,  1781800k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
22674 rabbitmq  20   0  3.4g 3.7g 2040 S      2  87.9 225:09.39 beam.smp
16302 service   20   0 2975m 1.1g 9232 S      2 2.4 561:18.54 java
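
The same reading can be taken without an interactive top session by filtering ps output. The following is a minimal sketch, not part of the VPLEX tooling: the function name rabbit_mem_report is hypothetical, and the default threshold of 20 percent is an assumption borrowed from the paging figure given in the Cause section.

```shell
#!/bin/sh
# Hypothetical helper (not a VPLEX command): reads "pmem rss comm" rows on
# stdin and prints any beam.smp row (the Erlang VM that runs RabbitMQ)
# whose %MEM is at or above the threshold given as $1 (default 20).
rabbit_mem_report() {
    awk -v limit="${1:-20}" '$3 == "beam.smp" && $1 + 0 >= limit + 0 {
        printf "beam.smp uses %s%% of RAM (RSS %d kB)\n", $1, $2
    }'
}

# Usage on the management server (the trailing "=" suppresses column headers):
#   ps -eo pmem=,rss=,comm= | rabbit_mem_report 20
```

An empty result means no beam.smp process crossed the threshold at the time of the check.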
 
Symptom 2:
Call homes warn of high or critical disk space usage.
When the root partition on the management server reaches 90% of available space, the following call home is generated.

 
<ID>0x8a4a31fb</ID>
<name>SMS_PARTITION_HIGH_CAPACITY</name>
<severity> ERROR </severity>
<customerRCA>A partition on your Management Server has reached a high capacity.</customerRCA>

 
When the root partition becomes full you will see the following call home.
 
<ID>0x8a4a61fa</ID>
<name>SMS_PARTITION_CRITICAL_CAPACITY</name>
<severity> ERROR </severity>
<customerRCA>A partition on your Management Server has exceeded a critical capacity threshold.</customerRCA>
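
Root partition usage can also be checked by hand before a call home is raised. The sketch below assumes the 90% figure quoted above as its default limit. The function name root_usage_check is hypothetical, and the script does not reproduce how the management server itself evaluates its thresholds.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P` output on stdin and reports the root
# filesystem ("/") when its Use% is at or above the limit in $1 (default 90).
root_usage_check() {
    awk -v limit="${1:-90}" 'NR > 1 && $6 == "/" {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= limit + 0)
            printf "root partition at %d%% (limit %d%%)\n", use, limit
    }'
}

# Usage (-P selects the POSIX one-line-per-filesystem layout):
#   df -P / | root_usage_check 90
```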

 
Symptom 3:
A large amount of disk space is being used by rabbitmq.

 
service@ManagementServer:/var/lib/rabbitmq/mnesia/rabbit@localhost # du -shx *
4.0K    cluster_nodes.config
4.0K    DECISION_TAB.LOG
4.0K    LATEST.LOG
32K     msg_store_persistent
14G     msg_store_transient <<<<
4.0K    nodes_running_at_shutdown
408M    queues
4.0K    rabbit_durable_exchange.DCD
4.0K    rabbit_durable_queue.DCD
4.0K    rabbit_durable_queue.DCL
4.0K    rabbit_durable_route.DCD
4.0K    rabbit_runtime_parameters.DCD
8.0K    rabbit_runtime_parameters.DCL
4.0K    rabbit_serial
4.0K    rabbit_user.DCD
4.0K    rabbit_user_permission.DCD
4.0K    rabbit_vhost.DCD

service@ManagementServer:/var/lib/rabbitmq/mnesia/rabbit@localhost # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        20G   19G  692K 100% /   <<<<
udev            1.9G  196K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       504M   60M  420M  13% /boot
/dev/sda7        16G  4.1G   11G  27% /var/log
/dev/sda8        44G  5.0G   37G  13% /diag
/dev/sda9       9.9G  151M  9.2G   2% /data


service@ManagementServer:/var/lib/rabbitmq/mnesia/rabbit@localhost # ls -lah msg_store_transient/ | head
total 14G
drwxr-x--- 1 rabbitmq rabbitmq  12K Nov 13 11:14 .
drwxr-x--- 1 rabbitmq rabbitmq  734 Nov 13 15:03 ..
-rw-r----- 1 rabbitmq rabbitmq  15M Nov  6 05:51 0.rdq
-rw-r----- 1 rabbitmq rabbitmq  17M Nov 13 05:19 1000.rdq
-rw-r----- 1 rabbitmq rabbitmq  17M Nov 13 05:21 1001.rdq
-rw-r----- 1 rabbitmq rabbitmq  17M Nov 13 05:22 1002.rdq
-rw-r----- 1 rabbitmq rabbitmq  17M Nov 13 05:23 1003.rdq
-rw-r----- 1 rabbitmq rabbitmq  17M Nov 13 05:25 1004.rdq
-rw-r----- 1 rabbitmq rabbitmq  17M Nov 13 05:30 1005.rdq


Symptom 4:
A large number of messages on the external message queue.

 
service@sms-bali-2:~> sudo rabbitmqctl list_queues
Listing queues ...
aliveness-test  0
queue.vplex.external    1749909 <<<<<
queue.vplex.ndu.events  0
sms_internal    0
...done.
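
To pick out oversized queues from this listing automatically, the output can be filtered on the message count. This is a sketch under stated assumptions: queues_over is a hypothetical function name, and the default limit of 100000 messages is an arbitrary illustrative value, not a documented VPLEX threshold.

```shell
#!/bin/sh
# Hypothetical helper: reads `rabbitmqctl list_queues` output (queue name,
# message count) on stdin and prints queues holding more than $1 messages.
# The "Listing queues ..." and "...done." lines are skipped because they do
# not consist of exactly two fields with a numeric second field.
queues_over() {
    awk -v limit="${1:-100000}" 'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > limit + 0 {
        print $1, $2
    }'
}

# Usage: sudo rabbitmqctl list_queues | queues_over 100000
```

Run against the listing above, only queue.vplex.external would be reported.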

 

Cause

  • RabbitMQ attempts to retain all events until they are consumed or the memory consumption threshold is reached.
  • VPLEX currently has no default consumer for the queue.vplex.external queue.
  • If left unchecked, the queue can grow very large.
  • Once memory consumption reaches 20%, RabbitMQ writes the queue out to disk, which in turn consumes root disk space.
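
As a rough illustration of where the 20% mark falls on a given system, the arithmetic can be done in the shell. This sketch assumes the 20% figure is measured against total system RAM, which the article does not state explicitly. The function name paging_threshold_kb is hypothetical.

```shell
#!/bin/sh
# Illustrative arithmetic only: 20% of total RAM in kB, ASSUMING the 20%
# figure is relative to total system memory. Integer division truncates.
paging_threshold_kb() {
    total_kb=$1
    echo $(( total_kb * 20 / 100 ))
}

# With the "Mem: 3920396k total" value from the top output in Symptom 1:
paging_threshold_kb 3920396    # prints 784079

# On a live system the total could be read from /proc/meminfo instead:
#   paging_threshold_kb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

On the sample system this works out to roughly 766 MB, well below the memory that beam.smp is shown holding in Symptom 1.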

Resolution

Workaround 1:
From the management server, restart the rabbitmq server using the following command:

sudo service rabbitmq-server restart

Sample output:

service@ManagementServer:~> sudo service rabbitmq-server restart
Restarting rabbitmq-server: SUCCESS
rabbitmq-server.
service@ManagementServer:~>

 

Workaround 2:
From the management server, reboot the management server using the following command:

sudo shutdown -r now

Sample output:
service@ManagementServer:~> sudo shutdown -r now

Broadcast message from root (pts/0) (Mon Mar 5 19:33:18 2018):

The system is going down for reboot NOW!


Note:
PuTTY then displays a Fatal Error pop-up stating "Server unexpectedly closed network connection". This is expected because the reboot drops the SSH session.

Permanent Fix:
This issue was addressed in GeoSynchrony 5.5 and later.

Affected Products

VPLEX Series

Products

VPLEX for All Flash, VPLEX GeoSynchrony, VPLEX Series, VPLEX VS1, VPLEX VS2
Article Properties
Article Number: 000170841
Article Type: Solution
Last Modified: 20 Nov 2020
Version:  2