Metro node: How to collect logs from the metro node
Summary: This article describes how to collect logs from the metro node and which logs and data may be needed for a performance issue.
Instructions
Steps on how to accomplish the following tasks:
- What logs are required to debug metro node problems?
- How do I capture collect-diagnostics on a metro node cluster?
- How do I validate the existing collect-diagnostics packages on the director/node?
- How do I cancel and clean up an ongoing collect-diagnostics on a metro node?
A. What logs are required to debug metro node problems?
- The command used to collect logs from the metro node is "collect-diagnostics". It can be run from any node (*1) in the metro node setup. Running this command on one director of a metro node cluster should collect the data from all directors, on all nodes of that cluster. DO NOT run this command on more than one node at a time.
*1 NOTE: Run the 'collect-diagnostics' command from only one director, on only one cluster in a Metro configuration, and wait for it to complete fully before gathering collect-diagnostics from another director or from the peer cluster, if needed.
- The 'collect-diagnostics' command produces a compressed tar.gz log file containing configuration and log files. The file is placed in the /diag/collect-diagnostics-out/ directory on the node the command was run from. Once the command finishes, use WinSCP, or an equivalent SCP utility, to copy the file off the node so that it can be provided to support for analysis. Section B below has more information on using this command.
Notes:
- If the 'collect-diagnostics' command is run with no options, two files are generated: a base file and an extended file. This can take a long time on scaled systems.
- Metro node support generally requires only the base file; however, in some circumstances, such as performance issues, they may ask for the extended file as well.
- Standard options that support may ask you to use when running collect-diagnostics are:
  - "--noextended": This option omits the collection of extended diagnostics.
  - "--last-logs": This option captures logs going back a specified number of hours or days.
- For more details on the command, type "collect-diagnostics -h".
These are samples of the two filenames. The date and time, shown as YYYY-MM-DD-HH.MM.SS, reflect when the files were collected:
- Base file: <Serial number>-c1-diag-YYYY-MM-DD-HH.MM.SS.tar.gz
- Extended file: <Serial number>-c1-diag-ext-YYYY-MM-DD-HH.MM.SS.tar.gz
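The naming pattern above can be used to tell the two packages apart before uploading anything. The following is a minimal sketch, assuming only the filename layout shown in this article; the function name cd_package_kind is hypothetical and is not part of the metro node tooling.

```shell
# Classify a collect-diagnostics package by its filename.
# Prints "extended", "base", or "unknown".
# Assumption: filenames follow the <Serial number>-c1-diag[-ext]-<timestamp>.tar.gz
# layout shown in this article.
cd_package_kind() {
    case "$(basename "$1")" in
        *-diag-ext-*.tar.gz) echo "extended" ;;  # checked first: it also matches the base pattern
        *-diag-*.tar.gz)     echo "base" ;;
        *)                   echo "unknown" ;;
    esac
}
```

The extended pattern is tested first because every extended filename also matches the looser base pattern.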
- Performance issues are complex and require a lot of specific information to be gathered. To expedite this process, customers are asked to fill out a performance questionnaire. The questionnaire is attached to this knowledge base article, in the attachment section at the end.
- In some types of performance issues, it is helpful to capture an additional log called "fe_perf_stats". These logs are generated continuously but are not captured by collect-diagnostics. To capture them, cd (change directory) to /var/log/VPlex/cli on a node from each cluster and run the command "tar cvzf fe-perf-stats.tar.gz fe_perf_stats*" to compress the files into a tar file. Connect to the node with WinSCP, or an equivalent SCP utility, browse to /var/log/VPlex/cli, and copy the "fe-perf-stats.tar.gz" file to your system. Upload the tar file, along with one or more collect-diagnostics files if requested by support, to the SR or to the FTP link that support provides to you in the SR and by email.
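The fe_perf_stats capture described above can be wrapped in a small helper so that it fails cleanly when no matching files exist. This is a sketch only: the function name pack_fe_perf_stats is hypothetical, and it uses "tar czf" followed by a listing instead of the verbose "tar cvzf" form quoted in this article.

```shell
# Bundle the fe_perf_stats* files in a log directory into fe-perf-stats.tar.gz.
# $1 is the log directory; it defaults to the path given in this article.
pack_fe_perf_stats() {
    log_dir="${1:-/var/log/VPlex/cli}"
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$log_dir" || exit 1
        # Stop early if there is nothing to archive, so no empty tarball is created.
        if ! ls fe_perf_stats* >/dev/null 2>&1; then
            echo "no fe_perf_stats files found in $log_dir" >&2
            exit 1
        fi
        # Create the archive, then list its contents as a quick sanity check.
        tar czf fe-perf-stats.tar.gz fe_perf_stats* && tar tzf fe-perf-stats.tar.gz
    )
}
```

Run it once on a node from each cluster, then copy the resulting fe-perf-stats.tar.gz off the node as described above.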
- In addition to collect-diagnostics, it may be helpful to capture the following information:
  - Enable logging for a PuTTY session.
  - Run the commands below.
  - Collect the PuTTY log and download it to your system.
  - Attach the PuTTY log, the collect-diagnostics, and any other data requested to the SR.
The following commands are to be run from the VPlexcli prompt:
    cluster status
    ll clusters/**/storage-views/* --full
    ll ~ports
    show-use-hierarchy /clusters/**/virtual-volumes/*
    ll ~system-volumes
    ls -t /clusters/*/directors/*::serial-number
    (this command lists the DSTs for each node)
    ls -t /clusters/**/director-*/::hostname
    (the hostnames displayed are the IP addresses; this is expected)
B. How do I capture collect-diagnostics on a metro node cluster?
To capture this data, run the collect-diagnostics command with the flags "--noextended" and "--last-logs 30d".
- Establish an SSH session to a director node and reach the Linux prompt, for example service@director-1-1-a, then log in to the vplexcli.
Sample output:
    login as: service
    Keyboard-interactive authentication prompts from server:
    | Password:
    End of keyboard-interactive prompts from server
    Last login: <date and timestamp data> from x.x.x.x
    service@director-1-1-a:~>
    service@director-1-1-a:~> vplexcli
    Trying ::1...
    Connected to localhost.
    Escape character is '^]'.
    VPlexcli:/>
- To start the collect-diagnostics, run the "collect-diagnostics" command from the vplexcli prompt with the options you were directed to use, as shown in the example below.
Example output:
    VPlexcli:/> collect-diagnostics --noextended --last-logs 30d
    ('WARNING:The collect-diagnostics command was issued with option --noextended. ',)
    The following file(s) will NOT be collected:
    core files
    fast trace dump files
    slow trace dump files
    udcom trace dump files
    udcom legacy trace files
    user-defined performance sink files
    the management console's heap
    ('WARNING:Only the logs that are generated in the last 30 days are collected.')
    2024-02-09 19:55:12 UTC: ****Initializing collect-diagnostics...
    2024-02-09 19:55:13 UTC: No cluster-witness server found.
    2024-02-09 19:55:13 UTC: Free space = 88G
    2024-02-09 19:55:13 UTC: Total space needed = 1907M
    ================================================================================
    Starting collect-diagnostics, this operation might take a while...
    ================================================================================
    Executing cluster collection ..
C. How to validate the existing collect-diagnostics packages on the director/node.
- When the collect-diagnostics command finishes and returns to the vplexcli prompt, connect to the director you ran the command from using WinSCP (or an equivalent SCP utility) and browse to the folder /diag/collect-diagnostics-out/.
- Identify one or more log files with the correct timestamp and download them to your local workstation.
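If you are already logged in to the director, the most recent package can also be identified from the shell before copying it. The helper below is a sketch under the assumption that completed packages are the .tar.gz files in the output directory named in this article; the function name newest_cd_package is hypothetical.

```shell
# Print the most recently modified .tar.gz package in a directory.
# $1 is the output directory; it defaults to the path given in this article.
# Prints nothing if the directory holds no .tar.gz files.
newest_cd_package() {
    out_dir="${1:-/diag/collect-diagnostics-out}"
    # ls -t sorts by modification time, newest first. Parsing ls output is
    # acceptable here only because package names contain no spaces or newlines.
    ls -t "$out_dir"/*.tar.gz 2>/dev/null | head -n 1
}

# Example of copying the file from a workstation with a command-line scp client.
# <director-ip> and <file> are placeholders, not real values:
#   scp service@<director-ip>:/diag/collect-diagnostics-out/<file> .
```

Compare the timestamp embedded in the printed filename with the time you started the collection to confirm it is the package you expect.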
D. How to cancel an ongoing collect-diagnostics
- If you are still in the PuTTY session where you started the collect-diagnostics, you should see its output streaming, which shows that it is still running.
Sample output:
    VPlexcli:/> collect-diagnostics --noextended --last-logs 30d
    ('WARNING:The collect-diagnostics command was issued with option --noextended. ',)
    The following file(s) will NOT be collected:
    core files
    fast trace dump files
    slow trace dump files
    udcom trace dump files
    udcom legacy trace files
    user-defined performance sink files
    the management console's heap
    ('WARNING:Only the logs that are generated in the last 30 days are collected.')
    2022-02-09 19:55:12 UTC: ****Initializing collect-diagnostics...
    2022-02-09 19:55:13 UTC: No cluster-witness server found.
    2022-02-09 19:55:13 UTC: Free space = 88G
    2022-02-09 19:55:13 UTC: Total space needed = 1907M
    ================================================================================
    Starting collect-diagnostics, this operation might take a while...
    ================================================================================
    Executing cluster collection ..
- Open a duplicate PuTTY session and log in, using the service account, to the director where you started the collect-diagnostics.
Sample output:
    login as: service
    Using keyboard-interactive authentication.
    Password:
    Last login: <date and time stamp data> from x.x.x.x
    service@director-1-1-b:~>
- Once on the director, restart the management console using the following command to cancel the running collect-diagnostics.
Sample output:
    service@director-1-1-b:~> sudo systemctl restart VPlexManagementConsole.service
- Look back at the first PuTTY session, where the collect-diagnostics was running when you restarted the management console. The last line of its output should be:
"Connection closed by foreign host."
Sample output (check the last line of the output):
    VPlexcli:/> collect-diagnostics --noextended --last-logs 30d
    ('WARNING:The collect-diagnostics command was issued with option --noextended. ',)
    The following file(s) will NOT be collected:
    core files
    fast trace dump files
    slow trace dump files
    udcom trace dump files
    udcom legacy trace files
    user-defined performance sink files
    the management console's heap
    ('WARNING:Only the logs that are generated in the last 30 days are collected.')
    2022-02-09 20:02:03 UTC: ****Initializing collect-diagnostics...
    2022-02-09 20:02:04 UTC: No cluster-witness server found.
    2022-02-09 20:02:04 UTC: Free space = 88G
    2022-02-09 20:02:04 UTC: Total space needed = 1907M
    ================================================================================
    Starting collect-diagnostics, this operation might take a while...
    ================================================================================
    Executing cluster collection .. ERROR
    Executing SMS log collection ..
    Connection closed by foreign host. <<<
- Once the collect-diagnostics has stopped, as seen in the previous step, go back to the second PuTTY session, 'cd' to the /diag directory, and run 'll'. You should see some extra directories:
    collect-diagnostics-tmp
    collect-diagnostics-jobs
    collect-diagnostics-tmp-ext*
*if extended files were not omitted
Sample output:
    service@director-1-1-b:/diag> ll
    total 32
    drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:03 collect-diagnostics-tmp-ext
    drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:03 collect-diagnostics-jobs
    drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:04 collect-diagnostics-out
    drwxr-xr-x 3 service groupSvc 4096 Feb 9 20:02 collect-diagnostics-tmp
    drwx------ 2 root root 16384 Jan 27 16:54 lost+found
    drwx--x--x 3 service groupSvc 4096 Dec 17 03:08 share
    service@director-1-1-b:/diag>
- If you look inside each of these directories, you will see files stamped with the date and time at which you started the now-cancelled collect-diagnostics. These files take up space in the /diag partition and should be removed.
- To delete the leftover directories from /diag, type "rm -r collect-diagnostics-jobs" and "rm -r collect-diagnostics-tmp", then enter 'll' again to confirm that the directories have been removed.
Sample output:
    service@director-1-1-b:/diag> rm -r collect-diagnostics-jobs
    service@director-1-1-b:/diag> rm -r collect-diagnostics-tmp
    service@director-1-1-b:/diag> ll
    total 24
    drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:04 collect-diagnostics-out
    drwx------ 2 root root 16384 Jan 27 16:54 lost+found
    drwx--x--x 3 service groupSvc 4096 Dec 17 03:08 share
    service@director-1-1-b:/diag>
- If a 'collect-diagnostics-tmp-ext' directory exists, remove it by running "rm -r collect-diagnostics-tmp-ext".
Note: The extended file is typically used to investigate node crashes. If there is an ongoing investigation into a node crash and support has not yet captured all necessary logs, check with support before cleaning up the collect-diagnostics-tmp-ext directory, because doing so may delete necessary core files.
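The cleanup steps in this section can be combined into one guarded helper. This is a sketch, not a supported tool: the function name cleanup_cancelled_cd and its --with-ext switch are hypothetical, and the directory names are the ones listed in this article. The extended working directory is removed only when asked for explicitly, in line with the note above.

```shell
# Remove leftover working directories from a cancelled collect-diagnostics run.
# $1 is the diag root (defaults to /diag).
# $2 may be --with-ext (a hypothetical switch of this sketch) to also remove
# collect-diagnostics-tmp-ext. Leave it out while a node-crash investigation is
# open, because that directory may hold core files support still needs.
cleanup_cancelled_cd() {
    diag_root="${1:-/diag}"
    targets="collect-diagnostics-jobs collect-diagnostics-tmp"
    if [ "${2:-}" = "--with-ext" ]; then
        targets="$targets collect-diagnostics-tmp-ext"
    fi
    for name in $targets; do
        # Only act on directories that actually exist.
        if [ -d "$diag_root/$name" ]; then
            rm -r "$diag_root/$name" && echo "removed $diag_root/$name"
        fi
    done
    # collect-diagnostics-out is never touched: it holds completed packages.
}
```

Run it only after the first session shows "Connection closed by foreign host.", then list /diag again to confirm what remains.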