Data Domain: How to List Files on the Data Domain File System, sfs-dump

Summary: This article explains how to use the Data Domain CLI to dump the file and directory listings for individual MTrees or for the file system (FS) as a whole.


Symptoms

The Data Domain is a passive backend storage device. It only stores what the backup applications send to it, and it only deletes data when the backup application tells it to delete files. The Data Domain never creates or deletes files on its own.

No matter the ingest protocol, the Data Domain FS contains only files (within directories) organized in MTrees. All files, whether in the Active tier or in any cloud units, share the same root and namespace; because a single namespace exists, file and directory listings include the files in the Active tier and in cloud units without any distinction.

Getting a detailed file dump can be useful for reasons such as:

  • Comparing against a list of files managed by a backup application, to check for any files orphaned from that backup application.
  • Listing all files over a certain age threshold, to determine whether the backup application is enforcing backup retention correctly

Cause

The CLI tools for creating file lists are sfs_dump and filesys sfs-dump, depending on the DDOS release.

Resolution

Use SSH Session Logging to collect the list output. The exact command to run from the DD CLI to collect listings of files depends on the DDOS version in use.

 

Quick Links

The exact command used to get a detailed listing of files in a Data Domain from the CLI, and hence the process for gaining intelligence on the files stored in the Data Domain, depends on the DDOS release being run. Once the text output file with the file details has been collected, processing the output into a more usable form is always the same, as the output format is the same for all DDOS releases (or can be converted to a common format using a script).

 

SSH Session Logging


Common to all DDOS releases is the requirement to log in to the Data Domain as an admin user, using an SSH client which supports logging the console output to a text file on the client side (PuTTY works well for this). Configure the SSH client so that it logs output to a text file on the client, ensuring that there are no limits on the number of lines logged or on the individual line length. The SSH session log file (and hence the DD file listings) is not written on the Data Domain, but on the client (typically a desktop) from which the SSH connection is initiated.

When collecting the SSH session output, give the file a meaningful name (for example, include the Data Domain hostname and the name of the MTree to be dumped), and ensure that there is sufficient space for the log file, which can be approximately 200 MB for DDs with 1 million files.
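If the SSH client is a Linux or macOS machine running OpenSSH rather than PuTTY, one possible alternative (a sketch only, not a requirement) is to pipe the interactive session through tee, so the console output is shown on screen and written to a local file at the same time. The -t option forces a pseudo-terminal so the DD CLI behaves as in a normal interactive session; the host and file names below are placeholders.

#### Sketch: log the interactive SSH session to a local text file with tee
# ssh -t sysadmin@DD-HOSTNAME | tee DD-HOSTNAME-backup-sfs-dump.txt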

 

DDOS releases prior to 7.13.0.0, 7.10.1.15, 7.7.5.25, and 6.2.1.110:

These releases still support the se sfs_dump command (and its -c option to output file details directly as CSV without any further processing). Entering SE mode is, however, discouraged for security reasons; where possible, upgrade to the latest LTS DDOS release available instead.

Log in to the Data Domain as an admin user with an SSH client configured to write output to disk, change to SE privilege mode, and then run the following command for each MTree for which file details are required:

#### To produce the output directly as CSV
# se sfs_dump -c <mtree-path>

#### To produce the output as default format
# se sfs_dump <mtree-path>

The command in these versions can also produce the listing for all the files in the FS at once (instead of one MTree at a time). To dump the file information details for all files in the FS, omit the MTree path:

# se sfs_dump -c

On completion, stop the SSH client's logging to disk, and set the files aside for further processing.

The format of the command output is a single line per file. When not using -c, the format is identical to that shown below for the later releases. When -c was used (for CSV output), the format is as follows (including a header shown as the first line of output):

name    mtime   fileid  size    seg_bytes       seg_count       redun_seg_count pre_lc_size     post_lc_size
/data/col1/backup/file       1656995350000000000     78      33554431        33668007        4016    0       33668007        34208564


DDOS releases 7.13.0.0, 7.10.1.15, 7.7.5.25, and 6.2.1.110 or later:

The MTree to be listed goes after the mtree keyword:

# filesys sfs-dump mtree <mtree-path>

Compared to se sfs_dump, the filesys sfs-dump command has the following limitations:

  • The new command does not support the -c option to dump file information in CSV (columnar) format
  • The new command can only run against one MTree at a time; there is no support for running against the whole FS in a single invocation

To dump the file details for all MTrees, iterate over the whole list of MTrees in the Data Domain FS and run the command once for each of them. A list of the MTrees in the system can be obtained by running:

# mtree list

Log in to the Data Domain as an admin user with an SSH client configured to write output to disk, and then run the following command for each MTree for which file details are required:

# filesys sfs-dump mtree <mtree-path>

If the file details for more than one MTree are needed, either send all dumps to the same output file, or switch the SSH client logging to a different file before running the command for each of the MTrees.

Alternatively, you may collect the output in an unattended way by running the command through SSH, like this:

#### Releases 7.13.0.0, 7.10.1.15, 7.7.5.25 and 6.2.1.110 onwards only
# ssh sysadmin@DD-HOSTNAME "filesys sfs-dump mtree /data/col1/backup" > sfs-dump-DD-backup.txt
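If the file details for every MTree are needed, the same unattended approach can be wrapped in a small loop. The following is a hedged sketch only: it assumes a Linux client with passwordless SSH configured, and that the MTree path is the first whitespace-delimited field on the lines of "mtree list" output which begin with a slash (this may vary by release). A more complete version is provided as a script later in this article.

#### Sketch: one output file per MTree, collected unattended over SSH
for mtree in $(ssh sysadmin@DD-HOSTNAME "mtree list" 2>/dev/null | awk '/^\//{print $1}'); do
    ssh sysadmin@DD-HOSTNAME "filesys sfs-dump mtree ${mtree}" 2>/dev/null > "sfs-dump-$(basename ${mtree}).txt"
done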

In either case, output information has one line for each file, with the following format:

/data/col1/backup/file: mtime: 1580211658000000000 fileid: 175 size: 136794 type: 9 seg_bytes: 138282 seg_count: 18 redun_seg_count: 14 (78%) pre_lc_size: 15045 post_lc_size: 8760 (58%) mode: 02000100644 start_offset: 0 end_offset: 18446744073709551615

DDOS releases 8.1.0.0, 7.13.1.10, 7.10.1.30 and 7.7.5.40 or later:

In these releases, filesys sfs-dump adds:

  • The -c option, so output is printed in a CSV-like format (fields separated by tabs)
  • The ability to run the file dump for all the files in the FS at once

Use SSH session logging to collect the output, as before. The command remains the same as in the immediately previous releases, with the enhancements mentioned, summarized below:
 

#### File listing for all the files in a given MTree path
# filesys sfs-dump mtree <mtree-path>

#### File listing for all the files in a given MTree path, with CSV-like output
# filesys sfs-dump -c mtree <mtree-path>

#### File listing for all the files in the FS
# filesys sfs-dump

#### File listing for all the files in the FS, with CSV-like output
# filesys sfs-dump -c
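On these releases, the unattended SSH collection shown earlier can therefore produce a CSV for the whole FS in a single invocation. A sketch, with the host name and output file name as placeholders:

#### Releases 8.1.0.0, 7.13.1.10, 7.10.1.30 and 7.7.5.40 onwards only
# ssh sysadmin@DD-HOSTNAME "filesys sfs-dump -c" > sfs-dump-DD-full-fs.csv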


This command continues to be "hidden," so that it does not appear in the CLI interactive help or in the docs.
 

How to process sfs_dump or filesys sfs-dump data into something useful:

The format for the individual files being reported in output when using filesys sfs-dump or sfs_dump without the -c option can be summarized as follows:

/data/col1/backup/file: mtime: 1580211658000000000 fileid: 175 size: 136794 type: 9 seg_bytes: 138282 seg_count: 18 redun_seg_count: 14 (78%) pre_lc_size: 15045 post_lc_size: 8760 (58%) mode: 02000100644 start_offset: 0 end_offset: 18446744073709551615

Field 01 : filename with full path (/data/col1/backup/file)
Field 03 : file last modification time as UNIX epoch in nanoseconds (nanoseconds since January 1st 1970 at midnight)
Field 05 : file (inode) ID, inode numbers are per MTree (175)
Field 07 : pre-comp size for the file in bytes (136794)
Field 09 : type of file (9)
Field 11 : segment store bytes (this is NOT the file size), please ignore (138282)
Field 13 : segment count, or the number of segments the file consists of (18)
Field 15 : number of file segments which are redundant, or not unique (14)
Field 16 : percentage of non-unique segments to total segments (78%)
Field 18 : combined size of the file's unique segments after deduplication (15045)
Field 20 : combined size of the file's unique segments after deduplication and local compression (8760)
Field 21 : ratio of post-deduplicated + post-compressed unique segments size to post-deduplicated unique segment size (58%), shows how well local compression worked
Field 23 : file mode, please ignore

In the example above, we have a file with an original size of 136,794 bytes which, after being passed through the Data Domain FS ingest pipeline, is calculated to use 15,045 bytes after deduplication, and 8,760 bytes when the unique segments for the file are compressed before being written to disk. Hence:

  • The file deduplication (what we call "gcomp," or global compression) is a factor x9.09 (136,794 to 15,045 bytes)
  • The file local compression (what we call "lcomp," for local compression) is a factor x1.72 (15,045 to 8760 bytes)
  • The total file size reduction estimated (known as "compression ratio") is a factor x15.62 (136,794 to 8760 bytes)
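The same arithmetic can be scripted instead of being done by hand. The following awk sketch assumes the non-CSV sfs_dump output has been saved to a file named sfs-dump-noCSV.txt (the name used by the conversion example later in this article) and that file names contain no colon characters; it prints the gcomp, lcomp, and total compression factor for each file:

#### Sketch: per-file gcomp, lcomp, and total compression from non-CSV sfs_dump output
# grep ^/ sfs-dump-noCSV.txt | awk '
{
    colon_index = index($0, ":")
    filename = substr($0, 1, colon_index - 1)
    split(substr($0, colon_index + 1), f, " ")
    size = f[6]; pre_lc = f[17]; post_lc = f[19]
    gcomp = (pre_lc  == 0) ? 0 : size   / pre_lc
    lcomp = (post_lc == 0) ? 0 : pre_lc / post_lc
    printf "%s\tgcomp=x%.2f\tlcomp=x%.2f\ttotal=x%.2f\n", filename, gcomp, lcomp, gcomp * lcomp
}'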

Alternatively, sfs_dump -c output carries similar information, but is more terse:

/data/col1/backup/file       1656995350000000000     78      33554431        33668007        4016    0       33668007        34208564

Field 01 : filename with full path (/data/col1/backup/file)
Field 02 : file last modification time as UNIX epoch in nanoseconds (nanoseconds since January 1st 1970 at midnight)
Field 03 : file (inode) ID, inode numbers are per MTree
Field 04 : pre-comp size for the file in bytes (33554431)
Field 05 : segment store bytes (this is NOT the file size), please ignore (33668007)
Field 06 : segment count, or the number of segments the file consists of (4016)
Field 07 : number of file segments which are redundant, or not unique (0)
Field 08 : combined size of the file's unique segments after deduplication (33668007)
Field 09 : combined size of the file's unique segments after deduplication and local compression (34208564)

For this example, we can make the same calculations as for the previous one:

  • The file deduplication (what we call "gcomp," or global compression) is a factor x0.99 (33,554,431 to 33,668,007 bytes)
  • The file local compression (what we call "lcomp," for local compression) is a factor x0.98 (33,668,007 to 34,208,564 bytes)
  • The total file size reduction estimated (known as "compression ratio") is a factor x0.98 (33,554,431 to 34,208,564 bytes)

Compared to the file shown in the non -c example, this one achieves no deduplication (there are no redundant segments) and no local compression. This typically indicates that the file was already compressed before being written to the Data Domain.
 

Caution: Data in the output files regarding size, other than the "pre-comp size for the file in bytes", is approximate and cannot be fully relied on. Metadata for files is created at the time of file ingestion, and while it is correct then, it is not updated afterwards, so it goes stale over time. Also, a file's post_lc_size being a given amount does not imply that the disk space used by that file is identical to that amount, as there are many FS-level overheads that are not considered at a per-file level. The limitations are the same as those already known for the command:

mtree show compression

Leveraging the information in the output files above consists of processing the numerical data in a way that suits the user's goals. A few examples of use cases are:

  • Determining the gcomp and lcomp for files corresponding to a particular path (if a path can be matched to a particular backup server, client, policy, job, and so forth)
  • Calculating how much data (pre-comp) is stored in a given location and is older than a given amount of time (to find orphans, or to troubleshoot a backup application that is not expiring backups on time)
  • Any other type of statistic one may have a use for

To have a single set of instructions, and to compensate for the loss of the sfs_dump -c option in more recent releases, the recommendation is to convert the output data to the CSV format above and then process the CSV file further; however, depending on your skills, you may process the non-CSV output directly.
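As an illustration of the second use case above, the following awk sketch totals the pre-comp bytes of files under a given path which are older than 30 days. It assumes the tab-separated CSV file produced in the next section (sfs-dump-CSV.csv); the path /data/col1/backup/ and the 30-day threshold are illustrative only:

#### Sketch: total pre-comp bytes under a path, for files older than 30 days
# awk -F'\t' -v now=$(date +%s) '
NR > 1 && index($1, "/data/col1/backup/") == 1 && ($2 / 1000000000) < (now - 30*86400) { total += $4 }
END { printf "pre-comp bytes older than 30 days: %.0f\n", total }' sfs-dump-CSV.csv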


Convert output to CSV

To convert a non-CSV sfs_dump output into one which is identical to what sfs_dump -c would have printed, you may use the following Linux command line:

# cat sfs-dump-noCSV.txt | grep ^/ | awk '
BEGIN {print "name\tmtime\tfileid\tsize\tseg_bytes\tseg_count\tredun_seg_count\tpre_lc_size\tpost_lc_size"}
{
    colon_index = index($0, ":")
    filename = substr($0, 1, colon_index - 1)
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", filename)
    n = split(substr($0, colon_index + 1), fields, " ")
    print filename "\t" fields[2] "\t" fields[4] "\t" fields[6] "\t" fields[10] "\t" fields[12] "\t" fields[14] "\t" fields[17] "\t" fields[19]
}' > sfs-dump-CSV.csv

 

Note: The command above ignores everything in the input file (sfs-dump-noCSV.txt) that does not start with a slash, as only lines starting with "/data" contain file details for processing. The output file has a header as its first line and uses the tab character (\t) as the field delimiter.
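Because the Excel import described later relies on those tab characters, it can be worth confirming that they survived the conversion. A quick check (sketch, assuming GNU coreutils, where cat -A renders a tab as ^I and a line end as $):

#### Sketch: the fields of the first two lines should be separated by ^I (tab) characters
# head -2 sfs-dump-CSV.csv | cat -A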


Everything beyond this point is provided to users as-is. There is no guarantee or obligation on the side of Dell for the instructions shared below. Obtaining intelligence from the file listing details is a user's task, and how to achieve that depends entirely on the tool set used for mining the file dump output, the goal to be achieved, and the user's own expertise in processing data from the command output detailed above. One user may choose Linux command-line processing of text files to produce some aggregates, another may produce input for charting values with "gnuplot," while the majority of users are believed to prefer a simpler (but more limited) approach: building a spreadsheet from the CSV file for analysis.

Dell has made an effort to ensure that the instructions below are correct, working, and useful, but we cannot guarantee that they work for you, and we do not provide support for them, as they are outside the scope of Support.


Importing CSV-formatted sfs_dump output into a spreadsheet (example for Excel):

Once the CSV version of the file details listing is available, one way to get intelligence from the data is to load it into spreadsheet software. Microsoft Excel is used as the example here, although the instructions should be similar for other software. To import the CSV text file into Excel, follow these steps:

  1. Open up Excel and create a new blank spreadsheet
  2. Go to the Data menu on the top, then click the icon named From Text/CSV
  3. Use the dialog to locate the file data in CSV format (sfs-dump-CSV.csv as in the example conversion script from non-CSV format), select it, and click Import
  4. If the input file format is correct and Excel can infer the format from the first 200 lines (which it should), a dialog shows a preview of the spreadsheet to be generated. Review the information to ensure it looks correct, including that the first line is detected as the header
  5. If everything looks good with the preview, click the Load button on the bottom of the dialog, and your new spreadsheet is presented to you with nice formatting and the field headers turned into search and filter-enabled ones.

The data in the spreadsheet is already useful: you can apply a Number Filter to numeric fields such as size (to show only files above a given size), apply a Text Filter on the name field so that only files matching a given pattern (for example, only those starting with the path of a particular MTree) are shown, and perform other calculations derived from that data.

A properly imported Excel spreadsheet should contain data in columns A to I. If data is only present in column A but stretches across the screen, the tab delimiters were most likely lost; close Excel and retry steps 1 through 5 above.

Add extra calculated fields

The spreadsheet may be extended with as many calculated fields as a user requires. An interesting field to add is the human-readable date and time corresponding to each file's mtime (last modification time; usually the time the file was written to the DD). Also, calculating and showing some per-file compression values can be helpful, depending on what the data is being used for. The new columns are populated automatically by using Excel formulas as described below.

Timestamp conversion:

To convert the UNIX-style timestamp (epoch nanoseconds) to a human-readable date and time:

  1. Right click cell J2, and choose "Format Cells"
  2. In the "Category" list on the left, choose "Custom."
  3. If not already present, create the custom format for the human-readable date and time (the example includes the EST string at the end; replace it with your own time zone abbreviation, or remove "EST" entirely if not needed): yyyy-mm-dd hh:mm:ss "EST"
  4. Click "Ok" when done. Cell J2 now has a custom date format.
  5. Add the following formula to cell J2. In the formula, replace the -6 in (-6*3600) with the number of hours the time zone configured in the Data Domain was offset from UTC at the time the "sfs-dump" data was collected (the example uses -6, that is, 6 hours behind UTC, for example US Central Standard Time).
If such a precise time adjustment is not needed, you may use the abbreviated form of the formula instead. The easiest way to add the formula without overwriting the date and time format is to copy either of the two versions below and paste it into cell J2, after typing the "equal to" character = in the cell to start editing it:
(((B2/1000000000+(-6*3600))/86400)+25569)
(((B2/1000000000)/86400)+25569)
In the formula, B2/1000000000 converts the mtime from nanoseconds to seconds, dividing by 86400 converts seconds to days, and 25569 is the Excel serial date for January 1st 1970.
Note: If you paste text into J2 without first putting the cell into edit mode, you overwrite the cell's format. Alternatively, you may precede the formula with an "=", copy the whole text, right-click cell J2, choose Paste Special, select As Text, and click OK; this has the same effect. If this is done correctly, Excel automatically does a few things:
  • Pastes the formula into cell J2, and also into the corresponding cell of every other row in the spreadsheet
  • Calculates the formula for the whole spreadsheet, and shows the date and time value in the configured format
  • Formats the column like the existing ones, with the header configured as a Date so that Date Filters can be applied; the only thing left to do is to give the column a proper name (for example, Date)
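To sanity-check a converted value, you may compare one row against the Linux date command, dividing the mtime column by 1,000,000,000 to go from nanoseconds to seconds. This is a sketch assuming GNU date on a Linux client; the timestamp is the one from the CSV example earlier in this article:

#### Sketch: 1656995350 is the example mtime 1656995350000000000 divided by 10^9
# date -u -d @1656995350
Tue Jul  5 04:29:10 UTC 2022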

Compression information

You may add columns for the per-file compression information by proceeding as explained above. This time the formulae are shown including the "=", so you must copy the whole text and Paste Special as Text:

  • Copy the following formula and Paste Special as Text into cell K2, to create a column for the per-file deduplication, or gcomp:
    =IF(H2=0,0,D2/H2)
  • Copy the following formula and Paste Special as Text into cell L2, to create a column for the per-file local compression, or lcomp:
    =IF(I2=0,0,H2/I2)
  • Copy the following formula and Paste Special as Text into cell M2, to create a column for the per-file total file compression ratio:
    =K2*L2

Give the new columns a proper name. If you want a different format for the compression values (for example, to limit the number of decimals), proceed as in the example for the date and time, and set the cell's numeric format ahead of time, before pasting the formula.

Remember to keep your work safe by going to the File menu and doing a Save As, ensuring the type for the saved file is set to Excel Workbook (*.xlsx), so that formatting and filtering are kept.

On completion of the actions described, your spreadsheet contains the following (relevant) columns:

  • A contains a filename
  • B contains the datestamp (in UNIX epoch in nanoseconds) for when the file was last written
  • D is the original size of the file
  • H is the size after global compression
  • I is the size after global and local compression
  • J contains the datestamp for when the file was last written, in human-readable form
  • K contains the global compression (deduplication) for the file
  • L contains the local compression for the file
  • M contains the total compression for the file

All file sizes are in bytes.
You can now use Excel to filter or sort as required to report on your data.

  • For example, to show only files within the MTree /data/col1/backup which are over 30 days old:
  1. Click the down arrow in the Name header, select Text Filters, then Begins With and type /data/col1/backup/ into the box. The trailing slash is important. Click OK
  2. Click the down arrow in the Last Written Date Header, select Date Filters, then Before. Use the date picker on the right of the dialog to select a date 30 days ago. Click OK.
The status bar at the bottom shows how many rows match this selection.

Additional Information

Command to dump file location (Active and Cloud)

There is a command which dumps the listing of files with an indication of which are in the Active tier and which are in any cloud units:
#### For the DD FS as a whole
# filesys report generate file-location

#### For an individual MTree or subdirectory
# filesys report generate file-location path /data/col1/test

Output in either case is of the following type:

-------------------------      ----------------------      ----------------------      ---------------------------
File Name                      Location(Unit Name)         Size                        Placement Time
-------------------------      ----------------------      ----------------------      ---------------------------
/data/col1/test/file1.bin                  Cloud-Unit                  200.00 GiB      Sat Mar  5 13:24:10 2022
/data/col1/test/file2.bin                      Active                   10.00 TiB      Sat May 14 00:48:17 2022
-------------------------      ----------------------      ----------------------      ---------------------------
However, the command above tends to be inadequate for most purposes: for example, size is shown in human-readable units (rather than in bytes, MiB, or any fixed multiplier), and Placement Time matches the time the backup was written to the Data Domain only for files in the Active tier; for files in the Cloud tier, Placement Time is the date and time when the file was moved to the cloud unit.
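If this report is to be fed to the combination script provided later in this article, it can be collected unattended over SSH in the same way as the sfs-dump output (the host name and output file name below are placeholders):

#### Sketch: collect the file-location report to a client-side text file
# ssh sysadmin@DD-HOSTNAME "filesys report generate file-location" > file-location-DD.txt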

Also, be advised that the sfs_dump -c command (and the script converting from non-CSV sfs_dump output) produces columns which are separated by tabs. The tab characters present in the output must be preserved in the logging file, otherwise the steps detailed above to import the CSV into Excel cannot separate the fields correctly.

The instructions given assume a reasonable knowledge of Excel, but they may be translated to other spreadsheet software.

Modern versions of Excel allow many rows, but bear in mind that each file in the sfs_dump output is a single line, so Excel must be able to handle a spreadsheet with as many rows as there are files in your dataset. The hard limit is slightly over 1 million rows, but even at file counts well below that, Excel may not be the appropriate tool for the job (too slow).

If your sfs_dump output has too many files for your version of Excel, or you want to process the data in smaller pieces for performance, you can run the procedure once per MTree, so that you have multiple spreadsheets for the system.

Even a single MTree's sfs_dump output may be too large for Excel, in which case you may use the split Linux command (or any similar tool for splitting a large text file) to obtain several smaller CSV files to process one at a time, for example:
# split  -b <size> <filename> <new filename prefix>    (splits by file size)
# split  -l <lines> <filename> <new filename prefix>   (splits by number of lines)
# split  -n <chunks> <filename> <new filename prefix>  (splits into a number of pieces)

Note that splitting by size (-b) may cut a line in two at a chunk boundary; splitting by line count (-l) keeps whole lines intact.

For example, to split an input text file into chunks of 200 MiB each, with the resulting pieces named starting with "sfs_dump.out.split", run:

# split -b 200M sfs_dump.out sfs_dump.out.split


Script to use "filesys sfs-dump" for whole FS dumping of file data

For those few releases which lacked the ability to dump file details in CSV format, the script shown further below is provided as-is (with no guarantee) by Dell to users to achieve a similar result by processing non-CSV sfs_dump output.

As the releases not supporting CSV output are also those not allowing the information for all files in the FS to be dumped as a whole, the script uses SSH to connect to the target Data Domain, iterates over the list of MTrees, and runs the file dump one MTree at a time to collect the file listing for all MTrees, then transforms the result into a CSV format suitable for further processing:
#!/bin/bash

#### WARNING
####     This script is provided to you by Dell Technologies with NO GUARANTEE, as best-effort sample code to get a full FS sfs-dump collected
####     for DDOS releases which do not support "se sfs_dump", or as an alternative to it, in releases which support "filesys sfs-dump"
####     If this script does not work for you, or if you need help setting it up, extending it, or resolving any issues, you are not entitled to support
####     This script is not part of Dell PowerProtect / Data Domain, and hence it is not supported

#### Replace values below to suit your needs
USERNAME="sysadmin"
DD_HOSTNAME="10.60.36.172"
#### NO CHANGES NEEDED BEYOND THIS POINT

clear
echo "Script collects a full FS sfs-dump from \"${DD_HOSTNAME}\", using the command \"filesys sfs-dump\", one MTree at a time"
echo "    * Script has to be configured by settting the \"USERNAME\" and \"DD_HOSTNAME\" variables within the top of the script"
echo "    * Script expects passwordless SSH connection as \"USERNAME\" to the \"DD_HOSTNAME\" configured"
echo "    * To configure passwordless SSH login to a DataDomain, check KB at https://www.dell.com/support/kbdoc/000004033 "
echo "    * Test passwordless login is configured and working prior to going ahead with this script"
echo "    * If passwordless login is not configured, script will ask for the \"${USERNAME}\" password "
echo "          - Once for getting the MTree list"
echo "          - And once for each one of the MTrees in \"${DD_HOSTNAME}\" "
echo
echo -n "Are you sure you want to continue? (y/n) : "
read -n 1 answer
echo
if [ "${answer}" = "y" ]; then
    echo "Going ahead with the script."
    echo
else
    echo "Stopping script now. Re-run when passwordless login to \"${DD_HOSTNAME}\" as \"${USERNAME}\" works. Bye."
    exit 1
fi

echo -n "1/6 : Collecting list of MTrees from DD..."
ssh ${USERNAME}@${DD_HOSTNAME} "mtree list" 2>/dev/null | grep ^/ | awk '{print $(NF-3)}' > mtree-list.txt
echo "Done."

n_mtrees=$( wc -l mtree-list.txt | cut -d" "  -f1 )
echo -n "2/6 : Collecting per-Mtree sfs-dump information for ${n_mtrees} MTrees ..."
for mtree in `cat mtree-list.txt`; do
    name=$(echo $mtree | cut -d/ -f4)
    ssh ${USERNAME}@${DD_HOSTNAME} "filesys sfs-dump mtree ${mtree}" 2>/dev/null | grep ^/ > sfs-dump-${name}.txt
    echo -n "."
done
echo "Done."

echo -n "3/6 : Putting all the files together..."
for file in `ls sfs-dump-*.txt`; do 
    if [ -s "${file}" ]; then cat ${file} >> sfs-dump-noCSV.txt; fi
done
echo "Done."

echo -n "4/6 : Converting sfs-dump output to CSV format..."
cat sfs-dump-noCSV.txt | grep ^/ | grep -v ^$ | awk '
BEGIN {print "name\tmtime\tfileid\tsize\tseg_bytes\tseg_count\tredun_seg_count\tpre_lc_size\tpost_lc_size"}
{
    colon_index = index($0, ":")
    filename = substr($0, 1, colon_index - 1)
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", filename)
    n = split(substr($0, colon_index + 1), fields, " ")
    print filename "\t" fields[2] "\t" fields[4] "\t" fields[6] "\t" fields[10] "\t" fields[12] "\t" fields[14] "\t" fields[17] "\t" fields[19]
}' > sfs-dump-CSV.csv
echo "Done."

echo -n "5/6 : Cleaning up..."
for mtree in `cat mtree-list.txt`; do name=$(echo $mtree | cut -d/ -f4); rm -f sfs-dump-${name}.txt ; done
rm sfs-dump-noCSV.txt
rm mtree-list.txt
echo "Done."


echo -n "6/6 : Summary"
echo
n_files=$( wc -l sfs-dump-CSV.csv | cut -d" "  -f1 )
echo
echo "Collecting whole FS sfs-dump data from ${HOSTNAME} completed"
echo "File includes output for ${n_mtrees} MTrees, with a combined $(( ${n_files} - 1 )) files across Active and Cloud Tiers (if applicable)"
echo "Start of file shown below for your convenience :"
echo "===================="
head -5 sfs-dump-CSV.csv
echo "===================="
echo
echo "Done."

exit 0

Script to combine the output of non-CSV "sfs-dump" and "filesys report generate file-location" into a CSV file with the same information as the script above, plus per-file tier and placement time information

The following script is provided as-is (with no guarantee) by Dell as a means of adding value to the output of the sfs_dump and filesys report generate file-location commands above. By adding the tier location and placement time information to each file entry in the output CSV, a user can filter files based on tier (Active, or any of the up to two configured cloud units) to gain a more precise insight into file distribution per tier.

The script expects the sfs-dump (not sfs_dump -c) output as the first parameter, and the filesys report generate file-location output as the second. Output is written to a hard-coded file name, "sfs-dump-output-tiers.csv", which may be changed within the script itself.

The output may be processed using Excel in the same way as explained above.
#!/bin/bash

#### WARNING
####     This script is provided to you by Dell Technologies with NO GUARANTEE, as best-effort sample code to match the output from commands :
####       * sfs-dump (in non CSV format)
####       * filesys report generate file-location
####     so that a new CSV is created for the file paths appearing in both, containing all the data in the sfs-dump file plus the tier and placement time information from the location report
####
####     This script is not part of Dell PowerProtect / Data Domain, and hence it is not supported
####
####     Usage : extend-sfs-dump-with-tier.sh sfs-dump-output.csv file-location-output.log
####     Output : static "sfs-dump-output-tiers.csv" file name (may be changed below)

#### Replace values below to suit your needs
OUTPUT_FILENAME="sfs-dump-output-tiers.csv"
#### NO CHANGES NEEDED BEYOND THIS POINT

clear

if [ $# -ne 2 ]; then
    echo "Combine output from sfs-dump and tier location report into a CSV file with tier and placement time information"
    echo
    echo "Usage : $0 SFS-DUMP-OUTPUT-FILE    REPORT-FILE-LOCATION-FILE"
    echo "NOTE : SFS-DUMP-OUTPUT-FILE has to be in non-CSV format"
    exit 1
fi

INPUT_SFSDUMP="$1"
INPUT_LOCATION="$2"


echo -n "1/6 : Sanity checking input files..."
if [ ! -s "${INPUT_SFSDUMP}" ]; then
    echo "Input file ${INPUT_SFSDUMP} does not exist"
    exit 1
fi
if [ ! -s "${INPUT_LOCATION}" ]; then
    echo "Input file ${INPUT_LOCATION} does not exist"
    exit 1
fi
n_files_sfsdump=`grep ^/ ${INPUT_SFSDUMP} | wc -l`
n_files_location=`grep ^/ ${INPUT_LOCATION} | wc -l`
if [ ${n_files_sfsdump} -eq ${n_files_location} ]; then
    echo -n "both have the same amount of files (${n_files_location}). "
else
    echo -n "sfs-dump has ${n_files_sfsdump} files whereas location report has ${n_files_location} files, this may be normal if the difference is small. "
fi
echo "Done."


echo -n "2/6 : Sanitize \"file-location\" input..."
cat ${INPUT_LOCATION} | awk 'BEGIN {rejected="temp-location-rejected.log"; accepted="temp-location-accepted.log"} { if ( $0 ~ "Missing -unit") { gsub(/Missing -unit/, "Missing-Cloud-Unit", $0); print $0 > rejected } else { if ($0 ~ "^/" ) print $0 > accepted } }'
if [ -s "temp-location-rejected.log" ]; then
    REJECTS_EXIST="yes"
    echo -n "Some files in location report sit in unavailable or deleted cloud units, you may need to re-run this script after fixing the issue and gathering a new location report. "
    cat temp-location-rejected.log temp-location-accepted.log | sed -e 's/\t/:\t/' | sort > temp-location-report-sorted.log
    rm temp-location-rejected.log
else
    cat temp-location-accepted.log | sed -e 's/\t/:\t/' | sort > temp-location-report-sorted.log
    REJECTS_EXIST="no"
fi
rm temp-location-accepted.log
echo "Done."


echo -n "3/6 : Sanitize \"sfs-dump\" input..."
cat ${INPUT_SFSDUMP} | grep ^/ | sort > temp-sfs-dump.log
echo "Done."


echo -n "4/6 : Merging information for sfs-dump and location report..."
join -1 1 -2 1 temp-sfs-dump.log temp-location-report-sorted.log > temp-merged-information.log
rm temp-sfs-dump.log
rm temp-location-report-sorted.log
n_files_combined=`grep ^/ temp-merged-information.log | wc -l`
if [ ${n_files_combined} -eq 0 ]; then
    echo "No files matched from input files. sfs-dump output must NOT be in CSV format. Exiting."
    rm temp-merged-information.log
    exit 1
fi
echo -n "Input files matched on ${n_files_combined} files. "
echo "Done."


echo -n "5/6 : Converting merged sfs-dump / location-report output to CSV format..."
cat temp-merged-information.log | grep ^/ | grep -v ^$ | awk '
BEGIN {print "name\tmtime\tfileid\tsize\tseg_bytes\tseg_count\tredun_seg_count\tpre_lc_size\tpost_lc_size\ttier\tplacement_time"}
{
    colon_index = index($0, ":")
    filename = substr($0, 1, colon_index - 1)
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", filename)
    n = split(substr($0, colon_index + 1), fields, " ")
    print filename "\t" fields[2] "\t" fields[4] "\t" fields[6] "\t" fields[10] "\t" fields[12] "\t" fields[14] "\t" fields[17] "\t" fields[19] "\t" fields[27] \
        "\t" fields[length(fields)-4] " " fields[length(fields)-3] " " fields[length(fields)-2] " " fields[length(fields)-1] " " fields[length(fields)]
}' > ${OUTPUT_FILENAME}
rm temp-merged-information.log
echo "Done."


echo -n "6/6 : Summary"
echo
echo
echo "Merging information from sfs-dump (${INPUT_SFSDUMP}) and location-report ${INPUT_LOCATION} output completed."
echo "Output file (${OUTPUT_FILENAME}) includes information for a total ${n_files_combined} files, out of ${n_files_sfsdump} in input sfs-dump, and ${n_files_location} in input location report."
if [ "${REJECTS_EXIST}" == "yes" ]; then
    echo "Note there are some files in disconnected or deleted cloud units, for which the \"tier\" field has been replaced with \"Missing-Cloud-Unit\"."
fi
echo
echo "Start of file shown below for your convenience :"
echo "===================="
head -5 ${OUTPUT_FILENAME}
echo "===================="
echo
echo "You may follow the instructions in https://www.dell.com/support/kbdoc/000081345 to process this CSV file in an spreadhseet"
echo
echo "Done."

exit 0


For Veritas NetBackup users:

Veritas NetBackup (NBU) is known to create files in a Data Domain with colon characters as part of the file names. For example, the following are valid NBU file name paths when the Data Domain is used as backend storage for NBU:
/data/col1/MTREE_NAME/POLICY-NAME_1400502741_C1_F1:1400502741:VM_PRD-02:4:1::
/data/col1/MTREE_NAME/POLICY-NAME_1400502741_C1_F2:1400502741:VM_PRD-02:4:1::
/data/col1/MTREE_NAME/POLICY-NAME_1400502741_C1_HDR:1400502741:VM_PRD-02:4:1::
This poses an issue for the example scripts above, as the first colon character on each line is what the scripts use to separate the file name from the rest of the sfs_dump output, so running them unchanged would yield incorrect results.

For such cases, you must edit the script as shown below:
--- iterate-dd-for-fs-sfs-dump.sh       2024-01-23 06:32:16.521409000 -0500
+++ iterate-dd-for-fs-sfs-dump-NBU.sh   2024-02-27 03:26:42.808246000 -0500
@@ -55,11 +55,11 @@
 cat sfs-dump-noCSV.txt | grep ^/ | grep -v ^$ | awk '
 BEGIN {print "name\tmtime\tfileid\tsize\tseg_bytes\tseg_count\tredun_seg_count\tpre_lc_size\tpost_lc_size"}
 {
-    colon_index = index($0, ":")
-    filename = substr($0, 1, colon_index - 1)
+    colon_index = index($0, ":::")
+    filename = substr($0, 1, colon_index + 1)
     gsub(/^[[:space:]]+|[[:space:]]+$/, "", filename)
     n = split(substr($0, colon_index + 1), fields, " ")
-    print filename "\t" fields[2] "\t" fields[4] "\t" fields[6] "\t" fields[10] "\t" fields[12] "\t" fields[14] "\t" fields[17] "\t" fields[19]
+    print filename "\t" fields[3] "\t" fields[5] "\t" fields[7] "\t" fields[11] "\t" fields[13] "\t" fields[15] "\t" fields[18] "\t" fields[20] "\t"
 }' > sfs-dump-CSV.csv
 echo "Done."
While the changes are shown for the script that iterates over all the MTrees in a Data Domain to pull per-MTree sfs_dump data, the changes are the same for the other script. As with the scripts themselves, the changes above are provided by Dell without any guarantee, in the hope that they are useful.

Affected Products

Data Domain

Products

Data Domain
Article Properties
Article Number: 000081345
Article Type: Solution
Last Modified: 27 May 2025
Version: 21