PowerScale: How to Run the On-Cluster Analysis Tool

Summary: Instructions on how to use the Isilon On-Cluster Analysis (IOCA) tool and interpret its results.

Instructions

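The Isilon On-Cluster Analysis (IOCA) tool analyzes the health of a running PowerScale cluster and assists* with upgrade planning.

* IOCA is not designed to replace reviewing the upgrade documentation when you prepare for an upgrade activity. See the OneFS Upgrade Planning and Process Guide.

The latest version of the IOCA tool can be downloaded from Lightning.

Note: Even if the download date is old, the latest version is downloaded.

Note: After downloading IOCA.tar and transferring it to the cluster, you must use tar to extract IOCA and IOCA.sha256: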

# tar -xvf IOCA.tar
IOCA
IOCA.sha256

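To verify the integrity of IOCA, you can use the sha256 file from the tar package above, or you can download a standalone IOCA.sha256 file.
After downloading it, transfer it to the same location as the IOCA script (make sure that it overwrites the existing IOCA.sha256).

If sha256sum is available: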

# sha256sum -c /home/nyhanj1/IOCA.sha256
./IOCA: OK

Check the result shown above and confirm that it reports OK.

If sha256sum is not available:

# cat IOCA.sha256
a55c9efcea29776317d3b3ed36c504dcab08d1f945161f6ac6c8bbb315f31bb0 ./IOCA
# sha256 IOCA
SHA256 (IOCA) = a55c9efcea29776317d3b3ed36c504dcab08d1f945161f6ac6c8bbb315f31bb0

Manually verify that the two checksums match.
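
The manual comparison can also be scripted. The following is a minimal sketch and is not part of the IOCA package: the function name `verify_ioca_checksum` is hypothetical, and it assumes the checksum file uses the `<hash> <filename>` layout shown above. It uses `sha256sum` when that command is present and otherwise falls back to `sha256 -q`, the FreeBSD form.

```shell
#!/bin/sh
# Hypothetical helper (not part of IOCA): compare the SHA-256 digest of a
# file with the first field of a checksum file such as IOCA.sha256.
verify_ioca_checksum() {
    file=$1
    sumfile=$2

    # The expected digest is the first whitespace-separated field.
    expected=$(awk 'NR == 1 { print $1 }' "$sumfile")

    # Prefer sha256sum; fall back to sha256 -q where only that exists.
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{ print $1 }')
    else
        actual=$(sha256 -q "$file")
    fi

    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $file matches $sumfile"
        return 0
    fi
    echo "MISMATCH: expected '$expected' but computed '$actual'" >&2
    return 1
}
```

For example, `verify_ioca_checksum IOCA IOCA.sha256` returns a zero exit status only when the two digests are identical, so it can gate later steps in a script.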


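To run IOCA on the cluster:

  1. Confirm that you are connected to the correct cluster. Run the following command to display the serial numbers in the cluster, and confirm that the serial number from the service request appears in the list:
isi_for_array cat /etc/isilon_serial_number
  2. If the directory does not exist, run the following commands to create the /ifs/data/Isilon_Support/ directory and change into it:
mkdir -pv /ifs/data/Isilon_Support

cd /ifs/data/Isilon_Support
  3. Stage the latest version of IOCA in the /ifs/data/Isilon_Support/ directory on the cluster:

  • If a previous version exists, run the following command to check its version, and compare it with the version available for download. The version is listed at the top of the health check script.
perl IOCA -v
  4. Run the IOCA tool, adding any other arguments. In the following example, the pre-upgrade checks include checks for an upgrade to 9.5.1.0, and upgrade plan recommendations are displayed after the health check results:
perl IOCA -u 9.5.1.0
  5. Collect the output and post it to the service request.
  6. Review all health checks that report "Fail" or "Warning" messages to identify potential issues.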
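
One way to collect the output for the service request is to save a copy while it is displayed. The following is a minimal sketch under stated assumptions: the helper name `run_and_save` and the `IOCA_output_<timestamp>.txt` file name are hypothetical, and only the standard `tee` and `date` utilities are used.

```shell
#!/bin/sh
# Hypothetical helper: run a command, show its output, and keep a copy in a
# timestamped file so that it can be attached to the service request.
run_and_save() {
    outdir=$1
    shift
    outfile="$outdir/IOCA_output_$(date +%Y%m%d_%H%M%S).txt"

    # 2>&1 sends error messages to the same file as normal output.
    "$@" 2>&1 | tee "$outfile"
    echo "Saved output to $outfile" >&2
}

# Example, run from /ifs/data/Isilon_Support:
#   run_and_save /ifs/data/Isilon_Support perl IOCA -u 9.5.1.0
```

The status message is written to standard error so that the saved file and the displayed output contain only what the command itself printed.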


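Interpreting IOCA output:

For each health check, if anything is found, there is usually an associated knowledge base (KB) article. This tool was originally for internal use only, and some of the articles that it references are not yet available externally. Work is underway to make these reference articles accessible to external users.

The following is sample output from a cluster where the BMC or CMC hardware monitoring check failed: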
BMC/CMC Hardware Monitoring                       FAIL
  FAIL: Hardware monitoring issues detected on nodes: 2
  INFO: 3 nodes have out of date CMC firmware versions: 1-3
  INFO: Refer to KB489050 (https://support.emc.com/kb/489050) for details.

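In this example, "FAIL" is shown, indicating that node 2 has a hardware monitoring issue. The output also includes "INFO" lines, one of which shows that nodes 1 through 3 have out-of-date CMC firmware. Finally, it references a KB article that contains resolution steps.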
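
When the output has been saved to a file, the checks that need attention can be pulled out with a short filter. This is only a sketch: the helper name `show_problem_checks` is hypothetical, and it assumes the layout of the sample above, where each check starts with a non-indented header line ending in its status and the detail lines are indented. The `WARN` and `WARNING` status words are assumptions, since the sample shows only `FAIL`.

```shell
#!/bin/sh
# Hypothetical helper: print only the checks whose header line ends in FAIL,
# WARN, or WARNING, together with their indented detail lines.
# Assumption: header lines start in column 1 and detail lines are indented.
show_problem_checks() {
    awk '
        {
            first = substr($0, 1, 1)
            # A non-empty, non-indented line is the header of a new check.
            if (first != "" && first != " " && first != "\t") {
                keep = ($NF == "FAIL" || $NF == "WARN" || $NF == "WARNING")
            }
            if (keep) print
        }
    ' "$1"
}

# Example: show_problem_checks IOCA_output.txt
```

Keeping the detail lines with each header preserves the KB references, which are usually the next thing to follow up on.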


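Compliance mode

On a cluster with compliance mode enabled, IOCA runs the same way as on a cluster that is not in compliance mode. It does not need to be run with sudo. However, the compadmin user must be the owner of the IOCA script in order to run it.

The following shows IOCA usage and the available arguments or filters: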

Usage: IOCA [options] [destination OneFS version]
    -d, --debug       Display debugging information
    -e, --extra       Displays extra details as part of each check
    -j, --json        Displays output in JSON format
    -v, --version     Displays current script version
    -h, --help        Displays this help screen
    -r <checkName>, --run=<checkName>
        Executes only the specified check, can be included multiple times
    -u, --upgradeplan Includes an upgrade plan after health checks
    --rolling         Provide rolling reboot plans
    --parallel        Provide parallel reboot plans [where supported]
    --simultaneous    Provide simultaneous reboot plans [excludes node firmware]
    -o, --onefs
        Supports the following comma separated options [ex. 8.1.2,simultaneous]:
            <version>       Uses the provided destination OneFS version
            simultaneous    Simultaneous OneFS upgrade
            parallel        Parallel OneFS upgrade [requires 8.2.2+]
            rolling         Rolling OneFS upgrade
            exclude-nf      Upgrade plans will combine OneFS + node firmware by
                            default [9.2 feature], this option disables that
    -p, --patches
        Supports the following comma separated options [ex. none,simultaneous]:
            none            Opt out of patch recommendations
            simultaneous    Simultaneous patch installs
            parallel        Parallel patch installs [requires 9.1+]
            rolling         Rolling patch installs
    -nf, --node-firmware
        Supports the following comma separated options [ex. 10.3.3,parallel]:
            <version>       Uses the provided version for node firmware checks
            none            Opt out of node firmware recommendations
            simultaneous    Simultaneous node firmware updates [requires 8.2+]
            parallel        Parallel node firmware updates [requires 8.2+]
            rolling         Rolling node firmware updates
    -df, --drive-firmware
        Supports the following comma separated options [ex. 1.32]:
            <version>       Uses the provided version for drive firmware checks
            none            Opt out of drive firmware recommendations
    -vf, --verify-files
        Runs checks on files within specified location [ex. /ifs/data/] of certain file type [ex. .isi, .tgz, .tar.gz, .tar]
        <file location>     default location is /ifs/data/ - specify the location where the upgrade files were placed

Additional Information

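Names and descriptions of the individual checks that can be used with this option:

--run=CHECK

Note: The IOCA script is updated frequently. To see the current full list of checks, update to the latest IOCA version on the Isilon cluster and run the following command.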

onefs94-a-1# perl IOCA --run=CHECK

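Isilon On-Cluster Analysis                        0.1541

Requested check, CHECK, not recognized.
Available checks include:
        checkA100Root                 Checks if the A100 node root mirror needs to be resized to 2GB
        checkAPIAuth                  Checks if API authentication is set to basic authentication when upgrading from 8.1.2.0 to 9.2 or later
        checkAccessZones              Checks if multiple access zones are configured when upgrading across 7.1.1. Checks for nested or overlapping SMB shares. Warns when any non-System access zone pools are found when upgrading across OneFS 7.1.1
        checkAggregationMode          Checks that the aggregation mode is not legacy FEC mode for upgrades to OneFS 8+
        checkAspera                   Checks if any Aspera services are enabled. If a OneFS upgrade is performed, Aspera must be reinstalled after the upgrade
        checkAuthStatus               Checks the authentication status on each node. Warns if any authentication provider is not online or active.  Checks RFC2307 and GID/UID auto-assignment, and points to KB 000028577
        checkBBUDegCap                Checks BBU degradation on Gen6 nodes and flags any excessively degraded nodes that may be at increased risk of entering a read-only state.
        checkBMCandCMC                Checks for BMC/CMC related issues
        checkBXENodes                 Checks for nodes with BXE interfaces and for the known issues in KB 000048172 and KB 000064027
        checkBootDisks                Checks boot disks for remaining life, firmware level, and historical error counts
        checkCM6FWBug                 Checks if drive firmware versions meet the criteria of FCO F022318EE
        checkCMOSTimeCentury          Checks if the century set in the CMOS time matches the current century
        checkCapacity                 Verifies cluster capacity against the numbers documented in the Upgrade Planning and Process Guide. Warns when approaching the limits
        checkCloudPools               Checks for CloudPools related issues
        checkConfCmtSyntax            Checks sysctl.conf for comments without a leading # symbol, which may cause problems when the configuration file is parsed.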
        checkContact                  Displays contact information configured in CELog when run with the --extra argument
        checkCoreDumps                Checks for recent unexpected process restarts reported in /var/log/messages
        checkDTA000194434             Checks for criteria of KB 000194434
        checkDestinationOneFS         Checks destination OneFS version
        checkDiskpools                Checks diskpools and class equivalence for OneFS upgrades going across 7.0
        checkDriveFirmware            Checks for out of date Drive Firmware and calls other related drive firmware checks
        checkDriveLoad                Checks the current load on the drives
        checkDriveStallTimeout        Checks current Drive Stall Timeout setting, recommend value is 3.5 seconds (3500000 microseconds) or higher
        checkDriveSupportPackage      Checks for drive firmware updates available in the Drive Support Package
        checkDrivesHealth             Checks health of drives and the drive stall timeout setting in sysctl
        checkET004252                 Checks for criteria of ET004252
        checkETAs                     Checks for Technical Advisories
        checkEmailSettings            Displays E-mail settings configured in CELog when run with the --extra argument
        checkEncoding                 Checks exports and cluster configuration for if utf-8/default encoding
        checkEvents                   Checks events on all the nodes, failure if any critical events exist
        checkFCOF022318EE             Checks drive firmware versions for the criteria of FCO F022318EE
        checkFCOF031617FC             Checks drive firmware versions for the criteria of KB 000024620
        checkFCOF042415EE             Checks the cluster to see if it meets criteria for FCO F042415EE/KB 000051631
        checkFileSharing              Checks if Atime is enabled
        checkFilepoolPolicies         Checks GNA requirements and checks filepools for final match being set and names starting with a number
        checkFirmwarePackages         In OneFS 9.1 and later, confirms firmware packages are available
        checkFlush                    Checks for running flush processes / active pre_flush screen sessions on clusters
        checkGatewayPriority          Checks for subnets with duplicate gateway priorities
        checkGroups                   Checks nodes for all enabled protocols.  Fails if group info is reporting that an enabled protocol is not functioning on any node
        checkHDFS                     Display HDFS details, only useful when run with --extra
        checkHardening                Checks if FIPS is enabled on node in the cluster, this needs to be disabled prior to upgrades to 9.5 or higher and re enabled after to avoid assessment failures
        checkHardwareStatus           Checks battery health, power supplies, and gathers hardware details for use elsewhere
        checkHardwareUpgrade          Checks if there is an in progress hardware upgrade
        checkHealth                   Verifies cluster health status and node health status
        checkIBInterfaces             Checks for ib0/1 as being active, checks for ETA180317 IB switch firmware versions, and checks for overlapping IB networks
        checkIBPCIeSlot               Checks if the InfiniBand card is installed in the wrong slot which may lead node start up issues during an upgrade to OneFS 9 and later releases
        checkIDI                      Checks for IDI errors in the past 90 days
        checkISCSI                    Checks for iSCSI LUNs being configured in /ifs/.ifsvar/iscsi/iscsi.conf (OneFS prior to 8.x only)
        checkIndexSnapshotCurrent     Checks for current snapshots that are over 2 weeks old and may contribute to capacity issues
        checkInternalPing             Checks internal network by performing network ping operations
        checkJobHistory               Checks job history for issues, currently just MediaScan issues
        checkJobStatus                Checks for running jobs that would impact an upgrade
        checkJobs                     Checks jobs
        checkKB000066019              Checks size of reports.db and flags if over 100MB which may lead to issues outlined on KB 000066019
        checkKB000081658              Checks for criteria of KB 000081658
        checkKB000181818              Checks for criteria of KB 000181818
        checkKB000192800              Checks for criteria of KB 000192800
        checkKB000196175              Checks for criteria of KB 000196175
        checkKB000196762              Checks for criteria of KB 000196762
        checkKB000197850              Checks for issues with IB queue pairs that would lead to node reboot issues if IB queue pairs are in a degraded state
        checkKB000212387              Checks Authentication providers msDS-SupportedEncryptionTypes attribute to ensure a value is set and assigned, if it is not, there is potential for DU after an upgrade to 9.5 or above.
        checkKB000213188              Checks for SED hardware when the current version is below 9.2 and the target version is 9.5 or later.
        checkKB201488                 Checks if any node meets criteria for KB 000201488
        checkKB201666                 Checks if it is necessary to perform the proactive workaround from KB 000201666 for a patch installation and whether the pre-requisites are met
        checkKB201933                 Checks for criteria of KB 000201933
        checkKB203381                 Checks for criteria of KB 203381
        checkKB220014                 Checks for criteria of KB 220014
        checkKB462202                 Checks BootOrder in bios_settings.ini on Generation 5 nodes to determine if at risk for KB 000025523
        checkKB489473                 Checks if any node meets criteria for KB 000061983
        checkKB490849                 Checks if at risk for KB 000052089
        checkKB496582                 Checks for auth rules issues detailed in KB 000160596
        checkKB496993                 Checks if the cluster is at risk for KB 000061504
        checkKB501267                 Checks for the criteria of KB 000026510
        checkKB507031                 Checks for criteria outlined in KB 000035398
        checkKB516613                 Checks if any node meets criteria for KB 000057267
        checkKB519119                 Checks if nodes may be impacted by KB 519119
        checkKB519388                 Pre-upgrade check for issues outlined in KB 000162270
        checkKB519423                 Checks if the cluster config files are in a mixed mode
        checkKB519890                 Checks for a known issue with LAGG interfaces in LACP mode when running OneFS 8.0.0.6, 8.0.1.2, 8.1.0.2, and 8.1.1.1
        checkKB521778                 Checks for criteria outlined in KB 000031948
        checkKB521890                 Checks for criteria outlined in KB 000167681
        checkKB524082                 Checks if the cluster is enabled for HTTP clients and flags a compatibility issue caused by a change in Apache versions
        checkKB527312                 Check for criteria of KB 000166965
        checkKB530050                 Check for criteria of KB 000040987
        checkKB533516                 Checks if cluster uses an IP for AWS CloudPool accounts putting it at risk for DTA 533516
        checkKB535582                 Checks if at risk for KB 000060471
        checkKB537785                 Check for criteria of KB 000168829
        checkKB540000                 Checks for criteria of KB 000058599
        checkKB540071                 Checks for existence of files under /var/fw/fwpkg when no IsiFw package is installed
        checkKB540513                 Checks for criteria of KB 000174074
        checkKB540872                 Checks if the cluster may encounter KB 000170982 during an upgrade from OneFS 8.2 releases
        checkKB540901                 Checks boot disk partitions for any mismatches in uuids which may lead to boot failures
        checkKB544401                 Check for criteria of KB 000173157
        checkKB544854                 Check for criteria of KB 000173432
        checkKB546604                 Checks for criteria of KB 000180866
        checkKerberos8000             Checks for an issue with the Kerberos configuration file when upgrading to OneFS 8.0.0.0
        checkLACPSFP                  Checks for LACP on cxgb interfaces for KB 000174095
        checkLWIODLog                 Checks /var/log/lwiod.log for known errors occurring in the last 30 days
        checkLastZoneID8000           Checks for gaps in access zones that may cause major issues when upgrading to OneFS 8.0.0.0
        checkLeakFreeBlocks           Checks for nodes with efs.lbm.leak_freed_blocks enabled.
        checkLegacyLDAP               Checks if legacy LDAP is enabled when upgrading from OneFS 6 to OneFS 7
        checkLicense                  Checks licenses and provides guidance based on the licensed features.  For InsightIQ and vCenter licenses, provides information from the compatibility guide.  For iSCSI, instructs to only perform simultaneous OneFS upgrades and that it is not supported in 8.0
        checkLinMasterPadding         Checks the LIN master padding to be all zeros
        checkListenQueue              Checks for listen queue overflows to be less than 50,000 per node
        checkLogLevel                 Checks LWSM log levels for NFS, SMB, HDFS, and Authentication
        checkLogs                     Checks Log file presence and flags if any log file specified in list is not present
        checkMaintenanceMode          Checks if the cluster is currently in maintenance mode
        checkMemory                   Checks each DIMM to meet criteria outlined in KB 000041666 and if the expected (per product info line) matches closely the reported RAM
        checkMessagesLog              Checks /var/log/messages.log for known errors occurring in the last 30 days
        checkMirrors                  Checks the boot mirror health
        checkNDMP                     Checks for running NDMP sessions
        checkNDMP16GB                 Checks for LNN changes that have occurred since the isi_ndmp_d processes started which can cause issues during the HookDataMigrationUpgrade phase of an OneFS upgrade
        checkNDMPUpgradeTimeout       Checks for LNN changes that have occurred since the isi_ndmp_d processes started which can cause issues during the HookDataMigrationUpgrade phase of an OneFS upgrade
        checkNFS                      Uses nfsstat to identify RPC errors
        checkNetBIOS                  Checks if the Isilon NetBIOS Name Service (nbns) is enabled when updating to OneFS 8.0.1 and later
        checkNetstat                  Checks connections counts for specific protocols via netstat
        checkNetworkParallelUpgrade   Checks for the risk of inaccessible network pools during a parallel upgrade
        checkNetworkPoolIFaces        Checks each network pool and their assigned interfaces, if only 1 interface is configured for any pool and IP Ranges are set, this will cause a failure for pre upgrade mandatory checks
        checkNodeCompatibility        Checks node compatibility for OneFS upgrades by comparing it against known supported versions
        checkNodeFirmware             Checks node firmware for updates
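        checkNodesInstalled           Checks for nodes installed to display in an aggregated list for visibility
        checkOneFSVersions            Checks running version and target version for any issues. Fails if any version mismatch is found across the nodes
        checkOpenFiles                Checks the number of open files on each node [sysctl kern.openfiles] against the maximum number of open files [sysctl kern.maxfiles].  Warns when over 80% of the maximum and fails when over 90%
        checkPSCALE136276             Checks for criteria of PSCALE-136276
        checkPartitions               Checks system partition space
        checkPatches                  Checks for strongly recommended patches for the current version (if there is no target version) or for the target version
        checkPerformance              Checks cluster performance
        checkProcesses                Checks for issues related to the OpenSM master, MCP, isi_mca_dump, and isi_upgrade_d processes
        checkProtectionLevel          Checks storage pool protection levels
        checkRealACL                  Checks if a real ACL is set on /ifs/.ifsvar or ifs/.ifsvar/patch. These should not be set and cause upgrade/installation issues if they are
        checkRemoteSupport            Checks if restricted shell and isi_supportassist are enabled; if both are enabled and the cluster is upgraded to 9.7, this causes issues with the SupportAssist service restarting
        checkRoutingTables            Displays the routing table of each node
        checkSBR8000                  Checks if SBR is enabled before the upgrade for OneFS upgrades to 8.0.0.0/1
        checkSNMPDConfig              Checks SNMPD.config and isilon_serial_number to ensure that they are not 0 bytes
        checkSPNs                     Displays the SPN list, only useful when run with --extra
        checkSRS                      Checks for issues with the remote connectivity configuration
        checkSSHDConfig               Checks the /etc/mcp/templates/sshd_config file for known issues
        checkSWIFTAccounts            Checks SWIFT accounts, used to set the flag priority when SWIFT is licensed and an upgrade to 9.5 or later is performed
        checkServices                 Checks common services to ensure that they are in the expected state
        checkServicesMonitoring       Checks if enabled services are being monitored
        checkSmartConnect             Checks if SmartConnect service IPs are all assigned and not used for client connections
        checkSnapshot                 Checks if the snapshot count is approaching the cluster limit of 20,000, if autodelete is set to yes, and checks the snapshot logs for EIN/EIO/EDEADLK/failed to create snapshot
        checkStaticRouteConflict      Checks for static route conflicts
        checkStoragePools             Checks storage pools for health/capacity/unprovisioned drive issues
        checkSupportability           Checks the supportability of the cluster hardware and software
        checkSwitchCompatibility      Checks backend Dell switches to confirm that they are at version 10.5.0.6 or later
        checkSymLink                  Checks if /var/patch/catalog or /var/patch/tmp is a symbolic link, or if catalog is a file rather than a directory.
        checkSyncIQ                   Gathers source and target SyncIQ information and reports SyncIQ partners. Checks for a large number of SyncIQ report files, which can cause the tar process to delay other upgrade processes and may leave the cluster in a temporary DU situation for an extended time
        checkSystemFlag               Checks for disk pools that have the system flag set
        checkTimeDrift                Checks for time drift between nodes
        checkTimeSync                 Checks if the cluster is enabled to synchronize time with an external server
        checkTimeZone                 Checks if the time zone is missing in the target OneFS code level
        checkUIDGID                   Checks if files in / and /var have UID/GID values greater than 262143
        checkUpgrade                  Checks for issues related to an in-progress upgrade. Warns if the isi_upgrade_d service is enabled. Fails if not in a committed state. Fails if an upgrade activity is already in progress. Checks fs_fmt_version; an odd or zero fs_fmt_version is problematic
        checkUpgradeAgentPort         Checks the port used by the isi_upgrade_agent_d daemon to ensure that it is not in use by another process
        checkUpgradePath              Checks for situations that require multiple hops and provides the required details
        checkUptime                   Checks node uptime, warns when over 200 days, and flags the projected uptime
        checkVaultCard                Checks if Gen6 nodes contain an M.2 vault card and confirms that the SMART status of the device is still within thresholds
        checkZoneLocalAuth            For upgrades to OneFS 8.2 and later, checks if the local provider is associated with other access zones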
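
Because the usage text states that `--run=<checkName>` can be included multiple times, a command line that runs only a chosen subset of the checks listed above can be assembled as follows. This is a minimal sketch: the helper name `build_ioca_run_cmd` is hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an IOCA command line that repeats
# --run= once for each check name passed as an argument.
build_ioca_run_cmd() {
    cmd="perl IOCA"
    for check in "$@"; do
        cmd="$cmd --run=$check"
    done
    echo "$cmd"
}

# Example: build_ioca_run_cmd checkHealth checkEvents
# prints:  perl IOCA --run=checkHealth --run=checkEvents
```

Printing instead of executing is a deliberate choice here; the resulting line can be pasted into the cluster session once the check names have been confirmed against the list from the installed IOCA version.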


Affected Products

PowerScale, Isilon, PowerScale OneFS, PowerScale F210, PowerScale F710

Products

Isilon
Article Properties
Article Number: 000021811
Article Type: How To
Last Modified: 14 Feb 2025
Version:  24