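PowerScale: How to Run the On-Cluster Analysis Tool
Summary: How to use the Isilon On-Cluster Analysis (IOCA) tool and interpret its results.
Instructions
The Isilon On-Cluster Analysis (IOCA) tool analyzes the health of a running PowerScale cluster and helps with upgrade planning.*
* IOCA is not designed to replace reviewing the upgrade-related documentation when you prepare for an upgrade activity. Refer to the OneFS Upgrade Planning and Process Guide.
The latest version of the IOCA tool can be downloaded from Lightning.
Note: Even if the download date shown is older, the latest version is still downloaded.
Note: After downloading IOCA and transferring it to the cluster, you must extract IOCA and IOCA.sha256 with the following command: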
# tar -xvf IOCA.tar IOCA IOCA.sha256
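To verify the integrity of IOCA, you can use the sha256 file from the tar package above, or you can download the standalone IOCA.sha256 file separately.
After downloading it, transfer it to the same location as the IOCA script (make sure that it overwrites the existing IOCA.sha256).
If you are using sha256sum: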
# sha256sum -c /home/nyhanj1/IOCA.sha256
./IOCA: OK
Confirm that the output reports OK.
If you do not have sha256sum:
# cat IOCA.sha256
a55c9efcea29776317d3b3ed36c504dcab08d1f945161f6ac6c8bbb315f31bb0 ./IOCA
# sha256 IOCA
SHA256 (IOCA) = a55c9efcea29776317d3b3ed36c504dcab08d1f945161f6ac6c8bbb315f31bb0
Manually verify that the two checksums match.
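The manual comparison can also be scripted. The following is a minimal sketch, not part of IOCA: the helper name verify_ioca_checksum is hypothetical, and it assumes the checksum file holds the digest as its first field and that the sha256 utility prints the digest as its last field, as in the output shown above.

```shell
# Hypothetical helper (not part of IOCA): compare the digest recorded in a
# .sha256 file with the digest computed from the file itself.
verify_ioca_checksum() {
    file=$1
    sumfile=$2
    # The expected digest is assumed to be the first field of the first line.
    expected=$(awk 'NR == 1 { print $1 }' "$sumfile")
    if command -v sha256sum >/dev/null 2>&1; then
        # sha256sum prints "<digest>  <name>"
        actual=$(sha256sum "$file" | awk '{ print $1 }')
    elif command -v sha256 >/dev/null 2>&1; then
        # sha256 prints "SHA256 (<name>) = <digest>"
        actual=$(sha256 "$file" | awk '{ print $NF }')
    else
        echo "no sha256 utility found" >&2
        return 2
    fi
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "$file: OK"
    else
        echo "$file: FAILED"
        return 1
    fi
}
```

For example, `verify_ioca_checksum ./IOCA ./IOCA.sha256` prints `./IOCA: OK` when the digests match and returns a non-zero status otherwise.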
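To run IOCA on the cluster:
- Confirm that you are connected to the correct cluster. Run the following command to display the serial numbers in the cluster, and confirm that the serial number from the service request appears in the list: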
isi_for_array cat /etc/isilon_serial_number
- If the directory does not exist, run the following commands to create the /ifs/data/Isilon_Support/ directory and change into it:
mkdir -pv /ifs/data/Isilon_Support
cd /ifs/data/Isilon_Support
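- Stage the latest version of IOCA in the /ifs/data/Isilon_Support/ directory on the cluster:
- If a previous version is present, run the following command to check its version, and compare it with the version available for download. The version is listed at the top of the health check script.
perl IOCA -v
- Run the IOCA tool, adding any other arguments. In the following example, the pre-upgrade checks include checks for an upgrade to 9.5.1.0, and upgrade plan recommendations are displayed after the health check results: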
perl IOCA -u 9.5.1.0
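- Collect the output and post it to the service request.
- Review every health check that reported a "Fail" or "Warning" message to identify potential issues.
Interpreting IOCA output:
For each health check item that finds something, there is usually a related Knowledge Base (KB) article. The tool was originally for internal use only, and some of the articles it references are not yet available externally. Work is in progress to give external users access to these reference articles.
The following is sample output from a cluster where the BMC or CMC hardware monitoring check failed: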
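To make the output easy to attach to a service request, it can be saved to a file while it is displayed. This is a minimal sketch under assumptions: run_and_save is a hypothetical helper name and ioca_output.txt is an arbitrary file name; neither comes from IOCA.

```shell
# Hypothetical helper (not part of IOCA): run a command, show its output,
# and keep a copy (stdout and stderr) in the given file.
run_and_save() {
    outfile=$1
    shift
    "$@" 2>&1 | tee "$outfile"
}

# Example invocation on the cluster (not executed here):
#   run_and_save ioca_output.txt perl IOCA -u 9.5.1.0
```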
BMC/CMC Hardware Monitoring FAIL
FAIL: Hardware monitoring issues detected on nodes: 2
INFO: 3 nodes have out of date CMC firmware versions: 1-3
INFO: Refer to KB489050 (https://support.emc.com/kb/489050) for details.
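In this example, the check reports "FAIL" and indicates that node 2 has hardware monitoring issues. It also includes an "INFO" line showing that nodes 1 through 3 have out-of-date CMC firmware versions. Finally, it references a KB article that contains the resolution steps.
Compliance Mode
On clusters with compliance mode enabled, IOCA runs the same way as on clusters that are not in compliance mode. It does not need to be run with sudo. However, the compadmin user must be the owner of the IOCA script to run it.
The following is the usage for IOCA and its available arguments or filters: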
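When the saved output is long, the failures and KB references can be pulled out with a small filter. The sketch below is an illustration only: summarize_ioca_output is a hypothetical name, and it assumes failure lines begin with "FAIL:" and KB references look like "KB" followed by digits, as in the sample above. Other output formats are not handled.

```shell
# Hypothetical helper (not part of IOCA): read saved IOCA text output on
# stdin, print each "FAIL:" line, then list each referenced KB ID once.
summarize_ioca_output() {
    awk '
        /^[ \t]*FAIL:/ { print }
        {
            line = $0
            # Collect every "KB<digits>" token on the line, without duplicates.
            while (match(line, /KB[0-9]+/)) {
                kb = substr(line, RSTART, RLENGTH)
                if (!(kb in seen)) { seen[kb] = 1; order[++n] = kb }
                line = substr(line, RSTART + RLENGTH)
            }
        }
        END { for (i = 1; i <= n; i++) print "KB reference: " order[i] }
    '
}
```

For the sample output above, this prints the single FAIL line followed by `KB reference: KB489050`.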
Usage: IOCA [options] [destination OneFS version]
  -d, --debug              Display debugging information
  -e, --extra              Displays extra details as part of each check
  -j, --json               Displays output in JSON format
  -v, --version            Displays current script version
  -h, --help               Displays this help screen
  -r <checkName>, --run=<checkName>
                           Executes only the specified check, can be included multiple times
  -u, --upgradeplan        Includes an upgrade plan after health checks
      --rolling            Provide rolling reboot plans
      --parallel           Provide parallel reboot plans [where supported]
      --simultaneous       Provide simultaneous reboot plans [excludes node firmware]
  -o, --onefs              Supports the following comma separated options [ex. 8.1.2,simultaneous]:
      <version>            Uses the provided destination OneFS version
      simultaneous         Simultaneous OneFS upgrade
      parallel             Parallel OneFS upgrade [requires 8.2.2+]
      rolling              Rolling OneFS upgrade
      exclude-nf           Upgrade plans will combine OneFS + node firmware by default [9.2 feature], this option disables that
  -p, --patches            Supports the following comma separated options [ex. none,simultaneous]:
      none                 Opt out of patch recommendations
      simultaneous         Simultaneous patch installs
      parallel             Parallel patch installs [requires 9.1+]
      rolling              Rolling patch installs
  -nf, --node-firmware     Supports the following comma separated options [ex. 10.3.3,parallel]:
      <version>            Uses the provided version for node firmware checks
      none                 Opt out of node firmware recommendations
      simultaneous         Simultaneous node firmware updates [requires 8.2+]
      parallel             Parallel node firmware updates [requires 8.2+]
      rolling              Rolling node firmware updates
  -df, --drive-firmware    Supports the following comma separated options [ex. 1.32]:
      <version>            Uses the provided version for drive firmware checks
      none                 Opt out of drive firmware recommendations
  -vf, --verify-files      Runs checks on files within specified location [ex. /ifs/data/] of certain file type [ex. .isi, .tgz, .tar.gz, .tar]
      <file location>      default location is /ifs/data/ - specify the location where the upgrade files were placed
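Because the usage text says that --run can be given multiple times, a short helper can build the repeated options from a list of check names. This is a sketch only: build_run_args is a hypothetical name, and the check names in the example are taken from the list of available checks.

```shell
# Hypothetical helper (not part of IOCA): turn check names into repeated
# --run=<checkName> options, separated by spaces.
build_run_args() {
    args=""
    for check in "$@"; do
        # Add a separating space only when args is already non-empty.
        args="${args:+$args }--run=$check"
    done
    printf '%s\n' "$args"
}

# Example invocation on the cluster (not executed here):
#   perl IOCA $(build_run_args checkHealth checkUptime)
```

Calling `build_run_args checkHealth checkUptime` prints `--run=checkHealth --run=checkUptime`.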
Additional Information
The following table lists the names of the individual checks that can be used with this option:
--run=CHECK
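Note: The IOCA script is updated frequently. To see the current full list of checks, make sure that the Isilon cluster has the latest IOCA version, and then run the following command to get the full list.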
onefs94-a-1# perl IOCA --run=CHECK
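Isilon On-Cluster Analysis 0.1541
Requested check, CHECK, not recognized.
Available checks include:
checkA100Root Checks whether the root mirror on A100 nodes needs to be resized to 2 GB
checkAPIAuth Checks whether API authentication is set to basic authentication when upgrading from 8.1.2.0 to 9.2 or later
checkAccessZones Checks whether multiple access zones are configured when upgrading across 7.1.1. Checks for nested or overlapping SMB shares. Warns when any non-System access zone pools are found for upgrades across OneFS 7.1.1
checkAggregationMode Checks that the aggregation mode is not legacy FEC mode for upgrades to OneFS 8+
checkAspera Checks whether any Aspera services are enabled. If a OneFS upgrade is performed, Aspera must be reinstalled after the upgrade
checkAuthStatus Checks the authentication status on each node. Warns if any authentication provider is not online or active. Checks for RFC2307 and automatic GID/UID assignment, and points to KB 000028577
checkBBUDegCap Checks the BBU degradation level on Gen6 nodes and flags any excessively degraded nodes that may be at increased risk of going read-only.
checkBMCandCMC Checks for BMC/CMC-related issues
checkBXENodes Checks for nodes with BXE interfaces and for the known issues in KB 000048172 and KB 000064027
checkBootDisks Checks boot disk remaining life, firmware level, and historical error counts
checkCM6FWBug Checks whether drive firmware versions meet the criteria of FCO F022318EE
checkCMOSTimeCentury Checks that the century set in the CMOS time matches the current century
checkCapacity Validates cluster capacity against the numbers documented in the Upgrade Planning and Process Guide. Warns when approaching the limits
checkCloudPools Checks for CloudPools-related issues
checkConfCmtSyntax Checks sysctl.conf for comments without a leading # symbol, which can cause problems when the configuration file is parsed.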
checkContact Displays contact information configured in CELog when run with the --extra argument
checkCoreDumps Checks for recent unexpected process restarts reported in /var/log/messages
checkDTA000194434 Checks for criteria of KB 000194434
checkDestinationOneFS Checks destination OneFS version
checkDiskpools Checks diskpools and class equivalence for OneFS upgrades going across 7.0
checkDriveFirmware Checks for out of date Drive Firmware and calls other related drive firmware checks
checkDriveLoad Checks the current load on the drives
checkDriveStallTimeout Checks current Drive Stall Timeout setting, recommended value is 3.5 seconds (3500000 microseconds) or higher
checkDriveSupportPackage Checks for drive firmware updates available in the Drive Support Package
checkDrivesHealth Checks health of drives and the drive stall timeout setting in sysctl
checkET004252 Checks for criteria of ET004252
checkETAs Checks for Technical Advisories
checkEmailSettings Displays E-mail settings configured in CELog when run with the --extra argument
checkEncoding Checks whether exports and the cluster configuration use UTF-8/default encoding
checkEvents Checks events on all the nodes, failure if any critical events exist
checkFCOF022318EE Checks drive firmware versions for the criteria of FCO F022318EE
checkFCOF031617FC Checks drive firmware versions for the criteria of KB 000024620
checkFCOF042415EE Checks the cluster to see if it meets criteria for FCO F042415EE/KB 000051631
checkFileSharing Checks if Atime is enabled
checkFilepoolPolicies Checks GNA requirements and checks filepools for final match being set and names starting with a number
checkFirmwarePackages In OneFS 9.1 and later, confirms firmware packages are available
checkFlush Checks for running flush processes / active pre_flush screen sessions on clusters
checkGatewayPriority Checks for subnets with duplicate gateway priorities
checkGroups Checks nodes for all enabled protocols. Fails if group info is reporting that an enabled protocol is not functioning on any node
checkHDFS Display HDFS details, only useful when run with --extra
checkHardening Checks if FIPS is enabled on any node in the cluster; it must be disabled before upgrades to 9.5 or higher and re-enabled afterward to avoid assessment failures
checkHardwareStatus Checks battery health, power supplies, and gathers hardware details for use elsewhere
checkHardwareUpgrade Checks if there is an in progress hardware upgrade
checkHealth Verifies cluster health status and node health status
checkIBInterfaces Checks for ib0/1 as being active, checks for ETA180317 IB switch firmware versions, and checks for overlapping IB networks
checkIBPCIeSlot Checks if the InfiniBand card is installed in the wrong slot, which may lead to node startup issues during an upgrade to OneFS 9 and later releases
checkIDI Checks for IDI errors in the past 90 days
checkISCSI Checks for iSCSI LUNs being configured in /ifs/.ifsvar/iscsi/iscsi.conf (OneFS prior to 8.x only)
checkIndexSnapshotCurrent Checks for current snapshots that are over 2 weeks old and may contribute to capacity issues
checkInternalPing Checks internal network by performing network ping operations
checkJobHistory Checks job history for issues, currently just MediaScan issues
checkJobStatus Checks for running jobs that would impact an upgrade
checkJobs Checks jobs
checkKB000066019 Checks size of reports.db and flags if over 100MB which may lead to issues outlined on KB 000066019
checkKB000081658 Checks for criteria of KB 000081658
checkKB000181818 Checks for criteria of KB 000181818
checkKB000192800 Checks for criteria of KB 000192800
checkKB000196175 Checks for criteria of KB 000196175
checkKB000196762 Checks for criteria of KB 000196762
checkKB000197850 Checks for issues with IB queue pairs that would lead to node reboot issues if IB queue pairs are in a degraded state
checkKB000212387 Checks Authentication providers msDS-SupportedEncryptionTypes attribute to ensure a value is set and assigned, if it is not, there is potential for DU after an upgrade to 9.5 or above.
checkKB000213188 Checks for SED hardware when the current version is earlier than 9.2 and the target version is 9.5 or later.
checkKB201488 Checks if any node meets criteria for KB 000201488
checkKB201666 Checks if it is necessary to perform the proactive workaround from KB 000201666 for a patch installation and whether the pre-requisites are met
checkKB201933 Checks for criteria of KB 000201933
checkKB203381 Checks for criteria of KB 203381
checkKB220014 Checks for criteria of KB 220014
checkKB462202 Checks BootOrder in bios_settings.ini on Generation 5 nodes to determine if at risk for KB 000025523
checkKB489473 Checks if any node meets criteria for KB 000061983
checkKB490849 Checks if at risk for KB 000052089
checkKB496582 Checks for auth rules issues detailed in KB 000160596
checkKB496993 Checks if the cluster is at risk for KB 000061504
checkKB501267 Checks for the criteria of KB 000026510
checkKB507031 Checks for criteria outlined in KB 000035398
checkKB516613 Checks if any node meets criteria for KB 000057267
checkKB519119 Checks if nodes may be impacted by KB 519119
checkKB519388 Pre-upgrade check for issues outlined in KB 000162270
checkKB519423 Checks if the cluster config files are in a mixed mode
checkKB519890 Checks for a known issue with LAGG interfaces in LACP mode when running OneFS 8.0.0.6, 8.0.1.2, 8.1.0.2, and 8.1.1.1
checkKB521778 Checks for criteria outlined in KB 000031948
checkKB521890 Checks for criteria outlined in KB 000167681
checkKB524082 Checks if the cluster is enabled for HTTP clients and flags a compatibility issue caused by a change in Apache versions
checkKB527312 Checks for criteria of KB 000166965
checkKB530050 Checks for criteria of KB 000040987
checkKB533516 Checks if cluster uses an IP for AWS CloudPool accounts putting it at risk for DTA 533516
checkKB535582 Checks if at risk for KB 000060471
checkKB537785 Checks for criteria of KB 000168829
checkKB540000 Checks for criteria of KB 000058599
checkKB540071 Checks for existence of files under /var/fw/fwpkg when no IsiFw package is installed
checkKB540513 Checks for criteria of KB 000174074
checkKB540872 Checks if the cluster may encounter KB 000170982 during an upgrade from OneFS 8.2 releases
checkKB540901 Checks boot disk partitions for any mismatches in uuids which may lead to boot failures
checkKB544401 Checks for criteria of KB 000173157
checkKB544854 Checks for criteria of KB 000173432
checkKB546604 Checks for criteria of KB 000180866
checkKerberos8000 Checks for an issue with the Kerberos configuration file when upgrading to OneFS 8.0.0.0
checkLACPSFP Checks for LACP on cxgb interfaces for KB 000174095
checkLWIODLog Checks /var/log/lwiod.log for known errors occurring in the last 30 days
checkLastZoneID8000 Checks for gaps in access zones that may cause major issues when upgrading to OneFS 8.0.0.0
checkLeakFreeBlocks Checks for nodes with efs\.lbm\.leak_freed_blocks enabled.
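checkLegacyLDAP Checks whether legacy LDAP is enabled when upgrading from OneFS 6 to OneFS 7
checkLicense Checks licenses and provides guidance based on the licensed features. For InsightIQ and vCenter licenses, provides information from the compatibility guide. For iSCSI, instructs to perform only simultaneous OneFS upgrades and notes that it is not supported in 8.0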
checkLinMasterPadding Checks the LIN master padding to be all zeros
checkListenQueue Checks for listen queue overflows to be less than 50,000 per node
checkLogLevel Checks LWSM log levels for NFS, SMB, HDFS, and Authentication
checkLogs Checks Log file presence and flags if any log file specified in list is not present
checkMaintenanceMode Checks if the cluster is currently in maintenance mode
checkMemory Checks each DIMM to meet criteria outlined in KB 000041666 and if the expected (per product info line) matches closely the reported RAM
checkMessagesLog Checks /var/log/messages.log for known errors occurring in the last 30 days
checkMirrors Checks the boot mirror health
checkNDMP Checks for running NDMP sessions
checkNDMP16GB Checks for LNN changes that have occurred since the isi_ndmp_d processes started which can cause issues during the HookDataMigrationUpgrade phase of an OneFS upgrade
checkNDMPUpgradeTimeout Checks for LNN changes that have occurred since the isi_ndmp_d processes started which can cause issues during the HookDataMigrationUpgrade phase of an OneFS upgrade
checkNFS Uses nfsstat to identify RPC errors
checkNetBIOS Checks if the Isilon NetBIOS Name Service (nbns) is enabled when updating to OneFS 8.0.1 and later
checkNetstat Checks connections counts for specific protocols via netstat
checkNetworkParallelUpgrade Checks for the risk of inaccessible network pools during a parallel upgrade
checkNetworkPoolIFaces Checks each network pool and their assigned interfaces, if only 1 interface is configured for any pool and IP Ranges are set, this will cause a failure for pre upgrade mandatory checks
checkNodeCompatibility Checks node compatibility for OneFS upgrades by comparing it against known supported versions
checkNodeFirmware Checks node firmware for updates
checkNodesInstalled Checks for nodes installed to display in an aggregated list for visibility
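checkOneFSVersions Checks running version and target version for any issues. Fails if any version mismatch is found across the nodes
checkOpenFiles Checks the number of open files on each node [sysctl kern.openfiles] and compares it to the maximum number of open files [sysctl kern.maxfiles]. Warns when above 80% of the maximum and fails when above 90%
checkPSCALE136276 Checks for criteria of PSCALE-136276
checkPartitions Checks system partition space
checkPatches Checks for strongly recommended patches for the current version (if no target version is given) or for the target version
checkPerformance Checks cluster performance
checkProcesses Checks for issues related to the OpenSM master, MCP, isi_mca_dump, and isi_upgrade_d processes
checkProtectionLevel Checks storage pool protection levels
checkRealACL Checks whether a real ACL is set on /ifs/.ifsvar or /ifs/.ifsvar/patch. These should not be set, and they cause upgrade and installation issues if they are
checkRemoteSupport Checks whether restricted shell and isi_supportassist are enabled; if both are enabled and the cluster is upgraded to 9.7, the SupportAssist service has problems restarting
checkRoutingTables Displays the routing table for each node
checkSBR8000 Checks whether SBR is enabled before the upgrade for OneFS upgrades to 8.0.0.0/1
checkSNMPDConfig Checks SNMPD.config and isilon_serial_number to ensure that they are not 0 bytes
checkSPNs Displays the SPN list, only useful when run with --extra
checkSRS Checks for issues with the remote connectivity configuration
checkSSHDConfig Checks the /etc/mcp/templates/sshd_config file for known issues
checkSWIFTAccounts Checks SWIFT accounts; used to set the flag priority when SWIFT is licensed and an upgrade to 9.5 or later is performed
checkServices Checks common services to ensure that they are in the expected state
checkServicesMonitoring Checks whether enabled services are being monitored
checkSmartConnect Checks that all SmartConnect service IPs are assigned and are not used for client connections
checkSnapshot Checks whether the snapshot count is approaching the cluster limit of 20,000 and whether autodelete is set to yes. Checks the snapshot logs for EIN/EIO/EDEADLK/failed-to-create-snapshot errors
checkStaticRouteConflict Checks for static route conflicts
checkStoragePools Checks storage pools for health, capacity, and unprovisioned drive issues
checkSupportability Checks the supportability of the cluster hardware and software
checkSwitchCompatibility Checks back-end Dell switches to confirm that their version is at least 10.5.0.6
checkSymLink Checks whether /var/patch/catalog or /var/patch/tmp is a symbolic link, or whether catalog is a file instead of a directory.
checkSyncIQ Gathers source and target SyncIQ information and reports the SyncIQ partners. Checks for a large number of SyncIQ report files that can cause the tar process to delay other upgrade processes, which may leave the cluster in a temporary DU situation for an extended time
checkSystemFlag Checks for disk pools that have the system flag set
checkTimeDrift Checks for time drift between nodes
checkTimeSync Checks whether the cluster has synchronization with an external server enabled
checkTimeZone Checks for time zones that are missing in the target OneFS code level
checkUIDGID Checks files in / and /var for UID/GID values greater than 262143
checkUpgrade Checks for issues with an in-progress upgrade. Warns if the isi_upgrade_d service is enabled. Fails if the cluster is not in a committed state. Fails if an upgrade activity is already in progress. Checks fs_fmt_version; an odd or zero fs_fmt_version is a problem
checkUpgradeAgentPort Checks the port used by the isi_upgrade_agent_d daemon to ensure that it is not in use by another process
checkUpgradePath Checks for cases that require multiple hops and provides the required details
checkUptime Checks node uptime, warns when it is over 200 days, and flags the expected uptime
checkVaultCard Checks whether Gen6 nodes contain an M.2 vault card and confirms that the device's SMART status is still within thresholds
checkZoneLocalAuth For upgrades to OneFS 8.2 and later, checks whether the local provider is associated with other access zones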
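As an illustration of how a threshold-based check such as checkOpenFiles can be reasoned about, the sketch below classifies an open-file count against the documented 80% (warn) and 90% (fail) thresholds. It is not IOCA's implementation: classify_open_files is a hypothetical name, the OK/WARN/FAIL labels are assumptions, and "above" is treated as strictly greater than.

```shell
# Hypothetical helper (not IOCA code): classify open-file usage against the
# documented thresholds using integer arithmetic only.
classify_open_files() {
    open=$1
    max=$2
    # open/max > 0.9 is equivalent to open*10 > max*9 (avoids floating point).
    if [ $((open * 10)) -gt $((max * 9)) ]; then
        echo FAIL
    elif [ $((open * 10)) -gt $((max * 8)) ]; then
        echo WARN
    else
        echo OK
    fi
}

# Example invocation on a node (not executed here):
#   classify_open_files "$(sysctl -n kern.openfiles)" "$(sysctl -n kern.maxfiles)"
```

With a maximum of 100, a count of 85 yields WARN and a count of 95 yields FAIL.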