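PowerScale: Isilon: CloudPools Operations Cause High isi_cpool_d CPU Utilization
Summary: The isi_cpool_d process can cause high CPU utilization on a PowerScale Isilon cluster.
Symptoms
The isi_cpool_d process shows sustained high CPU utilization on the cluster.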
Isilon-1# top -n 10
  PID USERNAME THR PRI NICE  SIZE  RES STATE   C   TIME    WCPU COMMAND
87857 root     124  20    0  595M 173M nanslp 13 1722.5 857.62% isi_cpool_d
 3455 root      29  98 r150  397M  86M sigwai 10 4216.2  62.55% nfs
 3313 root      40  98 r150 1018M 683M sigwai 14 7402.9  47.71% lwio
94259 root      13  52    0  566M 491M usem   18 374.1H  32.57% isi_celog_monitor
18378 root       5  20    0  102M  53M uwait   3  49:57  24.56% isi_job_d
34552 root       1  52    0   37M  15M adv    22 112.6H  20.51% isi_migr_sched
 3144 root      13  20    0   52M  13M select  8 2009.5  15.33% isi_audit_d
98432 root       1  52    0  105M  66M kqread 26 417:47  14.55% isi_celog_analysis
 3213 root      26  52    0   96M  28M uwait  10 1109.2  12.50% isi_avscan_d
51167 root       5  20    0   93M  42M uwait  21  74:37  10.40% isi_job_d
... ..
Multiple CloudPools jobs may be running on the cluster, but isi_cpool_d utilization stays high even when every job is paused.
Isilon-1# isi cloud jobs list
ID   Description                               Effective State  Type
---------------------------------------------------------------------------------------
1    Write updated data to the cloud           paused           cache-writeback
2    Expire CloudPools cache                   paused           cache-invalidation
4    Clean up unreferenced data in the cloud   paused           cloud-garbage-collection
5    Write updated snapshot data to the cloud  paused           snapshot-writeback
6    Update SmartLink file formats             paused           smartlink-upgrade
7    Add data to CloudPools cache              paused           cache-pre-populate
959                                            paused           archive
960                                            paused           archive
961                                            paused           archive
962                                            paused           archive
964                                            paused           archive
965                                            paused           archive
966                                            paused           archive
967                                            paused           archive
968                                            paused           archive
---------------------------------------------------------------------------------------
Isilon-1# top -n 5
  PID USERNAME THR PRI NICE  SIZE  RES STATE   C   TIME    WCPU COMMAND
87857 root     124  20    0  588M 180M nanslp  4 1723.5 805.81% isi_cpool_d
 3455 root      28  98 r150  397M  87M sigwai 10 4216.3  69.34% nfs
18378 root       6  20    0  122M  72M uwait   9  53:18  68.36% isi_job_d
 3313 root      49  98 r150 1019M 684M sigwai 14 7403.0  66.16% lwio
51167 root       6  20    0   94M  42M uwait  26  76:02  22.36% isi_job_d
...
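The "every job is paused" check above can be scripted rather than read by eye. The following is a minimal sketch, not a Dell-provided tool: the function name unpaused_cloud_jobs is hypothetical, and it assumes the column layout shown in the job listing above, with a single-word Effective State as the second-to-last field of each row.

```shell
# Hypothetical helper (not part of OneFS): reads `isi cloud jobs list`
# output on stdin and prints every job whose Effective State is not "paused".
# Assumption: the state is one word and is the second-to-last field.
unpaused_cloud_jobs() {
    awk '
        # Job rows start with a numeric ID; header and separator rows do not.
        $1 ~ /^[0-9]+$/ && NF >= 3 && $(NF - 1) != "paused" {
            print $1, $(NF - 1), $NF
        }
    '
}
```

Piping the output of isi cloud jobs list into unpaused_cloud_jobs should print nothing when every job is paused, which is the situation described here. Any line it does print gives the job ID, state, and type of a job that is still active.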
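Cause
Some operations, such as cache writeback and cache invalidation, run in the background and are not directly tied to any running CloudPools job. Pausing CloudPools jobs does not stop these operations. Their threads keep running and cause the high CPU utilization.
To confirm this, pause the cache-writeback and cache-invalidation operations while monitoring CPU utilization. Once they are paused, isi_cpool_d CPU utilization should drop quickly. Once they are resumed, isi_cpool_d CPU utilization climbs again.
To pause the CloudPools operations, run the following: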
# isi cloud jobs pause cache-writeback
# isi cloud jobs pause cache-invalidation
To resume the CloudPools operations, run the following:
# isi cloud jobs resume cache-invalidation
# isi cloud jobs resume cache-writeback
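The confirmation test (pause, watch CPU, resume) can be wrapped in one short function so that the operations are not left paused by accident. This is only a sketch under stated assumptions: the function names and the 60-second default wait are hypothetical, it reuses the top -n 10 invocation shown earlier, and it assumes isi_cpool_d appears in that snapshot with WCPU as the second-to-last column.

```shell
# Hypothetical helper: take one `top -n 10` snapshot (as used earlier in this
# article) and print the WCPU value reported for isi_cpool_d.
sample_cpool_wcpu() {
    top -n 10 | awk '$NF == "isi_cpool_d" { print $(NF - 1); exit }'
}

# Hypothetical wrapper around the pause and resume commands shown above.
# $1 = seconds to wait after each state change (assumed default: 60).
confirm_cpool_background_load() {
    settle="${1:-60}"
    wcpu=$(sample_cpool_wcpu)
    echo "before pause: ${wcpu:-not listed}"

    isi cloud jobs pause cache-writeback
    isi cloud jobs pause cache-invalidation
    sleep "$settle"
    wcpu=$(sample_cpool_wcpu)
    echo "while paused: ${wcpu:-not listed}"

    # Resume straight away; long pauses let outstanding work accumulate.
    isi cloud jobs resume cache-invalidation
    isi cloud jobs resume cache-writeback
    sleep "$settle"
    wcpu=$(sample_cpool_wcpu)
    echo "after resume: ${wcpu:-not listed}"
}
```

Running confirm_cpool_background_load 120 would wait two minutes after each change. If the background operations are the cause, the "while paused" figure should be clearly lower than the other two.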
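Resolution
Pausing the cache-writeback and cache-invalidation operations for an extended period is not recommended. Outstanding tasks and operations accumulate and amplify the problem.
High CPU utilization caused by writeback or cache invalidation can indicate that a large amount of caching is occurring. This usually happens because a large amount of data is archived and then recalled inline. Poorly written archive criteria in a file pool policy can cause this: archiving without regard to access time can lead to excessive caching of active files.
This is an example of a poorly written file pool policy that archives data to ECS CloudPools. Note that any data within the specified paths is archived to CloudPools immediately: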
--------------------------------------------------------------------------------
Name: Bad ECS Cloud Policy
Description: Tier to ECS
CloudPools State: OK
CloudPools Details:
Apply Order: 3
File Matching Pattern: Path == APPS/SeaShoreVideo (begins with)
OR
Path == APPS/OceanArchive (begins with)
Set Requested Protection: -
Data Access Pattern: -
Enable Coalescer: -
Enable Packing: -
Data Storage Target: -
Data SSD Strategy: -
Snapshot Storage Target: -
Snapshot SSD Strategy: -
Cloud Pool: EMC ECS Pool
Cloud Compression Enabled: Yes
Cloud Encryption Enabled: No
Cloud Data Retention: 1W
Cloud Incremental Backup Retention: 5Y
Cloud Full Backup Retention: 5Y
Cloud Accessibility: cached
Cloud Read Ahead: partial
Cloud Cache Expiration: 1D
Cloud Writeback Frequency: 9H
ID: Bad ECS Cloud Policy
--------------------------------------------------------------------------------
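This is an example of a properly written file pool policy that accommodates active and recently accessed files. Note that this policy includes an access-time condition, so only data that has not been accessed for 5 weeks and 5 days is archived to CloudPools.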
--------------------------------------------------------------------------------
Name: Good ECS Cloud Policy
Description: Tier to ECS
CloudPools State: OK
CloudPools Details:
Apply Order: 3
File Matching Pattern: Accessed Time > 5W5D AND Path == APPS/SeaShoreVideo (begins with)
OR
Accessed Time > 5W5D AND Path == APPS/OceanArchive (begins with)
Set Requested Protection: -
Data Access Pattern: -
Enable Coalescer: -
Enable Packing: -
Data Storage Target: -
Data SSD Strategy: -
Snapshot Storage Target: -
Snapshot SSD Strategy: -
Cloud Pool: EMC ECS Pool
Cloud Compression Enabled: Yes
Cloud Encryption Enabled: No
Cloud Data Retention: 1W
Cloud Incremental Backup Retention: 5Y
Cloud Full Backup Retention: 5Y
Cloud Accessibility: cached
Cloud Read Ahead: partial
Cloud Cache Expiration: 1D
Cloud Writeback Frequency: 9H
ID: Good ECS Cloud Policy
--------------------------------------------------------------------------------
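The difference between the two policies can be checked mechanically across a whole set of policies. The sketch below is illustrative only: the function name is hypothetical, and it assumes policy details in the text layout shown above, where each policy starts with a Name: line, non-cloud policies show "Cloud Pool: -", and an access-time condition appears as the literal text "Accessed Time".

```shell
# Hypothetical helper: reads file pool policy details on stdin (in the layout
# shown above) and prints the name of each policy that targets a cloud pool
# but has no "Accessed Time" condition anywhere in its details.
cloud_policies_without_atime() {
    awk '
        function report() {
            if (name != "" && is_cloud && !has_atime) print name
        }
        # A Name: line starts a new policy record.
        /^[ \t]*Name:/ {
            report()
            name = $0
            sub(/^[ \t]*Name:[ \t]*/, "", name)
            is_cloud = 0
            has_atime = 0
            next
        }
        # Assumption: a policy without a cloud target shows "Cloud Pool: -".
        /^[ \t]*Cloud Pool:/ { if ($NF != "-") is_cloud = 1 }
        /Accessed Time/      { has_atime = 1 }
        END { report() }
    '
}
```

Fed the two example policies, this would list only Bad ECS Cloud Policy. The check is deliberately coarse: a pattern with several OR branches passes as long as any one branch mentions Accessed Time, so flagged and unflagged policies alike still deserve a manual review.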
Other causes of high isi_cpool_d CPU utilization vary with cluster configuration, settings, and code level. If you need assistance, contact Dell Technical Support.