PowerFlex: Failed Disks with a Wrong Device ID

Summary: A disk is reported as failed when a ScaleIO system disk is used on a different SDS node than the one it belongs to.


Symptoms

Scenario

When a customer uses the same disk enclosure for two or more SDS nodes, each disk must be set offline or online according to the node it belongs to.

Occasionally a disk is brought online on the wrong SDS node by mistake, and the disk is then reported as failed.

Possible mistakes:

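  • The customer brings the same disk online on two SDS nodes. One disk is then reported as failed.
  • The customer swaps disks, so each disk is online on the wrong node. In this case, two failed drives are seen on each SDS node.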

 

Symptoms

The SDS node finds a disk with the wrong device ID, and the SDS process sets that disk to the FAILED state.

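When the SDS process starts, it moves on to physical device discovery after the mosConf part. For a disk that is not used by ScaleIO (such as an operating system disk or a free disk), an "Invalid device header signature" error is logged (the first line in the output). For a disk that is in use by ScaleIO, a "Found device" line is logged with the device ID next to it.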

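In the first output below (trc file from server 1), 12 devices are found, but on closer inspection two of them (L and M) are different: the 12th character of their device ID is 3 rather than 0, as it is in all the other device IDs.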

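In the second output below (trc file from server 2), 12 disks are found and two of them (K and L) are different: the 12th character of their device ID is 0 rather than 3, as it is in all the other device IDs.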
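The comparison described above can be automated. The following is a minimal sketch, not a Dell tool: it assumes the trc lines have the "Found device <path> ,<16 hex digits>" shape shown in this article, and it treats hex characters 9 to 12 of the device ID as the field that identifies the owning SDS. The function names `tgt_index` and `find_foreign_devices` are hypothetical. A device is flagged when this field differs from the value most devices on the node share.

```python
import re
from collections import Counter

# Matches trc lines such as:
# "...phyDevMap_ReloadSpecific:00136: Found device L ,a29044c400030006"
FOUND_RE = re.compile(r"Found device (\S+) ,([0-9a-fA-F]{16})\b")


def tgt_index(dev_id: str) -> str:
    """Return hex characters 9-12 of a 16-character device ID."""
    return dev_id[8:12]


def find_foreign_devices(trc_lines):
    """Return (path, dev_id) pairs whose characters 9-12 differ from the majority."""
    found = [m.groups() for m in map(FOUND_RE.search, trc_lines) if m]
    if not found:
        return []
    # Assumption: most disks on a node really belong to that node.
    majority, _ = Counter(tgt_index(d) for _, d in found).most_common(1)[0]
    return [(path, d) for path, d in found if tgt_index(d) != majority]


# Lines copied from the server 1 trc excerpt in this article.
sample = [
    "30/04 09:48:16.333000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device J ,a2901dd100000004",
    "30/04 09:48:16.333000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device K ,a29044bf00000005",
    "30/04 09:48:16.337000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device L ,a29044c400030006",
    "30/04 09:48:16.342000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device M ,a29044c000030005",
    "30/04 09:48:16.343000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device N ,a29044cb00000008",
]

for path, dev_id in find_foreign_devices(sample):
    print(f"device {path} ({dev_id}) looks like it belongs to another SDS")
```

The majority vote is only a heuristic. If half or more of the disks on a node are misplaced, compare against the device IDs the MDM expects for that SDS instead.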

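After discovery, the SDS process moves on to adding the devices back to the SDS. When the SDS cannot find a disk it expects, the result is rc = NOT_FOUND (see the trc file from server 1). As the example below shows, each SDS holds disks whose device IDs do not belong to it, and the SDS reports the disks it expected as FAILED because they were NOT_FOUND.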

 

TRC file from server 1

30/04 09:48:16.328000 000000A170629EA0:phyDev_ReadDevId:02679: Invalid device header signature : path=C, devVersion=2807280628052804, sigStart=2803280228012800, sigEnd=283b283a28392838
30/04 09:48:16.328000 000000A170629EA0:phyDevMap_ReloadSpecific:00128: Failed to read DeviceId of C. rc=351
30/04 09:48:16.329000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device F ,a2901dcd00000000
30/04 09:48:16.330000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device G ,a2901dce00000001
30/04 09:48:16.331000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device H ,a2901dcf00000002
30/04 09:48:16.332000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device I ,a2901dd000000003
30/04 09:48:16.333000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device J ,a2901dd100000004
30/04 09:48:16.333000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device K ,a29044bf00000005
30/04 09:48:16.337000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device L ,a29044c400030006
30/04 09:48:16.342000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device M ,a29044c000030005
30/04 09:48:16.343000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device N ,a29044cb00000008
30/04 09:48:16.344000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device O ,a2906bcf00000009
30/04 09:48:16.345000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device P ,a2906bd30000000a
30/04 09:48:16.345000 000000A170629EA0:phyDevMap_ReloadSpecific:00136: Found device Q ,fbd792df0000000b
...
30/04 09:48:16.345000 000000A1730BCEA0:contCmd_AddDev:01204: DevId a2901dce00000001 - Start rc = SUCCESS
30/04 09:48:16.346000 000000A173086EA0:contCmd_AddDev:01204: DevId a29044c700000007 - Start rc = SUCCESS
30/04 09:48:16.346000 000000A173098EA0:contCmd_AddDev:01204: DevId a2906bd30000000a - Start rc = SUCCESS
30/04 09:48:16.346000 000000A1730E0EA0:contCmd_AddDev:01204: DevId fbd792e50000000c - Start rc = SUCCESS
30/04 09:48:16.346000 000000A1730B3EA0:contCmd_AddDev:01204: DevId a2901dcf00000002 - Start rc = SUCCESS
30/04 09:48:16.346000 000000A17310DEA0:contCmd_AddDev:01204: DevId a2901dcd00000000 - Start rc = SUCCESS
30/04 09:48:16.346000 000000A173062EA0:contCmd_AddDev:01204: DevId a29044cb00000008 - Start rc = SUCCESS
30/04 09:48:16.346000 000000A1730C5EA0:contCmd_AddDev:01204: DevId a2901dd100000004 - Start rc = SUCCESS
30/04 09:48:16.346000 000000A1730E0EA0:contCmd_AddDev:01391: DevId fbd792e50000000c - Done rc = NOT_FOUND
30/04 09:48:16.348000 000000A1730A1EA0:contCmd_AddDev:01204: DevId fbd792ee0000000e - Start rc = SUCCESS
30/04 09:48:16.348000 000000A1730A1EA0:contCmd_AddDev:01391: DevId fbd792ee0000000e - Done rc = NOT_FOUND
30/04 09:48:16.349000 000000A1730F2EA0:contCmd_AddDev:01204: DevId fbd792e90000000d - Start rc = SUCCESS
30/04 09:48:16.349000 000000A17306BEA0:contCmd_AddDev:01204: DevId a2901dd000000003 - Start rc = SUCCESS
30/04 09:48:16.349000 000000A17307DEA0:contCmd_AddDev:01204: DevId a2906bcf00000009 - Start rc = SUCCESS
30/04 09:48:16.349000 000000A173074EA0:contCmd_AddDev:01204: DevId a29044bf00000005 - Start rc = SUCCESS
30/04 09:48:16.349000 000000A173086EA0:contCmd_AddDev:01391: DevId a29044c700000007 - Done rc = NOT_FOUND
30/04 09:48:16.349000 000000A1730F2EA0:contCmd_AddDev:01391: DevId fbd792e90000000d - Done rc = NOT_FOUND
30/04 09:48:16.351000 000000A1730FBEA0:contCmd_AddDev:01204: DevId fbd792ef0000000f - Start rc = SUCCESS
30/04 09:48:16.352000 000000A1730FBEA0:contCmd_AddDev:01391: DevId fbd792ef0000000f - Done rc = NOT_FOUND
30/04 09:48:16.352000 000000A173104EA0:contCmd_AddDev:01391: DevId a29044c300000006 - Done rc = NOT_FOUND

TRC file from server 2

30/04 11:37:57.065000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device F ,a2901dc800030000
30/04 11:37:57.065000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device G ,a2901dc900030001
30/04 11:37:57.065000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device H ,a2901dca00030002
30/04 11:37:57.065000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device I ,a2901dcb00030003
30/04 11:37:57.065000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device J ,a2901dcc00030004
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device K ,a29044c300000006
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device L ,a29044c700000007
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device M ,a29044c800030007
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device N ,a29044cc00030008
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device O ,a2906bd000030009
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device P ,a2906bd40003000a
30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device Q ,fbda92e00003000b
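To confirm where a missing disk ended up, the NOT_FOUND device IDs from one SDS can be matched against the "Found device" lines of the other. The sketch below assumes the two line formats shown in the excerpts above. The helper names `found_devices`, `missing_devices`, and `misplaced` are hypothetical and are not part of any PowerFlex tooling.

```python
import re

FOUND_RE = re.compile(r"Found device (\S+) ,([0-9a-fA-F]{16})\b")
NOT_FOUND_RE = re.compile(r"DevId ([0-9a-fA-F]{16}) - Done rc = NOT_FOUND")


def found_devices(trc_lines):
    """Map device ID -> path for every 'Found device' line."""
    result = {}
    for line in trc_lines:
        m = FOUND_RE.search(line)
        if m:
            result[m.group(2)] = m.group(1)
    return result


def missing_devices(trc_lines):
    """Device IDs whose contCmd_AddDev ended with NOT_FOUND, in log order."""
    return [m.group(1) for m in map(NOT_FOUND_RE.search, trc_lines) if m]


def misplaced(expecting_trc, other_trc):
    """(dev_id, path) pairs missing on one SDS but discovered on the other."""
    seen_elsewhere = found_devices(other_trc)
    return [
        (dev_id, seen_elsewhere[dev_id])
        for dev_id in missing_devices(expecting_trc)
        if dev_id in seen_elsewhere
    ]


# Lines copied from the two trc excerpts in this article.
server1 = [
    "30/04 09:48:16.346000 000000A1730E0EA0:contCmd_AddDev:01391: DevId fbd792e50000000c - Done rc = NOT_FOUND",
    "30/04 09:48:16.349000 000000A173086EA0:contCmd_AddDev:01391: DevId a29044c700000007 - Done rc = NOT_FOUND",
    "30/04 09:48:16.352000 000000A173104EA0:contCmd_AddDev:01391: DevId a29044c300000006 - Done rc = NOT_FOUND",
]
server2 = [
    "30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device K ,a29044c300000006",
    "30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device L ,a29044c700000007",
    "30/04 11:37:57.081000 000000EE1DC2AEA0:phyDevMap_ReloadSpecific:00136: Found device M ,a29044c800030007",
]

for dev_id, path in misplaced(server1, server2):
    print(f"{dev_id} is expected by server 1 but is online on server 2 as device {path}")
```

For the excerpts in this article, the two matches are a29044c700000007 and a29044c300000006, which are devices L and K on server 2.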

 

SDS device ID explained

Each SDS device stores a header on its 64th LB.

The header has the following structure:

                64-bit signature

                64-bit device version

                64-bit SDS ID

                64-bit SDS device ID (this is the value you are looking for)

The SDS device ID (also known as TgtDevId) consists of:

Unique ID: 32 bits

TGT index: 16 bits

Device index: 16 bits

 

For example, an SDS with ID 2df4737600000002 will have two devices with the IDs 7fff29ea00020000 and 7fff29eb00020001.
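The field layout can be illustrated by splitting a device ID as it is printed in the logs. This is a sketch under one assumption: the three fields appear in the printed hex string in the order listed above, most significant first (unique ID, then TGT index, then device index). That reading is consistent with the trc excerpts, where the 12th character is the one that differs between nodes. The names `TgtDevId` (as a Python type) and `parse_tgt_dev_id` are hypothetical.

```python
from typing import NamedTuple


class TgtDevId(NamedTuple):
    unique_id: int   # upper 32 bits
    tgt_index: int   # next 16 bits, identifies the owning SDS
    dev_index: int   # lowest 16 bits, index of the device within that SDS


def parse_tgt_dev_id(hex_id: str) -> TgtDevId:
    """Split a 16-hex-digit SDS device ID into its three fields."""
    if len(hex_id) != 16:
        raise ValueError(f"expected 16 hex digits, got {hex_id!r}")
    value = int(hex_id, 16)  # raises ValueError on non-hex input
    return TgtDevId(
        unique_id=value >> 32,
        tgt_index=(value >> 16) & 0xFFFF,
        dev_index=value & 0xFFFF,
    )


for dev in ("7fff29ea00020000", "7fff29eb00020001", "a29044c400030006"):
    f = parse_tgt_dev_id(dev)
    print(f"{dev}: unique=0x{f.unique_id:08x} tgt={f.tgt_index} dev={f.dev_index}")
```

With this split, both example devices carry TGT index 2, while device L from the server 1 trace carries TGT index 3, unlike the other disks on that node, which carry TGT index 0.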

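In any case, if a device that belongs to SDS x is moved to SDS y, SDS y checks the SDS ID saved in the device header when the device is attached and detects that the device belongs to another SDS.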

Searching the SDS logs for "Wrong device" may show this condition.

 

Impact

While a disk is in the failed state, the system performs rebuild and rebalance.

Cause

The device ID on the disk belongs to another SDS node, so ScaleIO will never use the disk on this node.

 

Resolution

Add the disk to the correct SDS node.

Impacted versions

All PowerFlex versions

Fixed in version

This is expected behavior.

Affected Products

VxFlex Product Family

Products

VxFlex Product Family
Article Properties
Article Number: 000048300
Article Type: Solution
Last Modified: 07 Jul 2025
Version:  4