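PowerFlex: Linux SDC Kernel Panic (Memory Allocation): kernel: net_sched: page allocation failure

Summary: A Linux SDC loses access to some or all volumes, or experiences a kernel panic related to memory allocation.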


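Symptoms

The SDC is installed on a Linux virtual machine; however, the issue can also occur on a physical Linux host or on any other operating system where the SDC is installed.

The SDC disconnects suddenly.

The Linux SDC may experience a kernel panic.

SDC I/O errors.

File system I/O errors.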


The messages file on the Linux machine reports an SDC stack trace that includes page allocation (memory) statistics:

 

Dec 3 10:40:50 backup7 kernel: net_sched: page allocation failure: order:4, mode:0x104020
Dec 3 10:40:50 backup7 kernel: CPU: 3 PID: 1538 Comm: net_sched Tainted: P OE ------------ 3.10.0-693.21.1.el7.x86_64 #1
Dec 3 10:40:50 backup7 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
Dec 3 10:40:50 backup7 kernel: Call Trace:
Dec 3 10:40:50 backup7 kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
Dec 3 10:40:50 backup7 kernel: [<ffffffff8118cd10>] warn_alloc_failed+0x110/0x180
Dec 3 10:40:50 backup7 kernel: [<ffffffff816aa774>] __alloc_pages_slowpath+0x6b6/0x724
Dec 3 10:40:50 backup7 kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420
Dec 3 10:40:50 backup7 kernel: [<ffffffff811d5a38>] alloc_pages_current+0x98/0x110
Dec 3 10:40:50 backup7 kernel: [<ffffffff8118bb0e>] __get_free_pages+0xe/0x40
Dec 3 10:40:50 backup7 kernel: [<ffffffff811e146e>] kmalloc_order_trace+0x2e/0xa0
Dec 3 10:40:50 backup7 kernel: [<ffffffff811e5011>] __kmalloc+0x211/0x230
Dec 3 10:40:50 backup7 kernel: [<ffffffffc0530e3e>] mapClass_AllocAndInitObj+0x3e/0x120 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc0531ca6>] mapClass_UpdateAll+0x306/0x760 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc055d54a>] ? mosMitSchedThrd_CurThrdOurs+0x6a/0xa0 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc053df93>] mapMdm_HandleObjUpdate_CK+0x2b3/0x540 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc053e290>] ? mapMdm_SendUpdateReq_CK+0x70/0xcd0 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc053e686>] mapMdm_SendUpdateReq_CK+0x466/0xcd0 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc0547a46>] ? netSock_DoIO+0xe6/0x630 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc05112f0>] ? netChan_SendReq_CK+0x70/0x800 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc0511432>] netChan_SendReq_CK+0x1b2/0x800 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc051a5fe>] netCon_SendReq_CK+0x17e/0x500 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc05158d7>] ? netRPC_SendDone_CK+0x47/0x6f0 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc05159ad>] netRPC_SendDone_CK+0x11d/0x6f0 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc055d7df>] mosMit_RunWithTLS+0x4f/0x60 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc055f0ba>] mosMitSchedThrd_ThrdEntry+0x1aa/0x510 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc055c490>] ? mosTicks_GetCurrentTick+0x20/0x20 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffffc055c4aa>] mosOsThrd_Entry+0x1a/0x40 [scini]
Dec 3 10:40:50 backup7 kernel: [<ffffffff810b4031>] kthread+0xd1/0xe0
Dec 3 10:40:50 backup7 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
Dec 3 10:40:50 backup7 kernel: [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
Dec 3 10:40:50 backup7 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
Dec 3 10:40:50 backup7 kernel: Mem-Info:
Dec 3 10:40:50 backup7 kernel: active_anon:540198 inactive_anon:192106 isolated_anon:0#012 active_file:526767 inactive_file:908890 isolated_file:0#012 unevictable:0 dirty:2548 writeback:0 unstable:0#012 slab_reclaimable:113189 slab_unreclaimable:12471#012 mapped:4048 shmem:21154 pagetables:2768 bounce:0#012 free:87384 free_pcp:669 free_cma:0
Dec 3 10:40:50 backup7 kernel: Node 0 DMA free:15900kB min:104kB low:128kB high:156kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 3 10:40:50 backup7 kernel: lowmem_reserve[]: 0 2814 9821 9821
Dec 3 10:40:50 backup7 kernel: Node 0 DMA32 free:200976kB min:19336kB low:24168kB high:29004kB active_anon:195676kB inactive_anon:266280kB active_file:292588kB inactive_file:1429216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:2884228kB mlocked:0kB dirty:1004kB writeback:0kB mapped:5056kB shmem:26680kB slab_reclaimable:405056kB slab_unreclaimable:19648kB kernel_stack:2464kB pagetables:1864kB unstable:0kB bounce:0kB free_pcp:468kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 3 10:40:50 backup7 kernel: lowmem_reserve[]: 0 0 7006 7006
Dec 3 10:40:50 backup7 kernel: Node 0 Normal free:132556kB min:48136kB low:60168kB high:72204kB active_anon:1965116kB inactive_anon:502176kB active_file:1814484kB inactive_file:2206340kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:7340032kB managed:7174724kB mlocked:0kB dirty:9200kB writeback:0kB mapped:11168kB shmem:57936kB slab_reclaimable:47700kB slab_unreclaimable:30224kB kernel_stack:4960kB pagetables:9208kB unstable:0kB bounce:0kB free_pcp:2212kB local_pcp:704kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 3 10:40:50 backup7 kernel: lowmem_reserve[]: 0 0 0 0
Dec 3 10:40:50 backup7 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
Dec 3 10:40:50 backup7 kernel: Node 0 DMA32: 5802*4kB (UEM) 3223*8kB (UEM) 9329*16kB (UEM) 85*32kB (UEM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 201104kB
Dec 3 10:40:50 backup7 kernel: Node 0 Normal: 29631*4kB (UEM) 1755*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132564kB
Dec 3 10:40:50 backup7 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 3 10:40:50 backup7 kernel: 1469304 total pagecache pages
Dec 3 10:40:50 backup7 kernel: 12478 pages in swap cache
Dec 3 10:40:50 backup7 kernel: Swap cache stats: add 927451, delete 914973, find 1499563/1552563
Dec 3 10:40:50 backup7 kernel: Free swap = 3295096kB
Dec 3 10:40:50 backup7 kernel: Total swap = 4194300kB
Dec 3 10:40:50 backup7 kernel: ScaleIO R2_5 mapClass_AllocAndInitObj:1212 :Error: Failed to allocate memory 36288.Cannot process MDM response

At the same time or later (depending on the workload), "NO_RESOURCES" SDC errors, SDC "IO-ERROR" messages, and/or file system I/O errors appear:

Messages file:
 

Dec  3 11:23:55 backup7 kernel: ScaleIO R2_5 mapClass_UpdateAll:523 :Error: Object ffff8802aa340000 failed to update in place.status NO_RESOURCES (67)
Dec  3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567770049] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16da.
Dec  3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec  3 11:24:45 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 2166028544
Dec  3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567770056] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16da.
Dec  3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec  3 11:24:45 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 2166028544
Dec  3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567770372] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16da.
Dec  3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec  3 11:24:45 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 2166028552
...
...
Dec  3 11:27:05 backup7 kernel: XFS (dm-2): metadata I/O error: block 0x7dec700 ("xfs_trans_read_buf_map") error 19 numblks 32
Dec  3 11:27:05 backup7 kernel: XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -19.
Dec  3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567910448] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16ac.
Dec  3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec  3 11:27:05 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 132042496
Dec  3 11:27:05 backup7 kernel: XFS (dm-2): metadata I/O error: block 0x7dec700 ("xfs_trans_read_buf_map") error 19 numblks 32
Dec  3 11:27:05 backup7 kernel: XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -19.
Dec  3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567910460] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16ac.
Dec  3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec  3 11:27:05 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 132042496
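
The signatures shown in the excerpts above can be searched for in a messages file. The following is a minimal sketch, not a vendor tool: the function name `scan_messages` and the signature labels are hypothetical, and the patterns are taken only from the log lines quoted in this article, so other SDC versions may word these messages differently.

```python
import re

# Patterns copied from the log excerpts in this article (labels are illustrative).
SIGNATURES = {
    "page_alloc_failure": re.compile(r"page allocation failure: order:(\d+)"),
    "sdc_alloc_failed": re.compile(r"Failed to allocate memory (\d+)"),
    "no_resources": re.compile(r"status NO_RESOURCES"),
    "sdc_io_error": re.compile(r"IO-ERROR"),
    "block_io_error": re.compile(r"blk_update_request: I/O error, dev scini\w*"),
}


def scan_messages(lines):
    """Count how many lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To use it on a host, pass an open file object, for example `scan_messages(open("/var/log/messages", errors="replace"))`; the path is an assumption and depends on the distribution's syslog configuration.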

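Impact

The SDC does not function properly.

The SDC disconnects.

One or more volumes are inaccessible.

Cause

The SDC does not have enough contiguous memory.

Memory on the host is fragmented and free memory is low.

Because free memory on the Linux machine is low and that memory is fragmented, the SDC cannot obtain the memory it needs.

By design, the SDC allocates memory in large chunks. In this particular case, the roughly 36 KB (36288 bytes) that the SDC requested could not be allocated: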
 

Dec 3 10:40:50 backup7 kernel: ScaleIO R2_5 mapClass_AllocAndInitObj:1212 :Error: Failed to allocate memory 36288.Cannot process MDM response
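
The 36288-byte request corresponds to the `order:4` shown in the page allocation failure message, because the kernel page allocator hands out contiguous blocks of 2^order pages. A minimal sketch of that arithmetic follows; the 4096-byte page size is an assumption (the usual value on x86_64), and `allocation_order` is an illustrative helper, not a kernel or SDC function.

```python
PAGE_SIZE = 4096  # bytes; assumed x86_64 page size


def allocation_order(size_bytes, page_size=PAGE_SIZE):
    """Smallest order n such that 2**n contiguous pages can hold size_bytes."""
    pages = -(-size_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


requested = 36288  # from "Failed to allocate memory 36288"
order = allocation_order(requested)
block_kb = (1 << order) * PAGE_SIZE // 1024
print(f"{requested} bytes -> order {order} ({block_kb} kB contiguous)")
# prints: 36288 bytes -> order 4 (64 kB contiguous)
```

So although only about 36 KB was requested, the allocation needs one free, physically contiguous 64 kB block.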
 

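In the messages file, about 132 MB of memory is free, but no sufficiently large contiguous blocks (32 kB, 64 kB, and so on) are available for the allocation, which leads to the kernel panic:

There are 29631 free 4 kB blocks plus 1755 free 8 kB blocks, which totals 132564 kB (about 132 MB).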
 

Dec 3 10:40:50 backup7 kernel: Node 0 Normal: 29631*4kB (UEM) 1755*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132564kB
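
The free-list line above can be checked programmatically. The sketch below parses the `count*sizekB` pairs, recomputes the total, and counts the free blocks that are at least 64 kB, which is the block size implied by `order:4` with 4 kB pages. The helper names are hypothetical and the parser assumes the line format shown in this article.

```python
import re

LINE = ("Node 0 Normal: 29631*4kB (UEM) 1755*8kB (U) 0*16kB 0*32kB 0*64kB "
        "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132564kB")


def parse_free_blocks(line):
    """Return {block_size_kB: free_block_count} from a Mem-Info free-list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def summarize(line, needed_kb):
    """Return (total free kB, number of free blocks of at least needed_kb)."""
    blocks = parse_free_blocks(line)
    total_kb = sum(size * count for size, count in blocks.items())
    usable = sum(count for size, count in blocks.items() if size >= needed_kb)
    return total_kb, usable


total_kb, usable = summarize(LINE, needed_kb=64)
print(f"total free: {total_kb} kB, free blocks >= 64 kB: {usable}")
# prints: total free: 132564 kB, free blocks >= 64 kB: 0
```

The total matches the 132564 kB reported by the kernel, while the count of usable blocks is zero: the memory is free but too fragmented for this request.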

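Note: This issue is unlikely to occur on a machine that has several GB of free memory.

Resolution

Note: Rebooting the host temporarily clears the memory fragmentation, until the issue occurs again.

On the SDC side, there is no workaround, because the behavior is by design.

On the host side:

1) Add more memory and make sure that free memory stays at a sufficiently high level.

2) In this particular case, the SDC Linux machine is a virtual machine. Moving the SDC to the ESXi host would resolve the issue, because the ESXi host has several GB of free memory.

3) Check whether any running applications or services may be causing or contributing to memory fragmentation.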
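
To watch for fragmentation on the host before it causes a failure, the per-order free-block counts that Linux exposes in `/proc/buddyinfo` can be inspected. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the `min_order=4` threshold is taken from the `order:4` failure in this article, and the sample text is reconstructed from the figures in the log excerpt rather than captured from a live system.

```python
def parse_buddyinfo(text):
    """Map (node, zone) to a list of free-block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal 29631 1755 0 ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def fragmented_zones(zones, min_order=4):
    """Return zones that have no free block of min_order or larger."""
    return [key for key, counts in zones.items()
            if sum(counts[min_order:]) == 0]


# Illustrative sample built from the DMA and Normal lines in the log excerpt.
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone   Normal  29631   1755      0      0      0      0      0      0      0      0      0
"""

print(fragmented_zones(parse_buddyinfo(SAMPLE)))
```

On a live host, the same functions can be applied to `open("/proc/buddyinfo").read()`. A zone reported here is not necessarily a problem on its own, but a Normal zone that stays without higher-order blocks matches the condition described in the Cause section.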

Additional Information

Affected versions

Any SIO (ScaleIO) version.

Fixed in version

N/A

Affected Products

PowerFlex Software, VxFlex Product Family, VxFlex Ready Node
Article Properties
Article Number: 000056228
Article Type: Solution
Last Modified: 31 Oct 2025
Version: 5