PowerFlex: Pane do kernel do Linux SDC (alocação de memória): kernel: net_sched: falha na alocação de página
Summary: O Linux SDC perde o acesso a alguns ou a todos os volumes ou ocorre uma pane no kernel relacionada à alocação de memória.
Symptoms
O SDC está instalado na VM Linux, no entanto, o problema pode ocorrer no Linux físico ou em qualquer outro sistema operacional com o SDC instalado.
O SDC foi desconectado repentinamente.
Possivelmente, o kernel do Linux SDC entra em pane.
Erros de E/S do SDC.
Erros de E/S do file system.
Sintomas
O arquivo de mensagens na máquina Linux relata um rastreamento de pilha SDC, que inclui estatísticas de alocação de página (memória):
Dec 3 10:40:50 backup7 kernel: net_sched: page allocation failure: order:4, mode:0x104020 Dec 3 10:40:50 backup7 kernel: CPU: 3 PID: 1538 Comm: net_sched Tainted: P OE ------------ 3.10.0-693.21.1.el7.x86_64 #1 Dec 3 10:40:50 backup7 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015 Dec 3 10:40:50 backup7 kernel: Call Trace: Dec 3 10:40:50 backup7 kernel: [<ffffffff816ae7c8>] dump_stack+0x19/0x1b Dec 3 10:40:50 backup7 kernel: [<ffffffff8118cd10>] warn_alloc_failed+0x110/0x180 Dec 3 10:40:50 backup7 kernel: [<ffffffff816aa774>] __alloc_pages_slowpath+0x6b6/0x724 Dec 3 10:40:50 backup7 kernel: [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420 Dec 3 10:40:50 backup7 kernel: [<ffffffff811d5a38>] alloc_pages_current+0x98/0x110 Dec 3 10:40:50 backup7 kernel: [<ffffffff8118bb0e>] __get_free_pages+0xe/0x40 Dec 3 10:40:50 backup7 kernel: [<ffffffff811e146e>] kmalloc_order_trace+0x2e/0xa0 Dec 3 10:40:50 backup7 kernel: [<ffffffff811e5011>] __kmalloc+0x211/0x230 Dec 3 10:40:50 backup7 kernel: [<ffffffffc0530e3e>] mapClass_AllocAndInitObj+0x3e/0x120 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc0531ca6>] mapClass_UpdateAll+0x306/0x760 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc055d54a>] ? mosMitSchedThrd_CurThrdOurs+0x6a/0xa0 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc053df93>] mapMdm_HandleObjUpdate_CK+0x2b3/0x540 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc053e290>] ? mapMdm_SendUpdateReq_CK+0x70/0xcd0 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc053e686>] mapMdm_SendUpdateReq_CK+0x466/0xcd0 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc0547a46>] ? netSock_DoIO+0xe6/0x630 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc05112f0>] ? netChan_SendReq_CK+0x70/0x800 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc0511432>] netChan_SendReq_CK+0x1b2/0x800 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc051a5fe>] netCon_SendReq_CK+0x17e/0x500 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc05158d7>] ? netRPC_SendDone_CK+0x47/0x6f0 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc05159ad>] netRPC_SendDone_CK+0x11d/0x6f0 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc055d7df>] mosMit_RunWithTLS+0x4f/0x60 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc055f0ba>] mosMitSchedThrd_ThrdEntry+0x1aa/0x510 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc055c490>] ? mosTicks_GetCurrentTick+0x20/0x20 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffffc055c4aa>] mosOsThrd_Entry+0x1a/0x40 [scini] Dec 3 10:40:50 backup7 kernel: [<ffffffff810b4031>] kthread+0xd1/0xe0 Dec 3 10:40:50 backup7 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40 Dec 3 10:40:50 backup7 kernel: [<ffffffff816c0577>] ret_from_fork+0x77/0xb0 Dec 3 10:40:50 backup7 kernel: [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40 Dec 3 10:40:50 backup7 kernel: Mem-Info: Dec 3 10:40:50 backup7 kernel: active_anon:540198 inactive_anon:192106 isolated_anon:0#012 active_file:526767 inactive_file:908890 isolated_file:0#012 unevictable:0 dirty:2548 writeback:0 unstable:0#012 slab_reclaimable:113189 slab_unreclaimable:12471#012 mapped:4048 shmem:21154 pagetables:2768 bounce:0#012 free:87384 free_pcp:669 free_cma:0 Dec 3 10:40:50 backup7 kernel: Node 0 DMA free:15900kB min:104kB low:128kB high:156kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Dec 3 10:40:50 backup7 kernel: lowmem_reserve[]: 0 2814 9821 9821 Dec 3 10:40:50 backup7 kernel: Node 0 DMA32 free:200976kB min:19336kB low:24168kB high:29004kB active_anon:195676kB inactive_anon:266280kB active_file:292588kB inactive_file:1429216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:2884228kB mlocked:0kB dirty:1004kB writeback:0kB mapped:5056kB shmem:26680kB slab_reclaimable:405056kB slab_unreclaimable:19648kB kernel_stack:2464kB pagetables:1864kB unstable:0kB bounce:0kB free_pcp:468kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Dec 3 10:40:50 backup7 kernel: lowmem_reserve[]: 0 0 7006 7006 Dec 3 10:40:50 backup7 kernel: Node 0 Normal free:132556kB min:48136kB low:60168kB high:72204kB active_anon:1965116kB inactive_anon:502176kB active_file:1814484kB inactive_file:2206340kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:7340032kB managed:7174724kB mlocked:0kB dirty:9200kB writeback:0kB mapped:11168kB shmem:57936kB slab_reclaimable:47700kB slab_unreclaimable:30224kB kernel_stack:4960kB pagetables:9208kB unstable:0kB bounce:0kB free_pcp:2212kB local_pcp:704kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Dec 3 10:40:50 backup7 kernel: lowmem_reserve[]: 0 0 0 0 Dec 3 10:40:50 backup7 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB Dec 3 10:40:50 backup7 kernel: Node 0 DMA32: 5802*4kB (UEM) 3223*8kB (UEM) 9329*16kB (UEM) 85*32kB (UEM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 201104kB Dec 3 10:40:50 backup7 kernel: Node 0 Normal: 29631*4kB (UEM) 1755*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132564kB Dec 3 10:40:50 backup7 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Dec 3 10:40:50 backup7 kernel: 1469304 total pagecache pages Dec 3 10:40:50 backup7 kernel: 12478 pages in swap cache Dec 3 10:40:50 backup7 kernel: Swap cache stats: add 927451, delete 914973, find 1499563/1552563 Dec 3 10:40:50 backup7 kernel: Free swap = 3295096kB Dec 3 10:40:50 backup7 kernel: Total swap = 4194300kB Dec 3 10:40:50 backup7 kernel: ScaleIO R2_5 mapClass_AllocAndInitObj:1212 :Error: Failed to allocate memory 36288.Cannot process MDM response
Ao mesmo tempo ou posteriormente (dependendo da carga de trabalho), aparecem erros "NO_RESOURCES" do SDC e/ou "erros de E/S" do SDC e/ou erros de E/S do file system:
Arquivo de mensagens:
Dec 3 11:23:55 backup7 kernel: ScaleIO R2_5 mapClass_UpdateAll:523 :Error: Object ffff8802aa340000 failed to update in place.status NO_RESOURCES (67)
Dec 3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567770049] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16da.
Dec 3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec 3 11:24:45 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 2166028544
Dec 3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567770056] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16da.
Dec 3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec 3 11:24:45 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 2166028544
Dec 3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567770372] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16da.
Dec 3 11:24:45 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec 3 11:24:45 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 2166028552
...
...
Dec 3 11:27:05 backup7 kernel: XFS (dm-2): metadata I/O error: block 0x7dec700 ("xfs_trans_read_buf_map") error 19 numblks 32
Dec 3 11:27:05 backup7 kernel: XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -19.
Dec 3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567910448] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16ac.
Dec 3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec 3 11:27:05 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 132042496
Dec 3 11:27:05 backup7 kernel: XFS (dm-2): metadata I/O error: block 0x7dec700 ("xfs_trans_read_buf_map") error 19 numblks 32
Dec 3 11:27:05 backup7 kernel: XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -19.
Dec 3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:361 :[7567910460] IO-ERROR comb: 0. offsetInComb 0. SizeInLB 0. SDS_ID 0. Comb Gen 0. Head Gen 16ac.
Dec 3 11:27:05 backup7 kernel: ScaleIO R2_5 mapVolIO_ReportIOErrorIfNeeded:374 :Vol ID 0x7dfb023900000046. Last fault Status IO_FAULT_NOT_PRI(12).Last error Status NOT_FOUND(3) Reason (failed getting LB-Info) Retry count (0) chan (0)
Dec 3 11:27:05 backup7 kernel: blk_update_request: I/O error, dev scinia, sector 132042496
Impacto
O SDC não está funcionando.
Desconexão do SDC.
Acesso perdido a um ou mais volumes.
Cause
Não havia memória contígua suficiente para o SDC.
Fragmentação de memória e pouca memória disponível no host.
Como a máquina Linux tinha pouca memória disponível e devido à fragmentação de memória, não havia memória suficiente para o SDC.
Por padrão, o SDC usa grandes mansões para alocação de memória. Nesse caso específico, o SDC solicitou 36 mil (36288) de memória que não podem ser alocadas:
Dec 3 10:40:50 backup7 kernel: ScaleIO R2_5 mapClass_AllocAndInitObj:1212 :Error: Failed to allocate memory 36288.Cannot process MDM response
No arquivo de mensagens: havia aproximadamente 132MB de memória disponível, no entanto, não havia pedaços grandes suficientes (32k, 64k, etc.) disponíveis para alocação de memória, resultando em pane no kernel:
Havia 29631*4kb chunks disponíveis mais 1755*8k chunks disponíveis = 132MB (132564kb).
Dec 3 10:40:50 backup7 kernel: Node 0 Normal: 29631*4kB (UEM) 1755*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 132564kB
Nota: É improvável que isso aconteça com uma máquina com poucos GB disponíveis de memória.
Resolution
Nota: Uma reinicialização do host limpará temporariamente a fragmentação de memória até a próxima vez que o problema ocorrer.
No lado do SDC, não há solução temporária, pois o comportamento é padrão.
No lado do host:
1) Adicionar mais memória e garantir que a memória disponível permaneça alta o suficiente.
2) Nesse caso específico, as máquinas Linux do SDC são VMs, mover o SDC para o ESXi resolveria o problema, pois os hosts do ESXi têm poucos GB de memória disponível.
3) Verifique se os aplicativos/serviços em execução podem causar ou contribuir para a fragmentação da memória.
Additional Information
Versões afetadas
Qualquer versão do SIO.
Corrigido na versão
N/D