CSM: CSI Driver: Crash in the external-health-monitor-controller Container Causes CSI Driver Controller Restarts

Summary: The external-health-monitor-controller container provided by Kubernetes can crash, which causes the Container Storage Interface (CSI) driver controller pod to restart.

Symptoms

The issue occurs with:
Dell CSI drivers: v2.1.0 and v2.2.0
csi-external-health-monitor-controller: v0.4.0

The controller pod shows 12 restarts:

$ kubectl get pod -n isilon
 NAME                          READY   STATUS    RESTARTS        AGE
 isilon-controller-xxxx-xxxx   5/5     Running   12 (141m ago)   32h
...
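
The RESTARTS column above is a pod-level total. To see which container inside the controller pod is actually restarting, the per-container restart counts can be listed. The following is a minimal sketch: the function name container_restarts is hypothetical, and the namespace and pod name in the usage comment are the placeholders from the output above.

```shell
# Hypothetical helper: print "<container name><TAB><restart count>" for every
# container in a pod, using kubectl's JSONPath output format.
container_restarts() {
  namespace="$1"
  pod="$2"
  kubectl get pod -n "$namespace" "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# Usage (placeholder names from this article):
#   container_restarts isilon isilon-controller-xxxx-xxxx
```

If the health monitor sidecar is the only container with a nonzero count, that points to the panic described in this article rather than to the driver container itself.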

 

通过使用 -p 要获取以前的日志,请 csi-external-health-monitor-controller container shows the following panic:


E0629 18:48:41.494845       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 200 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x184d660, 0x27951e0)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x184d660, 0x27951e0)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).deleteNode(0xc000612500, 0xc002fa3b90, 0x28, 0x0)
    /workspace/pkg/controller/node_watcher.go:275 +0x29
github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).WatchNodes.func1(0xb28012007187300)
    /workspace/pkg/controller/node_watcher.go:181 +0x647
github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).WatchNodes(0xc000612500)
    /workspace/pkg/controller/node_watcher.go:185 +0x4b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000499d90)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000499d90, 0x1c820c0, 0xc002b10450, 0x1, 0xc000126060)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000499d90, 0xdf8475800, 0x0, 0x1, 0xc000126060)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc000499d90, 0xdf8475800, 0xc000126060)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).Run
    /workspace/pkg/controller/node_watcher.go:147 +0x194
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x168e749]

goroutine 200 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x109
panic(0x184d660, 0x27951e0)
    /usr/local/go/src/runtime/panic.go:965 +0x1b9
github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).deleteNode(0xc000612500, 0xc002fa3b90, 0x28, 0x0)
    /workspace/pkg/controller/node_watcher.go:275 +0x29
github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).WatchNodes.func1(0xb28012007187300)
    /workspace/pkg/controller/node_watcher.go:181 +0x647
github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).WatchNodes(0xc000612500)
    /workspace/pkg/controller/node_watcher.go:185 +0x4b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000499d90)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000499d90, 0x1c820c0, 0xc002b10450, 0x1, 0xc000126060)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000499d90, 0xdf8475800, 0x0, 0x1, 0xc000126060)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc000499d90, 0xdf8475800, 0xc000126060)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/kubernetes-csi/external-health-monitor/pkg/controller.(*NodeWatcher).Run
    /workspace/pkg/controller/node_watcher.go:147 +0x194 
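
The trace above comes from the previous (crashed) instance of the container, which kubectl logs returns when -p (short for --previous) is given. A minimal sketch of a wrapper for retrieving it follows. The function name show_previous_logs is hypothetical, and the container name in the usage comment is the one used in this article; it may differ in a given deployment, so check the pod spec first.

```shell
# Hypothetical helper: fetch logs from the previous instance of one container
# in a pod. --previous is the long form of -p.
show_previous_logs() {
  namespace="$1"
  pod="$2"
  container="$3"
  kubectl logs -n "$namespace" "$pod" -c "$container" --previous
}

# Usage (placeholder pod name from this article):
#   show_previous_logs isilon isilon-controller-xxxx-xxxx \
#     csi-external-health-monitor-controller
```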

Cause

This is a known issue in version 0.4.0 of the external-health-monitor sidecar provided by Kubernetes, which CSI drivers v2.1 and v2.2 use:
https://github.com/kubernetes-csi/external-health-monitor/issues/100
https://github.com/kubernetes-csi/external-health-monitor/pull/101

The fix has been merged and is included in the next release of the external-health-monitor sidecar, version 0.5.0, which was released on March 4, 2022.
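
To confirm whether a controller pod is running the affected sidecar release, the image of each container can be listed and compared against the v0.4.0 tag. This is a sketch under stated assumptions: both function names are hypothetical, and the match assumes the image reference ends in external-health-monitor-controller:v0.4.0, which may not hold if the image is mirrored under a different name or pinned by digest.

```shell
# Hypothetical helper: print "<container name><TAB><image>" for every
# container defined in the pod spec.
list_container_images() {
  namespace="$1"
  pod="$2"
  kubectl get pod -n "$namespace" "$pod" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Hypothetical helper: succeed (exit status 0) only when the image reference
# names the health monitor controller sidecar at tag v0.4.0.
# Assumption: the reference ends in "external-health-monitor-controller:v0.4.0".
is_affected_image() {
  case "$1" in
    *external-health-monitor-controller:v0.4.0) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (placeholder pod name from this article):
#   list_container_images isilon isilon-controller-xxxx-xxxx
#   is_affected_image "<image from the list>" && echo "affected sidecar version"
```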

Resolution

Upgrade to CSI driver version 2.3.0 or later, which uses external-health-monitor sidecar version 0.5.0, as described in the Dell Technologies Container Storage Modules documentation:
https://dell.github.io/csm-docs/

 

Article Properties
Article Number: 000201147
Article Type: Solution
Last Modified: 30 Jan 2026
Version: 8