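PowerFlex Management Platform: keycloak-0 logs HTTP probe failures with status code 503
Summary: This article describes an issue in which the keycloak-0 pod reports failed health checks. An incorrect DNS configuration causes database connection problems, and the issue affects the authentication services managed by Keycloak.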
Symptoms
Scenario
One of the two Keycloak pods (here, keycloak-0) has database connection problems, while keycloak-1 remains functional.
The event log shows repeated readiness probe failures.
# kubectl get pods -n powerflex | egrep keycloak
keycloak-0 1/1 Running 0 22d
keycloak-1 1/1 Running 0 22d
# kubectl describe pod keycloak-0 -n powerflex
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 12m (x58 over 17h) keycloak-0 Readiness probe failed: HTTP probe failed with statuscode: 503
The keycloak pod log shows that a JDBC connection could not be acquired because of an acquisition timeout:
# kubectl logs keycloak-0 -n powerflex
..
2024-11-27 07:01:41,593 INFO [org.infinispan.CLUSTER] (non-blocking-thread--p2-t126) [Context=actionTokens] ISPN100010: Finished rebalance with members [keycloak-0-17437, keycloak-1-41022], topology id 7
2024-11-27 07:31:03,379 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) SQL Error: 0, SQLState: null
2024-11-27 07:31:03,379 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) Acquisition timeout while waiting for new connection
2024-11-27 07:31:03,384 ERROR [org.keycloak.services.scheduled.ScheduledTaskRunner] (Timer-0) Failed to run scheduled task ClearExpiredEvents: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
at java.base/java.util.TimerThread.run(Timer.java:506)
Caused by: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection <---------
..
Caused by: java.sql.SQLException: Acquisition timeout while waiting for new connection <---------
..
Caused by: java.util.concurrent.TimeoutException <---------
..
2024-11-27 09:31:03,476 INFO [io.smallrye.health] (executor-thread-15) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak database connections health check","status":"DOWN","data":{"Failing since":"2024-11-27 07:31:03,477"}}]}
2024-11-27 09:56:03,477 INFO [io.smallrye.health] (executor-thread-15) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak database connections health check","status":"DOWN","data":{"Failing since":"2024-11-27 07:31:03,477"}}]}
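To compare the two pods quickly, the failure signatures in the log above can be counted per pod. The following is a minimal sketch that reads Keycloak log text on stdin; the function name db_failure_summary is hypothetical, and the match strings are copied verbatim from the log excerpt in this article.

```shell
#!/bin/sh
# Sketch: count the database-failure signatures shown in the sample Keycloak log.
# Assumption: db_failure_summary is a hypothetical helper name; the match strings
# are copied verbatim from the log excerpt in this article.

db_failure_summary() {
  local log
  log=$(cat)
  # grep -c prints 0 (and exits 1) when nothing matches; only the count is used.
  printf 'acquisition_timeout=%s\n' "$(printf '%s\n' "$log" | grep -cF 'Acquisition timeout while waiting for new connection')"
  printf 'jdbc_unavailable=%s\n' "$(printf '%s\n' "$log" | grep -cF 'Unable to acquire JDBC Connection')"
  printf 'health_down=%s\n' "$(printf '%s\n' "$log" | grep -cF 'Reporting health down status')"
}
```

For example, run kubectl logs keycloak-0 -n powerflex | db_failure_summary and then the same for keycloak-1. Non-zero counts on keycloak-0 with zero counts on keycloak-1 match the pattern described in this article.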
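Impact
Authentication requests handled by keycloak-0 fail, causing intermittent or complete authentication failures in the PowerFlex Management Platform. The Keycloak health check keeps reporting a DOWN status, which reduces high availability.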
Cause
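This issue is caused by an incorrect DNS configuration.
The JDBC connection that Keycloak uses to reach its database depends on resolving the database host name or endpoint.
Any DNS misconfiguration or host name resolution failure can cause a timeout while the connection is being established.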
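A quick way to test the resolution path on each MVM is to look the host name up through the system resolver. The following is a minimal sketch assuming getent is available on the MVM; the function name resolves is hypothetical, and this article does not name the database host, so DB_HOST in the usage line is a placeholder you must fill in.

```shell
#!/bin/sh
# Sketch: check whether a host name resolves through the system resolver.
# getent consults /etc/hosts and DNS in the order set by /etc/nsswitch.conf.
# Assumption: "resolves" is a hypothetical helper name.

resolves() {
  local addr
  if addr=$(getent hosts "$1"); then
    # Print only the first address returned.
    echo "OK   $1 -> $(echo "$addr" | awk 'NR == 1 { print $1 }')"
    return 0
  fi
  echo "FAIL $1 does not resolve" >&2
  return 1
}
```

For example, resolves "$DB_HOST" on every MVM. A FAIL line, or a lookup that takes several seconds, points at the DNS configuration corrected in the Resolution below.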
Resolution
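1) Correct the DNS configuration according to the operating system documentation.
a) For Red Hat or CentOS 7.x or 8.x:
i) Edit /etc/resolv.conf on each management VM (MVM) so that it lists the correct DNS servers.
ii) Delete the coredns pods (rke2-coredns-rke2-coredns-xxxxxxxxxx-xxxxx) so that the change propagates to those containers: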
for x in `kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto`; do kubectl delete pods -n kube-system $x; done
iii) Verify that the DNS change is now reflected in the coredns pods (two coredns pods are responsible for DNS):
for x in `kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto`; do echo $x; kubectl exec -it $x -n kube-system -- cat /etc/resolv.conf; echo " "; done
b) For SLES 15.x and later, contact Support so they can follow the internal article https://www.dell.com/support/kbdoc/en-us/000227354
2) Restart the keycloak pods:
kubectl rollout restart statefulset keycloak -n powerflex
3) Monitor the keycloak logs for any further database connection problems:
kubectl logs keycloak-0 -n powerflex [-f]
kubectl logs keycloak-1 -n powerflex [-f]
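The checks from steps 1 through 3 can be combined into one post-fix verification. The following is a minimal sketch to run on an MVM with kubectl access. It assumes the node's upstream DNS servers are listed in /etc/resolv.conf and are identical on every MVM (each coredns pod may run on a different node). The function names nameservers, coredns_matches_node and keycloak_ready are hypothetical; the pod-name filters are taken from the commands above.

```shell
#!/bin/sh
# Sketch of a post-fix check; run on an MVM with kubectl access.
# Assumptions: upstream DNS servers are in /etc/resolv.conf and identical on
# every MVM; all function names below are hypothetical.

# Print the sorted, de-duplicated nameserver addresses from resolv.conf text on stdin.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' | sort -u
}

# Succeed only if every coredns pod (autoscaler excluded) lists the same
# nameservers as this node.
coredns_matches_node() {
  local node_ns pod pod_ns rc
  node_ns=$(nameservers < /etc/resolv.conf)
  rc=0
  for pod in $(kubectl get pods -n kube-system -o name | grep -i rke2-coredns-rke2-coredns | grep -iv auto); do
    pod_ns=$(kubectl exec -n kube-system "$pod" -- cat /etc/resolv.conf | nameservers)
    if [ "$pod_ns" = "$node_ns" ]; then
      echo "OK       $pod"
    else
      echo "MISMATCH $pod" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Succeed only if the first container in both Keycloak pods reports ready=true.
keycloak_ready() {
  local pod ready rc
  rc=0
  for pod in keycloak-0 keycloak-1; do
    ready=$(kubectl get pod "$pod" -n powerflex -o jsonpath='{.status.containerStatuses[0].ready}')
    echo "$pod ready=$ready"
    [ "$ready" = "true" ] || rc=1
  done
  return "$rc"
}
```

For example, wait for the restart from step 2 with kubectl rollout status statefulset keycloak -n powerflex, then run coredns_matches_node && keycloak_ready. A MISMATCH line suggests repeating step 1 on the affected node.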