PowerFlex Management Platform: keycloak-0 logs HTTP probe failed with statuscode: 503
Summary: This article explains an issue where the keycloak-0 pod reports a health check failure due to database connectivity problems caused by an incorrect DNS configuration. The issue impacts the authentication services managed by Keycloak.
Symptoms
Scenario
One of the two Keycloak pods (here keycloak-0) experiences connectivity issues with the database, while keycloak-1 remains functional.
Event logs show repeated readiness probe failures for the affected pod.
# kubectl get pods -n powerflex | egrep keycloak
keycloak-0 1/1 Running 0 22d
keycloak-1 1/1 Running 0 22d
# kubectl describe pod keycloak-0 -n powerflex
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 12m (x58 over 17h) keycloak-0 Readiness probe failed: HTTP probe failed with statuscode: 503
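The probe failure can also be reproduced manually by querying the readiness endpoint from inside the affected pod (a hedged check: the endpoint path and port vary by Keycloak version, and curl may not be included in minimal images):
# kubectl exec keycloak-0 -n powerflex -- curl -s http://localhost:8080/health/ready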
The Keycloak pod logs indicate a failure to acquire JDBC connections due to an acquisition timeout:
# kubectl logs keycloak-0 -n powerflex
..
2024-11-27 07:01:41,593 INFO [org.infinispan.CLUSTER] (non-blocking-thread--p2-t126) [Context=actionTokens] ISPN100010: Finished rebalance with members [keycloak-0-17437, keycloak-1-41022], topology id 7
2024-11-27 07:31:03,379 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) SQL Error: 0, SQLState: null
2024-11-27 07:31:03,379 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) Acquisition timeout while waiting for new connection
2024-11-27 07:31:03,384 ERROR [org.keycloak.services.scheduled.ScheduledTaskRunner] (Timer-0) Failed to run scheduled task ClearExpiredEvents: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
at java.base/java.util.TimerThread.run(Timer.java:506)
Caused by: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection <---------
..
Caused by: java.sql.SQLException: Acquisition timeout while waiting for new connection <---------
..
Caused by: java.util.concurrent.TimeoutException <---------
..
2024-11-27 09:31:03,476 INFO [io.smallrye.health] (executor-thread-15) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak database connections health check","status":"DOWN","data":{"Failing since":"2024-11-27 07:31:03,477"}}]}
2024-11-27 09:56:03,477 INFO [io.smallrye.health] (executor-thread-15) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak database connections health check","status":"DOWN","data":{"Failing since":"2024-11-27 07:31:03,477"}}]}
Impact
Authentication requests handled by keycloak-0 fail, causing intermittent or complete authentication failures for the PowerFlex Management Platform. The Keycloak health check continuously reports a DOWN status, impacting high availability.
Cause
The issue is caused by an incorrect DNS configuration.
The JDBC connection that Keycloak uses to reach the database relies on resolving the database hostname or endpoint.
Any misconfiguration or failure in hostname resolution can cause timeouts when Keycloak attempts to establish a connection.
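As a quick check, resolution of the database endpoint can be tested from a temporary pod inside the cluster. This is a sketch: <db-hostname> is a placeholder for the database hostname or endpoint taken from the Keycloak datasource configuration.
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup <db-hostname>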
Resolution
1) Fix the DNS configuration as per the operating system documentation
a) If Red Hat Enterprise Linux or CentOS v7.x or v8.x,
i) Edit /etc/resolv.conf to set the correct DNS server on each management VM (MVM). An example is shown below.
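For example, /etc/resolv.conf should list the DNS servers that serve the environment (the addresses below are placeholders):
nameserver 10.0.0.10
nameserver 10.0.0.11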
ii) Delete the coredns pods (rke2-coredns-rke2-coredns-xxxxxxxxxx-xxxxx) to propagate the changes to those pods:
for x in `kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto`; do kubectl delete pods -n kube-system $x; done
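Alternatively, where the CoreDNS deployment uses the default RKE2 name, the same effect can be achieved with a single rollout restart (this leaves the autoscaler deployment untouched):
kubectl rollout restart deployment rke2-coredns-rke2-coredns -n kube-system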
iii) Verify DNS changes are now reflected in the coredns pods (there are 2 coredns pods responsible for DNS):
for x in `kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto`; do echo $x; kubectl exec -it $x -n kube-system -- cat /etc/resolv.conf; echo " "; done
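Sample output (the pod name suffix and addresses are illustrative):
rke2-coredns-rke2-coredns-xxxxxxxxxx-xxxxx
nameserver 10.0.0.10
nameserver 10.0.0.11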
b) For SLES v15.x and above, engage support to follow internal article https://www.dell.com/support/kbdoc/en-us/000227354
2) Restart the Keycloak pods
kubectl rollout restart statefulset keycloak -n powerflex
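To confirm that the restart completed and both pods are ready again:
kubectl rollout status statefulset keycloak -n powerflex
kubectl get pods -n powerflex | egrep keycloak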
3) Monitor the Keycloak logs for any additional database connectivity issues
kubectl logs keycloak-0 -n powerflex [-f]
kubectl logs keycloak-1 -n powerflex [-f]
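To focus on database-related messages while monitoring, the log output can be filtered, for example:
kubectl logs keycloak-0 -n powerflex -f | egrep -i "jdbc|acquisition timeout|health"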