Powerflex Management Platform: keycloak-0 logs HTTP probe failed with statuscode: 503

Summary: This article explains an issue where the keycloak-0 pod reports a health check failure due to database connectivity problems caused by an incorrect DNS configuration. This issue impacts authentication services managed by keycloak ...

This article applies to This article does not apply to This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

Scenario

One of the two Keycloak pods (here keycloak-0) experiences connectivity issues with the database, while keycloak-1 remains functional.

 

Event logs shows repeated readiness probe failures.

# kubectl get pods -n powerflex | egrep keycloak
keycloak-0                                                1/1     Running     0               22d
keycloak-1                                                1/1     Running     0               22d

# kubectl get events | egrep kube
Events:
  Type     Reason     Age                 From     Message
  ----     ------     ----                ----     -------
  Warning  Unhealthy  12m (x58 over 17h)  keycloak-0  Readiness probe failed: HTTP probe failed with statuscode: 503

The keycloak pod logs indicate a failure to acquire JDBC connections due to an acquisition timeout:

# kubectl get logs keycloak-0 -n powerflex
..
2024-11-27 07:01:41,593 INFO  [org.infinispan.CLUSTER] (non-blocking-thread--p2-t126) [Context=actionTokens] ISPN100010: Finished rebalance with members [keycloak-0-17437, keycloak-1-41022], topology id 7
2024-11-27 07:31:03,379 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) SQL Error: 0, SQLState: null
2024-11-27 07:31:03,379 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-0) Acquisition timeout while waiting for new connection
2024-11-27 07:31:03,384 ERROR [org.keycloak.services.scheduled.ScheduledTaskRunner] (Timer-0) Failed to run scheduled task ClearExpiredEvents: javax.persistence.PersistenceException: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection
        at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
        at java.base/java.util.TimerThread.run(Timer.java:506)
Caused by: org.hibernate.exception.GenericJDBCException: Unable to acquire JDBC Connection  <---------
..
Caused by: java.sql.SQLException: Acquisition timeout while waiting for new connection  <--------- 
.. 
Caused by: java.util.concurrent.TimeoutException  <--------- 
..
2024-11-27 09:31:03,476 INFO  [io.smallrye.health] (executor-thread-15) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak database connections health check","status":"DOWN","data":{"Failing since":"2024-11-27 07:31:03,477"}}]}
2024-11-27 09:56:03,477 INFO  [io.smallrye.health] (executor-thread-15) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Keycloak database connections health check","status":"DOWN","data":{"Failing since":"2024-11-27 07:31:03,477"}}]}
 
 

Impact

Authentication requests handled by keycloak-0 fail, causing intermittent or complete authentication failures for the PowerFlex Management Platform.
keycloak health check continuously reports a DOWN status, impacting high availability.

 

Cause

The issue occurs due to incorrect DNS configuration.

The JDBC connection used by keycloak to connect to the database relies on resolving the database hostname or endpoint.

Any misconfiguration or failure in hostname resolution can cause timeouts when attempting to establish a connection. 

Resolution

1) Fix the DNS configuration as per the operating system documentation

a) If RedHat or CentOS v7,x or v8,x,

i) Edit /etc/resolv.conf to update the correct DNS server on each MgmtVMs (MVMs)

ii) Delete the coredns pods (rke2-coredns-rke2-coredns-xxxxxxxxxx-xxxxx) to propagate the changes to those pods:

for x in `kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto`; do kubectl delete pods -n kube-system $x; done

iii) Verify DNS changes are now reflected in the coredns pods (there are 2 coredns pods responsible for DNS): 

for x in `kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto`; do echo $x; kubectl exec -it $x -n kube-system -- cat /etc/resolv.conf; echo " "; done

 

b) If SLES v15.x and above, engage support to follow internal article https://www.dell.com/support/kbdoc/en-us/000227354

2) Restart keycloak pods

kubectl rollout restart statefulset keycloak -n powerflex 

3) Monitor keycloak logs for any additional database connectivity issues 

kubectl logs keycloak-0 -n powerflex [-f]
kubectl logs keycloak-1 -n powerflex [-f]



Products

PowerFlex rack, PowerFlex Appliance, PowerFlex custom node, ScaleIO, PowerFlex appliance connectivity
Article Properties
Article Number: 000261288
Article Type: Solution
Last Modified: 19 Dec 2024
Version:  1
Find answers to your questions from other Dell users
Support Services
Check if your device is covered by Support Services.