PowerFlex: UI Not Loading Because SDNAS Pod Is Crashing
Summary: The PowerFlex Manager (PFxM) 4.x UI does not load because the SDNAS Gateway pod on the PowerFlex Management Platform (PFMP) is not in a healthy state.
Symptoms
- The UI screen is stuck in a loading state
- All PFMP pods are running except for the SDNAS Gateway
- File storage (SDNAS) may or may not be actually in use in the system
ASMManager logs show issues with the SDNAS Gateway pod:
2023-11-15 20:39:25,187 [AsmManagerAppAppInitializationThread] (PingUtil.java:32) [DEBUG] Could not connect to host sdnasgw.powerflex.svc on port 443
2023-11-15 20:39:25,188 [AsmManagerAppAppInitializationThread] (LCMService.java:1237) [DEBUG] Service checks completed, msg: SDNAS Gateway pod failed to response
2023-11-15 20:39:25,188 [AsmManagerAppAppInitializationThread] (LCMService.java:1262) [WARN] Liveness probe error: SDNAS Gateway pod failed to respons
SDNAS Gateway logs show failed DNS events:
[ERROR] plugin/errors: 2 postgres-ha-pgbouncer.powerflex.svc.cluster.local.<DNS>. AAAA: read udp 10.42.0.184:57617->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 postgres-ha-pgbouncer.powerflex.svc.cluster.local.<DNS>. AAAA: read udp 10.42.0.184:59414->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 postgres-ha-pgbouncer.powerflex.svc.cluster.local.<DNS>. A: read udp 10.42.0.184:50241->8.8.8.8:53: i/o timeout
In this case, DNS queries are being sent to 8.8.8.8, which is the wrong DNS server for this environment.
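To see at a glance which upstream resolver is timing out, log lines like the ones above can be summarized with standard text tools. This is a minimal sketch, not part of the official procedure: the function name dns_timeout_upstreams is hypothetical, and it assumes the log lines follow the "read udp <source>-><upstream>: i/o timeout" format shown above.

```shell
# Hypothetical helper: reads log lines on stdin and prints each upstream
# resolver (ip:port) that appears in an "i/o timeout" error, followed by
# the number of times it was seen.
dns_timeout_upstreams() {
  sed -n 's/.*->\([^ ]*\): i\/o timeout.*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example, saving the log excerpt to a file and running "dns_timeout_upstreams < excerpt.log" (file name is illustrative) prints one line per upstream. A single public resolver dominating the list suggests that the forwarder, rather than an individual pod, is misconfigured.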
The CoreDNS configuration map shows that the "forward" directive points to 8.8.8.8:
Corefile: ".:53 {\n errors \n health {\n lameduck 5s\n }\n ready
\n kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {\n pods
insecure\n fallthrough in-addr.arpa ip6.arpa\n ttl 30\n }\n prometheus
\ 0.0.0.0:9153\n forward . 8.8.8.8\n loop \n cache 30\n reload
\n loadbalance \n}"
Log file for reference: rke2-coredns-rke2-coredns_data.txt. This file is collected in the PFxM log bundle.
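On a live system, the same check can be made without reading the whole configuration map. The sketch below is an illustration only: the function name corefile_forward_target is hypothetical, and it assumes the Corefile text is supplied with real line breaks (not the escaped \n form shown in the YAML output above) and that "forward" is written on a single line.

```shell
# Hypothetical helper: reads Corefile text on stdin and prints the
# upstream(s) listed after "forward <zone>", one forward line per output line.
corefile_forward_target() {
  awk '$1 == "forward" {
         for (i = 3; i <= NF; i++)
           printf "%s%s", $i, (i < NF ? " " : "\n")
       }'
}
```

One way to feed it is "kubectl get cm -n kube-system rke2-coredns-rke2-coredns -o jsonpath='{.data.Corefile}' | corefile_forward_target", since the jsonpath output contains the decoded Corefile. In the failing state described here, the expected result would be 8.8.8.8.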
Impact
Unable to access the UI.
Cause
The ASMManager pod depends on several other pods being healthy in order to pass its liveness check. In this instance, the SDNAS Gateway pod failed to start, so the liveness check failed. The SDNAS Gateway failed because it received incorrect DNS settings from CoreDNS: the CoreDNS "forward" directive was sending queries from other pods and services to the IP address 8.8.8.8. CoreDNS should instead reference its own /etc/resolv.conf file, which contains the correct DNS values.
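Before pointing CoreDNS at /etc/resolv.conf, it can be worth confirming which DNS servers that file actually lists. The sketch below is a hedged illustration: the function name list_nameservers is hypothetical, and it assumes the CoreDNS pods inherit the resolver configuration of the PFMP node they run on, which is common but not confirmed by this article.

```shell
# Hypothetical helper: prints the IP address of every "nameserver" entry
# in a resolv.conf-style file. Defaults to /etc/resolv.conf when no
# file argument is given.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running list_nameservers on a PFMP node should print the DNS servers intended for the environment. If it prints nothing, or an unexpected address, the node's own DNS configuration needs attention first.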
Resolution
The CoreDNS configuration map must be adjusted so that the CoreDNS pods see the correct DNS configuration.
1) SSH to one of the PFMP servers.
2) Edit the CoreDNS configuration map:
kubectl edit cm -n kube-system rke2-coredns-rke2-coredns
3) Locate the "forward" directive and change its value to /etc/resolv.conf. In this case, the incorrect value is 8.8.8.8.
After the edit, the configuration map should look similar to the following:
kubectl get cm -n kube-system -o yaml rke2-coredns-rke2-coredns
apiVersion: v1
data:
Corefile: ".:53 {\n errors \n health {\n lameduck 5s\n }\n ready
\n kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {\n pods
insecure\n fallthrough in-addr.arpa ip6.arpa\n ttl 30\n }\n prometheus
\ 0.0.0.0:9153\n forward . /etc/resolv.conf\n cache 30\n loop \n reload
\n loadbalance \n}"
kind: ConfigMap
4) Restart the CoreDNS pods:
for x in $(kubectl get pods -n kube-system | grep -i rke2-coredns-rke2-coredns | awk '{print $1}' | grep -iv auto); do kubectl delete pods -n kube-system "$x"; done
5) Restart the SDNAS Gateway:
kubectl get pods -n powerflex | grep -i sdnas | awk '{print $1}' | xargs kubectl delete pod -n powerflex
6) Wait approximately 5 to 15 minutes, after which the PFxM UI should be reachable.
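While waiting, pod health can be checked from the command line instead of repeatedly refreshing the UI. The following is a minimal sketch under stated assumptions: the function name unhealthy_pods is hypothetical, and it assumes the default column layout of "kubectl get pods" (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# prints NAME, READY and STATUS for every pod that is not fully ready.
# Pods in the Completed state (finished jobs) are skipped.
unhealthy_pods() {
  awk 'NR == 1 || $3 == "Completed" { next }
       {
         split($2, ready, "/")   # READY column, for example "0/1"
         if ($3 != "Running" || ready[1] != ready[2])
           print $1, $2, $3
       }'
}
```

Typical usage would be "kubectl get pods -n powerflex | unhealthy_pods" and "kubectl get pods -n kube-system | unhealthy_pods". Empty output from both suggests that the SDNAS Gateway and CoreDNS pods have recovered.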
Impacted Versions
PowerFlex Manager 4.x
Fixed In Version
N/A - Working as designed
Additional Information