- Technical Support: K-PaaS node NotReady issue
-
정*주 2024-08-01 14:02:49 - hits 116
Hello. While I was accessing and working on a node in the k8s cluster provided through NCP, the node's status changed to NotReady and access to it became restricted. The error log is as follows.
kubectl describe node contest73-node-w-5b14
Name: contest73-node-w-5b14
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=SVR.VSVR.HICPU.C002.M004.G003
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=1
failure-domain.beta.kubernetes.io/zone=2
kubernetes.io/arch=amd64
kubernetes.io/hostname=contest73-node-w-5b14
kubernetes.io/os=linux
ncloud.com/nks-nodepool=contest73-node
node.kubernetes.io/instance-type=SVR.VSVR.HICPU.C002.M004.G003
nodeId=25530113
regionNo=1
topology.kubernetes.io/region=1
topology.kubernetes.io/zone=2
zoneNo=2
Annotations: alpha.kubernetes.io/provided-node-ip: 192.168.6.7
csi.volume.kubernetes.io/nodeid: {"blk.csi.ncloud.com":"25530113","nas.csi.ncloud.com":"contest73-node-w-5b14"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 16 Jul 2024 18:19:12 -0700
Taints: node.cloudprovider.kubernetes.io/shutdown:NoSchedule
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: contest73-node-w-5b14
AcquireTime: <unset>
RenewTime: Wed, 31 Jul 2024 09:21:26 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 16 Jul 2024 18:20:11 -0700 Tue, 16 Jul 2024 18:20:11 -0700 CiliumIsUp Cilium is running on this node
MemoryPressure Unknown Wed, 31 Jul 2024 09:21:26 -0700 Wed, 31 Jul 2024 09:22:08 -0700 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Wed, 31 Jul 2024 09:21:26 -0700 Wed, 31 Jul 2024 09:22:08 -0700 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Wed, 31 Jul 2024 09:21:26 -0700 Wed, 31 Jul 2024 09:22:08 -0700 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Wed, 31 Jul 2024 09:21:26 -0700 Wed, 31 Jul 2024 09:22:08 -0700 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.6.7
Hostname: contest73-node-w-5b14
ExternalIP: 223.130.143.233
Capacity:
cpu: 2
ephemeral-storage: 103083576Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4005908Ki
pods: 110
Allocatable:
cpu: 1930m
ephemeral-storage: 95001823485
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2902036Ki
pods: 110
System Info:
Machine ID: 83c00806e7504df088d78c7418b87290
System UUID: 8fb90707-22c2-447c-9b5e-f21836acfada
Boot ID: e1ffd727-4887-48eb-bf13-489bc70c95b4
Kernel Version: 5.15.0-94-generic
OS Image: Ubuntu 22.04.3 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.7.2
Kubelet Version: v1.27.9
Kube-Proxy Version: v1.27.9
PodCIDR: 198.18.0.0/24
PodCIDRs: 198.18.0.0/24
ProviderID: navercloudplatform://25530113
Non-terminated Pods: (23 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
default nsenter-0u4p6r 0 (0%) 0 (0%) 0 (0%) 0 (0%) 86m
default nsenter-1zk8nb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 87m
default nsenter-2azzav 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9h
default nsenter-47xgim 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9h
default nsenter-4ks9w0 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11h
default nsenter-884qql 0 (0%) 0 (0%) 0 (0%) 0 (0%) 47m
default nsenter-8baca2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9h
default nsenter-9j79kd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 63m
default nsenter-c4xpif 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h23m
default nsenter-gcwdhc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11h
default nsenter-jasar4 0 (0%) 0 (0%) 0 (0%) 0 (0%) 85m
default nsenter-pds9tj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11h
default nsenter-qhzhyx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h7m
default nsenter-si1d65 0 (0%) 0 (0%) 0 (0%) 0 (0%) 86m
default nsenter-syjdrl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6h21m
default nsenter-uzih32 0 (0%) 0 (0%) 0 (0%) 0 (0%) 107m
default nsenter-vly0ha 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9h
kube-system cilium-brw9p 100m (5%) 0 (0%) 10Mi (0%) 0 (0%) 15d
kube-system csi-nks-controller-dfdb58f9c-ghqnz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15d
kube-system csi-nks-node-zmtb8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15d
kube-system nks-nas-csi-node-xzzvr 30m (1%) 400m (20%) 60Mi (2%) 500Mi (17%) 15d
kube-system nks-nodelocalproxy-qn2kv 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15d
kube-system nodelocaldns-bz9qq 100m (5%) 0 (0%) 70Mi (2%) 170Mi (5%) 15d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 230m (11%) 400m (20%)
memory 140Mi (4%) 670Mi (23%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
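An error occurred while I was restarting kubelet, and the node went NotReady. In this situation, is there any way to regain access to the node? If access is not possible, is it acceptable to delete the node and add a new one?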
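Hello. This is the Open Cloud Platform Center.
This question is the same as the one 박*원 submitted on July 29, so we are providing the same answer below.
From the log you posted, we can see the "Kubelet stopped posting node status" issue, and the node carries a taint with the NoSchedule effect. (Taints: node.kubernetes.io/unreachable:NoSchedule)
When a NoSchedule taint is set, the node is excluded from pod scheduling: new pods that do not tolerate the taint are not placed on it.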
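To see which taints are currently keeping pods off a node, a minimal sketch like the following may help. It assumes kubectl is already configured for the cluster; the function names `node_taints` and `blocking_taints` are illustrative only and are not part of kubectl.

```shell
#!/bin/sh
# Sketch only: node_taints and blocking_taints are hypothetical helper names.

# Print each taint on the given node as "key:effect", one per line.
# Assumes kubectl is configured to reach the cluster.
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
}

# Read "key:effect" lines on stdin and keep only the taints that stop new
# pods from being scheduled (NoSchedule) or also evict running pods (NoExecute).
blocking_taints() {
  grep -E ':(NoSchedule|NoExecute)$' || true
}

# Usage (node name taken from the log above):
#   node_taints contest73-node-w-5b14 | blocking_taints
```

For the node in the log above, both the shutdown and unreachable taints would be expected to pass the filter, since both use the NoSchedule effect.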
Please connect to the node, check the status of the kubelet service with root privileges, and start the service if it is inactive.
# Switch to root
sudo -i
# Check kubelet status
systemctl status kubelet
(inactive example)
○ kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Tue 2024-07-30 09:52:01 KST; 5s ago
# Enable and start kubelet
systemctl enable --now kubelet
# Confirm kubelet is active
systemctl status kubelet
(active example)
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2024-07-30 09:52:23 KST; 1s ago
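The manual steps above could be wrapped in a small script along these lines. This is a sketch under assumptions, not an official procedure: `kubelet_action` and `ensure_kubelet` are hypothetical names, and the branch that prints recent kubelet logs for any state other than active or inactive goes beyond the answer above, which only covers the inactive case.

```shell
#!/bin/sh
# Sketch only: kubelet_action and ensure_kubelet are hypothetical helper names.

# Map a `systemctl is-active` result to the next step.
kubelet_action() {
  case "$1" in
    active)   echo "none" ;;    # already running, nothing to do
    inactive) echo "start" ;;   # stopped: enable and start it
    *)        echo "logs" ;;    # failed, activating, ...: read the logs first
  esac
}

# Run on the worker node as root (for example after `sudo -i`).
ensure_kubelet() {
  # is-active exits non-zero when the unit is not active, so ignore the status.
  state=$(systemctl is-active kubelet || true)
  case "$(kubelet_action "$state")" in
    start) systemctl enable --now kubelet ;;
    logs)  journalctl -u kubelet -n 50 --no-pager ;;
  esac
  # Print the state after the attempt; exits 0 only if kubelet is active.
  systemctl is-active kubelet
}
```

Calling `ensure_kubelet` on the node prints the resulting unit state. If the state is still not `active`, the `journalctl` output is usually the place to look for the reason the restart failed.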
After completing the steps above, query the nodes from the master node and check whether the node's status has changed to "Ready".
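One way to do this check from a machine with kubectl access is sketched below. The helper names `node_ready_status` and `not_ready_nodes` are illustrative, not kubectl features, and the sketch assumes kubectl is configured for the cluster.

```shell
#!/bin/sh
# Sketch only: node_ready_status and not_ready_nodes are hypothetical helper names.

# Print the Ready condition of one node: True, False, or Unknown.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Read `kubectl get nodes --no-headers` output on stdin and print the names
# of nodes whose STATUS column does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage:
#   node_ready_status contest73-node-w-5b14
#   kubectl get nodes --no-headers | not_ready_nodes
```

If you would rather block until the node recovers, `kubectl wait --for=condition=Ready node/contest73-node-w-5b14 --timeout=120s` is another option.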
※ Please send contest-related inquiries to the contest email (contest@k-paas.or.kr).
Thank you.