
k3s: Failed to garbage collect required amount of images. Attempted to free 15362066841 bytes, but only found 0 bytes eligible to free.

프로틴형님 2025. 2. 2. 19:21

Situation

After installing k3s on the server, I spun up Jenkins as a Service just to try it out.

But when I tried to reach it from outside at <SERVER IP>:<PORT>, I just got "page not found"..

Hmm.. so I checked whether I'd forgotten to open the port, but it was already open on the ipTIME router.

??? So I asked big bro GPT, and on its advice checked the node status with kubectl describe nodes (it's a single-node setup for now, so only one node shows up).

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 02 Feb 2025 07:22:50 +0000   Sun, 02 Feb 2025 06:48:06 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Sun, 02 Feb 2025 07:22:50 +0000   Sun, 02 Feb 2025 07:10:25 +0000   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Sun, 02 Feb 2025 07:22:50 +0000   Sun, 02 Feb 2025 06:48:06 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 02 Feb 2025 07:22:50 +0000   Sun, 02 Feb 2025 06:48:06 +0000   KubeletReady                 kubelet is posting ready status
Events:
  Type     Reason                             Age                From                   Message
  ----     ------                             ----               ----                   -------
  Normal   Starting                           37m                kube-proxy             
  Warning  PossibleMemoryBackedVolumesOnDisk  37m                kubelet                The tmpfs noswap option is not supported. Memory-backed volumes (e.g. secrets, emptyDirs, etc.) might be swapped to disk and should no longer be considered secure.
  Normal   Starting                           37m                kubelet                Starting kubelet.
  Warning  InvalidDiskCapacity                37m                kubelet                invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced            37m                kubelet                Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory            37m (x2 over 37m)  kubelet                Node  status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure              37m (x2 over 37m)  kubelet                Node  status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID               37m (x2 over 37m)  kubelet                Node  status is now: NodeHasSufficientPID
  Normal   NodeReady                          37m                kubelet                Node  status is now: NodeReady
  Normal   NodePasswordValidationComplete     37m                k3s-supervisor         Deferred node password secret validation complete
  Normal   Synced                             37m                cloud-node-controller  Node synced successfully
  Normal   RegisteredNode                     37m                node-controller        Node gausslab-hq event: Registered Node gausslab-hq in Controller
  Warning  FreeDiskSpaceFailed                32m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15598348697 bytes, but only found 0 bytes eligible to free.
  Warning  ImageGCFailed                      32m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15598348697 bytes, but only found 0 bytes eligible to free.
  Warning  FreeDiskSpaceFailed                27m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15599892889 bytes, but only found 0 bytes eligible to free.
  Warning  ImageGCFailed                      27m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15599892889 bytes, but only found 0 bytes eligible to free.
  Warning  FreeDiskSpaceFailed                22m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15601461657 bytes, but only found 0 bytes eligible to free.
  Warning  ImageGCFailed                      22m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15601461657 bytes, but only found 0 bytes eligible to free.
  Warning  FreeDiskSpaceFailed                17m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15602899353 bytes, but only found 0 bytes eligible to free.
  Warning  ImageGCFailed                      17m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15602899353 bytes, but only found 0 bytes eligible to free.
  Normal   NodeHasDiskPressure                15m                kubelet                Node gausslab-hq status is now: NodeHasDiskPressure
  Warning  EvictionThresholdMet               14m (x9 over 15m)  kubelet                Attempting to reclaim ephemeral-storage
  Warning  FreeDiskSpaceFailed                12m                kubelet                Failed to garbage collect required amount of images. Attempted to free 15357118873 bytes, but only found 0 bytes eligible to free.
  Warning  FreeDiskSpaceFailed                7m53s              kubelet                Failed to garbage collect required amount of images. Attempted to free 15358769561 bytes, but only found 0 bytes eligible to free.
  Warning  FreeDiskSpaceFailed                2m53s              kubelet                Failed to garbage collect required amount of images. Attempted to free 15360424345 bytes, but only found 0 bytes eligible to free.

Not enough disk..? How am I supposed to fix this?
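Before fixing anything, it helps to know where the "Attempted to free N bytes" number comes from. As far as I can tell from kubelet's image GC manager, it acts once usage of the image filesystem reaches `--image-gc-high-threshold` (default 85%). It then tries to free enough to get back down to `--image-gc-low-threshold` (default 80%), i.e. amountToFree = capacity × (100 − low) / 100 − available.

"Only found 0 bytes eligible to free" means it found no unused images it was allowed to delete. So image GC alone can't get the disk back under the threshold: whatever is eating the space is either images still in use or not images at all.

The sketch below just reproduces that arithmetic and pulls the byte count out of an event message. The function names are made up for this post. Capacity/available would have to come from something like `df -B1` on the filesystem holding `/var/lib/rancher/k3s` (k3s's default data directory).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names made up for this post) for reading
# kubelet's image-GC warnings.

# Integer usage percent of the image filesystem, computed like kubelet:
# usage = 100 - available*100/capacity
image_fs_usage_percent() {
  local capacity="$1" available="$2"
  echo $(( 100 - available * 100 / capacity ))
}

# Bytes kubelet tries to free to get back down to the low threshold
# (--image-gc-low-threshold, default 80): capacity*(100-low)/100 - available
image_gc_amount_to_free() {
  local capacity="$1" available="$2" low="${3:-80}"
  echo $(( capacity * (100 - low) / 100 - available ))
}

# Pull N out of "... Attempted to free N bytes ..." lines read on stdin.
gc_attempted_bytes() {
  sed -n 's/.*Attempted to free \([0-9][0-9]*\) bytes.*/\1/p'
}

# Convert a byte count to GiB with two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}
```

For example, `kubectl describe nodes | gc_attempted_bytes` lists the amounts from the warnings above, and `bytes_to_gib 15598348697` prints 14.53. So kubelet wanted roughly 14.5 GiB back and found nothing it could delete.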


Solution

Partway through asking big bro GPT, I thought "hmm, this doesn't seem right.. ;;" and started googling instead.

Kubernetes node tainted with disk-pressure

Turns out a legendary senior dev had already solved this exact issue.

kubectl drain --delete-emptydir-data --ignore-daemonsets <node-name> && kubectl uncordon <node-name>

Ran that command, and...?

Events:
  Type    Reason              Age   From     Message
  ----    ------              ----  ----     -------
  Normal  NodeNotSchedulable  32s   kubelet  Node  status is now: NodeNotSchedulable
  Normal  NodeSchedulable     22s   kubelet  Node  status is now: NodeSchedulable

Amazingly, everything went back to normal.
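The fix is just two kubectl commands chained with `&&`, so uncordon only runs if the drain succeeded. On a single-node cluster like this one, the drain evicts the node's pods (DaemonSet pods are skipped because of `--ignore-daemonsets`), so Jenkins and friends are down until uncordon lets them be scheduled again. `--delete-emptydir-data` also means anything those pods kept in emptyDir volumes is gone.

If this needs repeating, a tiny wrapper saves retyping the node name. `drain_and_uncordon` is a made-up name, and the `KUBECTL` variable is only there so the commands can be printed (e.g. `KUBECTL=echo`) instead of executed.

```shell
#!/usr/bin/env bash
# drain_and_uncordon (hypothetical helper): evict pods from a node, then make
# it schedulable again. Set KUBECTL=echo to print the commands instead.
drain_and_uncordon() {
  local node="$1"
  local kubectl="${KUBECTL:-kubectl}"
  if [ -z "$node" ]; then
    echo "usage: drain_and_uncordon <node-name>" >&2
    return 2
  fi
  # drain cordons the node and evicts its pods (emptyDir data is deleted);
  # uncordon only runs if the drain succeeded.
  "$kubectl" drain --delete-emptydir-data --ignore-daemonsets "$node" &&
    "$kubectl" uncordon "$node"
}
```

Usage: `drain_and_uncordon gausslab-hq`, or `KUBECTL=echo drain_and_uncordon gausslab-hq` to just see what would run.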

I asked big bro GPT why this happened, and here's what it said:

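💡 To sum up:

  1. Disk pressure occurred
    • kubelet's garbage collection (GC) failed, so images and containers weren't cleaned up.
    • The lack of disk space triggered EvictionThresholdMet events, and pods were forcibly terminated (Evicted).
  2. How it was fixed
    • kubectl drain --delete-emptydir-data --ignore-daemonsets <node-name>
      • Drains the node, evicting its pods (DaemonSet pods are left alone because of --ignore-daemonsets),
      • and deletes their emptyDir data to free up space.
    • kubectl uncordon <node-name>
      • Marks the node schedulable again so pods can be placed on it.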

That's its explanation, anyway.
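The EvictionThresholdMet → DiskPressure part of that explanation can be checked with simple numbers. kubelet compares signals such as `nodefs.available` (the filesystem holding kubelet's data) and `imagefs.available` (the one holding container images) against hard-eviction thresholds. It reports DiskPressure when one of them drops below its threshold.

Upstream kubelet's Linux defaults are `nodefs.available<10%` and `imagefs.available<15%`, but k3s can pass its own values. So the threshold is a parameter here, not a claim about this node. The function names are hypothetical, and capacity/available are plain byte counts.

```shell
#!/usr/bin/env bash
# below_eviction_threshold (hypothetical helper): exit 0 if the available
# share of a filesystem is below threshold_percent, i.e. a hard-eviction
# signal such as nodefs.available<10% would be met.
below_eviction_threshold() {
  local capacity="$1" available="$2" threshold_percent="$3"
  # available/capacity < threshold/100  <=>  available*100 < capacity*threshold
  [ $(( available * 100 )) -lt $(( capacity * threshold_percent )) ]
}

# disk_pressure_report (hypothetical helper): human-readable verdict.
# The default of 10 mirrors upstream kubelet's nodefs.available<10%;
# k3s may be configured differently.
disk_pressure_report() {
  local capacity="$1" available="$2" threshold_percent="${3:-10}"
  if below_eviction_threshold "$capacity" "$available" "$threshold_percent"; then
    echo "below ${threshold_percent}%: eviction/DiskPressure expected"
  else
    echo "ok: not below ${threshold_percent}%"
  fi
}
```

On the node itself, `df -B1 --output=size,avail /var/lib/rancher/k3s | tail -n 1` gives the two numbers. `kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'` shows whether kubelet currently agrees.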

So I tried connecting to Jenkins again and...? Still not reachable ;;