[따배쿠] Pod - Building a Self-healing Pod with a liveness Probe

bbiyak2da 2024. 12. 1. 11:58

Liveness Probe

A feature that keeps a Pod's containers running
One of Kubernetes' self-healing features
- self-healing : when a container stops working properly, it is restarted automatically
Defined in the Pod's spec

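[Example]

This example uses the httpGet probe, one of the livenessProbe types (for web services), to check whether the container is working properly.

How? It periodically connects to the web service over HTTP (port 80), with path / (root).

If a response comes back normally, the container is judged to be working.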
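
The check described above might be written as the following fragment. It is a minimal sketch that belongs under a container entry in the Pod spec; the port and path are the ones mentioned in the text, and every other probe setting is left at its default.

```yaml
# Fragment only: place under spec.containers[] of a Pod manifest
livenessProbe:
  httpGet:
    path: /      # request the root path
    port: 80     # the container's HTTP port
```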

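Liveness Probe mechanisms

The liveness probe method depends on the application.
- 1. httpGet : sends an HTTP GET request to the specified IP address, port, and path and checks whether the container responds. A status code from 200 up to (but not including) 400 counts as success; any other code, or no response, counts as a failure, and after repeated failures the container is restarted.
  - Example 1. Suppose there is a web server container (nginx). An HTTP request (port 80) is sent periodically to path / (root) to check for a response. If the response comes back, the container is healthy; if it fails to respond 3 times in a row, it is judged unhealthy.
  - Kubernetes kills an unhealthy container and starts a new one from the same image. The image is pulled again from the registry (for example Docker Hub) only when the imagePullPolicy requires it.
- 2. tcpSocket : attempts a TCP connection to the specified port, and restarts the container if the connection fails
  - Example 1. Suppose a container runs an ssh daemon (a service that accepts client connections on port 22).
  - If the tcpSocket connection to port 22 succeeds, the container is healthy; otherwise it is judged unhealthy.
  - In other words, after 3 consecutive failed attempts, Kubernetes itself kills the container and starts a new one, so the service is always provided by a healthy container.
  - Example 2. Suppose a container runs an NFS daemon. NFS usually serves on port 2049, so a connection is attempted on port 2049, and if it does not succeed the container is killed and a new one is started.
  - That is, the probe tries to connect to the port the application has open: if the connection succeeds the container is healthy, and if it fails the container is unhealthy.
- 3. exec : runs a command inside the container, and restarts the container if the command's exit code is not 0
  - Example 1. Suppose a container in a Pod provides a service using data that it fetches from a backend DB.
  - The exec option specifies a command to run inside the container (for example ls /data/file), which is executed periodically. If the command succeeds, the container is healthy; if it fails 3 times in a row, the container is unhealthy.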
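
For comparison with httpGet, the other two mechanisms could be sketched as follows. Both are fragments for a container entry, shown here as two separate YAML documents. Port 22 and the ls /data/file command come from the examples above, and /data/file is only an illustrative path.

```yaml
# tcpSocket: healthy if a TCP connection to port 22 can be opened
livenessProbe:
  tcpSocket:
    port: 22
---
# exec: healthy if the command exits with code 0
livenessProbe:
  exec:
    command:
    - ls
    - /data/file   # illustrative path taken from the example above
```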

In short, the livenessProbe sends requests to decide whether a container is healthy, and a container that fails to respond 3 times in a row is judged unhealthy. Kubernetes kills an unhealthy container on its own and starts a new one.
Note that it is the 'Container' that is restarted, not the 'Pod'. Even when the Container restarts, the Pod stays the same, so the IP address does not change. (The IP is assigned to the Pod.)

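Liveness Probe parameters

periodSeconds

Interval between health checks (seconds)
period=10s : the livenessProbe runs once every 10 seconds

initialDelaySeconds

Delay after the container starts before the first health check (seconds)
delay=5s : the livenessProbe starts running 5 seconds after the container starts

timeoutSeconds

Time to wait for a response to a health check (seconds)
timeout=1s : if no response arrives within 1 second, the check counts as a failure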
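
Put together, the three parameters might look like the fragment below. This is a sketch using the same values as the explanations above (delay=5s, period=10s, timeout=1s); the httpGet target is borrowed from the nginx example and is an assumption, not a requirement.

```yaml
# Fragment only: place under spec.containers[] of a Pod manifest
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5   # wait 5s after the container starts before the first probe
  periodSeconds: 10        # run the probe every 10s
  timeoutSeconds: 1        # no response within 1s counts as a failed probe
```

Because failures are counted once per period, the time it takes to notice a dead container is on the order of failureThreshold × periodSeconds, which is about 30 seconds with the default failureThreshold of 3.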

[Example] default

root@master:~# vi nginx-pod-liveness.yaml

#nginx-pod-liveness
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-liveness
spec:
  containers:
  - name: nginx-container
    image: nginx:1.14
    ports:
    - containerPort: 80
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /
        port: 80

root@master:~# kubectl create -f nginx-pod-liveness.yaml
pod/nginx-pod-liveness created

root@master:~# kubectl describe pod nginx-pod-liveness
Name:             nginx-pod-liveness
Namespace:        default
Priority:         0
Service Account:  default
Node:             node1/10.100.0.101
Start Time:       Sun, 01 Dec 2024 03:13:29 +0000
Labels:           <none>
Annotations:      cni.projectcalico.org/containerID: 6b4671cafa337b93cc00bc10da83312e27ce1d6d14a4e36274e58643821ef99c
                  cni.projectcalico.org/podIP: 192.168.166.147/32
                  cni.projectcalico.org/podIPs: 192.168.166.147/32
Status:           Running
IP:               192.168.166.147
IPs:
  IP:  192.168.166.147
Containers:
  nginx-container:
    Container ID:   containerd://b5752c07bb98e11e8664de0de3de33ae883a0a3f09fec542baa512a6584d9f60
    Image:          nginx:1.14
    Image ID:       docker.io/library/nginx@sha256:f7988fb6c02e0ce69257d9bd9cf37ae20a60f1df7563c3a2a6abe24160306b8d
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 01 Dec 2024 03:13:30 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5wckl (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-5wckl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  4m59s  default-scheduler  Successfully assigned default/nginx-pod-liveness to node1
  Normal  Pulled     4m58s  kubelet            Container image "nginx:1.14" already present on machine
  Normal  Created    4m58s  kubelet            Created container nginx-container
  Normal  Started    4m58s  kubelet            Started container nginx-container

Even though no livenessProbe parameters were specified, the Liveness line shows that the probe was deployed with default values (delay=0s timeout=1s period=10s #success=1 #failure=3).

root@master:~# kubectl get pod nginx-pod-liveness -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 6b4671cafa337b93cc00bc10da83312e27ce1d6d14a4e36274e58643821ef99c
    cni.projectcalico.org/podIP: 192.168.166.147/32
    cni.projectcalico.org/podIPs: 192.168.166.147/32
  creationTimestamp: "2024-12-01T03:13:29Z"
  name: nginx-pod-liveness
  namespace: default
  resourceVersion: "75996"
  uid: fbbb2ddd-01a4-4fe8-b4f3-5dab9df3c67f
spec:
  containers:
  - image: nginx:1.14
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 80
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: nginx-container
    ports:
    - containerPort: 80
      protocol: TCP
    resources: {}

#success=1 : a single successful check counts as success

#failure=3 : 3 consecutive failed checks count as a failure

[Example] specifying parameters

root@master:~# vi pod-nginx-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-liveness
spec:
  containers:
  - name: nginx-container
    image: nginx:1.14
    ports:
    - containerPort: 80
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 3
      failureThreshold: 3

Liveness Probe example

The Liveness Probe is defined in the Pod's spec.
The smlinux/unhealthy container used in this example returns HTTP 500 (internal server error) whenever it receives an HTTP connection.

root@master:~# vi pod-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-pod
spec:
  containers:
  - image: smlinux/unhealthy
    name: unhealthy-container
    ports:
    - containerPort: 8080
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /
        port: 8080

root@master:~# kubectl create -f pod-liveness.yaml
pod/liveness-pod created

root@master:~# kubectl describe pod liveness-pod
Name:             liveness-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             node1/10.100.0.101
Start Time:       Sun, 01 Dec 2024 05:14:10 +0000
Labels:           <none>
Annotations:      cni.projectcalico.org/containerID: 5c5fbd5e3ddf5e7ca706e1822c88c8141977f2d6157ed381578338b408565d75
                  cni.projectcalico.org/podIP: 192.168.166.148/32
                  cni.projectcalico.org/podIPs: 192.168.166.148/32
Status:           Running
IP:               192.168.166.148
IPs:
  IP:  192.168.166.148
Containers:
  unhealthy-container:
    Container ID:   containerd://2be0c20a875916868ab9c4d460f8b75b65b62a86a5fc914eae82d62ea7fee4c3
    Image:          smlinux/unhealthy
    Image ID:       docker.io/smlinux/unhealthy@sha256:5c746a42612be61209417d913030d97555cff0b8225092908c57634ad7c235f7
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 01 Dec 2024 05:19:52 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 01 Dec 2024 05:18:02 +0000
      Finished:     Sun, 01 Dec 2024 05:19:50 +0000
    Ready:          True
    Restart Count:  3
    Liveness:       http-get http://:8080/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5t2tx (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-5t2tx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m6s                   default-scheduler  Successfully assigned default/liveness-pod to node1
  Normal   Pulled     5m47s                  kubelet            Successfully pulled image "smlinux/unhealthy" in 18.403s (18.403s including waiting). Image size: 263841919 bytes.
  Normal   Created    2m14s (x3 over 5m47s)  kubelet            Created container unhealthy-container
  Normal   Started    2m14s (x3 over 5m47s)  kubelet            Started container unhealthy-container
  Normal   Pulled     2m14s (x2 over 4m4s)   kubelet            Successfully pulled image "smlinux/unhealthy" in 1.443s (1.443s including waiting). Image size: 263841919 bytes.
  Warning  Unhealthy  56s (x9 over 4m56s)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    56s (x3 over 4m36s)    kubelet            Container unhealthy-container failed liveness probe, will be restarted
  Normal   Pulling    25s (x4 over 6m5s)     kubelet            Pulling image "smlinux/unhealthy"
  Normal   Pulled     24s                    kubelet            Successfully pulled image "smlinux/unhealthy" in 1.47s (1.47s including waiting). Image size: 263841919 bytes.

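The events show the probe reporting unhealthy (status code 500), the existing container being killed, and a new container being created.

Exercise

Add the self-healing feature to the liveness-exam.yaml file below.

- Check every 5 seconds whether the /tmp/healthy file exists in the running Pod's container.

- Start checking 10 seconds after the Pod starts.

- Set the success threshold to 1 and the failure threshold to 2 consecutive failures.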

apiVersion: v1
kind: Pod
metadata:
  name: liveness-exam
spec:
  containers:
  - name: busybox-container
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600

[Answer]

# vi liveness-exam.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exam
spec:
  containers:
  - name: busybox-container
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - ls
        - /tmp/healthy
      periodSeconds: 5
      successThreshold: 1
      initialDelaySeconds: 10
      failureThreshold: 2

*Code walkthrough

touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600

→ touch /tmp/healthy : creates the /tmp/healthy file

→ sleep 30; rm -rf /tmp/healthy : deletes the /tmp/healthy file after 30 seconds

→ sleep 600 : waits 600 seconds

After writing this yaml file, create the Pod and check the result with the commands below!

kubectl create -f liveness-exam.yaml
kubectl describe pods liveness-exam

Reference video

https://www.youtube.com/watch?v=-NeJS7wQu_Q