Configure Endpoint Health Checker

TOC

Overview

The Endpoint Health Checker is a cluster plugin designed to monitor and manage the health status of service endpoints in k8s cluster. It automatically removes unhealthy endpoints from service to ensure traffic is only routed to healthy instances, improving overall service reliability and availability.

Key Features

  • Automatic Health Monitoring: Continuously monitors the health status of service endpoints in k8s cluster
  • Load Balancer Integration: Automatically removes unhealthy endpoints from service
  • Service Availability: Ensures traffic is only directed to healthy, available endpoints
  • Rapid Failover: Reduces endpoint switching time from 40s to 10s during node power outages

Installation

Install via Marketplace

  1. Navigate to Administrator > Marketplace > Cluster Plugins.

  2. Search for "Alauda Container Platform Endpoint Health Checker" in the plugin list.

  3. Click Install to open the installation configuration page.

  4. In the deployment configuration dialog, you can optionally configure the following parameters:

    ParameterDescription
    Node SelectorsConfigure label selectors to specify which nodes the Endpoint Health Checker components should run on. Click Add to add multiple label key-value pairs.
    Node TolerationsConfigure tolerations to allow Endpoint Health Checker components to be scheduled on nodes with specific taints. Click Add to add multiple tolerations with Key, Value, and Type.
  5. Click Install to deploy the plugin.

  6. Wait for the plugin status to change to "Ready".

How It Works

Health Check Mechanism

The Endpoint Health Checker is a dedicated health monitoring component that ensures only healthy endpoints receive traffic. It operates by monitoring service endpoints and automatically managing their availability status.

Core Functionality

The Endpoint Health Checker works by:

  1. Service Discovery: Identifies services and pods configured for health monitoring in the cluster.
  2. Pod Health Monitoring: Monitors the readiness and liveness probe status of pods backing the service endpoints
  3. Active Health Checks: Performs active health assessments using configurable criteria:
    • TCP connectivity checks: Establishes TCP connections to verify port accessibility
  4. Endpoint Management: Automatically removes unhealthy endpoints from service endpoint lists to prevent traffic routing to failed instances

Health Check Process

The health checking process involves:

  • Probe Integration: Leverages Kubernetes readiness and liveness probe results as initial health indicators
  • Network Connectivity: Sends TCP packets to target endpoint ports to verify accessibility
  • Response Validation: Evaluates response status, timing, and content to determine endpoint health
  • Automatic Failover: Removes unresponsive or failed endpoints from service endpoint lists

Performance Improvement

  • Previous Method: Relied on kubelet heartbeat detection with up to 40 seconds delay
  • Current Method: Active endpoint health checking with 10 second detection and switching time
  • Improvement: Significantly improves service availability during node failures in ALB + MetalLB environments

How To Activate

Health checking can be activated through two methods:

For ALB

set alb.cpaas.io/pod-annotations annotation of ALB2

apiVersion: crd.alauda.io/v2
kind: ALB2
metadata:
  annotations:
    alb.cpaas.io/pod-annotations: '{"endpoint-health-checker.io/enabled":"true"}'
  name: demo-alb
spec:
  config:
    loadbalancerName: demo-alb
    nodeSelector:
      ingress: 'true'
    replicas: 1
  type: nginx

For IngressNginx

  1. Install ingress-nginx
  2. Set podAnnotations in .spec.controller.podAnnotations of IngressNginx.
    apiVersion: ingress-nginx.alauda.io/v1
    kind: IngressNginx
    metadata:
      name: demo
      namespace: ingress-nginx-operator
    spec:
      controller:
        replicaCount: 1
        podAnnotations:
          endpoint-health-checker.io/enabled: 'true'

For EnvoyGateway

  1. Install envoy-gateway-operator
  2. Set annotations in .spec.provider.kubernetes.envoyDeployment.patch.value.spec.template.metadata.annotations of EnvoyProxy.
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: demo
    spec:
      infrastructure:
        parametersRef:
          group: gateway.envoyproxy.io
          kind: EnvoyProxy
          name: demo
      gatewayClassName: envoy-gateway-operator-cpaas-default
      listeners:
        - name: http
          port: 80
          protocol: HTTP
    ---
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    metadata:
      name: demo
    spec:
      provider:
        kubernetes:
          envoyDeployment:
            replicas: 1
            patch:
              type: StrategicMerge
              value:
                spec:
                  template:
                    metadata:
                      annotations:
                        endpoint-health-checker.io/enabled: 'true'
            container:
              imageRepository: registry.alauda.cn:60080/acp/envoyproxy/envoy
        type: Kubernetes

For Custom Deployment

set annotations in .spec.template.metadata.annotations of Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
      annotations:
        endpoint-health-checker.io/enabled: 'true'
    spec:
      containers:
        - name: container
          ports:
            - containerPort: 8080
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5

Pod-level readinessGates (Legacy)

Configure readinessGates in the pod spec for older versions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-legacy
  namespace: cpaas-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pod-legacy
  template:
    metadata:
      labels:
        app: pod-legacy
    spec:
      readinessGates:
        - conditionType: 'endpointHealthCheckSuccess'
      containers:
        - name: container
          image: your-image:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5

Note: The readinessGates configuration is from an older version. It's recommended to use the pod annotation endpoint-health-checker.io/enabled: 'true' for new deployments.

Uninstallation

To uninstall the Endpoint Health Checker:

  1. Navigate to Administrator > Marketplace > Cluster Plugins.

  2. Find the installed "Endpoint Health Checker" plugin.

  3. Click the options menu and select Uninstall.

  4. Confirm the uninstallation when prompted.