Spotlight

CloudNativePG - install and first test: transient failure

Franck Pachot

This tutorial shows how to install the CloudNativePG 1.28 operator and deploy a three-node PostgreSQL cluster with synchronous replication and quorum-based failover. It then tests recovery from a transient failure by pausing the primary container.

More articles →

Tools and utilities

  • Helm exporter

    Helm-exporter exports Helm releases, charts, and version statistics in the Prometheus format.

  • make-argocd-fly: Kubernetes manifest generator

    make-argocd-fly is a tool that simplifies the generation of Kubernetes manifests for deployment in complex, multi-cluster, and multi-application environments and provides native integration with ArgoCD for streamlined deployment.

  • DR-Syncer – CLI & Controller for Kubernetes Disaster Recovery

    DR-Syncer is a tool for disaster recovery synchronization between Kubernetes clusters, offering both a controller-based mode and a CLI mode to perform Stage, Cutover, and Failback operations.

  • Kloudlite: RemoteLocal Environments

    Kloudlite is an open-source platform designed to provide seamless and secure development environments for building distributed applications.

  • AIBrix: GenAI inference

    AIBrix is a Kubernetes-native GenAI inference infrastructure toolkit from the vLLM project, with LLM-aware routing, distributed KV cache, LoRA management, and an app-tailored autoscaler for vLLM workloads.

More projects →

Events starting soon

Discover more events on Kube Events →

Intelligent Kubernetes Load Balancing

You're running gRPC services in Kubernetes, load balancing looks fine on the dashboard — but some pods are burning at 80% CPU while others sit idle, and adding more replicas only partially helps.

Rohit Agrawal, a Staff Software Engineer on the traffic platform team at Databricks, explains why this happens and how his team replaced Kubernetes's default networking with a proxy-less, client-side load-balancing system built on the xDS protocol.

In this episode:

  • Why kube-proxy's Layer 4 routing breaks down under high-throughput gRPC: it picks a backend once per TCP connection, not once per request
  • How Databricks built an Endpoint Discovery Service (EDS) that watches Kubernetes directly and streams real-time pod metadata to every client
  • How zone-aware spillover cut cross-availability-zone costs without sacrificing availability
  • Why CPU-based routing failed (monitoring lag creates oscillation) and what signals to use instead

The system has been running in production for three years across hundreds of services, handling millions of requests.
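The first point in the list above, choosing a backend per connection rather than per request, can be illustrated with a small simulation. This is a minimal sketch and not Databricks' implementation: the pod names, the request counts, and the round-robin picker are all assumptions made for illustration. It compares the load each pod receives when a long-lived connection is pinned to one backend with the load when every request is balanced individually, as a client-side balancer could do.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pods and traffic: three long-lived gRPC connections,
# one of which carries most of the requests.
BACKENDS = ["pod-a", "pod-b", "pod-c"]
REQUESTS_PER_CONNECTION = [900, 50, 50]


def per_connection(backends, requests_per_connection):
    """Layer 4 style: a backend is picked once, when the connection opens."""
    load = Counter({b: 0 for b in backends})
    picker = cycle(backends)
    for requests in requests_per_connection:
        backend = next(picker)      # pinned for the connection's lifetime
        load[backend] += requests   # every request on it lands on that pod
    return dict(load)


def per_request(backends, requests_per_connection):
    """Layer 7 / client-side style: a backend is picked for every request."""
    load = Counter({b: 0 for b in backends})
    picker = cycle(backends)
    for requests in requests_per_connection:
        for _ in range(requests):
            load[next(picker)] += 1
    return dict(load)


if __name__ == "__main__":
    print(per_connection(BACKENDS, REQUESTS_PER_CONNECTION))
    # {'pod-a': 900, 'pod-b': 50, 'pod-c': 50}
    print(per_request(BACKENDS, REQUESTS_PER_CONNECTION))
    # {'pod-a': 334, 'pod-b': 333, 'pod-c': 333}
```

With per-connection balancing the busiest connection drags one pod far above the others, and adding replicas does not help until connections are re-established. Per-request balancing spreads the same 1,000 requests almost evenly.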

Learn from production

More case studies →

Matching jobs

    • DevOps Engineer with Rhoda ai

    • Salary: $67.5K to $539K a year

    • Location: based in the office in Palo Alto, CA, USA

    • Tech stack: Kubernetes, AWS, Azure, GCP, Go, Python, SQL, Flink, Snowflake, Kafka

    • DevOps Engineer with Smile Digital Health

    • Salary: $67.5K to $539K a year

    • Location: remote from

    • Tech stack: Kubernetes, AWS, Azure, GCP, Java, Python, Scala, Airflow, Spark, Terraform

    • Machine Learning Engineer with Rhoda ai

    • Salary: $57.6K to $550K a year

    • Location: based in the office in Palo Alto, CA, USA

    • Tech stack: Kubernetes, AWS, GCP, Python

    • Machine Learning Engineer with Rhoda ai

    • Salary: $135K to $405.35K a year

    • Location: based in the office in Palo Alto, CA, USA

    • Tech stack: Kubernetes, AWS, Azure, GCP, Python, Shell

    • Machine Learning Engineer with SpAItial

    • Salary: US$161.1K to US$321.2K a year

    • Location: based in the office in London, GB

    • Tech stack: Kubernetes, AWS, Azure, GCP, On-premise, Docker, Python, Terraform, GitHub Actions, CircleCI

Discover more Kubernetes jobs on Kube Careers →

Subscribe to Learn Kubernetes Weekly

Trusted by 77K engineers. Delivered 179 issues and counting.


Build something

More tutorials →

Call for Papers closing soon

  1. Open Conf 2026 (closes in 2 days)

    The call for papers is open until 19 April 2026 (GMT-4). More info →
    • Location: Athens, GR

    • In-person conference organized by Open Conf.

    • The conference starts on 21 November 2026.

    • Apply here
  2. SREday Munich 2026 (closes in 4 days)

    The call for papers is open until 21 April 2026 (GMT-4). More info →
    • Location: Munich, DE

    • In-person conference organized by SREday.

    • The conference starts on 15 May 2026.

    • Apply here
  3. CLC26 (closes in 4 days)

    The call for papers is open until 21 April 2026 (GMT-4). More info →
    • Location: Mannheim, DE

    • In-person conference organized by Rheinwerk Verlag.

    • The conference starts on 11 November 2026.

    • Apply here
  4. Tech Fuse Des Moines 2026 (closes in 13 days)

    The call for papers is open until 30 April 2026 (GMT-4). More info →
    • Location: Des Moines, IA, USA

    • In-person conference organized by Tech Fuse DSM.

    • The conference starts on 16 October 2026.

    • Apply here
  5. Devopsdays Graz (closes in 13 days)

    The call for papers is open until 30 April 2026 (GMT-4). More info →
    • Location: Graz, AT

    • In-person conference organized by Devopsdays.

    • The conference starts on 4 September 2026.

    • Apply here
  6. bit summit 2026 (closes in 13 days)

    The call for papers is open until 30 April 2026 (GMT-4). More info →
    • Location: Hamburg, DE

    • In-person conference organized by bit summit.

    • The conference starts on 23 September 2026.

    • Apply here
  7. IT-Tage (closes in 13 days)

    The call for papers is open until 30 April 2026 (GMT-4). More info →
    • Location: Frankfurt, DE

    • In-person conference organized by Alkmene Verlag.

    • The conference starts on 10 December 2026.

    • Apply here

Thanks to our sponsors who make Kube Today possible

Find out more about being a sponsor →

More articles

Even more articles →