Spotlight

Hidden Kubernetes Bad Practices Learned the Hard Way During Incidents

Mariem Sboui

This article shares hidden Kubernetes bad practices discovered through production incidents. It covers troubleshooting challenges, configuration mistakes, and operational lessons learned by a Site Reliability Engineer.

More articles →

Tools and utilities

  • Karpenter Optimizer: cost optimization

    karpenter-optimizer analyzes your Kubernetes cluster usage in real time and gives you AI-powered recommendations to reduce AWS EC2 costs.

  • Linnix: eBPF Observability & AI Incident Detection for Kubernetes

    Linnix is an eBPF + PSI-powered Kubernetes observability agent written in Rust that identifies which pod is actually stalling your services, not just consuming CPU.

  • kubesdk: Kubernetes SDK

    kubesdk is a fully typed, async-first Python Kubernetes client with a CLI that generates models from any live cluster or CRD, achieving over 1000 RPS on large, multi-cluster workloads.

  • Sgl-project/rbg: AI inference orchestrator

    RoleBasedGroup is a Kubernetes API written in Go for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery, treating inference services as role-based groups rather than isolated workloads.

  • GoKubeDownscaler: workload autoscaler

    GoKubeDownscaler is a horizontal autoscaler for Kubernetes workloads, written in Go, that automatically scales down Deployments, StatefulSets, and other resources on time-based schedules to save costs.

More projects →

Events starting soon

Discover more events on Kube Events →

Migrating to Karpenter: Fun Stories

Running multiple Kubernetes clusters on AWS with the cluster autoscaler? Every four months, you face the same grind: upgrading Kubernetes versions, recreating auto scaling groups, and hoping instance type changes stick.

Adhi Sutandi, DevOps Engineer at Beekeeper by LumApps, shares how his team migrated from the cluster autoscaler to Karpenter across eight EKS clusters — and the hard lessons they learned along the way.

In this episode:

  • Why AWS auto scaling groups are immutable and how that creates upgrade bottlenecks at scale
  • How the latest AMI tag accidentally turned less critical clusters into chaos engineering environments, dropping SLOs before anyone realized Karpenter was the cause
  • Why pre-stop sleep hooks solved pod restartability problems that Quarkus's built-in graceful shutdown couldn't
  • The case for pod disruption budgets over Karpenter annotations when protecting critical workloads during node rotations
  • How Karpenter's implicit 10% disruption budget caught the team off guard — and the explicit configuration that fixed it
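Two of the points above, pre-stop sleep hooks and pod disruption budgets, can be sketched as plain Kubernetes manifests. The Python below builds them as dictionaries and prints them as JSON. The helper names, the `app` label, the 15-second sleep, and the minimum of two available pods are illustrative assumptions, not the configuration described in the episode.

```python
import json


def with_prestop_sleep(container: dict, seconds: int) -> dict:
    """Return a copy of a container spec with a preStop hook that sleeps.

    The sleep delays SIGTERM so in-flight requests can drain while the pod
    is removed from Service endpoints. The duration is an assumption.
    """
    patched = dict(container)
    patched["lifecycle"] = {
        "preStop": {"exec": {"command": ["sleep", str(seconds)]}}
    }
    return patched


def pod_disruption_budget(name: str, app_label: str, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget for pods labelled app=<app_label>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    # Hypothetical container and workload names, for illustration only.
    container = with_prestop_sleep({"name": "api", "image": "example/api:1.0"}, 15)
    pdb = pod_disruption_budget("api-pdb", "api", 2)
    print(json.dumps(container, indent=2))
    print(json.dumps(pdb, indent=2))
```

The preStop hook runs inside the pod's termination grace period, so `terminationGracePeriodSeconds` needs to be longer than the sleep plus the application's own shutdown time. The container image must also ship a `sleep` binary for the exec form to work.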

Learn from production

More case studies →

Matching jobs

  • DevOps Engineer with Planet

    Salary: $14.28M to $20.32M a year
    Location: remote from
    Tech stack: Kubernetes, GCP, SQL, Python, Javascript, Go, Shell, Terraform, Grafana

  • DevOps Engineer with Precision Medicine Group

    Salary: $147.6K to $324.28K a year
    Location: fully remote
    Tech stack: Kubernetes, AWS, Helm, Docker, Python, Shell, Terraform, Gitlab, AWS CloudWatch

  • DevSecOps Engineer with Pinterest

    Salary: $155.58K to $320.32K a year
    Location: remote from
    Tech stack: Kubernetes, AWS, Go, Python, C++, Typescript, Terraform, Puppet

  • DevSecOps Engineer with Rise8

    Salary: $163.12K to $203.9K a year
    Location: remote from
    Tech stack: Kubernetes, Shell, Python, Powershell, Terraform, Jenkins, Ansible, Puppet, Chef

  • DevSecOps Engineer with Schonfeld

    Salary: $120K to $135K a year
    Location: fully remote
    Tech stack: Kubernetes, Python, Powershell

Discover more Kubernetes jobs on Kube Careers →

Subscribe to Learn Kubernetes Weekly

Trusted by 77K engineers. Delivered 173 issues and counting.

Build something

More tutorials →

Call for Papers closing soon

  1. SREday San Francisco 2026 (closes in 7 days)

    The Call for Papers is open until 16 March 2026 (GMT-4). More info →

    • Location: San Francisco, CA, USA
    • In-person conference organized by SREday.
    • The conference starts on 15 April 2026.
    • Apply here

  2. SREday Seattle 2026 (closes in 7 days)

    The Call for Papers is open until 16 March 2026 (GMT-4). More info →

    • Location: Seattle, WA, USA
    • In-person conference organized by SREday.
    • The conference starts on 20 April 2026.
    • Apply here

  3. Cloud Native Days Amsterdam (closes in 11 days)

    The Call for Papers is open until 20 March 2026 (GMT-4). More info →

    • Location: Amsterdam, NL
    • In-person conference organized by Cloud Native Amsterdam.
    • The conference starts on 22 May 2026.
    • Apply here

  4. Cloud Native Telco Day Europe (closes in 13 days)

    The Call for Papers is open until 23 March 2026 (GMT-4). More info →

    • Location: Amsterdam, NL
    • In-person conference organized by CNCF.
    • The conference starts on 23 March 2026.
    • Apply here

  5. Cloud Native AI + Kubeflow Day Europe (closes in 13 days)

    The Call for Papers is open until 23 March 2026 (GMT-4). More info →

    • Location: Amsterdam, NL
    • In-person conference organized by CNCF.
    • The conference starts on 23 March 2026.
    • Apply here

  6. Cloud Native 2026 (closes in 14 days)

    The Call for Papers is open until 23 March 2026 (GMT-4). More info →

    • This is a virtual event
    • Online conference organized by Conf42.
    • The conference starts on 23 April 2026.
    • Apply here

  7. Data on Kubernetes Day (closes in 16 days)

    The Call for Papers is open until 26 March 2026 (GMT-4). More info →

    • Location: Amsterdam, NL
    • In-person conference organized by CNCF.
    • The conference starts on 26 March 2026.
    • Apply here

Thanks to our sponsors who make Kube Today possible

Find out more about being a sponsor →

More articles

Even more articles →