Spotlight

A tale of expired IAM credentials

Fabián Sellés Rosa

This case study shows how upgrading to Kubernetes 1.34 caused KIAM pods to fail because of changes to service account token expiration: the long-lived tokens used by legacy clients now expire after 24 hours instead of 90 days.

More articles →

Tools and utilities

  • OpenEBS

    OpenEBS is a modern Block-Mode storage platform, a Hyper-Converged Software Storage System, and a virtual NVMe-oF SAN (vSAN) Fabric that is natively integrated into Kubernetes' core.

  • AgentDiscover Scanner: AI agent detection

    AgentDiscover Scanner detects autonomous AI agents and Shadow AI in codebases using static analysis for Python and JavaScript, network monitoring for active LLM traffic, and Kubernetes runtime detection via Cilium Tetragon eBPF.

  • OpenKruise Agents: AI agent sandbox

    OpenKruise Agents manage AI agent workloads in Kubernetes, providing rapid resource provisioning via pooling, sandbox hibernation with checkpoint support, and user session management with efficient traffic routing.

  • Gonzo: TUI log analysis

    Gonzo lets you use a terminal UI to stream and analyse logs in real time, with support for OpenTelemetry (OTLP), AI-powered insights, heatmaps and advanced filtering.

  • Helm chartsnap: testing charts

helm-chartsnap is a tool that brings powerful UI-testing-style capabilities to Helm charts with minimal configuration, all of it within values.yaml files.

More projects →

Events starting soon

Discover more events on Kube Events →

GPU Containers as a Service

Running GPU workloads on Kubernetes sounds straightforward until you need to isolate multiple tenants on the same server. The moment you virtualize GPUs for security, you lose access to NVIDIA kernel drivers — and almost every tool in the ecosystem assumes those drivers exist.

Landon Clipp built a GPU-based Containers as a Service platform from scratch, solving each isolation layer — from kernel separation with Kata Containers + QEMU to NVLink fabric partitioning to network policies with Cilium/eBPF — and shares exactly what broke along the way.

In this interview:

  • Why standard NVIDIA tooling (GPU Operator) fails in multi-tenant setups, and how to use CDI with PCI topology scanning to make GPUs visible to Kubernetes without kernel drivers
  • How to partition the NVLink fabric between tenants using a trusted service VM running Fabric Manager, and why the physical PCIe wiring differs between Supermicro HGX and NVIDIA DGX systems
  • Why gVisor doesn't work for GPU workloads — NVIDIA's unstable ioctl ABI means Google has to update gVisor for every driver release, and they only support a handful of GPUs
  • What caused 8-GPU VMs to take 30+ minutes to boot, and the specific fixes (IOMMUFD, cold plugging, kernel upgrades) that brought it down to minutes
  • How Cilium network policies enforce tenant isolation at the Kubernetes identity level instead of fragile IP-based rules

Where Containers as a Service fits best: inference workloads where AI teams want to ship an OCI image without managing infrastructure or signing multi-million-dollar cluster contracts.

Learn from production

More case studies →

Matching jobs

  • Data Engineer with ClearPoint

    • Salary: US$72K to US$286K a year

    • Location: based in the office (and remote from home) in Auckland, NZ

    • Tech stack: Kubernetes, AWS, Python, SQL, Kafka

  • Data Engineer with Wynd Labs

    • Salary: $62.69K to $215.6K a year

    • Location: fully remote

    • Tech stack: Kubernetes, AWS, Docker, Java, Python, Scala, SQL, Snowflake, CloudFormation, Terraform

  • DevOps Engineer with BLP Digital AG

    • Salary: US$135K to US$275K a year

    • Location: based in the office in Zurich, CH

    • Tech stack: Kubernetes, AWS, Azure, GCP, Docker

  • DevOps Engineer with Bland

    • Salary: $120K to $200K a year

    • Location: based in the office in San Francisco, CA, USA

    • Tech stack: Kubernetes, AWS, GCP, Docker, Go, TypeScript, Terraform, Datadog

  • DevOps Engineer with Teraswitch

    • Salary: $49.5K to $539K a year

    • Location: based in the office in Pittsburgh, PA, USA

    • Tech stack: Kubernetes, Python, MySQL, Ceph, Ansible, Grafana, Prometheus

Discover more Kubernetes jobs on Kube Careers →

Subscribe to Learn Kubernetes Weekly

Trusted by 77K engineers. Delivered 176 issues and counting.


Build something

More tutorials →

Call for Papers closing soon

  1. DeveloperWeek New York 2026 (1 day left)

    The Call for Papers is open until 27 March 2026 (GMT-4). More info →

    • Location: New York, NY, USA

    • In-person conference organized by DeveloperWeek New York.

    • The conference starts on 10 June 2026.

    • Apply here

  2. Devopsdays Amsterdam (3 days left)

    The Call for Papers is open until 29 March 2026 (GMT-4). More info →

    • Location: Amsterdam, NL

    • In-person conference organized by Devopsdays.

    • The conference starts on 19 June 2026.

    • Apply here

  3. KubeCon + CloudNativeCon Japan 2026 (3 days left)

    The Call for Papers is open until 29 March 2026 (GMT-4). More info →

    • Location: Yokohama, JP

    • In-person conference organized by Linux Foundation.

    • The conference starts on 30 July 2026.

    • Apply here

  4. WeAreDevelopers World Congress 2026 North America (5 days left)

    The Call for Papers is open until 31 March 2026 (GMT-4). More info →

    • Location: San Jose, CA, USA

    • In-person conference organized by WeAreDevelopers.

    • The conference starts on 25 September 2026.

    • Apply here

  5. J On the Beach (5 days left)

    The Call for Papers is open until 31 March 2026 (GMT-4). More info →

    • Location: Malaga, ES and virtual

    • Online & in-person conference organized by Yay Yay Events.

    • The conference starts on 29 October 2026.

    • Apply here

  6. Øredev (5 days left)

    The Call for Papers is open until 31 March 2026 (GMT-4). More info →

    • Location: Malmö, SE

    • In-person conference organized by Øredev.

    • The conference starts on 4 November 2026.

    • Apply here

  7. Cloud Native Summit Munich 2026 (5 days left)

    The Call for Papers is open until 31 March 2026 (GMT-4). More info →

    • Location: Munich, DE

    • In-person conference organized by Cloud Native Summit Munich.

    • The conference starts on 30 June 2026.

    • Apply here

Thanks to our sponsors who make Kube Today possible

Find out more about being a sponsor →
