Spotlight
M Sandy H
This tutorial teaches how to build a local observability stack for AWS EKS logs using Stern for multi-pod tailing, Fluent Bit for log processing with multiline parsing, and Elasticsearch with Kibana for searchable visualization via Docker Compose.
Pavel Buchnev
This article teaches how to build self-evolving AI systems using Kubernetes, Temporal workflows, and automated deployment pipelines, enabling AI agents to detect errors, fix code, and redeploy services without manual intervention.
Harichandana Kotha
This tutorial shows how to serve open source LLMs on Red Hat OpenShift AI with KServe, vLLM, Argo CD, and IBM Fusion using a fully declarative GitOps workflow.
Asmaa Elalfy
This tutorial shows how to build a private EKS cluster with zero public API exposure using Terraform.
It also covers self-hosted OpenVPN as a VPN gateway, NAT masquerading with iptables, kube-prometheus-stack exposed via an internal load balancer, and Route 53.
Tools and utilities
EgressGateway provides stable egress IP addresses for pods accessing external services.
git-change-operator is a Kubernetes operator that enables automated Git operations from within clusters through GitCommit and PullRequest custom resources.
Orphan Resource Collector (ORC) is an open-source tool for detecting and managing orphaned resources in Kubernetes clusters.
Goldpinger is a monitoring tool that runs as a DaemonSet and makes calls between pod instances to test connectivity.
Kube Binpacking Exporter exposes Prometheus metrics that show how efficiently your cluster packs requested CPU and memory across nodes, groups, and DaemonSet overhead so you can measure fragmentation over time.
Events starting soon
July 4, 2026
Location: Kraków, PL
This event requires an entrance fee.
July 9, 2026
This is a virtual event
This is a free event.
July 9, 2026
Location: Freiburg im Breisgau, DE
This is a free event.
July 9, 2026
Location: Dublin, CA, USA
This is a free event.
July 15, 2026
Location: Sydney, AU and virtual
This is a free event.
July 15, 2026
This is a virtual event
This is a free event.
Learn from production
In this blog post, the author tracks down persistent sandbox-cleanup errors in a Kubernetes cluster, finds that zero-length CNI cache files cause the problem, and shows how manually deleting those files cleared the error.
This blog post tells how the Render team:
Jack Lindamood
This case study shows how the OOM Killer terminated a critical network daemon on Kubernetes nodes, causing a network outage.
It covers debugging via serial console and implementing memory reservations to prevent system-critical process termination.
Nick Roan
This case study shows how a single RAG chunk size change collapsed vLLM prefix-cache hit rate from 85% to 4%, triggering an 80% GPU replica increase while latency stayed flat.
It also includes the fix: adding a two-phase cache replay gate in CI.
Matching jobs
Data Engineer with NinjaTrader
Salary: $100K to $150K a year
Location: remote from
Tech stack: Kubernetes, AWS, GCP, Python, SQL, Flink, Kafka, Airflow, Terraform, Datadog
DevSecOps Engineer with Blueprint
Salary: $95K to $105K a year
Location: based in the office (and remote from home) in Camp Springs, MD, USA
Tech stack: Kubernetes, AWS, OpenShift, ArgoCD, Docker, Shell, Terraform, GitHub Actions, Jenkins
Engineering Manager with New Relic
Salary: $112.5K to $473K a year
Location: remote from
Tech stack: Kubernetes, OpenShift, Rancher, Helm, Go, Python, Prometheus
Platform Engineer with Nubank
Salary: $13.82K to $259.6K a year
Location: remote from
Tech stack: Kubernetes, AWS, Kafka
Solution Engineer with New Relic
Salary: US$94.5K to US$253K a year
Location: based in the office in Dublin, IE
Tech stack: Kubernetes, AWS, Azure, Java, Python, Ruby, SQL
Build something
Vitalii Ruzhnikov
This tutorial shows how to expose a self-hosted Kubernetes cluster on Proxmox using a dual HAProxy setup: one on the host as an edge gateway and one as an in-cluster ingress controller.
Serhan Ekici
This tutorial shows how to deploy OpenClaw on Kubernetes with a Helm chart and ArgoCD, using persistent storage, config modes, secrets handling, and network policies to reduce the blast radius of an AI agent.
This tutorial teaches how to extend EKS with hybrid nodes using IAM Roles Anywhere and HashiCorp Vault for secure authentication of on-premises or edge workloads.
DV Engineering
This tutorial teaches how to collect Prometheus metrics from Kubernetes clusters and securely route them to remote Prometheus instances using Vector with mTLS encryption.
More articles
Florian Lettner
This article explains how building a k3s media server with Claude Code exposed both the speed and the limits of AI-first engineering across GitOps, observability, storage tuning, and Kubernetes debugging.
Sergey Goncharov
This case study walks through a real debugging story on EKS Fargate where a missing DHCP option set caused silent DNS failures and left pods stuck in Pending, and shows how to find and fix it.
Aslanov Javid
This article shows why Grafana becomes slow on Kubernetes when multiple replicas share SQLite over EFS, and explains why a single replica on block storage or a real external database is the correct fix.
Btech Engineering
This article explains how a team deployed Ansible AWX on K3s and extended it for OpenStack inventory, dynamic SSH users, execution nodes, custom execution environments, and air-gapped installs.