Spotlight

Building a Production-Grade Observability Platform with SigNoz, ClickHouse, and OpenTelemetry

Shivee Gupta

This article explains how Dream11 built an in-house observability platform using SigNoz, ClickHouse, and OpenTelemetry to handle millions of metrics and traces across thousands of EC2 instances, saving millions in commercial tooling costs.

More articles →

Tools and utilities

  • Dynamo: distributed LLM inference

    NVIDIA Dynamo is a datacenter-scale distributed LLM inference framework supporting disaggregated prefill/decode, KV-aware routing, and dynamic GPU scheduling across vLLM, SGLang, and TensorRT-LLM.

  • Helm CEL Validator

    Helm CEL is a Helm plugin that uses the Common Expression Language (CEL) to validate chart values, offering a more expressive and flexible alternative to traditional JSON Schema validation.

  • Crossview: Crossplane UI

    Crossview is a React-based dashboard for managing and monitoring Crossplane resources in Kubernetes.

  • k8s-bootstrap: GitOps cluster bootstrap

    k8s-bootstrap generates GitOps-ready Kubernetes cluster configurations through a web UI where you select components such as ingress controllers, security tools, and observability platforms.

  • KAOS: K8s Agent Orchestration System

    KAOS is a Kubernetes-native framework for deploying and orchestrating AI agents with MCP tool integration, multi-agent coordination, hierarchical delegation, OpenAI-compatible endpoints, and a visual dashboard for monitoring and debugging.

More projects →

Events starting soon

Discover more events on Kube Events →

How We Cut Build Debugging Time by 75% with AI

Build failures in Kubernetes CI/CD pipelines are a silent productivity killer. Developers spend 45+ minutes scrolling through cryptic logs, often just hitting rerun and hoping for the best.

Ron Matsliah, a DevOps engineer at Next Insurance, built an AI-powered assistant that cut build debugging time by 75%. It is delivered not as a dashboard but directly in Slack, where developers already work.

In this episode:

  • Why combining deterministic rules with AI produces better results than letting an LLM guess alone
  • How correlating Kubernetes events with build logs catches spot instance terminations that produce misleading errors
  • Why integrating into existing workflows and building feedback loops from day one drove adoption
  • The prompt engineering lessons learned from testing with real production data instead of synthetic examples

The takeaway: simple rules plus rich context consistently outperform complex AI queries on their own.

Learn from production

More case studies →

Matching jobs

    • Chief Technology Officer with Hyphen Connect Limited

        • Salary: $135K to $440K a year

        • Location: remote from

        • Tech stack: Kubernetes, GCP, Docker, Go, JavaScript, Kotlin, Python, Rust, Swift, TypeScript

    • Data Engineer with 1upHealth

        • Salary: $54K to $319K a year

        • Location: remote from

        • Tech stack: Kubernetes, AWS, Docker, Python, Scala, SQL, DynamoDB, Snowflake, Kafka, Terraform

    • Data Engineer with 42dot

        • Salary: $98.01K to $465.3K a year

        • Location: based in the office in Pangyo, KR

        • Tech stack: Kubernetes, Python, Flink, Spark, Terraform

    • Data Engineer with AlayaCare

        • Salary: US$157.5K to US$302.5K a year

        • Location: based in the office (and remote from home) in Montréal, CA

        • Tech stack: Kubernetes, AWS, Docker, SQL, PHP, Python, Flink, Snowflake, Kafka, Spark

Discover more Kubernetes jobs on Kube Careers →

Subscribe to Learn Kubernetes Weekly

Trusted by 77K engineers. Delivered 175 issues and counting.


Build something

More tutorials →

Call for Papers closing soon

  1. Cloud Native Days Amsterdam (CFP closes in 2 days)

    The Call for Papers is open until 20 March 2026 (GMT-4).

    • Location: Amsterdam, NL

    • In-person conference organized by Cloud Native Amsterdam.

    • The conference starts on 22 May 2026.
  2. Cloud Native Telco Day Europe (CFP closes in 5 days)

    The Call for Papers is open until 23 March 2026 (GMT-4).

    • Location: Amsterdam, NL

    • In-person conference organized by CNCF.

    • The conference starts on 23 March 2026.
  3. Cloud Native AI + Kubeflow Day Europe (CFP closes in 5 days)

    The Call for Papers is open until 23 March 2026 (GMT-4).

    • Location: Amsterdam, NL

    • In-person conference organized by CNCF.

    • The conference starts on 23 March 2026.
  4. Cloud Native 2026 (CFP closes in 5 days)

    The Call for Papers is open until 23 March 2026 (GMT-4).

    • This is a virtual event.

    • Online conference organized by Conf42.

    • The conference starts on 23 April 2026.
  5. Kubernetes Community Days New York 2026 (CFP closes in 7 days)

    The Call for Papers is open until 25 March 2026 (GMT-4).

    • Location: New York, NY, USA

    • In-person conference organized by KCD New York.

    • The conference starts on 10 June 2026.
  6. Data on Kubernetes Day (CFP closes in 8 days)

    The Call for Papers is open until 26 March 2026 (GMT-4).

    • Location: Amsterdam, NL

    • In-person conference organized by CNCF.

    • The conference starts on 26 March 2026.
  7. DeveloperWeek New York 2026 (CFP closes in 9 days)

    The Call for Papers is open until 27 March 2026 (GMT-4).

    • Location: New York, NY, USA

    • In-person conference organized by DeveloperWeek New York.

    • The conference starts on 10 June 2026.

Thanks to our sponsors who make Kube Today possible

Find out more about being a sponsor →

More articles

Even more articles →