Monitoring Stack Overview
Welcome to the monitoring stack! This Kubernetes-native setup gives us visibility into our cluster's health, infrastructure performance, and application behavior.
We use a suite of open-source tools to handle metrics, logs, and traces, delivering a unified observability experience.
Architecture Overview
Here is a high-level look at how telemetry data flows through our ecosystem:
- Metrics: Pods and dedicated exporters expose metrics, which Prometheus scrapes. Grafana queries Prometheus for visualization, and Prometheus alerting rules send firing alerts to Alertmanager for notification.
- Traces: Applications send traces to the OpenTelemetry Collector (infra-otel), which routes and forwards them to Tempo (backed by S3) for querying in Grafana.
- Logs: Grafana Alloy collects application logs, either by selecting pods through their labels or by receiving logs pushed directly over the network, and ships them to Loki (backed by S3) for searching.
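To make the metrics path concrete, here is what a pod actually serves when Prometheus scrapes it. In practice you would use a Prometheus client library (for example the official `prometheus_client` package). This minimal sketch uses only the standard library so the text exposition format is visible. The metric name `app_http_requests_total`, its labels, and port 8000 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter keyed by (method, status).
REQUESTS_TOTAL: dict[tuple[str, str], int] = {}


def escape_label_value(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(counters: dict[tuple[str, str], int]) -> str:
    """Render the counter in Prometheus text format (version 0.0.4)."""
    lines = [
        "# HELP app_http_requests_total Total HTTP requests handled.",
        "# TYPE app_http_requests_total counter",
    ]
    for (method, status), count in sorted(counters.items()):
        labels = (
            f'method="{escape_label_value(method)}",'
            f'status="{escape_label_value(status)}"'
        )
        lines.append(f"app_http_requests_total{{{labels}}} {count}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered metrics on GET /metrics and 404 elsewhere."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever serving /metrics; port 8000 is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` inside the container exposes the endpoint. How Prometheus discovers the pod is covered in the Prometheus guide, so this sketch does not assume any particular discovery mechanism.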
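Applications would normally send traces through an OpenTelemetry SDK with an OTLP exporter. To show what actually reaches infra-otel, the following sketch builds an OTLP/HTTP JSON request for a single span using only the standard library. It assumes the Collector has an OTLP/HTTP receiver enabled on the conventional port 4318 at the standard `/v1/traces` path. The host `infra-otel`, the scope name, and the service and span names are hypothetical.

```python
import json
import os
import urllib.request

# Hypothetical in-cluster address; 4318 and /v1/traces are the OTLP/HTTP defaults.
OTEL_TRACES_URL = "http://infra-otel:4318/v1/traces"

SPAN_KIND_SERVER = 2  # OTLP SpanKind enum; OTLP/JSON encodes enums as integers


def build_span_payload(service_name: str, span_name: str,
                       start_ns: int, end_ns: int) -> dict:
    """Build a one-span OTLP/JSON export request."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},  # hypothetical scope name
                "spans": [{
                    # OTLP/JSON encodes trace and span IDs as hex strings.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": SPAN_KIND_SERVER,
                    # 64-bit nanosecond timestamps are sent as decimal strings.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def send_span(payload: dict, url: str = OTEL_TRACES_URL) -> int:
    """POST the payload to the Collector and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A caller would take `start = time.time_ns()` before the work and pass `time.time_ns()` as the end. The receivers and ports actually enabled on infra-otel are described in the OpenTelemetry Collector guide.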
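For the direct-push log path, a client posts batches in the Loki push API format. This sketch assumes Alloy runs a Loki-compatible push receiver (for example its `loki.source.api` component) that accepts requests at `/loki/api/v1/push`. The host, port, and label set are hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional

# Hypothetical Alloy address and port; only the /loki/api/v1/push path is standard.
ALLOY_PUSH_URL = "http://alloy.monitoring.svc:3500/loki/api/v1/push"


def build_push_payload(labels: dict[str, str], lines: list[str],
                       ts_ns: Optional[int] = None) -> dict:
    """Build a Loki push API body with one stream and one entry per line."""
    base = time.time_ns() if ts_ns is None else ts_ns
    values = [
        # Each entry is [nanosecond timestamp as a string, log line];
        # offsetting by index keeps entries in the batch ordered.
        [str(base + i), line] for i, line in enumerate(lines)
    ]
    return {"streams": [{"stream": labels, "values": values}]}


def push_logs(payload: dict, url: str = ALLOY_PUSH_URL) -> int:
    """POST the batch to the receiver and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Keep the label set small and low-cardinality (for example app and environment). Loki indexes labels, not log content, so per-request values belong in the log line itself.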

General Guidelines & Killer Features
- AI-Powered Dashboards (Grace Period): We have connected an LLM to Grafana, so you can ask it to generate and modify dashboards for you.
  Important: During this grace period, the AI is sandboxed. It can only create and modify dashboards inside the AI folder, which keeps our core dashboards safe.
- Alert Routing: Alertmanager sends notifications to Slack, classified chats, and email based on global configs managed by the infra team. Need a new route? Just ask! (Note: We are migrating to native Grafana Alerts soon.)
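When you request a new route, it helps to know how Alertmanager picks a receiver. An alert enters the root route and is matched against child routes in order. The first matching child handles it unless that child sets `continue`, and a route with no matching children handles the alert itself. The sketch below models only this matching logic with equality matchers. It ignores regex matchers, grouping, and timing, and the receiver names and labels are hypothetical, not our real configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Simplified Alertmanager route: equality matchers only."""
    receiver: str
    matchers: dict[str, str] = field(default_factory=dict)
    continue_matching: bool = False  # mirrors Alertmanager's `continue` flag
    routes: list["Route"] = field(default_factory=list)

    def matches(self, labels: dict[str, str]) -> bool:
        return all(labels.get(k) == v for k, v in self.matchers.items())


def route_alert(node: Route, labels: dict[str, str]) -> list[str]:
    """Return the receivers for an alert that has already matched `node`."""
    receivers: list[str] = []
    for child in node.routes:
        if child.matches(labels):
            receivers.extend(route_alert(child, labels))
            if not child.continue_matching:
                break  # first match wins unless `continue` is set
    # No matching child: the current node handles the alert itself.
    return receivers or [node.receiver]


# Hypothetical tree: critical alerts go to Slack and keep matching,
# payments alerts go to a team chat, everything else falls back to email.
ROOT = Route(
    receiver="default-email",
    routes=[
        Route("slack-critical", {"severity": "critical"}, continue_matching=True),
        Route("team-chat", {"team": "payments"}),
    ],
)
```

With this tree, an alert labelled `severity=critical, team=payments` reaches both `slack-critical` and `team-chat`, while `severity=warning` falls through to `default-email`. This is why a route request should state which labels your alerts carry.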
Service Documentation
Dive deeper into any piece of the stack using these guides:
- Grafana
- Prometheus
- Tempo
- Loki
- OpenTelemetry Collector
- Grafana Alloy
- Alertmanager
Exporters
Want to know exactly how we gather metrics from databases or the Kubernetes API? Check out the Exporters Directory!