Monitoring with Grafana, Prometheus & Loki


For the upcoming WTF-Model release on Steam, I needed observability and monitoring for my development infrastructure.
The setup I built gives me full visibility across my network, devices, web services, and GitHub CI/CD automation, which is a critical requirement during the WTF-Model’s deployment and review cycles.

The setup provides real-time (and color-coded) insight into:

  • Internet connectivity and latency
  • System resource usage across devices
  • Website and API health
  • GitHub Actions runner availability for builds and deployments

Everything runs in a self-hosted Docker Compose environment powered by Prometheus and Grafana.

Below you can see the final result — the dashboard displayed in a three-quarter split view alongside my daily calendar on the iPad mini next to my main workstation, serving as a real-time status and information center.

Grafana dashboard displayed on iPad mini beside workstation for real-time monitoring

Architecture

I deployed the stack inside a dedicated Docker network, with containers mapped to a consecutive host port range (18000–18xxx) for a clean, predictable layout. I chose Docker Compose for its simplicity and efficiency in single-host environments. I spent about an hour trying to run the stack with Helm on Kubernetes, but it was clearly overkill, so I settled on the less complex approach.
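
A consecutive host port range can be derived mechanically instead of maintained by hand. The sketch below assumes the usual container ports of the official images (Grafana 3000, Prometheus 9090, Loki 3100, Promtail 9080, Blackbox exporter 9115) and a hypothetical service order. It generates the `host:container` strings you would put under `ports:` in `docker-compose.yml`.

```python
# Container-side ports are the images' usual defaults; the service order is hypothetical.
CONTAINER_PORTS = {
    "grafana": 3000,
    "prometheus": 9090,
    "loki": 3100,
    "promtail": 9080,
    "blackbox-exporter": 9115,
}


def port_mappings(container_ports: dict, base: int = 18000, limit: int = 18999) -> dict:
    """Assign each service the next host port in [base, limit], in insertion order."""
    mappings = {}
    for offset, (service, container_port) in enumerate(container_ports.items()):
        host_port = base + offset
        if host_port > limit:
            raise ValueError(f"host port range exhausted at {service!r}")
        mappings[service] = f"{host_port}:{container_port}"
    return mappings


if __name__ == "__main__":
    for service, mapping in port_mappings(CONTAINER_PORTS).items():
        print(f"{service}: {mapping}")
```

Because ports follow insertion order, adding a service in the middle would shift every later port. Appending new services at the end, or pinning a port once it is assigned, keeps bookmarks and data source URLs stable.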

Secrets to access devices and scrape data are injected via a .env file, keeping the setup portable and secure.
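
Docker Compose reads a `.env` file in the project directory for variable substitution. A bootstrap step can also validate that file up front, so a missing credential fails fast instead of surfacing later as a silent scrape error. A minimal sketch, assuming simple `KEY=VALUE` lines and hypothetical key names (Compose's own parser accepts more syntax, such as interpolation):

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical key names; the actual variables in the .env file are not listed in this post.
REQUIRED_KEYS = ("FRITZ_USER", "FRITZ_PASSWORD", "GITHUB_TOKEN")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_keys(env: Dict[str, str], required=REQUIRED_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


def load_env(path: str = ".env") -> Dict[str, str]:
    return parse_env(Path(path).read_text(encoding="utf-8"))
```

A bootstrap script would call `missing_keys(load_env())` and abort with a readable message if the list is not empty.
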
Because I develop on Windows but deploy to production on Linux, I automated deployment and teardown with functionally identical Windows and Linux scripts:

  • bootstrap.sh / bootstrap.ps1 – builds images, initializes volumes, and launches containers
  • cleanup.sh / cleanup.ps1 – stops services, removes containers, and clears caches

Running container overview and dual deploy script output for Linux and Windows environments
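
Keeping a `.sh` and a `.ps1` script functionally identical is easiest when both run the same short sequence of Docker commands. The sketch below expresses such a sequence once in Python. The exact commands are an assumption based on the steps listed above, not the contents of the actual scripts. With `dry_run=True` it only prints the commands, which is handy for checking the sequence without touching Docker.

```python
import subprocess

# Assumed command sequences mirroring the steps described above; the real scripts may differ.
ACTIONS = {
    "bootstrap": [
        ["docker", "compose", "build"],     # build the custom exporter images
        ["docker", "compose", "up", "-d"],  # create network/volumes, start containers
    ],
    "cleanup": [
        ["docker", "compose", "down", "--remove-orphans"],  # stop and remove containers
        ["docker", "builder", "prune", "--force"],          # clear the build cache
    ],
}


def run(action: str, dry_run: bool = False) -> list:
    """Run each command for `action` in order, stopping at the first failure."""
    commands = ACTIONS[action]
    for command in commands:
        print("+", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return commands
```

Usage would be `run("bootstrap")` or `run("cleanup", dry_run=True)`. Note that `docker compose down` without `--volumes` keeps named volumes, so Prometheus and Grafana data survive a teardown.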

Exporters and Metrics

The monitoring relies on a combination of Prometheus exporters — both standard and custom-coded with Python — each containerized and integrated through docker-compose.yml.

Among them:

  • A customized FRITZ!Box exporter providing DSL, PPPoE, and internet connectivity status.
  • A Pi-hole exporter for DNS-level ad-blocking metrics.
  • The Blackbox exporter, performing ICMP and HTTP checks across LAN devices and websites via custom HTML health endpoints.
  • A Python-based IP exporter that determines the public IP, VPN latency, and geolocation using multiple providers (ipapi, ipwhois, ifconfig, ipsb, ipinfo).
  • The Windows and Linux exporters, installed on node.lan, imac.lan, server.lan, and nuc.lan, tracking CPU, RAM, disk, and GPU utilization.
  • A GitHub exporter monitoring the self-hosted GitHub Actions runners used for WTF-Model deployments and reporting whether each is online, idle, or busy.
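
To illustrate the shape of a small custom exporter, here is a stdlib-only sketch of a runner-state exporter. It uses GitHub's REST endpoint for listing a repository's self-hosted runners, whose entries carry a `status` (`online`/`offline`) and a `busy` flag; organization-level runners use `/orgs/{org}/actions/runners` instead. The metric name `github_runner_state`, the port, and the one-series-per-state layout are assumptions, not the exporter used in this setup.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RUNNERS_URL = "https://api.github.com/repos/{owner}/{repo}/actions/runners"
STATES = ("offline", "idle", "busy")


def fetch_runners(owner: str, repo: str, token: str) -> list:
    """Fetch the repository's self-hosted runners from the GitHub REST API."""
    request = urllib.request.Request(
        RUNNERS_URL.format(owner=owner, repo=repo),
        headers={"Accept": "application/vnd.github+json", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["runners"]


def runner_state(runner: dict) -> str:
    """Collapse the API's status/busy fields into offline, idle, or busy."""
    if runner.get("status") != "online":
        return "offline"
    return "busy" if runner.get("busy") else "idle"


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(runners: list) -> str:
    """One gauge series per runner and state; the current state has value 1."""
    lines = [
        "# HELP github_runner_state Self-hosted runner state (1 = current state).",
        "# TYPE github_runner_state gauge",
    ]
    for runner in runners:
        current = runner_state(runner)
        name = escape_label(runner["name"])
        for state in STATES:
            lines.append(
                f'github_runner_state{{runner="{name}",state="{state}"}} {int(state == current)}'
            )
    return "\n".join(lines) + "\n"


def serve(owner: str, repo: str, token: str, port: int = 8000) -> None:
    """Serve /metrics, querying GitHub on every scrape (mind the API rate limit)."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(fetch_runners(owner, repo, token)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Emitting one series per state with a 0/1 value makes color-coded Grafana panels simple: a stat panel filtered to `state="busy"` turns a threshold color whenever a runner is building.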

Prometheus scrapes all these sources at individually tuned intervals to keep updates near real time while staying efficient:
ICMP probes run every second, lightweight metrics every few seconds, and FRITZ!Box data every 10 seconds, the longer interval being due solely to the device's API limitations.
I believe this balance provides fast, continuous visibility without adding unnecessary load.
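
The load side of this trade-off is easy to quantify as scrapes per minute per job. In the sketch below, the 1 s and 10 s intervals come from the text. The 5 s value stands in for "every few seconds", and the job names, timeouts, and target counts are hypothetical. It also checks one rule Prometheus enforces: `scrape_timeout` must not exceed `scrape_interval`.

```python
import re

# Hypothetical jobs: (scrape_interval, scrape_timeout, number_of_targets).
JOBS = {
    "blackbox_icmp": ("1s", "900ms", 8),
    "node": ("5s", "4s", 4),
    "fritzbox": ("10s", "8s", 1),
}

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> float:
    """Parse a single-unit Prometheus-style duration such as '900ms' or '10s'."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def scrapes_per_minute(jobs: dict) -> dict:
    """Scrapes per minute for each job; rejects timeouts longer than the interval."""
    budget = {}
    for job, (interval, timeout, targets) in jobs.items():
        interval_s, timeout_s = parse_duration(interval), parse_duration(timeout)
        if timeout_s > interval_s:
            raise ValueError(f"{job}: timeout {timeout} exceeds interval {interval}")
        budget[job] = targets * 60 / interval_s
    return budget


if __name__ == "__main__":
    for job, rate in scrapes_per_minute(JOBS).items():
        print(f"{job}: {rate:.0f} scrapes/min")
```

Under these assumptions the ICMP job dominates with 480 scrapes per minute versus 6 for the FRITZ!Box. That is acceptable because an ICMP probe is cheap, while the slower, rate-limited device API gets the longest interval.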

Prometheus and Docker Compose configuration in VSCode showing exporter setup

Dashboard Layout

I organized the Grafana dashboard into two main columns for quick, high-level insights:

Left Column – Connection

  • FRITZ!Box DSL / PPPoE / Internet states
  • VPN connection, public IP, and latency
  • WAN traffic rates and daily totals
  • DNS latency (internal vs external)
  • Website health checks and Pi-hole statistics

Right Column – Devices

  • CPU, RAM, Disk, and GPU metrics for each major device
  • Online status for core network components
  • GitHub Actions runner states used for WTF-Model builds and deployments

Annotated Grafana dashboard layout highlighting connection and device monitoring sections
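
The two-column layout and the color coding map onto Grafana's dashboard JSON model. Panels are positioned with `gridPos` on a 24-column grid, and colors come from `thresholds` steps, where the base step has a `null` value. The sketch below generates such a skeleton, assuming stat panels and hypothetical 70/90 thresholds. Queries (`targets`) and data sources are omitted, so it is a layout starting point for import, not a working dashboard.

```python
import json

GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid

# Panel titles follow the layout described above; everything else is illustrative.
LEFT = ["FRITZ!Box DSL / PPPoE / Internet", "VPN, public IP & latency", "WAN traffic",
        "DNS latency", "Website health & Pi-hole"]
RIGHT = ["CPU / RAM / Disk / GPU", "Core devices online", "GitHub Actions runners"]


def thresholds(warn: float, crit: float) -> dict:
    """Green/orange/red steps; the base step uses None (serialized as null)."""
    return {
        "mode": "absolute",
        "steps": [
            {"color": "green", "value": None},
            {"color": "orange", "value": warn},
            {"color": "red", "value": crit},
        ],
    }


def column(titles: list, x: int, width: int, height: int = 4) -> list:
    """Stack stat panels vertically in one column starting at grid column x."""
    return [
        {
            "type": "stat",
            "title": title,
            "gridPos": {"x": x, "y": row * height, "w": width, "h": height},
            "fieldConfig": {"defaults": {"thresholds": thresholds(70, 90)}, "overrides": []},
        }
        for row, title in enumerate(titles)
    ]


def build_dashboard() -> dict:
    half = GRID_COLUMNS // 2
    panels = column(LEFT, x=0, width=half) + column(RIGHT, x=half, width=half)
    for panel_id, panel in enumerate(panels, start=1):
        panel["id"] = panel_id
    return {"title": "Infrastructure overview", "panels": panels}


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

Generating the skeleton keeps both columns aligned on the grid when panels are added; the actual PromQL queries and per-panel thresholds would still be set in Grafana.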

Log Aggregation with Loki & Promtail

To extend observability beyond metrics, I later integrated Loki and Promtail into the stack. Loki acts as the log aggregation service: it stores the log streams shipped to it and exposes them to Grafana through its native query language, LogQL.

While Grafana can query Loki directly for real-time log views, I chose to add Promtail as the collection agent in front of Loki: besides shipping the container logs to Loki, Promtail parses the log lines and turns them into Prometheus-compatible metrics. This lets the dashboard show aggregated info, debug, warn, and error counts across all containers over time, making anomalies easy to spot.
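
Conceptually, a Promtail `metrics` pipeline stage extracts a value such as the log level from each line and increments a counter per label set. The Python below only emulates that idea for illustration; it is neither Promtail code nor Promtail configuration. The container names, the level regex, and the metric name `log_lines_total` are assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# First level keyword in the line wins; structured (logfmt/JSON) parsing is more robust.
LEVEL_RE = re.compile(r"\b(debug|info|warn(?:ing)?|error)\b", re.IGNORECASE)


def classify(line: str) -> Optional[str]:
    """Return debug/info/warn/error for a log line, or None if no level is found."""
    match = LEVEL_RE.search(line)
    if match is None:
        return None
    level = match.group(1).lower()
    return "warn" if level.startswith("warn") else level


def count_levels(logs: Dict[str, Iterable[str]]) -> Counter:
    """Count log lines per (container, level), like a counter incremented per line."""
    counts: Counter = Counter()
    for container, lines in logs.items():
        for line in lines:
            level = classify(line)
            if level is not None:
                counts[(container, level)] += 1
    return counts


def render_counters(counts: Counter) -> str:
    """Render the counts in the Prometheus text exposition format."""
    lines = ["# TYPE log_lines_total counter"]
    for (container, level), value in sorted(counts.items()):
        lines.append(f'log_lines_total{{container="{container}",level="{level}"}} {value}')
    return "\n".join(lines) + "\n"
```

Loki can also compute similar counts at query time with LogQL metric queries such as `count_over_time`. Deriving counters while the logs are collected instead keeps the dashboard queries cheap.
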
I also collect basic Docker metrics to determine container health, so the dashboard always gives a clear indication of its own reliability.

Promtail-generated metrics visualized in Grafana
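
One possible way to derive such a health indicator, which is not necessarily the approach used here, is to read each container's status from `docker ps` and expose it as a gauge. Containers whose image defines a `HEALTHCHECK` report `(healthy)`, `(unhealthy)`, or `(health: starting)` in their status string. The metric name `container_healthy` is hypothetical, and running this inside a container would require access to the Docker socket.

```python
import json
import subprocess


def health_from_status(status: str) -> str:
    """Map a `docker ps` Status string to down/starting/unhealthy/healthy/running."""
    text = status.lower()
    if not text.startswith("up"):
        return "down"  # e.g. "Exited (1) 2 minutes ago" or "Created"
    if "(unhealthy)" in text:
        return "unhealthy"
    if "health: starting" in text:
        return "starting"
    if "(healthy)" in text:
        return "healthy"
    return "running"  # up, but the image defines no HEALTHCHECK


def list_containers() -> list:
    """One JSON object per container, as printed by `docker ps --format '{{json .}}'`."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return [json.loads(line) for line in result.stdout.splitlines() if line.strip()]


def render_health(containers: list) -> str:
    """Gauge per container: 1 if healthy or running without a healthcheck, else 0."""
    lines = ["# TYPE container_healthy gauge"]
    for container in containers:
        state = health_from_status(container["Status"])
        value = 1 if state in ("healthy", "running") else 0
        lines.append(f'container_healthy{{name="{container["Names"]}",state="{state}"}} {value}')
    return "\n".join(lines) + "\n"
```

Usage would be `render_health(list_containers())`, served from a `/metrics` endpoint in the same way as any other custom exporter.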

The result is a compact yet comprehensive monitoring environment that provides immediate visibility into all systems supporting the WTF-Model and my daily workflow.
