Monitoring with Prometheus & Grafana
For the upcoming WTF-Model release on Steam, I needed observability and monitoring for my development infrastructure.
The setup I built gives me full visibility across my network, devices, web services, and GitHub CI/CD automation, which is critical during the WTF-Model’s deployment and review cycles.
The setup provides real-time (and color-coded) insight into:
- Internet connectivity and latency
- System resource usage across devices
- Website and API health
- GitHub Actions runner availability for builds and deployments
Everything runs in a self-hosted Docker Compose environment powered by Prometheus and Grafana.
Below you can see the final result — the dashboard displayed in a three-quarter split view alongside my daily calendar on the iPad mini next to my main workstation, serving as a real-time status and information center.

Architecture
I deployed this inside a dedicated Docker network, with containers mapped to a consecutive port range (18000–18xxx) for a clean, predictable layout. I chose Docker Compose for its simplicity and efficiency in single-host environments. I spent an hour trying to run this with Helm on Kubernetes, but it was clearly overkill, so I settled on the less complex approach.
Secrets to access devices and scrape data are injected via a .env file, keeping the setup portable and secure.
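As a minimal sketch of how a containerized Python exporter might consume such secrets: Docker Compose passes the .env values into the container as environment variables, and the exporter reads them at startup and fails fast if one is missing. The variable names below are hypothetical, since the post does not list the actual .env keys.

```python
import os
from typing import Mapping

# Hypothetical variable names; the real .env keys are not given in the post.
REQUIRED = ("FRITZBOX_USER", "FRITZBOX_PASSWORD", "GITHUB_TOKEN")


def read_settings(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, failing fast if any is missing or empty."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}
```

Failing at startup rather than at the first scrape makes a misconfigured .env file visible immediately in the container logs.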
I automated deployment and teardown via functionally identical Windows and Linux scripts:
- bootstrap.sh / bootstrap.ps1 – builds images, initializes volumes, and launches containers
- cleanup.sh / cleanup.ps1 – stops services, removes containers, and clears caches
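The actual scripts are shell and PowerShell and are not shown here. Purely as an illustration of the two phases, the following Python sketch builds the Docker Compose commands that a bootstrap or cleanup step could run. The function names and the default compose file path are assumptions, and the cache-clearing step is left out because the post does not say what it involves.

```python
import subprocess


def compose_commands(action: str, compose_file: str = "docker-compose.yml") -> list[list[str]]:
    """Return the docker compose invocations for a (hypothetical) action name."""
    base = ["docker", "compose", "-f", compose_file]
    if action == "bootstrap":
        # Build images first, then start all services in the background.
        return [base + ["build"], base + ["up", "-d"]]
    if action == "cleanup":
        # Stop services and remove containers, including orphaned ones.
        return [base + ["down", "--remove-orphans"]]
    raise ValueError(f"unknown action: {action}")


def run(action: str) -> None:
    """Execute each command in order, stopping at the first failure."""
    for command in compose_commands(action):
        subprocess.run(command, check=True)
```

Calling run("bootstrap") or run("cleanup") would behave the same on Windows and Linux, as long as the Docker CLI is on the PATH.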

Exporters and Metrics
The monitoring relies on a combination of Prometheus exporters, both standard and custom-coded in Python, each containerized and integrated through docker-compose.yml.
Among them:
- A customized FRITZ!Box exporter providing DSL, PPPoE, and internet connectivity status.
- A Pi-hole exporter for DNS-level ad-blocking metrics.
- The Blackbox exporter, performing ICMP and HTTP checks across LAN devices and websites via custom HTML health endpoints.
- A Python-based IP exporter that determines the public IP, VPN latency, and geolocation using multiple providers (ipapi, ipwhois, ifconfig, ipsb, ipinfo).
- The Windows and Linux exporters, installed on node.lan, imac.lan, server.lan, and nuc.lan, tracking CPU, RAM, disk, and GPU utilization.
- A GitHub exporter, monitoring self-hosted GitHub Actions runners used for WTF-Model deployments, indicating online, idle, or busy states.
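To illustrate the multi-provider idea behind the IP exporter, here is a minimal sketch, not the exporter's actual code. It tries each provider in order, uses the first one that answers, and renders the result in the Prometheus text exposition format. The metric names and the fetcher callables are assumptions. In a real exporter each fetcher would perform an HTTP request to its provider.

```python
from typing import Callable, Iterable, Optional

# A provider is a zero-argument callable returning the public IP as text.
Provider = Callable[[], str]


def first_public_ip(
    providers: Iterable[tuple[str, Provider]],
) -> tuple[Optional[str], Optional[str]]:
    """Return (provider_name, ip) from the first provider that succeeds."""
    for name, fetch in providers:
        try:
            ip = fetch().strip()
        except Exception:
            continue  # Provider unreachable or failing: try the next one.
        if ip:
            return name, ip
    return None, None


def render_metrics(provider: Optional[str], ip: Optional[str]) -> str:
    """Render hypothetical metrics in the Prometheus text exposition format."""
    lines = [
        "# HELP public_ip_up Whether a public IP could be determined.",
        "# TYPE public_ip_up gauge",
        f"public_ip_up {1 if ip else 0}",
    ]
    if ip:
        lines += [
            "# HELP public_ip_info Public IP as a label; the value is always 1.",
            "# TYPE public_ip_info gauge",
            f'public_ip_info{{provider="{provider}",ip="{ip}"}} 1',
        ]
    return "\n".join(lines) + "\n"
```

Passing the providers in as callables keeps the fallback logic testable without network access, and a single failing provider does not take the whole metric offline.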
Prometheus scrapes all these sources at tuned intervals to keep updates near real time while staying efficient:
ICMP probes run every second, lightweight metrics every few seconds, and FRITZ!Box data every 10 seconds, the latter only because of device API limitations.
I believe this balance ensures fast, continuous visibility without adding unnecessary load.
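The per-job intervals could be expressed roughly as follows. This sketch renders a scrape_configs fragment for prometheus.yml from a small mapping. The job names are hypothetical, the 5s value stands in for "every few seconds" and is an assumption, and the targets are omitted because the post does not list them.

```python
def to_seconds(interval: str) -> int:
    """Convert a simple Prometheus duration such as '10s' or '1m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(interval[:-1]) * units[interval[-1]]


def render_scrape_configs(jobs: dict[str, str]) -> str:
    """Render a scrape_configs YAML fragment, fastest job first."""
    lines = ["scrape_configs:"]
    for job, interval in sorted(jobs.items(), key=lambda item: to_seconds(item[1])):
        lines.append(f"  - job_name: {job}")
        lines.append(f"    scrape_interval: {interval}")
    return "\n".join(lines) + "\n"


# Hypothetical job names; only the 1s and 10s values come from the text.
INTERVALS = {
    "blackbox_icmp": "1s",  # ICMP probes
    "node": "5s",           # assumed value for "every few seconds"
    "fritzbox": "10s",      # limited by the device API
}

print(render_scrape_configs(INTERVALS))
```

Setting scrape_interval per job, instead of relying on one global value, is what lets the fast probes and the rate-limited FRITZ!Box job coexist in one Prometheus instance.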

Dashboard Layout
I organized the Grafana dashboard into two main columns for quick, high-level insights:
Left Column – Connection
- FRITZ!Box DSL / PPPoE / Internet states
- VPN connection, public IP, and latency
- WAN traffic rates and daily totals
- DNS latency (internal vs external)
- Website health checks and Pi-hole statistics
Right Column – Devices
- CPU, RAM, Disk, and GPU metrics for each major device
- Online status for core network components
- GitHub Actions runner states used for WTF-Model builds and deployments

I believe that the result is a compact yet comprehensive monitoring environment which provides immediate visibility into all systems supporting the WTF-Model and my daily workflow.