Creating a company's first monitoring system is like giving sight to the blind. It’s a game-changer for catching issues early and resolving them fast. But the journey can be tedious, full of unexpected roadblocks and learning curves.
At Hasan's IT Solutions, we recently implemented our first observability stack using Prometheus, Grafana, and AlertManager. Here are the lessons we wish we’d known before we began — insights to save your time, avoid common pitfalls, and help you build a powerful, meaningful monitoring solution.
1. Custom Metrics = Full Control, But Manual Work
Prometheus gives you complete flexibility in choosing what to monitor. But that power comes with responsibility: you decide which metrics to expose, where to instrument the code, and how to interpret the results.
Writing custom metrics required understanding every flow in the codebase, gathering insights from stakeholders, and placing counters, histograms, and gauges in the right spots. It was labor-intensive, but the observability was tailored exactly to our needs.
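For concreteness, here's a minimal sketch of what that instrumentation looked like in spirit, using the official prometheus_client library. The metric names and functions are hypothetical, not our actual code.

```python
from prometheus_client import Counter, Histogram, Gauge

# Hypothetical metric names for illustration; place instruments inside the
# flows you actually care about.
ORDERS_CREATED = Counter("orders_created_total", "Orders successfully created")
CHECKOUT_LATENCY = Histogram("checkout_duration_seconds", "Checkout flow duration")
EXPORT_QUEUE_DEPTH = Gauge("export_queue_depth", "Jobs waiting in the export queue")

def create_order(order):
    with CHECKOUT_LATENCY.time():   # histogram: observe how long the flow takes
        ...                         # business logic lives here
    ORDERS_CREATED.inc()            # counter: count successful completions

def on_queue_change(depth: int):
    EXPORT_QUEUE_DEPTH.set(depth)   # gauge: track a value that goes up and down
```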
With AI-assisted coding tools like Cursor or Claude Code, that task could be easier today, though you'd still need to validate what they generate. Manual doesn't mean bad; it just means slow and error-prone.
2. Learn the Best Practices Early
Before you record your first metric, study Prometheus best practices around naming, units, and label cardinality. Doing so early can reshape your approach and prevent long-term problems.
We learned (the hard way) that high-cardinality labels like session_id or user_id can severely degrade query performance and inflate time-series storage. If you're planning deep tracking for specific users or entities, consider tools beyond Prometheus, like OpenTelemetry or a trace-based service.
3. Don’t Trust the Green Lights — Validate Everything
Just because metrics show up doesn't mean everything's working. We ran into an issue where metrics silently disappeared. It turned out our FastAPI app ran under Gunicorn with multiple worker processes, and the Python client's default setup only exposes the metrics of whichever worker happens to serve the scrape unless multiprocess mode is configured.
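The documented fix for that setup is prometheus_client's multiprocess mode. A minimal sketch, assuming a writable directory shared by all workers; file names and paths below are illustrative, not our exact configuration.

```python
# metrics.py - aggregate metrics across all Gunicorn workers.
# Requires the PROMETHEUS_MULTIPROC_DIR environment variable to point at a
# writable directory shared by every worker before the app starts.
from fastapi import FastAPI, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    generate_latest,
    multiprocess,
)

app = FastAPI()

@app.get("/metrics")
def metrics() -> Response:
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)  # merge per-worker data files
    return Response(generate_latest(registry), media_type=CONTENT_TYPE_LATEST)

# gunicorn_conf.py - clean up a worker's data files when it exits:
# def child_exit(server, worker):
#     multiprocess.mark_process_dead(worker.pid)
```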
Always verify that your metrics reflect reality, not just that endpoints exist. Quiet failures are the worst kind.
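One cheap way to do that is a smoke test that exercises a flow and checks that the number actually moved. A sketch, assuming the hypothetical orders_created_total counter from earlier and a local dev server on port 8000:

```python
# smoke_test.py - sanity check: does the counter actually move?
import re
import requests

BASE = "http://localhost:8000"  # assumed local dev URL

def read_counter(name: str) -> float:
    """Scrape /metrics and return the current value of an unlabeled counter."""
    body = requests.get(f"{BASE}/metrics", timeout=5).text
    match = re.search(rf"^{name}\s+([0-9.e+]+)$", body, re.MULTILINE)
    return float(match.group(1)) if match else 0.0

before = read_counter("orders_created_total")
requests.post(f"{BASE}/orders", json={"sku": "demo"}, timeout=5)  # trigger the flow
after = read_counter("orders_created_total")
assert after > before, "metric did not move; check worker/multiprocess config"
```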
4. Treat Observability Like a Product
There’s rarely a dedicated PM for observability. But there are users: developers, SREs, and ops teams. Each has different needs and ideas of what success looks like.
We set a definition of done, collected feedback, and iterated. Dashboards became cleaner, alerts became more meaningful, and the system matured.
Conclusion
At Hasan’s IT Solutions, building observability was one of our most rewarding — and humbling — challenges. It improved our reliability, but more importantly, it changed our culture around debugging and accountability.
Prometheus is powerful, but it’s not simple. Start with clear goals, validate aggressively, and always plan for iteration.
Your Turn
Have you built your own monitoring system? What did you learn the hard way? Let us know — we’d love to hear your stories and tips!