Monitoring cluster and applications (WIP)
How to monitor the cluster and its applications
This is simply some placeholder text that we can fill out later.
This guide should show:
- How to use Grafana dashboards to check important metrics and logs, such as traffic and alerts. This can be used to, e.g., proactively prevent downtime or slowness, detect intrusions, track used and available storage/CPU/RAM, estimate cost per customer, get an overview of outdated packages, profile applications, spot crashed or crashing applications, and more.
- How we aggregate and parse logs, metrics and traces.
- How to monitor infrastructure applications vs. databases vs. applications.
- How to monitor deployment state with Flux through CLI, Matrix and Git.
- How to monitor errors and traffic on the frontend using Sentry, OpenTelemetry and Plausible (a GDPR-friendly alternative to Google Analytics).
- How to use traces to debug an application. This can be very useful for seeing the flow of traffic through something that we've built ourselves, or for investigating more complex issues that might be caused by several "hops" between internal and external applications.
- How to find errors in the cluster in general, how alerting works, and how we can extend our alerts and uptime monitors.
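
As a starting point for the Flux item above, here is a minimal sketch of checking deployment state from a script. It assumes the Flux resources are exported as a standard Kubernetes list in JSON, where each item reports a `Ready` condition under `status.conditions`. The function name `not_ready` and the sample data are hypothetical and only serve as an illustration, not as how our tooling actually works.

```python
import json


def not_ready(resource_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, message) for each item that is not Ready."""
    failing = []
    for item in resource_list.get("items", []):
        meta = item.get("metadata", {})
        conditions = item.get("status", {}).get("conditions", [])
        # Pick the condition of type "Ready", if the resource reports one.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            failing.append((meta.get("namespace", ""), meta.get("name", ""),
                            "no Ready condition reported"))
        elif ready.get("status") != "True":
            failing.append((meta.get("namespace", ""), meta.get("name", ""),
                            ready.get("message", "")))
    return failing


# Hypothetical sample data in the shape of a Kubernetes list object.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"namespace": "flux-system", "name": "infrastructure"},
      "status": {"conditions": [
        {"type": "Ready", "status": "True", "message": "Applied revision"}
      ]}
    },
    {
      "metadata": {"namespace": "flux-system", "name": "apps"},
      "status": {"conditions": [
        {"type": "Ready", "status": "False", "message": "health check failed"}
      ]}
    }
  ]
}
""")

if __name__ == "__main__":
    for namespace, name, message in not_ready(SAMPLE):
        print(f"{namespace}/{name}: {message}")
```

In practice the input could come from something like `kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A -o json`, assuming the Flux Kustomization resources are installed in the cluster; the same check would apply to other Flux resources that report a `Ready` condition.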