Hi! I have a mixed set of containers (a few, not too many) and bare-metal services (quite a few), and I would like to monitor them.
I am using good old “monit”, which monitors my network interfaces, filesystem status and traditional services (via PID files). It’s not pretty, but it gets the work done. However, I cannot find a way to have it also monitor my containers. Note that I use podman and have a strict one-service, one-user policy (all containers are rootless).
I also run “netdata”, but I find it overwhelming: too much data, too many graphs, just too much for my needs.
I need something that:
- lets me monitor service status
- lets me monitor container status
- lets me restart services or containers (not mandatory, but preferred)
- has a nice web GUI
- has a web GUI that is also mobile friendly (not mandatory, but appreciated)
- can show some historical data (not mandatory, but interesting)
- can monitor CPU usage (mandatory)
- can monitor filesystem usage (mandatory)
I don’t care for authentication features, since it will be behind a reverse proxy with HTTPS and proxy authentication already.
I am not looking for a fancy and complex dashboard, but for something I can host on a secondary page that I open if/when I want to check stuff. Also, it would be useful if the tool can be scripted or accessed via an API, so I could write some extractors to print a summary in my own dashboard.
Grafana is pretty annoying to learn and set up, but it does everything you seem to want.
I think Prometheus is a good industry standard. It can do everything you listed except for restarting stuff. It’s got a decent built-in monitoring capability and you can extend it trivially to monitor anything. For example I wrote a 5-liner to monitor ZFS health and another for LVM. I even monitor my routers with it. OpenWrt has an installable node exporter for Prometheus.
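To give an idea of what such a small custom check can look like, here is a rough sketch in Python (not the actual 5-liner mentioned above). It assumes node_exporter is started with its textfile collector enabled (--collector.textfile.directory) and that "zpool status -x" prints "all pools are healthy" when nothing is wrong. The metric name zfs_pools_healthy and the output path are made up for illustration.

```python
import os
import subprocess
import tempfile

# Hypothetical path: must match node_exporter's --collector.textfile.directory.
OUTPUT_FILE = "/var/lib/node_exporter/textfile/zfs.prom"


def render_metric(name, value, help_text):
    """Return one gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


def zfs_healthy(status_output):
    """Interpret the output of `zpool status -x`."""
    return "all pools are healthy" in status_output


def write_atomically(path, text):
    """Write to a temp file and rename it, so a scrape never sees a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter must be able to read it
    os.replace(tmp_path, path)


def main():
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=False
    )
    value = 1 if zfs_healthy(result.stdout) else 0
    # zfs_pools_healthy is an illustrative metric name, not a standard one.
    text = render_metric("zfs_pools_healthy", value, "1 if all ZFS pools are healthy")
    write_atomically(OUTPUT_FILE, text)

# Run main() from a cron job or a systemd timer.
```

The same pattern should work for LVM or anything else that can be reduced to a number: swap the command and the parsing function, keep the rest.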
Service restarting is a remote execution capability and generally falls outside of the monitoring domain. You’d be better off implementing that with another process/service manager. If you’re running systemd, that’s one of its primary purposes. You can use it to start/stop/restart containers just like normal processes.
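If you still want to script restarts yourself, a thin wrapper around systemctl is probably enough. The sketch below is only an illustration and makes a few assumptions: each container is already wrapped in a systemd unit, rootless containers live in the owning user's "systemctl --user" instance, and the script runs as that user. The unit name in the usage comment is hypothetical.

```python
import subprocess

# Only these systemctl verbs are allowed, so a typo cannot run something unexpected.
ALLOWED_ACTIONS = {"start", "stop", "restart", "is-active"}


def build_systemctl_command(action, unit, user_scope=False):
    """Build the argument list for systemctl without running it."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    command = ["systemctl"]
    if user_scope:
        # Rootless containers are assumed to be managed by the user's own systemd instance.
        command.append("--user")
    command += [action, unit]
    return command


def is_active(unit, user_scope=False):
    """Return True if the unit is active; `systemctl is-active` exits 0 in that case."""
    command = build_systemctl_command("is-active", unit, user_scope)
    return subprocess.run(command, capture_output=True, check=False).returncode == 0


def restart_if_down(unit, user_scope=False):
    """Restart the unit only when it is not active. Returns True if a restart was issued."""
    if is_active(unit, user_scope):
        return False
    subprocess.run(build_systemctl_command("restart", unit, user_scope), check=True)
    return True

# Example (hypothetical unit name), run as the user that owns the container:
#   restart_if_down("container-myapp.service", user_scope=True)
```

With a one-user-per-service policy, each user would run this for its own units; setting Restart=on-failure in the unit file may make the script unnecessary in the first place.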
Good ol’ Nagios (or one of its forks).
Give https://github.com/louislam/uptime-kuma a try. I’m planning to do the same for a similar use case. Sensu (sensu.io) is a more sophisticated option, but it requires more infrastructure and there is a bit of a learning curve with it.
While I really like Uptime Kuma, it seems a bit too restricted for OP’s use case. For example, to monitor disk or CPU usage, you would need to write your own scripts. It would be doable, but not very nice.
At least as I understood the question, OP is probably looking for something like Icinga.
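For reference, the "write your own scripts" route with Uptime Kuma could look roughly like the sketch below. It assumes a "Push" type monitor, which is reported to by calling a URL of the form /api/push/<token> with status and msg query parameters. The base URL, token and thresholds are placeholders, and load average is used as a crude stand-in for CPU usage.

```python
import os
import shutil
import urllib.parse
import urllib.request

# Placeholders: replace with your own instance and the token of a Push monitor.
KUMA_BASE_URL = "https://kuma.example.org"
PUSH_TOKEN = "REPLACE_ME"


def disk_percent_used(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_cpu():
    """1-minute load average divided by the CPU count (a rough CPU pressure figure)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def build_push_url(base_url, token, ok, message):
    """Build the push URL; status is 'up' or 'down' and msg is free text."""
    query = urllib.parse.urlencode({"status": "up" if ok else "down", "msg": message})
    return f"{base_url}/api/push/{token}?{query}"


def report(disk_limit=90.0, load_limit=1.5):
    """Check disk and load against illustrative thresholds and push the result."""
    disk = disk_percent_used("/")
    load = load_per_cpu()
    ok = disk < disk_limit and load < load_limit
    message = f"disk {disk:.0f}% load/cpu {load:.2f}"
    url = build_push_url(KUMA_BASE_URL, PUSH_TOKEN, ok, message)
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.status

# Call report() from cron or a systemd timer, once per heartbeat interval.
```

This gives an up/down signal with a message, but no real history or graphs for the values, which is part of why it is not a great fit here.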
Yeah, a better fit, but a bit of trouble to set up… What’s your opinion on Icinga? I’ve never used it myself.
We had it at work, but I never did anything other than receive and resolve alerts. It looked good to me, though, and I liked the system.