I run a bare-metal Kubernetes cluster on a couple of Raspberry Pis (though that detail isn't super important to this question). I am familiar with Kubernetes metrics/alerting tools such as Grafana, Prometheus, Loki, the ELK stack, etc. I am also familiar with node-exporter for gathering node-level resource metrics like CPU, memory, file system, temps, etc. All that's great and gets me like 99% of the way there. The last 1% I am looking for is things like available updates (e.g. 56 packages with updates available), reboot required, system component status, etc., and for whatever reason I struggle to find good search results for this specific problem area.
I can and do use things like dnf-automatic/unattended-upgrades and systemd to maintain minimal system-level health (so 99% -> 99.8%), but I haven't been able to find a solution that provides a bit more depth of insight into underlying system health, probably because that's usually handled by cloud providers/hypervisors. I am sure I could come up with a custom, not-too-hacky solution myself (off the top of my head: a pod/job with access to the underlying system that runs whatever commands I want to gather state and makes the results available to the general in-cluster monitoring solution; feels dirty though), but it feels like an obvious hole and I'm just missing the right Google incantation to find it.
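For what it's worth, the "run whatever commands I want and feed the results to Prometheus" idea doesn't have to be that dirty if you lean on node_exporter's textfile collector: a small script on the host (or a privileged DaemonSet with the host filesystem mounted) writes a `.prom` file and node_exporter picks it up. A minimal sketch of what I have in mind, assuming a Debian/Ubuntu-style host (apt-get, `/var/run/reboot-required`); the metric names, the output directory, and the dnf-vs-apt choice are all illustrative, not any standard:

```python
"""Sketch: export host OS-health signals via node_exporter's textfile collector.

Assumes Debian/Ubuntu tooling (apt-get dry run, /var/run/reboot-required);
a Fedora host would use `dnf check-update` and dnf-needs-restart instead.
"""
import subprocess
from pathlib import Path

# Wherever node_exporter's --collector.textfile.directory points (illustrative).
TEXTFILE_DIR = Path("/var/lib/node_exporter/textfile")


def count_upgradable() -> int:
    """Count packages apt would upgrade, using a simulated (-s) upgrade."""
    out = subprocess.run(
        ["apt-get", "-s", "upgrade"], capture_output=True, text=True
    ).stdout
    # In apt-get's simulation output, lines starting with "Inst " are
    # packages that would be upgraded/installed.
    return sum(1 for line in out.splitlines() if line.startswith("Inst "))


def reboot_required() -> bool:
    """Debian-family packages touch this flag file when a reboot is needed."""
    return Path("/var/run/reboot-required").exists()


def render_metrics(upgradable: int, reboot: bool) -> str:
    """Render the two gauges in the Prometheus text exposition format."""
    return (
        "# HELP os_pending_updates Packages with available updates\n"
        "# TYPE os_pending_updates gauge\n"
        f"os_pending_updates {upgradable}\n"
        "# HELP os_reboot_required 1 if the node needs a reboot\n"
        "# TYPE os_reboot_required gauge\n"
        f"os_reboot_required {int(reboot)}\n"
    )


def main() -> None:
    # Run this from a systemd timer or cron on each node; write atomically
    # by rendering first, then writing the whole file in one call.
    TEXTFILE_DIR.mkdir(parents=True, exist_ok=True)
    (TEXTFILE_DIR / "os_health.prom").write_text(
        render_metrics(count_upgradable(), reboot_required())
    )
```

From there the metrics land in Prometheus alongside everything node_exporter already exports, so alerting on "more than N pending updates" or "reboot pending for > 48h" is just a normal PromQL rule. Still curious whether something off-the-shelf already does this, though.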
Any ideas or experience you can share? Please don't suggest kube-state-metrics or node-exporter; unless I am missing something, they don't provide what I am asking about.
Knowing what and when to abstract can be hard to define precisely. Over-abstraction has a cost; so does under-abstraction. I have seen, written, and refactored terrible examples of both. Anecdotally, flattening an over-abstracted hierarchy feels like less work, and usually comes with better test coverage to validate correctness after refactoring, than introducing abstraction into under-abstracted code (spaghetti code, linear code, brand it how you will). Be aware of both extremes and try to find the balance.