I'm curious to see what others use, but I've been using Uptime Kuma for this purpose at several different sites I manage. I run Uptime Kuma on a VPS and locally on site. I have the local instance monitoring the router, gateway, DNS, and several other internal and external devices. I also have the local instance do a "push" check-in on my VPS instance. This gives me a pretty holistic view of things.
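The push check-in works as a heartbeat: the VPS instance expects regular check-ins and flags the site once they stop arriving. As a rough sketch, the same check-in could also come from a plain Python script. This assumes a push monitor already exists on the VPS instance; `status.example.com` and `YOUR_PUSH_TOKEN` are hypothetical placeholders for the Push URL that Uptime Kuma displays for that monitor.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholders: copy the real values from the push monitor's
# "Push URL" shown in the Uptime Kuma UI on the VPS.
KUMA_BASE = "https://status.example.com"
PUSH_TOKEN = "YOUR_PUSH_TOKEN"


def push_url(base: str, token: str, status: str = "up", msg: str = "OK", ping: str = "") -> str:
    """Build a push URL in the form Uptime Kuma shows for push monitors."""
    query = urlencode({"status": status, "msg": msg, "ping": ping})
    return f"{base.rstrip('/')}/api/push/{token}?{query}"


def check_in(timeout: float = 10.0) -> bool:
    """Send one heartbeat; when these stop arriving, the VPS marks the site down."""
    try:
        with urlopen(push_url(KUMA_BASE, PUSH_TOKEN), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
```

Calling `check_in()` from a once-a-minute cron job would mirror the push check-in; how long a missed check-in is tolerated is decided by the push monitor's interval setting on the VPS side.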
Self Hosted - Self-hosting your services.
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Interesting. Might try that. Thank you.
Depends on how much you want to set up. For my purposes, I just check for connectivity every minute and record the result (true or false) as a new row in a SQLite database.
This is what I use on my Raspberry Pi:
#!/usr/bin/env python3
from datetime import datetime
import sqlite3
import socket
from pathlib import Path

# Resolve a well-known hostname and try a TCP connection to port 80 (2-second timeout).
try:
    host = socket.gethostbyname("one.one.one.one")
    s = socket.create_connection((host, 80), 2)
    s.close()
    connected = True
except OSError:  # DNS failure, timeout, refused or reset connection
    connected = False

timestamp = datetime.now().isoformat()

# Store the result in a database file next to this script.
db_file = Path(__file__).resolve().parent / 'Database.sqlite3'
conn = sqlite3.connect(db_file)
curs = conn.cursor()
curs.execute('''CREATE TABLE IF NOT EXISTS checks (id INTEGER PRIMARY KEY AUTOINCREMENT, timestamp TEXT, connected INTEGER);''')
curs.execute('''INSERT INTO checks (timestamp, connected) VALUES (?, ?);''', (timestamp, 1 if connected else 0))
conn.commit()
conn.close()
And I just have a crontab entry to run it every minute:
* * * * * ~/connectivity_check/check.py >/dev/null 2>&1
Then I just check how many disconnects have been logged via:
$ sqlite3 ./connectivity_check/Database.sqlite3 'select count(*) from checks where connected = 0;'
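That query counts every failure ever logged. To see only recent ones, a small helper could compute a cutoff itself. This is a sketch that assumes the `checks` table created by the script above and relies on the stored `isoformat()` timestamps (same local time, same format) sorting correctly as plain strings; the `recent_disconnects` name and the 24-hour default are illustrative choices.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional


def recent_disconnects(conn: sqlite3.Connection, hours: float = 24.0,
                       now: Optional[datetime] = None) -> list:
    """Return timestamps of failed checks within the last `hours` hours, newest first."""
    now = now or datetime.now()
    # Same local-time isoformat() as the logging script, so string comparison works.
    cutoff = (now - timedelta(hours=hours)).isoformat()
    rows = conn.execute(
        "SELECT timestamp FROM checks WHERE connected = 0 AND timestamp >= ? "
        "ORDER BY timestamp DESC;",
        (cutoff,),
    ).fetchall()
    return [ts for (ts,) in rows]
```

Usage would be something like `recent_disconnects(sqlite3.connect("connectivity_check/Database.sqlite3"), hours=6)`. Consecutive timestamps one minute apart would indicate a single longer outage rather than several short ones.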
Obviously, it's not as full-featured as something with configurable options or a web UI, etc., but for my purposes, it does exactly what I need with absolutely zero overhead.
That's not exactly the solution I was looking for, but that was very instructive. Thank you.
Ah, I see you mentioned the cuts are only a few seconds long. A once-a-minute check wouldn't catch that very well.
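Standard cron can't schedule anything more often than once a minute, so a cut lasting a few seconds will usually fall between two checks. One way around that, sketched here under the assumption that probing once per second is acceptable, is a long-running script that only logs changes of state, with the approximate length of each outage. The default probe (a 1-second TCP connect to 1.1.1.1:80, like the script above but skipping DNS) and the function names are hypothetical choices.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host: str = "1.1.1.1", port: int = 80, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout):
            return True
    except OSError:
        return False


def watch(probe=tcp_probe, interval: float = 1.0, rounds=None, log=print) -> None:
    """Probe every `interval` seconds and log only transitions between up and down."""
    down_since = None  # monotonic time of the first failed probe in the current outage
    n = 0
    while rounds is None or n < rounds:
        n += 1
        ok = probe()
        now = time.monotonic()
        if not ok and down_since is None:
            down_since = now
            log(f"{datetime.now().isoformat()} DOWN")
        elif ok and down_since is not None:
            log(f"{datetime.now().isoformat()} UP after ~{now - down_since:.0f}s")
            down_since = None
        time.sleep(interval)
```

While the link is down each probe can take up to its timeout, so the effective resolution is roughly interval plus timeout.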
If you have a server outside your network, you could simply hold open a TCP connection and report when it breaks, but I'll admit that's beyond anything I've had to deal with.
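The held-open connection idea could look roughly like this sketch, in two halves: a tiny echo service on the outside server, and a client at home that keeps one connection open, sends a one-byte heartbeat every second, and logs when a reply doesn't come back within the timeout. The heartbeat byte, intervals, and function names are hypothetical; the service has no authentication, and its port has to be reachable through any firewall.

```python
import socket
import threading
import time
from datetime import datetime


def serve_echo(listener: socket.socket) -> None:
    """Outside server: echo every byte back to each connected client."""
    def handle(conn: socket.socket) -> None:
        with conn:
            try:
                while data := conn.recv(64):
                    conn.sendall(data)
            except OSError:
                pass  # client went away

    while True:
        conn, _ = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def hold_open(host: str, port: int, interval: float = 1.0, timeout: float = 2.0,
              rounds=None, log=print) -> None:
    """Home side: keep one TCP connection open and log when heartbeats stop coming back."""
    sock = None
    down_since = None
    n = 0
    while rounds is None or n < rounds:
        n += 1
        try:
            if sock is None:
                # create_connection sets this timeout on the socket, so it also applies to recv
                sock = socket.create_connection((host, port), timeout)
            sock.sendall(b".")
            if sock.recv(1) != b".":  # b"" means the server closed the connection
                raise ConnectionError("connection closed")
            if down_since is not None:
                log(f"{datetime.now().isoformat()} UP after ~{time.monotonic() - down_since:.0f}s")
                down_since = None
        except OSError:  # timeouts, resets and refused connections
            if sock is not None:
                sock.close()
                sock = None
            if down_since is None:
                down_since = time.monotonic()
                log(f"{datetime.now().isoformat()} DOWN")
        time.sleep(interval)
    if sock is not None:
        sock.close()
```

On the outside server this might be started with `serve_echo(socket.create_server(("", 9999)))`, and at home with `hold_open("your.server.example", 9999)` (hypothetical host and port). A cut is noticed within about interval plus timeout.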
You might be looking for a "smoke test"?
How micro are the cuts? You might get pretty far with the "ping" tool without any fancy monitoring setup around it.
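With plain `ping`, lost replies show up as gaps in the `icmp_seq` numbers. A sketch that reports those gaps, assuming Linux iputils `ping` output lines such as `64 bytes from 1.1.1.1: icmp_seq=5 ttl=57 time=12.3 ms` (other platforms format this differently); `find_gaps` and `watch_ping` are hypothetical names.

```python
import re
import subprocess
from typing import Iterable, Iterator, Tuple

# Only count real replies; "From ... icmp_seq=N Destination Host Unreachable" lines don't match.
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+).*time=")


def find_gaps(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (first_missing, last_missing) icmp_seq ranges from ping output lines."""
    last = None
    for line in lines:
        m = REPLY_RE.search(line)
        if not m:
            continue
        seq = int(m.group(1))
        if last is not None and seq > last + 1:
            yield last + 1, seq - 1
        last = seq if last is None else max(last, seq)  # ignore duplicates and late replies


def watch_ping(host: str = "1.1.1.1") -> None:
    """Run ping indefinitely (one probe per second by default) and report lost ranges."""
    with subprocess.Popen(["ping", host], stdout=subprocess.PIPE, text=True) as proc:
        for first, last in find_gaps(proc.stdout):
            print(f"lost replies icmp_seq {first}-{last} (~{last - first + 1}s at 1 ping/s)")
```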
I'll take a look at smoke test. Thanks. The cuts are only a few seconds long.
Might also be a DNS server issue. Try configuring a non-ISP one in your network settings and see if that helps.
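One way to test the DNS theory is to log DNS failures separately from connection failures, since a single true/false result lumps them together. A sketch; `classify` and its labels are hypothetical. Note that `getaddrinfo` has no timeout parameter, so a hung resolver can block for a while.

```python
import socket


def classify(host: str = "one.one.one.one", port: int = 80, timeout: float = 2.0) -> str:
    """Return 'dns_fail', 'tcp_fail' or 'ok' so resolver problems can be told apart from link problems."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_fail"
    addr = infos[0][4]  # (ip, port) of the first IPv4 result
    try:
        with socket.create_connection(addr, timeout):
            return "ok"
    except OSError:
        return "tcp_fail"
```

If `dns_fail` shows up during a cut while a connection to a literal IP address still succeeds, that points at the resolver rather than the line.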
Not sure, but I think setting up a measurement on the RIPE Atlas network might fit this task: https://atlas.ripe.net/probes/ You have micro cuts, but do they affect only the big-name websites, or something local as well? A measurement might help answer that, and give your ISP data on exactly where the bottleneck is and what their monitoring is missing.
For a while now I've had Grafana hooked up to InfluxDB and Telegraf. Using Telegraf, I set up pings to IPs along my route to the wider internet, to major DNS providers, and to several large internet sites. I measure response time and packet loss. It has allowed me to cut through the Comcast BS when diagnosing problems with them: I can tell them for sure that the problem is inside their network, at the Xth hop from my router.
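The reasoning behind blaming a specific hop can be sketched in a few lines: compute loss and average RTT per hop, then find the first hop from which loss persists all the way to the destination. A hop that drops pings while later hops answer fine is usually just a router deprioritizing ICMP, not a fault. This is an illustrative sketch over made-up per-hop samples, not how Telegraf or Grafana compute anything; the 1% threshold is an assumption.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Samples = Sequence[Optional[float]]  # RTTs in ms, None for a lost reply


def hop_stats(samples: Samples) -> Tuple[float, Optional[float]]:
    """Return (loss fraction, average RTT of the replies that did arrive)."""
    replies = [s for s in samples if s is not None]
    loss = 1 - len(replies) / len(samples)
    return loss, (mean(replies) if replies else None)


def first_lossy_hop(hops: List[Tuple[str, Samples]],
                    threshold: float = 0.01) -> Optional[Tuple[int, str]]:
    """Return (hop number, ip) of the first hop whose loss persists to every later hop."""
    losses = [hop_stats(samples)[0] for _, samples in hops]
    for i, (ip, _) in enumerate(hops):
        if all(loss >= threshold for loss in losses[i:]):
            return i + 1, ip
    return None  # no loss that carries through to the destination
```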
I recently started setting up Grafana on a different server, this time with Prometheus, to monitor more than just the one server I was monitoring before. I haven't set up the ping checks there yet, but from the little research I've done, something similar looks possible with Prometheus.
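On the Prometheus side, anything that serves numbers in its plain-text exposition format can be scraped. As a sketch only, here is a hand-rolled standard-library endpoint publishing per-target loss and RTT gauges; the metric names, port 9101, and the hard-coded `RESULTS` dict are hypothetical, and a real setup would fill `RESULTS` from actual pings or use an existing exporter instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical latest measurements: target -> (loss ratio 0..1, average RTT in seconds)
RESULTS = {"1.1.1.1": (0.0, 0.0124), "8.8.8.8": (0.02, 0.0151)}


def render_metrics(results: dict) -> str:
    """Format results in the Prometheus text exposition format."""
    lines = [
        "# HELP ping_loss_ratio Fraction of pings without a reply.",
        "# TYPE ping_loss_ratio gauge",
    ]
    lines += [f'ping_loss_ratio{{target="{t}"}} {loss}' for t, (loss, _) in results.items()]
    lines += [
        "# HELP ping_rtt_seconds Average round-trip time.",
        "# TYPE ping_rtt_seconds gauge",
    ]
    lines += [f'ping_rtt_seconds{{target="{t}"}} {rtt}' for t, (_, rtt) in results.items()]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(RESULTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9101) -> None:
    """Serve /metrics until interrupted."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

A Prometheus scrape job pointed at `http://<host>:9101/metrics` would then pick these up for graphing in Grafana. RTT is exported in seconds because Prometheus naming conventions prefer base units.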