Today, lemmy.amxl.com suffered an outage because the rootful Lemmy podman container crashed out, and wouldn't restart.

Fixing it turned out to be more complicated than I expected, so I'm documenting the steps here in case anyone else has a similar issue with a podman container.

I tried restarting it, but got an unexpected error saying the internal IP address (which I hand-assign to containers) was already in use, despite the fact the container wasn't running.

I create my Lemmy services with podman-compose, so I deleted them with podman-compose down, and then re-created them with podman-compose up - that usually fixes things when they're really broken. But this time, I got a message like:

level=error msg="IPAM error: requested ip address 172.19.10.11 is already allocated to container ID 36e1a622f261862d592b7ceb05db776051003a4422d6502ea483f275b5c390f2"

The only problem is that the referenced container didn't exist at all in the output of podman ps -a - in other words, podman thought the IP address was in use by a container it didn't know anything about! The IP address had effectively been 'leaked'.
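If you hit the same thing, it's easy to double-check that the container from the error message really is gone (the ID below is the one from my error - substitute your own):

    # Search every container, running or not, for the ID from the IPAM error
    sudo podman ps -a --no-trunc | grep 36e1a622f261 || echo "not found"

    # Or ask podman directly; a non-zero exit code means it doesn't exist
    sudo podman container exists 36e1a622f261862d592b7ceb05db776051003a4422d6502ea483f275b5c390f2; echo $?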

After digging into the internals, and a few false starts trying to track down where the leaked info was stored, I found it in a BoltDB file at /run/containers/networks/ipam.db - that's apparently the IP allocation (IPAM) database. Now, the good thing about /run is that it's wiped on system restart - although I didn't really want to restart all my containers just to fix Lemmy.
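If you want to convince yourself that this state really would be cleared by a reboot, you can check that the directory sits on a tmpfs:

    # /run is a tmpfs, so anything under it (including ipam.db) disappears on reboot
    findmnt -T /run/containers/networks/
    sudo ls -l /run/containers/networks/ipam.db

(These paths are for rootful podman with netavark; rootless podman keeps equivalent state under the user's runtime directory rather than /run/containers.)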

BoltDB doesn't come with a lot of tools, but you can install a TUI editor like this: go install github.com/br0xen/boltbrowser@latest.

I made a backup of /run/containers/networks/ipam.db just in case I screwed it up.
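Something like this, with the destination being anywhere that isn't on the same tmpfs (the path below is just my choice):

    # Keep a copy of the IPAM database before editing it by hand
    sudo cp -a /run/containers/networks/ipam.db /root/ipam.db.bak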

Then I ran sudo ~/go/bin/boltbrowser /run/containers/networks/ipam.db to open the DB (this will lock the DB and stop any containers starting or otherwise changing IP statuses until you exit).

I found the networks that were impacted and expanded the bucket for each (BoltDB has a hierarchy of buckets, and eventually you get down to key/value pairs), then the sub-bucket for the CIDR range the leaked IP was in. In that list, I found a record whose value was the ID of the container that didn't actually exist, and pressed D to tell boltbrowser to delete that key/value pair. I also cleaned up under ids - where this time the key was the container ID that no longer existed - and repeated the process for both networks my container was in.

I then exited out of boltbrowser with q.
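Before bringing anything back up, a rough way to confirm the stale references are really gone - the container IDs appear to be stored as plain strings in the file, so strings/grep can find them:

    # Should print 0 once the leaked entries have been removed
    sudo strings /run/containers/networks/ipam.db | grep -c 36e1a622f261862d592b7ceb05db776051003a4422d6502ea483f275b5c390f2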

After that, I brought my Lemmy containers back up with podman-compose up -d - and everything then worked cleanly.
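If you want to double-check that a container came back with the address it was supposed to get, something along these lines works (the container name here is just a placeholder):

    # List the running containers
    sudo podman ps

    # Print the IP address assigned on each network the container is attached to
    sudo podman inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{"\n"}}{{end}}' lemmy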

[–] [email protected] 35 points 6 days ago* (last edited 6 days ago) (17 children)

Good debugging!

I'm thinking that it's best for production to use dynamic IP addresses, to avoid this kind of conflict. In the Kubernetes space, all containers must have dynamic IP addresses, which are then tracked by an eBPF load balancer with a (somewhat) static IP.

[–] deafboy 2 points 5 days ago (1 children)

Every time I feel like I finally get kubernetes, somebody surprises me by talking about very specific, complex modern technologies being used to do basically the same thing we've been doing for decades with simple tools.

And I always experience the same urge to re-grow a tail and climb a tree.

[–] [email protected] 1 points 4 days ago* (last edited 4 days ago)

Oh definitely, everything in kubernetes can be explained (and implemented) with decades-old technology.

The reason Kubernetes is so special is that it automates all of this in a very standardized way. All the vendors come together behind a single management API, which is very easy to write automation against.

There are standard, well-documented "wizards" for creating databases, load-balancers, firewalls, WAFs, reverse proxies, etc. And the management of your containers is extremely robust and extensive, with features like automated replicas, health checks, self-healing, ten different kinds of storage drivers, CPU/memory/disk/GPU allocation, and declarative mountable config files. All of that on top of an extremely secure and standardized API.

With regard to eBPF being used for load balancers: the company that writes that software, Isovalent, is one of the main maintainers of eBPF in the kernel. A lot of it was written just to support their Cilium CNI for Kubernetes. It's used mainly so that you can have systems with hundreds or thousands of containers on a single node, each with their own IP address, firewall, etc. iptables was used for this before, but it started hitting a performance bottleneck on many systems. Everything is automated for you and open source, so all the ops engineers benefit from the development work of the Isovalent team.

It definitely moves fast, though. I go to kubecon every year, and every year there's a whole new set of technologies that were written in the last year lol
