this post was submitted on 12 Jun 2023
Lemmy
Everything about Lemmy; bugs, gripes, praises, and advocacy.
For discussion about the lemmy.ml instance, go to [email protected].
founded 4 years ago
Going from 512 to 160,000 is a massive parameter change.
Network replication like this raises a lot of issues: servers going up and down, database inserts locking tables, the need for backfill and integrity checks, etc.
Today things are going poorly. This posting has an example: https://lemmy.ml/post/1239920. Comments are not showing up on other instances after hours.
From a denial-of-service perspective, intentional or accidental, I think we need to start discussing the federation protocol: when servers connect to each other, how frequently, and how they behave when they get errors from a remote host.
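To make the "how they behave when getting errors" part concrete, here is a rough sketch of one common approach, capped exponential backoff with jitter. I don't know what lemmy_server actually does today, so the function name `next_retry_delay` and every number in it (60 second base, 24 hour cap, 10% jitter) are assumptions for illustration only.

```python
import random


def next_retry_delay(failures, base=60.0, cap=86400.0, jitter=0.1, rng=random.random):
    """Seconds to wait before contacting a remote instance again.

    `failures` is the number of consecutive failed deliveries to that host.
    All defaults are illustrative assumptions, not Lemmy settings.
    """
    if failures <= 0:
        return 0.0
    # Double the wait on each consecutive failure, up to the cap.
    # The exponent is bounded so very large failure counts cannot overflow.
    exponent = min(failures - 1, 32)
    delay = min(cap, base * (2 ** exponent))
    # Spread retries by +/- `jitter` so that many instances recovering at
    # the same moment do not all hit the remote host simultaneously.
    spread = delay * jitter
    return delay + spread * (2 * rng() - 1)


if __name__ == "__main__":
    for n in range(6):
        print(n, round(next_retry_delay(n), 1))
```

The point of the cap and the jitter is the denial-of-service angle: a server that has been down for a while should not be flooded by every peer retrying on the same schedule the moment it comes back.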
Is lemmy_server doing all the federation replication in-code? I would propose moving it to an independent service, perhaps a shell application, and treating replication (and its associated queues) as a core thing to manage, optimize, and operate. It isn't working seamlessly, and having hundreds of servers creates a huge amount of complexity.
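As a sketch of what "queues as a core thing to manage" could look like in a separate service, here is a minimal per-instance outbound queue. This is not how lemmy_server is structured as far as I know; the class `OutboundQueues`, its methods, and the `send` callback are all hypothetical names, and a real service would need persistent storage rather than in-memory queues.

```python
from collections import deque


class OutboundQueues:
    """One FIFO queue per remote host, so a dead host cannot block the others.

    Hypothetical illustration only; queues live in memory and are lost on restart.
    """

    def __init__(self, max_per_host=10000):
        self.max_per_host = max_per_host  # assumed bound to limit memory use
        self.queues = {}    # host -> deque of pending activities
        self.failures = {}  # host -> consecutive delivery failures

    def enqueue(self, host, activity):
        """Queue an activity for a host. Returns False if that host's queue is full."""
        queue = self.queues.setdefault(host, deque())
        if len(queue) >= self.max_per_host:
            return False
        queue.append(activity)
        return True

    def deliver_next(self, host, send):
        """Try to deliver the oldest pending activity using `send(host, activity)`.

        Returns None if nothing is pending, True on success, False on failure.
        A failed activity stays at the head of the queue for a later retry.
        """
        queue = self.queues.get(host)
        if not queue:
            return None
        try:
            send(host, queue[0])
        except Exception:
            self.failures[host] = self.failures.get(host, 0) + 1
            return False
        queue.popleft()
        self.failures[host] = 0
        return True

    def pending(self, host):
        """Number of activities still waiting for a host."""
        return len(self.queues.get(host, ()))
```

Keeping the failure count and queue depth per host is what would make replication something you can operate: you can see which instances are behind, by how much, and decide when to backfill or give up.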