this post was submitted on 11 Jul 2023
Where are they storing the session files then, in Memcached with a 512 kB limit? No such issues on sh.itjust.works, so it's probably not a software issue.
Could still be a software issue. Someone said this already, but it's possible that Lemmy.world is using a load balancer and multiple servers, and that the servers' authentication tokens are out of sync. So if you hit server 1 and you are signed in to server 1, you're good. If you hit server 2, you're signed out all of a sudden. This could also explain why the issue started so abruptly today: maybe the load on the server wasn't that bad yesterday, so the load balancer didn't kick in. This is all speculation. We'll have to wait for an official message to confirm anything.
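To make the speculation above concrete, here is a minimal Python sketch of that failure mode. Everything in it is hypothetical (the `Server` class, the token format, the strict alternation between servers); it is not how Lemmy or Lemmy.world actually works, just an illustration of sessions stored per server and never shared.

```python
import itertools


class Server:
    """A hypothetical app server that keeps sessions in its own local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # token -> username, NOT shared with other servers

    def login(self, user):
        # Made-up token format, for illustration only.
        token = f"token-{user}-{self.name}"
        self.sessions[token] = user
        return token

    def is_signed_in(self, token):
        return token in self.sessions


servers = [Server("server1"), Server("server2")]
balancer = itertools.cycle(servers)  # alternate between the two servers

# The login request lands on server1, which stores the token locally.
token = next(balancer).login("alice")

# Follow-up requests alternate: server2, server1, server2, server1.
results = [(s.name, s.is_signed_in(token)) for s in itertools.islice(balancer, 4)]
print(results)
```

Every request that lands on server2 looks signed out, even though the user never logged out, which matches the "signed out all of a sudden" symptom.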
I set up infrastructure for web apps, and what you are describing is still most likely a server config issue, not an issue with Lemmy itself, unless Lemmy is lacking something needed for load balancing (in which case the bug is really a missing feature, and I don't think that's the case). I don't know how Lemmy stores and reads its sessions, but usually it doesn't matter from the application code's standpoint. When preparing a multi-host setup, you as the admin need to make sure every instance accesses the same session data, or whatever application data needs to be shared anyway. There are many options for that.
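As a rough illustration of the "every instance accesses the same session data" point, here is a small Python sketch. The `SharedSessionStore` class is a hypothetical in-memory stand-in for whatever external store an admin might pick (a database, a cache server, a shared disk); `AppInstance` and the token format are also made up and say nothing about Lemmy's real code.

```python
class SharedSessionStore:
    """Hypothetical stand-in for an external store that all hosts can reach."""

    def __init__(self):
        self._data = {}

    def save(self, token, user):
        self._data[token] = user

    def load(self, token):
        # Returns None when the token is unknown.
        return self._data.get(token)


class AppInstance:
    """A hypothetical app host that delegates all session access to the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        token = f"token-{user}"  # made-up token format
        self.store.save(token, user)
        return token

    def current_user(self, token):
        return self.store.load(token)


store = SharedSessionStore()
host1 = AppInstance("host1", store)
host2 = AppInstance("host2", store)

token = host1.login("alice")               # login handled by host1
user_on_host2 = host2.current_user(token)  # next request handled by host2
print(user_on_host2)
```

The application code only talks to the store interface, so it doesn't care which host serves the request. That is why this is usually a deployment decision rather than an application bug.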
The load balancing scenario where all requests are handled by one host and the other only takes requests when the first is overloaded is very unlikely. The most common balancing algorithms are roundrobin, which (more or less) splits connections (not load!) equally across all targets, and leastconn, which sends each request to the host with the fewest active connections. Of course they could've used a 'fallback' algorithm, but that's rather inefficient in most scenarios.
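The difference between the two algorithms can be sketched in a few lines of Python. This is a toy model, not any real load balancer's implementation: the host names, the connection counts, and the tie-breaking rule (first host listed wins) are all assumptions made for the example.

```python
import itertools


def round_robin(hosts):
    """Hand out hosts in a fixed rotation, ignoring how busy each one is."""
    return itertools.cycle(hosts)


def least_conn(active):
    """Pick the host with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


hosts = ["host1", "host2"]

# roundrobin: connections are split evenly, regardless of load.
rr = round_robin(hosts)
rr_picks = [next(rr) for _ in range(4)]

# leastconn: host1 starts with 3 active connections, host2 with 1.
active = {"host1": 3, "host2": 1}
lc_picks = []
for _ in range(3):
    target = least_conn(active)
    active[target] += 1  # the new connection stays open on that host
    lc_picks.append(target)

print(rr_picks)
print(lc_picks)
```

In both cases each host receives traffic all the time, so a user's requests bounce between hosts under normal load, not only when one host is overloaded.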
Or maybe the issue is somewhere else entirely, e.g. caused by a full-page or CDN cache.