Lemmy Project Priorities Observations

I've raised my voice loudly on meta communities and GitHub, and I created the new [email protected] and [email protected] communities.

I feel like the performance problems have been ignored for over 30 days, when there are a half-dozen solutions that one person could code in 5 to 10 hours of labor.

I've been developing client/server messaging apps professionally since 1984, and I firmly believe that Lemmy currently suffers from a lack of testing by the developers and a lack of concern for data loss. A basic e-mail MTA in 1993 would send a "did not deliver" message back to the sender, but Lemmy just drops the delivery, and there is no mention of this in the release notes or the introduction on GitHub. I also find that the Lemmy developers do not like to "eat their own dog food": they don't actually use Lemmy's communities to discuss the ongoing development and priorities of Lemmy coding. They are not testing the code and sampling the data very much, and I am posting here, using Lemmy code, as part of my personal testing! I spent over 100 hours in June 2023 testing Lemmy's technical problems, especially performance and lost data delivery.

I'll toss it into this echo chamber.

founded 1 year ago

MODERATORS

[email protected]

Day 67, Lemmy.world's weekend outage, concurrency throttle (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

Lemmy.world had to shut down the front page and put up a message about the load, along with a graph. They seem to chalk it up to social media sites naturally attracting attacks.

I'd hack the Rust code to make it aware of its own concurrency against PostgreSQL and return a new "busy" error.

Federation connections, RSS feeds, the API, and any other path that hits the database need a concurrency count in the Rust code and a system for returning a busy error.

I'd probably build a class to help with this: once concurrency for an API goes above 5, mark the high-water point with a timestamp and start applying logic based on elapsed time. If concurrency is still above 5 and the elapsed time exceeds a threshold (say, 1 minute), return the busy error.
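A minimal sketch of that throttle, written in Rust since that is what Lemmy's backend uses (a struct plays the role of the class). All names here (`ConcurrencyThrottle`, `BusyError`, `Permit`, `try_acquire`) are hypothetical and not part of Lemmy's codebase; the limit of 5 and the one-minute threshold are the numbers suggested above. Mapping `BusyError` onto an HTTP or federation response is left out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical error returned once an endpoint has stayed over its limit too long.
#[derive(Debug, PartialEq)]
pub struct BusyError;

/// Hypothetical per-endpoint throttle: counts in-flight database work and
/// remembers when the count first went above the limit (the high-water mark).
pub struct ConcurrencyThrottle {
    limit: usize,
    grace: Duration,
    in_flight: AtomicUsize,
    over_limit_since: Mutex<Option<Instant>>,
}

/// RAII guard: the in-flight count is decremented when the request finishes.
pub struct Permit<'a> {
    throttle: &'a ConcurrencyThrottle,
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        self.throttle.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

impl ConcurrencyThrottle {
    pub fn new(limit: usize, grace: Duration) -> Self {
        Self {
            limit,
            grace,
            in_flight: AtomicUsize::new(0),
            over_limit_since: Mutex::new(None),
        }
    }

    /// Call before touching PostgreSQL; hold the permit until the query is done.
    pub fn try_acquire(&self) -> Result<Permit<'_>, BusyError> {
        self.try_acquire_at(Instant::now())
    }

    /// Same as `try_acquire`, with the clock passed in so the logic is testable.
    fn try_acquire_at(&self, now: Instant) -> Result<Permit<'_>, BusyError> {
        let current = self.in_flight.fetch_add(1, Ordering::SeqCst) + 1;
        // Creating the permit first means every early return still decrements.
        let permit = Permit { throttle: self };
        let mut since = self.over_limit_since.lock().unwrap();
        if current > self.limit {
            // Record when we first crossed the limit, then compare elapsed time.
            let start = *since.get_or_insert(now);
            if now.duration_since(start) > self.grace {
                return Err(BusyError);
            }
        } else {
            // Back under the limit: forget the high-water timestamp.
            *since = None;
        }
        Ok(permit)
    }
}
```

For the numbers above, each endpoint would get something like `ConcurrencyThrottle::new(5, Duration::from_secs(60))`. Short bursts over 5 are still served; only sustained overload produces the busy error.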

Is Prometheus the right way to expose these numbers to operators who want to know about the thresholds? I'd probably also add a dedicated log file to track concurrency thresholds and busy errors.
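If Prometheus turns out to be the answer, the counters only need to be rendered in its plain-text exposition format. A sketch, assuming hypothetical metric names (`lemmy_db_in_flight`, `lemmy_busy_errors_total`) and a hypothetical `EndpointStats` snapshot; serving the string over HTTP is left out, and the endpoint label is assumed to be a fixed name that needs no escaping.

```rust
use std::fmt::Write;

/// Hypothetical snapshot of one endpoint's throttle counters.
pub struct EndpointStats {
    pub endpoint: &'static str,
    pub in_flight: u64,
    pub busy_errors_total: u64,
}

/// Render the stats in the Prometheus text exposition format.
pub fn render_metrics(stats: &[EndpointStats]) -> String {
    let mut out = String::new();
    // Gauge: current number of database-bound requests per endpoint.
    out.push_str("# HELP lemmy_db_in_flight Database-bound requests currently in flight.\n");
    out.push_str("# TYPE lemmy_db_in_flight gauge\n");
    for s in stats {
        writeln!(out, "lemmy_db_in_flight{{endpoint=\"{}\"}} {}", s.endpoint, s.in_flight).unwrap();
    }
    // Counter: busy errors returned since startup.
    out.push_str("# HELP lemmy_busy_errors_total Requests rejected with a busy error.\n");
    out.push_str("# TYPE lemmy_busy_errors_total counter\n");
    for s in stats {
        writeln!(out, "lemmy_busy_errors_total{{endpoint=\"{}\"}} {}", s.endpoint, s.busy_errors_total).unwrap();
    }
    out
}
```

A counter plus a gauge lets operators alert on the rate of busy errors while still seeing how close each endpoint is to its limit; the dedicated log file would complement this with timestamps for individual events.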

The front-end apps also need to cache "Trending communities"; I think lemmy-ui still pulls that live from PostgreSQL on every page refresh. I need to check whether anyone has added that.
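Whether lemmy-ui already caches this still needs checking. As a generic illustration of the idea, a time-to-live cache in front of the expensive query could look like this. `TtlCache` is a hypothetical name, and the `load` closure stands in for whatever actually runs the PostgreSQL query.

```rust
use std::time::{Duration, Instant};

/// Hypothetical single-value cache for an expensive result such as "Trending communities".
pub struct TtlCache<T: Clone> {
    ttl: Duration,
    entry: Option<(Instant, T)>,
}

impl<T: Clone> TtlCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Return the cached value while it is younger than `ttl`;
    /// otherwise call `load` (the database query) and cache its result.
    pub fn get_or_refresh<F: FnOnce() -> T>(&mut self, now: Instant, load: F) -> T {
        if let Some((fetched_at, value)) = &self.entry {
            if now.duration_since(*fetched_at) < self.ttl {
                return value.clone();
            }
        }
        let value = load();
        self.entry = Some((now, value.clone()));
        value
    }
}
```

Shared across request handlers, the cache would have to sit behind a `Mutex` or similar. Passing `now` in explicitly just keeps the expiry logic easy to test; with a one-minute TTL, the database sees at most one trending query per minute no matter how often the page is refreshed.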

[email protected] 1 points 1 year ago

So, some work to do:

Rework the testing scripts so that they don't actually delete data on each run.
Can I use a bash script to capture pg_stat_statements between individual tests?
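On the pg_stat_statements question: a bash script can run `psql -At -c "SELECT queryid, calls, total_exec_time FROM pg_stat_statements"` before and after each test and save both outputs (`total_exec_time` is the PostgreSQL 13+ column name; older versions call it `total_time`). Alternatively, running `SELECT pg_stat_statements_reset();` between tests avoids diffing altogether. The diff step could be sketched like this; `parse_snapshot` and `diff` are hypothetical names, and the input is assumed to be psql's default `|`-separated unaligned output.

```rust
use std::collections::HashMap;

/// Per-query totals keyed by queryid: (calls, total execution time in ms).
pub type Snapshot = HashMap<i64, (i64, f64)>;

/// Parse `psql -At` output of `SELECT queryid, calls, total_exec_time ...`.
/// Rows for the same queryid (e.g. different users) are summed;
/// rows that don't parse (such as a NULL queryid) are skipped.
pub fn parse_snapshot(text: &str) -> Snapshot {
    let mut snap = Snapshot::new();
    for line in text.lines() {
        let mut cols = line.split('|');
        let (Some(q), Some(c), Some(t)) = (cols.next(), cols.next(), cols.next()) else {
            continue;
        };
        if let (Ok(q), Ok(c), Ok(t)) = (
            q.trim().parse::<i64>(),
            c.trim().parse::<i64>(),
            t.trim().parse::<f64>(),
        ) {
            let e = snap.entry(q).or_insert((0, 0.0));
            e.0 += c;
            e.1 += t;
        }
    }
    snap
}

/// Queries that ran between the two snapshots as (queryid, calls, ms), slowest first.
pub fn diff(before: &Snapshot, after: &Snapshot) -> Vec<(i64, i64, f64)> {
    let mut out: Vec<(i64, i64, f64)> = after
        .iter()
        .filter_map(|(q, &(calls, ms))| {
            let (c0, t0) = before.get(q).copied().unwrap_or((0, 0.0));
            let dc = calls - c0;
            // Only keep queries that actually executed during this test.
            (dc > 0).then(|| (*q, dc, ms - t0))
        })
        .collect();
    out.sort_by(|a, b| b.2.total_cmp(&a.2));
    out
}
```

The queryid can then be joined back to the `query` column of pg_stat_statements to see which SQL each test spent its time in.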

[email protected] 1 points 1 year ago

Maybe I'm overthinking the performance problems.

Deleting accounts probably creates a swarm of activity (I opened a GitHub issue about this), and it has already been a source of bugs. But even without bugs, it still has to make the system run much slower. And nothing prevents someone from setting up a federated instance, creating a lot of content, and then deleting it, overloading multiple servers.

The variability of read performance could be directly tied to how many writes are going on during account deletion.

Even comment reply chains seem to be triggering (reproducible) performance concerns.

[email protected] 1 points 1 year ago

Sometimes you wish you could have the API log everything and then play back that API activity against the test data. Maybe I'll play around with such a feature.
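A minimal sketch of such a record/replay feature, assuming a hypothetical `RecordedCall` shape and a one-line-per-request, tab-separated log format. It only covers writing and reading the log; the `handler` passed to `replay` would be whatever re-sends each request to a test instance. A real recorder would also have to strip passwords and auth tokens before writing anything to disk.

```rust
use std::time::Duration;

/// One captured API request (hypothetical shape; headers and auth omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct RecordedCall {
    pub offset: Duration, // time since recording started
    pub method: String,
    pub path: String, // includes the query string
    pub body: String,
}

/// Serialize as one tab-separated line; the body is escaped so it stays on one line.
pub fn to_line(c: &RecordedCall) -> String {
    format!("{}\t{}\t{}\t{}", c.offset.as_millis(), c.method, c.path, escape(&c.body))
}

/// Parse a line written by `to_line`; returns None for malformed lines.
pub fn from_line(line: &str) -> Option<RecordedCall> {
    let mut parts = line.splitn(4, '\t');
    let offset = Duration::from_millis(parts.next()?.parse().ok()?);
    let method = parts.next()?.to_string();
    let path = parts.next()?.to_string();
    let body = unescape(parts.next()?);
    Some(RecordedCall { offset, method, path, body })
}

fn escape(s: &str) -> String {
    // Backslashes first, so the escapes added for tab/newline aren't doubled.
    s.replace('\\', "\\\\").replace('\t', "\\t").replace('\n', "\\n")
}

fn unescape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(ch) = chars.next() {
        if ch == '\\' {
            match chars.next() {
                Some('t') => out.push('\t'),
                Some('n') => out.push('\n'),
                Some(other) => out.push(other), // covers "\\\\" -> '\\'
                None => out.push('\\'),
            }
        } else {
            out.push(ch);
        }
    }
    out
}

/// Feed every recorded call, in order, to `handler`; returns how many were replayed.
/// Offsets are kept in the log so a fuller version could reproduce the original pacing.
pub fn replay<F: FnMut(&RecordedCall)>(log: &str, mut handler: F) -> usize {
    let mut n = 0;
    for call in log.lines().filter_map(from_line) {
        handler(&call);
        n += 1;
    }
    n
}
```

Combined with a database restored to the same starting state, replaying one log repeatedly would give comparable runs for before/after performance measurements.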

‘Lemmy.world has been down between 02:00 UTC and 05:45 UTC. This was caused by the database spiking to 100% cpu (all 32 cores/64 threads!) due to inefficient queries been fired to the db very often.’

[email protected] 1 points 1 year ago

[email protected] 1 points 1 year ago* (last edited 1 year ago)

About the lemmy.world outage: https://lemmy.ml/post/2630602

I'm curious whether any insert activity was involved, from logged-in accounts or incoming federation.

Did they turn off their API or federation?