this post was submitted on 17 Aug 2023

Lemmy Project Priorities Observations


I've raised my voice loudly on meta communities and GitHub, and created the new [email protected] and [email protected] communities.

I feel like the performance problems have been ignored for over 30 days, when there are a half-dozen solutions that could each be coded in 5 to 10 hours of labor by one person.

I've been developing client/server messaging apps professionally since 1984, and I firmly believe that Lemmy is currently suffering from a lack of testing by the developers and a lack of concern for data loss. A basic e-mail MTA in 1993 would send a "did not deliver" message back to the sender, but Lemmy just drops the delivery, and there is no mention of this in the release notes or introduction on GitHub. I also find that the Lemmy developers do not like to "eat their own dog food" and actually use Lemmy's communities to discuss the ongoing development and priorities of Lemmy coding. They are not testing the code or sampling the data very much, and I am posting here, using Lemmy code, as part of my personal testing! I spent over 100 hours in June 2023 testing Lemmy technical problems, especially performance and lost data delivery.
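To make the "did not deliver" idea concrete, here is a minimal sketch in Python (not Lemmy's Rust, and not Lemmy's actual federation code). The names `DeliveryReport`, `deliver_with_report`, and the `send` callback are all hypothetical; the only point is that a failed delivery ends in a record somebody can see instead of a silent drop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeliveryReport:
    """Outcome of one outbound delivery; hypothetical structure."""
    activity_id: str
    target: str
    delivered: bool
    attempts: int
    last_error: str = ""


def deliver_with_report(activity_id: str, target: str,
                        send: Callable[[str, str], None],
                        max_attempts: int = 3) -> DeliveryReport:
    """Try to send; on final failure return a 'did not deliver' report.

    `send(target, activity_id)` is an assumed callback that raises on failure.
    A real sender would wait (back off) between attempts; omitted here.
    """
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            send(target, activity_id)
            return DeliveryReport(activity_id, target, True, attempt)
        except Exception as exc:  # broad on purpose: any failure gets reported
            last_error = str(exc)
    # Never drop silently: the caller gets a report it can log or show the author.
    return DeliveryReport(activity_id, target, False, max_attempts, last_error)
```

The report could be logged, shown to the admin, or sent back to the posting user; which of those fits Lemmy is a separate question.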

I'll toss it into this echo chamber.

[–] [email protected] 1 points 1 year ago

The pattern is this...

  1. 2FA is added to the code without being well tested, and bugs go on for a long period with no concern to 'nip them in the bud' while experience with the code is fresh.

  2. The same thing happened with the munging of "&" in post titles and in code blocks of comments.
[–] [email protected] 1 points 1 year ago
[–] [email protected] 1 points 1 year ago (1 children)

I think we should look at the whole situation as a blessing: listing posts for not-logged-in/anonymous users consistently performs perfectly fine. I've tried it with over 10 million posts, and it scales well. It works for a single community, and it works for All.

If I were integrating API response caching, I would focus on that same aspect. Listing /c/memes by Active for up to 20 pages of 50 posts each, 1000 posts in total, would be a cache-worthy dataset. Then offload filtering to an independent DBMS, in-memory application code, or even the client-side code.
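A rough sketch of that caching idea, in Python rather than Lemmy's Rust. Everything named here is made up for illustration: `ListingCache`, the `fetch_posts` callback, the 60-second TTL, and the post fields `creator_id` and `nsfw`. It caches the anonymous listing once per (community, sort) and does per-user filtering in memory.

```python
import time
from typing import Callable, Dict, FrozenSet, List, Tuple

PAGE_SIZE = 50   # page length from the post above
MAX_PAGES = 20   # 20 pages * 50 posts = 1000 cached posts per listing

Post = dict  # hypothetical shape: {"id": int, "creator_id": int, "nsfw": bool}


class ListingCache:
    """Caches the anonymous listing per (community, sort); illustrative only."""

    def __init__(self, fetch_posts: Callable[[str, str, int], List[Post]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_posts      # assumed: runs the real database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, List[Post]]] = {}

    def _listing(self, community: str, sort: str) -> List[Post]:
        key = (community, sort)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # still fresh: no database round trip
        posts = self._fetch(community, sort, PAGE_SIZE * MAX_PAGES)
        self._entries[key] = (now, posts)
        return posts

    def page(self, community: str, sort: str, page: int,
             blocked_creators: FrozenSet[int] = frozenset(),
             show_nsfw: bool = True) -> List[Post]:
        """Filter the cached listing in memory, then slice out one page."""
        visible = [p for p in self._listing(community, sort)
                   if p["creator_id"] not in blocked_creators
                   and (show_nsfw or not p["nsfw"])]
        start = (page - 1) * PAGE_SIZE
        return visible[start:start + PAGE_SIZE]
```

Usage would be something like `cache.page("memes", "Active", 2, blocked_creators=frozenset({42}))`. Note that filtering after caching means a heavily filtered user runs out of posts before page 20, which is one of the trade-offs of this approach.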

The tricky Lemmy use cases tend to be:

  1. Listing a profile breaks all the rules about community data and even time-based organization. But it is the same output each time, basically static. For those reasons, it is a candidate to cache (slow path on rebuild) and to restrict during high concurrency or when the server is nearing overload.

  2. Listing multiple communities, somewhere between All and nothing. Blending communities, even from remote API responses, is always going to be tricky for cache attempts.

  3. Situations where a lot of instance or community blocking gets a particular user into what is basically a custom sort order and post listing.

[–] [email protected] 1 points 1 year ago

Basic here...

  1. we could archive out old data

  2. we could create parallel tables or materialized views and maintain a table of new data.

Lemmy really pushes the idea of data being presented backwards, forwards, filtered, sliced, diced, every way. That design favors having a smaller set of data.

If you are going to have all this dynamic merging of communities, cherry-picking which communities, person-to-person blocking, and flags like NSFW filtering thrown in at any time... you need a concept of data walls that limit how much data is in play. Especially with the ORM overhead at play.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

The index usage for posts is all wrong, and it reflects the wrong thinking...

CREATE INDEX idx_post_aggregates_featured_community_active ON public.post_aggregates USING btree (featured_community DESC, hot_rank_active DESC, published DESC);

CREATE INDEX idx_post_aggregates_featured_community_controversy ON public.post_aggregates USING btree (featured_community DESC, controversy_rank DESC);

SELECT ends with...
ORDER BY "post_aggregates"."featured_community" DESC , "post_aggregates"."hot_rank_active" DESC , "post_aggregates"."published" DESC

They created compound indexes based on how ORDER BY is used. But there is a maximum page length of 50 posts, so really that only helps with sorting 50 posts per SQL SELECT.

The index needs to be built around the heart of the WHERE clause, not around ORDER BY hot_rank_active and LIMIT 50. The hot_rank_active column could have been put into the WHERE clause, but it wasn't.
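One way to check this kind of claim is to ask the database for its query plan. The sketch below uses Python's built-in sqlite3 as a stand-in, so the planner is SQLite's and not PostgreSQL's (on Lemmy's real database you would use PostgreSQL's `EXPLAIN` instead). The cut-down `post_aggregates` schema, the `community_id` filter column, and the index name `idx_pa_community_active` are assumptions for illustration, not Lemmy's real definitions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory table with an index that leads with the filter column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post_aggregates (
            post_id INTEGER PRIMARY KEY,
            community_id INTEGER NOT NULL,        -- assumed WHERE column
            featured_community INTEGER NOT NULL,
            hot_rank_active REAL NOT NULL,
            published INTEGER NOT NULL
        );
        -- Hypothetical index: WHERE column first, ORDER BY columns after it.
        CREATE INDEX idx_pa_community_active ON post_aggregates
            (community_id, featured_community DESC,
             hot_rank_active DESC, published DESC);
    """)
    return conn


QUERY = """
    SELECT post_id FROM post_aggregates
    WHERE community_id = ?
    ORDER BY featured_community DESC, hot_rank_active DESC, published DESC
    LIMIT 50
"""


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    print(query_plan(make_db(), QUERY, (1,)))
```

Running the same check with and without the leading filter column in the index is a quick way to see whether the WHERE clause or only the ORDER BY is being served by it.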