this post was submitted on 24 Aug 2023

Lemmy Project Priorities Observations

I've raised my voice loudly on meta communities and on GitHub, and created the new [email protected] and [email protected] communities.

I feel like the performance problems have been ignored for over 30 days when there are a half-dozen solutions that could each be coded in 5 to 10 hours of labor by one person.

I've been developing client/server messaging apps professionally since 1984, and I firmly believe that Lemmy is currently suffering from a lack of testing by the developers and a lack of concern for data loss. A basic e-mail MTA in 1993 would send a "did not deliver" message back to the message sender, but Lemmy just drops delivery, and there is no mention of this in the release notes/introduction on GitHub. I also find that the Lemmy developers do not like to "eat their own dog food" and actually use Lemmy's communities to discuss the ongoing development and priorities of Lemmy coding. They are not testing the code and sampling the data very much, and I am posting here, using Lemmy code, as part of my personal testing! I spent over 100 hours in June 2023 testing Lemmy technical problems, especially with performance and lost data delivery.

I'll toss it into this echo chamber.

Lemmy instance Beehaw staff on Monday, August 21, 2023:
https://beehaw.org/comment/1018508

"From where I’m standing, I can’t really much has changed unfortunately… which really sucks…

Lemmy.world has grown substantially meanwhile the moderation tools have not improved at all. All I can say about the moderation tools is that we now know that the tools suck more than they used to.

Here’s a list of moderation problems that we have discovered since then:

  • If a person is reported on another instance, we never get the report.
  • If a mod is banned from the community they mod, they can still take mod actions
  • If you get site-banned from Beehaw while you are from another instance, you can still post on the community and people from that instance and kbin can see your posts
  • People from other instances can’t know who if someone is an admin on the instance they’re interacting with
  • People from other instances can’t see when we use the shield function to signal we’re talking “officially / as a mod”
  • The modlog is not chronological
  • The modlog breaks if you ban someone for more than 4 digit days.

A banned user’s description is still visible so if they link to a scat image in their description, it is still visible to moderators. Despite these newly known problems, there have been exactly no improvement whatsoever to the moderation tools. It is honestly unsettling and terrifying."

Context: Lemmy has been on GitHub and in production at Lemmy.ml for over 4 years for the purposes of running and moderating a message forum / link aggregator. Beehaw had been online for over a year before the May 2023 Reddit influx.

[–] [email protected] 1 points 1 year ago

NOTE: These comments are a work in progress...

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (3 children)

Lemmy had been on GitHub for over 4 years when the rumblings out of Reddit began in May 2023; by June 1 there was a 30-day countdown...

What Lemmy lacked was data-centered testing, starting with the quantity of data. There was no means in the project to populate even what might be considered a modest amount of data: 500 communities, 10 thousand posts, 20 thousand comments, 5000 users. The testing servers that were in place in June 2023 (enterprise, voyager, etc.) were empty. There were some basic scripts to test posts and comments, but that was feature-focused testing, not testing against tens of thousands of rows.

Lemmy was only tested on nearly empty data; that is how the testing had been done for over 4 years.
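
As a rough illustration of what "populating a modest amount of data" could look like, here is a minimal sketch. It is not Lemmy's test tooling. It uses Python's built-in SQLite instead of PostgreSQL so it runs anywhere, and the schema, the `seed` and `row_counts` helpers, and the generated names are all hypothetical simplifications. Only the four volume numbers come from the text above.

```python
import random
import sqlite3

# Volumes quoted in the comment above as a "modest amount of data".
TARGETS = {"person": 5_000, "community": 500, "post": 10_000, "comment": 20_000}

# Hypothetical, heavily simplified schema; the real Lemmy tables have many more columns.
SCHEMA = """
CREATE TABLE person    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE community (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE post      (id INTEGER PRIMARY KEY, community_id INTEGER NOT NULL,
                        creator_id INTEGER NOT NULL, title TEXT NOT NULL);
CREATE TABLE comment   (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL,
                        creator_id INTEGER NOT NULL, body TEXT NOT NULL);
"""


def seed(conn, targets=TARGETS, rng_seed=42):
    """Create the sketch schema and fill it with synthetic rows."""
    rng = random.Random(rng_seed)  # fixed seed keeps runs repeatable
    users, communities = targets["person"], targets["community"]
    posts, comments = targets["post"], targets["comment"]

    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        ((i, f"user_{i}") for i in range(1, users + 1)),
    )
    conn.executemany(
        "INSERT INTO community (id, name) VALUES (?, ?)",
        ((i, f"community_{i}") for i in range(1, communities + 1)),
    )
    # Each post lands in a random community, written by a random user.
    conn.executemany(
        "INSERT INTO post (id, community_id, creator_id, title) VALUES (?, ?, ?, ?)",
        ((i, rng.randint(1, communities), rng.randint(1, users), f"post {i}")
         for i in range(1, posts + 1)),
    )
    # Each comment attaches to a random post, written by a random user.
    conn.executemany(
        "INSERT INTO comment (id, post_id, creator_id, body) VALUES (?, ?, ?, ?)",
        ((i, rng.randint(1, posts), rng.randint(1, users), f"comment {i}")
         for i in range(1, comments + 1)),
    )
    conn.commit()


def row_counts(conn, tables=tuple(TARGETS)):
    """Return {table: row count} so a test can confirm the volumes."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    seed(db)
    print(row_counts(db))
```

The point is only that a fixed-seed generator like this gives every test run the same non-empty database, so slow queries have a chance to show up before production does the testing for you.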

With ORM-generated SQL statements and a lack of PostgreSQL understanding, there were big problems in the code in May 2023. The site_aggregates table had a SQL TRIGGER whose UPDATE statement had no WHERE clause, so it was hitting the row for every known instance in the database. Testing only created 5 instances, so this performance problem was not noticed. Lemmy-ui, for the admin/operator managing the site, had no page to view this site_aggregates database table.
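
To make the effect of a missing WHERE clause concrete, here is a small sketch. It is not the actual Lemmy trigger. It uses SQLite through Python so it is self-contained, and the two-column `site_aggregates` table, the `post` table, the trigger names and bodies, and the `rows_touched_by_one_post` helper are all simplified assumptions for illustration.

```python
import sqlite3

# Hypothetical, simplified tables: one aggregates row per known instance.
SCHEMA = """
CREATE TABLE site_aggregates (site_id INTEGER PRIMARY KEY,
                              posts   INTEGER NOT NULL DEFAULT 0);
CREATE TABLE post (id INTEGER PRIMARY KEY, site_id INTEGER NOT NULL);
"""

# The kind of bug described above: the UPDATE has no WHERE clause.
UNSCOPED_TRIGGER = """
CREATE TRIGGER post_count AFTER INSERT ON post
BEGIN
    UPDATE site_aggregates SET posts = posts + 1;
END;
"""

# Scoped version: only the row for the site that received the post is written.
SCOPED_TRIGGER = """
CREATE TRIGGER post_count AFTER INSERT ON post
BEGIN
    UPDATE site_aggregates SET posts = posts + 1 WHERE site_id = NEW.site_id;
END;
"""


def rows_touched_by_one_post(trigger_sql, instance_count):
    """Insert a single post and count how many aggregate rows were changed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + trigger_sql)
    conn.executemany(
        "INSERT INTO site_aggregates (site_id) VALUES (?)",
        ((i,) for i in range(1, instance_count + 1)),
    )
    conn.execute("INSERT INTO post (site_id) VALUES (1)")
    (touched,) = conn.execute(
        "SELECT COUNT(*) FROM site_aggregates WHERE posts > 0"
    ).fetchone()
    conn.close()
    return touched


if __name__ == "__main__":
    for instances in (5, 600, 1500):
        print(
            instances,
            rows_touched_by_one_post(UNSCOPED_TRIGGER, instances),
            rows_touched_by_one_post(SCOPED_TRIGGER, instances),
        )
```

With the unscoped trigger, the number of rows written per post grows with the number of known instances, which is why a test setup with 5 instances would never notice it while a server that knows hundreds of instances would.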

Scaling and performance issues were already crashing lemmy.ml in May. They opted to close the doors to new member sign-ups and did massive hardware upgrades on June 13. But the performance problems and crashes did not stop; they only worsened with each new instance going online, while the site_aggregates write-activity bug went undiscovered. To try and solve the crashing, more and more instances were added to the Lemmy network, exploding from 60 to 600 servers, even over 1500 at one point, almost all because of the Reddit July 1 API cutoff. So on June 20 you had lemmy.ml hammering hundreds of instance rows in the site_aggregates database table with SQL UPDATE write activity on every single new comment and post...

[–] [email protected] 1 points 1 year ago

Lemmy had been on GitHub for over 4 years

And in use... lemmy.ml was online for over 4 years before the Reddit May 2023 event, not just on GitHub:

"Lemmy - A link aggregator / reddit clone for the fediverse. github.com" @Joe to Rust Programming • April, 2019. https://lemmy.ml/post/17

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

Given the struggle with PostgreSQL that the project has had for 4 years, and that the two main developers both admit they are not good with SQL relational databases... the classic answer would be to use what are called "NoSQL database" solutions. https://en.wikipedia.org/wiki/NoSQL

Interesting for the whole May 2023 situation: Reddit itself was open source code in 2008 and used the same PostgreSQL database that Lemmy uses! Reddit's creators gave a presentation in May 2010 about how to scale a link aggregator website like Lemmy, and they advised basically using NoSQL techniques with PostgreSQL. https://www.infoq.com/news/2010/05/7-Lessons-Reddit/

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

May and June 2023... "add instances for scaling"

cross-instance community shock built in

More and more servers were added to the Lemmy network, but the issues that Beehaw pointed out were lurking too. The experience of a new Lemmy instance server is:

Despite the wide variety of hardware and budget choices, there is no install testing to establish that PostgreSQL has enough RAM and disk I/O performance to support Lemmy. Lemmy starts out empty, with zero data in the tables, where it performs fine. The server can federate and subscribe to one community at a time on remote servers on the Lemmy network.

Social Experience of a New Instance

Each remote community starts out with a handful of posts but none of the comments, so it is the community in name but not in content. The number of posts, comments, and active users of a community (the community_aggregates database table) is not copied from Beehaw. So a new instance will only show a few users, a few posts, and no comments. The data misrepresents Beehaw's community to new members of this remote instance.

This creates an experience disconnected from the community's own Lemmy instance: the entire history of comments and posts is not there, so the context of what is a repost or duplicate post isn't available to people joining the community on a newly created instance.

Beehaw had been online with content since January 2022, for well over a year before the rumblings of Reddit influx started in May 2023.

So the new Lemmy servers that were fired up in June 2023 - by the hundreds - all started with empty copies of Beehaw communities, devoid of the moderated posts and comments that set an example of what community content looked like. And members of those new servers would just dive in to what they perceived to be an empty community - a place of other Reddit newcomers - just now generating new posts and comments.

The framing and name of the remote instance can present a person with a variety of context clues and expectations. When they switch topics to a Beehaw community, it has all been framed by a non-Beehaw sign-up screening, introduction, and site name. A newcomer can jump right into a conversation topic without having a social sense of what that means...

It was negative experiences with Reddit that were driving the activity: not the seeking of quality content, but a money matter and price increase on the Reddit side. This too set a tone of grievance and a desire to find a rapid replacement for the feed that people had been accustomed to. There were even bots that went online to feed Reddit content into Lemmy...

For an established site like Beehaw, with over a year's worth of cultivated content and experience with Lemmy, this was a shock. And the server crashes and overloads from just the site_aggregates UPDATE bug, caused by new instance rows exploding in the database table, were worsening day by day. That particular bug didn't hit the new instance servers very hard, since they had very few members and were not creating posts and content. But Beehaw, an active server with active local users, was carrying all the load of that bug, plus overloads from trying to send out federated copies of new content to hundreds of once-eager servers that might be shut down and not even bother to read the content.

Beehaw wasn't the only server active with local users before the mass swarm of over a thousand new instance servers went online...

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

I'll point this out: the people who develop Lemmy, the two programmers for 4 years, do not seem active on Lemmy in 2023 to discuss moderation problems, PostgreSQL problems, or anything about how to run a Lemmy server or the errors and problems the site is reporting. I begged them to use [email protected] right on their own server in early June 2023. I created the [email protected] community right on the developer home server, and I created this post on June 7, 2023 in particular to raise alarms about the countdown to the June 30 Reddit API change, how badly Lemmy code was performing, and the data loss from crashes: https://lemmy.ml/post/1166882

I don't think they even bother to create a login on any other Lemmy server to see if the code works in production. I shared issues of missing data from federation not properly sending to remote servers, and they got annoyed. I see no evidence they actually care about data integrity or about database crashes in production causing real comments and posts to get lost. The 4-year-old code doesn't have admin screens to show errors in saving posts to the database, errors with federation delivery failure, etc. For 4-year-old code, they just let errors happen, and sites like Beehaw find out things aren't working "the hard way". There was no tool to know whether federation was working or failing or whether the queue was backing up like an overflowing toilet, nor even a warning about losing queued data if you stopped a server (even for an upgrade).

Data and databases, and using Lemmy itself to test and discuss Lemmy data issues, seem to be things they avoid and find annoying.
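
As a rough illustration of the kind of bookkeeping described above, here is a minimal sketch of outbound delivery that records failures instead of silently dropping them. It is not how Lemmy's federation code works. The `DeliveryLog` type, the `deliver_all` function, the `send` callback, and the retry count are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryLog:
    """What an admin screen could show after a delivery pass."""
    delivered: list = field(default_factory=list)  # instances that accepted the activity
    failed: list = field(default_factory=list)     # (instance, activity_id, last error)


def deliver_all(activity_id, instances, send, max_attempts=3):
    """Try to send one activity to every instance and keep a record of each outcome.

    `send(instance, activity_id)` is a caller-supplied function (an assumption
    of this sketch) that raises OSError, or a subclass such as ConnectionError,
    when the remote server cannot be reached.
    """
    log = DeliveryLog()
    for instance in instances:
        last_error = ""
        for _ in range(max_attempts):
            try:
                send(instance, activity_id)
            except OSError as exc:
                last_error = str(exc)  # remember why this attempt failed
            else:
                log.delivered.append(instance)
                break
        else:
            # Every attempt failed: record it, the way an MTA bounces a message.
            log.failed.append((instance, activity_id, last_error))
    return log
```

Even something this small gives an operator a list of which instances never received which comment or post, which is the "did not deliver" notice that was missing.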