this post was submitted on 08 Aug 2023

LemmyScale

Discussions about approaches to help Lemmy scale better and reduce load on instances

Right now, every single vote on a comment or post is federated individually and stored as its own row in PostgreSQL.

Each person and community has a home instance. The only accurate counts of total posts, comments, and votes are on that home instance. And even then, it is theoretically possible that the home of a community could purge older data (or suffer data loss) to save disk space and/or improve performance.

For Lemmy-to-Lemmy federation, I think that instead of federating the votes independently of the posts and comments, instances could share aggregate data directly.

The model for this is how the profile of a person is federated. When a person revises their profile on their home instance, such as a new image or a revised bio, every other instance has to get the updated change. The same happens when the profile of a community is revised, such as changing the image or sidebar of a community.

The code redesign could start out by making person and community aggregate count sharing part of those revisions. Those numbers are not that time-sensitive: the statistics for the number of users, posts, and comments in a community could be behind by many hours without any real impact on the end-user experience of reading posts and comments on Lemmy.
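To make that concrete, here is a rough Python sketch of a receiving instance that takes the counts in the same message as the profile revision. Everything in it (`CommunityUpdate`, `RemoteCommunityCopy`, the field names, the `as_of` timestamp) is made up for illustration and is not Lemmy's schema or its federation format. The only point is that stale numbers are fine as long as an older update never overwrites a newer one.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommunityUpdate:
    # Hypothetical payload: profile fields plus aggregate counts in one message.
    name: str
    sidebar: str
    subscribers: int
    posts: int
    comments: int
    as_of: datetime  # when the home instance computed these values


class RemoteCommunityCopy:
    """What a non-home instance keeps for a community it federates with."""

    def __init__(self, name):
        self.name = name
        self.latest = None  # most recent CommunityUpdate accepted so far

    def apply_update(self, update):
        """Accept the update unless it is not newer than what we already hold."""
        if self.latest is not None and update.as_of <= self.latest.as_of:
            return False
        self.latest = update
        return True


# Usage: a delayed, older update arrives after a newer one and is ignored.
copy = RemoteCommunityCopy("lemmyscale")
newer = CommunityUpdate("lemmyscale", "new sidebar", 27, 10, 40, datetime(2023, 8, 8, 12))
older = CommunityUpdate("lemmyscale", "old sidebar", 25, 9, 35, datetime(2023, 8, 8, 6))
accepted_newer = copy.apply_update(newer)
accepted_older = copy.apply_update(older)
```

Since the counts ride along with a message that is sent anyway, this adds no extra HTTP deliveries; the cost is that counts only refresh when the profile does, so a periodic refresh would probably be needed too.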

With votes and posts it is trickier. But some experiments could be done, such as only sending post aggregates when a comment on that post is created, deleted, or edited, with a more back-fill-oriented bulk operation taking care of discovering and correcting out-of-sync information.
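A sketch of how that experiment might look, again in Python and again with invented names (`PostAggregates`, `build_comment_activity`, `find_out_of_sync` are hypothetical, not Lemmy functions). The home instance attaches its current post totals to each comment activity, and a separate bulk pass compares a snapshot from the home instance against local data to find posts that drifted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostAggregates:
    # Hypothetical totals as computed on the post's home instance.
    post_id: int
    score: int
    upvotes: int
    downvotes: int
    comment_count: int


def build_comment_activity(kind, comment_id, post_aggregates):
    """Home side: attach the current post totals to a comment activity."""
    if kind not in ("create", "update", "delete"):
        raise ValueError(f"unsupported activity kind: {kind}")
    return {"kind": kind, "comment_id": comment_id, "post_aggregates": post_aggregates}


def apply_comment_activity(local_aggregates, activity):
    """Remote side: overwrite the local totals with those carried by the activity."""
    aggregates = activity["post_aggregates"]
    local_aggregates[aggregates.post_id] = aggregates


def find_out_of_sync(local_aggregates, home_snapshot):
    """Back-fill pass: post ids whose local totals are missing or differ."""
    return sorted(
        post_id
        for post_id, aggregates in home_snapshot.items()
        if local_aggregates.get(post_id) != aggregates
    )


# Usage: one comment arrives, then a bulk snapshot reveals drift.
local = {}
activity = build_comment_activity("create", 7, PostAggregates(1, 5, 6, 1, 3))
apply_comment_activity(local, activity)
snapshot = {1: PostAggregates(1, 8, 9, 1, 3), 2: PostAggregates(2, 1, 1, 0, 0)}
stale_ids = find_out_of_sync(local, snapshot)
```

Posts that get votes but no new comments would only be corrected by the back-fill pass, so how often that runs decides how stale scores can get.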

[–] [email protected] 1 points 1 year ago

Performance isn't the only factor. There is also the HTTP connection overhead of federating every single vote on a post or comment, and storing votes individually takes more space than one aggregate per instance.

A hybrid approach could be an option, since all the code for a PostgreSQL row per vote is already there: for example, after 14 days (or whatever), aggregate the votes and move the individual rows out of PostgreSQL into an archive, etc.
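Roughly what I mean by the hybrid, as a Python sketch over plain tuples instead of real PostgreSQL rows. The function name, the tuple layout, and totals keyed per post are my assumptions for illustration; the 14-day cutoff is the example number from above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def roll_up_old_votes(votes, now, max_age=timedelta(days=14)):
    """Keep recent votes as individual rows; collapse older ones into totals.

    `votes` is an iterable of (post_id, score, cast_at) tuples, where score
    is +1 for an upvote and -1 for a downvote.
    Returns (recent_rows, totals_by_post_id).
    """
    cutoff = now - max_age
    recent = []
    totals = defaultdict(lambda: {"upvotes": 0, "downvotes": 0})
    for post_id, score, cast_at in votes:
        if cast_at >= cutoff:
            recent.append((post_id, score, cast_at))  # still a per-vote row
        elif score > 0:
            totals[post_id]["upvotes"] += 1
        else:
            totals[post_id]["downvotes"] += 1
    return recent, dict(totals)


# Usage: three old votes on post 1 collapse into one total; post 2 stays as a row.
now = datetime(2023, 8, 8)
votes = [
    (1, 1, now - timedelta(days=20)),
    (1, -1, now - timedelta(days=15)),
    (1, 1, now - timedelta(days=30)),
    (2, 1, now - timedelta(days=1)),
]
recent, totals = roll_up_old_votes(votes, now)
```

Once a vote is rolled up, who cast it is gone from the live table, so changing or undoing an old vote would need the archive or a rule that old votes are frozen.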