this post was submitted on 12 Jun 2023
140 points (91.2% liked)

submitted 2 years ago* (last edited 2 years ago) by Averrin to c/selfhosted
 

Correct me if I'm wrong. I read the ActivityPub standards and dug a little into the Lemmy source to understand how federation works, and I'm a bit disappointed. Every server just has a cache and the ability to fetch something from another known server. So if you start your own instance, it brings no benefit to the whole network until it has a significant share of the audience (private instances or servers with no users, for example, add nothing). Are there any "balancers" to make use of these empty instances? Should we promote (or create in the first place) a way to passively help Lemmy cope with such fast growth?

[–] [email protected] 20 points 2 years ago (2 children)

it could have been done much better.

Care to expand on this point?

[–] [email protected] 5 points 2 years ago (2 children)

Disclaimer: I've only looked a bit at the protocols and high-level descriptions of how it works, and this is just my understanding of it. But it seems to track.

Let's take [email protected] for example. Right now lemmy.world is the Source of Truth for it, which means that if you subscribe to it from a different host, let's say myawersomeinstance.com, that host first contacts lemmy.world, copies over the posts, and then subscribes to new posts. I'm actually not 100% sure whether lemmy.world contacts myawersomeinstance.com when there's a new post, or myawersomeinstance.com polls lemmy.world. But anyway, the point is that lemmy.world is the authority on it. myawersomeinstance.com also has the [email protected] data, but only as a copy; lemmy.world is the only authority. So if you post something, your server sends it to lemmy.world and waits for a reply. Then lemmy.world contacts every instance that has at least one user following the community to tell it about the new post. And that new post now exists in a few hundred databases.
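A minimal sketch of the flow described above, assuming the origin pushes new posts to its followers (the comment itself isn't sure whether it's push or poll). `Instance`, `CommunityOrigin` and the host `other.example` are made-up names for illustration, not anything from Lemmy's code:

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A federated server holding a cached copy of a remote community."""
    host: str
    local_copy: list = field(default_factory=list)

    def receive(self, post: str) -> None:
        # Store the copy pushed by the origin (assumed push delivery).
        self.local_copy.append(post)


@dataclass
class CommunityOrigin:
    """The single authoritative server for one community."""
    host: str
    posts: list = field(default_factory=list)
    followers: list = field(default_factory=list)

    def subscribe(self, instance: Instance) -> None:
        # A new subscriber first copies the existing posts, then follows.
        instance.local_copy.extend(self.posts)
        self.followers.append(instance)

    def submit(self, post: str) -> int:
        # Every post goes through the origin, which fans it out to all followers.
        self.posts.append(post)
        for follower in self.followers:
            follower.receive(post)
        return len(self.followers)  # deliveries the origin had to make


origin = CommunityOrigin("lemmy.world")
mine = Instance("myawersomeinstance.com")
other = Instance("other.example")

origin.submit("first post")          # nobody follows yet
origin.subscribe(mine)               # copies "first post" over
origin.subscribe(other)
deliveries = origin.submit("second post")

print(deliveries, mine.local_copy)
```

The thing to notice is that `submit` is the only path a post can take, so the origin's work grows with the number of followers.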

The problem is the scaling is whack. Okay, you can have 5000 federated servers with users subscribed to [email protected], but that means lemmy.world needs to update 5000 servers per post, there'll be 5000x the storage used for that post, and ALL 5000 servers contact lemmy.world to get the new good stuff.
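Putting rough numbers on that, as back-of-the-envelope arithmetic only. `fanout_cost` is a hypothetical helper, and it assumes one delivery and one stored copy per subscribed server per post:

```python
def fanout_cost(subscribed_instances: int, posts: int = 1) -> dict:
    """Count what the origin sends and what the network stores."""
    return {
        # One outbound delivery from the origin per subscribed server per post.
        "origin_deliveries": subscribed_instances * posts,
        # One cached copy on each subscribed server per post.
        "stored_copies": subscribed_instances * posts,
    }


cost = fanout_cost(5000)
print(cost)  # {'origin_deliveries': 5000, 'stored_copies': 5000}
```

Both figures scale linearly with the number of subscribed servers, and all of the delivery work lands on the one origin.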

Frankly, it's a scaling nightmare. As for a different approach, you could have lemmy.world sign its updates with a private key and let the other instances fetch the new data from each other, verifying it against the public key. That would also allow more relaxed caching, since it would generally be lower cost to re-fetch the data. Right now you need aggressive caching because you don't want lemmy.world to keel over and die from every server on the planet wanting to hear the latest and greatest posts all the time.
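To show why signing would let instances trust data relayed by a peer, here's a rough sketch. It uses textbook RSA with a toy key built from two well-known Mersenne primes, picked only because it runs on the standard library; it is NOT secure and is not how Lemmy or ActivityPub sign anything. All the names (`sign`, `verify`, the example payloads) are assumptions for illustration:

```python
import hashlib

# Toy RSA key from two known Mersenne primes. Insecure, illustration only.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q                                # public modulus
E = 65537                                # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent, kept by the origin


def digest(payload: bytes) -> int:
    # Hash the payload and reduce it into the range the toy key can sign.
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % N


def sign(payload: bytes) -> int:
    # Only the origin (holder of D) can produce this value.
    return pow(digest(payload), D, N)


def verify(payload: bytes, signature: int) -> bool:
    # Anyone with the public key (N, E) can check it, wherever the data came from.
    return pow(signature, E, N) == digest(payload)


# The origin signs the post once.
post = b"new post in the community"
sig = sign(post)

# Some other instance relays (post, sig); the receiver never contacts the origin.
relayed_ok = verify(post, sig)
tampered_ok = verify(b"new post in the community (edited by the relay)", sig)
print(relayed_ok, tampered_ok)
```

Since a relay can't forge a valid signature for altered content, it matters much less which server you fetched the post from.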

[–] [email protected] 3 points 2 years ago* (last edited 2 years ago)

Thanks for the in-depth write-up! I haven't looked too far into the docs or the subscription model, but is this a fault on Lemmy's end, or is this a function of how ActivityPub handles federated communication? (I'm very new to ActivityPub/federation, just now reading through the ActivityPub docs)

I do like your idea of distributed replication via keys, much better than what I had brainstormed.

Edit: yeah, it does look like it's a function of ActivityPub. I wonder if there's a more scalable federation protocol out there.

[–] [email protected] 2 points 2 years ago (1 children)

Could lemmy.world put a load balancer in front and use that to direct requests to different instances of lemmy.world? Not sure if that question is dumb; I'm not a technical guy.

[–] [email protected] 3 points 2 years ago

It's not dumb at all, and it's a common scaling technique. But the software needs to support it, and I have no idea if Lemmy supports running multiple instances behind one server name.

[–] TinfoilBeanieTech 1 points 2 years ago

Seeing Lemmy groan under the influx of new users, at numbers still much smaller than established centralized apps, made me start wondering how it would scale to a couple of orders of magnitude more. I’ve only started diving into the code and architecture, but I’m worried that as the number of instances grows they’ve got an N! connection problem going. This is not a simple problem to fix for a federated system, but it’s got to be addressed eventually.