this post was submitted on 15 Jun 2023
189 points (93.5% liked)

My first experience with Lemmy was thinking that the UI was beautiful, and lemmy.ml (the first instance I looked at) was asking people not to join because they already had 1500 users and were struggling to scale.

1500 users just doesn't seem like much. It seems like the type of load you could handle with a Raspberry Pi in a dusty corner.

Are the Lemmy servers struggling to scale because of the federation process / protocols?

Maybe I underestimate how much compute goes into hosting user-generated content? Users generate very little text, but uploaded pictures take more space. Users are generating millions of bytes of content, and it's overloading computers that can handle billions of bytes with ease. What happened? Am I missing something here?

Or maybe the code is just inefficient?

Which brings me to the title's question: Does Lemmy benefit from using Rust? None of the problems I can imagine are related to code execution speed.

If the federation process and protocols are inefficient, then everything is being built on sand. Popular protocols are hard to change. How often does HTTP change? Almost never. The language used for the code doesn't matter in this case.

If the code is just inefficient, well, inefficient Rust is probably slower than efficient Python or JavaScript. Could the complexity of Rust have pushed the devs towards a simpler but less efficient solution that ends up being slower than garbage collected languages? I'm sure this has happened before, but I don't know anything about the Lemmy code.

Or, again, maybe I'm just underestimating the amount of compute required to support 1500 users sharing a little bit of text and a few images?

[–] [email protected] 20 points 2 years ago (3 children)

I maintain and host ntfy.sh, an open source push notification service. I have a constant 9-12k WebSocket and HTTP stream connections going, and I host it on a two-core machine with a load average below 1. So I can happily tell you that it's not WebSockets. Hehe.

My money would be on the federation. Having to blast/copy every single comment to every single connected instance seems like a lot.
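To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the origin instance makes one delivery per activity per connected instance; the function name and the figures are made up for illustration and are not measured from any Lemmy instance.

```python
def outbound_deliveries(activities: int, connected_instances: int) -> int:
    """Deliveries the origin makes if every activity goes to every connected instance."""
    return activities * connected_instances


# Hypothetical figures, for illustration only.
print(outbound_deliveries(activities=10_000, connected_instances=500))  # 5000000
```

The work grows with the product of the two numbers, so it keeps climbing even if the local user count stays flat while the number of connected instances goes up.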

[–] sznio 3 points 2 years ago (2 children)

My money would be on the federation. Having to blast/copy every single comment to every single connected instance seems like a lot.

As far as I know, every server connects to every other server. Allowing for proxying messages through servers would significantly help.

[–] [email protected] 4 points 2 years ago (1 children)

I agree.

Random ideas:

The Kademlia protocol (a DHT) has a thing that associates ownership of data to the 20 closest nodes in a P2P network. If an approach like this were used, the load would be spread across those 20 nodes. I implemented that like 15 years ago or so. It was a ton of fun.
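A minimal sketch of the "20 closest nodes" idea, assuming every node ID is known up front. Real Kademlia finds these nodes with an iterative lookup over routing tables, which is left out here. The instance names, the key string, and the use of SHA-1 just to get 160-bit IDs are all illustrative assumptions.

```python
import hashlib

K = 20  # number of closest nodes that hold a given key


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name (SHA-1 is used here only to get 160 bits)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(key: int, nodes: list[int], k: int = K) -> list[int]:
    """Return the k node IDs with the smallest XOR distance to key."""
    return sorted(nodes, key=lambda n: n ^ key)[:k]


# Hypothetical network of 100 instances and one piece of content.
nodes = [node_id(f"instance-{i}.example") for i in range(100)]
key = node_id("post/12345")
owners = closest_nodes(key, nodes)
print(len(owners))  # 20
```

Because the owners are picked by distance to the content's key, different posts land on different sets of 20 nodes, which is what spreads the load.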

Another, simpler approach is what you suggested: simple caching of, and relaying through, other nodes, though that does not settle the topology of the network. How would an instance decide where to get its data from (a star, a tree, at random, ...)? How would it be authenticated (easy to solve)? Lots of fun problems to solve. Not so fun, though, if you already have a pile of other problems...

[–] sznio 3 points 2 years ago* (last edited 2 years ago)

How would an instance decide where to get its data from (a star, a tree, at random, …)?

I thought of it like this:

  • Each instance can optionally work as a relay for other instances - this relation is called "friendship".
  • Each instance defines a friend list on their own.
  • Whenever an instance is a friend of another instance, it publishes that information for everyone to see.
  • When an instance receives information from a friend, it sends it to its own friends.
  • When an instance sends information, it:
    • Creates a "send queue" that contains all the instances it wants to keep informed of its own activity.
    • Shuffles the order of the queue.
    • Iterates over the instances in that queue, and for each one:
      • Sends the information to that instance.
      • Checks if that instance is its friend.
      • Checks if it itself is a friend of that instance.
      • If so, considers that instance's friends as already informed, thus removing them from the send queue.
      • Else, just proceeds normally.
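The send-queue steps above could be sketched roughly like this. It is only an illustration: `plan_sends`, the `friends` mapping, and the instance names are hypothetical, and it assumes "if that's true" means the friendship holds in both directions. The actual network send is left out; the function only returns which instances would be contacted directly.

```python
import random


def plan_sends(sender, recipients, friends, rng=None):
    """Return the instances `sender` contacts directly, in send order.

    `friends` maps an instance name to the set of instances it lists as friends.
    """
    rng = rng or random.Random()
    queue = list(recipients)
    rng.shuffle(queue)  # randomise who gets contacted first
    pending = set(queue)
    direct = []
    for target in queue:
        if target not in pending:
            continue  # already covered by an earlier relay
        pending.discard(target)
        direct.append(target)  # a real implementation would send here
        # Assumption: relaying needs friendship in both directions.
        mutual = (target in friends.get(sender, set())
                  and sender in friends.get(target, set()))
        if mutual:
            pending -= friends[target]  # the relay is trusted to inform these
    return direct


# Hypothetical instances: "b" relays for "a" and reaches "c" and "d".
friends = {"a": {"b"}, "b": {"a", "c", "d"}, "c": set(), "d": set()}
print(plan_sends("a", ["b", "c", "d", "e"], friends))
```

How much this saves depends on the shuffle: if "b" happens to come up first, "c" and "d" are skipped; if it comes up last, nothing is saved on that round.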

If an instance misbehaves by not relaying messages despite claiming to be doing so - unfriend it.

How would it be authenticated

Each instance publishes a public key that you can use to verify relayed messages.

I should probably start helping out with Lemmy development - it feels like there are RFCs to be written and interesting problems to be solved. Much more interesting than what I'm doing at work.