Does your server have enough power and workers to handle all the federated messages? Or is it constantly at 100% CPU?
The machine is a dedicated server with 6 cores, 12 threads, all of which are usually under 10% utilization. Load averages are currently 0.35, 0.5, 0.6. Maybe I need to add more workers? There should be plenty of raw power to handle it.
Yeah, that sounds like enough to handle the load. How many workers do you use? And do you see any errors in your logs about handling messages? You could try searching for that particular thread to see if all replies were handled correctly.
Update: Did a `-f` watch of the logs for WARN messages while upping worker counts. 1024 seemed to be the sweet spot; after upping further to 1500, the warnings for expired headers have largely stopped. So it seems this was the solution.
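In case it helps anyone else, here's roughly what that looked like. A minimal sketch, assuming a docker compose setup where the container is named lemmy; adjust names and paths for your install.

```sh
# Follow the lemmy container's logs and surface federation warnings
# while tuning; docker logs writes to both stdout and stderr.
docker logs -f lemmy 2>&1 | grep -i 'WARN'

# After each worker count change in lemmy.hjson (federation.worker_count,
# as discussed in this thread), restart so the new value takes effect:
docker compose restart lemmy
```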
Thanks for your help!
I had it at the default 64, but I'll try 512 and see if that helps. Nginx is configured to use 768, so I doubt there's any bottleneck there. I did notice in the troubleshooting page there's mention of searching the logs for "Activity queue stats," but a grep of the docker log shows no results for that string.
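For reference, this is the grep I mean, assuming the container is named lemmy; if your container name differs, so will the command.

```sh
# Search the container's entire log history for the stats line mentioned
# in the troubleshooting page; no output means it was never logged.
docker logs lemmy 2>&1 | grep 'Activity queue stats'
```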
I haven’t noticed it happening. But haven’t checked much.
What I have noticed is that some of the overloaded and larger instances can be slow to post comments to, to subscribe to, to post threads on, etc., especially from a separate federated instance.
Lemmy.world is easily the one I've noticed most, along with lemmy.ml and occasionally beehaw (but much less so).
My guess is that in general those instances may be slow to sync/update data or respond.
I'm also seeing this issue on the two instances you've mentioned. I'm not sure if it's just an overload issue, or if there's a more fundamental issue with the way I'm setting things up. One way around it: if I see a comment I really want to interact with outside of my own instance, I can copy the link from the fediverse icon and then search for it. The comment (along with its parents) will eventually pop up on my instance. Not ideal, since I'd still have to venture out of my own instance to discover said comment chain in the first place, but at least it provides a way to interact for now.
I would just give it time. I think those instances have some scaling issues and things take time to sync.
Do you have other users on your instance?
I noticed it took a day or two to “catch up” as I added and federated with new communities on these instances.
Again, I haven’t really dug in. They have seemed okay (I do have accounts on those instances too). It seems once everything is “caught up” and it’s just incremental it goes smoother.
But are you seeing any resource constraints on your instance? Like cpu or ram?
All by myself. Plenty of room for activity. We'll see if it catches up or just ends up creating a larger divergence! And yeah, I do have an account on lemmy.world as well, so it's just an extra song and dance for now.
Oh I see your account is only 14 hours old. Yeah I would give it another 24-36 hours to do pulls and look then.
Every time I add a community, it starts with all the posts showing 0 comments. Then after a while they sync up. I used fediverse.net to just start pulling all sorts of communities. But at this point it seems okay. My instance has been up for a few days now.
New instances are popping up all over so those bigger ones have a lot of servers syncing with them.
Yeah, there are ways around the de-sync (albeit super manual) for now, so I’m just waiting and seeing for the time being :)
Head to https://lemmyverse.net and click the Home button at the top right and type in the URL of your instance.
Flip from instance to community at the top. Then you can click on a community name and open it in a new tab.
Here's one as an example.
If you get a 404 like this, you haven't loaded/synced it yet.
So you need to go to your instance's search and copy the !link below it into the search box. Like so.
Mash the search button a few times and the community will show up. Some of the instances are overloaded and slow as hell though, so be patient. Sometimes I have to change the search filter from All to Community and back.
Now the community will show up in your instance and slowly start syncing. Note the 0 Comments on everything. The syncing is slow right now.
Also, for kbin instances (i.e. kbin.social and fedia.io), the ! format doesn't seem to work.
For those I have to just search the full URL. Like so.
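As an aside, the same trick can be driven through the API instead of the search box. A hedged sketch: the resolve_object endpoint exists in the v3 API, but the instance URL and token below are placeholders, and the auth query parameter is the 0.18-era style (newer versions expect an Authorization header instead).

```sh
# Ask your own instance to fetch a remote community by URL; this is
# roughly what repeatedly searching for the community triggers.
# your.instance and YOUR_JWT are placeholders.
curl -s -G "https://your.instance/api/v3/resolve_object" \
  --data-urlencode "q=https://lemmy.world/c/selfhosted" \
  --data-urlencode "auth=YOUR_JWT"
```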
I think you mean how to pull communities from other instances? Just search for the link or descriptor of the community, e.g. [email protected], and it will be pulled. It takes some time for the result to show up though; I usually have to search again after a few minutes to actually find it.
I’m not sure what you mean by pull instances. I’m really brand spanking new at this. Sorry!
So, looking more: yeah, it may be that the more active threads are behind. Here's an example that's out of sync on my instance right now.
This arises from the good ol' issue of everybody migrating to the same three or four big servers, which end up overloaded with their own users and can't keep up sending updates to other instances.
I remember the same happening to Mastodon during the first few waves of exodus, until a combination of people not staying, stronger servers, and software improvements settled the issue.
I can barely get updates from lemmy.ml, and lemmy.world isn't much better.
Beehaw seems to perform okay.
About half of the communities on lemmy.ml I subscribed to are on "Subscribe Pending" and have been since I started this server.
I've noticed something similar on my instance in some cases as well. Nothing obvious logged as errors either. It just seems like the comment was never sent. In my case cpu is minimal so it doesn't seem like a resource issue on the receiving side.
I suspect it may be a resource issue on the sending side; potentially they're not able to keep up with the number of subscribers. I know there was some discussion from the devs about the number of federation workers needing to be increased to keep up, so that's another possibility.
It's definitely problematic though. I was contemplating implementing some kind of resync of this entire post and all its comments via the Lemmy API to get things back in sync. But if it is a sending-server resource issue, I'm also hesitant to add a bunch more API calls to the mix. I think some kind of resync functionality will be necessary in the end.
I seriously thought I was alone with this issue, but it seems it's fairly common for people hosting on their own. Same as you guys: it won't sync everything, and some communities are even "stuck" with posts from a day back, even though many new ones have been posted since.
Kind of an off-topic question, but I guess it's related? Is there anyone else who can't pull a certain community from an instance? I can't seem to pull [email protected] or anything from that community, including posts and comments. No matter how many times I try, it won't populate on my instance.
EDIT: Caught this in my logs:
```
lemmy | 2023-06-20T08:48:21.353798Z ERROR HTTP request{http.method=GET http.scheme="https" http.host=versalife.duckdns.org http.target=/api/v3/ws otel.kind="server" request_id=cf48b226-cba2-434a-8011-12388c351a7c http.status_code=101 otel.status_code="OK"}: lemmy_server::api_routes_websocket: couldnt_find_object: Failed to resolve actor for [email protected]
```
EDIT2: Apparently it's a known issue with [email protected], and a bug to be fixed in a future release.
I have the exact same issue with my own instance. On the post you mentioned, I'm seeing 383 comments on lemmy.world, but my own instance only shows 128 comments.
I've noticed the same situation in some threads on my own instance too. But I'm under the impression that it might just be backlogged on the responsible instance that's supposed to send out the federated content. I've noticed this when just having my home feed set to New and then suddenly seeing like thirty posts from lemmy.world come across all at once with widely varied timestamps.
I suppose the best way to test if this is the case would be to note down any threads that are missing substantial amounts of comments on your local server and then check back with that thread periodically to see if and when they start to fill in.
Even this post is doing it to me. On your instance this post has 12 comments, on my instance it has 4 comments.
I have the same issue and I also get the warning for the expired headers. I have tried increasing federation.worker_count (to 99999) and the nginx workers (to 10000), but the issue still occurs for me.
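For reference, these are the two knobs I mean. The hjson key is the one named in this thread and may differ between Lemmy versions, and the nginx layout is just my setup; the config excerpts are shown as comments in a shell snippet for illustration.

```sh
# lemmy.hjson excerpt:
#   federation: {
#     worker_count: 99999
#   }
#
# nginx.conf excerpt:
#   worker_processes auto;
#   events {
#     worker_connections 10000;
#   }
#
# Apply the changes (compose service and host-level nginx assumed):
docker compose restart lemmy
sudo nginx -s reload
```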
There are also a lot of comments missing for me on my own instance; I have to view this post on lemmy.world to see them all.
I've noticed some similar issues on my instance, but I'm wondering if it's related to how much strain is currently on the bigger instances like lemmy.world or beehaw.
From what I read in the troubleshooting guide, if their worker count isn't high enough, the issue can start on their end, too.
Maybe one day the servers will implement a call to backtrack and fetch missing content, because I could see federation failures like this being a big sticking point for wide adoption.
It's possible to do via the API as-is: connect to the first instance, find the discrepancy, then call resolve_object on your home instance for each missing item. But that would require an individual API call for every missing object, which would be painful for big instances.
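A rough sketch of what that loop could look like, assuming you've already diffed the two instances into a file of missing object URLs. The instance URL, token, and filename are placeholders, and the auth query param is again the 0.18-era style (newer versions use an Authorization: Bearer header instead).

```sh
#!/bin/sh
# Hypothetical resync loop: ask the home instance to resolve each missing
# comment/post URL from its origin server, one API call per object.
HOME_INSTANCE="https://your.instance"   # placeholder
JWT="your-login-token"                  # placeholder
while IFS= read -r url; do
  curl -s -G "$HOME_INSTANCE/api/v3/resolve_object" \
    --data-urlencode "q=$url" \
    --data-urlencode "auth=$JWT" > /dev/null
  sleep 2   # throttle: every call also makes the origin server do work
done < missing_urls.txt
```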
Watching. I’m noticing the same thing on my instance. In fact I don’t see any comments or upvotes coming through. Just the initial post.
Viewing this post on my own instance as well as a few other non-lemmy.world instances shows only 18/19 comments. When I look at it on lemmy.world I see that it has 31. It goes up to 32 when you include this comment so it looks like it probably isn’t a problem on your end.