this post was submitted on 29 Jun 2023

Lemmy.World Announcements

So I've been troubleshooting the federation issues with some other admins:

(Thanks for the help)

What we see is that when too many federation workers run at the same time, they slow down so much that they time out and fail.

I had federation workers set to 200000. I've now lowered that to 8192, and set the activitypub_federation log level to debug to get queue stats:

RUST_LOG="warn,lemmy_server=warn,lemmy_api=warn,lemmy_api_common=warn,lemmy_api_crud=warn,lemmy_apub=warn,lemmy_db_schema=warn,lemmy_db_views=warn,lemmy_db_views_actor=warn,lemmy_db_views_moderator=warn,lemmy_routes=warn,lemmy_utils=warn,lemmy_websocket=warn,activitypub_federation=debug"

I also saw that many workers were retrying deliveries to servers that are unreachable, so I've blocked some of these servers:

commallama.social,mayheminc.win,lemmy.name,lm.runnerd.net,frostbyrne.io,be-lemmy.org,lemmonade.marbledfennec.net,lemmy.sarcasticdeveloper.com,lemmy.kosapps.com,pawb.social,kbin.wageoffsite.com,lemmy.iswhereits.at,lemmy.easfrq.live,lemmy.friheter.com,lmy.rndmm.us,kbin.korgen.xyz

This gave good results: far fewer active workers, and therefore fewer timeouts. (I see that timeouts start once there are more than 3000 active workers.)

(If you own one of these servers, let me know once it's back up, so I can un-block it)
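For anyone who wants to check a list like this from their own server, here is a minimal sketch in Python. It is not the method used above. It assumes that "unreachable" can be approximated by a failed TCP connection to port 443 within 5 seconds, and the function names are made up for illustration. A successful connection only shows that something is listening, not that the instance's federation is healthy.

```python
import socket


def parse_host_list(text):
    """Split a comma-separated list of hostnames, dropping blanks and whitespace."""
    return [h.strip() for h in text.split(",") if h.strip()]


def is_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Assumption: port 443 and a 5 second timeout are illustrative defaults.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def find_unreachable(hosts, probe=is_reachable):
    """Return the hosts for which `probe` reports failure, in input order."""
    return [h for h in hosts if not probe(h)]
```

Calling `find_unreachable(parse_host_list("commallama.social,mayheminc.win"))` would return whichever of those hosts could not be reached at that moment. The `probe` argument is there so a different check (for example an HTTP request) can be swapped in.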

It's now after midnight, so I'm going to bed. More troubleshooting will surely follow tomorrow and over the weekend.

Please let me know if you see improvements, or if you still have many issues.

[email protected] 11 points 2 years ago* (last edited 2 years ago)

It's only been a few minutes, but I'm seeing federation requests that aren't timing out in my nginx access log. Hopefully it keeps working.
Also, at least on my instance, lemmy.ml has completely broken; I'm not getting anything from it at all anymore. It dropped out at 13:52:22, and apart from a few messages it's been silent since then. It seems to be working on lemmy.world, so I'm not sure what's causing that.

[email protected] 2 points 2 years ago* (last edited 2 years ago)

How are you monitoring this? Was it just a good look at the nginx log, or something else, like Graylog?

[email protected] 2 points 2 years ago* (last edited 2 years ago)

I just grep'd the nginx access log for the lemmy.world IP address and looked at the access times. You can tell a request is timing out if the response code is 400. Sadly, ~57% of the requests are timing out today. It seems to work for a bit and then time out for about 10 minutes. At least some are coming through now; before, the requests had stopped completely.

lemmy.ml is also back in my logs. Yay!
lemmy.ml has almost no timeouts, so they must have managed to fix it.
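
If you'd rather have a number than eyeball the grep output, the same check can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the log uses nginx's default "combined" format, a 400 response is treated as a timeout (as described above), and the function names and the IP address in the usage note are placeholders.

```python
import re
from collections import Counter

# Start of an nginx "combined" log line:
#   client_ip - user [timestamp] "request line" status ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def status_counts(lines, client_ip):
    """Count response codes for requests coming from `client_ip`."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("ip") == client_ip:
            counts[m.group("status")] += 1
    return counts


def timeout_share(counts, timeout_status="400"):
    """Fraction of counted requests that ended with `timeout_status`."""
    total = sum(counts.values())
    return counts[timeout_status] / total if total else 0.0
```

To use it, open the access log (often `/var/log/nginx/access.log`, but that depends on your setup), pass the file object and the remote instance's IP to `status_counts`, and feed the result to `timeout_share`. Lines that don't match the expected format are skipped.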