this post was submitted on 19 Aug 2024
18 points (100.0% liked)

Fedia Discussions


Hi all. I've been having some problems keeping fedia.io running. At the moment, either the message workers or the PHP web server processes die after an hour or so, and I have to restart everything. I have been working with the Mbin team and installed some updates that we hoped would fix the problems, but no luck. I am going to work on a cron job to automatically restart things once an hour. The downside is that you'll likely see some 500 errors if you happen to hit the site while the processes are restarting, but the restart should be quick, and refreshing the page should make it work again.

[–] [email protected] 2 points 1 month ago (1 children)

I've noticed recently that I'm getting errors trying to vote on any posts in a discussion I've had open for more than maybe a minute (I haven't actually timed it). I don't remember it from before these issues, but I also switched to this instance just before. Might it be related?

[–] [email protected] 3 points 1 month ago (3 children)

It is possible. I will investigate and work on a fix

[–] [email protected] 2 points 1 month ago (1 children)

Thanks. Let me know if there's anything I can do to help

[–] [email protected] 2 points 3 weeks ago

Let me know if I can help too.

[–] [email protected] 2 points 3 weeks ago (2 children)

This always fails for me: https://fedia.io/ecf/7236913?choice=1

Normally, if I refresh a page once and immediately vote, it works. In this case, it has never worked.

This happens periodically, and it does not seem to be specific to any instance (I've seen it on posts involving several instances, whether that is the OP's instance or the commenter's).

My gut says it is potentially a timezone issue somewhere, and my offset (UTC+9) is far enough out to trigger it. I have no evidence for that. Looking at the request and response in dev tools hasn't yielded anything particularly useful, so far as I can tell.

[–] [email protected] 4 points 3 weeks ago (2 children)

I moved fedia.io away from Fastly. I have a nagging feeling the problem has something to do with Fastly. Can you let me know if you continue to see this?

[–] [email protected] 2 points 3 weeks ago (1 children)

I found:

[2024-09-12T20:42:54.414611+02:00] request.ERROR: Uncaught PHP Exception Symfony\Component\HttpKernel\Exception\BadRequestHttpException: "Invalid CSRF token" at AbstractController.php line 39 {"exception":"[object] (Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException(code: 0): Invalid CSRF token at /var/www/kbin.melroy.org/html/src/Controller/AbstractController.php:39)
[stacktrace]
#0 /var/www/kbin.melroy.org/html/src/Controller/FavouriteController.php(24): App\\Controller\\AbstractController->validateCsrf()
#1 /var/www/kbin.melroy.org/html/vendor/symfony/http-kernel/HttpKernel.php(183): App\\Controller\\FavouriteController->__invoke()
#2 /var/www/kbin.melroy.org/html/vendor/symfony/http-kernel/HttpKernel.php(76): Symfony\\Component\\HttpKernel\\HttpKernel->handleRaw()
#3 /var/www/kbin.melroy.org/html/vendor/symfony/http-kernel/Kernel.php(182): Symfony\\Component\\HttpKernel\\HttpKernel->handle()
#4 /var/www/kbin.melroy.org/html/vendor/symfony/runtime/Runner/Symfony/HttpKernelRunner.php(35): Symfony\\Component\\HttpKernel\\Kernel->handle()
#5 /var/www/kbin.melroy.org/html/vendor/autoload_runtime.php(29): Symfony\\Component\\Runtime\\Runner\\Symfony\\HttpKernelRunner->run()
#6 /var/www/kbin.melroy.org/html/public/index.php(7): require_once('...')
#7 {main}
"} []

And you found:

{"message":"Uncaught PHP Exception Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException: \"Invalid CSRF token\" at AbstractController.php line 39","context":{"exception":{"class":"Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException","message":"Invalid CSRF token","code":0,"file":"/var/www/mbin/src/Controller/AbstractController.php:39"}},"level":400,"level_name":"ERROR","channel":"request","datetime":"2024-09-12T18:54:45.620576+00:00","extra":{}}
{"message":"Uncaught PHP Exception Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException: \"Invalid CSRF token\" at AbstractController.php line 39","context":{"exception":{"class":"Symfony\\Component\\HttpKernel\\Exception\\BadRequestHttpException","message":"Invalid CSRF token","code":0,"file":"/var/www/mbin/src/Controller/AbstractController.php:39"}},"level":400,"level_name":"ERROR","channel":"request","datetime":"2024-09-12T18:54:45.803347+00:00","extra":{}}

Not sure yet what the root-cause is. But it's on our radar now.
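
For anyone who wants to tally these on their own instance, here is a minimal Python sketch. It assumes the log is written as one JSON object per line, in the shape of the second excerpt above (with `message` and `datetime` keys). The function name `count_csrf_errors` and the sample lines are illustrative only and are not part of Mbin.

```python
import json
from collections import Counter


def count_csrf_errors(lines):
    """Count 'Invalid CSRF token' errors per minute in JSON-lines log output.

    Assumes each log line is one JSON object with 'message' and 'datetime'
    keys, as in the excerpt above. Lines that are not JSON objects
    (for example plain-text stack traces) are skipped.
    """
    per_minute = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if "Invalid CSRF token" in record.get("message", ""):
            # '2024-09-12T18:54:45.620576+00:00' -> '2024-09-12T18:54'
            per_minute[record.get("datetime", "")[:16]] += 1
    return per_minute


# Hypothetical sample input modelled on the log lines quoted above.
sample = [
    json.dumps({
        "message": 'Uncaught PHP Exception: "Invalid CSRF token" at AbstractController.php line 39',
        "level_name": "ERROR",
        "datetime": "2024-09-12T18:54:45.620576+00:00",
    }),
    json.dumps({
        "message": 'Uncaught PHP Exception: "Invalid CSRF token" at AbstractController.php line 39',
        "level_name": "ERROR",
        "datetime": "2024-09-12T18:54:45.803347+00:00",
    }),
    "#7 {main}",  # stack trace residue, not JSON
    json.dumps({
        "message": "Some unrelated error",
        "level_name": "ERROR",
        "datetime": "2024-09-12T18:55:01.000000+00:00",
    }),
]

csrf_counts = count_csrf_errors(sample)
print(csrf_counts)
```

Grouping by minute makes it easier to line the errors up against a user report such as "it failed when I voted just now". To run it against a real file, pass an open file handle instead of `sample`.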

[–] [email protected] 3 points 3 weeks ago (1 children)

Y'all are great. Feel free to ask if you need me to try anything. I haven't touched PHP in years, but I am a software engineer, so feel free to be as technical as you'd like.

[–] [email protected] 3 points 3 weeks ago (1 children)

We can definitely use more developers. There are currently only two of us: me and bentigorlich (debounced recently left, as did e-five). I also hadn't used Symfony (the PHP framework behind it) before, but I have those skills now. So no worries, we are happy to help you. You can join us on Matrix, so it's easier to chat and discuss: Mbin Matrix space. I hope to see you there!

EDIT: GitHub repo is at: https://github.com/MbinOrg/mbin

[–] [email protected] 2 points 3 weeks ago (1 children)

Sorry you also went through this: kbin.social (died) -> kbin.run (died) -> fedia. Kbin.run was the instance run by debounced, mentioned earlier.

[–] [email protected] 3 points 3 weeks ago (1 children)

This annoys me about the fediverse - people take a chance on coming here and then repeatedly get left in the dark when their instance is shut down. That's why I was so very happy when you and others helped me get fedia.io back to healthy.

[–] [email protected] 1 points 3 weeks ago

Agreed. This is also why I didn't (yet) rename kbin.melroy.org to mbin.melroy.org. And also created: https://github.com/MbinOrg/mbin/issues/1126

[–] [email protected] 2 points 3 weeks ago (2 children)

Still getting it very frequently. Sometimes no amount of refreshing will allow me to vote on something. Here's the latest URL: https://fedia.io/ef/1184232?choice=1

[–] [email protected] 2 points 3 weeks ago (2 children)

For now try Firefox or a fork: Floorp, LibreWolf, etc. I heard that works better.. I know this isn't the solution, but that is the best workaround atm.

[–] [email protected] 2 points 3 weeks ago

I only use Firefox at this time, so that shouldn't be it.

[–] [email protected] 2 points 3 weeks ago (1 children)

Most interesting: the problem had only been happening on MS Edge on my laptop. I have been using Safari on my phone without issue. Just a bit ago, I refreshed the page, and now every time I revisit the site I have to log back in, just like on Edge. It’s like the old session expired and the new ones aren’t sticking. I’ll try FF on my phone.

Note: even in the time I started typing this reply to when I hit the “add comment” button, I got logged out

[–] [email protected] 1 points 3 weeks ago (2 children)

Note: even in the time I started typing this reply to when I hit the “add comment” button, I got logged out

That is really bad indeed. And the only error you see on the server side is "Invalid CSRF token"?

[–] [email protected] 2 points 3 weeks ago (1 children)

OK - I just had it happen again while looking at the logs. Interestingly, there was NOT a CSRF log entry when that happened. There were a bunch of other errors, but few enough that I could look through all of them and see that they were all related to ActivityPub issues - signaturevalidator and the like.

[–] [email protected] 1 points 3 weeks ago (1 children)

I really hope it's not a session issue with Valkey or something (I don't think so). I think we are now going deep into this issue, covering both sessions and CSRF, since I have already noticed some weird config issues with the CSRF forms.

[–] [email protected] 2 points 3 weeks ago (1 children)

FYI. Reading: https://symfony.com/doc/7.2/security/csrf.html#installation

The tokens used for CSRF protection are meant to be different for every user and they are stored in the session. That's why a session is started automatically as soon as you render a form with CSRF protection.

Moreover, this means that you cannot fully cache pages that include CSRF protected forms. As an alternative, you can:

  • Embed the form inside an uncached ESI fragment and cache the rest of the page contents;
  • Cache the entire page and load the form via an uncached AJAX request;
  • Cache the entire page and use hinclude.js to load the CSRF token with an uncached AJAX request and replace the form field value with it.
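
To make the caching point concrete, here is a minimal Python sketch of the failure mode the Symfony docs describe. It is a toy model under stated assumptions, not Symfony's or Mbin's implementation: `Session`, `render_vote_form`, `cached_page`, and `extract_token` are all hypothetical names, and whether Mbin actually caches pages this way is not confirmed in this thread.

```python
import hmac
import secrets


class Session:
    """Stand-in for a server-side session that holds one CSRF token."""

    def __init__(self):
        self.csrf_token = secrets.token_hex(16)


def render_vote_form(session):
    """Render a form fragment that embeds the session's CSRF token."""
    return f'<input type="hidden" name="token" value="{session.csrf_token}">'


def validate_csrf(session, submitted_token):
    """Accept the request only if the token matches the one in the session."""
    return hmac.compare_digest(session.csrf_token, submitted_token)


page_cache = {}


def cached_page(url, session):
    """Full-page cache keyed only by URL (the problem being illustrated)."""
    if url not in page_cache:
        page_cache[url] = render_vote_form(session)
    return page_cache[url]


def extract_token(html):
    """Pull the token value back out of the rendered form."""
    return html.split('value="')[1].split('"')[0]


alice, bob = Session(), Session()

alice_html = cached_page("/post/1", alice)  # rendered with Alice's token
bob_html = cached_page("/post/1", bob)      # served from cache: still Alice's token

alice_ok = validate_csrf(alice, extract_token(alice_html))
bob_ok = validate_csrf(bob, extract_token(bob_html))
print(alice_ok, bob_ok)
```

In this model the first visitor's vote validates, while anyone served the cached copy submits a token that does not match their own session and gets rejected. The three alternatives listed above all avoid this by keeping the token itself out of the shared cache.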
[–] [email protected] 1 points 3 weeks ago (1 children)

So we might cache too much in Mbin.. Including the comments (vote forms)... oopsy?

[–] [email protected] 1 points 3 weeks ago (1 children)

Or remove CSRF protection and keep the cache. It's a trade-off. @[email protected] How much protection does CSRF on these forms really give the user? I'm "just" the software engineer, you are the SecOps expert here. I mean, how likely is it really that sites are doing a Cross-Site Request Forgery?

[–] [email protected] 3 points 3 weeks ago (1 children)

it's hard to make a blanket statement, because it depends on the details of the application. CSRF attacks are definitely real and common, but using csrf tokens isn't critical in every application. For example, I think we have CORS headers enabled, I don't think we have functionality that allows embedded iframes, but we do allow links - if we have administrative functions that can be triggered solely with GET parameters, then someone could trick an administrator into doing something that caused damage by clicking on a link in a post. The only one that would obviously work that I can see is "logout", which would be annoying, but not world ending, and would work for everyone, not just administrators.

[–] [email protected] 1 points 3 weeks ago

Thanks. I see. I do see the importance of having CSRF protection on login and logout forms. But it does seem a bit overkill to have it on upvotes, boosts, and the like. I could be wrong.

[–] [email protected] 1 points 3 weeks ago (1 children)

I have so many errors in prod.log that it's hard to tell for certain, but when I try to filter out those that are associated with failed federation events, that seems to be what I'm left with. I am trying again to see if I can confirm.

[–] [email protected] 1 points 3 weeks ago (2 children)
[–] [email protected] 1 points 3 weeks ago (1 children)

I do not have 2fa turned on right now.

[–] [email protected] 1 points 3 weeks ago (1 children)

OK, that rules out at least the 2FA code. Thanks for letting me know. So what is your password ;P?

[–] [email protected] 1 points 3 weeks ago (1 children)

Indeed. I am trying to get it to happen again now that I’ve got the logs filtered down to a manageable level.

[–] [email protected] 2 points 3 weeks ago

If you want to know.. We did try to clean-up all those errors/warnings from the log and fix some of the issues in the main branch: https://github.com/MbinOrg/mbin/commits/main/.. We are not there yet obviously. But 1.7.x is now focusing on making Mbin more stable. @[email protected] is helping out as well here.

[–] [email protected] 1 points 3 weeks ago (1 children)

Could you join the conversation here? https://github.com/MbinOrg/mbin/pull/1130. We really are trying hard to debug this issue: both the CSRF form issue and the logout issue.

[–] [email protected] 1 points 3 weeks ago

Will do. This morning I have work to do outside.

I will also note that there are three patterns when I post a comment that may or may not be related:

  • it just publishes when I hit the button
  • I hit the button, it thinks for a second, and then the button is interactable again. Pushing it again has worked in every case so far (i.e. it seems something goes wrong, but there is no UI error; I haven't had dev tools open to see what happens there. In some cases, but not all, this feels like I took too long to reply).
  • I hit post and get moved to a new page which is just my post with a preview. I'm not sure if this is just how it works with certain sites or something or also related.
[–] [email protected] 2 points 3 weeks ago

We need server error logs from the moment such a problem happens, ideally when you can fully replicate the issue. I hope you can test it with @[email protected] and see whether an error is logged on the server side as well.

That will hopefully allow us (the developers) to find the root cause of this issue, if it's still present.

[–] [email protected] 1 points 1 month ago

It might only be with certain instances. I just noticed it wasn't happening on a lemmy.world post I'd had open for a while. It could also have been something temporary. I'll try to spot/report any patterns.