this post was submitted on 06 Jan 2025

my-place.social

A place to discuss topics related to Friendica server https://my-place.social

founded 3 months ago

I am so sorry for the very long downtime that my-place.social had this weekend. In over 2 years of running public servers, I have never had such a long outage.

Note that it's going to take the server some time to catch up on missed messages. Also, the database has been restored to its state at about 4 AM EST (9 UTC) on January 4th, so roughly 4 hours of information may be missing from the database.

Here is what happened, and here is what I'm going to do in the future to make sure this doesn't happen again.

WHAT HAPPENED?

I made a snapshot of the instance and backed up the database. I ran the 6 command lines to do the Friendica update and I started Friendica. Instantly, the entire database became corrupted. Just about every table and every index was corrupted.

It was so bad that MariaDB stopped running and refused to start. Why is this unfortunate? It's impossible to look at the database, attempt repairs, or even restore from a backup if the database engine refuses to run.

I did everything that I could to get back in. To do it, I had to discard indexes and other DB structures. I was hoping I could then find the tables and indexes that were broken and repair them. But, eventually, I had to give up. Things were hopelessly broken. MariaDB just refused to run.

I had to wipe out every trace of the database software, its configuration, and the databases themselves, and reinstall the database from scratch, as if bringing up Friendica for the first time. I'm not a MariaDB expert. I eventually was able to do it, but a lot went wrong before I figured out everything that I needed to do for the reset.

Fortunately, I keep very detailed notes about every server I create, so I was able to go back and create the database, and configure it, the way it needs to be done for Friendica.

Once the database engine came back up, and I had an empty working database, I began to restore from the last backup. I had no idea it would take 11 hours to restore a 36GB database. It restores by running SQL insert commands for every record in every table. Maybe there were a million commands? No idea.

WHAT CAUSED IT?

It seems to be something related to the upgrade, but perhaps it was a coincidence? At this time, I don't know. I won't update again until I investigate.

WHAT ABOUT THE FUTURE?

I kept extremely detailed notes about how to remove and restore the database engine, should I ever have this issue again. It will take under an hour next time.

I also added another 50GB drive to the computer. Before doing an update, I will take an image backup of the database so that, if the database is damaged, I won't need to do a half-day restore. I can copy the image back from the backup drive in less than 5 minutes. This will also prevent any data loss.

I suggest everyone follow @my_place_social from Friendica, Mastodon, etc., or join the !my_place_social community in PieFed, Lemmy, or MBIN. I post updates there. You can open an account on feddit.online and join the community from there so you can reach it, should my-place.social be down.

Again, I am so sorry for this outage. I haven't had an opportunity to really play around to make sure everything is working. I hope all is well. Please let me know if you notice any issues. This was an extremely stressful weekend.

Jerry

#myplacesocial
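For anyone curious what a pre-update "image backup" of the database could look like, here is a minimal sketch in Python. It is not the procedure actually used on my-place.social. It assumes a file-level (cold) copy: the database is stopped, its data directory is copied to the backup drive, the copy is checked against the source, and the database is started again. The names `cold_copy`, `tree_digests`, and `file_digest`, and the `stop_db` / `start_db` hooks, are hypothetical and exist only for this illustration.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Dict


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def cold_copy(src: Path, dst: Path,
              stop_db: Callable[[], None],
              start_db: Callable[[], None]) -> None:
    """Stop the database, copy src to dst, verify the copy, restart.

    stop_db and start_db are caller-supplied hooks (hypothetical here),
    so the copy logic can be tried out without touching a real service.
    """
    if dst.exists():
        # Never overwrite an older backup by accident.
        raise FileExistsError(f"refusing to overwrite {dst}")
    stop_db()  # the files must not change while they are being copied
    try:
        shutil.copytree(src, dst, symlinks=True)
        if tree_digests(src) != tree_digests(dst):
            raise RuntimeError("copy does not match the source")
    finally:
        start_db()  # bring the database back even if the copy failed
```

On a real server the hooks might wrap something like `subprocess.run(["systemctl", "stop", "mariadb"], check=True)`, and `src` is often `/var/lib/mysql` on Linux packages, but the service name, data directory, and backup mount point are all assumptions that need checking on the actual machine. Restoring would be the same idea in reverse: stop the database, swap the saved directory back in, and start it again.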

top 5 comments
[email protected] 3 points 2 weeks ago (1 children)

@admin Thanks for all your hard work and updates about what was happening! Maintaining online services for people is so stressful, especially when routine housekeeping unexpectedly turns into an outage. Super grateful that you're a proactive admin.

[email protected] 1 points 2 weeks ago

@growfediverse Thank you so much! 🙂

[email protected] 3 points 2 weeks ago (1 children)

@admin great job, that really sounded like a pretty bad situation

[email protected] 1 points 2 weeks ago (1 children)

@aliceif @admin Thank you. Yeah, it was pretty bad....

[email protected] 1 points 1 week ago

@admin @aliceif @Jerry

Jerry - thanks so much, once again, for all your work on this, and for your generosity with your time and effort. Would donations towards a redundant backup server help prevent such long downtime in the future?