tl;dr: furry.engineer and pawb.fun will be down for several hours this evening (5 PM Mountain Time onward) as we migrate data from the cloud to local storage. We'll post updates via our announcements channel at https://t.me/pawbsocial.
In order to reduce costs and expand our storage pool, we'll be migrating data from our existing Cloudflare R2 buckets to local replicated network storage, and from Proxmox-based LXC containers to Kubernetes pods.
Currently, according to Mastodon, we're using about 1 TB of media storage, but according to Cloudflare, we're using nearly 6 TB. This appears to be due to Cloudflare R2's implementation of the underlying S3 protocol that Mastodon uses for cloud-based media storage, which is preventing Mastodon from properly cleaning up files that are no longer used.
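As a rough illustration of that gap, here is a minimal sketch of how objects that Mastodon no longer references could be identified, assuming you already have a listing of the bucket (key and size) and a list of the keys Mastodon still knows about. The function name, the sample keys, and the sizes are all hypothetical; this is not the tooling we actually use.

```python
def find_orphans(bucket_objects, referenced_keys):
    """Return (sorted orphaned keys, total orphaned bytes).

    bucket_objects:  dict mapping object key -> size in bytes (from a bucket listing)
    referenced_keys: iterable of keys the application still references
    """
    referenced = set(referenced_keys)
    # Anything present in the bucket but not referenced is a cleanup candidate.
    orphans = {key: size for key, size in bucket_objects.items() if key not in referenced}
    return sorted(orphans), sum(orphans.values())


# Hypothetical sample data, for illustration only.
bucket = {
    "media_attachments/1.png": 400,
    "media_attachments/2.png": 600,
    "media_attachments/old.png": 5000,
}
known = ["media_attachments/1.png", "media_attachments/2.png"]

orphan_keys, orphan_bytes = find_orphans(bucket, known)
print(orphan_keys, orphan_bytes)
```

The difference between the bucket total and the referenced total is the storage being paid for without being used.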
As part of the move, we'll be creating and using new Docker-based images for Glitch-SOC (the fork of Mastodon we use) and hooking them up to a dedicated set of database nodes and replicated storage through Longhorn. This should allow us to seamlessly move the instances from one Kubernetes node to another when performing routine hardware and system maintenance, without taking the instances offline.
We're planning to roll out the changes in several stages:
- Taking furry.engineer and pawb.fun down for maintenance to prevent additional media being created.
- Initiating a transfer from R2 to the new local replicated network storage for locally generated user content first, then remote media. (This will happen in parallel with the other stages, so some media may be unavailable until the transfer fully completes.)
- Exporting and re-importing the databases from their LXC containers to the new dedicated database servers.
- Creating and deploying the new Kubernetes pods, and bringing one of the two instances back online, pointing at the new database and storage.
- Monitoring for any media-related issues, and bringing the second instance back online.
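The ordering in the transfer stage above (locally generated content first, then remote media) can be sketched roughly as follows. This assumes remote media can be told apart by a key prefix; the `cache/` prefix and the sample keys here are assumptions for illustration, and the function name is hypothetical.

```python
def transfer_order(keys, remote_prefix="cache/"):
    """Order object keys so local content is copied before remote media.

    sorted() is stable, so keys keep their original relative order within
    the local group and within the remote group.
    """
    # False (local) sorts before True (remote).
    return sorted(keys, key=lambda key: key.startswith(remote_prefix))


# Hypothetical sample keys, for illustration only.
queue = transfer_order([
    "cache/media_attachments/a.png",
    "media_attachments/b.png",
    "cache/media_attachments/c.png",
    "media_attachments/d.png",
])
print(queue)
```

Copying local content first means that users' own uploads become available again earliest, while remote media fills in as the rest of the transfer completes.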
We'll begin the maintenance window at 5 PM Mountain Time (4 PM Pacific Time) and have no ETA for completion at this time. We'll provide updates through our existing Telegram announcements channel at https://t.me/pawbsocial.
furry.engineer and pawb.fun will be unavailable until the maintenance has concluded. Our Lemmy instance at pawb.social will remain online, though you may experience longer than normal load times due to high network traffic.
Finally and most importantly, I want to thank those who have been donating through our Ko-Fi page as this has allowed us to build up a small war chest to make this transfer possible through both new hardware and the inevitable data export fees we'll face bringing content down from Cloudflare R2.
Going forward, we're looking into providing additional fediverse services (such as Pixelfed) and extending our data retention length to allow us to maintain more content for longer, but none of this would be possible if it weren't for your generous donations.
Will the data be synched or backed up off site?
Yes, we'll be maintaining:
Local hardware horse here!
To elaborate a bit, the storage replicas will span three physical servers in real time, all of which get hourly snapshots in case we need a rollback, plus weekly full backups to a fourth system on mechanical drives with 2-disk failure tolerance. This should mean that data loss requires 4 simultaneous system failures.
We have a tape library for automated tape backups, but can’t afford a drive upgrade just yet to make it make sense. The drives are often several thousand dollars, but the tape media is cheap.
Offsite backups are currently in the works, though if anyone has recommendations I would love to add them to our list for consideration.
If anyone has additional questions or suggestions I would be happy to answer tomorrow!