this post was submitted on 23 Feb 2024
sh.itjust.works Main Community

submitted 8 months ago* (last edited 8 months ago) by [email protected] to c/[email protected]
 

Hello sh.itjust.works community,

Many of you have been eager to get an update about when the sh.itjust.works instance will get its upgrade to the latest version of Lemmy. Here's an update along with a tentative timeline.

In December 2023 I purchased a new server for this community. It took me a while, but I eventually made the time to get it racked at the local datacenter. For the sysadmins lingering and those interested, here are the specs:

  • Dual Xeon 2.9GHz CPUs (32 cores total)
  • 256GB RAM
  • 4 x 1TB SSD in RAID 10 (with room to add 6 more disks)
  • 10Gbit networking
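
For anyone wondering what the disk layout above works out to, here is a rough sketch of the usable space. It assumes plain two-way mirrors striped together (the usual meaning of RAID 10) and ignores filesystem overhead; the function name is made up for illustration.

```python
def raid10_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID 10 array built from two-way mirrors."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    # Every disk has a mirror partner, so only half of the raw space is usable.
    return disk_count * disk_size_tb / 2


# The array as racked today: 4 x 1TB.
print(f"4 disks:  {raid10_usable_tb(4, 1.0)} TB usable")

# The same chassis with the 6 spare bays filled with 1TB disks (an assumption).
print(f"10 disks: {raid10_usable_tb(10, 1.0)} TB usable")
```

So the current array gives about 2TB of usable space, and filling every bay with the same size of disk would bring that to about 5TB.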

While I'm ready to proceed with the upgrade, I've decided to first migrate this instance over to the new hardware. Here are two reasons.

  1. Those of you who have been around long enough may remember that I've been running this instance on "borrowed" unused resources that were available at the time. There are no more resources available for this instance to grow.
  2. There are reports that the latest version of Lemmy may use more resources. Given that we are among the bigger instances, I would be restricted if I ended up needing more resources to keep things fast.

Here's the tentative timeline:

Task                     | Date                                  | Expected Downtime
Migration to new server  | Tuesday February 27 2024 @ 8:00PM ET  | 90 Minutes
Upgrade to v0.19.3       | Thursday February 29 2024 @ 8:00PM ET | Up to 120 Minutes

  • If anything major goes wrong on the 27th, I will revert the changes and bring the instance back up on the current server.
  • If anything major goes wrong on the 29th, I will revert using an earlier snapshot. If that fails, I will restore from a backup.

During these two planned events, those who want to provide moral support or get periodic updates are more than welcome to join us on our Matrix channel.

=========================================================
Update February 29 2024
We've successfully completed the upgrade to v0.19.3. I'm happy to announce that we did it in an astonishing 27 minutes, a whole 93 minutes under what was expected. The extra legwork that was done over the last few weeks, combined with the better hardware, definitely played a part. Looking over the processes, it looks like the service responsible for images is still doing some work, so it's possible that you will come across some broken images. I'll be keeping an eye on that over the next bit and will make adjustments if needed. Thank you all for the support, and to all of you who kept me company on our Matrix channel. Have a good evening.

=========================================================
Update February 27 2024
We've successfully completed the migration. I'm happy to announce that this instance is now running on its new hardware dedicated solely to this community! We experienced just under 40 minutes of downtime, which is a whole 50 minutes less than expected. Please give this instance a chance to catch up on what it missed, but we should be good within the next 30 or so minutes. Thank you.

[–] [email protected] 13 points 8 months ago (4 children)

Question. Why would you go with 1TB SSDs instead of larger HDDs? Isn't space and price more important than speed for this use?

You could get double the space (a 2TB HDD) for the same price as a 1TB SSD.

Just wondering.

[–] [email protected] 27 points 8 months ago (1 children)

The biggest consumer of storage on this instance is image hosting, for which we use an external object storage provider. The second is the database, which is nowhere near the 2TB capacity. 1TB SSDs are cheaper than 2TB SSDs and I didn't want to spend more than I needed. As others mentioned, if we need more space or IOPS in the future, I could accomplish this by adding more drives as a quick fix. This server does not support NVMe unless I leverage its PCIe ports, but I don't plan on doing that. By the time this instance gets to the point where 10 SSDs just aren't cutting it anymore, I'll probably have come across another opportunity to get a new server with better NVMe support.

[–] [email protected] 1 points 8 months ago

You let us know when you need some help with new NVMes. We're more than willing to contribute ;)

[–] [email protected] 20 points 8 months ago* (last edited 8 months ago) (1 children)

Speed is usually the reason. SSDs in general are faster, and enterprise SSDs are not only faster but also much more write-tolerant, so they last a very long time in comparison to consumer SSDs.

They can also (in many cases) do write caching at the speed of a DRAM buffer, making the bottleneck the SATA or SAS bus itself (SAS is like enterprise SATA, 12Gb/sec as opposed to 6). NVMe can be even faster. This means that programs (i.e. Lemmy and its database) that write data aren’t waiting around for the drive to acknowledge the write before they can move on to other things. Shaving off a few milliseconds per write can make a massive difference when you realize there might be millions of IOPS (Input/Output Operations Per Second) under load. The requirement for low latency is everything in servers.
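
To put rough numbers on why per-write latency matters, here is a back-of-the-envelope sketch. The latencies are illustrative assumptions, not measurements from this server, and the function name is made up. It only models writes that must be acknowledged before the next one is issued.

```python
def max_sync_writes_per_second(latency_ms: float, queue_depth: int = 1) -> float:
    """Upper bound on acknowledged writes per second.

    Each in-flight write occupies a queue slot for `latency_ms`, so at most
    `queue_depth` writes can complete in every window of that length.
    """
    if latency_ms <= 0 or queue_depth < 1:
        raise ValueError("latency must be positive and queue depth at least 1")
    return queue_depth * 1000.0 / latency_ms


# Assumed figures: roughly 10 ms for an HDD seek plus write,
# roughly 0.1 ms for an SSD acknowledging from its write cache.
hdd_ceiling = max_sync_writes_per_second(10.0)
ssd_ceiling = max_sync_writes_per_second(0.1)

print(f"HDD ceiling: {hdd_ceiling:.0f} writes/s")
print(f"SSD ceiling: {ssd_ceiling:.0f} writes/s")
print(f"Ratio: {ssd_ceiling / hdd_ceiling:.0f}x")
```

With those assumed latencies, a single writer waiting on each acknowledgement tops out around a hundred writes per second on the HDD and around ten thousand on the SSD. A real database batches and parallelises writes, so treat this as a ceiling on one serial stream rather than a benchmark.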

When you are running a public service and requests are coming in constantly and at a high rate, you really really do not want storage latency to bottleneck you, as that is a problem that will compound extremely quickly. This is a big issue with HDDs as well, as even disk seek times add to the problem, let alone caching/buffering writes.

We could talk all day about whether four SSDs in a RAID 10 are optimal, but sometimes you have to think about budget and complexity as well. For the load that a popular Lemmy instance might currently draw, I’d make an educated guess that this might be sufficient for now. Room to expand was also mentioned, which is the second most important part of a storage plan.

[–] [email protected] 2 points 8 months ago (1 children)

I'd wager RAID 5 would be better, but with 4 SSDs it would either require a special storage controller or hog the CPU.

[–] [email protected] 2 points 8 months ago (1 children)

Software RAID is much faster than you think, even in RAID 5. Many of the algorithms used in software RAID leverage special CPU instructions that can process the parity operations at a very fast rate. Reading the data, which is by far the most common operation in a Lemmy instance, uses even less computational power than writes.
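
For the curious, the parity math itself is just XOR. Below is a toy sketch of the idea, not how any real RAID implementation is written: real software RAID works on large stripes, rotates parity across the disks, and uses vectorised CPU instructions. The block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three disks (made-up contents).
data = [b"lemm", b"y-db", b"rows"]

# The parity block that would live on a fourth disk.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt block:", rebuilt)
```

A plain read never touches the parity block, which is why reads are cheap. A write has to update parity as well, and that is the part the CPU (or a dedicated controller) has to compute.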

[–] [email protected] 2 points 8 months ago

Yeah, ZFS rocks these days. Fast and rock solid for me, even on older hardware. I run my whole array as mirrored vdevs (so, basically a bunch of RAID 10) to keep resilver times down when I replace drives. No issues so far!

[–] [email protected] 7 points 8 months ago

Probably for faster loading times when playing games on the hardware during off peak hours jk jk lol

[–] [email protected] 6 points 8 months ago* (last edited 8 months ago)

Speed is important. All else equal, the database will work faster with SSDs. RAID also puts the storage under heavier load, so SSDs make even more sense here. You want response times to be as low as possible for a good user experience.

But also, SSDs are kinda standard now and you can get a decent amount of storage for not much more money. Especially for server hardware that is more or less constantly under load, SSDs just make a lot more sense.