Question about what to put on RAID and what to put on NVME (self.selfhosted)

submitted 2 days ago* (last edited 2 days ago) by ch00f to c/selfhosted

Since 2016, I've had a fileserver mostly just for backups. System is on 1 drive, RAID6 for files, and semi-annual cold backup.

I was playing with Photoprism, and their docs say "we recommend placing the storage folder on a local SSD drive for best performance." In this case, the storage folder holds basically everything but the pictures themselves such as the database files.

Up until now, if I lost any database files, it was just a matter of rebuilding them by re-indexing my photos or whatever, but I'm looking for something more robust since I'll have some friends/family using Pixelfed, Matrix, etc.

So my question is: Is it a valid strategy to keep database files on the SSD with some kind of nightly backup to RAID, or should I just store the whole lot on the RAID from the get go? Or does it even matter if all of these databases can fit in RAM anyway?

edit: I'm just now learning of ZFS caching which might be my answer.

[–] [email protected] 2 points 2 days ago (1 children)

Alternatively, if your databases are on a filesystem or volume manager that supports snapshots (LVM, btrfs or ZFS for instance), you can make a snapshot, mount it, and back up the database from it. This ensures the backup is consistent with itself (the backed-up directory was not written to between the beginning and the end of the backup).

[–] ch00f 2 points 2 days ago (1 children)

Doesn't this just pass the issue to when the snapshot is made? If the snapshot is created mid-database update, won't you have the same problem?

[–] [email protected] 2 points 21 hours ago* (last edited 21 hours ago) (1 children)

No, because the DBMS is designed to survive power loss in the middle of a write without being corrupted. It'll do something vaguely like this if you are, for example, overwriting an existing record with a new one:

1. Write that you are going to make a change, in a way that does not affect existing data.
2. Perform a barrier operation (which could amount to just syncing to disk, or could just tell the OS's disk cache system to place some restrictions on how it later syncs to disk, but in any event will ensure that all writes prior to the barrier operation are on disk before any write operations subsequent to it).
3. Replace the existing record. This may be destructive of existing data.
4. Potentially remove the data written in Step 1, depending upon database format.

If the DBMS loses power and comes back up, and the data from Step 1 is present and complete, it'll consider the operation committed and simply continue the steps from there. If the data from Step 1 is only partially on disk, it'll consider the operation not committed, delete the partial data, and treat the commit as not having gone through. From the DBMS's standpoint, either the change happens as a whole or does not happen at all.

That works fine for power loss or if a filesystem is snapshotted at an instant in time. Seeing a partial commit, as long as the DBMS's view of the system was at an instant in time, is fine; if you start it up against that state, it will either treat the change as complete and committed or throw out an incomplete commit.

However, if you are a backup program happily reading the contents of a file, you may be reading a database file with no synchronization, and may wind up with bits of one or multiple commits as the backup program reads the file and the DBMS writes to it -- a corrupt database after the backup is restored.
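Here is a tiny Python simulation of that difference. It is a made-up model, not a real DBMS or snapshot mechanism: the "database" is a list of two pages that a commit always sets to the same value, and the writer callback is a stand-in for the DBMS committing while the backup is halfway through.

```python
def commit(pages, value):
    """One logical commit touches both pages of the toy database."""
    pages[0] = value
    pages[1] = value


def is_consistent(pages):
    """The toy invariant: both pages belong to the same commit."""
    return pages[0] == pages[1]


def naive_backup(pages, writer_between_reads):
    """Copy the live pages one at a time while the writer keeps running."""
    copy = [pages[0]]
    writer_between_reads()  # a commit lands between the two reads
    copy.append(pages[1])
    return copy


def snapshot_backup(pages, writer_between_reads):
    """Freeze a point-in-time view first, then copy from the frozen view."""
    frozen = list(pages)  # stands in for a filesystem snapshot
    copy = [frozen[0]]
    writer_between_reads()  # the live pages change, the frozen view does not
    copy.append(frozen[1])
    return copy


live = ["v1", "v1"]
torn = naive_backup(live, lambda: commit(live, "v2"))

live = ["v1", "v1"]
snap = snapshot_backup(live, lambda: commit(live, "v2"))
```

In this model torn ends up holding one page from before the commit and one from after it, a state the database never actually had, while snap holds both pages as they were at a single instant.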

[–] ch00f 1 points 21 hours ago

Very good to know! Thanks.