this post was submitted on 12 Aug 2023

19 points (95.2% liked)

ZFS dataset configuration for a movies and tv shows library? Very heterogeneous data (lemmy.zip)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/selfhosted

Hi y'all,

I am exploring TrueNAS and configuring some ZFS datasets. Since ZFS provides parameters to tune a dataset to the type of data it holds, I thought it would be good to take advantage of them. So I'm here with the seemingly simple task of choosing the appropriate "record size".

Initially I thought, well this is simple, the dataset is meant to store videos, movies, tv shows for a jellyfin docker container, so in general large files and a record size of 1M sounds like a good idea (as suggested in Jim Salter's cheatsheet).

Out of curiosity, I ran Wendell's magic command from Level1Techs to get a sense of the file size distribution:

find . -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'

That's when I discovered it was not so simple. The directory is obviously filled with videos, but also with tiny files: subtitles, NFOs, and small illustration images, all valuable for Jellyfin's media organization.
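The one-liner above is dense, so here is a minimal, more readable sketch of the same idea: count files per power-of-two size bucket, with everything under 1 KiB lumped into the 1 KiB bucket. It is not Wendell's command. The function name `size_histogram` is made up for illustration, it prints raw byte values rather than human-readable units, and it assumes GNU find (for `-printf`).

```shell
# Sketch: count files per power-of-two size bucket under a directory.
# Assumes GNU find (-printf); size_histogram is a hypothetical name.
size_histogram() {
    find "${1:-.}" -type f -printf '%s\n' |
    awk '
        {
            # Find the largest power of two <= file size, starting at 1 KiB.
            b = 1024
            while (b * 2 <= $1) b *= 2
            count[b]++
        }
        END {
            # One line per bucket: "<bucket size in bytes> <number of files>".
            for (b in count) printf "%d %d\n", b, count[b]
        }
    ' | sort -n
}
```

Calling `size_histogram /path/to/library` prints one line per bucket, smallest first, so a library full of NFOs and subtitles shows up as a large count in the first few rows.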

That's where I'm at. The way I see it, there are several options:

1. Let's not overcomplicate it: just run with the default 128K ZFS dataset recordsize and roll with it. It won't be such a big deal.
2. Let's try to be clever about it: make 2 datasets, one with a recordsize of 4K for the small files and one with a recordsize of 1M for the videos, then pick one as the "main" dataset and symlink each file from the other dataset into it, so that all content is "visible" within one file structure. I haven't dug much into how I would automate it, and it might not play nicely with the *arr suite. Perhaps overly complicated...
3. Make all video files MKV files, embed the subtitles, and rename the videos so that NFOs are as unnecessary as possible for movies and TV shows (though they will still be useful for private videos, YT downloads, etc.).
4. Other?
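For what it's worth, the symlink idea in option 2 could be automated along these lines. This is only a rough sketch: `link_small_files` and both directory arguments are hypothetical, it assumes the "small files" tree mirrors the folder layout of the main library, and it will mishandle file names that contain newlines.

```shell
# Sketch: expose every file from a "small files" tree inside the main library
# tree as a symlink. link_small_files and both paths are hypothetical.
link_small_files() {
    small_root=$1   # absolute path of the dataset holding NFOs, subtitles, images
    main_root=$2    # absolute path of the dataset the media server scans
    ( cd "$small_root" && find . -type f ) | while IFS= read -r rel; do
        rel=${rel#./}
        # Recreate the folder structure, then point a symlink at the real file.
        mkdir -p "$main_root/$(dirname "$rel")"
        ln -sfn "$small_root/$rel" "$main_root/$rel"
    done
}
```

Because the links are absolute, a Docker container would need both datasets mounted at the same paths as on the host for them to resolve, which is one more reason this option may be more trouble than it is worth.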

So what do you think? And how have you personally set it up? I would love to get some feedback, especially if you are also using ZFS and have a video library with a dedicated dataset. Thanks!

Edit: Alright, so I found a post by Jim Salter that goes into more detail about recordsize. It cleared up my misconception that recordsize is a fixed block size, and it notes that the value can easily be changed at any time. It's just the maximum size of the chunks in which data is read and written. So I'll stick with a 1M recordsize and leave it at that despite having many smaller files, because what matters most is streaming the larger files efficiently. Thank you all!
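For anyone landing here later, changing the property on an existing dataset looks roughly like this. It is a sketch, not a recommendation for every setup: `tank/media` is a hypothetical dataset name, and the commands are wrapped in a function (also a made-up name) so nothing runs until you call it. As far as I know, the new value only applies to data written after the change, so existing files keep their old block size until they are rewritten.

```shell
# Sketch only: tank/media is a hypothetical dataset name.
set_media_recordsize() {
    dataset=${1:-tank/media}
    # Raise the upper limit on block size for data written from now on.
    zfs set recordsize=1M "$dataset"
    # Show the current value and whether it is set locally or inherited.
    zfs get recordsize "$dataset"
}
```

Usage would be `set_media_recordsize tank/media`, run as a user allowed to modify the pool.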

top 10 comments

[–] [email protected] 3 points 1 year ago (1 children)

I have a video dataset with recordsize=1M, primarycache=metadata, secondarycache=metadata, and a general dataset as its parent with recordsize=128K, primarycache=all, secondarycache=all, and compression=lz4 or something.
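A layout like the one described could be created roughly as follows. Treat it as a sketch under assumptions: `tank/general` and `tank/general/video` are hypothetical names, `create_media_layout` is a made-up wrapper so nothing runs until it is called, and `lz4` is just one reasonable compression choice. In ZFS, `all` is the cache property value that means "cache both data and metadata".

```shell
# Sketch: tank/general and tank/general/video are hypothetical dataset names.
create_media_layout() {
    parent=${1:-tank/general}
    # Parent for mixed data: default-sized records, cache data and metadata.
    zfs create -o recordsize=128K -o compression=lz4 \
        -o primarycache=all -o secondarycache=all "$parent"
    # Child for video: large records, keep only metadata in ARC and L2ARC.
    # Properties not set here (such as compression) are inherited from the parent.
    zfs create -o recordsize=1M \
        -o primarycache=metadata -o secondarycache=metadata "$parent/video"
}
```

Caching only metadata for the video dataset keeps large, rarely re-read video streams from pushing more useful data out of the cache.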

Works like a monster, I don't worry about things like srts and such, though your symlinks idea looks interesting.

I'm reworking my entire system to get off the filesystem structure anyway and use Python and some other DB, possibly reading from Sonarr to seed the metadata, but I haven't gotten to it yet.

Actually, you make a good point: it would be nice if Sonarr put NFOs in a different structure, but since I'm going to read the Sonarr metadata directly, I can just delete them anyway.

[–] [email protected] 2 points 1 year ago (1 children)

Do you have it set at 1M for a situation similar to mine, or do you not have any small files alongside your video files? Setting it at 1M is indeed possible, though from my understanding it would needlessly consume a lot of extra disk space, as every file of just a few KB would automatically take up a whole 1MB on disk.

[–] [email protected] 2 points 1 year ago

Similar to yours. I originally didn't have many small files, but I turned on Sonarr metadata and now there are tons of 1K files everywhere.

I think ZFS keeps them compacted though.

So far, this seems pretty simple: set volblocksize=64K, you get 64KiB blocks in your zvol, and that’s that. But recordsize is a bit trickier: the blocks in a dataset are dynamically sized, and recordsize sets the maximum size for blocks in that dataset—not a fixed size.

https://klarasystems.com/articles/tuning-recordsize-in-openzfs/

So I wasn't worried about the small files in the beginning. The main reason to use a smaller recordsize is if you want to make small accesses within a file, not if you want to access small files.
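To put rough numbers on that, here is a simplified model of how much space the data blocks of a single file take for a given recordsize. It is only an approximation: it ignores compression, metadata, and RAID-Z padding, the 4096-byte sector default assumes ashift=12, and `zfs_data_bytes` is a made-up helper name.

```shell
# Simplified model: bytes used by one file's data blocks for a given recordsize.
# Ignores compression, metadata, and RAID-Z padding; sector=4096 assumes ashift=12.
zfs_data_bytes() {
    size=$1 recordsize=$2 sector=${3:-4096}
    if [ "$size" -le "$recordsize" ]; then
        # The file fits in one block, sized to the file and rounded up to a sector.
        echo $(( (size + sector - 1) / sector * sector ))
    else
        # Larger files are split into blocks that are each one full record.
        echo $(( (size + recordsize - 1) / recordsize * recordsize ))
    fi
}
```

Under this model, `zfs_data_bytes 1024 1048576` gives 4096, so a 1K NFO on a 1M dataset costs about one sector, not 1MB. The padding in the last record of a large file is one reason people usually leave compression enabled.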

[–] Spectator47 2 points 1 year ago (1 children)

Recordsize sets the maximum block size that can be used for a file.

Either leave it at the ZFS default of 128K or, since your use case is primarily reads of large files, set it to 1M.

[–] [email protected] 3 points 1 year ago (1 children)

Setting it at 1MB is also possible, though it would uselessly consume a large amount of extra disk space as all files of just a few KB would automatically require a whole 1MB disk space from my understanding. And as there are usually multiple tiny files for each video, it could end up growing into something quite unnecessarily large...

[–] Spectator47 1 points 1 year ago

The recordsize used is dynamic up to a maximum of the zfs recordsize property. See https://klarasystems.com/articles/tuning-recordsize-in-openzfs/

The purpose of tuning it is to optimise reads and writes within files larger than the recordsize. For example, if you have a database that typically stores its data in a single large file, ZFS will be reading and writing that file in recordsize chunks on disk. If your database operates on 4K changes, then reading and writing 1MB at a time on disk is a waste of I/O bandwidth.
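The database example above boils down to simple arithmetic. The sketch below is a rough model only: it assumes a single aligned write that fits inside one record of a file larger than the recordsize, it ignores caching, and `rewrite_amplification` is a made-up name.

```shell
# Rough model: bytes ZFS rewrites per byte the application actually changes.
# Assumes one aligned write that fits inside a single record, and no caching.
rewrite_amplification() {
    recordsize=$1 write_size=$2
    echo $(( recordsize / write_size ))
}
```

For a 4K change, `rewrite_amplification 1048576 4096` gives 256 and `rewrite_amplification 131072 4096` gives 32, which is why a small recordsize suits databases while sequential video streaming does not care.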

[–] [email protected] 2 points 1 year ago (1 children)

I just left the defaults and I've never had problems.

[–] [email protected] 1 points 1 year ago

Yes, I don't think a non-optimal value would cause an actual issue; it has more to do with leaving IOPS on the table. I might be too concerned about something that isn't that important.

[–] InverseParallax 2 points 1 year ago

Let me clarify:

Recordsize is basically the hash block size. When you change something, you always write in blocks of up to the recordsize (smaller if the file is smaller), and the hash is then calculated over that block.

Smaller only helps for randomish accesses inside a file.