this post was submitted on 13 Oct 2024
77 points (98.7% liked)

Over the years I have accumulated a sizable music library (mostly FLACs, adding up to a bit less than 1 TB) that I now want to reorganize (i.e. gradually process with MusicBrainz Picard).

Since the music lives on my NAS, FLACs are relatively big and my network speed is 1 Gbit/s, I installed an HDD I had lying around in my computer and replicated the whole library there; the idea is to work on local files and then sync them to the NAS.

I set up Syncthing for replication and... everything works, in theory.

In practice, Syncthing loves to rescan the whole library (given how long it takes, it must be reading all the data and computing checksums rather than just checking the filesystem metadata - why on earth?), which means my underpowered NAS (Celeron N3150) does nothing but rescan the same files over and over.

By default Syncthing rescans directories every hour (again, why on earth?), but it still seems to rescan a whole lot even after I set rescanIntervalS to 90 days (maybe it always rescans once when restarted?).

Anyway, I am looking into alternatives.
Are there any you would recommend? (FOSS please)

Notes:

  • I know I could just schedule a periodic rsync from my PC to the NAS, but I would prefer a bidirectional solution if possible (rsync is gonna be the last resort)
  • I read about Unison, but I also read that it's not great with big jobs and that it too scans a lot
  • The disks on my NAS go to sleep after 10 minutes idle time and if possible I would prefer not waking them up all the time (which would most probably happen if I scheduled a periodic rsync job - the NAS has RAM to spare, but there's no guarantee it'll keep in cache all the data rsync needs)
top 37 comments
[–] [email protected] 33 points 2 months ago (2 children)

Syncthing should have inotify support which allows it to watch for changes rather than polling. Does that help?

[–] [email protected] 6 points 2 months ago

Yep, this is how I do it on my NAS, which is a RockPro64 board attached to spinning WD Red drives. I have music, movies, game saves, documents, pics, etc. that add up to around 1.5TB, and I don't seem to get excessive scanning when "watch files" is turned on.

[–] [email protected] 4 points 2 months ago (4 children)

Yes, Syncthing does watch for file changes... that's why I am so puzzled that it also does full rescans :)

Maybe they do that to catch changes made while Syncthing was not running... it may make sense on mobile devices, where the OS likes to kill processes willy-nilly, but IMHO not on a "real" computer

[–] [email protected] 6 points 2 months ago

Is it worth raising an issue with the project? Also, could you enable logging to see if there are any clues as to why a rescan is being done?

[–] [email protected] 6 points 2 months ago

why I am so puzzled that it also does full rescans

Because they're not so foolish as to believe that inotify is infallible and complete.

[–] petersr 2 points 2 months ago

Or to catch changes made to the files after booting into a different OS, which are otherwise not tracked.

[–] [email protected] 2 points 2 months ago* (last edited 2 months ago)

You can set it to do full scans however often you like, even monthly.

[–] lung 21 points 2 months ago* (last edited 2 months ago) (3 children)

Nothing wrong with rsync; it's still kinda the shit. A short script will do everything.

https://git-annex.branchable.com/ This thing extends git to handle lots of big files. Probably a solid choice; I haven't tried it, but it claims to do exactly what you need, and even has a UI and partial sync.
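For the rsync route, a minimal sketch of such a script (in Python) for a one-way push from the local copy to the NAS. The paths and the `nas` host name are hypothetical placeholders. rsync's default "quick check" compares file size and modification time, so unchanged FLACs are not re-read unless `--checksum` is passed; `--archive` preserves modification times, which that check relies on. It still has to walk the directory tree on both sides, which may wake the NAS disks.

```python
import shlex
import subprocess

# Hypothetical locations: adjust to your own local copy and NAS destination.
LOCAL_LIBRARY = "/mnt/music-local/"   # trailing slash: copy the folder's contents
NAS_TARGET = "nas:/volume1/music/"    # a local path or an rsync-over-ssh host:path


def build_rsync_cmd(src: str, dst: str, delete: bool = False,
                    dry_run: bool = True) -> list[str]:
    """Return the argv for a one-way push from src to dst."""
    cmd = ["rsync", "--archive", "--human-readable", "--itemize-changes"]
    if delete:
        cmd.append("--delete")    # also remove files that no longer exist locally
    if dry_run:
        cmd.append("--dry-run")   # only report what would change
    cmd += [src, dst]
    return cmd


def push(delete: bool = False, dry_run: bool = True) -> None:
    """Run the push; defaults to a harmless dry run."""
    cmd = build_rsync_cmd(LOCAL_LIBRARY, NAS_TARGET, delete, dry_run)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `push()` first shows the itemized dry-run output; `push(delete=True, dry_run=False)` does the real run once it looks right.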

[–] ouch 7 points 2 months ago

The use case sounds exactly like git-annex.

As a bonus, you get a system that tracks how many copies of each file you have and where they are.

[–] IanTwenty 4 points 2 months ago

So git-annex should let you just pull down the files you want to work on, make your changes, then push them back upstream. No need to continuously sync the entire collection. It requires some git knowledge and wading through the git-annex docs, but the walkthrough is a good place for an overview: https://git-annex.branchable.com/walkthrough/
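A sketch of that per-album workflow, wrapped in Python only to keep the examples in one language. It assumes the local clone already has the NAS repository configured as a git remote, here called `nas`; the remote name and album path are hypothetical. Files are unlocked before tagging because git-annex normally stores annexed files as read-only symlinks, which Picard could not rewrite in place. Content is pushed with an explicit `copy --to` rather than `sync --content`, since the latter may also download every file you don't have locally unless preferred content is configured.

```python
import subprocess


def annex_session_cmds(album: str, remote: str = "nas") -> dict[str, list[list[str]]]:
    """git-annex commands around one Picard session on a single album folder.

    `remote` is a hypothetical git remote name pointing at the NAS repository.
    """
    return {
        "before": [
            ["git", "annex", "get", album],     # fetch the album's content
            ["git", "annex", "unlock", album],  # make the files writable for Picard
        ],
        "after": [
            ["git", "annex", "add", album],               # record the retagged content
            ["git", "annex", "sync", remote],             # commit and exchange git branches
            ["git", "annex", "copy", "--to", remote, album],  # upload the new content
            # optional: free local space (refused if too few copies would remain)
            ["git", "annex", "drop", album],
        ],
    }


def run_all(cmds: list[list[str]], repo: str) -> None:
    """Run each command inside the repository, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, cwd=repo, check=True)
```

Usage would be `run_all(cmds["before"], "/mnt/music-local")`, tag with Picard, then `run_all(cmds["after"], "/mnt/music-local")`. Leave out the `drop` step if you want to keep a full local copy.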

[–] [email protected] 1 points 2 months ago

Very interesting! Saved

[–] [email protected] 11 points 2 months ago

Change the full-rescan interval to monthly, or yearly even, problem solved.
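The same per-folder settings can also be changed through Syncthing's REST API. A sketch, assuming Syncthing 1.12 or newer (which added the `/rest/config` endpoints), the default GUI address, and an API key copied from the GUI settings; the folder ID and key are placeholders. As far as I know, a `rescanIntervalS` of 0 disables periodic full rescans entirely, leaving change detection to the watcher (`fsWatcherEnabled`); Syncthing still scans each folder when it starts.

```python
import json
import urllib.request

API = "http://127.0.0.1:8384"  # default GUI/REST address; adjust if you changed it
API_KEY = "REPLACE-ME"         # placeholder: copy the key from the GUI settings


def folder_patch(rescan_days: int, watcher: bool = True) -> dict:
    """Body for PATCH /rest/config/folders/<id>: rare full rescans, rely on the watcher."""
    return {
        "rescanIntervalS": rescan_days * 24 * 3600,
        "fsWatcherEnabled": watcher,
    }


def apply_patch(folder_id: str, body: dict) -> int:
    """Send the patch for one folder and return the HTTP status code."""
    req = urllib.request.Request(
        f"{API}/rest/config/folders/{folder_id}",
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```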

[–] [email protected] 10 points 2 months ago (1 children)

Why do you need the files locally?
Is your network that slow?

I've heard of multiple content creators who keep their video files on a NAS shared between their editors and work directly from the NAS.
Could you do the same? You'll be working with music, so the network traffic will be lower than with video.

If you do this you just need a way to mount the external directory, either with rclone or with sshfs.


The disks on my NAS go to sleep after 10 minutes idle time and if possible I would prefer not waking them up all the time

I think this is a good strategy for not putting additional stress on your drives (speaking as a non-expert on NAS), but I've read that most of the actual wear and tear on drives happens while they spin up and down. That's why NAS drives should be kept spinning all the time.
And drives specifically built for NAS setups are designed with this in mind.
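For working directly from the NAS, the mount suggested above could look like this. The host, paths and remote name are placeholders; for rclone, the remote has to be created first with `rclone config`. `--vfs-cache-mode writes` makes rclone buffer written files locally, which rclone recommends for applications that modify files in place, as a tagger does.

```python
import subprocess


def sshfs_cmd(remote: str, mountpoint: str) -> list[str]:
    """sshfs mount (remote like "me@nas:/volume1/music") that survives short hiccups."""
    return ["sshfs", remote, mountpoint,
            "-o", "reconnect,ServerAliveInterval=15,ServerAliveCountMax=3"]


def rclone_cmd(remote: str, mountpoint: str) -> list[str]:
    """rclone mount; cache writes locally so in-place tag edits work."""
    return ["rclone", "mount", remote, mountpoint, "--vfs-cache-mode", "writes"]


def unmount_cmd(mountpoint: str) -> list[str]:
    # Some FUSE 3-only distributions name this binary fusermount3.
    return ["fusermount", "-u", mountpoint]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)
```

Note that `rclone mount` stays in the foreground unless `--daemon` is added, while sshfs backgrounds itself by default.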

[–] [email protected] 4 points 2 months ago

I also read that drives should not be spun down and up too often, but I think it only matters if you do that hundreds of times a day?

Anyway, the reason I spin down my drives is to save electricity, and... more for the principle than for the electric bill (it's only 2 drives).

[–] [email protected] 5 points 2 months ago

You are correct that a reboot will trigger a full rescan. I'm always on the lookout for better sync. I just don't think it's out there right now for easy bidirectional sync.

Basically, if you want to set and forget, Syncthing is the best option. If you want more control, you'll need to look into setting up rsync scripts or similar, which will at least better let you control how often to sync.

[–] [email protected] 5 points 2 months ago (2 children)

I use Nextcloud to sync a huge music collection and it works great - I was kinda surprised

I run it on an underpowered NUC and it behaves beautifully

[–] [email protected] 2 points 2 months ago

That's funny since Nextcloud is a bloated mess.

[–] AbouBenAdhem 1 points 2 months ago

Another advantage of Nextcloud over Syncthing is selective syncing: Syncthing replicates the entire collection of synced files on each peer, but Nextcloud lets clients sync and unsync subfolders as needed while keeping all the files on the server. That could be useful for OP if they have a terabyte of files to sync but don’t have that much drive space to spare on every client.

[–] [email protected] 4 points 2 months ago* (last edited 2 months ago) (2 children)

I have a very similar setup to yours, a relatively large music library around 1.7TB of mostly flac files on my server. I'm able to organize these files locally from my laptop, which at various times has run either OSX, various GNU/Linuxes, or Windows. However I do not bother pushing the files themselves back and forth over the network.

Even if I did, I wouldn't automate the syncing; I'd only run it manually after I'd done my organizing with Picard for that day. After all, if the organization with Picard isn't automated, why should the syncing be? I'd probably use rsync for this.

In actual practice I do this: Connect to my server from my laptop using ssh, forwarding X. Run Picard on the actual server through this remote connection. Picard runs just fine over ssh. Opening a browser from a Picard tag for occasional Musicbrainz.org stuff is a little slower but works. I would then use a tmux or screen session to run the rsync command when I'm done with Picard for the day for syncing to a backup if necessary.

I don't really bother keeping a whole copy of my music collection locally on my laptop or phone though, since it's been bigger than is practical for a long time. Managing multiple libraries and keeping the two in sync turned into such a hassle that I was spending more time organizing than actually listening (or making mixtapes/playlists). To listen to my music locally I've used either Plex or Jellyfin, sometimes MPD (like when my server was directly connected to my stereo receiver), or just shared the folder via Samba and NFS.

[–] [email protected] 4 points 2 months ago* (last edited 2 months ago) (1 children)

In the third paragraph you mentioned "tux" but I'm guessing that you meant "tmux". Just a clarification for readers not familiar with it and want to look it up.

[–] [email protected] 2 points 2 months ago

Yeah, that was a typo. Thanks, I'll fix it.

[–] [email protected] 1 points 2 months ago (1 children)

The main difference is probably that I have a desktop PC rather than a laptop (plus, a few old hard disks lying around).

I think I'll keep the local replica even when I'm finished reorganizing the library: the local copy doubles as a backup and I must say I am enjoying the faster access times.

[–] [email protected] 1 points 2 months ago

Oh yeah, I totally support the local copy. It will save you in times of hardware failure or fuck-ups. I could just never keep up with the maintenance and kind of gave up on automatic backups and syncing. But reorganizing often means integrating deletions into rsync or whatever syncing protocol you use, and that has caused me headaches and heartaches.
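On the deletions headache: when Picard renames or moves a file, rsync sees a deletion at the old path and a new file at the new path, so without `--delete` the old copies pile up on the target. A sketch that lists what a `--delete` run would remove before committing to it, by parsing the `*deleting` lines that `--itemize-changes` prints during a dry run; the paths are placeholders.

```python
import subprocess


def pending_deletions(rsync_output: str) -> list[str]:
    """Paths rsync --delete would remove, parsed from --itemize-changes output."""
    deleted = []
    for line in rsync_output.splitlines():
        if line.startswith("*deleting"):
            parts = line.split(maxsplit=1)  # keeps spaces inside the path
            if len(parts) == 2:
                deleted.append(parts[1])
    return deleted


def preview_deletions(src: str, dst: str) -> list[str]:
    """Dry-run a mirroring push and report only what it would delete on dst."""
    out = subprocess.run(
        ["rsync", "--archive", "--delete", "--dry-run", "--itemize-changes", src, dst],
        check=True, capture_output=True, text=True,
    ).stdout
    return pending_deletions(out)
```

`preview_deletions("/mnt/music-local/", "nas:/volume1/music/")` returns the list to eyeball before a real run. rsync's `--backup` together with `--backup-dir=DIR` can additionally keep deleted or overwritten files on the target instead of discarding them.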

[–] JTskulk 3 points 2 months ago

FreeFileSync detects moves and changes quickly without rereading the whole file. The first time you sync it will read every file to hash them first, this takes a long time but subsequent syncs will be fast.
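One way a sync tool can recognise moves without rereading file contents is to remember each file's inode: a rename within the same filesystem keeps the inode and only changes the path. This is a sketch of that idea, not FreeFileSync's actual code. It compares two snapshots taken on the same machine, since inode numbers mean nothing across machines.

```python
import os

Snapshot = dict[tuple[int, int], str]  # (device, inode) -> path relative to root


def snapshot(root: str) -> Snapshot:
    """Record the inode of every file under root (metadata only, no file reads)."""
    snap: Snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full, follow_symlinks=False)
            snap[(st.st_dev, st.st_ino)] = os.path.relpath(full, root)
    return snap


def detect_moves(old: Snapshot, new: Snapshot) -> list[tuple[str, str]]:
    """Same inode, different path: the file was renamed or moved."""
    return sorted((old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k])
```

Filesystems can reuse the inode of a deleted file, so a more careful tool would also compare size and modification time before treating a match as a move.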

[–] dr_jekell 1 points 2 months ago (1 children)

Have you had a look at "Lucky Backup"?

[–] [email protected] 3 points 2 months ago (2 children)

Never heard of it... OMG, that must be the worst name for a backup solution! :D

It reeks of abandoned software (last release is 0.50 from 2018), but there is recent activity in git, so... IDK

[–] dr_jekell 4 points 2 months ago* (last edited 2 months ago)

I wouldn't consider it a backup solution, I use Timeshift for that.

It's more of a file syncing software like Syncthing.

I have it set up to one way sync certain folders on my computer to an external USB HDD that I can disconnect and take with me if I have to evacuate.

[–] Ensign_Crab 3 points 2 months ago (1 children)

"Unlucky Backup" is probably worse.

[–] [email protected] 1 points 2 months ago

"Unreliable Backup" is probably the worse.

Wait, I have a better name. Duplicati!

[–] gi1242 1 points 2 months ago

I used Seafile in the past, but I abandoned it for syncing. It might help your use case, though...

[–] [email protected] 1 points 2 months ago (2 children)

I suspect it's somewhat inevitable, since in order to sync you need to know what differs between the files here and there (unless you use something like zfs send, which should be able to send only the changes, I guess?). I'd probably tag everything at once and then sync.

[–] [email protected] 2 points 2 months ago

I'm not an expert on that, but I also came to comment about something like zfs. I'm currently using btrfs and it works great for backups. As far as I understand, the idea is that the filesystem itself records what changed, so send operations are very lightweight. Because they are so lightweight, it would probably also be possible to just sync manually (e.g. when you've finished working for the day). However, I currently can't think of a bidirectional way where you have changes in both places and then merge them.
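For the btrfs route, an incremental transfer looks roughly like this: take a read-only snapshot, then run `btrfs send -p` with the previous snapshot as parent, so only the difference goes over the wire. A sketch that only builds the commands. It assumes btrfs on both ends, root privileges, hypothetical snapshot paths and host, and that the parent snapshot already exists on the receiving side. Like the comment says, this is one-way and does not merge changes made on both sides.

```python
import shlex


def snapshot_cmd(subvolume: str, dest: str) -> list[str]:
    """Read-only snapshot (btrfs send only accepts read-only snapshots)."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]


def send_pipeline(parent: str, current: str, host: str, remote_dir: str) -> str:
    """Shell pipeline that sends only the delta between parent and current."""
    send = ["btrfs", "send", "-p", parent, current]
    # ssh runs its command through the remote shell, so quote it as one string
    receive = ["ssh", host, shlex.join(["btrfs", "receive", remote_dir])]
    return f"{shlex.join(send)} | {shlex.join(receive)}"
```

The pipeline string is meant for a shell, e.g. `subprocess.run(pipeline, shell=True, check=True)`; note that a plain pipeline only reports the exit status of its last command.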

[–] iopq 2 points 2 months ago (1 children)

If a file has not been modified, why does it need to be scanned?

[–] [email protected] 1 points 2 months ago (1 children)

That's only if you don't keep track of whether it was modified. Tracking comes more or less for free if you're the filesystem, but may be more complicated for external programs. Although maybe inotifywait can track changes to the whole directory; I'm not sure here.
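inotifywait (from inotify-tools) can indeed watch a whole tree with `--recursive`. A sketch that runs it and parses its output; the event list and output format are choices made here, not defaults. Two caveats that may explain why Syncthing still does full rescans: a recursive watch needs one inotify watch per directory (capped by `fs.inotify.max_user_watches`), and inotify only sees changes made through the local kernel, so edits made by another machine over NFS or SMB, or from another OS, go unnoticed.

```python
import subprocess
from collections.abc import Iterator

WATCH_EVENTS = "close_write,moved_to,moved_from,create,delete"


def inotifywait_cmd(root: str) -> list[str]:
    """Monitor root recursively, printing "<EVENTS> <path>" per change."""
    return ["inotifywait", "--monitor", "--recursive", "--quiet",
            "--event", WATCH_EVENTS, "--format", "%e %w%f", root]


def parse_event(line: str) -> tuple[set[str], str]:
    """Split one output line into its event names and the affected path."""
    events, path = line.rstrip("\n").split(" ", 1)
    return set(events.split(",")), path


def watch(root: str) -> Iterator[tuple[set[str], str]]:
    """Yield (events, path) for each change under root; runs until interrupted."""
    with subprocess.Popen(inotifywait_cmd(root), stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield parse_event(line)
```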

[–] iopq 1 points 2 months ago (1 children)

Isn't there a last modified time stamp on files?

[–] [email protected] 1 points 2 months ago

Huh, didn't think of that 😅
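The modification time is indeed how most tools avoid rereading everything: keep an index of size and modification time per file, and only hash files whose metadata changed. rsync's default quick check works on the same principle. A minimal sketch with a hypothetical in-memory index; a real tool would persist it between runs.

```python
import hashlib
import os


def file_key(st: os.stat_result) -> tuple[int, int]:
    """Cheap fingerprint from metadata only: (size, mtime in nanoseconds)."""
    return (st.st_size, st.st_mtime_ns)


def sha256_of(path: str, bufsize: int = 1 << 20) -> str:
    """Full content hash; this is the expensive part we want to avoid."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def scan(root: str, index: dict[str, tuple]) -> tuple[dict[str, tuple], list[str]]:
    """Rehash only files whose (size, mtime) differ from the stored index.

    index maps relative path -> (size, mtime_ns, sha256_hex).
    Returns the new index (deleted files drop out) and the paths that were read.
    """
    new_index, rehashed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            old = index.get(rel)
            if old is not None and old[:2] == file_key(st):
                new_index[rel] = old  # metadata unchanged: trust the stored hash
            else:
                new_index[rel] = (*file_key(st), sha256_of(full))
                rehashed.append(rel)
    return new_index, rehashed
```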

[–] [email protected] 1 points 2 months ago* (last edited 2 months ago)

As others have said, Nextcloud won't rescan or reindex on a reboot. No idea why Syncthing does, and surely there must be some way to disable that, too. I'm still hesitant to recommend NC, as it's somewhat fragile, needs way more babying than I'm willing to keep up with, and just does too many things, none of them anywhere close to "well". File sync on real computers works solidly if you have a reliable connection (don't get me started on Android).

Have you considered using a real media host, like Jellyfin (or any of a dozen others)? Jellyfin works fine for music (there are other music-only solutions, though). There are plenty of clients that can stream and have offline support (download a subset/albums/playlists) for things like laptops, phones, ... The server can usually transcode audio formats that a client can't play, in real time, if needed.

Edit: I realize I wasn't clear as to what this means in practice. You essentially get a self-hosted Spotify. Your library, run from your server, optionally you can connect to it from anywhere.