this post was submitted on 04 Sep 2024
21 points (95.7% liked)

datahoarder


Years ago I came across Filecoin/Sia decentralized data storage and started trying them out, but I stopped due to lack of time. A few days ago I heard on a podcast about a kind of NAS that does more or less the same thing: it spreads chunks of data across devices owned by other users.

Is there a service that does this but with your own hardware or, even better, something open source where you get X GB as long as you share the same amount of space plus something extra?

It would be great for backup.

top 15 comments
[–] [email protected] 8 points 3 months ago* (last edited 3 months ago) (2 children)

https://ipfs.tech/

I think this is the main technology behind that, and it is open source... I heard something about it years ago too. I've never used it either, and now that you mention it I'm curious whether anyone has. I'm unsure how to actually "use" IPFS and/or what tools might use it.

I'm kind of inclined to believe it doesn't work (or doesn't work well) otherwise it probably would be a bigger deal by now and there would be a lot to show off on the ipfs website.

Edit: It looks like this provides S3 compatible storage to IPFS. However, it seems more expensive than B2... So I'm not really sure why one would use it. You'd think IPFS would be attempting to undercut traditional providers.

[–] [email protected] 8 points 3 months ago (1 children)

Note: every file on IPFS is unencrypted and semi-public unless you encrypt it before upload.
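To illustrate why that is, here is a toy sketch of content addressing, the idea IPFS is built on: the identifier is derived from the bytes themselves, and whoever has the identifier can ask for the content. This is only an assumption-laden model, not the IPFS API: real CIDs use multihash encoding and chunk files into a DAG, and `content_id`, `publish`, `fetch`, and the `store` dict are hypothetical names made up for this example.

```python
import hashlib

# Hypothetical stand-in for "the network": maps identifier -> stored bytes.
store: dict[str, bytes] = {}

def content_id(data: bytes) -> str:
    """Toy identifier: the SHA-256 hex digest of the content (not a real CID)."""
    return hashlib.sha256(data).hexdigest()

def publish(data: bytes) -> str:
    """Store the bytes under their own hash and return that identifier."""
    cid = content_id(data)
    store[cid] = data
    return cid

def fetch(cid: str) -> bytes:
    """Anyone who knows the identifier gets back exactly what was stored."""
    return store[cid]

cid = publish(b"tax-return-2023.pdf contents")
# No key or login is involved: the identifier alone is enough to read the data.
print(fetch(cid))
```

If the bytes passed to `publish` are plaintext, plaintext is what comes back, which is why encrypting first matters.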

[–] [email protected] 4 points 3 months ago

Should be standard operating procedure anyway...

[–] [email protected] 1 points 3 months ago

Is a local ipfs cluster perhaps the best way here, or does that also connect itself to the global ipfs? https://docs.ipfs.tech/install/server-infrastructure/#features

[–] [email protected] 5 points 3 months ago (1 children)

Not sure if it's quite what you're looking for, but the first thing that comes to mind for me is Ceph. It's not exactly a service in and of itself, but it is self-hostable and open source. I currently have a cluster set up across 3 machines with 87 TiB total space and a Ceph Filesystem, and I thought it was relatively easy to set up (I'd recommend first doing so in a virtual environment to get your bearings if possible, though).

Said filesystem is set to make 3 replicas (1 on each machine) of any data written to it, and I use Unison to sync files between local storage and the cluster (such that the whole setup is analogous to Windows/OneDrive or macOS/iCloud). I also plan on setting up a node at my parents' house and making a new replication rule for that. As they live in a different state than me, this would amount to having hot offsite backups (for both me and them).

Finally, while I haven't seen it done in practice, in theory multiple Ceph admins may be able to configure a multi-site setup where they could trade some space on their own clusters with each other for a sort of community-based storage pool/hot offsite backups (like a community-run version of Google Drive or similar, where the buy-in could be some of your own storage space or money). However, it's important to note that while communications are encrypted and the storage drives can be encrypted as part of setup and operation, any data written to a cluster is not automatically encrypted, and if one wants privacy, said data would need to be encrypted separately before writing it to a community cluster.
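As a rough back-of-the-envelope check on what 3 replicas cost in space, here is a minimal sketch. It assumes the 87 TiB figure is raw capacity (the comment doesn't say whether it is raw or usable) and it ignores Ceph's own overhead and the headroom you'd normally keep free; `usable_capacity` is a hypothetical helper, not a Ceph tool.

```python
def usable_capacity(raw_tib: float, replicas: int) -> float:
    """Approximate usable space when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tib / replicas

# Assumption: 87 TiB raw, replicated 3 times (one copy per machine).
print(usable_capacity(87, 3))  # 29.0
```

Adding a fourth replica at an offsite node would lower the usable share further unless that node brings its own capacity.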

[–] peregus 2 points 3 months ago

Thanks for sharing your situation/ideas!

[–] [email protected] 3 points 3 months ago (1 children)

I use Sia as a renter (paying for storage) and I am also a host (selling storage space) on the Sia network.

Currently use it to host videos from peertube.wtf. It’s not perfect yet, but it works.

[–] peregus 2 points 3 months ago (1 children)

How long have you been using it? How's the reliability?

[–] [email protected] 3 points 3 months ago (1 children)

I have been using it for about 6 months now. No issues.

I currently only use 163.92 GB @ 4.2x of storage, though, and I'm renting out about 2 TB of storage.

It is actually fairly easy to set up, both hosting and renting.

[–] peregus 2 points 3 months ago (1 children)

What do you mean by "@4.2x"?

[–] [email protected] 4 points 3 months ago (1 children)

It’s the redundancy: on average, each file is stored on 4.2 hosts. The minimum is 3, and that is something you set yourself.
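To put numbers on that, here is a small sketch of the arithmetic. It's unclear from the figure above whether 163.92 GB is the size of the original files or the total already stored across hosts, so both readings are shown; the function names are hypothetical and this is not how the Sia client reports it.

```python
def stored_on_network(logical_gb: float, redundancy: float) -> float:
    """Total space consumed on hosts if `logical_gb` is the original file size."""
    return logical_gb * redundancy

def original_size(stored_gb: float, redundancy: float) -> float:
    """Original file size if `stored_gb` is the total held across all hosts."""
    return stored_gb / redundancy

# Reading 1: 163.92 GB of files, kept at 4.2x redundancy.
print(round(stored_on_network(163.92, 4.2), 2))  # 688.46

# Reading 2: 163.92 GB is the total on hosts, i.e. after redundancy.
print(round(original_size(163.92, 4.2), 2))  # 39.03
```

Either way, the redundancy factor is what you pay for on top of the data itself, so a higher minimum means more durability and a bigger bill.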

[–] peregus 2 points 3 months ago

Got it, thanks!

[–] [email protected] 2 points 3 months ago (1 children)

I used to use some of those services a little bit, but they're very expensive and generally very slow compared to just using B2, Wasabi, etc.

As far as a more local solution there are tons of those like Ceph, MinIO, GlusterFS, Garage, and many more.

[–] peregus 1 points 3 months ago* (last edited 3 months ago) (1 children)

I use Wasabi too. I was thinking about having another option with data spread across different devices, with extra devices for safety (like Sia). Thanks for your point of view and for the suggestions!

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

Multiple backups are good (really, 2 is the minimum anyone should have). It's just that the added complexity and overhead of distributed systems doesn't seem worth it to me. I have a local backup on a disk mirror in my NAS and an online backup to B2, and that feels good enough so far.