this post was submitted on 15 Nov 2023
64 points (100.0% liked)


I just need to preserve some old data that I have on my computers, so I was wondering what would be the best way to archive stuff long term.

Blu-ray disks ? Multiple HDDs ? What do you guys suggest ?

[–] nottelling 34 points 10 months ago* (last edited 10 months ago) (9 children)

Self-hosting principles aside, is this data actually important? If so, then don't fuck around with self-hosting it. Are you looking for lowest cost? Then don't waste a bunch of money spinning your own disks.

Amazon Glacier to guarantee availability, and your own encryption to guarantee privacy.

It's currently running me about $4/month for around 10 TB that I don't want to lose but just don't want to deal with. An equivalent HDD solution would be around $500; that's roughly 10 years to break even, assuming zero disk failures and zero personal maintenance time.
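To make the break-even claim easy to re-run with other numbers, here is a minimal sketch of the arithmetic. The function name is hypothetical, the two figures are the ones quoted in this comment, and retrieval fees, disk failures, and power costs are deliberately ignored.

```python
def break_even_years(upfront_cost: float, monthly_fee: float) -> float:
    """Years until cumulative monthly fees equal a one-time hardware purchase."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return upfront_cost / monthly_fee / 12


# Figures from the comment: ~$500 of disks vs ~$4/month of cold storage.
years = break_even_years(upfront_cost=500, monthly_fee=4)
print(f"Break-even after about {years:.1f} years")
```

Plugging in your own hardware quote and storage bill gives a quick first comparison before looking at retrieval pricing.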

Plus it's guaranteed. Inherent multiple copies, has an SLA, and there's no worry about the service just disappearing. If they decide to shut down or raise prices or whatever, you can reevaluate and move.

Edit: Glacier and similar services are meant for archival which is the term OP used. You never expect to need it again, but can't get rid of it. Retrieval cost is mostly irrelevant, but yes much more expensive. (I'd wager still less expensive than a home RAID array.)

[–] Dran_Arcana 8 points 10 months ago* (last edited 10 months ago) (3 children)

What would it cost to retrieve though? You probably still have the appropriate cost-effective solution but it's an important consideration for newcomers to have complete math.

[–] [email protected] 3 points 10 months ago* (last edited 10 months ago)

Retrieving from S3 Glacier is approximately ~~10 times the monthly cost of storing the data~~ 100 times actually. I didn't realize retrieval from Glacier isn't actually downloading it onto your local machine, but rather just moving it into a frequent-access tier S3 bucket from which you can then download, and this download is the expensive part.

[–] [email protected] 2 points 10 months ago

Yeah AWS charges for outgoing data, but not incoming. Keeping that data there is cheap, getting it back would not be.

[–] nottelling 2 points 10 months ago

OP said "archive", not "backup". Glacier is for data you need to keep but rarely touch.

[–] [email protected] 4 points 10 months ago (1 children)

How is it to get the data back?
Can I do it in real time so I could mount it as a media storage or would I need to rent one of the faster S3 tiers?

[–] [email protected] 4 points 10 months ago (3 children)

So instead of "fucking around" with putting it on a long-lasting storage device to keep in a wardrobe, he should give up control of the data, hand it to a company, and risk forgetting to inform them about an address change, so everything is lost when the bills aren't paid?

How is that more secure?

[–] [email protected] 2 points 10 months ago (1 children)

Are you sure it's $4/month and not $40/month? If so, which region is this in?

[–] nottelling 2 points 10 months ago (2 children)

US East. Look specifically at Glacier, which is long-term, near free to store, expensive to remove.

[–] Donebrach 12 points 10 months ago (1 children)

encode your data into the dna of an alien world so that it lasts for all time.

[–] [email protected] 5 points 10 months ago

Even better, so it mutates into superior data!

[–] 9point6 8 points 10 months ago* (last edited 10 months ago)

Depends how important the data is, how long is long term and how budget is budget, but assuming you don't want to risk losing anything, backup best practice is the 3-2-1 rule:

3 copies, 2 different media, 1 off-site
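As a rough illustration, the rule can be written as a three-part check over a list of copies. This is only a sketch: the `BackupCopy` class, its field names, and the example plan are all made up for illustration, not taken from any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical structure for illustration)."""
    label: str
    medium: str    # e.g. "ssd", "hdd", "bd-r", "tape", "cloud"
    offsite: bool  # stored somewhere other than the original location


def satisfies_321(copies: list[BackupCopy]) -> bool:
    """True if there are >= 3 copies, on >= 2 media, with >= 1 off-site."""
    media = {c.medium for c in copies}
    return (
        len(copies) >= 3
        and len(media) >= 2
        and any(c.offsite for c in copies)
    )


plan = [
    BackupCopy("working copy", "ssd", offsite=False),
    BackupCopy("external drive", "hdd", offsite=False),
    BackupCopy("cloud archive", "cloud", offsite=True),
]
print(satisfies_321(plan))
```

Dropping the cloud entry from `plan` makes the check fail on two counts at once: only two copies, and nothing off-site.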

I'd almost always say a cloud provider for your off-site backup, but if you don't want to do that, it depends how much you want to spend.

There's no guaranteed do-it-once-and-you're-done approach here, as all data can degrade. For instance, if one of your backup media is hard disks, you're probably going to want it set up in at least RAID 5, and you want to be on top of swapping out disks when they fail. If you're thinking of the Blu-ray or tape approach, you're going to want to periodically check that the media hasn't degraded. You'll probably also want to plan to replace the backup media every half decade or so to be extra safe (e.g. BD-Rs have a lifespan of 5-10 years).
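One common way to do the periodic check is to record a checksum for every file when the backup is written, then re-hash the media later and compare. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and where you keep the manifest (ideally not only on the media being checked) is up to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

Run `build_manifest` once right after writing the backup, save the result (as JSON, for example), and call `verify` against the same root on each check; an empty list means every recorded file still reads back identically.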

[–] foggy 8 points 10 months ago* (last edited 10 months ago) (3 children)

Buy used HDDs and configure RAID arrays.

You can get like 32TB of cheap basically ready to fail HDDs and listen to them click away for dirt cheap. I mean like a couple hundred bucks.

Buy some old tower PCs from a school or something. Low specs are fine.

Install Ubuntu Server, set up Samba and MiniDLNA, set up a tunnel with Cloudflare. Boom.

You could set this up for like $500. I have a setup like this. My HDD has been clicking since install. 2 years strong. I have two backup 16TB HDDs ready to hot swap should either of them fail. Having those backups on hand brings your cost up to about $800, but again, this is for two 16TB HDDs in a RAID config. If you did like 8TB instead, this is all probably $500 with backups.

Western Digital has a bargain bin.

P.S. y'all have valid concerns about RAID 5. I'm no expert, so I mirror the whole of it to another 16 TB USB HDD I got for a few hundred bucks.

It's basically everything but flood/fire proof. The odds of my RAID config failing such that it is irrecoverable are low, but the odds my USB HDD fails at the same time are inconceivable! This USB HDD is attached to my main workstation hub; when I log in to Windows (shut up I dual boot for development ya dinguses) and my machine is idle for 20 mins, it performs a sync between my network-attached storage (my ghetto RAID server) and the USB HDD.

It's low cost and fool proof, and kinda beefy. If I upgrade the tower to something modern...? Add 1TB of SSD to everything? Dude... I'll be rocking a rad setup for under 1500.

[–] [email protected] 7 points 10 months ago (2 children)

Just use RAID 6 instead of RAID 5, as ready-to-die disks could die simultaneously.
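For anyone weighing the trade-off: RAID 5 spends one disk's worth of space on parity and survives one failed disk, while RAID 6 spends two and survives two. A small sketch of that arithmetic, assuming equal-sized disks (the function name and the 6 x 2 TB example are only illustrative):

```python
def raid_summary(level: int, disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, disk failures tolerated) for RAID 5 or 6."""
    parity_disks = {5: 1, 6: 2}
    if level not in parity_disks:
        raise ValueError("only RAID 5 and RAID 6 are modelled here")
    parity = parity_disks[level]
    if disks < parity + 2:  # RAID 5 needs at least 3 disks, RAID 6 at least 4
        raise ValueError(f"RAID {level} needs at least {parity + 2} disks")
    return (disks - parity) * disk_tb, parity


for level in (5, 6):
    usable, tolerated = raid_summary(level, disks=6, disk_tb=2)
    print(f"RAID {level}: {usable:g} TB usable, survives {tolerated} failed disk(s)")
```

With six 2 TB drives that is 10 TB usable on RAID 5 versus 8 TB on RAID 6, so the second parity disk costs one drive of capacity.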

[–] [email protected] 5 points 10 months ago (2 children)

I once lost a RAID 6 to a faulty power distributor in a server case (lost 5 out of 12 disks). RAID is not a backup.

[–] [email protected] 2 points 10 months ago

But 1 disk failing and the array breaking ain't either.
This is about real-time data, not backup, which should at best happen daily or bi-daily for really important data.

[–] [email protected] 3 points 10 months ago (1 children)

After my experience with RAID 5 and the WD Green 2TB drives, which were so fragile that the vibrations of 6 drives in the same case were enough to kill them, resulting in 2 drives dying at the same time and wiping out my entire media collection... yeah, use RAID 6, with another server holding a RAID 6 array as continuous backup.

[–] [email protected] 1 points 10 months ago (3 children)

Read the data spec for how many in an array?
Literally the reason for WD RED NAS and NAS Pro (beyond some other tech specs).

[–] [email protected] 5 points 10 months ago* (last edited 10 months ago)

It would probably help to know for how long, how much capacity you need, and what budget. It should also be stated that external factors play a massive part in how long a storage device can survive, like environment, with humidity and heat being the biggies.

Edit in case I fall asleep: on a budget I would usually go with an external SSD; just refresh the data every year or 2 and it should be OK for 8ish years, maybe even 10. For a write-it-and-forget-it method you'll want M-DISC instead, which is more expensive but if properly stored will last lifetimes, so the failure point will be finding a usable drive that can read it. If you decide to go the spinning mechanical drive route, make sure to buy 2 (a backup for the backup) since they are a lot more fragile. Gold-plated DVDs/CDs are another write-and-forget option but have less capacity than M-DISCs.

[–] nomecks 5 points 10 months ago (2 children)

If you have enough data: Tapes. Tapes are so hilariously cheap to keep. Write them and keep them in a fire proof box. No power needed to keep platters spinning. 45TB/tape!

[–] [email protected] 1 points 10 months ago (1 children)

I have never used tapes, but I want to use it if it's viable. I only have about 3TB of data currently.

[–] [email protected] 1 points 10 months ago

Tapes only make financial sense if you're in the hundreds of TB.

[–] [email protected] 1 points 10 months ago

Personally I consider a tape only a valid solution in the 100+TB range. (At least cost wise) Unless you happen to have a tape drive already at your disposal..

[–] [email protected] 4 points 10 months ago (1 children)

A couple different threat models to consider, hardware failure vs human failure. Things like RAID can effectively cover the hardware failure side and be fully transparent. Human failure is a bit more tricky. There are a number of old expressions about backups but one that's good to keep in mind is snapshots are not backups. They're convenient and easy to automate but if the system making them goes kerplooie they're pretty useless.

A tiered version is good for off device backup, using diff backups routinely to only copy the new or changed data with a periodic full backup.
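The "only copy the new or changed data" part can be sketched in a few lines of standard-library Python. This is a simplification that assumes a file counts as unchanged when its size matches and the source is not newer than the existing copy; real tools such as rsync or dedicated backup software do this more robustly, and the function name here is hypothetical. It also overwrites the destination in place rather than keeping separate diff sets.

```python
import shutil
from pathlib import Path


def incremental_copy(source: Path, dest: Path) -> list[str]:
    """Copy files that are new or changed; return the relative paths copied."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Assumption: same size and source not newer means "unchanged".
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Running it twice in a row against the same folders should copy everything the first time and nothing the second; the periodic full backup is then just the same call against an empty destination.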

Cold disks are great but make sure to test them periodically, nothing worse than looking to restore a chunk of data only to find the backup can't be read.

[–] [email protected] 1 points 10 months ago (4 children)

Things like RAID can effectively cover the hardware failure side

Note that RAID only covers one specific hardware failure. To the point where IMHO, you cannot consider it a data security measure, only a data availability one.

[–] [email protected] 3 points 10 months ago

CDs degrade over time and so aren't the best way to archive data if you know you will need it again. If it's just an 'in case' then it may be ok. Best bet is to buy a USB disk and then keep a second copy of it offsite. Also best practice to not use two of the same manufacturer drive.

[–] calypsopub 3 points 10 months ago

Multiple methods, not really important which ones. I use an external hard drive plus I email zip files to myself.

[–] [email protected] 3 points 10 months ago (1 children)

How much data? The answer will differ if it's a few megs of text vs a terabyte of videos.
