this post was submitted on 09 Jan 2025
Technology
Of course they aren't small, but they are probably as small as it gets, since they are pretty efficiently compressed. I am not sure what you mean by
since it is really trivial to use them. Just load them with Kiwix and serve them as a website. It doesn't get much easier than that.
I was referring to the file size being the barrier. The 2024 large database size of 202 GB is prohibitive for the average person's resources. For example, I have a home VPS host and I don't even have that much free space. Your cloud operating costs would also go up with the storage and bandwidth use.
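To make the storage barrier concrete, here is a minimal Python sketch that checks whether a directory has room for the dump. The 202 GB figure is the one quoted above; treating it as decimal gigabytes, and the names `REQUIRED_BYTES` and `has_room`, are assumptions made for illustration.

```python
import shutil

# Assumption: the 202 GB quoted in the thread means decimal gigabytes (10**9 bytes).
REQUIRED_BYTES = 202 * 10**9


def has_room(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # "." is only a placeholder; point it at wherever the ZIM file would be stored.
    verdict = "enough" if has_room(".") else "not enough"
    print(f"{verdict} free space for a 202 GB download")
```

This only looks at free space at the moment of the check; it says nothing about bandwidth or hosting costs.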
There are also two separate issues I was kinda mixing up. I'm a developer who uses StackOverflow and would like to use a resource that is readily available. I think it'd take a few hours to set up even a smaller copy of SO, which isn't ideal for answering a quick question. I also don't want to set up a whole mirror site with custom work just for myself, only because I'm paranoid that Microsoft might buy them and paywall SO overnight or something.
I looked into doing something similar with Wikipedia and the recommendation is also to use Kiwix, and the offline file size is also very large.
Welcome to the collapse! Hoarding "clean data" for personal use is like hoarding clean water and food: you need a place to keep it, and it starts going stale the minute you shelve it. So either buy a digital bunker to load up with what you need or ask the all-knowing AI gods for answers like the other poors.
Also the Stack Exchange software used to be open source, surely there's still a fork somewhere. You could certainly run your own Developer QA site, but like with Lemmy, the problem then is getting enough traffic to be able to productively tap into the collective wisdom.
(Edit: sorry, this comes across as mean-spirited, but I'm honestly sympathetic and just nihilistically frustrated to be in a similar situation. I foresee a big NAS and a lot of downloads in my future, but I hope we also find ways to share our forbidden knowledge until the day it can be free again.)
I'm hoping community efforts are able to fill the void. I fear having to do this all myself and becoming some kind of Mad Max style tinkerer after the fall...
Old phones daisy-chained together to act as a server... The remnants of Starlink for internet, getting nazi/rape threats from the remaining social media AIs that live in all the satellites...
It would be nice if the government backed up Wikipedia and SO. But considering they don't give a shit about Linux, which is arguably one of the most vital technical infrastructure projects of our lifetime...
When hosting this locally, I don't see how 200 GB is much of an issue. Storage is so cheap these days: if you want to host it locally, just buy a 256 GB SSD for that data alone for $20. Anyway, you were asking for a mirror, to which I replied with the information about the ZIM files. I don't really understand the issue. Stack Overflow just isn't that small, and there is not much you can do about that.
The download? Maybe, depends on your Internet connection's speed. Actually serving it as a website certainly doesn't take hours. It is rather a matter of seconds.
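How long the download takes is simple arithmetic. The sketch below estimates the transfer time for the 202 GB file mentioned earlier in the thread at a few connection speeds. The speeds are illustrative assumptions, the function name `download_seconds` is hypothetical, and protocol overhead and server-side throttling are ignored.

```python
def download_seconds(size_gb: float, speed_mbit_per_s: float) -> float:
    """Estimate transfer time in seconds.

    Assumes decimal units: 1 GB = 8000 megabits. Ignores overhead and throttling.
    """
    if speed_mbit_per_s <= 0:
        raise ValueError("speed must be positive")
    return size_gb * 8000 / speed_mbit_per_s


if __name__ == "__main__":
    SIZE_GB = 202  # figure quoted in the thread for the 2024 dump
    for speed in (20, 100, 1000):  # illustrative speeds in Mbit/s
        hours = download_seconds(SIZE_GB, speed) / 3600
        print(f"{speed:>5} Mbit/s: about {hours:.1f} hours")
```

Under these assumptions the download is roughly 4.5 hours at 100 Mbit/s and under half an hour on a gigabit line, so the "few hours" estimate holds for typical home connections, while serving the file afterwards is a separate and much quicker step.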