this post was submitted on 20 Jul 2024
54 points (96.6% liked)

Wikipedia

A place to share interesting articles from Wikipedia.

cross-posted from: https://lemmy.world/post/17748238

TIL That the entirety of Wikipedia is only ~100Gb and you can download it for offline use

In light of the recent CrowdStrike crash revealing how weak points in IT infrastructure can have wide-ranging effects, I figured this might be an interesting one.

The entirety of Wikipedia is periodically uploaded here, along with many other useful wikis and how-to websites (e.g. iFixit tutorials and WikiHow): https://download.kiwix.org/zim

You select the archive you want, then the language and archive version (for example, you can get an archive with no pictures to save space). For the whole of English Wikipedia you'd select "wikipedia_en_all_maxi_2024-01.zim".
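
To make the naming scheme concrete, here is a minimal Python sketch that splits one of these file names into its parts. The field names (`project`, `language`, `variant`, `date`) are my own labels based on the one example above, not an official Kiwix specification, so treat them as assumptions.

```python
from pathlib import PurePosixPath


def parse_zim_name(filename: str) -> dict:
    """Split a ZIM archive file name into rough components.

    Assumes the pattern <project>_<language>_<variant...>_<date>.zim,
    inferred from "wikipedia_en_all_maxi_2024-01.zim". The key names
    are illustrative labels, not official terminology.
    """
    stem = PurePosixPath(filename).stem  # drop the ".zim" suffix
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return {
        "project": parts[0],
        "language": parts[1],
        # Everything between the language and the date, e.g. "all_maxi".
        "variant": "_".join(parts[2:-1]),
        "date": parts[-1],
    }


print(parse_zim_name("wikipedia_en_all_maxi_2024-01.zim"))
```

For the file above this gives project `wikipedia`, language `en`, variant `all_maxi`, and date `2024-01`.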

The archives are packed as .zim files, which can be read with the Kiwix app completely offline.

I keep several USB drives with some of these archives along with the app installer. In the event of some major catastrophe I'd at least be able to access some potentially useful information. I have no stake in Kiwix and don't know whether there are alternative apps or schemes; I just thought it was neat.

top 7 comments
[–] seaQueue 16 points 5 months ago

Compression ratios on plaintext are magical

[–] [email protected] 6 points 5 months ago (1 children)

Hi, unwanted internet pedant here. Gb is gigabits. GB is gigabytes. All of English Wikipedia compressed is about 100 GB.
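
For anyone who wants the arithmetic, a quick Python sketch of the difference. The only fact used is 8 bits per byte; the 100 Mbit/s link speed and the decimal (powers of 1000) prefixes are assumptions picked purely for illustration.

```python
BITS_PER_BYTE = 8


def gigabytes_to_gigabits(size_gb: float) -> float:
    """Convert gigabytes (GB) to gigabits (Gb)."""
    return size_gb * BITS_PER_BYTE


def gigabits_to_gigabytes(size_gbit: float) -> float:
    """Convert gigabits (Gb) to gigabytes (GB)."""
    return size_gbit / BITS_PER_BYTE


def download_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time, assuming decimal prefixes and no overhead."""
    size_mbit = size_gb * BITS_PER_BYTE * 1000  # GB -> Gb -> Mb
    return size_mbit / link_mbit_per_s


print(gigabytes_to_gigabits(100))   # 800
print(gigabits_to_gigabytes(100))   # 12.5
print(download_seconds(100, 100))   # 8000.0
```

So 100 GB is 800 Gb, while 100 Gb would only be 12.5 GB. On the assumed 100 Mbit/s link, 100 GB takes at least 8000 seconds, a bit over two hours.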

[–] thirteene 2 points 5 months ago (1 children)
[–] [email protected] 1 points 5 months ago

Gigaunitless

[–] [email protected] 5 points 5 months ago (1 children)

Is there any tool that can keep an updated copy automatically, kinda like the open source Steam cache? I’d love to self-host Wikipedia that syncs daily or weekly with changes.

[–] [email protected] 5 points 5 months ago* (last edited 5 months ago)

If you’re on nearly any flavor of UNIX (Linux, macOS, etc.), you can run rsync on a scheduled basis via crontab.

The only limitation, as far as I am aware, is that the download is provided as a compressed file, so you’ll have to download the whole thing, uncompress it locally, and then do a selective update.
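
A rough sketch of what the scheduled job could look like, written as a small Python wrapper around rsync that cron can call. It assumes rsync is installed and that the mirror you use exposes an rsync endpoint, which I have not verified. The remote URL and paths in the comments are hypothetical placeholders, not real Kiwix addresses.

```python
import subprocess
import sys


def build_rsync_command(remote: str, dest: str) -> list:
    """Build the rsync argument list for fetching one archive.

    --times   preserve modification times
    --partial keep partially transferred files so a retry can resume
    --verbose log what was transferred
    """
    return ["rsync", "--times", "--partial", "--verbose", remote, dest]


def sync(remote: str, dest: str) -> int:
    """Run rsync and return its exit code (0 means success)."""
    return subprocess.run(build_rsync_command(remote, dest)).returncode


# Hypothetical weekly crontab entry (Sundays at 03:00), paths are placeholders:
#   0 3 * * 0 /usr/bin/python3 /path/to/sync_zim.py <remote-url> <local-dir>
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(sync(sys.argv[1], sys.argv[2]))
```

This only re-fetches a file you name explicitly. It does not discover newer archive versions or apply per-article changes, so the limitation described above still applies.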

[–] thirteene 1 points 5 months ago

Open question: is there a "high quality" static version that people prefer to use, similar to the avocado prices data set? I have to imagine that anything pre-2023 without AI data is considered to be more accurate. Potentially a date before/after an impactful policy change.