Compression ratios on plaintext are magical
Hi, unwanted internet pedant here. Gb is gigabits. GB is gigabytes. All of English Wikipedia compressed is about 100 GB.
What about G?
Gigaunitless
Is there any tool that can keep an updated mirror automatically, kinda like the open-source Steam cache? I’d love to self-host Wikipedia and have it sync daily or weekly with changes.
If you’re on nearly any flavor of UNIX (Linux, macOS, etc.), you can run rsync on a scheduled basis via crontab.
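As a rough sketch, a weekly crontab entry might look like this (the script path, log path, and schedule are just placeholders):

```
# edit with `crontab -e`: run a sync script every Sunday at 03:00
0 3 * * 0 /usr/local/bin/wiki-sync.sh >> /var/log/wiki-sync.log 2>&1
```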
The only limitation, as far as I’m aware, is that the download is provided as a compressed file, so you’ll have to download the whole thing, decompress it locally, and then do a selective update.
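The script scheduled above could look roughly like the sketch below. The rsync mirror hostname, module path, and local directories are assumptions (mirror.example.org is a placeholder); pick an rsync-capable mirror from the list on dumps.wikimedia.org.

```
#!/bin/sh
# Hypothetical wiki-sync.sh -- mirror URL and paths are placeholders.
set -eu

DEST=/srv/wikipedia
mkdir -p "$DEST"

# Fetch the current-revisions article dump; rsync skips the transfer
# when the remote file hasn't changed since the last run.
rsync -av --partial \
  rsync://mirror.example.org/wikimedia-dumps/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 \
  "$DEST/"

# The dump is one big .bz2, so it has to be decompressed in full before
# any selective processing (-k keeps the compressed copy, -f overwrites
# a previous decompressed file).
bunzip2 -kf "$DEST/enwiki-latest-pages-articles.xml.bz2"
```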
Open question: is there a "high quality" static version that people prefer to use, similar to the avocado prices data set? I have to imagine a snapshot from before 2023, i.e. without AI-generated content, is considered more accurate. Potentially a snapshot dated just before or after an impactful policy change.