this post was submitted on 26 Mar 2024

282 points (95.8% liked)

Google will no longer back up the Internet: Cached webpages are dead (arstechnica.com)

submitted 11 months ago by BonesOfTheMoon to c/news

[–] AbouBenAdhem 186 points 11 months ago (1 children)

Google will no longer allow public access to its caches. I doubt they’ve stopped keeping caches for their own use.

[–] EmpathicVagrant 73 points 11 months ago (1 children)

Yeah now it’s just for feeding the shitty LLM every software company feels the need to shoehorn into whatever they possibly can.

[–] AbouBenAdhem 49 points 11 months ago (2 children)

Now that junk AI content has polluted the public web, access to pre-LLM content has become far more valuable—that’s why Reddit shut down their public APIs too.

[–] [email protected] 14 points 11 months ago

This is a very reasonable deduction.

[–] nrezcm 4 points 11 months ago

It's like pre atomic steel!

[–] [email protected] 98 points 11 months ago (2 children)

I'd recommend that everyone archive their pages through the Internet Archive instead, as that non-profit seems to be more concerned with ethics than corporations such as Google are.

[–] [email protected] 51 points 11 months ago

Also, https://archive.org/donate

[–] [email protected] 8 points 11 months ago (1 children)

True, but two copies is one, and one is none. Multiple backups are critical, especially as archive.org has been targeted by the last few book publishers, who want it gone. As politicians and news sites quietly modify their content and hope nobody notices, this should really be a service of the Library of Congress, too.

[–] [email protected] 3 points 11 months ago

Yes but no copies is an error. So better to support the ones we have left.

[–] WormFood 67 points 11 months ago (1 children)

i used to look at cached pages all the time. it was particularly useful if the current version of the page was different to Google's cached version, or if the page was down. then the button to open the cached page disappeared without explanation. one of the many ways that Google search now is worse than it was 20 years ago

[–] [email protected] 14 points 11 months ago (2 children)

Also helpful for pages that try to hide parts they let the search engine index behind a paywall when it’s a human visitor. Like the notorious expertsexchange before it got usurped by stackoverflow.

[–] [email protected] 12 points 11 months ago (1 children)

Expert sex change?

[–] [email protected] 3 points 11 months ago

it would have been a much better website if that were the case.

[–] qevlarr 3 points 11 months ago

Or getting around geoblocking. Very annoying. It's also unnecessary to shut down everything because some countries have reasonable privacy protection. Don't snoop on your visitors and you have nothing to worry about. I'm sorry the US policy is written by and for marketing tech companies

[–] FlyingSquid 61 points 11 months ago (2 children)

The really bad thing is that it's only a matter of time before the Internet Archive is sued into oblivion. People are uploading full copyrighted movies and there's no moderation at all. It's not just cacheing that is at risk here either.

Amongst other things, the Internet Archive is the home of the Prelinger Archives, the largest collection of educational, industrial and other ephemeral films from the silent era on. If the IA goes down, the only place to access those would be commercial outlets like YouTube.

And it will be a real shame.

[–] [email protected] 14 points 11 months ago (1 children)

it's a shame it's so large that no one has the money to clone it to their own servers (that i know of)

[–] [email protected] -3 points 11 months ago (1 children)

I think getting permission from Google would be harder than buying enough hardware.

[–] [email protected] 7 points 11 months ago (1 children)

Why would you need permission from Google to mirror the Internet Archive? Google doesn't own IA.

[–] [email protected] 4 points 11 months ago

I misunderstood. I thought the person I was replying to was talking about Google's cache, not IA.

[–] [email protected] 4 points 11 months ago

cacheing

It's "caching". Spell-check was right again.

[–] [email protected] 44 points 11 months ago (1 children)

Google: Getting progressively shitty, one decision at a time.

[–] [email protected] 16 points 11 months ago (1 children)

It's like they saw IBM's rise and fall and said "I want that!"

[–] [email protected] 15 points 11 months ago (2 children)

IBM didn't "fall". Just because they don't make phones and pop culture shit anymore doesn't mean that International Business Machines are going anywhere soon.

How many companies do you know that make a multitude of processor architectures and a multitude of supercomputer architectures?

IBM probably has a black site nuclear bomb research lab aimed straight at their competition too

[–] [email protected] 4 points 11 months ago (1 children)

I actually work in the field and all of our architectures are made in-house. I honestly don't know what influence IBM has on any industry other than making annoying commercials in my podcasts and talking about the cloud.

[–] [email protected] 1 points 11 months ago (1 children)

They still make a shitload of mainframes, partly because of backwards compatibility and partly because ain't nobody got fired for buying IBM, y'know y'know

[–] [email protected] -1 points 11 months ago

Read some Cringely. IBM is at a tipping point and needs to stop processing acquisitions while it has some momentum left.

[–] [email protected] 35 points 11 months ago

Google never did make backups of the Internet, so why are we pretending like they ever did? Cached webpages were a basic workaround for third-party website downtime: a guarantee that you could reliably see the information you searched for, even if the linked site was down. It was nothing more than a snapshot of the webpage their crawlers saw, and older copies were permanently deleted with every new crawl of the page.

It was never an archival effort, it was a rotating cache. If you were under the impression for all these years that Google was preserving Internet history, I don't know why, because Google never claimed to be doing that. Maybe it's time to reevaluate any other altruistic things you're assuming that mega corporations are up to...

[–] [email protected] 34 points 11 months ago (2 children)

2/2/2024

That article is from February.

[–] SonnyVabitch 17 points 11 months ago (2 children)

Is that the UK or US date format?

[–] [email protected] 7 points 11 months ago

Yes.

[–] Viking_Hippie 2 points 11 months ago

The Slovenian one.

[–] [email protected] 2 points 11 months ago

But is it cached?

[–] [email protected] 23 points 11 months ago (1 children)

It's a good thing we have archive.org then.

[–] [email protected] 19 points 11 months ago (1 children)

For now...

https://archive.org/donate

[–] [email protected] 5 points 11 months ago

Already got a monthly set up!

[–] mojo_raisin 18 points 11 months ago

That's fine, I don't trust Google anyways. Any real backing up or archiving of internet culture can't be done for profit, or it will be shut down when not profitable or enshittified.

[–] unreasonabro 9 points 11 months ago

google stopped "backing up" the internet when it sold it out and ruined it.

[–] [email protected] 1 points 11 months ago (2 children)

I never knew google was doing this

[–] motor_spirit 11 points 11 months ago

There used to be a little link to the side of all(?) or most(?) search results that would allow you to view a cached version of the result. If you have used Google historically, you have probably overlooked it many times. I can't recall them ever loading for me...

[–] elbarto777 1 points 11 months ago

It was great, especially for pages with a specific answer. "How do I cook cordon bleu?" Here's a result! "503 Temporarily unavailable"? Fuck, but I need it now!! Clicks Cached button ... voilà.