this post was submitted on 28 Jun 2023

Technology

After two weeks of demonstrations, some media buyers anticipate Reddit turning to alternative revenue streams.

[–] QHC 8 points 1 year ago (1 children)

AI scrapers never used the API. That's just a convenient scapegoat.

Don't believe Huffman's lies!

[–] Candelestine 4 points 1 year ago (2 children)

I would love to learn more. Can you share any links? My Google-fu isn't good enough to cut through the BS without knowing which terms I should be searching for.

[–] QHC 9 points 1 year ago

There's just no compelling reason to do it that way.

LLMs like ChatGPT get their data from the entire web and then have humans manually tag and identify everything. Getting data from an API is actually less useful to that end, and they'd need to integrate separately with every individual website.

Most websites don't even have an API in the first place, so scraping would still be necessary for most of them.

[–] cubism_pitta 7 points 1 year ago

It's just the way a large data-gathering project works. If you want to get data from hundreds of sites, a scraper is universal and can work for all of them, while using an API would require custom code for each one (assuming all the sites even HAVE an API).
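
To make the "universal vs. custom code for each" point concrete, here is a rough sketch using only Python's standard library. It is not how any real crawler is built. The site names and JSON shapes are made up for illustration, and a real pipeline would also need fetching, rate limiting, and deduplication.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from any HTML page, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def scrape_text(html):
    """Scraper route: one generic function that accepts markup from any site."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# API route: every site returns its own JSON shape, so every site needs its
# own adapter. Both payload layouts below are hypothetical.
def from_site_a(payload):
    return payload["data"]["body"]


def from_site_b(payload):
    return " ".join(item["text"] for item in payload["items"])


# Hypothetical hosts; this table grows by one hand-written entry per site.
API_ADAPTERS = {
    "site-a.example": from_site_a,
    "site-b.example": from_site_b,
}
```

The same `scrape_text` call handles a page from any host, while `API_ADAPTERS` needs a new hand-written function for each additional site, and only for sites that offer an API at all.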