this post was submitted on 29 Jun 2023
I still find it astonishing that TechCrunch buys the ML model training argument.
No one in their right mind would use the API (which has always been rate limited) to fetch data for text generation. People would use plain HTTP or, even better, archives of Reddit.
Why? Because the rate limit is more generous or nonexistent, there is no need to write anything (only read), and it will stay free. Also, super fresh data is not dramatically useful (except in very specific corner cases when something in the news changes the way we talk).
Web crawling has always worked through raw HTTP/HTML parsing. Why create site-specific API calls that require authentication and are throttled?
This excuse is pure bullshit.
Considering the Reddit API has a hilariously low limit, I fully understand why the AI bros would use a scraping approach instead. I've built small Discord bots that had a difficult time keeping up through the API because so few requests were available! I was in the process of building an event-driven system that used multiple API tokens in order to keep up with multiple feeds. It's just terrible.
Another proof of Reddit's incompetence.
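For anyone wondering what juggling multiple API tokens looks like in practice, here is a rough sketch. It is not the actual bot code from the comment above: the `TokenPool` class, the token names, and the limit numbers are all made up for illustration, and nothing here reflects the real Reddit API limits. Each token gets its own sliding-window budget, and `acquire()` hands out the first token that still has requests left.

```python
import time
from collections import deque


class TokenPool:
    """Hand out API tokens, each with its own request budget.

    Hypothetical helper: the limits are placeholders, not real API values.
    """

    def __init__(self, tokens, max_requests, window_seconds, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        # One queue of request timestamps per token.
        self._history = {token: deque() for token in tokens}

    def acquire(self):
        """Return a token that still has budget, or None if all are exhausted."""
        now = self._clock()
        for token, stamps in self._history.items():
            # Forget requests that have left the sliding window.
            while stamps and now - stamps[0] >= self._window:
                stamps.popleft()
            if len(stamps) < self._max:
                stamps.append(now)
                return token
        return None


if __name__ == "__main__":
    pool = TokenPool(["token-a", "token-b"], max_requests=2, window_seconds=60)
    for _ in range(5):
        # A real bot would wait or queue the event when this returns None.
        print(pool.acquire())
```

Even with a pool like this, the total throughput is just the per-token limit multiplied by the number of tokens, which is why a low limit stays painful for anything following several feeds at once.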