this post was submitted on 01 Jul 2023
4528 points (96.6% liked)

Mildly Infuriating

[–] [email protected] 8 points 1 year ago (1 children)

yeah, fediverse platforms not only have no measures against scraping, they willingly send out content in a machine-readable format. that's kind of the whole point of federation. and we can't really stop scrapers: even if we clamped down on federation, we'd only hurt ourselves.

besides, up until the latest change twitter was still easy to scrape (and now the problem is that even registered users can't see that much of it), and reddit is trivial to scrape even without the api. yes, that includes new reddit too. there's very little you can do against scraping in an open space, especially against someone wielding the full power of chatgpt, and even less so if you want to keep your site accessible to blind people.
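to make the federation point concrete: ActivityPub servers are expected to hand back a JSON representation of an object when the client asks for `application/activity+json`. a minimal sketch in python that only builds such a request and never sends it. the URL `https://lemmy.example/post/123` and the helper name `build_activitypub_request` are made up for illustration, not a real instance or a real library function.

```python
import urllib.request


def build_activitypub_request(object_url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the JSON form of an object.

    The Accept header is the ActivityPub media type; everything else about
    the target server is an assumption.
    """
    return urllib.request.Request(
        object_url,
        headers={"Accept": "application/activity+json"},
        method="GET",
    )


# Hypothetical post URL, used only for illustration.
req = build_activitypub_request("https://lemmy.example/post/123")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

passing `req` to `urllib.request.urlopen` would actually perform the fetch; it is left out here so the sketch runs without touching the network.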

[–] [email protected] 3 points 1 year ago (1 children)

(and now the problem is that even registered users can’t see that much of it)

People actually already found a way around the rate limit. Opera GX even implemented a fix in their desktop browser.

[–] [email protected] 3 points 1 year ago (1 children)

lmao, you know you fucked up when a browser pushes an update specifically to circumvent your rate limits

but yeah, if opera can do it, i highly doubt that openai couldn't easily do it too. the ai concerns are posturing (and probably a personal grudge, given that elon was a founding member of openai until he got kicked out), the real issue is somewhere between incompetence and attempted monetization.

[–] sauerkraus 2 points 1 year ago (1 children)

For Reddit, API calls are near infinitely less load on the servers than scraping.

[–] [email protected] 1 points 1 year ago (1 children)

i'm actually kinda interested in how that could work. a regular user using "near infinitely less" resources than a scraping engine sounds like some absolutely stupid design, either on reddit's or the scraping engine's side

[–] sauerkraus 2 points 1 year ago (1 children)

When using the API you just request what you’re looking for. With scraping you load everything repeatedly.

[–] [email protected] 1 points 1 year ago

except most of the weight of the site is in easily cacheable assets that don't get reloaded at all. they're probably not even loaded to begin with, since even though new reddit is a single-page app, it does have seed data in the html content itself, which a well-written scraper (or one that automatically parses the site with chatgpt) can easily extract. constantly reloading styles and scripts would be a ridiculously stupid design on the scraper's part, and on reddit's if they necessitated it.

the html page itself is slightly heavier than just the json data, but compared to all the images and videos real clients load and the giant piles of tracking data being sent back every second, a scraper is def going to be lighter. plus the site does reload itself every time you enter a new subreddit, which doesn't happen through the api for some reason.
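a rough sketch of the seed-data point, in python with only the standard library. it assumes the page embeds its initial state as JSON inside a `<script>` tag with a known id. the id `seed-data`, the sample page and the field names are all invented for the example and are not reddit's actual markup. note that the scraper never requests the stylesheet or the app script, it just reads the one html document.

```python
import json
from html.parser import HTMLParser


class SeedDataParser(HTMLParser):
    """Collect the text inside <script id="..."> and decode it as JSON."""

    def __init__(self, script_id: str):
        super().__init__()
        self.script_id = script_id
        self._inside = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Only start collecting for the script tag carrying the wanted id.
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def seed_data(self):
        text = "".join(self._chunks).strip()
        return json.loads(text) if text else None


def extract_seed_data(html: str, script_id: str = "seed-data"):
    """Return the embedded JSON state, or None if the tag is missing."""
    parser = SeedDataParser(script_id)
    parser.feed(html)
    parser.close()
    return parser.seed_data()


# Made-up page: the css and js links are never fetched by the scraper.
SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="/static/app.js"></script>
<script id="seed-data" type="application/json">
{"posts": [{"id": "abc", "title": "example post", "score": 42}]}
</script>
</head><body><div id="root"></div></body></html>"""

data = extract_seed_data(SAMPLE_PAGE)
print(data["posts"][0]["title"])
```

a real site would need its own tag id and json shape worked out by inspecting the page, and those can change without notice.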