this post was submitted on 08 Jul 2023
9 points (84.6% liked)

Programming.dev Meta

Welcome to the Programming.Dev meta community!

This is a community for discussing things about programming.dev itself. Things like announcements, site help posts, site questions, etc. are all welcome here.


Reddit and Twitter (supposedly) jacked up their API prices because of data scrapers. What could Lemmy do to try to stop them?

I don't think we can do anything.

[email protected] 22 points 1 year ago (1 child)

Why would we? I posted on the internet knowing that it would be used by others.

[email protected] 7 points 1 year ago

Yeah, I actually think Lemmy could benefit from improved scraping and indexing. For example, it would really help if more search engines natively understood Lemmy's federated nature and could:

  • Deduplicate links by prioritizing results from the instance hosting the community that a post was originally submitted to.
  • Include and denote cross-posts, recognizing which came first by submission timestamp and prioritizing by popularity via vote ratios, comment counts, and lurker click-through traffic.
  • Apply the same deduplication and prioritization to comments across instances.
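To make the first bullet concrete, here is a minimal Python sketch from a search crawler's point of view. It assumes the crawler records, for every copy of a post it finds, the URL it was found at, the post's canonical ActivityPub id (Lemmy exposes this as `ap_id`), the actor id of the community it was submitted to, and the locally visible score and comment count. `PostCopy`, `popularity`, and `deduplicate` are hypothetical names, and cross-post detection is left out.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class PostCopy:
    url: str           # where the crawler found this copy, e.g. https://lemmy.world/post/123
    ap_id: str         # the post's canonical ActivityPub id (shared by every copy)
    community_id: str  # actor id of the community the post was submitted to
    score: int         # vote score as seen on this copy's instance
    comments: int      # comment count as seen on this copy's instance


def host(url: str) -> str:
    return urlparse(url).netloc.lower()


def popularity(copy: PostCopy) -> tuple[int, int]:
    return (copy.score, copy.comments)


def deduplicate(copies: list[PostCopy]) -> list[PostCopy]:
    """Collapse federated copies of the same post into one search result.

    Copies are grouped by ap_id. Within a group, the copy served by the
    instance hosting the community wins; if the crawler never saw that
    copy, the most popular copy is used instead. Results are sorted by
    popularity (score, then comment count), highest first.
    """
    groups: dict[str, list[PostCopy]] = {}
    for copy in copies:
        groups.setdefault(copy.ap_id, []).append(copy)

    results = []
    for group in groups.values():
        home = [c for c in group if host(c.url) == host(c.community_id)]
        results.append(home[0] if home else max(group, key=popularity))
    return sorted(results, key=popularity, reverse=True)
```

Keying on the post's canonical id rather than on the link it points to means two separate submissions of the same article stay separate results, which is where cross-post handling would come in.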

Another use case besides search engines would be internet archive projects, helping to preserve historic internet content even as Lemmy instances go offline and disappear. Much knowledge has already been lost to us through the Twitter APIocalypse and the Reddit blackout.

Most of the above will only ever be possible through improved scraping, or even federation APIs.

[email protected] 11 points 1 year ago (1 child)

Well, they did it because they're for-profit companies looking to wring money out of people's interactions. Lemmy is open source and, by nature, publicly and readily available to whatever observers want to federate. It would be dead simple to create an ActivityPub server that does nothing but listen and save a copy of everything everyone has ever said or posted (at least on the instances you get it to federate with).
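A minimal sketch of the "listen and save" half of such a server, using only Python's standard library. It assumes remote instances POST activities as JSON to a shared `/inbox` path, and it appends each one to a hypothetical `archive.jsonl` file. It deliberately omits everything else a real ActivityPub server needs before anyone will federate with it: an actor document, WebFinger, following communities, and verifying HTTP Signatures on incoming requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

ARCHIVE_PATH = "archive.jsonl"  # hypothetical output file: one activity per line


def encode_activity(body: bytes) -> Optional[str]:
    """Turn one incoming request body into a JSON line for the archive.

    Returns None if the body is not a JSON object with a "type" field
    (activities such as Create, Like, or Delete all carry one).
    """
    try:
        activity = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return None
    if not isinstance(activity, dict) or "type" not in activity:
        return None
    return json.dumps(activity, ensure_ascii=False) + "\n"


class InboxHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/inbox":  # assumed shared-inbox path
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        line = encode_activity(self.rfile.read(length))
        if line is None:
            self.send_response(400)
        else:
            # A real server must verify the sender's HTTP Signature before
            # trusting the activity; this sketch stores it unverified.
            with open(ARCHIVE_PATH, "a", encoding="utf-8") as archive:
                archive.write(line)
            self.send_response(202)
        self.end_headers()


def serve(port: int = 8080) -> None:
    HTTPServer(("0.0.0.0", port), InboxHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8080 (an arbitrary choice). Keeping the parsing in `encode_activity` separate from the HTTP handler makes it easy to test without a network.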

[email protected] 3 points 1 year ago (1 child)

Yeah, absolutely. I'm actually tempted to implement some kind of all-encompassing "all" instance that listens to every community on every known Lemmy instance (a Lemmy equivalent of Reddit's r/all). I think it would be pretty useful, for example to get an idea of the "size" of the Lemmy community. It would probably have no users of its own (partly to avoid being blacklisted by other instances).
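Here is a rough Python sketch of the discovery side of such an "all" instance: paging through each known instance's local communities and tallying them to gauge the size of the network. It assumes Lemmy's v3 HTTP API as of mid-2023, specifically `GET /api/v3/community/list` with `type_=Local`, `page`, and `limit` (assumed maximum 50) query parameters, returning a `communities` array whose items carry `counts.subscribers`; check these against the API docs for the version you target. The instance list is something you would have to supply yourself, and actually subscribing to each community (sending ActivityPub Follows) is out of scope here.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# A fetcher takes a URL and returns the decoded JSON body.
Fetch = Callable[[str], dict]
PAGE_LIMIT = 50  # assumed server-side maximum for `limit`


def http_get_json(url: str) -> dict:
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def local_communities(instance: str, fetch: Fetch = http_get_json) -> Iterator[dict]:
    """Yield every community hosted on `instance`, one API page at a time."""
    page = 1
    while True:
        query = urlencode({"type_": "Local", "limit": PAGE_LIMIT, "page": page})
        batch = fetch(f"https://{instance}/api/v3/community/list?{query}").get("communities", [])
        yield from batch
        if len(batch) < PAGE_LIMIT:  # a short page means we've reached the end
            return
        page += 1


def survey(instances: list[str], fetch: Fetch = http_get_json) -> dict[str, int]:
    """Tally local communities and their subscriber counts across instances.

    Subscriber totals count subscriptions, not unique people: one account
    subscribed to ten communities is counted ten times.
    """
    totals = {"communities": 0, "subscribers": 0}
    for instance in instances:
        for view in local_communities(instance, fetch):
            totals["communities"] += 1
            totals["subscribers"] += view["counts"]["subscribers"]
    return totals
```

The `fetch` parameter is injectable so the crawl logic can be exercised against canned responses, and so a politer fetcher (rate-limited, cached) can be swapped in without touching the paging code.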

[email protected] 2 points 1 year ago (1 child)

I wonder how stuff like this would affect the scalability of the fediverse. I think having one instance like you describe would be good, but having many of them could maybe become an issue.

[email protected] 3 points 1 year ago

Yeah, but this is a general problem with ActivityPub: it relies on the goodwill of people and instances.

But I don't think it will be a really big issue if there are really just a few of these instances (that is the premise, though). Still, it's a good point: when something like this is open source, anyone could spin it up and kinda DDoS the fediverse...

But looking at it in a more positive light, it could probably also serve as an intermediate cache/index instance for other instances, which may save requests to the original instances.
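As a rough illustration of the cache idea, here is a minimal read-through cache in Python: repeated requests for the same URL within a time window are answered from memory, so the original instance sees at most one request per URL per window instead of one per consumer. The `fetch` callable and the five-minute default TTL are assumptions, and this only covers pull-style API reads, not the push-style activity delivery that ActivityPub federation itself relies on.

```python
import time
from typing import Callable


class ReadThroughCache:
    """Answer repeated fetches of the same URL from memory for `ttl` seconds.

    Consumers call get(); only a miss or an expired entry reaches the
    original instance.
    """

    def __init__(self, fetch: Callable[[str], bytes], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch  # performs the real request to the original instance
        self.ttl = ttl      # seconds an entry stays fresh
        self.clock = clock  # injectable so expiry can be tested
        self.entries: dict[str, tuple[float, bytes]] = {}
        self.upstream_requests = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self.entries.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh hit: no upstream request
        self.upstream_requests += 1
        body = self.fetch(url)
        self.entries[url] = (now, body)
        return body
```

With a hundred indexers reading the same post through one such cache, the original instance would serve it once per TTL window rather than a hundred times.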

[email protected] 1 point 1 year ago

I think this is the beauty of federation. Everything is open and free to all, rather than a company being able to lock in the content you created.

For example, I wanted to learn about NLP, so I'm working on a bot that monitors sentiment and checks for hate speech in Lemmy content. I'm still in the brainstorming/research phase, but Lemmy's accessibility makes it really nice.
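For a bot like the one described above, here is a minimal Python skeleton of the monitoring step. The tiny word lists in `lexicon_sentiment` are a toy stand-in for a real sentiment or hate-speech model, which you would plug in through the `classify` parameter. The comment dicts with `id` and `content` keys are an assumption (loosely modeled on Lemmy's comment objects), and fetching the comments from an instance is left out.

```python
import re
from typing import Callable, Iterable, Iterator, NamedTuple

TOKEN = re.compile(r"[a-z']+")

# Toy lexicons: placeholders for a real trained model, not a usable filter.
POSITIVE = {"good", "great", "love", "helpful", "thanks"}
NEGATIVE = {"bad", "awful", "hate", "useless", "stupid"}


def lexicon_sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0 if none."""
    words = TOKEN.findall(text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


class Flag(NamedTuple):
    comment_id: int
    score: float


def monitor(comments: Iterable[dict],
            classify: Callable[[str], float] = lexicon_sentiment,
            threshold: float = -0.5) -> Iterator[Flag]:
    """Yield comments scoring at or below `threshold` so a human can review them."""
    for comment in comments:
        score = classify(comment["content"])
        if score <= threshold:
            yield Flag(comment["id"], score)
```

Keeping the classifier as a plain function argument means the monitoring loop stays the same while you experiment with different NLP models during the research phase.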

Pythorhead was made for this exact purpose.

MattMillz 1 point 1 year ago

Maybe the fediverse should operate under a General Public License, like Linux does, so that everyone could use it and contribute to it but not steal and sell it.