this post was submitted on 29 Jun 2024
65 points (93.3% liked)
TechTakes
The deciding factor for me is profit; those Jetflicks guys aren't getting any sympathy from me. Charging or taking donations just to maintain infrastructure is one thing, but they made millions of dollars in profit.
You should be able to use anything you want for personal stuff, but once you start trying to make a profit off others' works, that's where I tend to draw the line. For example, I have no qualms about pirating Photoshop to make memes or fix old family photos or whatever, but if I were to ever actually get good with it, I would buy it or switch to an open-source competitor before trying to make money with it.
AI training is where it starts to get murky, but having a basic understanding of how training generally works, I don't see an issue with it. The original image(s) fed into it aren't really there anymore; they were processed into weights and math and then, in a sense, discarded.
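To be a bit more concrete about what I mean by "processed into weights and math," here's a rough, hypothetical sketch (a toy linear model with made-up names, nothing like a real diffusion pipeline): each image only survives as a small nudge to the model's weights and is then thrown away.

```python
# Minimal, hypothetical sketch (not any real training pipeline): an "image"
# nudges the model's weights via one gradient step and is then discarded.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=16)          # the "model": just a vector of numbers

def training_step(image, label, lr=0.01):
    """One gradient step for a toy linear model with squared error."""
    global weights
    prediction = weights @ image       # forward pass
    error = prediction - label
    gradient = error * image           # d(error^2)/d(weights), up to a constant
    weights -= lr * gradient           # the image only survives as this nudge

for _ in range(100):
    img = rng.normal(size=16)          # stand-in for a scraped image
    training_step(img, label=1.0)
    del img                            # the pixels are gone; only weights remain
```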
Yes, it's possible to recreate an original image from the training data, but every example I've seen required a very, very specific prompt to get it to do that, and even then the result was a bit off. It's akin to a person memorizing a piece of art and then re-creating it from memory: depending on their skill, it'll get pretty close, but it'll still be a bit off.
Now, training data taken from behind a paywall, or private data like DMs or private posts, is a bit different, and I'm generally against that.
So I've adopted this line: "If it's posted publicly, expect it to be used publicly," which covers everything from your average joe seeing it while browsing around to a bot grabbing it for AI training.
It's very akin to an old Mark Hosler (of Negativland) interview where he said (not verbatim, can't find the old interview): "If you want to keep full control of your art, keep it in your home, maybe share it with family or a few close friends, but once that art is out in the wider world, you don't really have control over it anymore." Because, as you point out, an artist can recreate art that they have seen with their own eyes, but it will likely be a bit off from the original.
While I agree, let me play Devil's Advocate for a moment: books3 was "publicly posted" but was created from all the books on private torrent tracker Bibliotik. Would you agree that this would fall under private data since it's all pirated ebooks?
As for the Jetflicks guys, what's mostly twisted to me is that they're facing more jail time than a lot of murderers get. Otherwise, I agree: the profits they made kind of kill any narrative that they were doing it for good reasons or that they deserve a massive amount of sympathy.
Not necessarily private data per se, but if it's being used to train a closed-source model for profit (like OpenAI using it for ChatGPT), then I'd put it in the same realm as the Jetflicks guys using pirated works for profit. If it were just a couple of people or researchers using it to train an open-source model, then I see no issue with that, especially since it furthers the advancement of technology for everyone rather than the profits of one company.
I was unaware of that part. OK, they'll get some sympathy from me on that; that's absolutely disgusting.