this post was submitted on 11 Feb 2025
Ah, fuck, is that what the case is about? That sucks; that's the kind of case where they both need to lose:
If I were more conspiracy-minded, I would almost think that somebody intentionally decided to resolve this case first in order to guarantee that they set a disastrous precedent.
It's not what this case is about. Reuters runs a service called Westlaw that provides access to a bunch of legal materials, including summaries and explanations of cases that are written by its lawyers. Ross Intelligence wanted access to those summaries so that it could train AI to make a competing product. As you can imagine, Reuters said no to this.
So, Ross bought summaries from someone else, another company that did have access to Westlaw, and used those to train its AI. Today, the court found (among other things) that a few thousand of the summaries that Ross's AI produced are way too similar to Westlaw's summaries for it to be a coincidence. Ross had argued (among other things) that its summaries were only similar because they were describing the law, and Reuters doesn't/can't have a copyright on the law. The court rejected this argument, saying, essentially: "Yeah, it's true that Reuters doesn't have a copyright on the law, but it does have a copyright on the summaries that its lawyers write. It takes skill and judgment to decide which parts of a law or decision are important for people doing legal research, and to present them in a way that's easy for people to understand. You clearly copied many of them."
This isn't an exhaustive discussion of all the issues covered in the opinion, because I'm a sleepy lawyer, but it's the most important part.
Which is funny cuz this is exactly what the cops do to prosecute citizens. They buy 3rd party data they're not legally entitled to gather themselves.
It will be interesting to see whether this gets used against prosecutions in the future where the cops bought 3rd party data.
This is probably just inevitable when your dataset is not large enough. I would be interested in seeing the LLM's output compared against the original texts; I do remember the early ChatGPT producing some borderline copies of sentences that you could find online (with one or two words changed).
I'm not a lawyer, but I'm also not entirely unfamiliar with this sort of thing. In particular, I remember Georgia v. Public.Resource.Org and thus do not accept at face value the notion that the data in question being "summaries and explanations of cases" necessarily means Westlaw is in the right. Even if the Westlaw materials aren't "officially" incorporated into the law itself the way Georgia did, that doesn't mean Westlaw should necessarily be entitled to monopolize them, especially if the judicial system is heavily leaning upon them to inform its decisions.
https://natlawreview.com/article/court-training-ai-model-based-copyrighted-data-not-fair-use-matter-law
It sounds like the case you mentioned had a government entity doing the annotation, which makes it public even though it's not literally the law.
Reuters seems to have argued that while the law and cases are public, their tagging, summarization and keyword highlighting is editorial.
The judge agreed and pointed out that since Westlaw isn't required to view the documents that everyone is entitled to see, training on Westlaw's copy, including the headers, isn't justified.
It's much like a collection of public-domain stories: you can copy each individual story, but you can't copy my collection as a whole, assuming my work curating it was in some way creative and not just "alphabetical order".
Another major point of the ruling seems to rest on Ross aiming to compete directly with Reuters, which undermines its fair use argument.
I don't trust that judge's ability to determine whether they were copied if the copying wasn't verbatim, which is what copyright is about. To control an idea, you need a patent.