this post was submitted on 18 Jan 2024
577 points (98.2% liked)

Technology

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below. To ask if your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots



A ‘Shocking’ Amount of the Web Is Already AI-Translated Trash, Scientists Determine

Researchers warn that most of the text we view online has been poorly translated into one or more languages, usually by a machine.

BetaDoggo_ 9 points 10 months ago (last edited 10 months ago)

This isn't shocking at all. The markets for obscure language content are incredibly small so there's no incentive for most to spend resources on it. I'd argue mediocre machine translation is better than nothing at all in many cases, but for unsupervised training it does pose a challenge.

xantoxis 10 points 10 months ago (last edited 10 months ago)

They didn't look only at low-resource languages; they just started there because that was the problem domain. They found that 57% of ALL sentences on the Internet appeared to be machine translated, including translations into high-resource languages. The remaining 43% might also be machine generated; it just wasn't found to be part of a multi-way parallel group.