this post was submitted on 26 Jul 2024
123 points (96.9% liked)

196

[–] Deestan 16 points 5 months ago (1 children)
[–] [email protected] 3 points 5 months ago (2 children)

did you

did you just romanize a japanese word using æ

im so confused

[–] Deestan 16 points 5 months ago (1 children)

"Hæ" is "huh" in Norwegian

[–] [email protected] 9 points 5 months ago

whoops lmao

[–] [email protected] 5 points 5 months ago (1 children)

Haiii, Ohioo! Skibiddi desuka

[–] [email protected] 2 points 5 months ago (1 children)

Haii! Is just Hello + uwu.

Though a variant is Ohaii!, which is catspeak from the aughts and unrelated to alpha-slang Ohio or the place where Kent State happened.

[–] [email protected] 1 points 5 months ago

that is very interesting and cool, but I was relating them cause my brain made the connection and I thought it was funny.

[–] [email protected] 7 points 5 months ago (2 children)

this has been a thing for like five years, on sites as big as e.g. YouTube. do they not see it as a bug, somehow? or is it just way harder to fix than one would think for complicated computery reasons?

[–] [email protected] 3 points 5 months ago (1 children)

They probably don't perform the translation until the user requests it. Automatically translating every comment to every language to check if it changes would be a lot of additional computation

[–] [email protected] 1 points 5 months ago (1 children)

It might not be too bad. Once you get into code breaking, some of the simple techniques quickly yield metrics that can guess at the language without much processing (it depends on the total message length, but you could get a similar low-effort guess by just analysing a sample)

It's as simple as measuring the average distance between letters in a sample, and you could probably do more by using something like average ranges in Unicode. Each language will vary, so you can build a map with some sample text, then just take n letters to guess the language with reasonable accuracy
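A rough sketch of that kind of cheap guess in Python. To be clear about what's assumed: this uses plain letter-frequency profiles compared by cosine similarity, which is one simple metric in the same family, not necessarily the exact one described above. The sample sentences, the `guess_language` name and the `n=200` cutoff are all made up for illustration, and a real profile would need far more text per language.

```python
from collections import Counter
import math

# Tiny made-up training samples (assumption: real profiles need much more text).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then goes home to sleep",
    "no": "hæ hva sa du nå jeg forstår ikke hva du mener med det i det hele tatt",
    "de": "ich weiß nicht was du damit meinst aber das ist wirklich sehr schön",
}

def profile(text):
    """Relative frequency of each letter in the text (case-insensitive)."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {c: n / total for c, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse frequency maps."""
    dot = sum(v * b.get(c, 0.0) for c, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The "map" built once from sample text.
PROFILES = {lang: profile(text) for lang, text in SAMPLES.items()}

def guess_language(text, n=200):
    """Guess the language from the first n characters; returns (lang, score)."""
    sample = profile(text[:n])
    scores = {lang: cosine(sample, p) for lang, p in PROFILES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Very short comments (a single word, say) give an unreliable score, so the score is returned alongside the guess rather than hidden.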

On top of that, you could use user feedback or other factors to further narrow it down... Not perfect (and it would look strange like this when it does fail), but then you could flag a detected language and give users a one-click translate button
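The flagging step could look something like this. It's only a sketch: `should_offer_translate`, its parameters and the 0.8 threshold are hypothetical, not anything a real platform is known to use.

```python
def should_offer_translate(detected_lang, confidence, viewer_lang, min_confidence=0.8):
    """Decide whether to show the one-click translate button on a comment."""
    # If the guess is shaky, keep the button: hiding it on a wrong
    # "same language" guess is worse than an occasional pointless translation.
    if confidence < min_confidence:
        return True
    # Confident guess: only offer translation when the languages differ.
    return detected_lang != viewer_lang
```

So `should_offer_translate("no", 0.95, "en")` would show the button, and `should_offer_translate("en", 0.95, "en")` would hide it.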

They probably don't do the translations until requested like you said (there's a lot of languages out there to translate into after all), but a platform as big as YouTube might be using big data to decide what to preemptively translate into what language (and maybe using low demand periods or optimizing for engagement, maybe a combination of both)

[–] [email protected] 2 points 5 months ago (1 children)

I mean they could. But do you think that if something offers a translate button and it translates to the same thing, that that's costing them enough money that it's worth it for them to spend all that effort?

[–] [email protected] 2 points 5 months ago

No way. It all comes down to the most expensive piece of most software - if they write a translation feature, and it works 98% of the time, that's a complete success. That last 2% will probably take way longer to whittle down than the feature took to deploy

Even if the percentage was lower (and honestly I think it's even higher from my own use), to even figure out if it's worth it you'd have to put man hours on breaking down the numbers, estimating alternatives, and then actually doing the work

In this case, I don't think it's actually feasible - translation isn't that resource intensive. If you've already done the cheap language detection so you don't run it on everything and are using a reasonably efficient translation method, the last few percentage points of accuracy would probably take more resources than the occasional pointless work

[–] [email protected] 1 points 4 months ago

why would they care about a bug that doesn't prevent people from watching ads?