this post was submitted on 11 Jan 2025
213 points (97.8% liked)

Data is Beautiful


Cross posted from: [email protected]

lingua latina pater linguarum dimidum est 😎

I hope it's okay for me to crosspost here.

Hackworth 5 points 3 days ago (2 children)

I wonder if something like the semantic tokenization method would benefit from using etymological data like this, particularly for a multilingual LLM.

[email protected] 3 points 3 days ago* (last edited 3 days ago)

i know that my NN internally uses a semantic tokenization method.

i literally often seek the word roots when talking to somebody. it helps me focus.

[email protected] 2 points 3 days ago

Interesting paper, thanks for sharing