this post was submitted on 25 Oct 2024
Curated Tumblr
For preserving the least toxic and most culturally relevant Tumblr heritage posts.
As per my other post, this person isn't doing any of that.
But since you asked for papers on generic matching algorithms, I found this during the silent conniption fit you sent me into by suggesting that some random Tumblr user plugged a Tumblr bot directly into a state-of-the-art genomics db.
https://link.springer.com/article/10.1007/s11227-022-04673-3
Please note that while, yes, they ran this test on a standard office computer, they were only searching against 12 million characters.
At one byte per character, a single tebibyte would be more like 1 trillion characters. A pebibyte would be more like 1 ~~quintillion~~ quadrillion.
That means much, much, much longer processing times.
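To put rough numbers on that gap, here is a quick back-of-the-envelope sketch. It assumes one byte per character, and the names (`SEARCHED_CHARS`, `scale_factor`) are made up for illustration. It only compares text sizes; how run time actually grows with size depends on the algorithm and hardware, so treat the ratio as a lower bound on intuition, not a benchmark.

```python
# Assumption: each character is stored as exactly one byte.
SEARCHED_CHARS = 12_000_000   # text size from the test mentioned above
TEBIBYTE = 2**40              # bytes in one tebibyte
PEBIBYTE = 2**50              # bytes in one pebibyte


def scale_factor(target_chars: int, baseline_chars: int = SEARCHED_CHARS) -> float:
    """Return how many times larger the target text is than the baseline."""
    return target_chars / baseline_chars


for name, size in (("tebibyte", TEBIBYTE), ("pebibyte", PEBIBYTE)):
    print(f"{name}: {size:,} characters, "
          f"about {scale_factor(size):,.0f}x the tested text")
```

So a tebibyte is on the order of 92 thousand times the tested text, and a pebibyte is on the order of 94 million times.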
Edit: Used the wrong word for stupendously large numbers that start with q.