LinkedinLunatics

A place to post ridiculous posts from LinkedIn.com

(Full transparency: a mod for this sub happens to work there, but that doesn't influence his moderation or laughter at a lot of posts.)

PDFs (slrpnk.net)

submitted 1 month ago by [email protected] to c/[email protected]

The whispering is all in her head and says she sucks

[–] AlpacaChariot 7 points 1 month ago (1 children)

Was it that the PDF produced by LaTeX was less OCR-friendly than the Word one, or just that you didn't submit the PDF at all most of the time?

I guess if you trained a program to OCR PDFs that are produced by Word, it might get really good at that and less good at PDFs from other sources.

I'm curious: was your CV font Computer Modern?

[–] [email protected] 3 points 1 month ago

I think OCR is really good nowadays, but I think old ATS systems don't use it, or at least use old OCR engines. If you parse a PDF directly (without OCR), a Word-exported PDF preserves the text order much better than a LaTeX one.

Like, I actually tried some websites and Python libraries to extract the text from my LaTeX PDF, and none of them gave good results: the words from the PDF would come out in the wrong order.

If I use OCR then I get good, coherent text, which is really important for ATS. But I doubt people use OCR because it's kinda expensive, or maybe they just run old ATS systems.
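
To make the "words out of order" thing concrete, here is a toy sketch in plain Python. It does not read a real PDF. It only assumes that a PDF page stores positioned text fragments, and that the order in which they are stored does not have to match the reading order. The `Fragment` type, the coordinates, and the CV snippets are all made up for illustration. A naive extractor that joins fragments in stored order scrambles the text, while one that sorts by position (roughly what OCR or a layout-aware parser ends up doing) recovers it.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """A hypothetical positioned run of text, as a PDF page might store it."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points from the bottom of the page


def stream_order_text(fragments):
    """Naive extraction: join fragments in the order they were stored."""
    return " ".join(f.text for f in fragments)


def layout_order_text(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then read top to bottom, left to right."""
    lines = []  # each entry is a list of fragments sharing (roughly) one baseline
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return "\n".join(
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    )


# Made-up fragments: the right-aligned date is stored before the job title,
# and the two lines are interleaved in the stored order.
FRAGMENTS = [
    Fragment("2019-2023", 450.0, 700.0),
    Fragment("Python,", 72.0, 680.0),
    Fragment("Software Engineer", 72.0, 700.0),
    Fragment("SQL", 120.0, 680.0),
]

if __name__ == "__main__":
    print(stream_order_text(FRAGMENTS))
    print("---")
    print(layout_order_text(FRAGMENTS))
```

With these made-up fragments, the naive version gives "2019-2023 Python, Software Engineer SQL", and the position-sorted version gives "Software Engineer 2019-2023" followed by "Python, SQL" on the next line. Whether a given ATS does anything like the second approach is a guess on my part.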