this post was submitted on 28 Sep 2024
LinkedinLunatics
A place to post ridiculous posts from linkedIn.com
(Full transparency.. a mod for this sub happens to work there.. but that doesn't influence his moderation or laughter at a lot of posts.)
Actually, this is good advice. Nowadays nobody reads your CV in the first step. It first goes through an automated system (an ATS, I think it's called) that's designed to filter out as many applicants as possible.
The problem with PDF is that it's terrible to parse, because it's designed for humans to read, not machines. The only reliable way to parse it is to convert the pages to images and run OCR, which is kind of expensive.
So before you send a PDF, first try converting it to txt and see if the content makes sense. Or just use Word to make the CV and then export to PDF.
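For anyone who wants to make that "convert to txt and check" step less eyeball-driven, here's a rough sketch. It assumes you've already dumped the PDF to plain text with whatever tool you like (for example `pdftotext` from poppler) and just checks whether the phrases you care about survived intact. The function names and sample strings are made up for illustration, and this is not how any real ATS works.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't break a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_phrases(extracted_text: str, expected_phrases: list[str]) -> list[str]:
    """Return the expected phrases that do NOT appear contiguously in the text."""
    haystack = normalize(extracted_text)
    return [p for p in expected_phrases if normalize(p) not in haystack]


# Hypothetical extractor output: one clean, one with the words scrambled.
good = "Skills\nMachine   Learning, Python\nExperience\nML Engineer"
scrambled = "Skills Learning Python Machine Experience Engineer ML"

phrases = ["machine learning", "ML engineer"]
print(missing_phrases(good, phrases))       # nothing missing
print(missing_phrases(scrambled, phrases))  # both phrases are lost
```

If the second list isn't empty for your own CV, that's a hint the text layer is coming out scrambled.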
When I was looking for a job, I remember there was a website that would give you tips on your CV, and it included an ATS report. I was shocked to see that the ATS completely failed to parse the correct info from my LaTeX CV. I have a lot of AI/ML experience, and it missed that completely and thought I had quality assurance experience instead. And I was applying for AI jobs, so no wonder I couldn't get any interviews. Then I switched to Word, and sent an exported PDF wherever Word files weren't accepted. I got many more interviews after that.
Was it that the PDF produced by LaTeX was less OCR-friendly than the Word one, or just that you didn't submit a PDF at all most of the time?
I guess if you trained a program to OCR PDFs produced by Word, it might get really good at that and less good at PDFs from other sources.
I'm curious if your CV font was Computer Modern?
I think OCR is really good nowadays, but I think old ATS systems don't use it, or at least use old OCR. If you parse a PDF without OCR, a Word-exported PDF preserves the text order much better than a LaTeX one.
I actually tried some websites and Python libraries to extract the text from my LaTeX PDF, and none of them gave good results: words inside the PDF would come out in the wrong order.
If I use OCR, I get good, coherent text, which is really important for an ATS. But I doubt people use OCR, because it's kind of expensive, or maybe they just use old ATS systems.
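If you want to put a rough number on "words out of order", one option is to compare the directly extracted text against a reference you trust (the OCR output, or the text you actually typed) word by word. A minimal sketch using only Python's standard `difflib`. The function name and sample strings are hypothetical, and the score is just a similarity ratio, not anything an ATS reports.

```python
from difflib import SequenceMatcher


def word_order_similarity(reference: str, extracted: str) -> float:
    """Similarity in [0, 1] between two texts compared as word sequences.

    1.0 means the same words in the same order; lower values mean words
    are missing, extra, or shuffled.
    """
    ref_words = reference.lower().split()
    ext_words = extracted.lower().split()
    return SequenceMatcher(None, ref_words, ext_words, autojunk=False).ratio()


# Hypothetical texts: what OCR reads vs. what a text extractor returned.
ocr_text = "Machine learning engineer with five years of experience"
extracted_ok = "Machine learning engineer with five years of experience"
extracted_bad = "engineer Machine five learning experience with of years"

print(word_order_similarity(ocr_text, extracted_ok))   # identical sequences
print(word_order_similarity(ocr_text, extracted_bad))  # noticeably lower
```

A score well below 1.0 against the OCR text would match what I saw with my LaTeX PDF: all the words are there, just not in a sensible order.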