this post was submitted on 28 Sep 2024
508 points (97.6% liked)

LinkedinLunatics

3298 readers
924 users here now

A place to post ridiculous posts from linkedIn.com

(Full transparency.. a mod for this sub happens to work there.. but that doesn't influence his moderation or laughter at a lot of posts.)

founded 1 year ago

The whispering is all in her head and says she sucks

[–] [email protected] 2 points 4 hours ago (1 children)

You can extract text from PDFs without using OCR. They aren't all images embedded in a file.

I'm sure you've opened a PDF before and selected text in it, or searched for something. That works because the text itself is embedded in the document.

You can also create PDF documents with the text converted to images, but those are usually larger in size.
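To make the difference between embedded text and image-only pages concrete, here is a minimal sketch. It assumes a hypothetical, uncompressed page content stream and only looks for literal strings passed to the `Tj` (show text) operator. Real PDFs usually compress their streams and route text through font encodings, so in practice you would reach for a proper PDF library rather than a regex like this.

```python
import re

# Hypothetical, uncompressed content stream for a page with embedded text.
# Real files usually compress this and may encode glyphs via font tables.
TEXT_STREAM = b"""
BT
/F1 12 Tf
72 720 Td
(Jane Doe) Tj
0 -16 Td
(Software Engineer) Tj
ET
"""

# Hypothetical content stream for a scanned page: it only paints an image
# (the Do operator), so there is no text to pull out without OCR.
IMAGE_ONLY_STREAM = b"q 612 0 0 792 0 0 cm /Im0 Do Q"

# A literal string operand in parentheses, followed by the Tj operator.
TJ_PATTERN = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")


def extract_shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to Tj, in stream order.

    Escape sequences inside the strings are left as-is in this sketch.
    """
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(stream)]


print(extract_shown_text(TEXT_STREAM))        # text is right there in the file
print(extract_shown_text(IMAGE_ONLY_STREAM))  # nothing to extract without OCR
```

The first call finds the strings because they are stored as text operands; the second finds nothing because the page is just a picture.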

[–] [email protected] 2 points 4 hours ago (1 children)

Not necessarily. CVs have complicated formatting. Nobody writes (or should write) blocks of text, and you don't know how many columns the candidate is using. Is the candidate using a dedicated section to show skill ratings, and are those ratings star-based or word-based? You can still search for individual keywords, but if you copy the whole PDF and paste it into a txt file (which is what gets forwarded to the ATS), it does not make much sense. The structure is too complicated to reliably extract where you studied, what you studied and your grade, what other experience you have, how long you worked there, etc.
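A rough sketch of the column problem described above: the fragments, coordinates, and the `split_x` boundary are all made up for illustration, and `y` is assumed to be measured from the top of the page. Reading row by row is one way a plain-text dump of a two-column CV can end up, and it mixes the two columns together.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float  # horizontal position on the page
    y: float  # vertical position, measured from the top (an assumption here)
    text: str


# Hypothetical two-column CV: skills on the left, experience on the right.
FRAGMENTS = [
    Fragment(50, 100, "Skills"),
    Fragment(50, 120, "Python *****"),
    Fragment(50, 140, "SQL ***"),
    Fragment(300, 100, "Experience"),
    Fragment(300, 120, "Acme Corp, 2019-2022"),
    Fragment(300, 140, "Backend developer"),
]


def naive_reading_order(fragments):
    """Read row by row, left to right, ignoring columns entirely."""
    lines = {}
    for f in sorted(fragments, key=lambda f: (f.y, f.x)):
        lines.setdefault(f.y, []).append(f.text)
    return [" ".join(parts) for parts in lines.values()]


def column_aware_order(fragments, split_x):
    """Read the left column top to bottom, then the right column.

    Only works when split_x is known, which a generic parser cannot assume.
    """
    left = sorted((f for f in fragments if f.x < split_x), key=lambda f: f.y)
    right = sorted((f for f in fragments if f.x >= split_x), key=lambda f: f.y)
    return [f.text for f in left + right]


for line in naive_reading_order(FRAGMENTS):
    print(line)
```

The naive order produces lines like "Python ***** Acme Corp, 2019-2022", which is the kind of output that "does not make much sense". The column-aware version fixes it, but only because the column boundary was supplied by hand.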

Extracting structured data is a field of research in its own right. There is plenty of recent research on extracting structured data from academic PDFs (I was working on this at a research institute in Germany around 2022). Even when LLMs are used, it can get complicated to the point that there are specialized LLMs for just that.

But ATS products are too cheap, or too low a priority, to even use OCR, let alone LLMs, so unfortunately the responsibility for making an easily parsable CV falls on the candidate.

Try this next time you look at your CV: copy its text into a txt file, then think about whether you could write a program that reliably extracts your experience, education, interests, etc. It's going to be super difficult, and even then it won't generalize to thousands of other CVs.
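Here is roughly what that exercise looks like in practice. This is a deliberately simple sketch: the line format the regex expects, both sample CVs, and the helper name `extract_experience` are assumptions made up for illustration, not how any real ATS works.

```python
import re

# Assumed layout: "<start year>-<end year> <job title>, <employer>" on one line.
EXPERIENCE = re.compile(r"^(\d{4})-(\d{4})\s+(.+?),\s+(.+)$")


def extract_experience(cv_text: str) -> list[dict]:
    """Pull experience entries out of plain text that follows the assumed layout."""
    entries = []
    for line in cv_text.splitlines():
        m = EXPERIENCE.match(line.strip())
        if m:
            start, end, title, employer = m.groups()
            entries.append(
                {"title": title, "employer": employer, "years": int(end) - int(start)}
            )
    return entries


# Hypothetical CV that happens to match the assumed layout.
CV_A = """Experience
2019-2022 Backend developer, Acme Corp
2016-2019 Junior developer, Example GmbH
"""

# Hypothetical CV with the same information in a different, equally valid layout.
CV_B = """Experience
Acme Corp | Backend developer
Jan 2019 to present
"""

print(extract_experience(CV_A))  # two entries found
print(extract_experience(CV_B))  # nothing found, although the data is there
```

It works for the one layout it was written for and silently returns nothing for the other, and every extra pattern you add for CV_B's style will miss someone else's.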

[–] [email protected] 1 points 30 minutes ago

All those "problems" apply to Word too. Maybe you use tables, maybe you use lists, maybe you use stars, maybe ... So there's no advantage in forcing people to use Word "because the machine can understand it better". Because that's a lie.