No Stupid Questions

There is no such thing as a Stupid Question!

Don't be embarrassed by your curiosity. Everyone has questions they may feel uncomfortable asking certain people, so this place gives you somewhere to ask them without being judged. Everyone here is willing to help.

ex. How do I change oil
ex. How to tie shoes
ex. Can you cry underwater?

Reminder that the rules for lemmy.ca still apply!

Thanks for reading all of this, even if you didn't read all of this, and your eye started somewhere else, have a watermelon slice 🍉.

Is there a way to digitally mark up a PDF so it's not OCR-readable? (self.nostupidquestions)

submitted 3 months ago by cheese_greater to c/[email protected]

I want to ensure that financial documents can't be parsed by automated systems.

[email protected] 2 points 3 months ago

I would OCR it myself, but then edit the OCR metadata in the file (the hidden text layer stored with the page image) so that its text is lorem ipsum.

That way, any bot that assumes the embedded OCR text matches what's on the image in the PDF (and why wouldn't it?) will only read useless junk. Only someone reading the text from the image would "see" the real content, and only a bot programmed to re-OCR a file that already has embedded OCR text would notice any inconsistency.
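To make the idea concrete, here is a rough sketch in plain Python of a one-page PDF whose visible content is an image and whose hidden text is decoy filler. It is only an illustration under assumptions: `build_decoy_pdf` is a made-up helper name, the 2x2 grayscale bitmap is a placeholder for a real scan, and a real workflow would more likely use a PDF library than hand-written objects. The one PDF feature it relies on is text rendering mode 3 (`3 Tr`), which draws text invisibly and is how OCR text layers are commonly stored.

```python
def build_decoy_pdf(decoy_text: str) -> bytes:
    """Return a one-page PDF: a placeholder image plus an invisible decoy text layer."""
    # Stand-in for the scanned page: a 2x2 grayscale bitmap (hypothetical placeholder).
    image_data = bytes([0, 255, 255, 0])

    # Escape the characters that are special inside a PDF literal string.
    escaped = decoy_text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Page content: stretch the image over the whole page, then write the decoy
    # text in rendering mode 3 (invisible) so only text extractors "see" it.
    content = (
        "q 612 0 0 792 0 0 cm /Im0 Do Q\n"
        "BT /F1 12 Tf 3 Tr 72 720 Td (" + escaped + ") Tj ET\n"
    ).encode("latin-1")

    def stream(dict_body: str, data: bytes) -> bytes:
        # Wrap raw bytes as a PDF stream object body with the correct /Length.
        head = f"<< {dict_body} /Length {len(data)} >>\nstream\n".encode("latin-1")
        return head + data + b"\nendstream"

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R "
        b"/Resources << /Font << /F1 5 0 R >> /XObject << /Im0 6 0 R >> >> >>",
        stream("", content),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        stream(
            "/Type /XObject /Subtype /Image /Width 2 /Height 2 "
            "/ColorSpace /DeviceGray /BitsPerComponent 8",
            image_data,
        ),
    ]

    # Serialize the objects, recording each byte offset for the xref table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n".encode("latin-1") + body + b"\nendobj\n"

    # Cross-reference table: every entry must be exactly 20 bytes long.
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("latin-1")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("latin-1")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("latin-1")
    return bytes(out)
```

Writing the result of `build_decoy_pdf("Lorem ipsum dolor sit amet")` to a `.pdf` file should give a page that displays only the image, while the text layer carries the filler. As noted above, anything that runs its own OCR on the image is unaffected by this.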

I'm not entirely sure how to accomplish that, but I'd figure it out if I were worried about the data being compromised.

Personally, I would simply keep the file in an encrypted container. Then I wouldn't worry about what can scan the file, since it would be entirely unreadable ciphertext without the correct security key or passphrase.
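As a rough illustration of the same principle, here is a sketch that encrypts a single file's bytes with a passphrase. This is not a full encrypted container, just passphrase-based encryption of one blob, and it assumes the third-party `cryptography` package is installed. The helper names (`encrypt_bytes`, `decrypt_bytes`) and the PBKDF2 iteration count are choices made for this example only.

```python
import base64
import hashlib
import os

from cryptography.fernet import Fernet, InvalidToken  # third-party: pip install cryptography

SALT_LEN = 16  # bytes of random salt stored in front of the ciphertext


def _key_from_passphrase(passphrase: str, salt: bytes) -> bytes:
    # Stretch the passphrase into 32 bytes, then encode it the way Fernet expects.
    raw = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 600_000)
    return base64.urlsafe_b64encode(raw)


def encrypt_bytes(data: bytes, passphrase: str) -> bytes:
    # A fresh salt per file means the same passphrase yields a different key each time.
    salt = os.urandom(SALT_LEN)
    token = Fernet(_key_from_passphrase(passphrase, salt)).encrypt(data)
    return salt + token


def decrypt_bytes(blob: bytes, passphrase: str) -> bytes:
    # Raises InvalidToken if the passphrase is wrong or the data was altered.
    salt, token = blob[:SALT_LEN], blob[SALT_LEN:]
    return Fernet(_key_from_passphrase(passphrase, salt)).decrypt(token)
```

With something like this, the PDF's bytes would be read from disk, passed through `encrypt_bytes`, and only the resulting blob stored. Without the passphrase, a scanner sees nothing it can parse as a document.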