this post was submitted on 18 Dec 2023

Selfhosted


I'm looking for a service I could install to archive a huge pile of letters, preferably in PDF form, to a database. I'm living in a country where paper is still king, and digital services are either non-existent, or loathed (Germany). My current situation is that I have a mailbox with lots of PDFs all over the place, but also many folders of paper sent in 2007 etc. that I have to keep, but I also have to find them every five years or so.

So what I'd like to have is a service for my homelab where I could scan or copy these documents, and which would index them, clean them, OCR them and all that good stuff. It should have really good metadata abilities, because my files are usually named in a very random way, so being able to copy them over and quickly categorize them would be really awesome.

There is one service called Papermerge that kind of fits my use case. I spent one afternoon with it, and there were a few issues:

  • crashes quite often
  • when sending a large folder of PDFs, uses all the CPU and crashes again
  • the categorizing functions are not very good; it takes a lot of time to get everything sorted and cleaned up when organizing files

This might not be very interesting if your country has digital services for everything, but for those of us who have to suffer this paper madness, a service like this would be great.
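One possible workaround for the "large folder uses all the CPU" problem is to feed the PDFs in small batches instead of all at once. The sketch below is only an illustration under some assumptions: it assumes the document manager watches an inbox folder and removes each file once it has been processed, and the names `feed_in_batches`, `source` and `inbox` are made up for this example, not part of Papermerge or any other tool.

```python
import shutil
import time
from pathlib import Path
from typing import Iterator, List


def is_pdf(path: Path) -> bool:
    """True for regular files with a .pdf extension (any letter case)."""
    return path.is_file() and path.suffix.lower() == ".pdf"


def batches(items: List[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def feed_in_batches(source: Path, inbox: Path, size: int = 10,
                    poll_seconds: float = 5.0) -> int:
    """Copy PDFs from `source` into `inbox` a few at a time.

    Assumption: the document manager deletes or moves a file out of
    `inbox` once it is done with it, so an inbox without PDFs means
    the previous batch has been processed. `inbox` must not be
    located inside `source`. Returns the number of files copied.
    """
    pdfs = sorted(p for p in source.rglob("*") if is_pdf(p))
    copied = 0
    for batch in batches(pdfs, size):
        # Wait until the previous batch has been picked up.
        while any(is_pdf(p) for p in inbox.iterdir()):
            time.sleep(poll_seconds)
        for pdf in batch:
            # A running number avoids clashes between files that share
            # a name but live in different subfolders.
            shutil.copy2(pdf, inbox / f"{copied:05d}_{pdf.name}")
            copied += 1
    return copied
```

Calling `feed_in_batches(Path("old-letters"), Path("inbox"), size=5)` would then hand over five files at a time; both folder names are placeholders. The originals are copied rather than moved, so nothing is lost if the import crashes halfway.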

[–] [email protected] 3 points 11 months ago (1 children)

Worst case, I have all my OCRed documents as raw files which I can migrate to wherever.

The files still exist. In my case they're encrypted as well. My backups roll on the data, not the container.

But I'm not trying to convince you, I tried answering the questions :)

And to answer your last question clearly: I survived before paperless, and I'd get along without it. I'd find a new tool to reduce my manual labor as much as possible, and if that's not possible, then no harm done. I know I'm flexible, I can learn new tools and I'm never locked in to a vendor or tool. I have a high level of self-confidence when it comes to my tool chain and how I'd adapt any part of it, from password manager to cloud storage and my mail flow.

To be honest I couldn't self host anything if I'd had the fear of being lost if a tool is discontinued.

[–] TCB13 2 points 11 months ago (2 children)

But I’m not trying to convince you, I tried answering the questions :)

I was just trying to see how you're thinking about the possible lock-in and dependency on those platforms... also exposing my real concerns with them.

To be honest I couldn’t self host anything if I’d had the fear of being lost if a tool is discontinued.

Yeah, but most things we self-host are more "fungible": a torrent client, an RSS aggregator etc. can be quickly replaced by an alternative, as they hold little to no data, and sometimes the data they hold doesn't even have any value. A document management solution, however, is a long-term thing that holds important documents.

[–] [email protected] 1 points 11 months ago

Ahh, I don't use paperless as an exclusive document storage but as a pure manager. It searches and tags, but it doesn't have exclusivity over any files except its own indices!

It doesn't provide more value than jellyfin in that regard - make it visible and accessible.

[–] [email protected] 1 points 11 months ago

My way of using paperless-ngx includes an automatic export to plain PDF files, which are synced via syncthing.

Everything is accessible with a normal filesystem and over the keepass-gui.
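If the plain exported files are what you would fall back on, it might be worth checking now and then that the synced copy is still complete. A minimal sketch of that idea, assuming the export is just a folder of files on disk; the function names and the manifest approach are illustrative only and not something paperless-ngx or syncthing provide:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to export_dir) to its SHA-256."""
    return {
        p.relative_to(export_dir).as_posix(): sha256_of(p)
        for p in sorted(export_dir.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: Dict[str, str],
                   new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that vanished, appeared, or changed content."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }
```

Building a manifest on the machine that runs the export and another on the synced copy, then passing both to `diff_manifests`, would show whether the two folders really hold the same documents, independent of whichever tool produced them.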