this post was submitted on 17 Nov 2023
358 points (99.2% liked)

Selfhosted

39854 readers
1255 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 1 year ago
MODERATORS
 

Always enjoyed scrolling though these posts, figured I'd give it a go here:

What are your must-have selfhosted services?

Some of mine:

you are viewing a single comment's thread
view the rest of the comments
[–] pirat 1 points 11 months ago* (last edited 11 months ago) (1 children)

(with scan to SMB)

So the scanner saves the file in SMB-share(s), then Paperless(-xng) will automatically process it?

Maybe Paperless, with an LLM API integration to chat with the documents, using the power of referring to and verifying against Paperless' concrete results, would be somehow useful.

Edit: Oh, this is already being discussed on their GitHub. Of course it is!

[–] [email protected] 2 points 11 months ago (1 children)

You are right with the first part. It only takes three clicks to scan a doc and have it available.

As for me, I'm not interest in sending my documents to open AI. But it would definitely offer some nice functions.

[–] pirat 2 points 11 months ago (1 children)

I'm not interest in sending my documents to open AI.

You wouldn't have to. There are plenty of well-performing open-source models that work with an API similar to the Open AI standard, with which you can simply substitute OpenAI models by using a different URL and API-key.

You can run these models in the cloud, either selfhosted or "as a service".

Or you can run them locally on high-end consumer-grade hardware, some even on smartphones, and the models are only getting smaller and more performant with very frequent advancements regarding training, tuning and prompting. Some of these open-source models are already claiming to be outperforming GPT-4 in some regards, so this solution seems viable too.

Hell, you can even build and automate your own specialized agents in collaborating "crews" using frameworks, and so much more...

Though, I'm unsure if the LLM functionality should be integrated into Paperless, or rather implemented by calling the Paperless API from the LLM agent. I see how both ways could fit some specific uses.

[–] [email protected] 2 points 11 months ago (1 children)

Some features like a "tl,dr" bot would probably not even need high end hardware, because it does not matter if it takes ten minutes for a summary.

Features like a chat bot do not belong into paperless IMO.

[–] pirat 1 points 11 months ago (1 children)

a "tl,dr" bot would probably not even need high end hardware, because it does not matter if it takes ten minutes for a summary.

True, that's a good take. Tl;dr for the masses! Do you think an internal or external tl;dr bot would be embraced by the Paperless community?

It could either process the (entire or selected) collection, adding the new tl;dr entries to the files "behind the scenes", just based on some general settings/prompt to optimize for the desired output – or it could do the work on-demand on a per-document basis, either based on the general settings or custom settings, though this could be a flow-breaking bottleneck in situations where the hardware isn't powerful enough to keep up with you. However, that only seems like a temporary problem to me, since hardware, LLMs etc. will keep advancing and getting more powerful/efficient/cheap/noice.

a chat bot do not belong into paperless

Right – but, opposingly to that, Paperless definitely do belong into some chatbots!

[–] [email protected] 2 points 11 months ago (1 children)

I think more "intelligence" in parsing the documents would be well-received. Just as OCR is fundamental to paperless, AI features could be the next step forward. Automatically extract the relevant positions of e.g. a bill, understand the document (and select the correct date, not my birthday) and apply correct tags to new documents.

Paperless definitely do belong into some chatbots!

Definitely!

[–] pirat 2 points 11 months ago

Yes, I think that's the way to go. If the paperless-ngx team doesn't believe in following that path, someone else will probably fork the project and do it, or build something with similar capabilities "from scratch". Then, it'll be interesting to see what's coming forth of open-source models with capabilites similar to GPT-4Vision.... . . . . 🤯