this post was submitted on 30 Jun 2023

Is there any product that can act as a knowledge base for topics (materials will include archived versions of websites, documents, media)? (self.selfhosted)

submitted 2 years ago by MigratingtoLemmy to c/selfhosted

My apologies for the long title.

I'm looking for something to organise the materials I collect on various topics. As mentioned, these usually include archived webpages, documents/manuals, media (pictures, videos, audio) etc.

I know I could just create a directory system for this (and will likely do so if I can't find anything like what I describe), but just wanted to ask if anything like what I want exists.

Thanks!

[–] vegetaaaaaaa 5 points 2 years ago (1 children)

Use files when possible - I use a combination of:

filesystem hierarchy (max 3-4 levels deep, I can share the general structure if needed)
markdown notes for wiki-like content or notes (either versioned in git - so also accessible from my Gitea instance, or under the Nextcloud Notes/ directory, so also accessible from the Nextcloud Notes app)
software mirrors either through a mirroring script or using Gitea's mirroring feature
Shaarli for bookmarks and wiki-like content, which get processed every day by a script that archives content to local files (mostly audio/video for now; I'm still writing the page archiving part, since ArchiveBox is too bloated for my needs and is missing critical features such as ad blocking)

All these components are linked in some way or another (e.g. all media automatically goes to the media directory of a jellyfin instance)

[–] MigratingtoLemmy 4 points 2 years ago (1 children)

Thanks, I do plan to use an organisation structure with plain directories. Please do share, it would help me greatly!

Indeed, I too plan to use markdown for any notes on what I'm looking at/is relevant to the object of interest.

Thank you for the note about archivebox, I didn't think about this before! Indeed, that would be very important: are you writing an alternative tool? Would you like to share the repo? I was considering just using wget to pull down pages but that might not work all the time.

Thanks a bunch!

[–] vegetaaaaaaa 2 points 2 years ago* (last edited 2 years ago)

organisation structure with plain directories

Slightly edited so you get the idea:

├── ARCHIVE
│   ├── DOCUMENTS
│   │   ├── 2018
│   │   ├── 2019
│   │   └── 2020
│   ├── WORK
│   │   ├── PROJECT1
│   │   └── PROJECT2
│   ├── DATA-MISC
│   ├── NOTES
│   ├── GAMES
│   ├── IMAGES
│   │   ├── MISC
│   │   ├── 2018
│   │   ├── 2019
│   │   └── 2020
│   ├── BOOKS
│   │   ├── IT
│   │   ├── DIY
│   │   └── NOVELS
│   ├── MUSIC
│   │   └── ARTIST - ALBUM
│   ├── SOFTWARE
│   │   ├── LINUX
│   │   ├── WINDOWS
│   │   └── ANDROID
│   └── VIDEO
│       ├── MOVIES
│       ├── MUSICVIDEOS
│       └── DOCUMENTARIES
├── DOWNLOADS
│   ├── DOCUMENTS
│   ├── WORK
│   ├── BOOKS
│   ├── GAMES
│   ├── MUSIC
│   ├── SOFTWARE
│   └── VIDEO
└── TMP

I use UPPERCASE for my base directory structure, so I know if a directory is uppercase it's probably part of the fixed structure. The key is to keep it max 2-3 levels deep.

Level 1:

ARCHIVE: stuff I want to keep, gets backed up
DOWNLOADS: stuff I have not had time to listen to/look at/process yet. Not backed up (but I do back up a list of the files in this hierarchy).
TMP: stuff I use regularly but that does not deserve to be archived/backed up (working copies of projects, random scripts/programs, VM disks...). Temporary, expendable.

Level 2: Broad topic or media type. Max 5-8.

Level 3: Finer-grained topic/media type. Only the ARCHIVE tree has this level of organization. There may be directories deeper than that, but I don't actively manage them, they just... exist (extracted archives, etc.). One exception is subdirectories named NOBACKUP, which are always excluded from automatic backups.
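
The NOBACKUP rule and the backed-up file list for the download tree could be scripted roughly like this. It is only a sketch, not the commenter's actual backup setup: the function names `iter_backup_files` and `write_file_list` are hypothetical, and only the directory name NOBACKUP comes from the comment above.

```python
import os
from pathlib import Path
from typing import Iterator

# Directory name taken from the comment; everything else here is an assumption.
EXCLUDED_DIR_NAME = "NOBACKUP"


def iter_backup_files(root: Path) -> Iterator[Path]:
    """Yield every file under root, skipping any directory named NOBACKUP."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d != EXCLUDED_DIR_NAME)
        for name in sorted(filenames):
            yield Path(dirpath) / name


def write_file_list(root: Path, list_path: Path) -> int:
    """Write one relative path per line to list_path and return the file count."""
    paths = [p.relative_to(root).as_posix() for p in iter_backup_files(root)]
    list_path.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)
```

`iter_backup_files` can feed whatever backup tool is in use, while `write_file_list` covers the case where only the list of names is worth keeping rather than the files themselves.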

archivebox [...] are you writing an alternative tool? Would you like to share the repo? I was considering just using wget to pull down pages but that might not work all the time.

I am working on this tool, which is a generic data manipulation/workflow tool. The Shaarli workflow already works: it grabs bookmarks from Shaarli and downloads audio and video files. The webpage archiving module is not written yet; it's at the early design stage (issue). It will probably use wget in the backend, since the alternative would be running a full headless browser and I don't want to get into that. This is my first medium-sized Python project and I'm trying to keep it clean, so it will take some time. Currently I'm more focused on other workflows/parts of the software.
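
For anyone who wants to try the plain wget route in the meantime, a minimal sketch might look like the following. It is not the tool described above: `build_wget_command` and `archive_page` are hypothetical names, and the flag selection is just one reasonable single-page setup taken from the wget manual. Pages that build their content with JavaScript will not archive well this way, which is the trade-off against a headless browser.

```python
import subprocess
from pathlib import Path
from typing import List


def build_wget_command(url: str, dest_dir: Path) -> List[str]:
    """Return the argv for saving one page plus the assets it needs."""
    return [
        "wget",
        "--page-requisites",   # also fetch the images, CSS and scripts the page uses
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # add .html/.css suffixes where the URL lacks them
        "--span-hosts",        # allow requisites served from other hosts (CDNs)
        f"--directory-prefix={dest_dir}",
        url,
    ]


def archive_page(url: str, dest_dir: Path) -> int:
    """Run wget for url and return its exit code (0 means success)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(build_wget_command(url, dest_dir), check=False)
    return result.returncode
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log exactly what would be executed before touching the network.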

As for file organization inside the directories, I try to maintain consistent/useful file naming including (depending on directory) date in YYYYMMDD format, author/parties involved, subject.
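
A naming scheme like that is easy to enforce with a small helper. The sketch below is an assumption about what such a helper could look like, not the commenter's own code: `slugify`, `build_filename`, the underscore separator and the lowercase slugs are all illustrative choices. Only the YYYYMMDD date, author and subject fields come from the comment.

```python
import re
from datetime import date
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(day: date, subject: str,
                   author: Optional[str] = None,
                   extension: str = "pdf") -> str:
    """Build '<YYYYMMDD>_<author>_<subject>.<ext>'; the author part is optional."""
    parts = [day.strftime("%Y%m%d")]
    if author:
        parts.append(slugify(author))
    parts.append(slugify(subject))
    return "_".join(parts) + "." + extension.lstrip(".")


if __name__ == "__main__":
    print(build_filename(date(2023, 6, 30), "HBA firmware notes",
                         author="Jane Doe", extension="md"))
    # prints: 20230630_jane-doe_hba-firmware-notes.md
```

Putting the date first in YYYYMMDD form means a plain alphabetical sort of the directory is also a chronological sort.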

[–] [email protected] 3 points 2 years ago (1 children)

How will you interact with this system (search bar, file explorer, etc.)? How will you organize this system? What devices should be able to access this data?

There are lots of systems to store documents, information about objects, media and much more. But it has to be useful for your use case.

For example: Navidrome is great for music as it allows you to stream music from your mobile device or computer.

Zotero (can be self hosted using WebDAV) is great for searching through PDF/epubs and stored webpages. But it is a citation management system. It has to fit your use case.

If you want an all-rounder, then take a look at Nextcloud or ownCloud. They both can edit documents, play music and display pictures.

[–] MigratingtoLemmy 1 points 2 years ago* (last edited 2 years ago) (1 children)

Hi, let's take the example of me researching HBAs.

I will archive the articles/blogs/pages I find relevant (I would also like a tool to check the origin URL of these assets and go and grab the latest version of them).
I will download documents/PDFs/Manuals on various HBAs and would like to store them (thanks for mentioning Zotero, I do want to search through PDFs sometimes and it's annoying. I hope it can automatically grab metadata for files).
I might download audio/video as supporting media/interesting aspects of what I'm looking for (say I find an interesting point in a podcast, I'll clip that part of the audio and store it here).
I'll write any notes I have in markdown and store them with the rest of the assets (in this case, it would list the quirks of the HBAs I have read about and which one would fit my plans).
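
The "check the origin URL and grab the latest version" part of this list could be approached with a small sidecar manifest stored next to the assets. This is purely a sketch under assumptions: no tool in this thread is known to work this way, the names `sha256_of` and `record_asset` and the JSON layout are hypothetical, and fetching the page itself is left to whatever downloader is used.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def record_asset(manifest_path: Path, file_name: str,
                 origin_url: str, content: bytes) -> bool:
    """Store origin URL and content hash for file_name.

    Returns True when the asset is new or its content changed since the
    last recorded version, i.e. when the archived copy should be refreshed.
    """
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    else:
        manifest = {}
    new_hash = sha256_of(content)
    previous = manifest.get(file_name)
    changed = previous is None or previous["sha256"] != new_hash
    manifest[file_name] = {"origin_url": origin_url, "sha256": new_hash}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True),
                             encoding="utf-8")
    return changed
```

A scheduled job could re-download each `origin_url`, pass the fresh bytes to `record_asset`, and only overwrite (or version) the local copy when it returns True. Because the manifest is a plain JSON file in the topic directory, it stays readable without any search engine or database.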

I am not looking at Nextcloud since I find it too bloated for my purposes. I plan to host OpenSearch to search through the assets in my lab, however I would also like to maintain tight control over my storage, so I don't have to rely on an internal search engine to find something.

Thanks for reading, and I apologise if I sound a bit curt: Lemmy didn't register my response the first time and I had typed a really long response.

[–] czardestructo 2 points 2 years ago* (last edited 2 years ago)

Sounds like what you want is fairly close to Joplin. You can get a web clipper, attach files, type notes and organize it all arbitrarily.

[–] edtechdev 2 points 2 years ago

You might want an archiving tool. Some open self hostable ones include:

[–] [email protected] 1 points 2 years ago

obsidian.md

[–] Secret300 1 points 2 years ago

Zim is pretty cool. You can take notes and structure them like wiki pages, and even export them as HTML or LaTeX.

[–] kfoo 1 points 2 years ago

I don't know if this is what you're looking for, but I run a small docker container with a Shaarli instance...the interface is very old school but it's perfect for links, notes, etc.

GitHub

Docs

Public Demo instance