this post was submitted on 23 Apr 2024
You say that because it's clear you have no fucking clue how difficult a problem this is. This isn't something you can do overnight, and I'm not even sure a self-hosted solution is possible.
You say that, but it's clear you have no fucking clue how easy a solution is.
https://yacy.net/
Options used commercially (both are open source):
https://solr.apache.org/
https://www.meilisearch.com/
No, you just haven't thought through the implications more than a single step.
The real trick is SEO. These systems will be gamed. Google used to handle this by using its monopoly on search to enforce rules. It wasn't perfect, but for the most part it kept the worst spam out of the top five results. Doing this self-hosted would mean a million users having to agree to punish spam results the same way, and that does not work.
And then there's the problem of crawling and storing the entire web. Doing this for specific topics is doable. The entire web is not, at least not for a home user with a limited budget. YaCy's P2P mode might be a way around that, but then it's not really "self-hosted" anymore.
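To put rough shape on the storage argument, here's a back-of-envelope sketch. Every input is a made-up placeholder (the page counts, the 100 KB average page size, and the 30% index overhead are assumptions for illustration, not measurements), and `crawl_storage_tb` is a hypothetical helper, not part of YaCy or any other tool. The point is only how the total scales with page count.

```python
def crawl_storage_tb(pages: int, avg_page_kb: float, index_overhead: float = 0.3) -> float:
    """Rough storage estimate in terabytes for raw pages plus a search index.

    index_overhead is the assumed extra space for the index, expressed as a
    fraction of the raw page data (0.3 means 30% on top).
    """
    raw_bytes = pages * avg_page_kb * 1000           # KB -> bytes (decimal units)
    total_bytes = raw_bytes * (1 + index_overhead)   # add the assumed index overhead
    return total_bytes / 1e12                        # bytes -> TB


# Hypothetical inputs, chosen only to show the difference in scale.
topic_crawl = crawl_storage_tb(pages=1_000_000, avg_page_kb=100)
broad_crawl = crawl_storage_tb(pages=10_000_000_000, avg_page_kb=100)

print(f"topic-sized crawl: {topic_crawl:.2f} TB")
print(f"web-sized crawl:   {broad_crawl:.0f} TB")
```

With these placeholder numbers the topic-sized crawl fits on a single consumer drive, while the web-sized one lands in petabyte territory, and that's before bandwidth, re-crawling, or redundancy. Swap in your own numbers; the gap stays several orders of magnitude.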
Microsoft dumped tons of money into making the second-best search engine, and it's a bit of a joke. This is not an easy problem.