this post was submitted on 14 Feb 2025
974 points (98.7% liked)
Technology
are you going to use it to train your deepseek?
I never understood the desire to search in conversational language via AI. It's gone too far for my taste. I just want to be able to scour a huge volume of info for my exact search terms, maybe with a few synonyms or misspellings included. Google and AI keep trying to assume they know what I'm looking for, but they're always wrong (intentionally wrong based on their own motives).
The reason the dataset interests me is that search has gotten so bad that I can't get any non-corporate information from search engines anymore, just more pig swill, chumbucket ads, and misinformation slop. Anything I search for would probably give better results if I just searched old reddit, Wikipedia, and a few other datasets locally in a simple way. Not sure what software is best to use for something like that, but I'd like to collect a few mostly pre-AI datasets now to get the ball rolling before you can't find those online anymore either.
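For what it's worth, here is a rough sketch of what "searching locally in a simple way" could look like, using only the Python standard library. It is not a recommendation of any particular tool. The function names (`build_index`, `expand`, `search`), the `docs` dictionary of id to text, and the synonym table are all made up for illustration, and a real Reddit or Wikipedia dump would need its own loading step and probably an on-disk index. It matches exact terms, plus any synonyms you list yourself, plus close spellings found via `difflib`.

```python
import difflib
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it.

    `docs` is a hypothetical {doc_id: text} dict standing in for a real dataset.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def expand(term, index, synonyms, fuzzy=True, cutoff=0.8):
    """Return the term, its user-listed synonyms, and (optionally) close spellings."""
    variants = {term} | set(synonyms.get(term, ()))
    if fuzzy:
        # Only spellings that actually occur in the index are considered.
        variants.update(
            difflib.get_close_matches(term, list(index), n=5, cutoff=cutoff)
        )
    return variants


def search(query, index, synonyms=None, fuzzy=True):
    """Return sorted ids of documents matching every query term (AND semantics)."""
    synonyms = synonyms or {}
    result = None
    for term in tokenize(query):
        hits = set()
        for variant in expand(term, index, synonyms, fuzzy):
            hits |= index.get(variant, set())
        result = hits if result is None else result & hits
    return sorted(result or [])
```

Nothing here guesses at intent: a document comes back only if it contains every query term, one of the synonyms you listed, or a near-identical spelling. Passing `fuzzy=False` turns off the misspelling match entirely.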
Not everyone is perverted like you.