this post was submitted on 16 Dec 2023
96 points (96.2% liked)

So, in the era of increasingly good AI-powered tools and general search engines full of SEO spam, last week I started creating something a little old school and against the trends.

For now it's a have-fun-and-find-out project whose main aim is to provide good search results for general web development queries, with a special focus on independent blog authors.

The thesis is that no SEO spam website gets into the index, which already filters out most of the annoying noise you get on Google/Bing.

Search results are grouped per type: docs, blogs and magazines (e.g. blog platforms or bigger websites).
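A minimal sketch of what grouping per type could look like, assuming every indexed result already carries a type label. The `type` field, the `group_by_type` name and the example URLs are made up for illustration and are not taken from the project itself.

```python
# Section order as described in the post: docs, then blogs, then magazines.
SECTION_ORDER = ("docs", "blogs", "magazines")

def group_by_type(results):
    """Bucket results by their (hypothetical) "type" field, keeping a fixed section order."""
    groups = {section: [] for section in SECTION_ORDER}
    for result in results:
        bucket = groups.get(result.get("type"))
        if bucket is not None:  # results with an unknown type are skipped
            bucket.append(result)
    return groups

if __name__ == "__main__":
    hits = [
        {"url": "https://blog.example/flexbox-notes", "type": "blogs"},
        {"url": "https://docs.example/css/flexbox", "type": "docs"},
    ]
    for section, items in group_by_type(hits).items():
        print(section, [item["url"] for item in items])
```

Within each section the results keep whatever order the search backend returned them in.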

For now it's far from being done in terms of having a full index, but in most cases it already replaces my go-to search engine when I'm looking up some stuff during work.

I'm looking forward to hearing what y'all think, and if you think it makes sense overall, I can only encourage you to post some links to blogs or docs that are still missing from the index. I'm more than happy to add them to the crawler.

Responses like "nei, total shit, who would need that" are also accepted, but constructive critique is more appreciated ;)

EDIT: many thanks, everyone, for all your voices and comments. I'm super grateful for all of them and happy that we have a place like Lemmy!

[–] sznowicki 3 points 11 months ago (1 children)
  1. SO and Reddit are on the TODO list. It even had SO once (at the bottom, indeed), though not via crawling but via the SO Search API. That gave very poor quality results and was super slow, so I had to remove it while thinking of a better solution. Crawling the entire SO might be a little too much for this project at this stage tho, but if I have enough courage and hours at night I might parse that 20GB Stack Overflow archive dump and try doing something useful with it.

Same for Reddit, but here I have mixed feelings about it in general and hope it's going to die soon, replaced by amazing Lemmy communities.

I also used to type some question and end it with "reddit" in Google to get good quality content, but here with kukei the experiment is whether the blogosphere can properly replace that when the index is promoting it.

  2. Why blogs?

This is my main thing: to promote good quality blogs that I tried to follow via RSS but somehow never did. Having them all indexed (and more, some Mastodon community gave me amazing links to index) makes me actually visit them often.

As for the "SEO cancer", that's where curation comes into play. Before crawling, I check blogs that are unknown to me and decide whether they go in or not.
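The curation step described above could be sketched roughly like this: links on approved hosts go to the crawler, links on rejected hosts are dropped, and anything on an unknown host is parked for a manual look. This is only an illustration under assumptions; `triage_links`, the host sets and the example domains are hypothetical and say nothing about how the real crawler is built.

```python
from urllib.parse import urlparse

def triage_links(links, approved_hosts, rejected_hosts):
    """Split discovered links into crawlable URLs and unknown hosts awaiting manual review."""
    to_crawl = []
    needs_review = set()
    for link in links:
        host = urlparse(link).hostname
        if host is None or host in rejected_hosts:
            continue  # not a usable URL, or a host that was already rejected
        if host in approved_hosts:
            to_crawl.append(link)
        else:
            needs_review.add(host)  # a human decides before anything is crawled
    return to_crawl, sorted(needs_review)

if __name__ == "__main__":
    crawl, review = triage_links(
        ["https://good.example/post", "https://unknown.example/hello"],
        approved_hosts={"good.example"},
        rejected_hosts={"spam.example"},
    )
    print(crawl, review)
```

Keeping the decision per host rather than per URL matches the idea of approving a whole blog once and then letting the crawler follow it.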

[–] DrakeRichards 3 points 11 months ago* (last edited 11 months ago)

That makes sense. I really like that the documentation is right at the top; many times all I want to do is find the right page in the official docs. You might want to look at how results are prioritized though: right now when I search for something simple like “how to center a div”, that result from Mozilla’s docs is included but it’s hidden as the second or third result. I would expect the page that’s explicitly about centering a div to be the top result, followed by the docs page for the element itself and maybe pages for flex or grid or something. That’s a really simple example, so maybe it’s not the target of this project, but I would still hope that simple topics are covered just as well as complex ones.

EDIT: I was a bit mistaken: “how to center a div” does bring up the Mozilla documentation for centering an element, but “center a div” brings up a page about accessibility as the top result.
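One cheap way to push the page that is explicitly about the query towards the top is to score each result by how many query words appear in its title. The sketch below is an assumption for illustration, not how kukei actually ranks; `title_score`, `rank` and the sample titles are made up.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def title_score(query, title):
    """Fraction of query words that also appear in the title (0.0 to 1.0)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(title)) / len(query_terms)

def rank(query, results):
    """Sort by title overlap, best first; sorted() is stable, so ties keep their original order."""
    return sorted(results, key=lambda r: title_score(query, r["title"]), reverse=True)

if __name__ == "__main__":
    pages = [
        {"title": "Accessibility basics"},
        {"title": "How to center a div"},
        {"title": "The div element"},
    ]
    for page in rank("center a div", pages):
        print(page["title"])
```

Exact word matching is deliberately naive: "centering" would not match "center", so a real setup would likely need stemming or the search backend's own relevance tuning on top.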