All the posts about Reddit blocking everyone except Google and Brave got me thinking: what if SearXNG were federated? That is, when an instance retrieves data via a provider's API, that data is then federated to all other instances.
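To make the idea concrete, here is a rough in-memory sketch of what "federate the results" could mean. Everything in it is an assumption for illustration: the `Instance` class, `query_key`, and `fetch_upstream` are hypothetical names, the peers are plain Python objects rather than real servers, and nothing here reflects how SearXNG or ActivityPub actually work.

```python
import hashlib
from dataclasses import dataclass, field


def query_key(query: str) -> str:
    """Normalize a query and hash it so every instance derives the same key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class Instance:
    """Hypothetical search instance with a local cache and a list of peers."""
    name: str
    peers: list = field(default_factory=list)   # other Instance objects
    cache: dict = field(default_factory=dict)   # query key -> list of results
    upstream_calls: int = 0

    def fetch_upstream(self, query: str) -> list:
        # Stand-in for a real provider API call; the results are made up.
        self.upstream_calls += 1
        return [f"result for {query!r} #{i}" for i in range(3)]

    def receive(self, key: str, results: list) -> None:
        # Accept results pushed by a peer, keeping any copy we already hold.
        self.cache.setdefault(key, results)

    def search(self, query: str) -> list:
        key = query_key(query)
        if key in self.cache:
            return self.cache[key]
        results = self.fetch_upstream(query)
        self.cache[key] = results
        for peer in self.peers:
            peer.receive(key, results)
        return results


a, b = Instance("a"), Instance("b")
a.peers.append(b)
b.peers.append(a)
a.search("Reddit  API")
b.search("reddit api")   # served from b's cache, no upstream call
```

In this toy version only the first instance to see a query hits the provider. Cache expiry, trust between instances, and the cost of pushing every result to every peer are all left out.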

It would spread the API load out amongst instances, removing the API bottlenecks that come from search providers.

It would allow for more anonymous search, since users could cycle between instances and get the same results.

Geographic bias would be a thing of the past.

Other than ActivityPub overhead and storage, which could be reduced by federating text-only content, I fail to see any downside.

Thoughts?

[–] [email protected] 61 points 2 months ago (14 children)

I think you are not a computer programmer. Trying to build an index of the web by querying other search engines is not an efficient or sensible way to do things. Using ActivityPub for it is insane. Sharing query results in the obvious way might help a little during events where everyone searches for the same thing all at once, but in a relatively small pool of relatively sophisticated Internet users I don't think that happens often enough to justify the enormous amount of work and complexity.

On the other hand a distributed web crawler that puts its results in a free and decentralized database (one appropriate to the task; not blockchain) might be interesting. If the load on each node could be made light enough and the software simple enough that millions of people could run it at home, maybe it could be one way to build a new search engine. If that needs doing and someone has several hundred hours of free time to get it started.
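As a rough illustration of how the crawl load might be split so each node stays light, here is a sketch that assigns every hostname to exactly one node using rendezvous (highest-random-weight) hashing. That technique is my own choice for the example, not something proposed above, and the node IDs and the `owner` and `score` helpers are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def score(node_id: str, host: str) -> int:
    """Deterministic pseudo-random weight for a (node, host) pair."""
    digest = hashlib.sha256(f"{node_id}|{host}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(url: str, nodes: list) -> str:
    """Pick the node responsible for crawling the host of this URL."""
    host = (urlsplit(url).hostname or "").lower()
    # The node with the highest weight for this host wins. Every node can
    # compute this locally, so no central coordinator is needed.
    return max(nodes, key=lambda node: score(node, host))


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node IDs
print(owner("https://example.org/some/page", nodes))
```

All URLs on the same host land on the same node, and when a node leaves, only the hosts it owned get reassigned. Storing and querying the resulting index is a separate and much harder problem.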

[–] [email protected] 27 points 2 months ago* (last edited 2 months ago) (9 children)

If you're looking for a distributed crawler and index:

https://en.wikipedia.org/wiki/YaCy

YaCy already exists and has been around for two decades.

[–] [email protected] 2 points 2 months ago (5 children)

I really want to use this, but from what I've read it basically requires a minimum of 20-30 GB of RAM to be performant. The documentation also appears to be a mess and highly outdated. I'd also want to cluster it internally while still connecting with outside peers, which seems possible, but with the large resource requirement it's not as feasible with my setup.

[–] Im_old 4 points 2 months ago

I've run it in containers and it never used that many resources. The whole server (running a few dozen containers) had 32 GB of RAM, and it wasn't impacted in any noticeable way.
