this post was submitted on 05 Nov 2023

Smokey's Simple Guide To Search Engine Alternatives

submitted 1 year ago* (last edited 1 year ago) by Smokeydope to c/technology


This post was inspired by the surge in people mentioning the new Kagi search engine in various Lemmy comments. I happen to be somewhat knowledgeable on the topic and wanted to tell everyone about some other alternative search engines available to them, as well as the difference between meta-search engines and true search engines. This guide was written with the average person in mind. I have done my best to avoid technical jargon and speak plainly, in a way most people should be able to understand without a background in IT.

Understanding Search Engines Vs. Meta-Search Engines

There are many alternative search engines floating around that people use, but most of them are meta-search engines, meaning they are a kind of search result reseller: middlemen to true search engines. They query the big engines for you and aggregate their results.
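To make the "query and aggregate" idea concrete, here is a minimal Python sketch of the merging step. It is only an illustration, not how any particular meta-search engine actually does it: the engine names, the URLs, and the reciprocal-rank scoring are all assumptions made up for the example, and the per-engine result lists are hard-coded instead of being fetched from real engines.

```python
def aggregate(results_by_engine):
    """Merge per-engine result lists into one deduplicated, ranked list.

    results_by_engine maps an engine name to an ordered list of URLs.
    Returns a list of (url, engines) tuples, best-scored first.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            entry = merged.setdefault(url, {"engines": [], "score": 0.0})
            entry["engines"].append(engine)
            # Assumed scoring rule: a result ranked higher by an engine
            # contributes more (1, 1/2, 1/3, ...).
            entry["score"] += 1.0 / (rank + 1)
    ranked = sorted(merged.items(), key=lambda item: item[1]["score"], reverse=True)
    return [(url, info["engines"]) for url, info in ranked]


# Hypothetical data standing in for real engine responses.
sample = {
    "engine_a": ["https://example.org/a", "https://example.org/b"],
    "engine_b": ["https://example.org/b", "https://example.org/c"],
}

for url, engines in aggregate(sample):
    print(url, "from", ", ".join(engines))
```

A page returned by both engines ends up above pages returned by only one, which is roughly why an aggregated result list can look different from the list of any single engine behind it.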

Examples of Meta-search engines:

Format: Meta Search Engine / Sourced True Engines (and a hyperlink to where I found that info)

DuckDuckGo / Bing. Has some web crawling of its own but mostly relies on Bing.

Ecosia / Bing + Google. A portion of profit goes to tree planting.

Kagi / Google, Mojeek, Yandex, Marginalia. Requires email signup, $10/month for unlimited searches.

SearXNG / Too many to list, basically all of them, configurable. Free & Open Source Software, AGPL-3.0.

Startpage / Google + Bing

4get / Google, Bing, Yandex, Mojeek, Marginalia, Wiby. Open source software made by one person as an alternative to SearX.

Swisscows / Bing

Qwant / Bing. Relied on Bing for most of its life, but in 2019 started making moves to build up its own web crawlers and infrastructure, putting it in a unique transitional phase.

True Search Engines & The Realities Of Web-Crawling

As you can see, the vast majority of alternative search engines rely on some combination of Google and Bing. The reason is that the technologies which power search engines, web crawling and indexing, are extremely computationally heavy, non-trivial things.
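For the curious, here is a toy Python sketch of what "indexing" means at its simplest: a lookup table from each word to the pages that contain it (an inverted index). Real engines do far more (ranking, language handling, spam filtering, and crawling the pages in the first place); the two pages and their text below are made up for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical "crawled" pages; a real crawler would download these.
pages = {
    "https://example.org/cats": "Cats are small furry animals",
    "https://example.org/dogs": "Dogs are loyal animals",
}
index = build_index(pages)
print(search(index, "furry animals"))
```

The heavy part in practice is doing this for billions of pages and keeping it fresh, which is where the hardware cost comes from.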

Powering a search engine requires costly enterprise computers. The more popular the service (as in, the more people connecting to and using it per second), the more internet bandwidth and processing power is needed. It takes a lot of money to pay for power, maintenance, and development/security. At the scale of Google and Bing, who handle an enormous volume of searches every second, huge warehouses full of specialized computers, known as data centers, are needed.

This is a big financial ask for most companies interested in making a profit out of the gate, so they decide it's worth just paying Google and Bing for access to their enormous pre-existing infrastructure, without the headaches of dealing with maintenance and security risk.

True Search engines

True search engines are honest search engines, powered by their own internally owned and operated web crawlers, indexers, and everything else that goes into making a search engine under the hood. They tend to be owned by big tech companies with the financial resources to afford huge arrays of computers to process and store all that information for millions of active users. The last two entries are unique exceptions we will discuss later.

Examples of True Search Engines:

Bing / Owned by Microsoft

Google / Owned by Google/Alphabet

Mojeek / Owned by Mojeek Ltd

Yandex / Owned by Yandex .INC

YaCy / Free & Open Source Software GPL-2.0, powered by peer-to-peer technology, created by Michael Christen

Marginalia Search / Free & Open Source Software AGPL-3.0, developed by Marginalia / Viktor Lofgren

How Can Search Engines Be Free?

You may be wondering how any service can remain free if it needs to make a profit. Well, that is where altruistic computer hobbyists come in. The internet allows knowledgeable, tech-savvy individuals to host their own public services on their own hardware, capable of serving many thousands of visitors per second.

The financially well-off hobbyist eats the very small hosting cost out of pocket. A thousand hobbyists running the same service all over the world allows the load to be distributed evenly, and lets people choose the geographically closest instance for the fastest connection speed. Users of these free public services are encouraged to donate directly to the individual operators if they can.

An important takeaway is that services don't need to make a profit if they aren't a business's product. Sometimes people are happy to sacrifice a bit of their own resources for the betterment of thousands of others.

Companies that live and die by profit margins have to concern themselves with the choice of owning their own massive computer infrastructure or renting lots of access to someone else's. You and I just have to pay a few extra cents on an electric bill that month for a spare computer sitting in the basement running a public service, plus some time investment to get it all set up.

As Lemmy users, you should at least vaguely understand the power of a decentralized service spread out among many individually operated and maintained instances that can cooperate with each other. Spreading users across multiple instances helps keep any one of them from exceeding its free/cheap allotment of API calls, in the case of meta-search engines like SearXNG, or from being rate limited, as happens with 3rd party YouTube scrapers such as Invidious and Piped.
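Here is a rough Python sketch of why spreading out helps. It simulates a pile of queries where each one goes to a randomly picked instance and counts how many land on each. The instance names are placeholders, and random choice is just an assumption for illustration; in real life people pick instances by hand or by location.

```python
import random
from collections import Counter

# Placeholder instance names, not real servers.
INSTANCES = [
    "https://instance-a.example",
    "https://instance-b.example",
    "https://instance-c.example",
]


def pick_instance(instances, rng=random):
    """Pick one instance uniformly at random."""
    return rng.choice(instances)


def simulate(instances, n_queries, seed=0):
    """Count how many of n_queries land on each instance."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return Counter(pick_instance(instances, rng) for _ in range(n_queries))


for instance, count in simulate(INSTANCES, 3000).items():
    print(instance, count)
```

With three instances each one sees roughly a third of the traffic, so a per-instance quota or rate limit is hit about three times later than it would be if everyone used a single server.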

In the case of YaCy, decentralization is also federated: all individual YaCy instances communicate with each other through peer-to-peer technology to act as one big collective web crawler and indexer.

SearXNG

I love SearXNG. I use it every day, so it's the engine I want to impress on you the most. SearX/SearXNG is a free and open source, highly customizable, self-hostable meta-search engine. SearX instances act as a middleman: they query other search engines for you, stripping out all their spyware and ad crap, so your connection never touches their servers.

Here is a list of all public SearX instances; I personally prefer to use paulgo.io. All SearX instances are configured differently and index different engines. If one doesn't seem to give good results, try a few others.

Did I mention it has bangs like DuckDuckGo? If you really need Google, like for maps and business info, just use !!g in the query.
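If you are curious what a bang query looks like as a link, here is a small Python sketch that builds one. It assumes the instance serves searches at /search and takes the query in a q parameter, which is what public instances like paulgo.io appear to use; the helper function itself is made up for this example.

```python
from urllib.parse import urlencode


def search_url(instance, query, bang=None):
    """Build a search URL; a bang such as "!!g" is prepended to the query."""
    full_query = f"{bang} {query}" if bang else query
    # urlencode escapes "!" as %21 and spaces as "+".
    return f"{instance.rstrip('/')}/search?{urlencode({'q': full_query})}"


print(search_url("https://paulgo.io", "pizza near me", bang="!!g"))
```

The same kind of URL, with a placeholder in place of the query, is what you would paste into a browser when adding an instance as a custom search engine.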

Other Free As In Freedom Search Engines

Here is Marginalia Search, a completely novel search engine written and hosted by one dude. It aims to prioritize indexing lighter websites with little to no JavaScript, as these tend to be personal websites and homepages with a poor Search Engine Optimization (SEO) score, which means the big search engines won't index them well. If you remember the internet of the early 2000s and want a nostalgia trip, this one's for you. It's also open source and self-hostable.

Finally, YaCy is another completely novel search engine that uses peer-to-peer technology to power a big web crawler, which prioritizes what it indexes based on user queries and feedback. Everyone can download YaCy and devote a bit of their computing power to both run their own local instance and help out a collective search engine. Companies can also download YaCy and use it to index their private intranets.

They have a public instance available through a web portal. To be upfront, YaCy is not a great search engine for what most people usually want, which is quick and relevant information within the first few clicks. But it is an interesting use of technology, and it shows what a true honest-to-god community-operated search engine looks like, untainted by SEO scores or corporate money-making shenanigans.

Free As In Freedom, People vs Company Run Services

I personally trust some FOSS-loving sysadmin who hosts social services for free out of altruism, who also accepts hosting donations, and whose server is located on the other side of the planet, with my query info over Google/Alphabet any day. I have had several communications with Marginalia over several years now through the Gemini protocol and the small web, and they are more than happy to talk over email. You can have a human conversation with your search engine provider, who is just a knowledgeable everyday Joe who genuinely believes in the project and freely dedicates their resources to it. Consider sending some cash their way to help with upkeep if you like the services they provide.

Self-Hosting For Maximum Privacy

Of course you have to trust the service provider with your information, and trust that their systems are secure and maintained. Trust is a big concern with every engine you use, because while they can promise to not log anything or sell your info for profit, they often provide no way of proving those claims to be true beyond 'just trust me bro'. The one thing I really liked about Kagi was that they went through a public security audit by an outside company that specializes in hacking your system to find vulnerabilities. They got a great result and shared it publicly.

The other concern is that there is no way to be sure companies won't slowly change their policies over time, creeping in advertisements and other things they once set out to reject. That tends to happen once they have lured in a big enough user base and the greed for ever-increasing profit margins to appease shareholders starts kicking in. Companies have been shown again and again to employ this slow-boiling-frog practice, so beware.

Still, if you are absolutely concerned with privacy and knowledgeable with computers, then self-hosting FOSS software on your own instance is the best option to maintain control of your data.

Conclusion

I hope this has been informative to those who believe there are only a few options to pick from, and that you find something which works for you. During this difficult time when companies and advertisers are trying their hardest to squeeze us dry and reduce our basic human rights, we need to find ways to push back, to say no to subscriptions and ads and convenient services that don't treat us right. The internet started as something made by everyday people to connect with each other and exchange ideas, for fun and whimsy and enjoyment. Let's do our best to keep it that way.


[–] [email protected] 32 points 1 year ago

Excellent writeup! With constant updates to boot 🥳

I'm saving it for future reference.

Thank you for putting your time and effort on this.

[–] [email protected] 21 points 1 year ago

This is awesome! I recently switched over to Kagi because of several Lemmy posts mentioning it as an alternative. However, I really appreciate you sharing your preferred SearXNG instance. I have it added to my browser's list of search engines and may switch to it from time-to-time to see how different it is from Kagi. If there really isn't much of a difference, then I may switch to it after my Kagi membership ends.

[–] [email protected] 14 points 1 year ago (1 children)

Let me mention my own, a peer-to-peer (not federated) search engine for the decentralized web (IPFS).

[–] Smokeydope 5 points 1 year ago (1 children)

Very nice. I may do a post about alternative protocols and the small web, and I don't mind linking to your engine in the IPFS section.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

I would be very pleased, thanks. Feel free to contact me with any questions or for details.

[–] alphacyberranger 9 points 1 year ago

Good post

[–] [email protected] 8 points 1 year ago (1 children)

Good write-up! We're Mojeek Limited, not Mojeek LLC; the closest UK formation to an LLC is called a "private limited company", which is contracted to Ltd or Limited.

[–] Smokeydope 4 points 1 year ago (1 children)

Thank you! Apologies for the mistake, it has now been corrected to .ltd.

[–] [email protected] 4 points 1 year ago

ah no bother at all, not everyone is gonna be across every single kind of company and they are functionally very similar!

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago) (1 children)

Great list! Is there a way to easily add SearXNG as a default search engine in Firefox for Android? That's the only thing I'd like to make it even more convenient

EDIT: found it! Search url with current options is hidden at the bottom of the "cookies" section in the settings. Would be nice to have the suggestions API link too, but oh well, I'll take what I can get :)

[–] Viking_Hippie 1 points 1 year ago (1 children)

I was wondering the same thing but didn't really understand from your comment how it can be done. Could you ELI5 or link me to a guide?

[–] [email protected] 5 points 1 year ago (1 children)

What I did was first set all the options up and save them, and then go to "Cookies" in the search engine settings and copy the "Search URL of the currently saved preferences". Then, open Firefox settings -> Search -> Manage alternative search engines -> Add search engine and paste the thing into the "Search string URL" field, type in the name and press save. After that I just set it as default, and it works.

Hope this helps!

[–] [email protected] 4 points 1 year ago

Great post! Now I have a better understanding of the subject. Another meta search engine I found is 4get.ca

[–] Imhotep 4 points 1 year ago (1 children)

Qwant has web crawlers. It started with indexing the German and French web, and the plan was to progressively rely less on Bing

Since the search engine is a failure I suppose they didn't develop it further.

[–] Smokeydope 2 points 1 year ago* (last edited 1 year ago) (1 children)

Qwant is a weird one for sure. From what I understand, for most of its life it relied on Bing and only started making efforts to build up its own crawlers and such after 2019, making it a case of a meta-search engine trying to become more of a true search engine. They still use Bing, even if minimally. This info could maybe be put in a small footnote, but I'm not sure the average person would really care about Qwant's inside baseball.

[–] Imhotep 1 points 1 year ago* (last edited 1 year ago)

I'm having a hard time figuring it out also. An article says in 2018 they had 20B pages indexed... but they didn't use them?

Most likely people won't try to understand the difference between a meta-search engine and a real one. But if they did, I believe some would choose the "60% independent" one, the same way they choose a "60% recycled material" product.

[–] blackfire 2 points 1 year ago (1 children)

This is a good list, although I generally wouldn't bother listing YaCy. It's only as good as the people adding to the list, and that's not a lot.

[–] ElectroVagrant 2 points 1 year ago (1 children)

This is a good list, although I generally wouldn't bother listing YaCy. It's only as good as the people adding to the list, and that's not a lot.

Isn't that last point a good reason to mention it, to possibly increase the amount of people contributing?

[–] blackfire 1 points 1 year ago

It's been around for a very long time and not gotten any traction. I just don't think it can gather enough of a community to make it the go-to source.

[–] steeznson 2 points 1 year ago

Dropping into this thread late to mention how much I have enjoyed using marginalia.nu - the whole experience is just a joy on a desktop. Not strictly related to search, but their Wikipedia-type encyclopedia has been improving a lot recently too.

[–] [email protected] 2 points 7 months ago (1 children)

I discovered SearX via this post. Currently I use Startpage, which is the only one that returns results as similar as possible to Google's. Is it possible to get a similar result with SearXNG? It seems to me that the results are always very strange; it returns sites that are often inconsistent or unfamiliar. To give an example, if I search for searx in Google, the first result is Wikipedia and the fourth is searx.space, while in SearXNG the results differ depending on the instance used. Thank you

[–] Smokeydope 1 points 7 months ago* (last edited 7 months ago)

Yes, it is absolutely possible to get results from SearXNG similar to Startpage's. In fact, some instances allow you to aggregate Startpage results directly.

I found https://priv.au/ as an example of an instance that can directly aggregate startpage.

So the trick to the instance weirdness is that each instance has its own set of default engines to aggregate from. For example, one SearXNG instance may want to only aggregate Google and Bing, while another may want to aggregate only independent search engines that don't use Google or Bing, such as YaCy and Qwant. I've visited some instances that give like 2 results because they only aggregate Wikipedia by default lol.

Each result SearXNG gives you will show which engine it was aggregated from. It's in the bottom right corner of each result in small text. Try to look for those labels to better understand the sources the instance is pulling from.
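If you like, you can think of those little labels as data. Here is a tiny Python sketch that tallies which engines a page of results came from. The result list and its "engines" field are invented for this example and are only meant to mirror the labels you see on the page, not any official data format.

```python
from collections import Counter


def engine_counts(results):
    """Count how many results each engine contributed to."""
    counts = Counter()
    for result in results:
        counts.update(result["engines"])
    return counts


# Made-up results; each lists the engines named on its label.
results = [
    {"url": "https://example.org/1", "engines": ["google", "bing"]},
    {"url": "https://example.org/2", "engines": ["google"]},
    {"url": "https://example.org/3", "engines": ["wikipedia"]},
]

for engine, count in engine_counts(results).most_common():
    print(engine, count)
```

If one engine dominates the tally and you don't like the results, that is a hint about which engines to switch on or off for that instance.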

The secret to overcoming the instance weirdness and dialing in the search results you want is to realize you can configure each SearXNG instance to aggregate the engines you want while disabling the default ones you don't. All SearXNG instances have a preferences menu, usually a gear icon in the top right corner. Or you can go to searxng-example.com/preferences. Once in preferences, go to the 'engines' section; from there you can tick the engines you want to use.

Some instances save your settings as cookies, some instances save your settings as a sub URL for that instance. The priv.au instance I mentioned saves your settings as cookies.

The best way to use searx is to play with different instances, find one that works pretty well by default then fine tune it. Hope this helps.

[–] [email protected] 1 points 1 year ago

If you are absolutely concerned with privacy and knowledgeable with computers then self hosting FOSS software from your own instance is the best option to maintain control of your data.

Well, only partially. Unless you are sharing that instance with a good number of people, you'd still be tracked all the same. If you configure e.g. SearX to use multiple tracking search engines, you'd be tracked by each of them.

[–] [email protected] 1 points 5 months ago (1 children)

None of the SearxNG links will open.

What's the learning curve for a browser like SearxNG?

[–] bulwark 1 points 5 months ago (1 children)

I've self-hosted a SearXNG instance for about a year now. If you're familiar with Docker, it's pretty easy. I always forget I have it because I pay for Kagi, but I set it up so Ollama could use it. It absolutely seems better than going directly to Google or Bing.

[–] [email protected] 2 points 5 months ago (1 children)

I'm not familiar, and reading your response made me feel much older than a Class of 2000 millennial.

docker? Kagi? Ollama?

Google and Bing ✓

[–] bulwark 2 points 5 months ago

Gotcha, https://paulgo.io/search seems to be working if you want to try a public instance of SearxNG. I'm also a class of 2000 damn dirty millennial.

[–] [email protected] 1 points 1 year ago

What about brave search?