this post was submitted on 03 Jul 2023

Because of Lemmy's default robots.txt and meta tags, search engines index even non-local communities. This produces undesirable results, such as unrelated or unwanted content being associated with your instance.

As of today, lemmy-ui does not allow hiding non-local (or any) communities from Google and other search engines. If, like me, you do not want your instance associated with other instances' content, you can serve a custom robots.txt and add a response header to prevent indexing.

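In nginx, add the following to your server block:

# Ask all search engines not to index any page
location / {
    ...
    add_header X-Robots-Tag noindex;
}

# Serve robots.txt directly from nginx; default_type sets the
# Content-Type of the return body (add_header would send a duplicate header)
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}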

Here's a commit in my fork of the lemmy-ansible playbook. And here's a corresponding issue I opened in lemmy-ui.

I hope this helps someone :-)

[email protected] 9 points 1 year ago

If you do this, I'd recommend excluding at least your most popular communities. Searching Reddit through Google has been a great tool over the years and improved the discoverability of the service as a whole, especially for smaller communities.

Feels kind of like shooting yourself in the foot. Maybe just exclude NSFW communities (though do those even exist here?).

[email protected] 1 point 1 year ago

I agree. You do you, but IMO if you host a Lemmy instance that isn't private, this is kind of part of the deal: if you host communities, you are literally opening yourself up like this.

[email protected] 1 point 1 year ago

There is no way to exclude individual communities. Post URLs are generic, like /post/1234, so from nginx or any other proxy I cannot tell which community a post belongs to. I would love to have my own communities be searchable, but not at the price of tainting my project's reputation.

[email protected] 4 points 1 year ago

Please don’t do this; keep information easy to google. The best part of Reddit was how many hours it saved when googling for information on stuff.

Tandybaum 1 point 1 year ago

I just found this thread because I was curious about the indexing of Lemmy.

I totally agree with you. One of the best parts of Reddit is that when you google that super weird niche question, you get a bunch of Reddit links.

[email protected] 2 points 1 year ago

Would it be better to exclude any URLs matching /c/*@*.*? I think that would block external communities while keeping local ones indexable at their native locations.
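A minimal nginx sketch of that idea, assuming remote communities are always served as /c/name@instance.tld and local ones as /c/name. The regex and the robots.txt rule are illustrative assumptions, not tested against lemmy-ui. Posts are still served as /post/1234, so federated posts would remain indexable either way.

```nginx
# Hypothetical: mark remote community pages (/c/name@instance.tld) as noindex,
# leaving local community pages (/c/name) indexable.
location ~ ^/c/[^/]+@ {
    # Repeat the proxy settings from your existing "location /" block here
    # (not shown); without them nginx would not forward these requests to lemmy-ui.
    add_header X-Robots-Tag noindex always;
}

# Alternative to the blanket "Disallow: /" rule: replace that robots.txt
# location with this one (nginx rejects two identical exact-match locations).
# The '*' wildcard is defined in RFC 9309 and honored by major crawlers.
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /c/*@\n";
}
```

Use one approach or the other for a given path: a crawler that obeys the robots.txt rule never fetches the page, so it never sees the X-Robots-Tag header, and the bare URL can still appear in results.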

[email protected] 3 points 1 year ago

Or maybe Lemmy itself should include a canonical tag pointing to the post on its original host?
