this post was submitted on 03 Jul 2023
10 points (81.2% liked)

Because of Lemmy's default robots.txt and meta tags, search engines index even non-local communities. As a result, unrelated or unwanted content can end up associated with your instance in search results.

As of today, lemmy-ui does not allow hiding non-local (or any) communities from Google and other search engines. If you, like me, do not want your instance to be associated with other content, you can add a custom robots.txt and response headers to avoid indexing.

In nginx, simply add this:

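# Disallow all search engines
location / {
  ...
  # Ask crawlers not to index any page served from this location
  add_header X-Robots-Tag noindex;
}

# Serve a static robots.txt that blocks every crawler
location = /robots.txt {
  # default_type sets the Content-Type of the returned body;
  # add_header here would send a second Content-Type header
  default_type text/plain;
  return 200 "User-agent: *\nDisallow: /\n";
}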
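
To sanity-check the robots.txt body returned by the config above, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_crawlable and the Googlebot user agent are only illustrative choices, and the sketch checks the rule text itself, not a live server.

```python
from urllib.robotparser import RobotFileParser

# The exact body returned by the nginx "location = /robots.txt" block.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_crawlable(robots_txt: str, path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt lets user_agent fetch path.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of downloading them from an instance.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Check the front page and a generic post URL against the block-all rules.
    for path in ("/", "/post/1234"):
        print(path, is_crawlable(BLOCK_ALL, path))
```

With the block-all rules, every path should come back as not crawlable. This only covers robots.txt; the X-Robots-Tag header has to be checked on real responses.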

Here's a commit in my fork of the lemmy-ansible playbook. And here's a corresponding issue I opened in lemmy-ui.

I hope this helps someone :-)

[–] [email protected] 9 points 1 year ago (2 children)

If you do this, I'd recommend leaving at least your most common communities out of the block. Google searching Reddit has been a great tool over the years, and it has improved the discoverability of the service as a whole, especially for smaller communities.

Feels kind of like shooting yourself in the foot. Maybe just keep NSFW communities out of search results (though, do those even exist here?)

[–] [email protected] 1 points 1 year ago

I agree. You do you, but IMO if you want to host a Lemmy instance (that's not private), this is kind of part of the deal. If you host communities, you are literally opening yourself up like this.

[–] [email protected] 1 points 1 year ago

There is no way to exclude individual communities. The post URLs are generic, like /post/1234. From nginx or other proxies, I cannot tell what community they belong to. I would love to have my own be searchable, but not at the price of tainting my project's reputation.
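
To illustrate the problem, here is a rough sketch of what a proxy could infer from the request path alone. It assumes community pages live under /c/<name>, as in lemmy-ui, and the function name community_from_path is made up for this example.

```python
import re
from typing import Optional

# Assumption: community pages use /c/<name>, while posts use /post/<id>.
COMMUNITY_PATH = re.compile(r"^/c/([^/?#]+)")


def community_from_path(path: str) -> Optional[str]:
    """Return the community name if the path itself contains one, else None."""
    match = COMMUNITY_PATH.match(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # A community page names its community; a generic post URL does not.
    for path in ("/c/lemmy", "/post/1234"):
        print(path, community_from_path(path))
```

A post path like /post/1234 yields nothing, so any per-community rule at the proxy would miss the posts themselves.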