this post was submitted on 15 Jun 2023

Lemmy.World Announcements

One of the most unforgivable things about reddit is how pathetic its search engine is, considering the amount of free, top-notch information captured there; you need google + reddit to get at it. What can we do to make federated alternatives self-searchable?

[email protected] 1 points 1 year ago* (last edited 1 year ago)

I've posted about this before as it relates to mod tools.^1^

The search part isn't all that difficult: there are open source search engines that make it easy enough for admins to configure a decent search feature. The more difficult issue is aggregating the data from all our instances into a single source that we can query with those existing search engine tools.

I am going to spend some time this weekend working on a proof of concept for a search engine for mod tools. The big-picture solution is:

  1. Instance admins regularly dump anonymized (i.e. no PII) post and comment data to a public source (possibly torrent, possibly sftp)
  2. Other instance admins download each other's data and feed it into their search db (e.g. Elasticsearch)
  3. Mods & users create tools using this data
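
To make steps 1 and 2 a bit more concrete, here is a rough Python sketch, not the actual proof of concept. Everything in it is an assumption for illustration: the field names (`id`, `author`, `body`, etc.), the set of fields treated as PII, the JSON Lines dump format, and the `TinyIndex` class, which is only an in-memory stand-in for a real search db like Elasticsearch. The transport (torrent/sftp) is skipped entirely and replaced with an in-memory buffer.

```python
import io
import json
import re
from collections import defaultdict

# Hypothetical field names; a real instance export would have its own schema.
PII_FIELDS = {"author", "author_email", "ip_address"}


def anonymize(record):
    """Return a copy of the record without the fields treated as PII."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


def dump_jsonl(records, fh):
    """Step 1: write anonymized records as JSON Lines, one record per line."""
    for record in records:
        fh.write(json.dumps(anonymize(record)) + "\n")


def load_jsonl(fh):
    """Step 2a: read back a dump published by another instance."""
    return [json.loads(line) for line in fh if line.strip()]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Step 2b: in-memory inverted index, a stand-in for a real search db."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, record):
        doc_id = record["id"]
        self.docs[doc_id] = record
        for token in tokenize(record["body"]):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


# Made-up sample data from two made-up instances.
instance_a = [{"id": "a.example/1", "author": "alice",
               "body": "Search on federated instances"}]
instance_b = [{"id": "b.example/7", "author": "bob",
               "body": "Mod tools need search too"}]

index = TinyIndex()
for posts in (instance_a, instance_b):
    buf = io.StringIO()          # stands in for the public dump location
    dump_jsonl(posts, buf)
    buf.seek(0)
    for rec in load_jsonl(buf):
        index.add(rec)

print(index.search("search"))
```

Step 3 would then be whatever mods and users build on top of `search`. In a real setup the index would be swapped for an actual search engine, and deciding what counts as PII would need a lot more care than a fixed set of field names.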

BTW: this isn't a novel idea:

  • This is what pushshift is for reddit (check out their FAQ/wiki). We're missing mod tools big time, and searching/aggregating is a huge part of mod tools.
  • Up until recently, like last week, Stack Exchange provided a regular dump of their data to the Internet Archive for posterity's sake.

EDIT: Linked my OG post on the subject ^1^