[–] [email protected] 1 points 1 year ago

I could, but I'm afraid that would lead to even more people who don't realize they're looking at copied content. I get enough messages from people who misunderstand the bot as it is :/

[–] [email protected] 2 points 1 year ago

[email protected] is already a thing brah.

[–] [email protected] 7 points 1 year ago

Personally I'd be fine with allowing it in bios only. If people want to see more, they'll check out the bio, and see the link there. In other cases someone will just be like "... Nice." without feeling advertised to.

In the end, it's all about the rules the community itself puts up. Personally, I get more enjoyment out of fewer "real" (imperfect/amateur) out-of-love quality, than more perfect/fitgirl for-profit quantity. But I'm aware this is generally a minority opinion.

 

I'd like to hear some feedback on this, or approach vectors.

Right now the bot is rather spammy. I was hoping that by using Reddit's HOT feed, it would have some level of quality control (I know, right?). Unfortunately, it seems that in most cases, it will just return anything that's new. The downside of this is that a lot of garbage gets through, and the bot spends a lot of time scraping the underlying page to get the details.

I propose to only archive reddit posts that have a karma score of 5 or higher. In case of subs that hide the karma scores of posts for a certain time, they'd have to be at least 2 hours old, so that the Reddit moderators can weed out garbage on our behalf.
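As a rough sketch of the proposed rule (not the bot's actual code): the `FeedPost` class and its field names are made up for illustration, and it assumes a hidden score shows up as `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_SCORE = 5                  # karma threshold from the proposal
MIN_AGE = timedelta(hours=2)   # fallback age when the score is hidden


@dataclass
class FeedPost:
    """Hypothetical container for one entry scraped from a feed."""
    title: str
    created: datetime              # timezone-aware creation time
    score: Optional[int] = None    # assumption: None means the sub hides scores


def should_archive(post: FeedPost, now: Optional[datetime] = None) -> bool:
    """Apply the proposed filter: score >= 5, or at least 2 hours old if hidden."""
    now = now or datetime.now(timezone.utc)
    if post.score is not None:
        return post.score >= MIN_SCORE
    return now - post.created >= MIN_AGE
```

A post with a visible score of 4 would be skipped no matter how old it is; a post with a hidden score would be picked up on the first pass after it turns 2 hours old.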

Do you folks have any thoughts on this?

Secondly, I want to put sticky comments on each community, with links to native Lemmy communities that cover the same subject. For this I would need some kind of API, or a master list of... oh, I see sub.rehab has just the thing I need. So expect that somewhere this week :).

[–] [email protected] 1 points 1 year ago

Thanks, added as a sticky in the lemmit community.

Ideally I want to have this done automatically.

[–] [email protected] 2 points 1 year ago (1 children)

I'm not sure what you mean, do you have a link and/or screenshot? If it only happens in a specific client, it's probably an issue with that client.

 

See you on the other side!


So the update is done, but the bot was offline for 6 hours, and needed to catch up.

Unfortunately, another update slipped through, which switched the default feed from www.reddit.com to old.reddit.com, which has the side effect of changing all the urls in the posts as well. On one hand this is great, because new reddit sucks. On the other hand, this is terrible, because for every post the bot encounters, it checks if it already exists on lemmit... based on the url.

So for every post the bot encountered, it went like "old.reddit.com/r/blabla/123? Haven't seen that one yet, there's a www.reddit.com/r/blabla/123, but that must be something completely different, let's post it again!"
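One way to make a url-based duplicate check immune to this kind of host switch is to normalize every Reddit url before comparing. This is only a sketch of that idea, not necessarily the fix that was deployed; the function name and the list of hosts are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these hostnames all serve the same post under the same path.
REDDIT_HOSTS = {"reddit.com", "www.reddit.com", "old.reddit.com", "new.reddit.com"}
CANONICAL_HOST = "old.reddit.com"


def canonical_reddit_url(url: str) -> str:
    """Rewrite any known Reddit host to one canonical host.

    Non-Reddit urls (and urls without a scheme) are returned unchanged.
    The query string, fragment, and trailing slash are dropped.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in REDDIT_HOSTS:
        return url
    path = parts.path.rstrip("/")
    return urlunsplit(("https", CANONICAL_HOST, path, "", ""))


seen = {canonical_reddit_url("https://www.reddit.com/r/blabla/123")}
is_duplicate = canonical_reddit_url("https://old.reddit.com/r/blabla/123/") in seen
```

With the comparison done on the normalized form, `is_duplicate` is true here, so the post would not be submitted a second time.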

This also meant that the bot took over a minute and a half to update each community because it takes a couple of seconds per post. When I went to bed last night, I figured it was just posting a lot of content because it had so much catching up to do. But this morning I figured something was off because it still hadn't caught up.

Anyway, the fix is out now. Sorry for all the duplicates. I need coffee now.

 

ChatGPT, write a post for the stuff that I have in my head and want to get out as an update.

Hmm. No brain implant yet. Guess I'll have to write this the hard way.

Syncing update

It has been an eventful week. I successfully deployed the initial version of smarter content syncing, and have made some adjustments to the algorithm since then. Most notably, communities with only 1 subscriber (the bot) will no longer receive updates, and communities with fewer than 5 subscribers or with a low posting frequency will only be updated twice a day. Furthermore, for the highest update priority (every 10 minutes), a community must have a minimum of 50 subscribers. Implementation details can be found in the decide_interval() method over here.
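For readers who don't want to dig through the source, the rules in this paragraph could look roughly like the sketch below. It is not the real decide_interval(): the posting-frequency cut-offs and the single catch-all middle tier are placeholders, since only the subscriber thresholds are stated here.

```python
from typing import Optional

# Assumptions: the post does not give these posting-frequency thresholds.
LOW_POSTS_PER_DAY = 1.0
BUSY_POSTS_PER_DAY = 20.0


def decide_interval_sketch(subscribers: int, posts_per_day: float) -> Optional[int]:
    """Return the number of minutes between syncs, or None to skip entirely."""
    if subscribers <= 1:
        # Only the bot itself is subscribed: no updates at all.
        return None
    if subscribers < 5 or posts_per_day < LOW_POSTS_PER_DAY:
        # Tiny or quiet communities: twice a day.
        return 720
    if subscribers >= 50 and posts_per_day >= BUSY_POSTS_PER_DAY:
        # Highest priority needs at least 50 subscribers.
        return 10
    # Assumption: one placeholder tier stands in for the tiers in between.
    return 120
```

For example, a community with 49 subscribers can never land in the 10-minute tier in this sketch, however busy the subreddit is.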

Being a developer is fun

Meanwhile... Damnit, bot is stuck again.

2023-07-08 10:13:39,945 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:48 ago, interval 120 minutes
2023-07-08 10:13:40,653 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:45,324 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
2023-07-08 10:13:46,333 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  2:30:54 ago, interval 120 minutes
2023-07-08 10:13:48,581 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:13:51,227 - utils.syncer - ERROR - Error trying to retrieve post details, try again in a bit; Couldn't retrieve post detail page
...

1 bugfix and deployment later:

2023-07-08 10:46:42,836 - utils.syncer - INFO - Scraping subreddit: bustynaturals. Last time  3:03:51 ago, interval 120 minutes
2023-07-08 10:46:43,573 - utils.syncer - INFO - 'latina bodies are the best' at https://www.reddit.com/r/BustyNaturals/comments/14twww8/latina_bodies_are_the_best/ updated: 2023-07-08 07:14:13+00:00
2023-07-08 10:46:48,327 - utils.syncer - ERROR - Couldn't find post on https://old.reddit.com/r/BustyNaturals/comments/14told8/latina_bodies_are_the_best/, skipping.

Defederation

Meanwhile, the folks at https://lemmy.world reached out to me to tell me they're defederating Lemmit. They are not fond of the high volume of posts made by the bot, and the fact that there are now (quick check) 462 communities on this server all being moderated by a single person. They have already received a couple of complaints about spam, and it didn't help that some requests for NSFW subreddits were not marked as NSFW. Occasionally, those subreddits had explicit thumbnails that appeared in the 'All feed' without warning.

I had a good talk with the LemmyWorld admin, wherein they explained their point of view, and I explained mine. I understand their decision to disassociate with Lemmit, and appreciate their attempt to contact me. Other instances like Beehaw, and some smaller ones have also reached the same decision.

This does mean that you will no longer be able to get new community updates on those servers. So make sure to check the blocked instances list on your home server if you were subscribed to Lemmit. At the same time I have removed all the subscriptions of users from those servers, in order to not affect the sync priority mentioned above. This also means that if LemmyWorld, Beehaw, etc ever decide to connect to Lemmit again (however unlikely), you will need to un- and re-subscribe from there.

Meanwhile, I've added a feature in the bot that will remove request posts for NSFW subreddits, if the post itself is not marked for NSFW. This should prevent explicit thumbnails showing up where they are not wanted.
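The rule itself is small. As a sketch, assuming the bot already knows whether the requested subreddit is NSFW (how it finds that out is not covered here, and the function name is made up):

```python
def should_remove_request(subreddit_is_nsfw: bool, post_is_nsfw: bool) -> bool:
    """Remove a request only when an NSFW subreddit is requested in a SFW post."""
    return subreddit_is_nsfw and not post_is_nsfw
```

Requests for SFW subreddits are never removed by this check, whether or not the request post carries the NSFW flag.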

Server growth

Last night I got an alert from my server monitoring that the disk is 80% full. Unfortunately, the disk is only 60 GB, so that doesn't leave much room for expansion. On the bright side, a good chunk of that is from Lemmy's very verbose logging (like, 4 GB a day, which gets cleaned up daily), so it should last throughout the weekend if I tune that down. Furthermore, most of the storage growth is from pictrs, the image upload part of Lemmy, and that can utilize an S3 bucket, rather than using the VM's storage like it is now. Using an S3 bucket offers a cost-efficient solution for expanding storage. Initial estimates indicate a monthly cost of around $5 for 1000 GB of storage, which should be sufficient for a while *fingers crossed*.

In the early days of Lemmit (literally, as the server is less than a month old) image uploads were limited to a default setting, which was something around 40 megabytes. That did add up quickly (thanks to half-minute porn gifs), and so I had to limit the max filesize to 1 MB, and later 0.5 MB. Once the server has switched to S3 storage, I can probably up that limit a little, although not too much.

Finally, Lemmy v0.18.1 has been released, and it contains even more performance boosts compared to v0.18.0, so if there's time left this weekend (and I can verify the Lemmit Bot is compatible), I will probably perform the upgrade.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

Cheers, ~~both~~ all three of you. We're off to a beautiful federated future.

[–] [email protected] 5 points 1 year ago (4 children)

Congrats on reaching this set of sane rules. The efforts of creating an admin community behind the scenes are really starting to show off.

Request for clarification for uhmm, a friend of mine: When someone creates their own instance, with blackjack and hookers, and one of your users subscribes to a community there, it will synchronise part of that content to lemmynsfw. What will you do then?

I'd like to remind you that some beautiful maniacs can be quite reasonable ;)

[–] [email protected] 2 points 1 year ago

Yeah, I've upped the limit on this server, so it should come through now if you retry.

[–] [email protected] 1 points 1 year ago

Could you give an example post of what you mean? Every post starts with "The original was posted on /r/blabla", where the latter links to the original old.reddit.com post. Will that work for you?

[–] [email protected] 1 points 1 year ago

OMG, That did it!

Can't believe I hadn't tried that myself! (well, or that nobody else suggested it in that github issue).

I won't be applying that patch, because I don't really want to mess with the deployment system, but I did leave a list of current NSFW communities in the comments.

 

You know, on account of me upping that one setting in the admin which I should have thought of long ago.

[–] [email protected] 2 points 1 year ago

Apologies for the late reply, here's a list of the current NSFW subs on the server. I'm not quite sure on how to keep this list up to date automatically, but I'll figure something out.

{
 "nsfw_communities": [
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])",
  "[[email protected]](/c/[email protected])"
 ],
 "all_count": 409,
 "nsfw_count": 68
}
[–] [email protected] 2 points 1 year ago (3 children)

That could work, but it would be terrible for discoverability. In the mean time, I put up a feature request at Lemmy. I'm not a fan of pushing my problems upstream, but in this case it would actually be the easiest solution - as far as I can see (and I have 0 experience with Rust) they only need to adjust the validation regex, because the database already allows for it. That is - as long as the ActivityPub protocol allows for it.

If they deny it, I could try something with name mapping, but you'd either end up with something that is unreadable, or something with a high collision chance. Neither option is very appealing. For now I'm just going to wait and see.
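To make the trade-off concrete, here is a sketch of both mapping styles. It assumes the restriction is a maximum name length, which this comment does not actually say; the limit of 20 and the two subreddit names are invented for the example.

```python
import hashlib

MAX_LEN = 20  # hypothetical limit, not taken from Lemmy's source


def truncated_name(subreddit: str, max_len: int = MAX_LEN) -> str:
    """Readable, but names sharing a long prefix map to the same result."""
    return subreddit.lower()[:max_len]


def hashed_name(subreddit: str, max_len: int = MAX_LEN) -> str:
    """Collisions are very unlikely, but the result is unreadable."""
    digest = hashlib.sha1(subreddit.lower().encode("utf-8")).hexdigest()
    return digest[:max_len]


# Two made-up names that only differ after the 20th character.
a, b = "explainlikeimfivebutolder", "explainlikeimfivebutyounger"
print(truncated_name(a), truncated_name(b))   # identical: a collision
print(hashed_name(a), hashed_name(b))         # different, but meaningless
```

Any scheme in between (say, a short prefix plus a few hash characters) trades some of one problem for some of the other.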

4
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 

Okay, this one took me a bit longer than I planned (mostly due to sql fun and trying to use integers as minutes, WEEEE!).

Backdrop: Last week I disabled the mirroring of a couple of subreddits to the database, because they were initially requested but then nobody subscribed to them. At the same time, the bot was just crawling in a loop, starting at todayilearned, ending at latestsubreddit. As more subreddits were requested, this loop took longer and longer (21 minutes before I rolled out this update). This wasn't sustainable.

So here's the new situation. The more popular a community is, the more often it will be updated. In this case popular means a mixture between number of subscribers and the amount of posts it receives per day (Link to relevant snippet of source code).

In short, the most popular subs will be synced every 10 minutes, the next tiers every 30, 60 and 120 minutes, and the content with either no posts per day or no subscribers (other than the bot) will only be synced every 12 hours. I hope this will hit a good distribution of updates vs popularity, but it will most likely be refined at some point in the future.

Speaking of distribution, we now have over 300 communities on this server 🥳, and their update intervals are spread out as such:

  • Every 10 minutes: 22
  • Every 30 minutes: 39
  • Every 60 minutes: 55
  • Every 120 minutes: 143
  • Every 720 minutes: 44
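As a back-of-the-envelope check on what this distribution buys, the snippet below counts how many community syncs per hour it implies and compares that with the old round-robin loop. The comparison assumes the old loop covered the same number of communities in its 21 minutes, which is only approximately true since some subs had been disabled.

```python
# Update interval in minutes -> number of communities, from the list above.
DISTRIBUTION = {10: 22, 30: 39, 60: 55, 120: 143, 720: 44}


def syncs_per_hour(distribution: dict) -> float:
    """Total number of community syncs the bot performs per hour."""
    return sum(count * 60 / minutes for minutes, count in distribution.items())


total = sum(DISTRIBUTION.values())
tiered = syncs_per_hour(DISTRIBUTION)
round_robin = total * 60 / 21   # assumption: every community once per 21 minutes

print(f"{total} communities")
print(f"tiered: {tiered:.1f} syncs/hour, round robin: {round_robin:.1f} syncs/hour")
```

Under that assumption the tiered schedule does well under half the work per hour, while the 22 busiest communities are refreshed about twice as often as before.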

With this update running live (I started typing after I deployed it, and it has now gotten through the backlog of 'abandoned' subs), I'm going to step back from feature development for a few days. Any bugs that cause the bot to crash will of course continue to be addressed.

Have a blast!

 

Before, it was running on the cheapest model (1 core / 1GB mem / 30GB storage) at $12/month. The machine was running pretty low on memory, causing it to start swapping, which in turn caused the cpu to get too busy, and everything to slow down.

Now it has a whopping 2GB of memory, and things seem to have calmed down - cpu is back to around 10-15% usage, and swap is down to 0. Happy times all around.

Because of the amount of subs being archived, it now takes about 15 minutes between updates for each sub (was 18 before I updated the VM).

I'm planning to build some kind of scoring system, based on the amount of posts per subreddit (per day?), and amount of subscribers on the lemmy community. That way communities with few subscribers or that don't see many posts per day, will only be updated once per hour.

At the same time, I feel that subs like AskReddit, OutOfTheLoop and other "question-based" subreddits shouldn't be archived by Lemmit. In my opinion those kinds of posts are useless without those answers, but please let me know if you disagree.

 
  • Fixed a bug where posts would not be submitted because the title didn't contain long enough words.
  • Fixed a bug where posts would not be submitted because the url was too long.
  • Fixed a bug where posts would not be submitted when it was linking to a /user subreddit.
  • Fixed a bug where the bot would think Every Post Everywhere was a subreddit request, and would reply to it.
  • Fixed a bug where the bot would crash without recovering whenever something went wrong during new subreddit requests

A fruitful day all in all, I'd say.

 

That the replies-everywhere-bug was just because I forgot to include a variable in the bot deployment? 🤦

7
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 

In the short time since this instance and bot launched, I've been seeing the same questions resurface multiple times. This is totally understandable, since the concept of a Fediverse is still new to most (myself included), and this server is not like the others.

Q: What is Lemmit?

A: Lemmit is a Lemmy instance specifically designed for archiving Reddit content. Users can request new subreddits to be included in the archiving process by posting in the [email protected] community. It is powered by an open source python bot, which periodically checks the request list, adds new requests to the queue, and continuously monitors the Hot feed of those subs for new posts to cross-post here.

Q: Does it synchronize comments?

A: No, that would be impossible. Considering there are thousands of posts already on Lemmit, many of them having at least several hundred comments on Reddit, often buried in deep layers, it simply wouldn't be feasible to index those for more than a few posts, let alone keep them up to date.

Unfortunately, this means that archiving certain subreddits, such as Ask Historians/Men/Women/Hyperintelligentshadesofthecolourblue-type subs, is going to be rather pointless.

Q: Can it send comments back to Reddit?

A: No, it cannot. The purpose is to help bootstrap the Lemmy platform, not to serve as a bridge between the two networks. Also, see the answer about synchronizing comments.

Q: Can I request any subreddit?

A: Technically, yes. However, as the list of subs grows, the time it takes to update all of them will also increase. I do not have strict guidelines in place for this, so I'm relying on your common sense (hoooo boy). At some point, I will probably have to either stop accepting new requests or disable scraping for very low-traffic communities.

Q: Does this use the API? Will it keep working after July 1st?

A: Nope, it uses a combination of the public feed and scraping old.reddit.com. So, as long as those are still available, it will continue working. And even if they close those sources, there will probably be new ways to achieve the same effect. "Content, eh, finds a way."

Q: This is spam, can you stop?

A: First of all, I apologise for the inconvenience. All you have to do is block @[email protected], and none of its posts will ever show up on your instance. If you don't want anyone else on your server to be exposed to this bot/instance, you should convince your admin to defederate from lemmit.online. Since there are no other users on here, there will be no harm done.

Obviously I could stop, because running this server and software is only ever going to cost me time and money. But for the reasons listed above, I still think this server is a useful addition to the lemmyverse at this time. But I'm looking forward to the day where I can turn the bot off because it's no longer needed.

Q: What started this?

A: Okay, nobody asked this, but I'm going to tell you anyway. After Reddit made it clear that they are effectively killing third-party apps and implementing plenty of other anti-end user decisions, I realized that I would either have to accept not being able to access my time-wasting content or have to do so in a rather uncomfortable way (either through the official app or old.reddit.com for as long as they'll allow it to exist).

Being a stubborn developer, naturally, I chose option C: Have my own Reddit. With blackjack, and hookers. This way, I would still be able to access my beloved content without being beholden to Reddit's mood swings and abusive relationship tendencies.

Besides that, I also know that Content is King. So in order to counter the network effect (No users because no content, No content because no users), I figured it would be better to have some inorganic content to bootstrap the adoption of Lemmy.

Q: Are NSFW subreddits allowed?

A: Absolutely. Like I said: Blackjack and hookers.

Q: My request isn't picked up by the bot!

A: That isn't a question. But yeah, the process isn't flawless yet. I'm trying to iron out all the bugs as I encounter them. In the meantime, feel free to re-request the subreddit by making a second post. No harm done.

Q: No new posts are showing up at all on Lemmit

A: If no posts are appearing on the Lemmit Frontpage (sorted by NEW), it's possible that the bot has crashed or is stuck on something. Since no software is flawless, this sometimes happens. I usually fix this as soon as I'm aware, and I'm happy to say that these kinds of fatal errors are becoming less and less frequent. However, they may still occur, and as a human with needs of sleep and other responsibilities, I'm not always able to fix them immediately.

Q: Posts aren't showing up on my instance, what's up?

A: Due to the spammy nature of the bot, some server admins choose to block this server, and that is completely understandable. So first of all, make sure to check the instances link in the footer of your home server. If Lemmit is in the Blocked Instances list, you're out of luck.

When you have verified that Lemmit is not blocked on your instance, try unsubscribing, waiting a little, and then re-subscribing. That tends to fix things.

 

Long story short: I messed up with the domain registration for this instance, and never replied to a mandatory email. The domain name (lemmit.online) got put in suspension, causing disconnects all over the fediverse.

I fixed it as soon as I found out, but it will probably take a few more hours for the issues to be fully fixed.

So ehm. Whoops. Hope this explains and fixes the federation issues we've been having today.

 
2
/r/food (lemmit.online)