this post was submitted on 18 Jun 2023
ObsidianMD

Unofficial Lemmy community for https://obsidian.md

AI-TRIGGER WARNING: I've asked ChatGPT to revise my writing because it was ass (writing a stream of coherent looking text is not my forte). Proceed at your own discretion.

Yes, the emoji are all on me. I've been influenced too much by Bing Chat lately; even ChatGPT took them out, but then I pestered it to put them back.

Below this line it's all text that has been retouched by AI 😱:


Title: Archiving Reddit Threads During Protests: Suggestions Needed

Body:

Hello everyone,

As many of you are aware, numerous Reddit subreddits are temporarily closed due to the ongoing protest. While I completely support this action, it is causing some issues with my hobby research. Many posts are being deleted or replaced with placeholder scripts, leading to a loss of valuable information. Source: https://lemmy.ml/post/1259772

In an effort to address this, I have been using a script to save Reddit threads that I find interesting to my Personal Knowledge Management system: https://www.reddit.com/r/ObsidianMD/comments/104k0om/script_save_reddit_posts_to_obsidian/ . I have managed to successfully use it, but since I don't have a strong understanding of Ruby code 😅, I'm worried about its future functionality, especially if it depends on the Reddit API.

I recently discovered a thread discussing Reddit dumps: https://lemmy.nz/post/52092 . This made me wonder whether the Ruby script could be modified to work with a local copy of Reddit, or even directly with the Reddit dumps. As far as I understand, these dumps are in JSON format, but I haven't downloaded them yet.
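A rough sketch of what reading such a dump locally could look like, in Python rather than Ruby. It assumes the dump has already been decompressed into newline-delimited JSON (one submission per line) and that each record has `subreddit`, `title`, and `selftext` fields. Both the layout and the field names are assumptions, since I haven't inspected the files, and `iter_submissions` and `to_markdown` are made-up helper names.

```python
import io
import json
from typing import Iterator, TextIO


def iter_submissions(lines: TextIO, subreddit: str) -> Iterator[dict]:
    """Yield records for one subreddit from a newline-delimited JSON stream.

    Assumes one JSON object per line with a "subreddit" field (unverified).
    """
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            post = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip damaged lines instead of aborting the whole scan
        if isinstance(post, dict) and str(post.get("subreddit", "")).lower() == wanted:
            yield post


def to_markdown(post: dict) -> str:
    """Render a submission as a minimal Markdown note (title plus body)."""
    title = post.get("title", "(untitled)")
    body = post.get("selftext", "")
    return f"# {title}\n\n{body}\n"


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump file; a real run would pass open(path).
    sample = io.StringIO(
        '{"subreddit": "ObsidianMD", "title": "Example", "selftext": "Body text"}\n'
        '{"subreddit": "other", "title": "Ignored", "selftext": ""}\n'
    )
    for post in iter_submissions(sample, "ObsidianMD"):
        print(to_markdown(post))
```

Because the stream is read line by line, the whole dump never has to fit in memory, which matters if the files are as large as they sound.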

Additionally, I've come across the concept of vector embeddings and a tool called Pinecone. Would it be more straightforward to use this tool to extract the necessary information, as opposed to manually searching through the data? Ideally, I would like to create a local search function, similar to Google, specifically for this dataset dump. However, I'm unsure of how to search a local database of Reddit submissions. I have found potential solutions such as Semantra and Qdrant, but I'm uncertain if these are the best tools for this task. Perhaps there is a more suitable option?
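For comparison, a much simpler baseline than vector embeddings is plain keyword search with TF-IDF ranking, which needs only the Python standard library. The sketch below is that baseline, not what Pinecone, Semantra, or Qdrant do; the `KeywordIndex` class and the sample documents are hypothetical, and it keeps everything in memory, so it would only suit a small subset of a dump.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """In-memory keyword index that ranks documents by summed TF-IDF."""

    def __init__(self, docs: dict[str, str]) -> None:
        # Term counts per document, keyed by document id.
        self.term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        # Number of documents each term appears in.
        self.doc_freq: Counter = Counter()
        for counts in self.term_counts.values():
            self.doc_freq.update(counts.keys())
        self.n_docs = len(docs)

    def search(self, query: str, limit: int = 5) -> list[tuple[str, float]]:
        """Return up to `limit` (doc_id, score) pairs, best match first."""
        scores: dict[str, float] = {}
        for term in tokenize(query):
            df = self.doc_freq.get(term, 0)
            if df == 0:
                continue  # term occurs nowhere in the collection
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((1 + self.n_docs) / (1 + df)) + 1.0
            for doc_id, counts in self.term_counts.items():
                tf = counts.get(term, 0)
                if tf:
                    scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
        # Highest score first; ties broken by document id for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:limit]


if __name__ == "__main__":
    index = KeywordIndex({
        "a": "Save Reddit posts to Obsidian with a Ruby script",
        "b": "Vector embeddings with Qdrant",
        "c": "Obsidian plugins list",
    })
    for doc_id, score in index.search("obsidian ruby"):
        print(doc_id, round(score, 3))
```

Keyword search only matches exact words, so it would miss a thread that says "note-taking app" when the query is "Obsidian". That gap is what embeddings are meant to close, at the cost of more setup.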

I will be honest, I don't have a strong background in technology, and this problem is proving to be quite complex. But I'm willing to tackle it. I would greatly appreciate any input or suggestions that you could provide.

Thank you in advance, everyone! 😊

top 9 comments
dethb0y 2 points 1 year ago

I'd be curious myself whether there's a good way to do this.

trijste 1 points 1 year ago

What about an intermediary, like one of the Chrome extensions that let you select text and create a note in your vault? Are you doing this sporadically, or automatically?

Yodadidas 6 points 1 year ago

Can you see your 3 duplicate replies? If so, can you please delete them?

trijste 1 points 1 year ago

I did not see that. Thanks!