FearTheCron

joined 1 year ago
[–] FearTheCron 2 points 1 year ago (3 children)

I am cautiously optimistic about the decentralization and federation. But I think the biggest hurdle is developing the user base right now. ExperiencedDevs is the only subreddit I followed before this all started that directly linked a Lemmy alternative.

[–] FearTheCron 4 points 1 year ago (5 children)

I was wondering if someone would bring up search engine indexing. Google certainly has the upper hand for LLM training data with Reddit's new API change since they have the comments anyway. This is a big reason I fear these API changes: they concentrate power in the hands of already powerful companies.

[–] FearTheCron 1 points 1 year ago

I am also wary of big tech companies using my comment history for their LLMs. However, I worry that the tech companies will scrape data anyway and Reddit's API pricing just locks out the open source LLMs. There are a few of them; here are a couple that I have played with:

https://github.com/nomic-ai/gpt4all

https://github.com/ggerganov/llama.cpp

Some projects even try to preserve privacy. But I think it's more on the side of what extra training data you give it and the queries you issue.

https://github.com/imartinez/privateGPT

[–] FearTheCron 7 points 1 year ago

I totally agree that Reddit's motivation is probably not related to LLMs and the link I posted is more of an excuse than anything. However, I am curious what people think about data scraping and LLMs in general.

[–] FearTheCron 6 points 1 year ago (1 children)

I hope cross posts are OK. But I am curious about ExperiencedDevs' perspective on this as well since the question is rather technical.

Copying my opinion here in case you don't want to look at my other thread:

My personal opinion is that high API usage fees hurt open source LLMs (e.g. GPT4All). I would rather not see this new technology monopolized by those who can pay API fees.

[–] FearTheCron 3 points 1 year ago (1 children)

Yeah, I think a Creative Commons style license makes sense and that was always my intent when posting things. However, when you post Creative Commons content, you do get to decide the restrictions (e.g. commercial or noncommercial).

I think it's currently an open question how this applies to generative AI and LLMs. Perhaps the output of generative AI should retain the license of the training data? Or perhaps that is overly restrictive? There are those who believe that training commercial generative AI on data under permissive licenses is a problem.

https://www.theregister.com/2023/05/12/github_microsoft_openai_copilot/

https://slate.com/technology/2022/12/lensas-a-i-avatars-the-uncomfortable-places-their-magic-comes-from.html

I am not really sure where I stand on the overall issue. But the worst case scenario in my opinion is one where open source generative AI is hobbled by regulation, paving the way for corporate control. My biggest fear about the Reddit API changes is that they prevent anyone except Google, Facebook, Microsoft, Amazon, etc. from using user comments as a training set.

[–] FearTheCron 1 points 1 year ago

I totally agree that Reddit's approach has horrible side effects. However, if hosting costs were not an issue, how would you feel about people using your comments for LLM training?

[–] FearTheCron 3 points 1 year ago (1 children)

Certainly the archived Reddit posts will be used for that for years to come regardless. What I am curious about is how you feel about your posts contributing to the output of an LLM (independent of API usage costs).

LLMs can be specialized to tasks by training them further on a curated set of data. For example, an LLM trained specifically on your posts will sound more like you than the LLM before the training. Does it bother you that someone may use your posts for this purpose?

[–] FearTheCron 17 points 1 year ago (2 children)

My personal opinion is that high API usage fees hurt open source LLMs (e.g. GPT4All). I would rather not see this new technology monopolized by those who can pay API fees.

[–] FearTheCron 2 points 1 year ago

Yeah, reasonable ads are key. Moving/flashing things are a show stopper for me. Also, ads shouldn't track private information. I think it's fine to base them on the public contents of the current page, but tracking data across sites gets creepy. I don't like using ad blockers if I can avoid it, but many websites are completely unusable without one.

[–] FearTheCron 5 points 1 year ago (1 children)

From a personal perspective, I would like to see a model where basic access is free. A $5 a month fee is fine for you and me, but I think there are a lot of people who may not have that in their budget or who don't want the paper trail of payments (e.g. if they live in a country that is restrictive of free speech). I am really hoping that voluntary donations are sufficient, but I guess we will see.

[–] FearTheCron 14 points 1 year ago* (last edited 1 year ago) (5 children)

Well, we are on the ground floor here. Let's find something that keeps the lights on and gives everyone the incentives they need to make a great community!

Perhaps a good start would be a page that gives statistics about the time and money required to run an instance. I really appreciate those who have dedicated their time, money, and reputation to start things up. Let's find a way to build a better social media experience together.

I think many of us would be OK with a number of different models: donations, non-intrusive ads, reasonable subscription fees, etc. Perhaps there could even be incentives for people who put time into building communities by moderating or other tasks. The important thing in my opinion is that everyone feels they contributed to the structure in a way that makes them want to keep participating.

Edit: I found a budget page from the donation link on the sidebar of the main page of lemmy.world.
