Your website can now opt out of training Google's Bard and future AIs : technology

[–] [email protected] 98 points 1 year ago (4 children)

It’s just a robots.txt flag that explicitly mentions a google user agent string. This is about as effective at stopping AI from training on your data as a “no trespassing” sign hidden behind the hedges of your unfenced lawn is at stopping trespassers.

[–] cheese_greater 22 points 1 year ago* (last edited 1 year ago)

Ur just a robots.txt ;)

Edit: heh, ad robotinem

[–] [email protected] 13 points 1 year ago* (last edited 1 year ago)

We could put it on the various Lemmy sites, but it's even more ineffective because of federation.

Not sure what the analogy would be in that case. An infinite number of people get to decide if that sign exists on your lawn?

edit: An infinite number of people have a copy of your lawn, and they need to put a sign on it

[–] [email protected] 6 points 1 year ago

I’ve always been told the Robots file is pretty much just a suggestion.

[–] Touching_Grass 2 points 1 year ago* (last edited 1 year ago) (2 children)

They can also delete their website. It doesn't need to be publicly available.

[–] [email protected] 14 points 1 year ago (2 children)

I enthusiastically support deleting the internet.

[–] AdamEatsAss 7 points 1 year ago

Now this is a movement I can get behind

[–] breakingcups 1 points 1 year ago

You can take initiative by starting with your own comments 😉

[–] [email protected] 1 points 1 year ago

Let's go to Gemini!!!

[–] [email protected] 44 points 1 year ago

Just like Google honors "do not track", right?

[–] cheese_greater 18 points 1 year ago

"opt" "out"

[–] vimdiesel 14 points 1 year ago

lol I’m sure the AI profiteers will honor this

[–] FlyingSquid 12 points 1 year ago (2 children)

Until some AI is trained to ignore that.

[–] [email protected] 27 points 1 year ago

You don't even need to train the AI to ignore it. You just need to not specifically tell it to pay attention to it.

[–] Asudox 4 points 1 year ago

Until I block that user agent from accessing my website like an IP block.
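For anyone curious what that kind of block looks like, here is a minimal sketch as a Python WSGI middleware. Assumptions: `examplebot` is a made-up crawler token, and `is_blocked` and `block_user_agents` are names invented for this example. It only works against crawlers that send an honest User-Agent header. As far as I know, Google documents Google-Extended as a robots.txt control token rather than a separate request user agent, so this approach would not single it out.

```python
# Hypothetical crawler tokens to reject; replace with whatever you want to block.
BLOCKED_TOKENS = ("examplebot",)


def is_blocked(user_agent, blocked=BLOCKED_TOKENS):
    """Return True if the User-Agent header contains any blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in blocked)


def block_user_agents(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app so that blocked user agents receive a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", ""), blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Anything that lies about its User-Agent walks straight past this, which is the point the other replies are making.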

[–] [email protected] 11 points 1 year ago* (last edited 1 year ago) (4 children)

So since what’s available now isn’t actually AI, what do we call it when we do get real AI? Will it be like what happened with HD? With True AI™ followed by Ultra AI™, AI4K™, and so on until we just call them master?

[–] [email protected] 11 points 1 year ago

I've seen AGI thrown around. Artificial General Intelligence.

[–] imperator3733 9 points 1 year ago

"Artificial General Intelligence" (AGI) seems to be the new term for what used to be considered AI.

I'm sure they'll move the goalposts once again whenever "AI" stops bringing in the money and the VCs/Wall Street get ridiculously focused on "AGI" startups and scammers.

[–] [email protected] 5 points 1 year ago

AGI (artificial general intelligence) is the current term for "The Concept Formerly Known As AI". Not really a new term, but it's only recently that companies decided that any algorithm can qualify as regular "AI" if they consider it good enough.

[–] dangblingus 9 points 1 year ago

AI: *puts on ski mask* Alright, I promise I'm not an AI scraping your website.

[–] TwoGems 8 points 1 year ago (2 children)

Is it technically possible to prevent AI scraping on your website?

[–] dangblingus 7 points 1 year ago

No. There is nothing a website admin can do to prevent it. Every single tool to flag an AI would be circumvented by the AI learning what tools are being used.

[–] [email protected] 6 points 1 year ago

How do I opt in?

[–] [email protected] 2 points 1 year ago (1 children)

This is the best summary I could come up with:

Large language models are trained on all kinds of data, most of which it seems was collected without anyone’s knowledge or consent.

Now you have a choice whether to allow your web content to be used by Google as material to feed its Bard AI and any future models it decides to make.

It’s as simple as disallowing “User-Agent: Google-Extended” in your site’s robots.txt, the document that tells automated web crawlers what content they’re able to access.
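As a rough illustration of the rule described above, the sketch below embeds a two-group robots.txt and checks it with Python's standard `urllib.robotparser`. Assumptions: the `example.com` URL and the `is_allowed` helper are invented for this example, and the check only shows what a well-behaved crawler would conclude. Nothing here forces a crawler to read the file.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content: opt out of Google-Extended, allow everyone else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-post"  # hypothetical URL
    print("Google-Extended:", is_allowed("Google-Extended", page))
    print("Googlebot:", is_allowed("Googlebot", page))
```

The second group keeps ordinary search crawling unaffected, so only the AI-training token is opted out.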

“We’ve also heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases,” the company’s VP of Trust, Danielle Romain, writes in a blog post, as if this came as a surprise.

On one hand that is perhaps the best way to present this question, since consent is an important part of this equation and a positive choice to contribute is exactly what Google should be asking for.

On the other, the fact that Bard and its other models have already been trained on truly enormous amounts of data culled from users without their consent robs this framing of any authenticity.

The original article contains 381 words, the summary contains 190 words. Saved 50%. I'm a bot and I'm open source!

[–] cheese_greater 0 points 1 year ago

Good robot

[–] j4k3 -3 points 1 year ago (2 children)

People fundamentally fail to understand what AI is useful for and what it is doing. It is not anything like an Artificial General Intelligence. It is like a better way to search for information and interface with it. Just use open source offline AI, not the proprietary crap. The real issue is not what the AI can create. This is no different than what a person is capable of when they are aware of the same content, be it code, art, music, etc. Being inspired by something, due to my awareness of it, does not give the original inspirational source a right to my thoughts or products. AI works at the same level. It is an aggregate of all content, but contains none of the original works any more than a person that knows about the paintings and works of an artist and tries to paint something in a similar style.

The real issue that people fail to talk about is that AI can synthesize an enormous amount of data about a person after prolonged engagement. This is like open port access directly into your subconscious brain and there are plenty of levers and switches it can twist and toggle. Giving this kind of interpersonal access to a proprietary stalkerware system where parts of humans are whored out to the highest bidder for exploitation, that is totally insane. This type of data can manipulate people in a way that will sound like science fiction until it normalizes. Proprietary AI is criminal in its potential to manipulate and exploit especially in the political sphere.

[–] [email protected] 4 points 1 year ago (1 children)

It's not the same as an artist being inspired. It's more like an artist painting something in the style of someone else. AI can't generate anything new and it doesn't transform things in its own way. It just copies and melds together. Nothing about it is really its own. It's just a biased algorithm putting things together. Moreover, the artist could actually forget what the painting looks like, but still be inspired. If you erase something from the LLM, it will change its output. It's basically more of a constant copying.

That analogy is what a bunch of people who want to sell AI art try to pitch. It's the difference between content and art.

[–] j4k3 2 points 1 year ago* (last edited 1 year ago) (1 children)

It is possible to do more of what I would call inspired. Models are not just restricted to "in the style of" in that unrelated abstract ideas can be mixed to create something altogether new. It takes a good model and training, but like this is just from 15 minutes of messing around in Stable Diffusion trying to make Van Gogh do his best impression of Bob Ross. I'm adding all kinds of inspirational concepts all the way to emotions and contrasting them and doing this in layers of refinement using a series of images. I'm not very practiced at this. I would call this an artist's tool. Yes it changes the paradigm, but people need to get over their resistance to change as this is evolution; adapt or die.

I used tricks like image to image, and this was not my best result as far as Van Gogh:Bob Ross, but I like it most of the 150 images I made.

Positive: texture, (in the style of Vincent van Gogh:Bob Ross), [nasa], swirl, spiral, foreground tree, mountain drive, kindness, love, masterclass, (abstract:1.8), painting, dark, silhouette, swirls, texture, branches, ocean waves, anger, lonely

Negative: red, (signature), multiple moons, buildings, modern, structures, guard rail, snow, realism, yellow, orange, detailed mountains, left side line, stretchy stars, brake lights, forest

Seed: 1053938996 Model: Absolute Reality V1.6525

[–] [email protected] 2 points 1 year ago (1 children)

I think you're missing the point. You're still generating something purely based only on other things. There's nothing of an artist in there. There's no message. There's no art. You created content. You aren't in there. And I know this seems odd because there's no way to know this without extra knowledge, but something is lost. And it's not an artist's tool. It's a non-artist's tool.

[–] j4k3 -1 points 1 year ago (1 children)

You are wrong because you are arbitrary in your assumptions. I have spent years painting cars and doing graphics and airbrush professionally. I am a Maker. I can craft with almost any medium both digital and physical. Once upon a time, anyone that did not craft all of their own colors and base media were considered fake artists. This is a tool. I can create exponentially more than you to search for a better composition. So can you, so can everyone else. The stupid people will resist this change while intelligent people will learn the tech, adapt, and raise everyone's expectations about what art really is. This is the fundamental shift happening right now. The value of time investment has changed drastically. If you can't adapt to that change you only hurt yourself in the end. Open Source offline AI at a useful level is around 6 months old. It is at the stage where products targeting end users are still getting developed. In the next 2 years, everything is going to be different. In 10 years the quality of art media will make the present look like child's play. Feel free to plan your own obsolescence. This is the biggest game changer since the internet of the late 90's. It is funny how people that have not tried it or really looked into what this can be used for have strong opinions about it, or put their head in the sand when they are told. I got it to learn computer science so that I can upload a book as a database and ask the book plain text questions, and so that I could do some interesting CAD techniques in Blender. The second I saw I could question a book offline with citations, I was sold.

[–] [email protected] 2 points 1 year ago (1 children)

I'm not arbitrary. I explicitly gave a reasonable difference between content and art. You can create content without soul, that's fine. I'm not saying you need to mix your own paint. I'm saying art is inherently human by definition. You can pump out all the content you want, but it will just make finding decent art that much worse. It's like saying ChatGPT can pump out android apps more quickly, but I don't think anyone would argue it'd raise the quality of the Android app markets.

You're just thinking of everything from the point of view of middle management. Quantity over quality.

When you remove humans from the equation, it's not art. It's content. It's disposable fluff. It's mass produced. It's soulless. But sure, think yourself intelligent because you literally put money over anything else. Why don't you just flood the market with remakes and remasters at this point. It fits your argument.

You can't raise an expectation of art by literally removing any meaning to it.

[–] j4k3 1 points 1 year ago* (last edited 1 year ago) (1 children)

You need to learn and try this. You don't know what you don't know and you are making a lot of bad assumptions. The result is not random. The creativity is understanding what the words do and the process just like any other art. There is a lot of nuance. Every word I chose has an impact in both sets of prompts. This is the result of taking the best image of 60, and then using it to generate a chain where I slowly adjusted a whole bunch of tools to make this output. I got to the point where each new iteration has very little change to the final image. The word order matters, the "()" brackets strengthen the power and even more if it includes a number like ":1.8". The "[ ]" makes something weaker. Words are more powerful at the beginning and the last word. The placement of composition, technique, and metadata words matters. There are dozens of other techniques just when it comes to the basic settings, and there are limitless ways to alter the output learning about how the AI actually works. This is similar to what digital photography did to film photography. Is it going to kill old techniques? It will completely change the paradigm.
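To make the bracket syntax concrete, here is a simplified sketch of how those markers can be read as weights. Assumptions: it follows the Automatic1111 convention as I understand it, where "(word)" multiplies attention by 1.1, "[word]" divides it by 1.1, and "(word:1.8)" sets the weight directly. `parse_prompt` is a made-up helper, it ignores nested brackets, and it does not model the word-order effects described above.

```python
import re

# Matches an explicit weight such as "(abstract:1.8)".
EXPLICIT = re.compile(r"\((.+):(\d+(?:\.\d+)?)\)")


def parse_prompt(prompt):
    """Split a comma-separated prompt into (term, weight) pairs."""
    weighted = []
    for raw in prompt.split(","):
        term = raw.strip()
        if not term:
            continue
        match = EXPLICIT.fullmatch(term)
        if match:
            # "(term:1.8)" carries its own weight.
            weighted.append((match.group(1).strip(), float(match.group(2))))
        elif term.startswith("(") and term.endswith(")"):
            # "(term)" is strengthened by the assumed default factor of 1.1.
            weighted.append((term[1:-1].strip(), 1.1))
        elif term.startswith("[") and term.endswith("]"):
            # "[term]" is weakened by the same factor.
            weighted.append((term[1:-1].strip(), round(1 / 1.1, 3)))
        else:
            weighted.append((term, 1.0))
    return weighted


if __name__ == "__main__":
    for term, weight in parse_prompt("texture, (abstract:1.8), [nasa], (signature)"):
        print(f"{weight:>5}  {term}")
```

A term like "(in the style of Vincent van Gogh:Bob Ross)" has no number after the colon, so this sketch treats it as a plain "()" emphasis.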

With the best outputs from AI, you can't spot the difference unless you are told; no one can. This is the only thing that matters in the end. Art is made to be looked at, and if the viewer can't tell the difference, that is the only difference that matters. I'm not 'the enemy,' this isn't a team sport or black and white. I'm just a regular dude actually using this to improve myself. I've used it enough to know what I'm doing, and know what I'm talking about, but like, I barely touch image generation stuff. If I spent a week putting together the toolchains better I could produce a much better image than what I posted.

[–] [email protected] 2 points 1 year ago (1 children)

Every word has an impact that you can't predict. So no. All your words and condescending tone speak more about what you don't know. You are hitting a button and continually trying new things until you get the results from the AI that you want. That is not the same. Especially since you'll start changing things just because your original intent didn't match what you want, so you'll start reaching for other synonyms and the like.

It simply isn't the same as human inspiration. There's a reason courts ruled against giving rights to AI generated art to the prompt creator. Their reasoning holds.

Just because someone might not be able to tell the difference between a forgery and the real thing doesn't make them both equally art.

Same holds true for your example, which I literally already used and explained why it didn't work. Are you even reading my comments or just ranting?

[–] j4k3 1 points 1 year ago (1 children)

You have no clue what you are talking about. I can dial in very specific results anywhere I want and at any point with the tools. I can mask any area and control what it does through prompting. I only used basic tools for a few minutes with my most simple tool. I could open up ComfyUI and make a much more detailed network. I can figure out the new Open Dream GUI and break apart images into mask layers and generate whatever I want on these. Or if I cared anything about it, I would do all of it myself on the command line like I am doing with text generative AI. If the only tools you've seen are those posted by proprietary companies online, you have no clue how this really works or what is possible.

[–] [email protected] 1 points 1 year ago (1 children)

If it's specifically what you want, it's not AI; otherwise you'd be overfitting.

I'm not talking about any specific tools. I'm talking about the actual theory. I'm glad you can contradict yourself by claiming very little can get you immense details (except it's also exactly what you want?)

I'm sorry I offended you and that you're getting ridiculously angry and defensive when I said creating something via AI isn't art.

[–] j4k3 1 points 1 year ago (1 children)

I'm not angry at all. I'm simply doing a very poor job of helping you see what this really is. I keep trying because I really care to help people see the potential and where this is inevitably going in the near future. I don't care about the AI, I care about you.

AI has been marketed as a product because there was a large investment into proprietary AI as a product. The majority of articles and media are created based on these corporate interests and are not well grounded or are outright mis/disinformation. I am absolutely against proprietary AI, but open source offline AI is a completely different thing. Offline AI is a framework and not a product. It is only limited by the creativity of developers and hobbyists.

Even with Automatic1111/Stable Diffusion like I used for the image yesterday, it is quite easy to do specific tasks in areas. The base prompt is somewhat limited in what it can do. So like, to make this image I started off with a wide frame image of 768 × 512 and I batched the generation to make 60 images to choose from. I set how much variation each image would have and how closely the prompt would be followed overall and how much creativity aka randomness was allowed. This made a weak overall composition, but I chose one image that didn't have very well defined features. Then I captured the seed and prompt I used to make this image and I moved over to the Image to Image generation tab with a bunch of extra tools. I tried a few mods in the wide aspect format, but I didn't really like them, and there were some errors creeping in because the AI model I used was trained on 512 × 512 images. I tried cropping to 512, but after trying an automatic compression it made the image more abstract and I liked that, so I went with it. I used a couple of tools that basically ask the AI what it sees in the image using words in its vocabulary, and then I started playing with eccentric prompt words. I played with their locations and a bunch of things that didn't work like I wanted. While I was messing with the prompt wording, I was using the base image to generate from. Instead of starting each prompt from mathematically random noise like any tools you have likely seen, I am using the base image and determining how much noise is added back into it before the AI starts iterating back over it. If I just add like 20% noise to the base image, the structure of the image will still stay intact. This is super easy to do in Automatic1111, but it does mean it will regenerate the whole image with some minor variation. This is just the easiest and quickest way I can make something.
Even with this, there is an option to paint a color over an area or mask/erase something and replace it with stuff like random noise so that it gets remade. These tools are not super effective for me in A1, but some people really dial them in well, I just haven't taken the time to figure it out. ComfyUI takes generating to a whole different level of control where people generate some really interesting and detailed images. The latest tool I haven't tried yet is Open Dream and it has full masking and layering capabilities like Gimp/Photoshop. Even with A1, I can do something like take an image of a girl, and import it to Image to Image as my base to generate from. Then I can set my noise to something like 35%. I can use the Inpainting tool and draw a teal blob around her neck. Then I can prompt something like "A girl wearing a teal knit scarf" and the AI will turn that teal blob into a scarf. I can also mask the face and generate so that I don't alter the girl. This will generate everything I need for lighting and realism to make it appear like the girl was photographed wearing the teal scarf. It may take a good bit of trial and error just like with gimp but it is entirely doable even with the most basic of generation tools. There are still some limits like generating water droplets is not great, but there are new models and techniques coming out of academia weekly right now. Like I generated on tools that are already quite deprecated. The newer SDXL models are ten times more powerful in what they can do with prompts alone before the software tools get involved.
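A rough sketch of the noise setting described above, assuming the common image-to-image convention where a strength value between 0 and 1 decides what fraction of the sampling steps are re-run on the noised base image. The function name and the 20-step default are made up for illustration, and this is not Automatic1111's actual code.

```python
from typing import Tuple


def img2img_steps(strength: float, total_steps: int = 20) -> Tuple[int, int]:
    """Return (steps_skipped, steps_run) for a given denoising strength.

    strength=0.0 keeps the base image untouched; strength=1.0 behaves like
    generating from pure noise and runs every step.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the last part of the schedule is re-run on the noised base image.
    steps_run = min(int(total_steps * strength), total_steps)
    return total_steps - steps_run, steps_run


if __name__ == "__main__":
    for strength in (0.0, 0.25, 0.5, 1.0):
        skipped, run = img2img_steps(strength, total_steps=40)
        print(f"strength={strength}: skip {skipped} steps, run {run}")
```

With a low strength, only a small share of the steps are re-run, which is why the structure of the base image survives while details get regenerated.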

Don't think of this tech as a product, that simply is not true. It is a tool and it is a productivity multiplier. One of the leading researchers in AI, Yann LeCun said it best when he called open source LLM AI the opportunity for everyone to have individualized education like what is enjoyed only by the super rich class of society, and it is the opportunity for everyone to surround themselves with a group of experts to give them information at a moment's notice like the CEO of a company that hires and utilizes a group of experts available at any time.

These are tools only. They don't create products; we do. Much like a CEO is a generalist at the center that makes decisions by taking in information from smarter people in their niche expertise, the tool is very capable, but you need to be a good CEO and put together a good team and learn to trust them. Does this mean that other experts will be obsolete? Absolutely not, but the shift in possibilities means it is very VERY important for everyone to be aware of and dabble in the art of being their own generalist CEO. This is a technological leap forward that is pronounced, and that means it will have a broad impact. The figurative CEO will be much more efficient in their output. The current output to value ratio will be adjusted. The things that were impossible due to time constraints and cost are now accessible. This is what will change values.

If people in general like yourself fail to adopt and adapt to the technology, the current output quality and expectations will stay the same using a lot fewer people in the process, and all the profits will be consolidated. However, if a sizable population adopts this and raises the bar for expectations and quality, the state of the art moves forward. The corporate propaganda machine is pushing public opinion into naïveté in an attempt to profit. Do what you will with this, fight it if you want. I have no motivation except to try to make a better future for myself. All this bla bla bla is because I care about you stranger.

-Jake

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

You seem to be misunderstanding my position entirely. I suggest you read my first comment again. Cause you're using a lot of words and details to explain useless stuff.

Edit: I don't disagree it's a tool. I disagree it's the same as a person being inspired by others. And I am against the claim that they should freely use whatever they want without credit.

[–] j4k3 1 points 1 year ago

There is no such thing as a truly unique idea. Everything is a result of combining concepts and experiences in different and novel ways.

AI doesn't freely use anything in a reproduction type of format. It can't regenerate a copy of a work. It is no different than human awareness of a subject. Restricting this type of awareness is the same as the thought policing nonsense of humanity's past eras of stupidity. No court will ever restrict this type of information and access. If they did I could sue someone for their thoughts. This is draconian nonsense.

[–] [email protected] 2 points 1 year ago (1 children)

There is nothing in LLMs that is able to verify the truth. They should not be used for accurate information unless we make some sort of technological breakthrough on that front. It's really good at generating plausible text though.

[–] j4k3 1 points 1 year ago* (last edited 1 year ago)

People and the internet are no different. The vast majority of information that exists is incomplete or wrong at some level. Skepticism is always required, but assessing any medium by its performance without premeditated bias is the only intelligent approach that can grow with improving technology. Very few people are running the larger models (like a 65B or larger) that they fully control in an environment where they control every aspect of the LLM. I have such a setup on my hardware running offline. On its own my system is ~95% accurate on the tasks I use it for and it is more accurate at these than results I find when searching the internet.

There are already open source offline models specifically designed to work on scientific white paper archives where every result cites the source from its database.

Agents are a class of AI where the AI is running a multi-model system and where one model can send the prompt to more specialized models or a series of models equipped to check and verify a response and do things like cite sources or verify against a database.
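A toy sketch of that routing idea, with plain functions standing in for models. Everything here is hypothetical: the `KNOWLEDGE_BASE` entry, the keyword router, and the function names are invented for illustration, and a real agent framework would call actual models instead.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical database used to verify answers and supply a citation.
KNOWLEDGE_BASE: Dict[str, Tuple[str, str]] = {
    "capital of france": ("Paris", "atlas-2023"),
}


def geography_model(prompt: str) -> str:
    """Stand-in for a specialized model."""
    return "Paris" if "france" in prompt.lower() else "unknown"


def general_model(prompt: str) -> str:
    """Stand-in for a general fallback model."""
    return "unknown"


# The router picks a specialist by keyword; a real router would be a model too.
SPECIALISTS: Dict[str, Callable[[str], str]] = {"capital": geography_model}


def route(prompt: str) -> Callable[[str], str]:
    lowered = prompt.lower()
    for keyword, model in SPECIALISTS.items():
        if keyword in lowered:
            return model
    return general_model


def verify(prompt: str, answer: str) -> Optional[str]:
    """Return a source if the answer matches the database, else None."""
    lowered = prompt.lower()
    for key, (expected, source) in KNOWLEDGE_BASE.items():
        if key in lowered and expected == answer:
            return source
    return None


def run_agent(prompt: str) -> dict:
    answer = route(prompt)(prompt)
    source = verify(prompt, answer)
    return {"answer": answer, "source": source, "verified": source is not None}


if __name__ == "__main__":
    print(run_agent("What is the capital of France?"))
    print(run_agent("Tell me a joke"))
```

The useful part is the last step: an answer with no matching database entry comes back flagged as unverified instead of being presented as fact.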
