"The increasingly distorted images produced by an artificial-intelligence model that is trained on data generated by a previous version of the model. Credit: M. Boháček & H. Farid/arXiv (CC BY 4.0)" - Nature (ext link)

Unfortunately, or fortunately, researchers have found that feeding AI-generated output back into large language models (LLMs) rapidly leads to the generation of nonsense.

This is rather important: tons of AI-generated data are being produced, and unless we find and apply some sort of watermarking or labeling to brand this data as AI-generated, results may become unusable, or worse, unrecognizable as unusable.

The problem turns out to be especially acute because human-generated data is becoming scarcer relative to the growing supply of AI-generated data.

“The message is, we have to be very careful about what ends up in our training data,” says co-author Zakhar Shumaylov, an AI researcher at the University of Cambridge, UK. Otherwise, “things will always, provably, go wrong,” he says. "The team used a mathematical analysis to show that the problem of model collapse is likely to be universal, affecting all sizes of language model that use uncurated data, as well as simple image generators and other types of AI... Even before complete collapse, learning from AI-derived texts caused models to forget the information mentioned least frequently in their data sets as their outputs became more homogeneous... (This) is a concern when it comes to making AI models that represent all groups fairly, because low-probability events often relate to marginalized groups, says study co-author Ilia Shumailov, who worked on the project while at the University of Oxford, UK. How much synthetic data is used in training matters. When Shumailov and his team fine-tuned each model on 10% real data, alongside synthetic data, collapse occurred more slowly." - Nature
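The mechanism behind this is easy to see in miniature. Here is a toy sketch (my own illustration, not the researchers' actual method): repeatedly fit a simple model (a Gaussian) to a finite sample drawn from the previous generation's fitted model. Generation after generation, the fitted spread tends to shrink, so the rare "tail" events disappear first, which is a stripped-down analogue of models forgetting the least-frequent information in their data. All function names and parameters below are illustrative.

```python
# Toy illustration of model collapse: each "generation" fits a Gaussian
# to a finite sample drawn from the previous generation's fitted Gaussian.
# The fitted spread tends to shrink over generations, so rare tail
# events vanish first. This is an analogy only, not the paper's method.
import random
import statistics

def train_generations(n_samples=20, n_generations=1000, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the "real data" distribution
    history = [sigma]
    for _ in range(n_generations):
        # Draw a finite training set from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...and fit the next model to that synthetic data
        # (maximum-likelihood mean and standard deviation).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history

history = train_generations()
print(f"initial sigma: {history[0]:.3f}, final sigma: {history[-1]:.6f}")
```

Mixing some real data back in at each generation (as in the 10%-real-data experiment quoted above) would anchor the fit and slow the shrinkage, which matches the article's observation that collapse occurred more slowly.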

So, it's time to get a handle on all this before we start seeing the phenomenon in the wild.

"You are what you eat." has meaning in preventive medicine and, it appears in AI, as well.

This was only a brief summary and I didn't talk about the experiments the researchers conducted. It's well worth reading the original article linked above.

Have a great weekend!

 

Photo credit reference: Bohacek, M. & Farid, H. Preprint at arXiv, https://doi.org/10.48550/arXiv.2311.12202 (2023).

 

Reply #2

I read the article, and this is very concerning. This seems to be similar to AI hallucinations with text, only with images. There are still so many unknowns with AI, and we have pushed it so quickly into so many parts of our lives. These phenomena have the potential to destroy the use of the internet for research. We won't be able to tell fact from fiction (or distorted fact). I agree with you that this needs to be resolved quickly, before we destroy our ability to trust anything on the internet.

Reply #3

Quoting pelaird, reply 2

I read the article, and this is very concerning. This seems to be similar to AI hallucinations with text, only with images. There are still so many unknowns with AI, and we have pushed it so quickly into so many parts of our lives. These phenomena have the potential to destroy the use of the internet for research. We won't be able to tell fact from fiction (or distorted fact). I agree with you that this needs to be resolved quickly, before we destroy our ability to trust anything on the internet.

Precisely. The only solution I see as practical is watermarking; however, it's very clear that one use intelligence agencies and corporations will make of AI-generated data is poisoning the well from which plans could be made against those entities. We will not be able to tell real, human-generated data from "deep fake" images and "deep fake" data. I think it's plausible that this is already going on.

Reply #4

Yup. They've run me through enough iterations that I look pretty much like column 3.

Reply #5

We're on the fast track to idiocracy!

Reply #6

Quoting pelaird, reply 5

We're on the fast track to idiocracy!

Are you sure we aren't very close to it, already?

Reply #7

Interesting, thanks Doc.  For some reason it brings to mind the phrase "echo chamber".

Reply #8

Quoting DaveRI, reply 7

Interesting, thanks Doc.  For some reason it brings to mind the phrase "echo chamber".

A pleasure, Dave.

Reply #9

As long as we are talking AI, I am wondering if it is sustainable. In my opinion, AI has become the new corporate status symbol. Large corporations are scrambling to be the biggest and baddest AI provider in the marketplace. With the costs involved to build and maintain AI (including the power requirements), will it ever be profitable? The largest players are struggling to figure out how to monetize this new service. So far their attempts seem to be failing. The big question is how long can they continue to subsidize AI before it becomes profitable?

I'm interested in any opinions others have about this subject.

Reply #10

Quoting pelaird, reply 9

As long as we are talking AI, I am wondering if it is sustainable. In my opinion, AI has become the new corporate status symbol. Large corporations are scrambling to be the biggest and baddest AI provider in the marketplace.

I'd only suggest they beware what they wish for, lest they receive it. }:)  

Nvidia certainly hasn't lost a penny... and the big players aren't going to lose betting on human laziness. The end result will most likely be tragic, as privacy will completely disappear; competition from AI-augmented search engines may cost current big players (Google) their monopolies, and the results may end up being better. I don't think the "art"/image-creation side will lose, as it is addicting more and more folks who'll pay the price. Adobe? Hard to imagine them losing money, but alternative high-quality software already exists, and competition won't make things more expensive... Phone and computer companies will sell AI-enabled machines, and folks will buy, especially as AI increases efficiency/productivity.

Hard to predict where things will go without asking AI.  ;)

Reply #11

I just saw an article on TechSpot with the subtitle "Who's Profiting from AI (besides Nvidia)?" This should probably include TSMC.

The article is about the projected costs of their Blackwell server cabinets. "Nvidia's GB200 NVL36 server rack system will cost $1.8 million, and the NVL72 will be $3 million."

...wishing I had bought stock in Nvidia a couple of years ago!

 

Reply #12

Quoting pelaird, reply 11

...wishing I had bought stock in Nvidia a couple of years ago!

Me too!

Reply #13

Doubleplusungood doublethink...

Reply #14

It makes me wonder if AI will ever truly function without human oversight. The more I use it, the more I see how quickly it can hallucinate or completely lose the plot—especially with each loop of revisions or input after the original request. OpenAI’s image models have improved a lot lately (especially with text), but gibberish still creeps in. I’ve started feeding images back into the AI to ask it what went wrong. Sometimes it catches the problem, but often, it’s not the model’s fault—it’s the limits of the training data or the guardrails shaping what it’s even allowed to represent.

In tools like Adobe Firefly, users get a bit more say. It lets you feed in reference images for style and structure, dial up or down how much they influence the result, and adjust visual intensity. That doesn’t fix the long-term data decay problem, but it does give you a way to steer.

Depending on the task, it sometimes feels like constant course-correcting—doing real-time damage control to keep the model (or 'chucky') from drifting too far into its own echo chamber.