Does low-quality data cause chatbot performance to decline?

Modern chatbots are constantly learning, and their behavior is constantly changing. But their performance can decline as well as improve.

Recent studies undermine the assumption that learning always means improving. This has implications for the future of ChatGPT and its peers. To ensure that chatbots remain functional, artificial intelligence (AI) developers must address emerging data challenges.

ChatGPT gets dumber over time

A recently published study demonstrated that chatbots can become less able to perform certain tasks over time.

To reach this conclusion, the researchers compared the results of the GPT-3.5 and GPT-4 large language models (LLMs) in March and June 2023. In just three months, they observed significant changes in the models that underpin ChatGPT.

For example, in March, GPT-4 was able to identify prime numbers with 97.6% accuracy. By June, its accuracy had dropped to just 2.4%.

GPT-4 (left) and GPT-3.5 (right) responses to the same question in March and June (Source: arXiv)
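As a rough illustration of how such an accuracy figure can be computed, the sketch below scores a model's yes/no answers against a ground-truth primality test. The ask_model function is a hypothetical placeholder for a real chat-completion call (here it fakes a degraded model that always answers "not prime"), and the evaluation interval is assumed for illustration rather than taken from the study.

```python
from sympy import isprime

def ask_model(n: int) -> bool:
    """Hypothetical placeholder for a chat API call that asks
    'Is n a prime number?' and parses the yes/no reply. This stub
    fakes a degraded model that always answers 'not prime'."""
    return False

# Assumed evaluation set: odd numbers in an arbitrary interval.
questions = list(range(1001, 2001, 2))
correct = sum(ask_model(n) == isprime(n) for n in questions)
print(f"accuracy: {correct / len(questions):.1%}")
```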

The experiment also assessed how the models responded to sensitive questions, their ability to generate code, and their visual reasoning. Across all the skills tested, the team observed instances of AI output quality deteriorating over time.

The challenge of live training data

Machine learning (ML) relies on a training process in which AI models learn to mimic human intelligence by processing large amounts of information.

For example, the LLMs that power modern chatbots were built on massive online repositories. These include datasets compiled from Wikipedia articles, which allow chatbots to learn by digesting the largest body of human knowledge ever created.

But now the likes of ChatGPT have been released into the wild. And developers have far less control over their ever-changing training data.

The problem is that these models can also “learn” to give incorrect answers. If the quality of their training data deteriorates, so does their output. This poses a challenge for dynamic chatbots that are regularly fed content scavenged from the web.

Data poisoning could cause chatbot performance to drop

Because they tend to rely on content pulled from the web, chatbots are particularly prone to a type of manipulation known as data poisoning.

That’s exactly what happened to Microsoft’s Twitter bot Tay in 2016. Less than 24 hours after its launch, ChatGPT’s predecessor began posting inflammatory and offensive tweets. Microsoft developers quickly shelved it and went back to the drawing board.

It turned out that online trolls had been spamming the bot from the start, manipulating its ability to learn from interactions with the public. After being bombarded with abuse by an army of 4chan users, it’s no wonder Tay started repeating their hateful rhetoric.

Like Tay, contemporary chatbots are products of their environment and are vulnerable to similar attacks. Even Wikipedia, which has played such an important role in the development of LLMs, could be used to poison ML training data.
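To see why poisoning is so effective, consider a deliberately simplified sketch: a toy classifier trained twice on the same synthetic dataset, once with clean labels and once after an "attacker" has relabeled most examples of one class. The task, the data, and the poisoning rate are all invented for illustration; real attacks on LLM training pipelines are subtler, but the mechanism is the same.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy two-class task: two well-separated Gaussian clusters.
X = np.vstack([rng.normal(-2, 1, (500, 2)), rng.normal(2, 1, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: train on clean labels.
clean = LogisticRegression().fit(X_tr, y_tr)
print(f"clean accuracy:    {clean.score(X_te, y_te):.1%}")

# Poisoning: relabel 80% of one class, much as trolls flooded Tay
# with deliberately skewed examples.
y_bad = y_tr.copy()
zeros = np.where(y_tr == 0)[0]
flip = rng.choice(zeros, size=int(0.8 * len(zeros)), replace=False)
y_bad[flip] = 1
poisoned = LogisticRegression().fit(X_tr, y_bad)
print(f"poisoned accuracy: {poisoned.score(X_te, y_te):.1%}")
```

Trained on the corrupted labels, the same model misclassifies a large share of the test set. A chatbot that keeps learning from hostile users degrades in much the same way.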

However, intentionally corrupted data is not the only source of misinformation that chatbot developers need to be wary of.

Model collapse: a ticking time bomb for chatbots?

As AI tools grow in popularity, AI-generated content proliferates. But what happens to LLMs trained on datasets fetched from the web if an increasing proportion of that content is itself created by machine learning?

A recent investigation into the effects of recursion on ML models explored just this question. And the answer the researchers found has major implications for the future of generative AI.

The researchers found that when AI-generated materials are used as training data, ML models begin to forget things they learned previously.

Coining the term “model collapse,” they noted that different AI families all tend to degenerate when exposed to artificially created content.

In one experiment, the team created a feedback loop in which an image-generating ML model was trained on its own output.

They discovered that with each iteration, the model amplified its own errors and began to forget the human-generated data it started with. After 20 cycles, the output bore little resemblance to the original dataset.
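The dynamic is easy to reproduce in miniature. The sketch below is a toy analogy rather than the paper's actual experiment: it repeatedly fits a Gaussian to a dataset and then replaces that dataset with the fitted model's own samples, so each generation learns only from the previous generation's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "human-generated" data from a standard normal.
data = rng.normal(loc=0.0, scale=1.0, size=1000)

for gen in range(1, 21):
    # "Train" a toy model: fit a Gaussian to the current dataset.
    mu, sigma = data.mean(), data.std()
    print(f"gen {gen:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # The next generation trains only on the model's own samples.
    data = rng.normal(loc=mu, scale=sigma, size=1000)
```

Run over many generations, the fitted parameters drift like a random walk and the estimated spread tends to shrink, so rare "tail" events are forgotten first. That mirrors the paper's observation that recursively trained models lose low-probability information before anything else.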

Outputs of an image-generating ML model trained recursively on its own output (Source: arXiv)

The researchers observed the same degenerative tendency when they played out a similar scenario with an LLM. With each iteration, errors such as repeated sentences and broken phrasing occurred more frequently.

From there, the study speculates that future generations of ChatGPT may be at risk of model collapse. If AI increasingly generates online content, the performance of chatbots and other generative ML models could degrade.

Reliable content is needed to prevent chatbot performance from dropping

Going forward, trusted content sources will become increasingly important in protecting against the degenerative effects of poor-quality data. And the companies that control access to the content needed to train ML models hold the keys to innovation.

After all, it’s no coincidence that tech giants with millions of users are some of the biggest names in AI.

Just last week, Meta unveiled the latest version of its Llama 2 LLM, Google rolled out new features for Bard, and reports swirled that Apple was preparing to enter the fray.

Whether driven by data poisoning, early signs of model collapse, or any other factor, chatbot developers cannot ignore the threat of a performance hit.
