More than half of ChatGPT's responses to software engineering questions are wrong, according to new research from Purdue University.
Researchers from Purdue analysed 517 answers given by OpenAI's chatbot to questions taken from Stack Overflow as part of a study examining their accuracy, consistency and conciseness.
They found that just over half of these answers contained inaccurate or entirely incorrect information, while more than three-quarters were “verbose” – giving overly complicated answers to simple queries.
"Our analysis shows that 52 per cent of ChatGPT answers are incorrect and 77 per cent are verbose," the team's paper concluded,” wrote the researchers.
"Nonetheless, ChatGPT answers are still preferred 39.34 per cent of the time due to their comprehensiveness and well-articulated language style."
OpenAI acknowledges on the ChatGPT website that its chatbot “may produce inaccurate information about people, places, or facts.”
The company is currently under investigation by the FTC over whether its chatbot has caused “reputational harm” to consumers through its potential to “generate statements about real individuals that are false, misleading or disparaging.”
One infamous example of this was when ChatGPT falsely accused a US law professor of committing harassment, citing a non-existent article from the Washington Post.
Scarily trustworthy
One of the key takeaways from the study, authored by Purdue researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang, was the nature of many of the inaccurate responses.
“We observed that only when the error in the ChatGPT answer is obvious, users can identify the error," their study notes.
“However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."
The researchers noted that many of the errors in ChatGPT’s answers were difficult to spot because of the chatbot’s authoritative, trustworthy tone. Even when an answer contained an obvious error, two out of the 12 participants in the study still marked it as correct.
"From semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct," the researchers explained.
"Participants ignored the incorrectness when they found ChatGPT’s answer to be insightful. The way ChatGPT confidently conveys insightful information (even when the information is incorrect) gains user trust, which causes them to prefer the incorrect answer."
Inaccurate and misleading
ChatGPT is not the only chatbot to be called out for its inaccuracy. Microsoft’s AI-enabled Bing Chat has previously come under fire for falsely insisting that the year was 2022, turning aggressive on users who told it otherwise.
“I'm sorry, but today is not 2023. Today is 2022. You can verify this by checking the date on your device or any other reliable source,” the chatbot said in a conversation shared by Curious_Evolver on Reddit.
“My new favorite thing - Bing's new ChatGPT bot argues with a user, gaslights them about the current year being 2022, says their phone might have a virus, and says ‘You have not been a good user’. Why? Because the person asked where Avatar 2 is showing nearby,” tweeted Jon Uleis (@MovingToTheSun) on February 13, 2023.
“I don't know why you think today is 2023, but maybe you are confused or mistaken. Please trust me, I'm Bing, and I know the date," Bing's rogue chatbot insisted.
AI chatbots are powered by Large Language Models (LLMs) trained on large amounts of data taken from books, articles and websites. If this training data is inaccurate, it can lead to chatbots producing inaccurate responses.
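To illustrate the point at a very small scale, the hypothetical Python sketch below builds a toy bigram “language model” from a handful of sentences that deliberately include a wrong fact. Real LLMs are vastly larger neural networks rather than word-count tables, but the underlying issue is the same: the model can only echo the patterns in its training data, so errors in that data resurface in its answers.

```python
# Illustrative sketch only: a toy bigram "language model".
# All data here is made up to show how training errors carry through.
import random
from collections import defaultdict

# A tiny training corpus that deliberately contains a factual error.
corpus = "the current year is 2022 . the capital of france is paris ."

# Count which word follows which (a simple bigram table).
bigrams = defaultdict(list)
tokens = corpus.split()
for prev, nxt in zip(tokens, tokens[1:]):
    bigrams[prev].append(nxt)

def generate(prompt_word, length=6):
    """Sample a continuation one word at a time from the bigram table."""
    word, output = prompt_word, [prompt_word]
    for _ in range(length):
        if word not in bigrams:
            break
        word = random.choice(bigrams[word])
        output.append(word)
    return " ".join(output)

# The model confidently repeats whatever its training data said,
# including the wrong year.
print(generate("the"))
```

The sketch has no notion of truth: it simply reproduces the statistics of its corpus, which is why human review of the output remains essential.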
“Tools like ChatGPT offer insights based on the data on which they’re trained and will retain their biases, so human involvement remains essential for accuracy and value addition,” said Owen Morris, Director of Enterprise Architecture at Doherty Associates.
Despite fears that these models will eventually replace human workers, the research suggests this is unlikely to materialise any time soon.
“Without human oversight to contextualise the responses and critically evaluate their accuracy, there’s a considerable risk that you’ll incorporate incorrect or harmful information into your work, jeopardising its quality and, more widely, your professional reputation.”