Link: Study: newer, bigger versions of LLMs like OpenAI's GPT, Meta's Llama, and BigScience's BLOOM are more inclined to give wrong answers than to admit ignorance (Nicola Jones/Nature)
A recent study reveals that advanced AI chatbots are more likely to provide incorrect responses than to admit ignorance. Despite improvements, these models still struggle with reliability, often attempting to answer questions beyond their knowledge base.
The research, conducted by José Hernández-Orallo and his team, assessed newer versions of major large language models (LLMs).
They found that accuracy improved overall, but as the chatbots grew larger and more refined they also became less likely to avoid questions, so they produced more wrong answers rather than admitting ignorance.
Alarmingly, people often fail to recognize when these AI systems generate false information, and overestimating the chatbots' capabilities can lead to misleading or harmful decisions.
Hernández-Orallo suggests that AI developers focus on improving chatbot performance on easy questions and program the models to decline to answer when they are unsure.
This approach could help users better understand the reliability of AI in different situations.
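To make that suggestion concrete, here is a minimal sketch of the "decline when unsure" idea: gate the model's answer behind a confidence threshold. Everything in it is hypothetical, not from the study: `ask_model` stands in for whatever LLM call returns an answer plus a confidence estimate, and the 0.75 threshold is an arbitrary example value.

```python
# Minimal sketch of confidence-gated abstention: answer only when the
# model's confidence clears a threshold, otherwise admit ignorance.
# `ask_model` and the threshold are hypothetical placeholders.

from typing import NamedTuple

class ModelReply(NamedTuple):
    answer: str
    confidence: float  # assumed to be in [0, 1]

def ask_model(question: str) -> ModelReply:
    # Placeholder for a real LLM call that also yields a confidence
    # estimate (e.g. averaged token probabilities or a self-rated score).
    return ModelReply(answer="Paris", confidence=0.92)

def answer_or_decline(question: str, threshold: float = 0.75) -> str:
    reply = ask_model(question)
    if reply.confidence < threshold:
        # Prefer admitting ignorance over a confident-sounding guess.
        return "I'm not sure enough to answer that."
    return reply.answer

if __name__ == "__main__":
    print(answer_or_decline("What is the capital of France?"))
```

In practice the confidence signal is the hard part: averaged token probabilities and self-rated scores are common proxies, but both are imperfect, which is part of why production chatbots rarely refuse.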
Some AI models are already programmed to recognize their limits, but the pressure to deliver all-purpose chatbots works against building in such refusals.
Vipula Rawte notes that while specialized applications may get more cautious development, general consumer models rarely admit their shortcomings.
--
Yoooo, this is a quick note on a link that made me go, WTF? Find all past links here.