Summary Points
- ChatGPT demonstrated limited accuracy in verifying scientific hypotheses, performing only about 60% better than random guessing, especially struggling to identify false claims.
- The AI lacked consistency, answering the same question the same way only about 73% of the time and sometimes flipping between "true" and "false" on identical prompts, raising concerns about reliability.
- Despite generating convincing language, ChatGPT shows fundamental limitations in reasoning and understanding complex scientific nuances, indicating it does not truly “think.”
- Researchers advise caution when using AI for critical decisions, emphasizing the importance of verification and skepticism due to AI’s current performance and reasoning weaknesses.
AI Sometimes Gets Science Wrong
A new study shows that ChatGPT, a popular artificial intelligence tool, makes mistakes when testing scientific claims. Researchers from Washington State University gave ChatGPT over 700 scientific hypotheses to evaluate. These hypotheses came from recent research papers in business journals. The goal was to see if ChatGPT could tell whether each claim was true or false.
Results Show Limitations
In 2024, ChatGPT answered correctly about 76.5% of the time; in 2025, its accuracy rose slightly to 80%. However, after adjusting for random guessing, the performance looked more modest: the AI scored only about 60% of the way from chance to perfect, and it often struggled to distinguish true claims from false ones.
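The "better than chance" figure can be read as a chance-corrected score. Below is a minimal sketch assuming a Cohen's-kappa-style correction with a 50% guessing baseline (two equally likely labels); the article does not state the exact method the researchers used, so this is an illustration, not their formula.

```python
def chance_corrected(observed_accuracy: float, chance_accuracy: float = 0.5) -> float:
    """Rescale raw accuracy so 0.0 means random guessing and 1.0 means perfect.

    Assumes a binary true/false task with a 50% chance baseline.
    """
    return (observed_accuracy - chance_accuracy) / (1.0 - chance_accuracy)

# With the article's raw accuracies:
print(round(chance_corrected(0.80), 2))   # 2025 run -> 0.6
print(round(chance_corrected(0.765), 2))  # 2024 run -> 0.53
```

Under this reading, the 2025 model's 80% raw accuracy corresponds to a chance-corrected score of 0.60, which matches the "about 60% better than chance" figure.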
What stood out most was the system's difficulty identifying false statements: it correctly labeled them just 16.4% of the time. Moreover, when asked the same question 10 times, ChatGPT gave consistent answers only about 73% of the time.
Inconsistency Raises Questions
This inconsistency worries researchers. They found that asking the same question repeatedly could lead to different answers. For example, ChatGPT might say “true” once and “false” the next time, even with identical prompts. This shows that the AI’s answers are not always reliable.
Understanding the Limits of AI
The study highlights that ChatGPT produces convincing language, but it doesn’t truly understand the concepts it discusses. Experts say that current AI lacks the “brain” to think like humans. Instead, it memorizes patterns from data and guesses based on that.
According to the researchers, artificial general intelligence that can genuinely reason and think like people might still be far away. While AI tools can be useful, they should not be trusted blindly for important decisions.
Methods and Future Implications
The team tested two versions of ChatGPT, one in 2024 and an updated one in 2025. Despite the incremental gains, both versions showed broadly similar limitations. The researchers believe these findings point to a fundamental challenge with large language models: they can sound convincing but often get facts wrong.
Experts recommend that businesses and consumers verify information from AI systems. It is important to approach AI-generated answers with skepticism and understand what these tools can and cannot do.
Overall, while AI advances quickly, researchers warn that we should be cautious. AI can help, but it’s not yet capable of full understanding or perfect reasoning.
