Fast Facts
- Unlike stepwise reasoning models, GPT-4.5 generates responses quickly and excels at general-purpose tasks, scoring 62.5% accuracy on the SimpleQA quiz and significantly outperforming other OpenAI models such as GPT-4o and o3-mini.
- GPT-4.5 exhibits fewer inaccuracies, or "hallucinations," producing incorrect answers only 37.1% of the time, compared to 59.8% for GPT-4o and 80.3% for o3-mini.
- Although GPT-4.5 shows improved conversational skills and scores better in user preference tests, it underperforms o3-mini on standard science and math benchmarks.
- Critics argue that while GPT-4.5 appears polished, it may not represent a substantial innovation, and suggest that a shift toward efficiency or specialized problem-solving would be more valuable than incremental enhancements to existing models.
OpenAI Unveils GPT-4.5: The Best Chat Model Yet?
OpenAI has officially launched its latest language model, GPT-4.5. The company claims this version is its most advanced chat model to date. According to OpenAI, GPT-4.5 stands out for its general-purpose capabilities. It offers a more sophisticated conversational experience compared to previous models.
In tests using SimpleQA, a general-knowledge quiz, GPT-4.5 achieved a score of 62.5%. In contrast, GPT-4o scored 38.6%, and o3-mini reached only 15%. This marked improvement indicates that GPT-4.5 can handle a range of topics, from science to pop culture. Additionally, OpenAI states that GPT-4.5 produces fewer incorrect answers, known as hallucinations. On the same quiz, it generated inaccurate responses 37.1% of the time, while GPT-4o had a rate of 59.8% and o3-mini reached 80.3%.
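To make the size of these gaps concrete, the short Python sketch below computes the percentage-point difference and the relative change between GPT-4.5 and the other two models. It uses only the figures quoted above. The function names `point_gap` and `relative_change` are illustrative and not part of any OpenAI tooling.

```python
# Scores as reported in the article, in percent.
simpleqa_accuracy = {"GPT-4.5": 62.5, "GPT-4o": 38.6, "o3-mini": 15.0}
hallucination_rate = {"GPT-4.5": 37.1, "GPT-4o": 59.8, "o3-mini": 80.3}


def point_gap(scores, model, baseline):
    """Percentage-point difference between two models' scores."""
    return round(scores[model] - scores[baseline], 1)


def relative_change(scores, model, baseline):
    """Change of `model` versus `baseline`, as a percentage of the baseline score."""
    return round(100 * (scores[model] - scores[baseline]) / scores[baseline], 1)


if __name__ == "__main__":
    for baseline in ("GPT-4o", "o3-mini"):
        print(
            f"GPT-4.5 vs {baseline}: "
            f"accuracy {point_gap(simpleqa_accuracy, 'GPT-4.5', baseline):+} points, "
            f"hallucination rate {point_gap(hallucination_rate, 'GPT-4.5', baseline):+} points"
        )
```

Reading the gaps in percentage points keeps the comparison tied to the quiz itself, while `relative_change` shows how large a gap is in proportion to the baseline model's own score.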
However, SimpleQA is just one benchmark. On other tests, including Massive Multitask Language Understanding (MMLU), GPT-4.5 outperformed its predecessors but by a narrower margin. Notably, in standard science and math assessments, GPT-4.5 fell short of o3-mini.
One of GPT-4.5’s strengths lies in its conversational skills. Human testers reported a positive experience, favoring GPT-4.5 over GPT-4o for casual inquiries and creative tasks, such as poetry and ASCII art. For instance, if a user mentions feeling down, GPT-4.5 might say, "Want to talk about what happened, or do you just need a distraction? I’m here either way." This response contrasts sharply with GPT-4o, which might respond with a list of uplifting suggestions rather than empathy.
Despite these advancements, OpenAI faces skepticism. Waseem Alshikh, cofounder and CTO of Writer, expressed his views on the model’s impact. He believes the emphasis on emotional intelligence is valuable for specific applications, like writing aids. However, he views GPT-4.5 as merely an incremental update. He noted, "The juice isn’t worth the squeeze when you consider the energy costs and the fact that most users won’t notice the difference in daily use."
Alshikh suggested that a focus on efficiency or specialized problem-solving would be a more beneficial direction for future development. While GPT-4.5 marks a significant step in AI evolution, its real-world impact remains to be seen.