OpenAI just released GPT-4.5 and says it is its biggest and best chat model yet

OpenAI has just released GPT-4.5, a new version of its flagship large language model. The company claims it is its biggest and best model for all-round chat yet. “It’s really a step forward for us,” says Mia Glaese, a research scientist at OpenAI. Since the releases of its so-called reasoning models o1 and o3, OpenAI…
Unlike reasoning models such as o1 and o3, which work through answers step by step, normal large language models like GPT-4.5 spit out the first response they come up with. But GPT-4.5 is more general-purpose. Tested on SimpleQA, a kind of general-knowledge quiz developed by OpenAI last year that includes questions on topics from science and technology to TV shows and video games, GPT-4.5 scores 62.5% compared with 38.6% for GPT-4o and 15% for o3-mini.

What’s more, OpenAI claims that GPT-4.5 responds with far fewer made-up answers (known as hallucinations). On the same test, GPT-4.5 made up answers 37.1% of the time, compared with 59.8% for GPT-4o and 80.3% for o3-mini.

But SimpleQA is just one benchmark. On other tests, including MMLU, a more common benchmark for comparing large language models, gains over OpenAI’s previous models were marginal. And on standard science and math benchmarks, GPT-4.5 scores worse than o3.

GPT-4.5’s special charm seems to be its conversation. Human testers employed by OpenAI say they preferred GPT-4.5 to GPT-4o for everyday queries, professional queries, and creative tasks, including coming up with poems. (Ryder, a researcher at OpenAI, says it is also great at old-school internet ASCII art.)

But after years at the top, OpenAI faces a tough crowd. “The focus on emotional intelligence and creativity is cool for niche use cases like writing coaches and brainstorming buddies,” says Waseem Alshikh, cofounder and CTO of Writer, a startup that develops large language models for enterprise customers.

“But GPT-4.5 feels like a shiny new coat of paint on the same old car,” he says. “Throwing more compute and data at a model can make it sound smoother, but it’s not a game-changer.”

“The juice isn’t worth the squeeze when you consider the energy costs and the fact that most users won’t notice the difference in daily use,” he says. “I’d rather see them pivot to efficiency or niche problem-solving than keep supersizing the same recipe.”

OpenAI CEO Sam Altman has said that GPT-4.5 will be the last release in OpenAI’s classic lineup and that GPT-5 will be a hybrid that combines a general-purpose large language model with a reasoning model.

“GPT-4.5 is OpenAI phoning it in while they cook up something bigger behind closed doors,” says Alshikh. “Until then, this feels like a pit stop.”

And yet OpenAI insists that its supersized approach still has legs. “Personally, I’m very optimistic about finding ways through those bottlenecks and continuing to scale,” says Ryder. “I think there’s something extremely profound and exciting about pattern-matching across all of human knowledge.”