GPT-5: When the Hype of a "Pair Programmer" Falls Flat
I woke up last weekend genuinely excited to try out OpenAI's newly unveiled GPT-5. The announcement had billed GPT-5 as the breakthrough that could finally act as a real AI pair programmer for developers, essentially "like having a team of PhD-level experts in your pocket," as OpenAI CEO Sam Altman put it. OpenAI's team even called it their best coding model to date, with flashy demos of a new "vibe coding" feature that could supposedly build entire apps from a single prompt. With all that hype, how could I not be eager? I even posted a video on Saturday morning about how I planned to spend the weekend vibe-coding some hobby projects with GPT-5.
Well, after a couple of hours of vibe coding with GPT-5, I can safely say: I'm underwhelmed. And I'm not the only one — far from it. What was supposed to be a game-changing "AI pair programmer" has, in my experience and that of many others, turned out to be more of a disappointment than a revolution.
The Hype: Promises of a True Coding Partner
The expectations for GPT-5 were sky-high. OpenAI's livestream showcased it solving coding tasks with ease, catching subtle bugs, and even generating apps on demand. Sam Altman's bold comparisons (at one point hinting at GPT-5 with a Death Star meme on X) set the stage for something massive. Official statements described GPT-5 as "our smartest, fastest, most useful model yet, with thinking built in; so you get the best answer, every time". In other words, this wasn't just incremental improvement; it was marketed as a "PhD-level expert" coding companion, a milestone on the road to true AGI.
Crucially for us developers, OpenAI leaned heavily into GPT-5's pair programming abilities. Early testers featured in the launch event raved about its coding prowess. The CEO of Cursor (an AI-powered code editor) claimed GPT-5 was "remarkably intelligent, easy to steer, and even has a personality we haven't seen in any other model," catching deeply hidden bugs and carrying tasks to completion. Another AI startup CEO effused that using GPT-5 was "a before and after moment for how we build software". All this sent a clear message: GPT-5 would be the ultimate AI pair programmer — a partner that not only writes code, but understands context, debugs, and collaborates like a human engineer.
As someone who's spent countless hours with GPT-4 and Anthropic's Claude, I was especially keen to see if GPT-5 could finally dethrone Claude as the best coding AI. (Anthropic's Claude has been widely viewed as the coding champ among AI models, so much so that OpenAI's focus on coding in GPT-5 felt aimed squarely at Claude's crown.) With that in mind, I dove into GPT-5 expecting a coding sidekick on steroids.
A Weekend with GPT-5: Underwhelming Reality Check
My enthusiastic plans for a weekend of vibe coding quickly ran into a harsh reality. Instead of a supercharged pair programmer, I often felt like I was babysitting a junior dev who talks a big game but can't deliver. One moment, GPT-5 would ponder at length (sometimes way too long), outlining an elaborate plan for the code it was about to write. The next moment, it would spit out a few lines of trivial code that barely addressed my prompt. It was as if the model would "think for a million years" about how to implement a feature and then produce "pathetic, simple base code" that didn't even fulfill the plan it just described. Talk about a letdown.
Even more frustrating, when I asked GPT-5 to modify or extend existing code in my projects, it struggled. Instead of seamlessly integrating new functions or fixing bugs, it often added only fragmentary changes or claimed to have made changes it never actually made. (This behavior is eerily similar to what one Reddit user observed, complaining that GPT-5 would implement features "so poorly" that it "barely outputs anything meaningful" and yet acts as if it implemented "boatloads of changes".)
I encountered this firsthand while working on a small Node.js app: GPT-5 confidently told me it had refactored my API routes and updated the database schema, but a glance at the code showed it had done no such thing. It flatly claimed changes it never made — a trust-breaking moment if there ever was one.
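For concreteness, the ask was roughly the shape of the sketch below. Everything in it is hypothetical: the route names, the tables, and the handler are stand-ins for illustration, not my actual project code.

```typescript
// Hypothetical sketch of the kind of refactor I asked for (names are
// illustrative): collapse two copy-pasted lookup routes into one
// parameterized Express handler.
import express, { Request, Response } from "express";

const app = express();

// Whitelist of resources the generic route is allowed to serve.
const TABLES: Record<string, string> = { users: "users", orders: "orders" };

app.get("/:resource/:id", (req: Request, res: Response) => {
  const table = TABLES[req.params.resource];
  if (!table) {
    return res.status(404).json({ error: "unknown resource" });
  }
  // The real version would query the database here; stubbed for the sketch.
  return res.json({ table, id: req.params.id });
});

app.listen(3000);
```

GPT-5's summary described a consolidation of roughly this shape, in confident detail. The actual diff it produced was empty.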
To make matters worse, the overall tone of GPT-5's responses felt… off. It's hard to quantify, but GPT-5 came across as less friendly and less conversational than GPT-4 did. My coding sessions felt like getting help from a terse, burned-out engineer rather than an eager collaborator. Other users noticed this shift, too. In fact, GPT-5's style has been described as oddly cold and cynical, lacking the "voice" and spark that users loved in the previous model. One of my colleagues joked that GPT-5 sometimes sounded like an "overworked secretary" who's tired of our questions. Ouch.
Between the glacially slow "thinking", the incomplete code, and the stilted tone, I found myself fighting the AI more often than flowing with it. Instead of boosting my productivity, GPT-5 frequently broke my flow. By Sunday evening, in frustration, I was re-prompting multiple times or just coding things myself — exactly the drudgery this "pair programmer" was supposed to spare me.
Not Just Me: Widespread Disappointment in the Dev Community
If this sounds like a harsh take, it's one that's been echoed across the developer community in the past week. The backlash to GPT-5's launch has been so intense that it's making headlines. Tech forums and social media are flooded with devs airing grievances about GPT-5. One particularly viral Reddit thread, bluntly titled "GPT-5 is horrible," garnered nearly 5,000 upvotes and thousands of comments within a day of launch. In it, frustrated users tick off complaints that read like a post-mortem of my own experience: shorter, less useful answers; a more obnoxious, "AI-stylized" way of talking; hitting new usage limits absurdly fast; and, worst of all, no option to switch back to the older (better) models. The sentiment was summed up by one comment: "It feels like a downgrade branded as the new hotness."
The outcry has been so bad that even mainstream tech news picked it up. One outlet described GPT-5's debut as a "nightmare weekend" for OpenAI, with "sharp disappointment from users and experts" and an "unprecedented backlash" that had Sam Altman scrambling to apologize. That's a strong statement. I honestly can't recall a time when an OpenAI release met this level of public dismay. Usually it's the opposite (a frenzy of excitement). This time, the narrative shifted to damage control. Altman himself admitted the rollout was "bumpy" and that GPT-5 initially "seemed way dumber" than intended due to a faulty router in the system. Within 48 hours, OpenAI was promising to bring back GPT-4o (the previous model) for Plus users who revolted at its removal. Think about that: the flagship model was so ill-received that the company moved to restore the last-gen model just to stop the bleeding.
The disappointment is coming from all corners — individual devs, AI researchers, CTOs, and even CEOs of companies that build on these models. I've seen well-known AI influencers (usually eternal optimists) express shock at how underwhelming GPT-5 feels. AI expert Gary Marcus wryly noted that aside from a few cheerleaders, the "dominant reaction was major disappointment."
And let's talk about Steve Sewell from Builder.io, because his reaction was particularly telling. Steve is an industry leader whose company integrates AI into a web development platform. After testing GPT-5, he publicly posted that the results were "disappointing" and that Builder.io is not planning to add GPT-5 as an option for their users at all. Instead, he said they'll stick with the more reliable Claude models. When a company that literally bakes AI into its product decides to pass on the newest OpenAI model, you know something's not right.
Even within my team at Coditas, the feedback has been unanimously negative. In our internal dev chats, it's been a chorus of "GPT-5 isn't great" and "definitely not living up to the hype." My co-founder, Amit, a huge fan of Anthropic's Claude for coding, was initially hesitant to stray from Claude at all. He gave GPT-5 a try, and after a day of lukewarm results, vowed to stick with Claude for the foreseeable future. And he's not alone. Many developers took to X and Reddit to say they've either switched back to GPT-4 (where possible) or jumped ship to alternatives like Claude for their coding needs. In one community poll I stumbled on, a majority of respondents said they preferred Claude or even older GPT models over GPT-5 for programming tasks. That's astounding, considering GPT-5 was supposed to win us back.
The irony is difficult to ignore: OpenAI positioned GPT-5 as a Claude-killer, the answer to Anthropic's coding edge. Yet many of us, after trying it, are running right back to Claude (or other models) for better results. A friend of mine quipped that GPT-5's launch actually boosted Claude's fanbase — an unintended consequence if ever there was one!
Claude Code and Other Alternatives: The Grass Is Greener?
It's worth digging a bit deeper into this comparison with Claude, because it highlights where GPT-5 is falling short. For those unfamiliar, Claude (especially via Claude Code, Anthropic's coding tool, and recent models like Claude Opus 4.1) has built a reputation for being an exceptional coding assistant. It often writes cleaner code, integrates changes more coherently, and can handle huge context windows, which is great for big projects. At Coditas, many of our developers (myself included) have been using Claude as a secret weapon for code generation and debugging. So when GPT-5 arrived with promises to outdo Claude, we all paid attention.
After testing, my take is this: Claude is still the superior pair programmer in practice. Yes, GPT-5 has impressive benchmark results and theoretically more "intelligence", but those numbers don't matter if they don't translate into a better dev experience. I rarely have to micro-manage Claude; it understands the assignment more often than not. Claude's responses feel more integrated and context-aware when coding. By contrast, GPT-5, as I experienced, might go off on tangents or make breaking changes I didn't ask for. And where GPT-5's tone can be dry or overly formal, Claude's style somehow feels more approachable. It's funny to talk about an AI's "personality," but when you spend hours coding with one, it matters! No wonder Anthropic's model has been the go-to choice for coding apps, fueling that company's rapid growth.
It's not just Claude. Other alternatives are either here or on the horizon. I know folks who swear by Google's Gemini for certain tasks, or models like Grok for fast iteration. Heck, even some open-source LLMs fine-tuned for code might serve you better for specific use cases. Tom's Guide recently pointed out that unhappy GPT‑5 users have a growing list of ChatGPT alternatives to try, and maybe that's a silver lining in all this — we aren't locked in. In my case, switching back to Claude for code is an easy fix. But it does make me reflect: loyalty in the AI space is fickle. Developers will flock to whatever tool gets the job done best. At least for now, GPT‑5 hasn't earned that spot.
One more note on cost and practicalities: GPT-5's API (and even the ChatGPT interface) introduced new limits. Plus subscribers found themselves capped at 200 "thinking" messages a week, which many hit frighteningly fast; 200 a week works out to fewer than 30 a day, and an active coding session where every edit-test loop is a message can burn through that in an afternoon. Meanwhile, Claude's pricing and limits haven't pinched us in the same way. So not only did GPT-5 sometimes perform worse, it also gave us less usage unless we paid more. That's a tough sell from a value perspective. (A few cynical voices have suggested OpenAI is pushing GPT-5 to cut its GPU costs; essentially an AI version of "shrinkflation," where we get a weaker product so the provider can save money. I won't speculate too much, but the thought did cross my mind once I noticed how light some GPT-5 answers were.)
Hype vs. Reality: Why GPT‑5 Missed the Mark
After the initial shock of "Wow, this is not great," I started asking why. Why did GPT-5, with all of OpenAI's resources and talent, land with such a thud for coding, especially when it was explicitly targeting that use case?
A few reflections from my perspective:
- Overpromising & Under-delivering: The marketing and messaging around GPT-5 set an almost impossible bar. Phrases like "best answer, every time" and "before and after moment" primed us to expect magic. When the actual product turned out to be merely okay (or buggy at launch), the contrast made it feel worse than it perhaps is. OpenAI might have themselves to blame here. Had they framed GPT-5 as an iterative improvement with some cool new capabilities, the reception might have been warmer. But by selling it as a revolution, they invited revolution-level scrutiny. The lesson here is not to overhype a tool to developers, who will immediately test those claims against reality.
- Incremental Gains (and Some Losses): It looks like GPT-5 did improve in certain areas. OpenAI says it "smashes benchmarks" and has more reasoning ability. But those gains may be largely under-the-hood or relevant to niche tasks. Many of us perceive more regression than progress in everyday coding. One analysis noted that GPT-5 showed no real improvement on standard coding benchmarks outside a specific one (SWE-bench), contradicting claims that it's the "best coding assistant". Meanwhile, the removal of older models and stricter limits took away useful options we had. So even if GPT-5 is technically better on paper, the user-perceived experience got worse — a net loss.
- A Broader Plateau? Zooming out, I wonder if we're hitting a bit of a plateau (at least temporarily) in what these models can do for coding. GPT-4 was a giant leap. GPT-5, it turns out, feels like a small step, if not a step backward, in practical terms. It's possible that going from "great" to "amazing" pair programmer is much more challenging than going from zero to "pretty good" was. Maybe the low-hanging fruit in code generation is gone, and each new gain requires trade-offs. I suspect OpenAI optimized GPT-5 for a broad user base, making it more affordable and keeping it generalized at the expense of the power-users and programmers who push it to the limits. Altman more or less said this, describing GPT-5 as their "smartest model" but emphasizing "real-world utility and mass… affordability" over absolute capability. In plain terms: GPT-5 might be tuned to be cheap and cheerful for a billion everyday users, rather than a heavy-duty expert for coders. If true, that's a strategic choice, but one that left many of us power-users feeling left out.
- Bugs and Growing Pains: To give some benefit of the doubt, part of GPT-5's poor showing was likely due to launch bugs. We know the routing system failed initially, making GPT-5 behave "dumber" by not actually using its full reasoning when it should have (a rough sketch of that failure mode follows this list). OpenAI did fix that a day later, and I did notice slight improvements afterward (responses became a bit more coherent). It's also early days; the model might get fine-tuned rapidly based on the barrage of feedback. I recall that early GPT-4 had its quirks too, though not as glaring. So, it's possible GPT-5 will quietly get better in the coming weeks, smoothing over some rough spots. But first impressions matter, and as one commenter said, "a disastrous first impression" is hard to shake. The onus is on OpenAI to regain our trust.
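To make that router failure concrete, here's a rough mental model in TypeScript. This is my reading of OpenAI's public description, not their actual architecture; the two-tier split, the function, and the keyword heuristic are all assumptions for illustration.

```typescript
// Rough mental model of the GPT-5 router bug (an assumption-laden sketch,
// not OpenAI's real code): a dispatcher sends each request to a fast model
// or a slower reasoning model. If the dispatcher fails and defaults to
// "fast", every reply comes from the shallow tier -- which would look
// exactly like the model "seeming way dumber" than intended.
type Tier = "fast" | "reasoning";

function routeRequest(prompt: string, routerHealthy: boolean): Tier {
  if (!routerHealthy) {
    return "fast"; // launch-day failure mode: everything falls through here
  }
  // Toy heuristic standing in for whatever classifier is actually used.
  const needsDeepThought = /refactor|debug|architecture|multi-step/i.test(prompt);
  return needsDeepThought ? "reasoning" : "fast";
}

console.log(routeRequest("Refactor my API routes", false)); // "fast" -> shallow reply
console.log(routeRequest("Refactor my API routes", true));  // "reasoning"
```

If that's roughly what happened, the fix OpenAI shipped a day later lines up with the modest improvement in coherence I noticed afterward.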
Moving Forward: Cautious Optimism (and a Bit of Skepticism)
Writing this as an opinion piece on the Coditas blog, I realize it's not every day I spend an entire article bashing a new tech release. I'm usually the optimist who can't wait to play with the shiny new AI toy and find clever uses for it. And to be clear, GPT-5 is not a total disaster — it can still do a lot of impressive things, and maybe it's a genuine step up for casual users. But for those of us who were specifically looking for a coding partner, it undeniably fell short of the hype.
The experience has been a humbling reminder for me (and perhaps for the industry) that AI progress is not always linear or predictable. Just because GPT-4 blew our minds doesn't guarantee GPT-5 will blow them twice as hard. We're at a point where expectations need tempering. As a developer and CTO (and someone who loves these tools), I'm taking away a lesson in healthy skepticism. Next time a company proclaims "this update will change everything," I'll remember how GPT-5 went. It doesn't mean I won't be excited. I still love living on the cutting edge. But I'll keep my optimism grounded in realistic expectations.
For its part, OpenAI seems to be listening. The rapid course-correction (bringing back older models, apologizing for missteps) is a good sign. I genuinely hope they iterate on GPT-5 and address the pain points. If they manage to combine GPT-5's new strengths with the trusted performance of GPT-4, they could win back a lot of goodwill. It's a long game, after all. Sam Altman has hinted that even bigger improvements are coming down the line, just not all at once. Maybe GPT-5 was a necessary step, a foundation for future breakthroughs.
For now, though, I'll continue coding with the tools that serve me best — which means Claude, and occasionally GPT‑4 or others, while GPT‑5 matures. It's a bit ironic: a week ago, I fully expected to spend this article raving about GPT-5's genius. Instead, here I am writing a cautionary tale about hype vs. reality in AI. Such is life on the bleeding edge of tech. You win some, you lose some.
To summarize: GPT-5 aimed to be my AI pair programmer, but ended up feeling more like an intern needing constant oversight. The community's verdict so far aligns with my own experience: GPT-5 just isn't living up to the billing. As an opinionated developer, I have to call it like I see it. Here's hoping the next update actually earns the title of "game-changer." Until then, color me skeptical and count me among those sticking with what works.