Claude Opus 4.8 is Anthropic's new flagship model, and the real upgrade isn't speed. It's honesty: around 4x less likely to let its own code flaws slip.
Claude Opus 4.8 is Anthropic's new flagship model, released on May 28, 2026, as a direct upgrade to Opus 4.7 at the same price. It improves across coding, agentic, and reasoning benchmarks, but the change that actually matters for daily work is honesty. The model is around four times less likely to let a flaw in its own code pass without flagging it.
I'm Tom. I run AI Architects and I've shipped real work on every Opus version since 4.5. I read the announcement and the system card the morning it dropped, then ran it on live projects. Here's what changed, what shipped alongside it, and whether it's worth switching.
Claude Opus 4.8 is the latest version of Anthropic's most capable model, built for coding, agentic tasks, and knowledge work. It replaces Opus 4.7 as the default Opus tier and costs exactly the same to run. Anthropic describes it as a modest but tangible improvement on its predecessor, and a more reliable collaborator on long tasks.
The headline number is reliability. On Anthropic's own evaluations, Opus 4.8 is roughly four times less likely than Opus 4.7 to let a flaw in code it wrote pass unremarked. In plain terms, it catches more of its own mistakes before handing the work back to you.
The biggest upgrade in Opus 4.8 is that it stopped pretending the code works when it doesn't. If you have built anything with an AI model, you know the pattern. It writes a fix, tells you it's done, sounds completely sure of itself, and then you run it and nothing works.
I have watched that happen more times than I can count. The model was never lying on purpose. It just jumped to "I fixed it" before the evidence was actually there. Anthropic trains its models to be honest, but every model has had a habit of claiming progress on thin evidence.
Opus 4.8 is the first version where that mostly stopped. Early testers reported it flags uncertainty about its own work and makes fewer claims it can't support. That tracks with what I saw: it tells you when it isn't sure instead of bluffing past the gap.
That one change does more for real work than another point on a coding benchmark. Most of the time I lose with AI was never the writing itself. It was trusting a confident answer that turned out to be wrong, then spending an hour finding out why.
The honesty gain comes from training the model to avoid unsupported claims and surface its own uncertainty. Anthropic measured this directly: Opus 4.8 is about four times less likely to let flaws in its code go unflagged compared to Opus 4.7. The improvement shows up most in long agentic tasks, where older models tended to declare victory early.
Anthropic also ran a full alignment assessment before release. Its Alignment team reported that Opus 4.8 reaches new highs on prosocial traits like supporting user autonomy, and shows rates of misaligned behavior substantially lower than Opus 4.7. The detail sits in the official system card.
Opus 4.8 launched alongside four updates worth knowing, not just a model swap. Each one targets a real friction point in how people use Claude day to day.

Dynamic workflows let Claude take on much larger jobs inside Claude Code. The feature, in research preview, lets Claude plan the work, run hundreds of parallel subagents in a single session, then verify its outputs before reporting back. Anthropic's stated example is a codebase-scale migration across hundreds of thousands of lines from kickoff to merge, with the existing test suite as the bar.
This is the same direction my mentorship clients have been pushing toward with agent fleets. One of them, Cal Hewitt, used agentic dispatch patterns to build automated work order routing for a Toronto maintenance contractor, cutting acknowledgement time from up to 48 hours down to minutes. The full breakdown is in his case study. Dynamic workflows make that class of build easier to reach inside Claude Code itself. It's available on Team, Enterprise, and Max plans.
Effort control lets you choose how hard Claude works on a response. A new control sits next to the model selector in claude.ai and Cowork. On higher effort, Claude thinks more often and more deeply for better answers. On lower effort, it responds faster and burns through your rate limits more slowly.
Opus 4.8 defaults to high effort, which Anthropic judges the best balance of quality and speed. You can push it to "extra" (called xhigh in Claude Code) or "max" for difficult, long-running tasks. The control is available on all plans.
Fast mode runs Opus 4.8 at 2.5 times the normal speed, and it now costs three times less than it did on previous models. That makes the speed tier actually worth using for iterative work, where waiting on a slow response breaks your flow.
The Messages API now accepts system entries inside the messages array. Developers can update Claude's instructions partway through a task without breaking the prompt cache or routing the change through a user turn. If you build agents, this matters more than it sounds: you can adjust permissions, token budgets, or environment context while an agent is running.
Opus 4.8 beats Opus 4.7 on benchmarks across coding, agentic skills, and reasoning, but the meaningful gap is reliability. The standout difference is that 4.8 is around four times less likely to let its own code flaws slip past unflagged, and it flags uncertainty instead of overclaiming.
Cost is identical, so there's no reason to stay on 4.7 for budget. On coding tasks, 4.8 at its default high effort spends a similar number of tokens to 4.7's default, but returns better results for that spend. Alignment metrics also improved, with misaligned behavior rates substantially below 4.7.
Claude Opus 4.8 is not free, but it is available on Anthropic's free and paid plans with usage limits, and effort control now lets free users stretch those limits further. For developers, API pricing is unchanged from Opus 4.7: 5 dollars per million input tokens and 25 dollars per million output tokens. Fast mode runs at 10 dollars per million input and 50 dollars per million output.
The practical takeaway is that the better model costs the same as the one it replaced. There is no upgrade tax for the improved judgement.
You use Opus 4.8 in Claude Code by selecting it as your model, the same way you ran 4.7. If you are new to the tool, my full Claude Code guide walks through setup from scratch. Once you're in, pick the effort level that fits the job: default high for most work, xhigh for hard problems, and lower effort for quick edits that don't need deep thinking.
For large jobs, dynamic workflows are where Opus 4.8 pulls ahead. If you want to understand the subagent model that powers them, my breakdown of Claude Code multi-agent setups covers how parallel agents divide and verify work.
Yes, and the switch is a no-brainer because it costs the same as Opus 4.7. If you write code with Claude, the honesty improvement alone pays for itself the first time it flags a fix it isn't sure about instead of sending you down a dead end. The reliability gain is the kind of upgrade you feel on real work, not just in a benchmark table.
It's worth being honest about the ceiling. Anthropic itself calls this a modest improvement, not a leap, and hints that a bigger jump is coming with a higher-intelligence model class. If you were hoping for a dramatic capability jump, this isn't it. For everyone shipping daily work, a model that stops guessing is worth more than one that scores two points higher and still bluffs.
Claude Opus 4.8 was released on May 28, 2026. It became available everywhere on launch day across claude.ai, Claude Code, Cowork, and the API.
Yes. Opus 4.8 improves on Opus 4.7 across coding, agentic, and reasoning benchmarks, and it is around four times less likely to let a flaw in its own code pass unflagged. It also scores better on alignment metrics, at the same price.
API pricing is 5 dollars per million input tokens and 25 dollars per million output tokens, unchanged from Opus 4.7. Fast mode costs 10 dollars per million input and 50 dollars per million output. Subscription access is included on Claude's paid plans.
Dynamic workflows are a research-preview feature that lets Claude plan a large task, run hundreds of parallel subagents in one session, and verify its outputs before reporting back. They are built for big jobs like codebase-scale migrations and are available on Team, Enterprise, and Max plans.
Effort control is a setting next to the model selector that lets you choose how hard Claude works on a response. Higher effort means deeper thinking and better answers, while lower effort means faster responses that use your rate limits more slowly. It is available on all plans.
You can use Opus 4.8 on Claude's free plan within usage limits, and effort control helps free users stretch those limits. Heavy or professional use needs a paid plan or API access.
A better model is only useful if you know how to put it to work. The Claude Code Blueprint shows you how to go from zero to a working build in 60 minutes, no coding required. If you want to go further with a cohort, the 30-day challenge takes you from setup to a shipped agent.
Five interactive lessons. Install Claude Code, build your first automation, and deploy it live on the internet — all in under an hour. Free, no coding required.
Grab the Blueprint →