In this post, I'll walk through why I've shifted my day-to-day AI work toward Sonnet, what changed in 4.6, and the few scenarios where Opus still pays for itself.
I’ve spent the last 20+ years in enterprise IT as a Solution Architect and Enterprise Architect, and one pattern I keep running into is this: the “best” model is rarely the one with the highest benchmark score. It’s the one that produces reliable results at a speed and cost that fits how real teams operate.
This isn't a story about brand loyalty. It's about picking the right tool for the work, in a way that fits enterprise constraints like security, governance, and the very real cost of engineering time.
A high-level view of what changed with Sonnet 4.6
At a high level, Sonnet 4.6 feels like it moved into the “serious default” category. The quality is close enough to a top-tier reasoning model for most knowledge work, but you keep the responsiveness and efficiency you need when you’re iterating quickly.
In practical terms, it means I can keep Sonnet as my always-on assistant for architecture notes, design reviews, code scaffolding, and document-heavy tasks without constantly second-guessing whether I should switch models to avoid subtle mistakes.
The technology behind it, explained without the hype
Claude Sonnet and Opus are large language models (LLMs). They predict the next best “token” (roughly a chunk of text) based on patterns learned during training. On top of that, modern models are tuned to follow instructions, use tools, and reason through multi-step tasks in a way that’s more structured than older chatbots.
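To make "predict the next best token" concrete, here's a toy sketch of the final step: a softmax that turns raw per-token scores into a probability distribution. The tokens and scores are made up for illustration; a real model scores tens of thousands of candidates.

```python
import math

def next_token_probs(logits):
    """Softmax: turn raw scores into a probability distribution over tokens."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Toy scores a model might assign to candidate next tokens after
# "The enterprise ..." -- the highest-scoring token is the most likely.
scores = {"architecture": 2.1, "roadmap": 1.4, "banana": -1.0}
probs = next_token_probs(scores)
```

Everything else (instruction following, tool use, multi-step reasoning) is layered on top of this basic prediction loop through training and tuning.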
Two technical capabilities matter a lot in enterprise use:
- Long-context understanding: the model can read and keep track of large amounts of material (requirements, policies, code, meeting notes) and still answer coherently.
- Agentic workflows: the model can plan multi-step work, call tools, and keep a task moving across stages (for example: analyze logs, form a hypothesis, propose mitigations, draft a change record).
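Structurally, an agentic workflow is a loop: the model proposes the next step, a tool executes it, and the result is fed back until the model declares the task done. This is a minimal skeleton of that loop; `call_model`, the tool names, and their canned outputs are hypothetical stand-ins, not Anthropic's API.

```python
def call_model(history):
    # Stub "planner": walks the log-analysis pipeline from the example
    # above, one step per call, then signals completion.
    steps = ["analyze_logs", "form_hypothesis", "draft_change_record"]
    done = [h["step"] for h in history]
    for step in steps:
        if step not in done:
            return {"action": step}
    return {"action": "finish"}

# Hypothetical tools with canned results; real ones would query systems.
TOOLS = {
    "analyze_logs": lambda: "3 failed logins from an unknown IP",
    "form_hypothesis": lambda: "likely credential stuffing",
    "draft_change_record": lambda: "CR-001: enforce MFA on affected accounts",
}

def run_agent():
    """Loop: ask the planner for a step, run the tool, feed back the result."""
    history = []
    while True:
        decision = call_model(history)
        if decision["action"] == "finish":
            return history
        result = TOOLS[decision["action"]]()
        history.append({"step": decision["action"], "result": result})

trace = run_agent()
```

The interesting part is the feedback: each tool result lands in `history`, so later planning decisions are grounded in earlier outcomes rather than in a single up-front plan.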
Sonnet 4.6’s “feel” improvement, in my experience, comes down to a better balance of these abilities. It’s not just smarter. It’s more consistent across the messy, multi-document reality of enterprise environments.
Why Sonnet 4.6 became my new default
1) The speed-to-quality ratio is finally enterprise-friendly
Most leadership-facing outputs don’t need “maximum IQ.” They need clarity, correctness, and a clean structure. Sonnet 4.6 gets me there faster, with fewer prompt gymnastics.
When you’re doing real work—architecture decision records, security exception write-ups, migration plans—the bottleneck is rarely “raw reasoning.” It’s iteration speed and the ability to refine tone, scope, and assumptions without burning half the day.
2) It’s excellent at document-heavy knowledge work
In regulated environments (and in Australia, that’s a lot of environments), the work is often “read three things, reconcile them, and produce one defensible outcome.” Sonnet 4.6 is particularly strong for that pattern.
I use it heavily for summarising, comparing, and drafting around policies and standards—especially where you need plain language for executives but still must stay faithful to the source material.
3) It’s a strong coding companion without feeling like a lab experiment
For developers and platform teams, Sonnet 4.6 is good at the practical coding tasks that actually matter: generating clean scaffolding, refactoring small sections, writing tests, and explaining unfamiliar code.
It also handles the “glue work” well—PowerShell snippets, KQL patterns, Azure CLI tasks, YAML pipelines, and the little bits of code that make platforms operable.
4) Long context changes how you work, not just what you can do
A large context window sounds like a feature for people doing novelty demos. In practice, it changes the operational workflow: you can keep a full design thread, risk decisions, and key constraints in the same conversation and avoid the constant re-briefing.
That matters when you’re juggling architecture trade-offs, Essential Eight constraints, identity design, and delivery realities across multiple teams.
5) The economics are good enough to use it “by default”
Even when nobody is “selling” AI internally, there’s always a budget shadow. If a model is expensive, teams ration it. When teams ration it, they stop building habits and patterns around it.
Sonnet 4.6 is cost-effective enough that I’m comfortable making it the default for high-volume work: drafting, summarising, clarifying, and iterating.
When I still reach for Opus 4.6
Opus 4.6 is still the model I pick when the task is genuinely hard and the cost of a subtle error is high. The difference isn’t “Opus is always better.” It’s that Opus is more dependable when you need deeper reasoning and you want fewer leaps of faith.
1) High-stakes architecture decisions with many interacting constraints
If I’m working through a multi-domain decision—identity, network segmentation, data classification, operational support, vendor constraints, and delivery timelines—Opus is the better thinking partner.
It’s not just about getting an answer. It’s about pressure-testing the decision and surfacing second-order impacts early.
2) Large refactors and complex codebase reasoning
Sonnet is great for “make this function cleaner” or “add tests.” When the task becomes “reshape a service, preserve behaviours, and keep performance stable,” Opus is more reliable.
I particularly notice this when refactoring IaC patterns across environments, or when untangling legacy integrations where the code tells one story and the system behaviour tells another.
3) Security analysis where being conservative is the right posture
In cybersecurity work—threat modelling, control design, or analysing suspicious behaviours—I prefer Opus when I want more thoroughness and a more cautious mindset.
In Australian contexts, that often means mapping controls back to Essential Eight expectations, thinking about identity hardening, and writing remediation plans that are realistic for operations teams.
4) Multi-agent or multi-step workflows that must not drift
When I’m orchestrating a workflow that spans steps—extract requirements, propose options, evaluate risks, draft an ADR, then produce an executive summary—Opus is less likely to “wander” or contradict earlier decisions.
That consistency matters when you’re producing artefacts that will be reviewed by security, risk, architecture governance, and delivery leads.
A real-world scenario where Sonnet vs Opus made a difference
Recently, I worked with an organisation modernising its Microsoft 365 and Azure landing zone practices. The work wasn’t about a single “big bang” migration. It was about getting repeatable governance: identity patterns, conditional access posture, logging, and a pragmatic uplift path that wouldn’t break delivery.
I used Sonnet 4.6 as the default for the heavy lifting: summarising workshop notes, drafting architecture options in plain English, turning rough ideas into structured documents, and producing quick “what changed / what stays / what we need to decide” outputs.
Then I switched to Opus 4.6 for the parts where the interactions were subtle: the trade-offs between identity architecture, device posture, privileged access workflows, and the operational burden of keeping it all running. That’s where deeper reasoning paid off—fewer blind spots, better risk articulation, and stronger defensibility in governance forums.
My practical selection checklist
If you want a simple way to choose, here’s the mental model I use.
- Use Sonnet 4.6 when you need fast, high-quality drafting, summarisation, coding help, and iterative work where you’ll review the output anyway.
- Use Opus 4.6 when the problem is complex, the system has many hidden constraints, or the cost of missing something is high (security, major refactors, high-stakes architecture).
- Switch deliberately when you notice you’re spending more time correcting than progressing. That’s the biggest signal you’ve outgrown the “default” model for that task.
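If you like your checklists executable, the same mental model fits in a few lines. The signal names are my own shorthand, and the returned labels are just the model names, not API identifiers.

```python
def choose_model(high_stakes, hidden_constraints, correcting_more_than_progressing):
    """Toy router mirroring the checklist: default to Sonnet 4.6,
    escalate to Opus 4.6 when depth matters more than iteration speed."""
    if high_stakes or hidden_constraints or correcting_more_than_progressing:
        return "Opus 4.6"
    return "Sonnet 4.6"

# Drafting an ADR you'll review anyway: stay on the default.
everyday = choose_model(False, False, False)

# A security-sensitive refactor where you keep fixing the model's output:
# that's the signal to switch.
escalated = choose_model(True, False, True)
```

The point isn't the code, it's the discipline: make the switch a deliberate decision with named triggers rather than a vibe.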
A small technical pattern that improves results with both models
For enterprise work, I’ve found one approach consistently reduces hallucinations and increases usefulness: make the model separate what it knows from what it’s assuming, and force it to ask for missing constraints.
System: You are helping with an enterprise architecture decision.
User: Here is the context (requirements + constraints + current state)...
User: Task: Propose 3 options.
User: Output format:
1) What we know (facts from the input)
2) Assumptions (explicit)
3) Option A/B/C
4) Risks (per option)
5) Decisions needed
6) First 30 days plan
User: If any critical info is missing, ask up to 5 questions before proposing options.
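The template above is easy to turn into a reusable prompt builder so you apply it consistently instead of retyping it. This is a sketch, not a client library; the section names simply mirror the format shown.

```python
# Fixed output sections from the template above.
SECTIONS = [
    "What we know (facts from the input)",
    "Assumptions (explicit)",
    "Option A/B/C",
    "Risks (per option)",
    "Decisions needed",
    "First 30 days plan",
]

def build_decision_prompt(context, task="Propose 3 options."):
    """Assemble the structured prompt: context, task, fixed output format,
    and a request to surface missing information before answering."""
    fmt = "\n".join(f"{i}) {s}" for i, s in enumerate(SECTIONS, start=1))
    return (
        f"Here is the context (requirements + constraints + current state):\n"
        f"{context}\n\n"
        f"Task: {task}\n\n"
        f"Output format:\n{fmt}\n\n"
        "If any critical info is missing, ask up to 5 questions "
        "before proposing options."
    )

prompt = build_decision_prompt("Current state: hybrid identity, no PIM...")
```

Pass the result as the user message alongside a short system message, and the output structure stays stable across both Sonnet and Opus.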
This is not magic. It’s just good architecture practice turned into a prompt. And it makes the model behave more like a careful peer than an overconfident intern.
Closing reflection
My default model choice has become less about “which one is smartest” and more about “which one fits the operating rhythm of real teams.” Sonnet 4.6 fits that rhythm remarkably well, and Opus 4.6 remains the model I trust when the complexity curve steepens.
As these models keep improving, the leadership question shifts from “Should we use AI?” to “Where do we want speed, and where do we demand depth?” In your environment, which decisions are cheap to iterate on—and which ones are too expensive to get even slightly wrong?