In this blog post, Enterprise AI Agents Need Standards Before They Need Scale in 2026, I look at why interoperability, identity, and observability matter more than agent count. My view is simple: most enterprises do not have an agent scale problem yet. They have a standards problem.
At a high level, enterprise agentic AI is not magic. It is software that uses a model to reason, then takes action through tools, APIs, data, workflows, and sometimes other agents. The moment an agent can read a document, create a ticket, query a system, or hand work to another agent, you are no longer dealing with a chatbot. You are dealing with a distributed system that needs contracts, identity, telemetry, and governance.
After more than 20 years working in enterprise IT as a Solution Architect and Enterprise Architect, I keep seeing the same pattern. Organisations rush to prove that agents can do more work. They spend much less time proving that agents can do work safely, consistently, and in a way the business can actually trust.
The technology behind the shift
The core technology here is not just the model. It is the layer around the model.
In practical terms, an enterprise agent stack has five moving parts. A model makes decisions. An orchestration layer manages prompts, tools, memory, and workflow. A protocol layer connects the agent to tools and data. Another protocol layer lets agents talk to other agents. Then an identity and observability layer makes the whole thing governable.
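The five moving parts can be sketched as a single structure. This is an illustrative sketch only, not any real framework's API; every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentStack:
    """Hypothetical shape of the five moving parts described above."""
    model: Callable[[str], str]          # 1. the model: reasons over a task
    tools: dict[str, Callable]           # 2. orchestration: tools + workflow
    tool_protocol: str = "MCP"           # 3. protocol layer to tools and data
    agent_protocol: str = "A2A"          # 4. protocol layer to other agents
    audit_log: list = field(default_factory=list)  # 5. identity/observability

    def act(self, task: str, identity: str) -> str:
        # Every decision is recorded against the identity that initiated it,
        # which is what makes the stack governable rather than just capable.
        decision = self.model(task)
        self.audit_log.append(
            {"identity": identity, "task": task, "decision": decision}
        )
        return decision
```

The point of the sketch is the last layer: without the audit trail and the named identity, the first four parts are just a very capable black box.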
That is why standards matter so much. Without them, every agent platform becomes its own private universe. Integration gets expensive, security reviews slow down, and operations teams are left trying to debug behaviour that was never designed to be observable.
The four standards I would track closely
MCP for tools and context
In my experience, MCP matters because it gives architects a common way to expose tools, resources, and prompts to model-driven applications. The protocol uses JSON-RPC and defines standard transports for local and remote use. Over the last year it has also become far more enterprise-ready, with OAuth-based authorization guidance, resource indicators, protected resource handling, task-based workflows, conformance testing, and an SDK tiering model published in early 2026. That is exactly the kind of maturity signal I look for before I recommend anything for broad enterprise adoption.
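Because MCP sits on JSON-RPC 2.0, the wire format is plain to inspect. The sketch below builds a request shaped like MCP's tools/call method; the tool name and arguments are made up for illustration, and a real client would use an MCP SDK rather than hand-rolling messages.

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request in the shape of MCP's tools/call.

    Illustrative only: a production client should use an MCP SDK, which
    also handles transports, sessions, and authorization.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool exposed by an MCP server.
message = mcp_tool_call(1, "create_ticket", {"title": "Renew certificate"})
```

The value for architects is that every tool invocation across every vendor has this one inspectable shape, which is what makes security review and logging tractable.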
A2A for agent-to-agent collaboration
A2A solves a different problem. It is about how independent agents discover each other, exchange messages, advertise capabilities through Agent Cards, and manage long-running tasks. The protocol is designed around standard web patterns such as HTTP(S), JSON-RPC, and streaming updates, and it explicitly models interrupted states such as auth-required and input-required rather than pretending every workflow is instant. In 2025 the project moved under Linux Foundation governance, and its TCK now gives teams a structured way to test compliance instead of relying on vendor claims alone.
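Discovery via Agent Cards can be pictured as matching a needed skill against an advertised capability document. The field names below are an approximation in the spirit of A2A, not a verbatim copy of its schema, and the agent itself is invented.

```python
# Illustrative Agent Card: a JSON document an agent publishes so that
# other agents can discover what it does. Field names approximate the
# A2A idea; check the actual A2A schema before relying on them.
agent_card = {
    "name": "invoice-processor",
    "description": "Extracts and validates invoice line items",
    "url": "https://agents.example.com/invoice",
    "skills": [
        {"id": "extract-invoice", "description": "Parse PDF invoices"},
    ],
}

def can_handle(card: dict, skill_id: str) -> bool:
    """Would this agent advertise the skill we need?"""
    return any(s["id"] == skill_id for s in card.get("skills", []))
```

The design choice worth noticing is that capability discovery happens through a declared document, not through trial-and-error calls, which is what lets an enterprise catalogue and govern its agent estate.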
OpenTelemetry for evidence
If MCP and A2A are interaction standards, OpenTelemetry is the evidence standard. It gives you a vendor-neutral way to capture traces, metrics, and logs, correlate them with shared context, and route them through the Collector into whatever operational stack you already trust. What I find especially important for 2026 planning is that OpenTelemetry is not ignoring AI workloads. Generative AI semantic conventions are actively evolving, and there is already dedicated work on MCP semantic conventions, including guidance for tracing MCP methods and propagating context.
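The correlation mechanism underneath is the W3C Trace Context traceparent header, which OpenTelemetry propagates across hops. A minimal stdlib sketch of the header format, without any OpenTelemetry SDK:

```python
import secrets

# Sketch of W3C Trace Context propagation: format is
# version-traceid-spanid-flags, e.g. 00-<32 hex>-<16 hex>-01.
# Real systems should use an OpenTelemetry SDK, not hand-rolled headers.

def new_traceparent() -> str:
    """Start a new trace at the first agent hop."""
    trace_id = secrets.token_hex(16)   # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)     # 8 bytes  -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"

def child_traceparent(parent: str) -> str:
    """Keep the trace-id, mint a new span-id for the next tool or agent hop."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because the trace-id survives every hop while each hop gets its own span-id, a single customer request can be reconstructed across the model, its tools, and any downstream agents.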
OAuth for delegated trust
OAuth is not new, but in agentic systems it becomes newly important. Agents rarely act only as themselves. They act on behalf of users, applications, or downstream services. MCP now explicitly requires OAuth resource indicators and token audience validation, and forbids insecure token passthrough patterns. A2A similarly assumes standard web identity, using OAuth2 and OpenID Connect at the HTTP layer rather than burying identity inside message payloads. That is a healthy direction because it keeps agent security aligned with the rest of enterprise security.
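Audience validation is the part teams most often skip. The sketch below decodes a JWT payload and checks its aud claim against the resource the agent is calling; note that it deliberately omits signature verification, which any real implementation must perform with a proper JWT library.

```python
import base64
import json

def token_audience(jwt: str) -> list[str]:
    """Return the audience claim from a JWT payload.

    Sketch only: this does NOT verify the signature. Production code
    must verify signatures and expiry with a real JWT library.
    """
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    aud = claims.get("aud", [])
    return [aud] if isinstance(aud, str) else aud

def audience_ok(jwt: str, expected_resource: str) -> bool:
    """Reject tokens minted for a different resource (no passthrough)."""
    return expected_resource in token_audience(jwt)
```

This is the mechanical meaning of "no token passthrough": a token minted for one resource server must be rejected by every other, even if the signature is valid.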
Why agent identity is the first security problem
One pattern I keep running into is teams treating agent identity as a later optimisation. They start with a shared service account, a broad API key, or one platform credential that every workflow reuses. That may get a demo running, but it creates a serious accountability problem almost immediately.
An enterprise agent should be treated more like a workload with delegated authority than like a clever script. I want a unique identity for the agent, a clear record of which user or system initiated the task, audience-bound tokens, short-lived credentials, and a reliable audit trail across every hop. If I cannot answer who asked the agent to act, what authority it used, and what systems it touched, I do not yet have an enterprise pattern. I have an automation liability.
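Those three questions translate directly into fields an audit record must carry on every hop. A hypothetical record shape, purely to make the requirement concrete:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentAuditEvent:
    """Hypothetical audit record answering the three questions above:
    who asked, what authority was used, and what was touched."""
    agent_id: str                       # unique identity of the agent itself
    initiated_by: str                   # the user or system that asked it to act
    token_audience: str                 # the audience-bound credential it used
    systems_touched: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

If any of the first three fields would have to be filled in with a shared account name, the identity model is not yet enterprise-grade.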
In the Australian context, the thinking behind Essential Eight maps surprisingly well here. The maturity model reinforces phishing-resistant MFA, centrally logged MFA events, dedicated privileged accounts, separation of privileged activity, and Secure Admin Workstations at higher maturity levels. For agent platforms, I apply the same intent: no shared privileged bot identities, no invisible admin paths, and no critical action without strong identity controls and usable logging.
The hidden risk in agentic AI is silent failure
Most leaders worry about the obvious failure modes such as hallucinations or bad answers. In my experience, the more dangerous problem in production is silent failure. The agent sounds confident, the workflow looks complete, but somewhere in the chain a tool timed out, an authorization step stalled, a downstream system returned partial data, or a human approval never arrived.
This is where the standards become operationally useful. A2A formally models interrupted and terminal task states. MCP is moving the same way with task-based workflows. OpenTelemetry also recognises that response status can represent more than clean success or hard failure, including warnings and partial outcomes. If your platform flattens all of that into a green tick, your reporting will be optimistic right up until the first audit or customer incident.
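The anti-pattern is a reporting layer that maps every non-error to success. A sketch of the alternative, with state names paraphrased from the interrupted states A2A describes rather than copied verbatim from its spec:

```python
from enum import Enum

class TaskState(Enum):
    """Illustrative task states, modelled on A2A's interrupted and
    terminal states; names are paraphrased, not spec-exact."""
    WORKING = "working"
    AUTH_REQUIRED = "auth-required"      # interrupted: waiting on authorization
    INPUT_REQUIRED = "input-required"    # interrupted: waiting on a human/agent
    COMPLETED = "completed"
    FAILED = "failed"

INTERRUPTED = {TaskState.AUTH_REQUIRED, TaskState.INPUT_REQUIRED}

def report(state: TaskState) -> str:
    """Refuse to flatten interrupted states into a green tick."""
    if state in INTERRUPTED:
        return f"needs attention: {state.value}"
    return "success" if state is TaskState.COMPLETED else state.value
```

The single branch for interrupted states is the whole argument: a stalled authorization is a business event that needs a human, not a quiet success.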
When I review real-world implementations, I look for evidence that the platform can answer very plain business questions. What exactly did the agent attempt? Which identity did it use? Which system declined the request? Was the result partial, delayed, retried, or abandoned? Could a human step in without losing context? Those answers matter more to a CIO than another benchmark about tokens per second.
How I would evaluate agent platforms in 2026
Check for standards support before feature count. I would ask whether the platform supports MCP natively, how mature that support is, whether it aligns with conformance tests, and whether agent-to-agent scenarios use A2A or another openly documented approach. Closed adapters can be useful, but they should not be the foundation.
Inspect the identity model. I want OAuth or OIDC done properly, support for delegated access, clear token scoping, short-lived credentials, and explicit handling of on-behalf-of flows. If the design relies on long-lived shared secrets or broad platform tokens, I assume rework is coming.
Demand observability by default. The platform should emit OpenTelemetry-compatible traces, metrics, and logs, preserve correlation across tool calls and agent hops, and make retries, delays, and state transitions visible. If observability is an add-on, operations will suffer.
Test failure semantics, not just happy paths. I want to see auth-required, input-required, rate-limit, timeout, and partial-result scenarios handled cleanly. Good platforms do not hide uncertainty. They surface it in a way the business can govern.
Review governance in the local regulatory context. For Australian organisations, that means mapping data handling to the Privacy Act and the APPs, thinking early about transparency when automated decisions significantly affect individuals, and building accountable ownership into the operating model. In government settings, the current Australian Government AI policy is already pushing accountable officials, internal use case registers, impact assessments, and stronger AI governance from December 15, 2025, with further requirements rolling into 2026.
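The failure-semantics check above can be made mechanical. The harness below is a hypothetical sketch: run_scenario stands in for whatever hook a given platform offers to simulate each non-happy path, and the expected state strings are illustrative, not from any spec.

```python
# Hypothetical smoke test for failure semantics: for each simulated
# non-happy path, the platform should report a distinct machine-readable
# state rather than a generic success.
EXPECTED_STATES = {
    "auth": "auth-required",
    "input": "input-required",
    "rate": "rate-limited",
    "timeout": "timed-out",
    "partial": "partial-result",
}

def check_failure_semantics(run_scenario) -> dict[str, bool]:
    """run_scenario(name) -> reported state; an assumed test hook,
    not a real platform API."""
    return {
        name: run_scenario(name) == expected
        for name, expected in EXPECTED_STATES.items()
    }
```

A platform that maps every scenario to "completed" fails all five checks at once, which is exactly the optimistic reporting the section above warns about.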
As a published author and hands-on architect based in Melbourne, I find that the best enterprise AI conversations are becoming less about model hype and more about control surfaces. That is a good sign. It means the market is finally asking architecture questions again.
I suspect the organisations that get the most value from agents over the next few years will not be the ones that deploy the highest number of them. They will be the ones whose agents can identify themselves, interoperate cleanly, leave a trustworthy trail, and fail loudly enough for humans to respond. That is what scale should mean in enterprise AI.