In this blog post, “Don’t Buy Black-Box Agents and What Your Agentic AI RFP Needs”, we will look at how to evaluate agentic AI properly, what the technology actually is, and the exact requirements I would insist on before any enterprise signs a contract.
I have spent more than 20 years working across enterprise architecture, solution design, Azure, Microsoft 365, cybersecurity, and now hands-on AI using platforms such as OpenAI and Claude. One pattern I keep running into is this: the market loves to demo what an agent can do, but it is often far less clear how the agent makes decisions, what systems it can touch, what data it keeps, and how you control it when something goes wrong.
At a high level, agentic AI is not magic. It is a language model wrapped in workflow logic. The model receives a goal, pulls context from data sources, calls tools, may pass work to another specialised agent, and then returns a result or takes an action. That sounds simple, but in practice it introduces architecture, security, privacy, and governance questions that do not exist in a normal chatbot proof of concept.
That is why I would not buy a black-box agent. If a supplier cannot explain the agent’s tools, approval points, memory, logging, and security boundaries, you are not buying automation. You are buying uncertainty.
What agentic AI really consists of
For non-technical leaders, the easiest way to think about an agent is as a digital worker with four moving parts. First, there is the model that understands language and reasons over the task. Second, there are tools and connectors that let it search, retrieve files, query systems, write records, send messages, or trigger workflows.
Third, there is orchestration. This is the logic that decides when the agent should ask a question, call a tool, hand work to another step, or stop. Fourth, there are controls around it all: identity, permissions, guardrails, human approvals, trace logs, and evaluation.
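To make those four parts concrete, here is a minimal sketch of an agent loop. Everything in it is illustrative: the function names, the tool registry, and the stand-in model are assumptions for explanation, not any vendor’s API.

```python
def search_files(query: str) -> str:
    """Tool: placeholder retrieval over an internal repository."""
    return f"results for {query!r}"

TOOLS = {"search_files": search_files}   # tools and connectors
APPROVED_TOOLS = {"search_files"}        # controls: an allow list

def call_model(goal: str, context: str) -> dict:
    """Model: stand-in for an LLM call that returns a decision."""
    if not context:
        return {"action": "tool", "tool": "search_files", "args": [goal]}
    return {"action": "finish", "answer": f"Summary based on: {context}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Orchestration: decide when to call a tool and when to stop."""
    context = ""
    for _ in range(max_steps):
        decision = call_model(goal, context)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = decision["tool"]
        if tool not in APPROVED_TOOLS:   # controls: guardrail check
            raise PermissionError(f"tool {tool!r} not allowed")
        context = TOOLS[tool](*decision["args"])
    return "stopped: step limit reached"
```

Even at this toy scale, the boundary between model reasoning (inside `call_model`) and deterministic system logic (the loop, the allow list, the step limit) is visible. That boundary is exactly what the RFP should force a supplier to document.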
That last layer is where many RFPs are still weak. The platforms are moving quickly toward more capable tool use, standardised connectors, tracing, evaluations, and stronger controls. That is good news, but it also means your procurement language needs to be sharper than it was even a year ago.
Why black-box agents are a procurement problem
In my experience, a black-box agent usually shows up in one of three ways. The vendor will not disclose the underlying model strategy, will not show how tool calls are governed, or will not provide meaningful observability into what happened during a run.
That may be acceptable for a consumer app. It is not acceptable for a finance workflow, an HR process, a service desk action, or a decision support use case touching regulated data.
For Australian organisations, the pressure is even clearer. Privacy obligations do not disappear because a supplier labels something AI. Between the Privacy Act, OAIC guidance, ACSC and ASD security advice, Essential Eight expectations, and the broader push for safer AI governance, leaders need evidence that an agent can be inspected, constrained, and audited.
What I would put in the RFP
1. Architecture transparency
Start with the most basic question: what are we actually buying? I would require a plain-English architecture description and a technical architecture diagram.
- The supplier must describe the model or models used, including how model changes are managed.
- The supplier must identify every external tool, connector, API, and retrieval source the agent can access.
- The supplier must explain whether the design is single-agent, workflow-based, or multi-agent.
- The supplier must state where memory, conversation state, and retrieved data are stored.
- The supplier should support open integration patterns where practical, so the organisation is not trapped inside a proprietary orchestration layer.
I want to know where the reasoning ends and where deterministic system logic begins. If a vendor cannot explain that boundary, support teams will struggle later, and so will your risk team.
2. Observability and traceability
This is the line item I see missed most often. If an agent makes a poor recommendation, triggers the wrong action, or fails halfway through a process, your team needs a trace.
- The solution must provide end-to-end execution traces for each run.
- Traces should show prompts, retrieved context, tool calls, approvals, outputs, and failure points.
- Logs must be exportable into the organisation’s monitoring and SIEM platforms.
- The supplier must provide evaluation methods, test datasets, regression testing, and a repeatable process for measuring agent quality over time.
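What a usable trace looks like in practice is simpler than vendors make it sound: one structured record per step, exportable as plain JSON lines so your monitoring or SIEM pipeline can ingest it. The field names below are a sketch, not a standard.

```python
import json
import time
import uuid

def trace_event(run_id: str, step: int, kind: str, detail: dict) -> dict:
    """One trace record per step: prompt, tool call, approval, or failure."""
    return {
        "run_id": run_id,
        "step": step,
        "timestamp": time.time(),
        "kind": kind,     # e.g. "prompt", "tool_call", "approval", "error"
        "detail": detail,
    }

run_id = str(uuid.uuid4())
trace = [
    trace_event(run_id, 1, "prompt", {"goal": "summarise invoice backlog"}),
    trace_event(run_id, 2, "tool_call", {"tool": "query_erp", "status": "ok"}),
    trace_event(run_id, 3, "approval", {"approver": "finance-lead",
                                        "decision": "approved"}),
]

# Export as JSON lines for the organisation's SIEM or log platform.
export = "\n".join(json.dumps(event) for event in trace)
```

If a supplier cannot produce something at least this granular for every run, “the model decided that” is the only answer you will ever get.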
In practical terms, I do not want a vendor saying, “the model decided that.” I want them showing what the agent saw, what it called, what policy applied, and what would be changed to prevent the issue from happening again.
3. Human approval and policy enforcement
Not every task should be fully autonomous. In fact, many should not be.
- High-risk actions must support human approval before execution.
- The platform must separate read access from write access.
- Role-based access control should integrate with enterprise identity, ideally using the organisation’s existing directory and conditional access controls.
- The organisation must be able to define allow lists, deny lists, spending thresholds, and escalation rules.
I have seen leaders get excited by the phrase “autonomous agent”, then spend the next six months rebuilding manual controls around it. It is usually better to start with supervised autonomy and expand only where the workflow proves itself.
4. Data governance, privacy, and residency
This section matters even more in Australia than many vendors assume. If personal information, employee content, customer records, or commercially sensitive documents are involved, your RFP needs to be explicit.
- The supplier must state what customer data is stored, for how long, and in which jurisdictions.
- The supplier must state whether customer data is used for training, tuning, or service improvement, and what controls exist to disable that use.
- The solution should support customer-managed retention, deletion, and audit requirements.
- The supplier must describe tenant isolation, encryption, key management, and backup controls.
- The supplier must support privacy reviews and provide input suitable for a privacy impact assessment.
One practical question I always ask is this: if the agent handled sensitive material yesterday, can we prove where that data went today? If the answer is vague, the risk is real.
5. Security and supply chain controls
Agentic AI expands your attack surface. It is not only about model risk. It is also about connector risk, identity risk, plugin risk, prompt injection, poisoned documents, and over-privileged automation.
- The supplier must explain how the solution defends against prompt injection and malicious tool instructions.
- The platform must protect secrets, tokens, and service credentials using enterprise-grade controls.
- Code execution, browser automation, or agent-driven transactions must run in isolated environments with clear egress rules.
- The supplier should provide software supply chain information, vulnerability management practices, and independent assurance artefacts where available.
- The organisation must be able to align deployment with its existing cyber controls, including Essential Eight priorities where relevant.
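One concrete example of the “clear egress rules” requirement: outbound requests from the agent’s sandbox are checked against an approved-host list before they leave the environment. The host names below are hypothetical placeholders.

```python
# Sketch of an egress allow list for agent-driven network calls.
# Hosts are illustrative; a real deployment would enforce this at
# the network layer as well, not only in application code.
from urllib.parse import urlparse

EGRESS_ALLOW = {"api.internal.example", "erp.internal.example"}

def egress_permitted(url: str) -> bool:
    """Allow a request only if its destination host is approved."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOW
```

This is a one-function illustration of the principle, not a full defence: prompt-injected instructions that tell the agent to send data to `attacker.example` fail at this gate, because the destination was never approved.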
In other words, do not treat the agent like a smart interface. Treat it like a privileged runtime with access to business systems.
6. Commercial and operating model clarity
Finally, I want the RFP to cover what happens after go-live. Agents are not static products. Models change, tools change, costs change, and behaviour can drift.
- The supplier must provide notice periods for major model or platform changes.
- The organisation must be able to test updates before production rollout.
- The supplier must provide clear rollback options and version history.
- The pricing model should be understandable enough that finance and operations teams can forecast usage.
- The customer must be able to export prompts, configurations, traces, and evaluation artefacts if the service is replaced.
Lock-in is not only about data. In the agent era, lock-in can also mean losing the workflow logic, test history, and operational knowledge you built around the system.
A practical RFP section you can adapt
If I were drafting a starting point, it would look something like this.
The proposed agentic AI solution must provide transparent architecture documentation, end-to-end execution tracing, configurable human approval points, enterprise identity integration, data residency and retention controls, audit logging, and documented protections against prompt injection and tool misuse.
The supplier must disclose all models, tools, connectors, and external dependencies used by the solution, including how updates are managed and how customers are notified of material changes.
The customer must retain the ability to inspect, govern, test, and export the agent configuration, operational logs, and evaluation results throughout the contract term.
The takeaway
As a Melbourne-based architect working with organisations across Australia and internationally, I find the most useful AI conversations are no longer about whether agents are real. They are. The better question is whether the agent can be governed like any other serious enterprise capability.
The winners in this space will not just be the vendors with the best demos. They will be the ones prepared to show their wiring, accept scrutiny, and operate inside real enterprise controls. My view is simple: if an agent is going to touch your systems, your people, or your decisions, it should never be a black box.
Over the next few years, I expect the strongest organisations will treat agentic AI less like a novelty and more like a new architecture discipline. That shift will make procurement much more demanding, but it will also make the outcomes far more trustworthy.