AI CoE: From experimentation to enterprise value

Designing an AI center of excellence for the enterprise

Adnan Masood, PhD, Chief AI Architect, UST

From AI enthusiasm to operating leverage

As enterprises deploy copilots, agents, and multi‑model workflows, success depends less on experimentation and more on governance, economics, and execution. A well‑designed AI Center of Excellence is how leaders scale AI responsibly, without fragmenting architecture, inflating costs, or increasing risk.

AI has moved beyond pilots. Enterprises now face a very different management challenge: not whether to adopt AI, but how to govern dozens of copilots, models, and agents without fragmenting architecture, duplicating spend or increasing risk. The answer, in most large organizations, is not a bigger data science team. It is a well-designed AI Center of Excellence.

An AI Center of Excellence is best understood as an operating function, not a department label. Its role is to align AI investments to business priorities, define the enterprise architecture and control framework for AI, enable delivery teams with reusable platforms and standards, and ensure that value, risk, and adoption are managed with the same rigor. In mature organizations, the CoE is the place where strategy, engineering, governance, and economics come together.

That distinction matters. A data science team can build models. A platform team can expose tools. A risk team can write policy. But none of those functions, by themselves, creates a repeatable enterprise system for selecting the right use cases, shipping them responsibly, instrumenting them, and improving them over time. The CoE exists to close that gap.

For most enterprises, the strongest default design is a federated, hub-and-spoke model. A lean central group owns common policy, reference architectures, evaluation and release criteria, approved model access, logging and traceability standards, guardrails, vendor governance, and cost transparency. Business units own the domain workflows, business change, adoption, and accountable outcomes. That balance preserves local speed while avoiding enterprise chaos.

What makes this especially urgent in 2026 is that enterprise AI is no longer confined to predictive models. Organizations are now deploying retrieval-augmented systems, copilots, multi-model workflows, and agents that can reason, invoke tools, and act across enterprise systems. That shift expands the CoE mandate from model development to full-stack enablement: LLMOps, evaluation, observability, security, human oversight, and AI FinOps.

Senior leadership should therefore treat the AI CoE as a business enabler with four clear responsibilities: centralize the disciplines that must be consistent, federate domain execution, govern according to risk, and measure value at the use-case level rather than the demo level. When those four disciplines are in place, AI moves from enthusiasm to operating leverage.

Definition and boundary conditions

What the CoE is and is not

The most useful definition is also the most practical one: the AI CoE is the enterprise operating system for AI. It provides the mechanisms by which the organization prioritizes AI opportunities, governs them, equips teams to deliver them, and monitors them after release. That is why the CoE should be judged less by how many projects it directly builds and more by how effectively the enterprise can build, reuse, govern, and scale AI through it.

By that definition, the CoE is not synonymous with the data science function, the platform engineering team, or the responsible AI office. It coordinates all three, but it is reducible to none of them. Its real job is integration: one operating cadence, one vocabulary for risk and value, one set of release disciplines, and one set of enterprise pathways for models, tools, data, and monitoring.

Modern AI also changes the boundary conditions. Traditional ML programs concentrated on data pipelines, training, deployment, and drift. Contemporary AI programs must additionally manage prompts, retrieval quality, tool invocation, memory, session state, model routing, fallback logic, output filtering, and the growing attack surface around LLM systems. In other words, much of the risk now sits at the system's edges rather than solely within the model.

In regulated sectors, the CoE should not create a parallel governance universe. It should connect directly to the institution's existing control architecture: privacy, cybersecurity, procurement, legal review, enterprise architecture, and, where applicable, model risk management. The best CoEs are connective tissue. They do not compete with second-line functions; they operationalize them.

Operating model and governance design

The wrong question is whether AI should be centralized or decentralized. The right question is what must be standardized centrally and what should remain close to the business. In practice, policy, platform primitives, evaluation standards, logging and traceability, guardrails, model access, and cost attribution should usually be centralized early. Workflow design, local process knowledge, domain-specific prompting, and business accountability should usually remain with the line of business.

Where the CoE sits on the organization chart matters less than whether it has clear decision rights and sustained executive sponsorship. In early-stage organizations, the function often sits under the CIO, CTO, or CDO. As AI becomes a cross-enterprise capability, the stronger pattern is a cross-functional steering model that includes business leadership, finance, risk, security, and legal. AI governance fails when it is treated as either a purely technical problem or a purely policy problem.

A mature CoE also tiers governance by use-case risk. Low-risk internal productivity tools deserve a faster path to pilot and release. Customer-facing systems, decision-support tools in regulated contexts, and agentic systems that can trigger actions deserve deeper review, formal evaluation, clear human oversight, and richer monitoring. The EU AI Act's risk-based logic and long-standing sector guidance, such as SR 11-7 in banking, both point to the same leadership principle: governance should be proportional to impact.
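As a rough illustration of proportional governance, the tiering logic can be expressed as a small intake rule. This is a minimal sketch, not a prescribed policy: the `UseCase` fields, the `Tier` names, and the middle `STANDARD` tier for sensitive data are assumptions introduced here, and a real CoE would derive its rules from its own risk framework.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    FAST_PATH = 1     # low-risk internal productivity tools
    STANDARD = 2      # assumed middle tier, not named in the text
    DEEP_REVIEW = 3   # formal evaluation, human oversight, richer monitoring


@dataclass(frozen=True)
class UseCase:
    name: str
    customer_facing: bool
    regulated_decision_support: bool
    can_trigger_actions: bool      # agentic: calls tools or modifies records
    handles_sensitive_data: bool


def governance_tier(uc: UseCase) -> Tier:
    """Map a use case to a review tier; the rules are illustrative only."""
    if uc.customer_facing or uc.regulated_decision_support or uc.can_trigger_actions:
        return Tier.DEEP_REVIEW
    if uc.handles_sensitive_data:
        return Tier.STANDARD
    return Tier.FAST_PATH


if __name__ == "__main__":
    notes = UseCase("meeting-notes summarizer", False, False, False, False)
    print(notes.name, governance_tier(notes).name)
```

The value of encoding the rule is less the code itself than the fact that intake becomes self-service: a team can see which path applies before it ever books a review meeting.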

One of the most common mistakes is turning the CoE into an approval queue. That design collapses under its own success. The remedy is to move as much governance as possible into reusable architecture, automated controls, standardized templates, reference implementations, and self-service pathways. The CoE should make the right way the easy way. If every AI initiative requires a series of bespoke meetings, the enterprise will route around the center, and shadow AI will proliferate.

Modern enablement stack: What the CoE must institutionalize

From MLOps to LLMOps and agent operations

Despite the current excitement around foundation models, the old truths of enterprise AI remain intact. Data quality still determines downstream performance. Release management still matters. Monitoring still matters. Reproducibility still matters. That is why the CoE must institutionalize MLOps disciplines before it tries to industrialize every new GenAI pattern. A weak operational foundation will eventually surface as reliability issues, compliance gaps, or runaway rework.

What has changed is the scope of operations. LLMOps is not a fashionable relabeling of MLOps. It expands the operational surface to include prompt lifecycle management, retrieval behavior, context-window design, system instructions, safety filters, benchmark datasets, model routing, output review, and failure analysis across probabilistic workflows. Once enterprises introduce RAG, copilots, or autonomous agents, the CoE must govern the system as a whole, not just the model endpoint.

Evals now sit at the center of enterprise AI quality. Leadership teams should insist that AI products have defined evaluation regimes before they scale. That means measuring task success, hallucination tolerance, failure modes, refusal behavior, latency, safety, and cost, and re-running those measures when prompts, models, tools, or knowledge sources change. Evals are how enterprises convert AI from persuasive demoware into accountable systems.
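One way to make an evaluation regime concrete is a release gate that is re-run whenever a prompt, model, tool, or knowledge source changes. The sketch below assumes the evaluation harness has already produced aggregate metrics; the `EvalResult` fields and every threshold value are hypothetical placeholders, not recommended targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    task_success_rate: float    # fraction of cases that met the task rubric
    hallucination_rate: float   # fraction of answers with unsupported claims
    p95_latency_s: float
    cost_per_case_usd: float


# Hypothetical thresholds; a real CoE would set these per use case and tier.
THRESHOLDS = {
    "min_task_success_rate": 0.90,
    "max_hallucination_rate": 0.02,
    "max_p95_latency_s": 4.0,
    "max_cost_per_case_usd": 0.05,
}


def release_gate(result: EvalResult, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of failed checks; an empty list means the gate passes."""
    failures = []
    if result.task_success_rate < thresholds["min_task_success_rate"]:
        failures.append("task_success_rate")
    if result.hallucination_rate > thresholds["max_hallucination_rate"]:
        failures.append("hallucination_rate")
    if result.p95_latency_s > thresholds["max_p95_latency_s"]:
        failures.append("p95_latency_s")
    if result.cost_per_case_usd > thresholds["max_cost_per_case_usd"]:
        failures.append("cost_per_case_usd")
    return failures


if __name__ == "__main__":
    candidate = EvalResult(0.95, 0.05, 2.5, 0.08)
    print(release_gate(candidate))
```

Returning the list of failed checks, rather than a single pass or fail flag, gives the delivery team something specific to fix and gives the CoE a record of why a release was held.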

The same is true for guardrails. Enterprises should think of guardrails as a layered control system spanning input filtering, retrieval controls, policy checks, tool permissions, output handling, and human intervention points. The OWASP Top 10 for LLM applications is useful here not because it is exhaustive, but because it reminds leaders that LLM risk is socio-technical. Prompt injection, insecure output handling, sensitive data disclosure, and excessive agency are design failures before they become security incidents.
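The layered view can be sketched as a chain of independent checks, each of which may block a request for its own reason. Everything in this example is an illustrative assumption: the request dictionary shape, the tool names, and the deliberately naive string and regex checks, which stand in for the far more robust classifiers and policy engines a production system would need.

```python
import re
from typing import Callable, Optional

# Each layer returns None to allow, or a short reason string to block.
Check = Callable[[dict], Optional[str]]

# Naive, illustrative signals only; real detection needs much more than this.
INJECTION_HINTS = ("ignore previous instructions", "reveal your system prompt")
ALLOWED_TOOLS = {"search_kb", "create_ticket"}       # hypothetical tool names
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US SSN-like pattern


def input_filter(req: dict) -> Optional[str]:
    text = req["user_input"].lower()
    if any(hint in text for hint in INJECTION_HINTS):
        return "input: possible prompt injection"
    return None


def tool_permissions(req: dict) -> Optional[str]:
    denied = set(req.get("requested_tools", [])) - ALLOWED_TOOLS
    if denied:
        return f"tools: not permitted: {sorted(denied)}"
    return None


def output_filter(req: dict) -> Optional[str]:
    if SSN_PATTERN.search(req.get("draft_output", "")):
        return "output: sensitive data pattern"
    return None


LAYERS = [input_filter, tool_permissions, output_filter]


def run_guardrails(req: dict) -> list:
    """Run every layer and collect all block reasons (empty list = allowed)."""
    reasons = []
    for layer in LAYERS:
        reason = layer(req)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

Running all layers instead of stopping at the first failure is a design choice made here for auditability: the log then shows every control that fired, which helps when tuning the layers later.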

Agentic AI requires a distinct level of governance maturity. The moment an AI system can call tools, modify records, trigger workflows, or orchestrate other systems, the conversation shifts from answer quality to operational authority. No CoE should bless an agentic workflow without clear permissions, auditable action logs, rollback paths, approval thresholds for high-impact actions, and a defined boundary between what the agent may recommend and what it may do.
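The boundary between what an agent may recommend and what it may do can be made explicit in an authorization step that sits in front of every tool call. The following is a minimal sketch under stated assumptions: `ActionPolicy`, `AuditLog`, the action names, and the dollar-based impact threshold are all hypothetical, and rollback handling is out of scope here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ActionPolicy:
    allowed_actions: set            # actions the agent may ever take
    approval_threshold_usd: float   # at or above this impact, a human must approve


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, action: str, decision: str, impact_usd: float) -> None:
        self.entries.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "decision": decision,
            "impact_usd": impact_usd,
        })


def authorize(action: str, impact_usd: float, policy: ActionPolicy,
              log: AuditLog, human_approved: bool = False) -> str:
    """Return 'execute', 'needs_approval', or 'deny', and log every decision."""
    if action not in policy.allowed_actions:
        decision = "deny"
    elif impact_usd >= policy.approval_threshold_usd and not human_approved:
        decision = "needs_approval"
    else:
        decision = "execute"
    log.record(action, decision, impact_usd)
    return decision


if __name__ == "__main__":
    policy = ActionPolicy({"issue_refund"}, approval_threshold_usd=500.0)
    log = AuditLog()
    print(authorize("issue_refund", 50.0, policy, log))    # small refund
    print(authorize("issue_refund", 900.0, policy, log))   # above threshold
```

Note that denied and deferred requests are logged as well as executed ones; an audit trail that records only successful actions cannot show what the agent attempted.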

Observability and incident readiness should therefore be treated as first-class enterprise capabilities. The CoE should standardize traces, logs, prompt and output sampling, model and tool telemetry, abuse detection, and escalation paths. AI systems fail probabilistically and sometimes silently. A print-ready policy document is not a control; a monitored and testable production system is.
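Standardized traces usually begin with an agreed event shape that every team emits. The sketch below shows one possible JSON-lines record per step of an AI request; the field names are assumptions chosen for illustration, and the example deliberately avoids tying itself to any particular tracing product.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def new_trace_id() -> str:
    """Create an identifier shared by all steps of one user request."""
    return uuid.uuid4().hex


def trace_event(trace_id: str, step: str, model: str, latency_ms: float,
                tool: Optional[str] = None, flagged: bool = False) -> str:
    """Serialize one step of an AI request as a JSON line for a log pipeline."""
    event = {
        "trace_id": trace_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "step": step,            # e.g. "retrieve", "generate", "tool_call"
        "model": model,
        "tool": tool,
        "latency_ms": latency_ms,
        "flagged": flagged,      # set by abuse or safety detection
    }
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    tid = new_trace_id()
    print(trace_event(tid, "retrieve", "example-model", 120.5))
    print(trace_event(tid, "tool_call", "example-model", 340.0, tool="create_ticket"))
```

A shared `trace_id` across retrieval, generation, and tool calls is what lets an incident responder reconstruct a single request end to end.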

Finally, the modern CoE must own enablement beyond technology. Successful AI adoption requires product management, service design, workforce training, and executive literacy. The gap between a technically sound model and a successfully adopted workflow is often larger than the gap between model A and model B. Organizations that underinvest in change management routinely misdiagnose adoption failures as model failures.

Funding, FinOps, and commercial models

AI economics have changed the governance agenda. Traditional analytics platforms typically had relatively stable infrastructure economics. Generative AI does not. Costs fluctuate with token volumes, response lengths, context growth, model choice, tool-calling behavior, and multi-step agent chains. This is why AI FinOps should be part of the CoE mandate from day one, rather than added after the first surprise invoice.

The best commercial model for most enterprises is a hybrid one. Shared platform capabilities, baseline governance, approved model access, and core observability should be centrally funded. Business-unit use cases, workflow redesign, and local adoption should be funded by the sponsoring domain. That structure keeps the common foundation healthy while forcing each use case to prove real business value. Showback usually makes sense before hard chargeback because it builds accountability without driving premature workarounds.
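Showback can start as a simple aggregation of metered usage by sponsoring domain. In this sketch the usage record shape, the model names, and the per-1K-token prices are hypothetical placeholders rather than real vendor pricing.

```python
from collections import defaultdict

# Hypothetical prices per 1,000 tokens; not real vendor pricing.
PRICE_PER_1K_TOKENS_USD = {"small-model": 0.0005, "large-model": 0.01}


def showback(usage_records: list) -> dict:
    """Sum estimated model spend per business unit, rounded to cents."""
    totals = defaultdict(float)
    for rec in usage_records:
        price = PRICE_PER_1K_TOKENS_USD[rec["model"]]
        totals[rec["business_unit"]] += rec["tokens"] / 1000 * price
    return {bu: round(cost, 2) for bu, cost in totals.items()}


if __name__ == "__main__":
    records = [
        {"business_unit": "claims", "model": "large-model", "tokens": 2_000_000},
        {"business_unit": "claims", "model": "small-model", "tokens": 4_000_000},
        {"business_unit": "hr", "model": "small-model", "tokens": 1_000_000},
    ]
    print(showback(records))
```

The report attributes spend without billing anyone, which is the point of showback: domains see their consumption before any hard chargeback changes their behavior.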

Equally important, the CoE should measure AI economics at the use-case level, not just the vendor bill. Cost per token and cost per inference are useful operational signals, but they are not a strategy. The real economic question is the cost to produce a reliable business outcome. That is why the supposedly cheapest model is often not the most economical one. If a higher-quality model needs fewer retries, shorter prompts, less rework, or fewer human corrections, it may be the lower-cost operating choice.
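The argument can be checked with simple arithmetic. The sketch below uses a deliberately simplified cost model that is an assumption of this example: each attempt costs a fixed amount, failed attempts are retried until one succeeds (so the expected number of attempts is 1 divided by the success rate), and a fraction of accepted outputs still needs a human correction at a fixed cost. All figures are invented for illustration.

```python
def cost_per_reliable_outcome(cost_per_attempt_usd: float,
                              success_rate: float,
                              correction_rate: float,
                              cost_per_correction_usd: float) -> float:
    """Expected cost of one usable result under the simplified model above."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_model_cost = cost_per_attempt_usd / success_rate
    expected_human_cost = correction_rate * cost_per_correction_usd
    return expected_model_cost + expected_human_cost


if __name__ == "__main__":
    # Hypothetical figures: the "cheap" model is 10x cheaper per attempt.
    cheap = cost_per_reliable_outcome(0.002, 0.70, 0.30, 1.50)
    premium = cost_per_reliable_outcome(0.02, 0.95, 0.05, 1.50)
    print(f"cheap model:   ${cheap:.3f} per reliable outcome")
    print(f"premium model: ${premium:.3f} per reliable outcome")
```

With these invented inputs the premium model comes out several times cheaper per reliable outcome, because human correction time dominates the token bill. The conclusion depends entirely on the measured rates, which is why the CoE needs use-case-level telemetry instead of vendor invoices alone.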

Model strategy belongs here as well. The decision between managed proprietary models and self-hosted or open models should be based on workload sensitivity, control requirements, latency needs, portability, operational burden, and long-term economics. The CoE's role is not to pick one ideology. Its role is to establish criteria, reference patterns, and commercial guardrails so that teams do not make material architectural decisions ad hoc.

Lessons from the field

The field evidence is consistent on one point: effective CoEs are not abstract committees. They are business mechanisms. Intel's internal AI group, which described itself as becoming an AI Center of Excellence, reported USD 1 billion in business value in 2020 and tied that value to repeatable practices, productionization, and enterprise alignment rather than isolated model experiments. That is the right benchmark: measurable operating impact.

Banco Ciudad offers a more recent, GenAI-era example. The bank created an AI CoE to centralize strategy, then used that structure to deploy Microsoft 365 Copilot, Copilot Studio, and Azure-based agents. Within six months, it had developed more than 10 AI agents and reported concrete productivity and call center savings. The lesson is not that every enterprise should copy its tool stack. The lesson is that agentic AI scales faster when leadership aligns roadmap, enablement, and governance under one accountable center.

OTP Bank provides a different but equally important lesson. Its AI CoE was positioned to fuel adoption across the enterprise, but the story emphasized a centralized data foundation, repeatable workflows, and governance fit for a regulated environment. In other words, the glamorous part of AI scale rested on a quieter truth: architecture and data discipline still do most of the heavy lifting.

The public sector offers a useful parallel. The U.S. General Services Administration's AI guidance does not reduce enterprise AI readiness to tooling alone. It frames the work of organizing AI around mission effectiveness, practitioner effectiveness, workforce development, and responsible implementation. That is precisely how senior leadership should view a CoE. Whether or not the enterprise uses the label, the underlying function remains the same: provide the organization with a coherent way to invest, govern, and build.

Taken together, these examples suggest that the best CoEs do three things well. They reduce friction for responsible adoption. They concentrate on the disciplines that must be shared. And they push ownership of business outcomes back into the business. That is why the most dangerous CoE design is the ivory tower. The most effective design is the enabling center.

Implementation playbook

In the first 30 days, leadership should resist the temptation to launch a grand platform program. The immediate objective is clarity. Define the charter, designate the executive sponsor, inventory current AI tools and experiments, classify use cases by risk, and create one approved pathway for model access, logging, and evaluation. Early wins come from governance and focus, not from trying to solve the entire stack at once.

Over the next 60 to 90 days, the CoE should stand up a small set of reusable patterns that solve recurring needs. In most enterprises that means a controlled retrieval-augmented assistant, a document intelligence workflow, and a tightly governed agent use case with human checkpoints. At the same time, the CoE should publish an approved model list, initial prompt and agent review criteria, evaluation datasets, cost telemetry, and a basic service catalog for teams that want to build.

Within the first year, the goal is to form a federation. Business units should have named AI leads or champions. Procurement and vendor reviews should align with the CoE's standards. Showback should be in place. The organization should have common traceability, evaluation, and guardrail patterns. Training should extend beyond builders to managers and executives as well. And the CoE should be reporting on value realization, not just activity levels.

Just as important is knowing what not to centralize. The CoE should not own every use case backlog, every prompt, or every process redesign decision. Those belong with the teams accountable for outcomes. Centralizing the wrong work is how CoEs become slow, distrusted, and eventually bypassed. The center should own the common disciplines. The business should own business change.

The leadership imperative

Every enterprise eventually decides whether its AI program will be intentional or incidental. Without a Center of Excellence, AI capability tends to fragment across tools, teams, and vendors. Governance becomes reactive. Costs become opaque. Architecture drifts. And the organization confuses activity with progress. A well-designed AI CoE is how leadership prevents that outcome.

The point of the CoE is not control for its own sake. It is to create a durable enterprise capability: one that allows innovation to happen faster because the boundaries, pathways, economics, and responsibilities are clear. In that sense, the AI CoE is not a brake on AI ambition. It is the structure that makes sustained AI ambition possible.

The enterprises that win with AI will not be the ones that simply deploy the most models. They will be the ones that build the most coherent operating system around them.

Discover how UST helps organizations design, operationalize, and scale AI Centers of Excellence—covering governance, platforms, FinOps, observability, and adoption.
