Enterprise AI is moving beyond assistive use cases, with agents increasingly expected to plan, coordinate, and carry out complex workflows across enterprise systems. Simply put: If the first wave of enterprise AI was defined by chatbots answering questions, the second wave, the one unfolding right now, is about getting things done.
This shift is underway across boardrooms and engineering or development teams. AI systems are no longer just drafting emails or summarising meetings. They are being asked to plan, orchestrate, and execute multi-step workflows across entire enterprise tool stacks.
For instance, Microsoft’s Copilot uses real-time signals from Outlook, Teams and Excel to ensure context-aware actions. It also leverages real-world data (like an unread email or a mention in a chat) to remind a user of a pending action and then builds tailored outputs.
Similarly, Google is enabling AI agents to interact with its productivity suite through programmable interfaces. A programmed AI agent can now enter a Google Drive, open a document and edit it — just like a human, but much faster.
Beyond big techs, a swarm of startups is creating AI conduits that no longer just offer assistance but can execute multi-step workflows.
“The workplace itself is becoming the operating environment where agents plan and execute work… What the industry is building is not simply smarter assistants. It is a new execution fabric that sits between human intent and enterprise systems,” said Greyhound Research CEO Sanchit Vir Gogia.
Taken together, a new layer of enterprise AI architecture is forming, one that could prove more consequential than the historic transition to the cloud. But how is this new enterprise AI stack taking shape?
From Assistance To Execution
The paradigm is moving from AI that helps you work to AI that works for you. The engineering implications of this shift are massive, requiring a multi-layered approach to handle the complexity of the modern office.
Gogia breaks this down into four layers:
- Intent interface layer: Where users describe outcomes
- Planning layer: Where agents decompose goals into action sequences
- Execution runtime layer: Invokes APIs and calling tools, and interacts with enterprise applications
- Control layer: This is responsible for identity, governance, and auditability
Google Workspace CLI serves as a prime example of this infrastructure in motion. It dynamically generates commands from Google’s API discovery services, allowing agents to interact with Gmail, Drive, and Sheets on a user’s behalf without custom integrations.
“According to Google Cloud, the vision extends well beyond developer tooling. With Workspace Flows, enterprises can automate work across apps autonomously using plain language.
Through Workspace Studio, any employee can build agents using natural language that orchestrate work across Gmail, drive, and chat. These agents can understand the full context of work across the organisation, matching company policies and generating content in the user’s own tone and style.
The ambition is to give agents a unified view of ‘enterprise truth’ — understanding not just one document, but the relationships between emails, chats, CRM data, and project trackers across the entire organisation.
The whole concept of Cowork [Claude’s agentic AI assistant] is that just like humans as their coworkers, you give them a very high-level guidance and then based on their skills and profiles, they deliver.
As an example for his company, an integration with Claude’s Copilot Cowork illustrates how the broader ecosystem is taking shape — agents connecting to specialised tools through plugin architectures, each component contributing a specific capability to a workflow that no single system could execute alone.
“Ankush Sabharwal, founder and CEO of CoRover and BharatGPT, said, “The biggest challenge is moving from AI that suggests actions to AI that reliably executes them. Enterprise workflows require deterministic outcomes, while large language models are inherently probabilistic.”
He argues that agents need secure API orchestration, contextual understanding of enterprise data, strong identity and access controls, and workflow reliability layers to truly operate across enterprise software stacks.
He argues that agents need secure API orchestration, contextual understanding of enterprise data, strong identity and access controls, and workflow reliability layers to truly operate across enterprise software stacks.
The numbers also reflect how early the market still is. Industry studies cited by Sabharwal show that more than 60% of enterprises are exploring AI agents, but less than a quarter have scaled them into production workflows.
Tasks that once required five business days, like pulling data, building dashboards, and iterating on analysis, can now be done in hours. “The time to analyse and the time to reach a conclusion is way shorter,” Jain said.
The numbers also reflect how early the market still is. Industry studies cited by Sabharwal show that more than 60% of enterprises are exploring AI agents, but less than a quarter have scaled them into production workflows.
Tasks that once required five business days, like pulling data, building dashboards, and iterating on analysis, can now be done in hours. “The time to analyse and the time to reach a conclusion is way shorter,” Jain said.
Reliability, Guardrails, And the Trust Gap
While the demos of these execution-led AI agents look compelling, they stumble in production. Greyhound Research’s analysis identifies a persistent gap between what agent systems promise and what they deliver in live enterprise environments. Agents often fail when interacting with dynamic systems where interfaces shift, data schemas change, or permissions vary. When agents operate through iterative reasoning loops, small errors compound quickly.
“The root problem is not model capability, but that enterprise systems themselves are complex and unpredictable,” Gogia noted.
To counter this, a new solution is emerging in the form of “hybrid orchestration” – language models that plan and coordinate, while sitting atop the deterministic automation infrastructure that handles the actual execution.
Jain noted that human intervention remains the primary safeguard. Therefore, Kyvos restricts delete commands and tests agentic workflows within sandboxed environments to maintain control.
Here, Sabharwal of CoRover identifies three non-negotiables:
- Role-based permissions so agents operate within defined policies.
- Full auditability and traceability of every action.
- Human-in-the-loop oversight, especially for high-impact decisions.
“The future enterprise AI stack is not just required to be intelligent, but also transparent, controllable, and compliant by design,” he added
Sonica Aron, founder and managing partner of Marching Sheep, added a workforce dimension to the debate. “While the efficiency and productivity seem extremely attractive, organisations should not adopt them at scale without strong guardrails.”
She argues that AI agents will likely shift employees from execution to supervision roles.
Gogia flags a subtler but growing security risk. He said prompt injection attacks, where malicious instructions embedded in emails, documents, or tickets are interpreted as legitimate agent commands.
Agents Are The New Enterprise OS
Whoever owns the agentic workspace layer may effectively own the operating system for knowledge work and, with it, a new form of enterprise lock-in more complex than anything the SaaS era produced.
Gogia argues that lock-in risks stems from operational ecosystems — libraries, governance, and connectors — rather than the models themselves. As these layers integrate, switching costs become compounding.
Sabharwal envisions a future of interoperable, multi-model AI ecosystems over closed platforms. To ensure enterprise flexibility, CoRover focuses on open infrastructure using specialised small language models and a conversational interface for building and deploying agents across diverse stacks.
Aron echoes the governance argument from the enterprise adoption side: “The companies that implement these governance structures early will be able to capture productivity gains without exposing themselves to operational or reputational risk.”
On whether to automate everything, Jain said, “It’s always where you’re investing more human resources where you can save their cost — those become the primary point of implementing AI. Not every workflow needs an agent. If things are working, don’t break it.”
The productivity war for the agentic workspace layer has begun. While tech giants and startups are building the infrastructure to enable this shift — from unified data layers to agent orchestration frameworks — reliability gaps, security risks, and governance challenges continue to slow adoption, even as the promise of significant productivity gains drives experimentation.
Authored by Pratik Jain, Kyvos, and originally published on Inc42.