As enterprises move AI agents into production, they are running into a pressing reliability problem. Initial excitement over large language models (LLMs) is giving way to the realization that model performance alone isn’t enough for a successful deployment. Companies are now reengineering their AI systems to prioritize robustness, failure recovery, and cost management. The shift is forcing a rethink of system architecture if enterprise AI is to remain viable over the long term.
### Agentic AI has supercharged familiar engineering problems
AI agents, particularly those built on LLMs, amplify existing engineering challenges. They often run long processes that span multiple services, APIs, and tools: a single AI workflow might call several models, manage state over extended periods, and interact with various applications. The resulting problems often become apparent only after deployment, as noted by Preeti Somal, Senior VP of Engineering at Temporal Technologies.
Somal says many organizations rushed to deploy AI without addressing the underlying infrastructure it depends on. “These patterns aren’t necessarily new,” she said. “AI just supercharges them.” The scramble to integrate AI has left companies with workflows that fail partway through and require expensive, time-consuming restarts. The situation mirrors the early days of cloud adoption, when enterprises moved workloads without modernizing their architectures and ended up with inflated costs and limited value.
### Why long-running agents force a new architecture
The need for a robust architectural foundation becomes apparent as enterprises increasingly deploy AI agents for long-running workflows. These workflows can span hours or days, interacting with multiple systems and tools. The prolonged duration introduces reliability challenges, particularly concerning state maintenance and recovery after failures.
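To make the recovery problem concrete, here is a minimal sketch of checkpointed execution, assuming a workflow that can be expressed as an ordered list of named steps. The `run_workflow` function, the JSON checkpoint file, and the step signature are illustrative assumptions, not the API of Temporal or of any product mentioned in this article.

```python
import json
import os


def run_workflow(steps, checkpoint_path):
    """Run (name, function) steps in order, skipping steps already completed.

    Hypothetical helper for illustration: each step function receives the
    dict of earlier results and returns a JSON-serializable value.
    """
    # Load state persisted by an earlier run, if there was one.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"completed": [], "results": {}}

    for name, fn in steps:
        if name in state["completed"]:
            continue  # finished in an earlier run, so do not pay for it again
        state["results"][name] = fn(state["results"])
        state["completed"].append(name)
        # Persist after every step; write to a temp file and rename so a
        # crash during the write cannot leave a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["results"]
```

If a step raises an exception, calling `run_workflow` again with the same checkpoint path resumes at the failed step instead of repeating earlier model calls. A production system would also need to handle concurrent runs, retries, and steps with side effects that are not safe to repeat, none of which this sketch attempts.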
Somal stresses the importance of distinguishing between “state” (the workflow’s current position and the actions it has already taken) and “memory” or “context” (the information carried forward across tasks). The distinction matters more as enterprises move from simple chatbots to complex, long-running processes. In healthcare, for instance, companies like Abridge are using AI to process physician visits across multiple stages, which requires precise state management at every stage.
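One way to picture the distinction is to keep the two in separate data structures. The sketch below is an assumption made for illustration: the class names, the fields, and the step names in the usage note are hypothetical and do not describe Abridge's or Temporal's actual systems.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """State: where the workflow is and which actions it has already taken."""
    current_step: str = "start"
    actions_taken: list = field(default_factory=list)


@dataclass
class AgentContext:
    """Memory/context: information carried forward across tasks."""
    notes: list = field(default_factory=list)


def advance(state, context, step, action, note=None):
    """Record a completed action, move to the next step, optionally add a note."""
    state.actions_taken.append(action)
    state.current_step = step
    if note is not None:
        context.notes.append(note)


def trim_context(context, max_notes):
    """Keep only the most recent notes; workflow state is left untouched."""
    context.notes = context.notes[-max_notes:] if max_notes > 0 else []
```

For example, after `advance(state, context, "draft_note", "called LLM", note="no known allergies")`, the state records exactly where the workflow stands, while `trim_context` can shorten what the model sees without changing that record. Treating context as something that can be trimmed while state stays exact is a design assumption of this sketch, not a claim made by Somal.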
### Real implications for founders, engineers, and the industry
For engineers and founders, the lesson is that reliability has to be designed into AI systems from the start. The priority is shifting from rapid deployment to architectures that can withstand failures and keep costs under control, which means paying closer attention to workflow orchestration, observability, and governance.
For the AI industry at large, this transition marks a maturation phase in which sustainable, reliable applications matter more than quick wins. Investors should take note: the companies positioned for long-term success will be those that prioritize architectural soundness and reliability.
### What happens next
As enterprises continue to confront the reliability challenges of AI agents, attention will likely shift toward tools and frameworks that support durable execution and effective state management. That creates opportunities for startups and established companies alike to innovate in AI infrastructure. For founders and engineers, the takeaway is that a system that merely works in a demo is not enough. It has to keep working in real-world production environments, and teams that do not build for durability risk being left behind as the industry moves forward.
