Rethinking Enterprise AI: A Conversation with Siva Hemanth Kolla on GenAI, Hybrid Architectures, and the Future of Automation

Siva Hemanth Kolla is a Generative AI researcher and enterprise technology expert focused on building practical AI systems for modern businesses. His work brings together automation, AI, and large-scale enterprise platforms. Over the years, he has worked with companies like Verizon, Optum, Citi, and Staples, helping teams improve workflows, service management, and operational efficiency. He has moved from traditional ServiceNow development into advanced AI systems. His work today includes Large Language Models, Small Language Models, Retrieval-Augmented Generation, and agentic AI systems. He focuses on building AI tools that can work securely inside real business environments. His projects are designed for industries where privacy, compliance, and reliability matter every day.

Siva has also written six research papers and earned six patents in AI and intelligent automation. His ideas around hybrid AI architectures and governed AI systems are gaining attention in enterprise technology spaces. In this interview, Siva shares his thoughts on enterprise AI, automation, and the future of intelligent systems. He also discusses the growing role of responsible AI and why businesses need practical solutions instead of hype-driven innovation.

1. Siva, thank you for taking the time to speak with us today. Your journey from a ServiceNow specialist to a Generative AI researcher and enterprise architect is fascinating. What were some of the early turning points that led you to move from traditional automation into GenAI and agentic systems?

Siva Hemanth Kolla: My journey began with rule-based automation and knowledge-centered workflows primarily within the ServiceNow ecosystem. Early on, I realized that while traditional automation was powerful for structured, predictable tasks, it hit a ceiling the moment business problems became contextual, dynamic, or unstructured.

The first real turning point was recognizing how much human effort was still going into tasks that looked automated on the surface, things like triaging IT incidents, interpreting knowledge articles, or resolving employee queries. The rules could handle the “known,” but the “unknown” always escalated back to humans.

That gap pushed me toward deep learning–driven workflow orchestration and context-aware virtual assistants. When Large Language Models started demonstrating genuine language understanding and reasoning, I saw an opportunity not just to automate tasks but to build systems that could think through problems, understand intent, retrieve relevant knowledge, and take action autonomously.

The move into agentic systems came naturally from there. Once you have an AI that can reason, the next question becomes: can it act? That’s what drew me to multi-agent architectures, where specialized agents collaborate to plan, retrieve, and execute across complex enterprise workflows. It wasn’t a trend I followed; it was a logical evolution of the problems I was already trying to solve.

2. Your recent work reflects a shift from a monolithic LLM dependency to hybrid architectures that combine Small Language Models and large foundation models. In terms of performance, cost, and control, what differences did you observe? Furthermore, what made this approach feel like the right direction rather than just another trend?

Siva Hemanth Kolla: When I was working with monolithic LLM-only architectures, the limitations became apparent fairly quickly in enterprise settings. Large foundation models are incredibly capable, but routing every single query, whether it’s a complex reasoning task or a simple intent classification, through a heavyweight LLM is both cost-prohibitive and operationally inefficient at scale.

The shift to hybrid architectures, where Small Language Models and large foundation models work in orchestration, changed the equation significantly across three dimensions:

Performance: SLMs, when fine-tuned on domain-specific enterprise data, can handle focused tasks such as intent detection, entity extraction, and routing decisions with speed and accuracy that rival those of much larger models. This keeps latency low for high-frequency, repetitive tasks while reserving the large foundation model for complex reasoning and generation.

Cost: In production enterprise environments, inference costs accumulate rapidly. By intelligently routing tasks to the right-sized model, we saw meaningful reductions in compute costs without sacrificing output quality where it mattered most.

Control & Governance: This was perhaps the most underappreciated benefit. SLMs are easier to audit, fine-tune, and constrain within enterprise governance boundaries. In compliance-heavy domains, having a smaller, interpretable model handling sensitive routing decisions gave us far greater control over data privacy and regulatory alignment.
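To make the routing idea concrete, here is a minimal Python sketch. It is an illustration, not the implementation discussed in the interview: the task labels, the two tier names (`slm`, `llm`), and the rule that sensitive inputs always stay on the small model are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical task labels; a real system would define its own taxonomy.
LIGHTWEIGHT_TASKS = {"intent_detection", "entity_extraction", "routing"}


@dataclass
class Route:
    model: str   # which tier handles the request: "slm" or "llm"
    reason: str  # recorded so the routing decision can be audited later


def choose_model(task: str, contains_sensitive_data: bool) -> Route:
    """Pick the smallest model tier that is adequate for the task."""
    if contains_sensitive_data:
        # Assumption: sensitive inputs are kept on the locally governed SLM.
        return Route("slm", "sensitive data stays local")
    if task in LIGHTWEIGHT_TASKS:
        return Route("slm", f"{task} is a focused, high-frequency task")
    # Everything else is treated as complex reasoning or generation.
    return Route("llm", f"{task} needs complex reasoning or generation")


if __name__ == "__main__":
    print(choose_model("intent_detection", False))
    print(choose_model("summarize_incident", False))
```

Returning the reason alongside the model choice is one simple way to keep each routing decision reviewable.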

What made this feel like the right direction rather than a trend was that it solved real enterprise pain points, not theoretical ones. The hybrid approach lets us embed trust, privacy, and compliance directly into the AI pipeline, which is something a one-size-fits-all LLM dependency simply cannot offer cleanly.

3. When you build RAG-based systems that pull from enterprise knowledge, there’s always a question of “what should the AI know” versus “what should it look up.” How do you personally draw that line when working on real projects?

Siva Hemanth Kolla: This is one of the most nuanced design decisions in building enterprise RAG systems, and honestly, it’s something I think about deeply on every project. The line between what the AI should know and what it should look up is not just a technical boundary; it’s an architectural philosophy.

My general principle is this: anything that changes, expires, or varies by context should be retrieved, not baked in.

Here’s how I think about it in practice across three layers:

What the model should inherently know: Foundational reasoning capabilities, domain language understanding, general workflow logic, and how to interpret enterprise terminology. These are things that make the model useful as a base; they don’t need to live in a retrieval index because they’re stable and generalized.

What must always be retrieved: Anything tied to live enterprise knowledge, current policies, configuration item states, incident records, HR procedures, compliance guidelines, and customer data. In environments like ITSM or HRSD, this information is constantly evolving. If the model “knows” it statically, it becomes a liability the moment that knowledge goes stale. At Optum, for instance, working with over 15,000 configuration items meant that retrieval accuracy wasn’t just a performance metric; it was a trust and compliance requirement.

The grey zone and how I handle it: There’s always a middle layer of semi-stable knowledge, things like process flows or escalation paths that don’t change daily but do change periodically. For these, I prefer retrieval with versioning and caching strategies over embedding them in model weights. This keeps the system auditable and adaptable without sacrificing response speed.
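The "retrieval with versioning and caching" approach for semi-stable knowledge could look roughly like the following Python sketch. It is a simplified illustration under stated assumptions: `Document`, `VersionedCache`, and the time-to-live policy are hypothetical names and choices, and the `fetch` callable stands in for whatever knowledge store a real system would query.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Document:
    key: str
    version: int  # kept with the text so answers can cite what they used
    text: str


class VersionedCache:
    """Serve recently retrieved documents from memory, re-fetching after a TTL."""

    def __init__(self, fetch: Callable[[str], Document], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Document]] = {}

    def get(self, key: str) -> Document:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh enough: skip the retrieval call
        doc = self._fetch(key)  # expired or missing: go back to the source
        self._entries[key] = (now, doc)
        return doc
```

Because the knowledge lives in the store rather than in model weights, updating an escalation path only requires publishing a new version; the cache picks it up once the entry expires.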

Ultimately, the question I always ask is: “If this information changes tomorrow, will the AI give a wrong or harmful answer?” If the answer is yes, it always belongs in the retrieval layer.

4. At Verizon, you worked on integrating ServiceNow with external systems like Netcool, Remedy, and Genesys, enabling real-time, bi-directional data exchange. How has that experience influenced the way you now think about building AI systems that must operate across multiple enterprise platforms without breaking data consistency?

Siva Hemanth Kolla: My time at Verizon was genuinely formative in shaping how I think about enterprise-scale AI architecture today. Integrating ServiceNow with systems like Netcool, Remedy, and Genesys wasn’t just a technical exercise; it was a masterclass in what happens when data flows across systems that were never designed to talk to each other.

The core challenge with bi-directional, real-time data exchange is that every system has its own truth. Netcool sees the network. Remedy owns the ticket. Genesys owns the customer interaction. ServiceNow tries to be the orchestration layer. The moment data moves between these systems, you’re managing competing versions of reality, and consistency becomes your most fragile asset.

That experience directly shapes how I approach AI systems operating across multiple enterprise platforms today, in three specific ways:

Event-driven thinking over polling: At Verizon, we learned quickly that polling-based integrations create lag and data drift. I carry that lesson into AI architectures: agentic systems should be triggered by state changes, not scheduled checks. This keeps the AI’s understanding of the enterprise state as current as possible.

Canonical data contracts: Working across Netcool, Remedy, and Genesys taught me to establish a shared data language early. In AI systems today, I apply the same principle: before any agent acts on data from multiple platforms, there must be a clearly defined canonical schema that normalizes inputs and prevents conflicting interpretations from propagating through the pipeline.

Failure as a first-class concern: Bi-directional integrations fail in ways that are silent and dangerous, such as partial updates, out-of-order events, and duplicate triggers. I design AI systems with the same paranoia. Every agentic action that touches enterprise data must be idempotent, traceable, and recoverable. Especially in regulated environments, an AI that acts on stale or inconsistent data isn’t just inefficient; it’s a compliance risk.
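The three principles above (event-driven triggers, a canonical schema, and idempotent, traceable handling) can be sketched together in a few lines of Python. This is an illustration only: the source names `monitoring` and `ticketing` and every payload field are invented for the example and are not the real Netcool, Remedy, or Genesys schemas.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class CanonicalEvent:
    event_id: str   # unique per source event, used for de-duplication
    record_id: str  # the shared record the event refers to
    sequence: int   # increases with every change to that record
    status: str


def normalize(source: str, payload: dict) -> CanonicalEvent:
    """Map a source-specific payload onto the canonical schema."""
    if source == "monitoring":  # hypothetical field names
        return CanonicalEvent(payload["alert_id"], payload["ci"],
                              payload["seq"], payload["severity"])
    if source == "ticketing":   # hypothetical field names
        return CanonicalEvent(payload["update_id"], payload["ticket"],
                              payload["rev"], payload["state"])
    raise ValueError(f"unknown source: {source}")


class RecordState:
    """Applies canonical events idempotently and keeps a trace of outcomes."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.last_sequence: Dict[str, int] = {}
        self.status: Dict[str, str] = {}
        self.trace: List[Tuple[str, str]] = []

    def apply(self, event: CanonicalEvent) -> str:
        if event.event_id in self.seen_ids:
            outcome = "duplicate"  # a repeated trigger changes nothing
        elif event.sequence <= self.last_sequence.get(event.record_id, -1):
            outcome = "stale"      # out-of-order event: never overwrite newer state
        else:
            self.status[event.record_id] = event.status
            self.last_sequence[event.record_id] = event.sequence
            outcome = "applied"
        self.seen_ids.add(event.event_id)
        self.trace.append((event.event_id, outcome))  # every decision is logged
        return outcome
```

Calling `apply` once per incoming state change, rather than on a schedule, is what makes the handler event-driven; the trace list gives a reviewer the outcome of every event, including the ones that were ignored.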

The Verizon experience essentially taught me that data consistency is not an infrastructure problem; it’s an architectural commitment. That philosophy now sits at the foundation of every multi-platform AI system I build.

5. You’ve written about and implemented multi-agent AI systems that can plan, reason, and execute tasks collaboratively. From what you’ve seen so far, where do these systems genuinely outperform the kind of automation enterprises have been relying on for years?

Siva Hemanth Kolla: This is a question I find genuinely exciting to answer, because the difference between multi-agent AI systems and traditional enterprise automation isn’t just incremental, it’s architectural. And having worked on both sides of that line, I can speak to where the gap is real versus where it’s overhyped.

Traditional automation, whether it’s RPA, rule-based workflows, or even early virtual assistants, excels at known, structured, repetitive tasks. If the process is well-defined, the inputs are predictable, and the outcomes are deterministic, classical automation is fast, reliable, and cost-effective. I don’t dismiss that value.

But enterprises don’t just run on structured processes. They run on exceptions, ambiguity, and cross-functional complexity. That’s precisely where multi-agent systems begin to outperform:

Handling unstructured and dynamic inputs: Traditional automation breaks the moment an input deviates from its expected format. A multi-agent system, by contrast, can interpret intent, ask clarifying questions, re-route itself, and adapt in real time. In ITSM environments, this means handling incident descriptions written in natural language with inconsistent terminology, missing fields, and varying urgency signals without requiring rigid templating.

Cross-domain reasoning and orchestration: Classical automation operates within silos. A multi-agent architecture allows specialized agents, one for knowledge retrieval, one for decision reasoning, and one for action execution, to collaborate across domains in a single workflow. This mirrors how human teams actually solve complex problems, and it enables end-to-end automation of processes that previously required multiple handoffs between systems and people.

Adaptive planning under uncertainty: One of the most powerful capabilities I’ve observed in production multi-agent systems is dynamic re-planning. When an action fails or new information surfaces mid-execution, the system can reason about the new state and adjust its plan, something rule-based automation fundamentally cannot do without human intervention.

Audit-ready explainability: Interestingly, well-designed multi-agent systems can be more explainable than complex rule engines that have accumulated years of conditional logic. Each agent’s reasoning step can be logged, traced, and reviewed, which in compliance-heavy domains like healthcare or telecom is not just useful, it’s essential.
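As a rough illustration of dynamic re-planning with a logged trace, consider the Python sketch below. It is deliberately simplified and rests on assumptions: the step names are hypothetical, each "agent" is just a callable that reports success or failure, and a fallback lookup table stands in for the reasoning a real planner (for example, an LLM-based one) would perform.

```python
from typing import Callable, Dict, List

Action = Callable[[], bool]  # returns True when the step succeeded


def run_with_replanning(plan: List[str],
                        actions: Dict[str, Action],
                        fallbacks: Dict[str, List[str]],
                        trace: List[str],
                        max_steps: int = 10) -> bool:
    """Execute a plan step by step, substituting fallback steps on failure."""
    queue = list(plan)
    steps_taken = 0
    while queue:
        if steps_taken >= max_steps:
            trace.append("aborted: step budget exhausted")
            return False
        step = queue.pop(0)
        steps_taken += 1
        succeeded = actions[step]()
        trace.append(f"{step}: {'ok' if succeeded else 'failed'}")
        if succeeded:
            continue
        alternative = fallbacks.get(step)
        if alternative is None:
            # No known way forward: hand the case back to a person.
            trace.append(f"escalate to human: no fallback for {step}")
            return False
        trace.append(f"replan: {step} -> {', '.join(alternative)}")
        queue = list(alternative) + queue  # revised plan, remaining steps kept
    return True
```

The step budget and the explicit escalation branch reflect the caution that such systems need guardrails: the loop cannot re-plan forever, and every decision it makes ends up in the trace.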

Where I urge caution, however, is in assuming multi-agent systems are universally superior. They introduce coordination overhead, require robust governance frameworks, and can fail in non-obvious ways if not carefully designed. The enterprises that get the most value are those that deploy them thoughtfully, augmenting human judgment rather than replacing it blindly.

The bottom line is this: traditional automation answers the question “Can this be done automatically?” Multi-agent AI answers the question “Can this be done intelligently?” For enterprises operating in complex, regulated, and fast-changing environments, that distinction matters enormously.

6. Across your work, from improving CMDB visibility for over 15,000 configuration items at Optum to enabling audit-ready workflows in highly regulated environments, you’ve consistently operated in compliance-heavy domains. To end this interview, can you share how regulatory requirements shape the way you design and deploy GenAI systems today? To what extent does pushing capability weigh on your decisions?

Siva Hemanth Kolla: This is perhaps the question closest to my core design philosophy, and I’m glad it’s the one we end on because compliance isn’t something I treat as a constraint layered on top of AI systems. It’s something I architect into them from the very first design decision.

Working at Optum with over 15,000 configuration items, and operating across regulated domains in healthcare, telecom, and enterprise IT, taught me one fundamental truth: in high-stakes environments, an AI system that cannot explain itself is an AI system that cannot be trusted, and an AI system that cannot be trusted will never scale.

Here’s how regulatory requirements concretely shape my design and deployment decisions today:

Compliance as an architectural primitive: From 2023 onward, I shifted toward governance-aligned RAG architectures and cloud-native GenAI deployment models that embed trust, privacy, and compliance directly into the AI pipeline, not as an afterthought, but as a foundational layer. This means data access controls, role-based retrieval boundaries, and audit logging are designed in at the start, not bolted on at the end.

Auditability over black-box performance: In regulated environments, a model that performs slightly better but cannot produce an audit trail is less valuable than one that is fully traceable. I consistently make design choices that favor explainability and traceability, logging every agent action, every retrieval decision, every output generation step, so that when a compliance review happens, the system can answer for itself.

Data residency and privacy by design: Especially in healthcare and financial domains, where data sovereignty and privacy regulations are non-negotiable, I design AI systems with strict data residency controls and privacy-preserving retrieval mechanisms. This is also one of the reasons hybrid architectures with SLMs are so valuable in these contexts, keeping sensitive processing local and under governance rather than routing everything through external foundation-model APIs.

On the tension between capability and caution: I’ll be honest here: there is real tension. The most capable AI configurations are often the ones that carry the most regulatory risk. Pushing the boundary of what GenAI can do in a regulated environment requires me to constantly ask: “Can I defend this decision in front of a regulator? Can I explain what the system did and why?” If the answer is no, I pull back, not because I’m risk-averse, but because sustainable AI adoption in enterprises depends on building trust incrementally.

The organizations that will win with GenAI in regulated industries are not those that deploy the most powerful models; they are those that deploy the most responsible ones. Capability without governance is a liability. But governance without capability is stagnation. My work has always lived in that space between the two, pushing what’s possible while never losing sight of what’s accountable.
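The role-based retrieval boundaries and audit logging mentioned above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not a description of any production system: `Doc`, `AuditedRetriever`, the role names, and the keyword match (standing in for a real vector search) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_roles: Set[str]  # which roles may see this document


@dataclass
class AuditedRetriever:
    docs: List[Doc]
    audit_log: List[dict] = field(default_factory=list)

    def search(self, user: str, role: str, query: str) -> List[Doc]:
        # Naive keyword match; a real system would use its own search backend.
        matches = [d for d in self.docs if query.lower() in d.text.lower()]
        # Enforce the role boundary before anything reaches the model.
        permitted = [d for d in matches if role in d.allowed_roles]
        # Record who asked, what was returned, and how much was withheld.
        self.audit_log.append({
            "user": user,
            "role": role,
            "query": query,
            "returned": [d.doc_id for d in permitted],
            "withheld": len(matches) - len(permitted),
        })
        return permitted
```

Filtering before generation means the model never sees content the requester is not entitled to, and the log entry lets a later review reconstruct exactly which documents informed a given answer.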

That balance, I believe, is what defines the future of responsible enterprise GenAI.

Conclusion

Siva Hemanth Kolla’s insights reveal how modern automation is moving beyond simple workflows into systems that can reason, retrieve information, and support business decisions in smarter ways. At the same time, he continues to stress the importance of governance, compliance, and long-term reliability in AI development. Throughout this conversation, Siva offers a grounded view of where enterprise AI is heading. He speaks openly about the shift toward hybrid AI systems, the growing role of agentic architectures, and the need for businesses to build AI environments that people can actually depend on. His perspective is helping organizations think more carefully about how AI should be designed, deployed, and managed inside critical systems.

© 2023 New York Business Now - All Rights Reserved.