
Building Agentic Workflows with LangChain

AI/ML · LangChain · Agents · Python

Over the past two years at Intuit, I've been working on agentic AI systems that go far beyond simple prompt-in, answer-out patterns. These workflows chain multiple LLM calls together, invoke external tools, and make decisions autonomously — all while serving millions of users on TurboTax and QuickBooks.

What Makes a Workflow "Agentic"?

Traditional LLM integrations follow a request-response pattern: the user asks a question, the model answers. Agentic workflows are different. The LLM acts as a reasoning engine that decides what to do next, calls tools, evaluates results, and iterates until the task is complete.

At Intuit, a typical agentic workflow might look like this:

  1. User uploads a W-2 form
  2. Agent invokes an OCR tool to extract fields
  3. Agent validates extracted data against IRS schemas
  4. Agent identifies discrepancies and asks clarifying questions
  5. Agent populates the correct tax form fields

Each step involves an LLM call that decides the next action. The agent maintains state, handles errors, and can backtrack when needed.
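The working state carried between those steps can be sketched as a typed dictionary. The field names here are illustrative, not Intuit's actual schema, but they show the kind of state the agent maintains and backtracks over:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    """Hypothetical working state passed between steps of a W-2 workflow."""
    raw_document: bytes           # uploaded W-2 image or PDF
    extracted_fields: dict        # OCR output keyed by field name
    validation_errors: list       # discrepancies found against IRS schemas
    clarifications: list          # questions posed back to the user
    form_fields: dict             # final populated tax-form values

# Each node reads the keys it needs and writes its results back.
state: AgentState = {"extracted_fields": {"wages": "52000.00"},
                     "validation_errors": []}
```

Because every node reads from and writes to the same state object, backtracking (e.g. re-running validation after a clarification) is just re-entering a node with updated state.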

LangChain as the Orchestration Layer

We chose LangChain (and later LangGraph) as our orchestration framework for several reasons:

  • Tool abstraction — Clean interfaces for wrapping internal APIs, databases, and microservices as callable tools
  • Memory management — Built-in support for conversation history and working memory across multi-turn interactions
  • Composability — Chains and graphs let us build complex flows from simple, testable components
  • Observability — LangSmith integration gave us full trace visibility in production
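To make the tool abstraction concrete, here is a dependency-free sketch of the idea behind LangChain's tool interface: a name, a description the LLM uses to choose the tool, and the callable that does the work. The OCR function and its return values are invented for illustration; in LangChain itself you would use the `@tool` decorator instead:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    """Minimal stand-in for a LangChain tool: the description is what the
    LLM sees when deciding which tool to call."""
    name: str
    description: str
    func: Callable[..., Any]

    def invoke(self, **kwargs: Any) -> Any:
        return self.func(**kwargs)

def extract_w2_fields(document_id: str) -> dict:
    # In production this would call an internal OCR microservice;
    # hard-coded here for illustration.
    return {"wages": "52000.00", "employer_ein": "12-3456789"}

ocr_tool = Tool(
    name="w2_ocr",
    description="Extract labeled fields from an uploaded W-2 form.",
    func=extract_w2_fields,
)

result = ocr_tool.invoke(document_id="doc-123")
```

Wrapping internal APIs this way keeps each tool independently testable, which matters once the agent is choosing between dozens of them.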

Architecture: The Agent Graph

Our production agents use a state graph pattern (via LangGraph) rather than simple sequential chains. Each node in the graph represents a capability — data extraction, validation, calculation, user interaction — and edges represent conditional transitions.

from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("extract", extract_document_data)
graph.add_node("validate", validate_against_schema)
graph.add_node("clarify", ask_user_clarification)
graph.add_node("populate", populate_form_fields)

graph.set_entry_point("extract")
graph.add_edge("extract", "validate")
graph.add_conditional_edges("validate", route_on_confidence)
graph.add_edge("clarify", "validate")
graph.add_edge("populate", END)

app = graph.compile()

The route_on_confidence function checks the validation score. If confidence is below our threshold, it routes to the clarification node; otherwise, it proceeds to population.
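A minimal sketch of that routing function, assuming the validation node writes a validation_confidence score into the state (the threshold value here is illustrative):

```python
CONFIDENCE_THRESHOLD = 0.85  # illustrative; tuned per workflow in practice

def route_on_confidence(state: dict) -> str:
    """Conditional edge for LangGraph: return the name of the next node
    based on the validation confidence stored in the agent state."""
    if state.get("validation_confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return "clarify"   # ask the user before proceeding
    return "populate"      # confident enough to fill the form
```

LangGraph calls this function after the validate node runs and follows the edge matching the returned node name.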

Reliability at Scale

Running agentic workflows for millions of users taught us hard lessons about reliability:

  • Token budgets — Every agent call has a maximum token budget. We implemented circuit breakers that terminate runaway chains before they exhaust resources.
  • Deterministic fallbacks — When the agent can't resolve a task confidently, we fall back to rule-based systems rather than guessing. Users trust accuracy over speed.
  • Structured outputs — We enforce JSON schemas on every LLM response using Pydantic models. This catches hallucinated fields before they propagate downstream.
  • Async execution — Long-running agent tasks (like multi-document analysis) run asynchronously with progress callbacks, preventing request timeouts.
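The structured-outputs guardrail can be illustrated without Pydantic itself. This stdlib sketch shows the same idea: parse the LLM's JSON, reject hallucinated or missing fields, and coerce types before anything propagates downstream (the field names and types are invented for illustration):

```python
import json

# Illustrative schema: required field name -> expected type
REQUIRED = {"wages": float, "employer_ein": str}

def parse_llm_response(raw: str) -> dict:
    """Reject LLM responses with hallucinated, missing, or mistyped fields.
    A stdlib stand-in for the Pydantic validation used in production."""
    data = json.loads(raw)
    unknown = set(data) - set(REQUIRED)
    if unknown:
        raise ValueError(f"hallucinated fields: {sorted(unknown)}")
    parsed = {}
    for field, typ in REQUIRED.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        parsed[field] = typ(data[field])  # coerce, raising on bad values
    return parsed
```

Failing fast here is what keeps a single hallucinated field from ever reaching a tax form.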

Monitoring and Evaluation

We built a custom evaluation pipeline that runs nightly against a golden dataset of known-good outcomes. Key metrics include:

  • Task completion rate — Percentage of workflows that reach the final state without human intervention
  • Tool call accuracy — Whether the agent selected the right tool for each step
  • Latency P95 — End-to-end time for the most complex workflows
  • Cost per task — Total token spend per completed workflow
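Computing those metrics from nightly run records is straightforward. The records below are fabricated for illustration; the shape is the point:

```python
from statistics import quantiles

runs = [  # illustrative nightly-eval records
    {"completed": True,  "latency_s": 3.2, "tokens": 1800},
    {"completed": True,  "latency_s": 5.9, "tokens": 2400},
    {"completed": False, "latency_s": 9.1, "tokens": 4100},
    {"completed": True,  "latency_s": 2.7, "tokens": 1500},
]

# Task completion rate: fraction reaching the final state unaided
completion_rate = sum(r["completed"] for r in runs) / len(runs)

# Cost proxy: average token spend per completed workflow
avg_tokens_per_completed = (
    sum(r["tokens"] for r in runs if r["completed"])
    / sum(r["completed"] for r in runs)
)

# Latency P95 across all runs (20-quantiles, last cut point)
latency_p95 = quantiles([r["latency_s"] for r in runs], n=20)[-1]
```

Tracking these against a golden dataset nightly turns regressions in agent behavior into alerts rather than user reports.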

Key Takeaways

Building agentic systems is fundamentally different from building traditional software. The non-determinism means you need robust guardrails, extensive testing, and graceful degradation paths. But when it works, the results are remarkable — our agent-powered features reduced manual data entry time by 60% while improving accuracy.

If you're starting with agentic workflows, my advice: begin with a narrow, well-defined task. Get reliability right before expanding scope. And invest heavily in observability from day one — you can't debug what you can't see.