LangGraph, Deep Agents, and Budget

A Memosa deal analysis is a single LangGraph workflow. The orchestrator graph fans out to domain subgraphs (financial, risk, market, property, comparables, exit strategy), each of which drives a deep agent: an LLM agent built on the deepagents SDK and wrapped in a fixed stack of project middleware. Above the agents sits a phase-envelope budget system: each pipeline phase gets a guaranteed time envelope, surplus reallocates to later phases, and the whole workflow degrades through named modes rather than hanging when wall-clock time runs short.

This page documents the four layers an engineer needs to reason about: the LangGraph node/runtime contract, the decomposed DealAnalysisState, the non-negotiable middleware stack, and BudgetAuthority. Every constant cited here is anchored to the file it lives in.

LangGraph v1.0.9 node and runtime contract

Every node in the workflow is an async def that takes the state and a typed runtime, and returns a partial-state dict. The runtime — not a RunnableConfig — is how a node reaches injected services and the thread identity.

from langgraph.runtime import Runtime
from src.langchain.workflows.state.deal_analysis_state import DealAnalysisState
from src.langchain.workflows.state.workflow_context import (
    WorkflowContext, get_thread_id,
)

async def my_node(state: DealAnalysisState, runtime: Runtime[WorkflowContext]) -> dict:
    thread_id = get_thread_id(runtime)
    deal_store = runtime.context.deal_store
    budget_authority = runtime.context.budget_authority
    return {"some_state_field": ...}

WorkflowContext is a frozen dataclass (src/langchain/workflows/state/workflow_context.py). It carries the required thread_id, plus optional deal_id, namespace, deal_store, budget_authority, progress callbacks, and the cross-subgraph source_id_allocator that keeps [SRC:n] markers globally unique. Live service objects (deal_store, budget_authority, callbacks) are stripped in __getstate__, so they never break a LangSmith trace pickle.
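The serialization guard can be illustrated with a minimal, stdlib-only sketch. WorkflowContextSketch, _LIVE_FIELDS, and the single progress_callback field are hypothetical stand-ins chosen for illustration; the real WorkflowContext may strip a different set of fields or use a different mechanism.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Assumption: these are the "live service" fields that must not be pickled.
_LIVE_FIELDS = ("deal_store", "budget_authority", "progress_callback")


@dataclass(frozen=True)
class WorkflowContextSketch:
    """Hypothetical stand-in for WorkflowContext (not the project class)."""

    thread_id: str
    deal_id: Optional[str] = None
    namespace: Optional[str] = None
    deal_store: Any = None
    budget_authority: Any = None
    progress_callback: Optional[Callable[[str], None]] = None

    def __getstate__(self) -> dict:
        # Copy the instance dict and blank out live services before pickling.
        state = dict(self.__dict__)
        for name in _LIVE_FIELDS:
            state[name] = None
        return state

    def __setstate__(self, state: dict) -> None:
        # Frozen dataclasses block attribute assignment, so update __dict__ directly.
        self.__dict__.update(state)


def roundtrip(ctx: WorkflowContextSketch) -> WorkflowContextSketch:
    """Pickle and unpickle a context; live services come back as None."""
    return pickle.loads(pickle.dumps(ctx))
```

Because pickle consults __getstate__, a context holding an unpicklable callback (a lambda, for instance) can still be serialized, while the in-process instance keeps its live references.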

The graph itself is built with both schemas declared:

from langgraph.graph import StateGraph

workflow = StateGraph(
    state_schema=DealAnalysisState,
    context_schema=WorkflowContext,
    version="v2",
)

Streaming uses stream_mode="updates" at the workflow boundary (never astream_values()). Inside a deep agent, invocation streams with stream_mode="values" so the orchestrator gets a full-state chunk after each LangGraph superstep; this is what powers the per-step heartbeat and hang detector described under Budget enforcement on the live path.

State: one flat TypedDict, four focused parts

DealAnalysisState looks flat to LangGraph but is composed from four sub-TypedDicts via multiple inheritance (src/langchain/workflows/state/deal_analysis_state.py):

class DealAnalysisState(
    CoordinationState,   # routing, stage tracking, execution counters/flags
    AgentResultState,    # per-subgraph outputs (risks, comparables, synthesis inputs)
    TelemetryState,      # metrics, diagnostics, quality scores, debug fields
    OutputState,         # final deliverable artifacts (memo, markdown, footnotes)
):
    ...

Sub-schema	Concern	Field count
`CoordinationState`	Routing, stages, execution counters/flags	44
`AgentResultState`	Per-subgraph outputs from domain agents	60
`TelemetryState`	Metrics, diagnostics, quality scores, debug	30
`OutputState`	Final deliverable artifacts	12

LangGraph sees the merged flat TypedDict, so there is zero runtime cost to the decomposition — it is purely an organizational and safety boundary.
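A small sketch of the composition mechanism, using only the standard library. The part classes and field names here are hypothetical and far smaller than the real sub-schemas; the point is only that inheritance yields one flat key set.

```python
from typing import List, TypedDict, get_type_hints


class CoordinationPart(TypedDict, total=False):
    # Hypothetical routing fields.
    current_stage: str
    retry_count: int


class OutputPart(TypedDict, total=False):
    # Hypothetical deliverable fields.
    memo_markdown: str
    footnotes: List[str]


class CombinedState(CoordinationPart, OutputPart):
    # No fields of its own: the schema is composed purely by inheritance.
    pass


# The merged class exposes a single flat set of keys.
merged_keys = set(get_type_hints(CombinedState))

# At runtime a TypedDict value is an ordinary dict.
state: CombinedState = {"current_stage": "phase1_research", "footnotes": []}
```

The parts exist only for readers and type checkers; anything that consumes CombinedState sees one dictionary.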

Two state disciplines are load-bearing and enforced project-wide:

Accumulating lists use a bounded reducer. Any Annotated[List[...], ...] field that grows over the run uses ring_buffer_append(N) (e.g. agent_discourse caps at 50), never a lambda x, y: x + y. The ring buffer treats None as a no-op and [] as an explicit clear, and prevents unbounded checkpoint growth.
Large artifacts live in the Store, not state. Memo section content lives in PGStore (deal_store.set_memo_section); state carries only the metadata-only memo_sections_manifest. State is coordination; the Store is the system of record for deliverable content.
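The bounded-reducer discipline can be sketched as follows. This is an illustrative reimplementation based only on the semantics stated above (cap at N, None is a no-op, [] clears); the project's ring_buffer_append may differ in signature and edge-case handling, and DiscourseStateSketch is a hypothetical schema.

```python
from typing import Annotated, Callable, List, Optional, TypedDict


def ring_buffer_append(
    max_size: int,
) -> Callable[[Optional[List], Optional[List]], List]:
    """Build a reducer that keeps only the newest `max_size` items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")

    def reducer(existing: Optional[List], update: Optional[List]) -> List:
        current = list(existing or [])
        if update is None:
            # None means "this node has nothing to add".
            return current
        if len(update) == 0:
            # An empty list is an explicit request to clear the buffer.
            return []
        # Append, then drop the oldest entries beyond the cap.
        return (current + list(update))[-max_size:]

    return reducer


class DiscourseStateSketch(TypedDict, total=False):
    # Hypothetical field declaration mirroring the agent_discourse cap of 50.
    agent_discourse: Annotated[List[str], ring_buffer_append(50)]


reduce3 = ring_buffer_append(3)
```

With reduce3, merging [1, 2] and [3, 4] keeps the three newest items, which is what bounds checkpoint size over a long run.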

Deep agents: one base class, no duplication

Nine deep agents drive the analysis — Financial, Risk, Market, Property, Exit Strategy, Comparables, Synthesis, Coherence, and Critique. Every one of them is constructed through create_deep_agent() from the SDK, and every one inherits BaseDeepAgent (src/langchain/research/base_deep_agent.py).

BaseDeepAgent owns the shared runtime concerns so no subgraph re-implements them:

LLM client lazy creation via the injected ProductionLLMFactory (Anthropic); no direct ChatOpenAI/ChatAnthropic instantiation anywhere in agent code.
Circuit-breaker gate — before invoking, it checks the global Pinecone circuit breaker and either waits for reset (if there is budget to wait) or fast-fails.
ToolExecutionContext lifecycle — _setup_tool_context() / _teardown_tool_context() bracket execution, wiring the namespace, the CrossSectionChunkRegistry (which deduplicates identical Pinecone queries across parallel section agents), and the deal-scoped backend.
Memory, findings, and playbook tool wiring — read_evicted_file, record_finding / query_agent_findings, record_strategy / recall_strategy, retrieve_playbook (team-scoped, via the injected PatternsService), and read_section (read a completed sibling section instead of re-deriving it).
Budget-aware invocation — resets middleware budget to actual remaining time, then streams the agent with the phase-aware poll.

Subclasses implement only the domain-specific hooks: _build_tools(), _build_subagents(), _build_system_prompt(), _populate_skills(), and _parse_agent_result().

All dependencies are constructor-injected (retrieval_engine, timeout_budget_manager, llm_factory, deal_store, patterns_service, cross_section_cache, and the create_deep_agent_fn itself). Nothing is instantiated internally — see Dependency Injection.

The middleware stack (order is non-negotiable)

When BaseDeepAgent._create_agent() builds an agent, it composes the middleware list via build_research_middleware() (src/langchain/research/middleware/__init__.py). The result is 16 always-on project middlewares plus up to 6 conditional layers (four SDK + two project), assembled in a single fixed order.

-1.  AgentTelemetryMiddleware            # wall-time + counter aggregation
-0.5 BudgetSurplusReleaseMiddleware      # return unspent budget to surplus pool
 0.  DealContextInjectionMiddleware      # namespace validation + preamble injection
 1.  BudgetEnforcementMiddleware         # hard budget gate (soft + absolute limits)
 1b. DynamicToolSuppressionMiddleware    # proactive tool filtering on low budget
 1c. LLMToolSelectorMiddleware (SDK)     # relevance-based tool prefilter  [conditional]
 2.  ToolCallLimitMiddleware             # per-tool call count limits
 2a. ToolInvocationTimeoutMiddleware     # per-tool wall-time cap + elapsed telemetry
 2a-bis. WriteTodosInputNormalizerMiddleware  # coerce JSON-string `todos` → list
 2b-1. PIIMiddleware × N (SDK)           # built-in PII detectors  [conditional]
 2b. ComplianceFilterMiddleware          # credential-leak heuristics
 2c. QueryDiversityEnforcementMiddleware # active diversity steering
 2d. AnthropicPromptCachingMiddleware    # appended by the SDK tail, NOT here
 3.  ModelRetryMiddleware (SDK)          # retry transient LLM errors  [conditional]
 4.  ModelFallbackMiddleware (SDK)       # Sonnet → Haiku cascade  [conditional]
 5.  ToolRetryMiddleware                 # exponential backoff for vector retrieval
 5b. ToolResultEnrichmentMiddleware      # metadata footer on search results
 6.  ContextCompressionMiddleware        # proactive compression at 60% capacity
 6b. ParseQualityGateMiddleware          # stuck-agent detection
 6d. IntermediateProgressMiddleware      # cycle events for long runs  [conditional]
 6c. CitationIntegrityMiddleware         # strip hallucinated [SRC:n]
 7.  ContextEditingMiddleware (SDK)      # clear tool INPUTS at 50%  [conditional]
 8.  SummarizationMiddleware (SDK)       # full history archive at 85%  [conditional]
 8b. OutputSchemaRepairMiddleware        # JSON extraction/repair before orchestrator
 9.  SubagentResultTruncationMiddleware  # head/tail truncation of final output  [conditional]

The always-on project layers are slots −1, −0.5, 0, 1, 2, 2a, 2a-bis, 2b, 2c, 5, 5b, 6, 6b, 6c, 8b, and 9 (the last is gated only on agent_name being set). The conditional layers are the four SDK middlewares that depend on package availability (retry, fallback, context-editing, and summarization), the SDK tool-selector and PII detectors, and the two project layers that depend on configuration (DynamicToolSuppression, and IntermediateProgress when a progress_fn is wired).

Slot 2a-bis — the `write_todos` normalizer

WriteTodosInputNormalizerMiddleware is appended at slot 2a-bis, after ToolInvocationTimeoutMiddleware (2a) and before the SDK PIIMiddleware (2b-1) (src/langchain/research/middleware/__init__.py, lines 307/324). It is an always-on awrap_tool_call layer scoped to the SDK write_todos tool, and it applies two deterministic normalizations:

WRITE_TODOS_COERCION — when the model serializes the whole todos argument as a JSON string, it coerces that string back into a list before the tool validates it against WriteTodosInput.todos: list[Todo]. Without this, Pydantic rejects the call with a list_type error and a tool slot is wasted. Prompt hints reduce but cannot eliminate this; the boundary coercion does.
WRITE_TODOS_ERROR_GUARD — when the most recent tool call in the turn errored, it downgrades any completed todo back to in_progress, so the agent does not mark work done on the back of a failed step.

It never raises — un-parseable input is left untouched so the real error still surfaces. Because the stack is built per-agent and the SDK does not propagate parent middleware to subagents, BaseDeepAgent re-injects the same normalizer into every subagent’s hand-crafted middleware list at a single choke point (_ensure_subagent_write_todos_normalizer), keeping both paths in sync.
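The two normalizations can be sketched as plain functions. This is a simplified illustration, not the middleware itself: the function names are hypothetical, and the todo shape (a dict with "content" and "status") is an assumption about the write_todos payload.

```python
import json
from typing import Any, Dict, List


def coerce_todos_arg(args: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a JSON-string `todos` argument back into a list; never raise."""
    todos = args.get("todos")
    if not isinstance(todos, str):
        return args
    try:
        parsed = json.loads(todos)
    except ValueError:
        # Un-parseable input is left untouched so the real error still surfaces.
        return args
    if not isinstance(parsed, list):
        return args
    return {**args, "todos": parsed}


def guard_completed_after_error(
    todos: List[Dict[str, Any]], last_call_errored: bool
) -> List[Dict[str, Any]]:
    """Downgrade `completed` todos to `in_progress` after a failed tool call."""
    if not last_call_errored:
        return todos
    return [
        {**todo, "status": "in_progress"} if todo.get("status") == "completed" else todo
        for todo in todos
    ]
```

Both functions return new objects rather than mutating their inputs, which keeps the normalization easy to apply at a tool-call boundary and easy to test in isolation.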

Hook firing order

abefore_agent hooks fire in forward stack order (telemetry first, then context injection). aafter_agent hooks fire in reverse order (9 → 8b → 6c → … → −0.5 → −1), so AgentTelemetryMiddleware (slot −1) reads every sibling’s final counters last and emits an accurate record. AnthropicPromptCachingMiddleware (slot 2d) is deliberately not added by the project: the SDK appends exactly one instance to the agent’s tail unconditionally, and a second instance trips langchain’s middleware-name dedup assertion.
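The forward/reverse firing order can be shown with a toy model. RecordingMiddleware and run_hooks are hypothetical and synchronous for brevity; they model only the ordering, not the SDK's actual hook dispatch.

```python
from typing import List


class RecordingMiddleware:
    """Hypothetical middleware that records when its hooks fire."""

    def __init__(self, slot: str, log: List[str]) -> None:
        self.slot = slot
        self.log = log

    def before_agent(self) -> None:
        self.log.append(f"before:{self.slot}")

    def after_agent(self) -> None:
        self.log.append(f"after:{self.slot}")


def run_hooks(stack: List[RecordingMiddleware]) -> None:
    for middleware in stack:            # forward order on the way in
        middleware.before_agent()
    for middleware in reversed(stack):  # reverse order on the way out
        middleware.after_agent()


log: List[str] = []
run_hooks([RecordingMiddleware(slot, log) for slot in ("-1", "0", "9")])
```

In this model the first middleware in the stack is the last to see the after hook, which is why a telemetry layer placed at the head can observe every other layer's final counters.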

Budget: phase envelopes and a single authority

The budget system replaced an older “everyone competes for a shrinking remaining pool” model. Today every pipeline phase gets a guaranteed wall-clock envelope, and a single BudgetAuthority (src/utils/budget_authority/authority.py) manages the scope hierarchy, surplus reallocation, and degradation.

Phase envelopes

Envelopes are declared in PHASE_ENVELOPES (src/config/phase_envelope_config.py) as frozen PhaseEnvelopeConfig dataclasses. The headline analysis phases:

Phase	Envelope (s)	Protected	Can borrow
`phase1_research`	520	no	no (runs first)
`phase1b_dependent`	365	no	yes
`exit_strategy`	580	yes	yes
`synthesis`	230	yes	yes
`critique`	150	yes	yes
`coherence`	100	no	yes
`final_editor`	100	yes	yes

Key properties:

An agent’s budget is min(section_static_cap, phase_scope_remaining) — no proportion math, no cross-phase competition.
Protected phases never give up surplus. is_protected=True on synthesis, critique, exit_strategy, and final_editor means their unspent time is never released to the surplus pool — the highest-value phases are guaranteed their envelope.
Surplus flows forward. When a phase finishes early, reallocate_surplus() releases its remaining time to a pool that later phases draw from, weighted by surplus_priority_weight (synthesis carries the highest weight, 0.45).
The envelope sum may exceed the workflow ceiling on purpose. The sum of per-phase envelopes intentionally over-allocates by a small margin; the binding runtime cap is the workflow’s actual remaining wall-clock (BudgetAuthority.total_budget_secs), never the envelope sum. BudgetRegistry Rule 11 validates the analysis-only envelope sum against the baseline at startup.
Ingestion runs on its own axis. doc_processing and pre_phase1_routing carry is_ingestion_phase=True, so they are excluded from the analysis ceiling (Rule 11) and governed by their own sub-grant invariant (Rule 11b).
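The first two properties reduce to a few lines of arithmetic. The sketch below uses envelope values from the table above; PhaseEnvelopeSketch, the function names, and the clamp to zero are illustrative assumptions, and the weighted distribution of the pool is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseEnvelopeSketch:
    """Hypothetical, reduced stand-in for PhaseEnvelopeConfig."""

    name: str
    envelope_secs: float
    is_protected: bool


def agent_budget_secs(section_static_cap: float, phase_scope_remaining: float) -> float:
    # The agent gets the smaller of its static cap and what the phase has left.
    # Assumption: a negative remainder is clamped to zero.
    return max(0.0, min(section_static_cap, phase_scope_remaining))


def releasable_surplus_secs(phase: PhaseEnvelopeSketch, spent_secs: float) -> float:
    # Protected phases never release unspent time to the surplus pool.
    if phase.is_protected:
        return 0.0
    return max(0.0, phase.envelope_secs - spent_secs)


phase1 = PhaseEnvelopeSketch("phase1_research", 520.0, is_protected=False)
synthesis = PhaseEnvelopeSketch("synthesis", 230.0, is_protected=True)
```

For example, if phase1_research finishes after 400 s it can release 120 s, while synthesis releases nothing no matter how early it finishes.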

Degradation modes

When budget thins, the workflow scope degrades through an ordered enum (DegradationMode, authority.py):

Mode	Meaning
`normal`	Full operation; all research and tool paths enabled.
`conserve`	Advisory throttle for non-protected phases; reduces tool availability as reserve headroom thins. Protected phases never enter this mode.
`synthesis_only`	Remaining budget is insufficient for new research; skip straight to synthesis / finalization.
`terminate`	Budget exhausted; the workflow must halt.

The enum subclasses str, so ==, JSON serialization, and in {...} checks all work, and str(mode) renders "normal" (not "DegradationMode.NORMAL") to preserve log parity.
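A minimal sketch of an enum with these properties follows. DegradationModeSketch is a hypothetical reimplementation; it assumes the string rendering comes from an explicit __str__ override, since a plain str-mixin Enum would otherwise render the qualified member name. The severity helper is likewise an illustration of "ordered", relying on definition order.

```python
import json
from enum import Enum


class DegradationModeSketch(str, Enum):
    """Hypothetical stand-in for DegradationMode, least to most severe."""

    NORMAL = "normal"
    CONSERVE = "conserve"
    SYNTHESIS_ONLY = "synthesis_only"
    TERMINATE = "terminate"

    def __str__(self) -> str:
        # Render the bare value so log lines read "normal", not the member name.
        return self.value


def severity(mode: DegradationModeSketch) -> int:
    """Position in definition order; higher means more degraded."""
    return list(DegradationModeSketch).index(mode)


payload = json.dumps({"mode": DegradationModeSketch.CONSERVE})
```

Because members are real strings, they compare equal to their values and serialize without a custom JSON encoder.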

Soft vs absolute call limits

Within a phase, each agent is also bounded by call counts in BudgetEnforcementMiddleware (slot 1), configured from DeepAgentBudgetConfig:

Soft limit is advisory. The global ceiling MAX_LLM_CALLS_SOFT_LIMIT is 12 (src/langchain/research/config.py); each domain agent sets a tighter base (Financial 7, Market / Risk / Comparables 5, Construction 6, Exit Strategy 4, Property 3). A time-based override can let an agent past its soft limit when it still has meaningful budget — the override engages only when more than the soft_call_limit_override_threshold_secs window (12 s) remains.
Absolute limit is the hard cap and is non-overridable. It is derived as soft_call_limit + absolute_call_limit_extra_calls, where the extra-calls value is 3 (src/config/deep_agent_budget_config.py). When an agent crosses it, termination is forced.
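The interaction between the two limits can be sketched as a single decision function. The constants come from the text above; call_gate, its return labels, and the exact comparison operators are assumptions made for illustration and may not match BudgetEnforcementMiddleware.

```python
ABSOLUTE_CALL_LIMIT_EXTRA_CALLS = 3
SOFT_CALL_LIMIT_OVERRIDE_THRESHOLD_SECS = 12.0


def absolute_call_limit(soft_call_limit: int) -> int:
    # Hard cap = soft limit + fixed number of extra calls.
    return soft_call_limit + ABSOLUTE_CALL_LIMIT_EXTRA_CALLS


def call_gate(calls_made: int, soft_call_limit: int, remaining_secs: float) -> str:
    """Classify the next LLM call (hypothetical labels)."""
    if calls_made >= absolute_call_limit(soft_call_limit):
        return "terminate"       # hard cap: never overridable
    if calls_made < soft_call_limit:
        return "allow"           # still under the advisory limit
    if remaining_secs > SOFT_CALL_LIMIT_OVERRIDE_THRESHOLD_SECS:
        return "allow_override"  # past soft limit, but meaningful budget remains
    return "wrap_up"             # past soft limit with little time left
```

With the Financial agent's base of 7, the absolute limit works out to 10: calls 8 through 10 are possible only while the time-based override holds, and nothing is permitted beyond that.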

Budget enforcement on the live path

The static call/time limits above are not sufficient on their own: an agent can sit inside a single long LLM call while the surrounding phase drains. So BaseDeepAgent invokes each agent through _invoke_with_phase_poll(), which drives agent.astream(stream_mode="values") from a background task and interleaves cancellation checks. Every trigger surfaces as asyncio.TimeoutError, so the existing per-agent timeout handlers (which write fallback content and clean up state) catch all of them uniformly.

The poll enforces several cancellation paths cooperatively:

Static deadline — the agent’s individual envelope cap.
Chunk-inactivity hang detector — if no astream chunk arrives for 90 s, the agent is treated as hung and cancelled. A deep agent emits a chunk per LangGraph superstep, so a long silence means it is not making progress.
Phase-budget poll — when the surrounding phase has less runway than the completion buffer demands, cancel early so downstream protected phases keep their slack.
Per-agent and per-phase envelope kills — when an intelligence agent exceeds its telemetry envelope, or a phase scope exceeds its cap, by more than the 110% overrun threshold, the agent is cancelled and the scope’s spent_secs is stamped so cascade attribution sees the real consumer.
Final-editor runway protection — the synthesis tail (synthesis + critique + final_editor) is protected by polling get_post_reservation_remaining_secs("final_editor") against MIN_RETRY_BUDGET_SECS.
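A simplified sketch of the first three paths, using only asyncio. consume_with_guards and its parameters are hypothetical; unlike _invoke_with_phase_poll() it awaits the stream inline rather than from a background task, it checks should_cancel only between chunks, and it omits the envelope-kill and final-editor checks.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, Optional


async def consume_with_guards(
    stream: AsyncIterator[Any],
    *,
    inactivity_secs: float,
    deadline_secs: float,
    should_cancel: Optional[Callable[[], bool]] = None,
) -> Any:
    """Drain `stream` and return its last chunk; any trigger raises asyncio.TimeoutError."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + deadline_secs
    iterator = stream.__aiter__()
    last_chunk: Any = None
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise asyncio.TimeoutError("static deadline exceeded")
        if should_cancel is not None and should_cancel():
            raise asyncio.TimeoutError("phase-budget poll requested cancellation")
        try:
            # Waiting longer than the inactivity window for one chunk counts as a hang.
            last_chunk = await asyncio.wait_for(
                iterator.__anext__(), timeout=min(inactivity_secs, remaining)
            )
        except StopAsyncIteration:
            return last_chunk
```

Funnelling every trigger into one exception type means a caller needs a single except asyncio.TimeoutError branch for its fallback path, regardless of which guard fired.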

The downstream reservation ladder

MIN_RETRY_BUDGET_SECS is the runway a research or exit agent must leave for the synthesis tail before self-cancelling. It is 280 s, composed from explicit named per-phase floors so the cooperative gate enforces the tail per-phase:

MIN_RETRY_BUDGET_SECS = FINAL_EDITOR(135) + SYNTHESIS(115) + CRITIQUE(30) = 280

The image-intelligence analysis node is protected separately, at the grant layer, not by this tail. IMAGE_INTELLIGENCE_RESERVATION_SECS (45 s) is carved out of the existing reserve for the image phase only, via BudgetAuthority._effective_reserve_target_for_scope(), so the image node opens with grantable time instead of ≈0. A max(FINAL_EDITOR_RESERVATION_SECS, …) clamp at the carve site guarantees the carve can never drop the reserve below final_editor’s floor.
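The ladder arithmetic and the clamped carve can be written out as a short sketch. The constant values are those given above; effective_reserve_target, the "image_intelligence" scope name, and the subtraction inside the max() are assumptions about how the carve is applied, not a copy of _effective_reserve_target_for_scope().

```python
FINAL_EDITOR_RESERVATION_SECS = 135
SYNTHESIS_RESERVATION_SECS = 115
CRITIQUE_RESERVATION_SECS = 30
IMAGE_INTELLIGENCE_RESERVATION_SECS = 45

# The tail runway is the sum of the named per-phase floors.
MIN_RETRY_BUDGET_SECS = (
    FINAL_EDITOR_RESERVATION_SECS
    + SYNTHESIS_RESERVATION_SECS
    + CRITIQUE_RESERVATION_SECS
)


def effective_reserve_target(reserve_target_secs: float, scope_name: str) -> float:
    """Reserve the scope must leave untouched (hypothetical helper)."""
    if scope_name != "image_intelligence":  # assumed scope name
        return reserve_target_secs
    # Carve the image reservation out of the reserve, but never below
    # the final_editor floor.
    carved = reserve_target_secs - IMAGE_INTELLIGENCE_RESERVATION_SECS
    return max(FINAL_EDITOR_RESERVATION_SECS, carved)
```

Under these assumptions a 280 s reserve becomes 235 s for the image scope only, and a reserve already near the floor is clamped at 135 s rather than carved further.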

Cooperative cancel must thread the coordinator

One non-obvious invariant from production: when an operation that retries needs cooperative cancellation, checking the cancel probe only at the top of the retry loop is not enough if a RetryCoordinator is wired in. The coordinator owns its own max_retries + backoff loop internally, so a probe before execute() only catches a job cancelled before the first attempt — a job cancelled mid-retry runs the full loop and silently ignores the cancel. The fix is to thread the probe through the coordinator’s predicate hook (which fires after each caught retryable exception), and treat a predicate_rejected result as a cooperative cancel, not an operation failure. This is a silent-failure mode: the cancellation looks wired but does not take effect on the path that matters.
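The invariant is easier to see in a toy model. Everything below is hypothetical: RetryCoordinatorSketch, RetryableError, RetryResult, and run_with_cooperative_cancel only mimic the behavior described above (an internal retry loop, a predicate that fires after each caught retryable exception, and a predicate_rejected outcome), and backoff sleeps are omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class RetryableError(Exception):
    """Hypothetical transient failure."""


@dataclass
class RetryResult:
    status: str  # "ok" | "exhausted" | "predicate_rejected"
    attempts: int
    value: Any = None


class RetryCoordinatorSketch:
    def __init__(self, max_retries: int,
                 predicate: Optional[Callable[[Exception], bool]] = None) -> None:
        self.max_retries = max_retries
        self.predicate = predicate

    def execute(self, operation: Callable[[], Any]) -> RetryResult:
        attempts = 0
        while True:
            attempts += 1
            try:
                return RetryResult("ok", attempts, operation())
            except RetryableError as exc:
                # The predicate runs after every caught retryable exception.
                if self.predicate is not None and not self.predicate(exc):
                    return RetryResult("predicate_rejected", attempts)
                if attempts > self.max_retries:
                    return RetryResult("exhausted", attempts)
                # A real coordinator would sleep for its backoff interval here.


def run_with_cooperative_cancel(operation: Callable[[], Any],
                                is_cancelled: Callable[[], bool],
                                max_retries: int = 3) -> str:
    if is_cancelled():  # only catches cancels that precede the first attempt
        return "cancelled"
    coordinator = RetryCoordinatorSketch(
        max_retries, predicate=lambda exc: not is_cancelled()
    )
    result = coordinator.execute(operation)
    # A rejected predicate is a cooperative cancel, not an operation failure.
    return "cancelled" if result.status == "predicate_rejected" else result.status
```

Without the predicate argument, a cancel raised during the first attempt would go unnoticed until all max_retries + 1 attempts had run.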

How the layers fit together

For a single section research pass:

The orchestrator opens the phase scope on BudgetAuthority and routes to the domain subgraph node (typed Runtime[WorkflowContext]).
The subgraph’s BaseDeepAgent checks the circuit breaker, sets up the ToolExecutionContext, builds the middleware stack via build_research_middleware(), and creates the agent through create_deep_agent().
_invoke_with_phase_poll() streams the agent with stream_mode="values", enforcing the static, inactivity, phase-budget, envelope, and final-editor-runway cancellation paths.
Middleware enforces soft/absolute call limits (slot 1), suppresses tools under budget pressure (1b), compresses/edits/summarizes context as capacity climbs (6/7/8), and strips hallucinated citations + repairs JSON before return (6c/8b).
The agent returns a structured result; the subgraph writes section content to the Store (manifest metadata to state) and the scope’s surplus reallocates forward.

The result is a workflow that guarantees each phase its time, protects the synthesis tail and image node by reallocation rather than growth, and degrades through named modes — instead of one slow agent silently starving the memo it is supposed to finish.

Sources

src/langchain/research/middleware/__init__.py — build_research_middleware(); the full middleware stack, slot ordering, and the slot 2a-bis WriteTodosInputNormalizerMiddleware (lines 307/324).
src/langchain/research/middleware/write_todos_normalizer_middleware.py — the WRITE_TODOS_COERCION + WRITE_TODOS_ERROR_GUARD normalizations.
src/langchain/research/base_deep_agent.py — BaseDeepAgent: LLM client, circuit-breaker gate, ToolExecutionContext lifecycle, memory/findings/playbook wiring, _invoke_with_phase_poll() cancellation paths, subagent normalizer re-injection.
src/langchain/workflows/state/deal_analysis_state.py — DealAnalysisState multiple inheritance and _auto_initialize_from_annotations().
src/langchain/workflows/state/coordination_state.py, agent_result_state.py, telemetry_state.py, output_state.py — the four sub-TypedDicts.
src/langchain/workflows/state/workflow_context.py — WorkflowContext frozen dataclass + serialization guards.
src/langchain/workflows/orchestrators/workflow/graph_builder.py — StateGraph(state_schema=..., context_schema=..., version="v2") and the idle_timeout / streaming rationale.
src/config/phase_envelope_config.py — PHASE_ENVELOPES, MIN_RETRY_BUDGET_SECS decomposition, IMAGE_INTELLIGENCE_RESERVATION_SECS, CRITIQUE_/SYNTHESIS_RESERVATION_SECS.
src/utils/budget_authority/authority.py — BudgetAuthority, DegradationMode, reserve target, surplus reallocation, 110% overrun threshold.
src/config/deep_agent_budget_config.py — absolute_call_limit_extra_calls, soft-limit override threshold.
src/langchain/research/config.py — MAX_LLM_CALLS_SOFT_LIMIT = 12.
src/langchain/research/middleware/budget_middleware.py + budget/limits.py — soft/absolute effective limit derivation.
.claude/rules/20-patterns.md — LangGraph node-signature, state-schema, and budget patterns.
Native memory: deepagents_sdk_patterns.md (stack order, SDK pins, deliberately-not-adopted list), budget_downstream_reservations.md (reservation ladder, Rule 15, reallocate-not-grow), cooperative_cancel_coordinator_predicate.md (predicate-hook cancel invariant).