LangGraph, Deep Agents, and Budget
A Memosa deal analysis is a single LangGraph workflow. The orchestrator graph fans
out to domain subgraphs (financial, risk, market, property, comparables, exit
strategy), each of which drives a deep agent — an LLM agent built on the
deepagents SDK and wrapped in a
fixed stack of project middleware. Above the agents sits a phase-envelope budget
system: each pipeline phase gets a guaranteed time envelope, surplus reallocates
to later phases, and the whole workflow degrades through named modes rather than
hanging when wall-clock runs short.
This page documents the four layers an engineer needs to reason about: the
LangGraph node/runtime contract, the decomposed DealAnalysisState, the
non-negotiable middleware stack, and BudgetAuthority. Every constant cited here
is anchored to the file it lives in.
LangGraph v1.0.9 node and runtime contract
Every node in the workflow is an async def that takes the state and a typed
runtime, and returns a partial-state dict. The runtime — not a RunnableConfig
— is how a node reaches injected services and the thread identity.
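```python
from langgraph.runtime import Runtime

from src.langchain.workflows.state.deal_analysis_state import DealAnalysisState
from src.langchain.workflows.state.workflow_context import (
    WorkflowContext,
    get_thread_id,
)


async def my_node(state: DealAnalysisState, runtime: Runtime[WorkflowContext]) -> dict:
    # Thread identity and injected services come from the typed runtime context.
    thread_id = get_thread_id(runtime)
    deal_store = runtime.context.deal_store
    budget_authority = runtime.context.budget_authority
    # Return only the keys this node updates; LangGraph merges the partial dict.
    return {"some_state_field": ...}
```

WorkflowContext is a frozen dataclass (src/langchain/workflows/state/workflow_context.py).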
It carries the required thread_id, plus optional deal_id, namespace,
deal_store, budget_authority, progress callbacks, and the cross-subgraph
source_id_allocator that keeps [SRC:n] markers globally unique. Live service
objects (deal_store, budget_authority, callbacks) are stripped out by
__getstate__, so they never break the pickling of a LangSmith trace.
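The serialization guard can be sketched as a frozen dataclass that overrides __getstate__. This is a minimal illustration under stated assumptions, not the project's WorkflowContext. The class name WorkflowContextSketch, the single progress_callback field standing in for "progress callbacks", and the choice to null out live fields (rather than drop them) are all hypothetical.

```python
import pickle
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical: fields that hold live, unpicklable service objects.
_LIVE_SERVICE_FIELDS = ("deal_store", "budget_authority", "progress_callback")


@dataclass(frozen=True)
class WorkflowContextSketch:
    thread_id: str  # the only required field
    deal_id: Optional[str] = None
    namespace: Optional[str] = None
    deal_store: Any = None
    budget_authority: Any = None
    progress_callback: Optional[Callable[..., None]] = None

    def __getstate__(self) -> Dict[str, Any]:
        # Copy first, so the live instance keeps its services and only the
        # pickled state has them replaced with None.
        state = dict(self.__dict__)
        for name in _LIVE_SERVICE_FIELDS:
            state[name] = None
        return state


# A bare Lock cannot be pickled, so this would fail without the guard.
ctx = WorkflowContextSketch(thread_id="t-1", deal_store=threading.Lock())
restored = pickle.loads(pickle.dumps(ctx))
```

Unpickling works even though the dataclass is frozen. Without a __setstate__, pickle writes the state straight into the instance __dict__ and never calls __setattr__.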
The graph itself is built with both schemas declared:
```python
from langgraph.graph import StateGraph

workflow = StateGraph(
    state_schema=DealAnalysisState,
    context_schema=WorkflowContext,
    version="v2",
)
```

Streaming uses stream_mode="updates" at the workflow boundary (never
astream_values() / .astream_values()). Inside a deep agent, invocation streams
with stream_mode="values", so the orchestrator gets a full-state chunk after each
LangGraph superstep. This is what powers the per-step heartbeat and hang detector
described under Budget enforcement on the live path.
State: one flat TypedDict, four focused parts
DealAnalysisState looks flat to LangGraph but is composed from four sub-TypedDicts
via multiple inheritance (src/langchain/workflows/state/deal_analysis_state.py):
```python
class DealAnalysisState(
    CoordinationState,  # routing, stage tracking, execution counters/flags
    AgentResultState,   # per-subgraph outputs (risks, comparables, synthesis inputs)
    TelemetryState,     # metrics, diagnostics, quality scores, debug fields
    OutputState,        # final deliverable artifacts (memo, markdown, footnotes)
):
    ...
```

| Sub-schema | Concern | Fields |
|---|---|---|
| CoordinationState | Routing, stages, execution counters/flags | 44 |
| AgentResultState | Per-subgraph outputs from domain agents | 60 |
| TelemetryState | Metrics, diagnostics, quality scores, debug | 30 |
| OutputState | Final deliverable artifacts | 12 |
LangGraph sees the merged flat TypedDict, so there is zero runtime cost to the decomposition — it is purely an organizational and safety boundary.
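The zero-cost claim follows from how TypedDict inheritance works. The subclass simply exposes the union of its parents' keys, and a value of that type is an ordinary dict at runtime. The sketch below shows this with four tiny parts. Every class and field name in it is hypothetical and stands in for the real 146-field schema.

```python
from typing import List, TypedDict


class CoordinationPart(TypedDict, total=False):
    current_stage: str   # routing / stage tracking
    retry_count: int     # execution counter


class AgentResultPart(TypedDict, total=False):
    risks: List[str]     # a per-subgraph output


class TelemetryPart(TypedDict, total=False):
    quality_score: float


class OutputPart(TypedDict, total=False):
    memo_markdown: str


class DealStateSketch(CoordinationPart, AgentResultPart, TelemetryPart, OutputPart):
    """Flat to any consumer: the keys of all four parts are merged."""


# The merged key set is what a consumer such as LangGraph sees.
ALL_KEYS = DealStateSketch.__required_keys__ | DealStateSketch.__optional_keys__

# At runtime a TypedDict value is just a dict, so the split costs nothing.
state: DealStateSketch = {"current_stage": "phase1_research", "risks": ["flood zone"]}
```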
Two state disciplines are load-bearing and enforced project-wide:
- Accumulating lists use a bounded reducer. Any Annotated[List[...], ...] field that grows over the run uses ring_buffer_append(N) (e.g. agent_discourse caps at 50), never lambda x, y: x + y. The ring buffer treats None as a no-op and [] as an explicit clear, and prevents unbounded checkpoint growth.
- Large artifacts live in the Store, not state. Memo section content lives in PGStore (deal_store.set_memo_section); state carries only the metadata-only memo_sections_manifest. State is coordination; the Store is the system of record for deliverable content.
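The reducer contract in the first bullet can be sketched as follows. This is a hypothetical reimplementation of the described semantics, not the project's ring_buffer_append. It assumes LangGraph calls the reducer as reducer(current, update), and the DiscourseState class is illustrative.

```python
from typing import Annotated, Callable, List, Optional, TypedDict, TypeVar

T = TypeVar("T")


def ring_buffer_append(max_len: int) -> Callable[[Optional[List[T]], Optional[List[T]]], List[T]]:
    """Build a bounded list reducer: None is a no-op, [] clears, else append and cap."""
    if max_len < 1:
        raise ValueError("max_len must be >= 1")

    def reducer(current: Optional[List[T]], update: Optional[List[T]]) -> List[T]:
        existing = list(current or [])
        if update is None:
            return existing            # no-op: keep what is there
        if len(update) == 0:
            return []                  # explicit clear
        # Append, then keep only the newest max_len entries.
        return (existing + list(update))[-max_len:]

    return reducer


class DiscourseState(TypedDict, total=False):
    agent_discourse: Annotated[List[str], ring_buffer_append(50)]
```

Because the cap is applied on every merge, checkpoint size for the field is bounded by max_len no matter how many supersteps append to it.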
Deep agents: one base class, no duplication
Nine deep agents drive the analysis: Financial, Risk, Market, Property, Exit
Strategy, Comparables, Synthesis, Coherence, and Critique. Every one of them is
constructed through create_deep_agent() from the SDK, and every one inherits
BaseDeepAgent (src/langchain/research/base_deep_agent.py).
BaseDeepAgent owns the shared runtime concerns so no subgraph re-implements them:
- LLM client: created lazily via the injected ProductionLLMFactory (Anthropic); no direct ChatOpenAI / ChatAnthropic instantiation anywhere in agent code.
- Circuit-breaker gate: before invoking, it checks the global Pinecone circuit breaker and either waits for it to reset (if there is budget to wait) or fails fast.
- ToolExecutionContext lifecycle: _setup_tool_context() / _teardown_tool_context() bracket execution, wiring the namespace, the CrossSectionChunkRegistry (which deduplicates identical Pinecone queries across parallel section agents), and the deal-scoped backend.
- Memory, findings, and playbook tool wiring: read_evicted_file, record_finding / query_agent_findings, record_strategy / recall_strategy, retrieve_playbook (team-scoped, via the injected PatternsService), and read_section (reads a completed sibling section instead of re-deriving it).
- Budget-aware invocation: resets the middleware budget to the actual remaining time, then streams the agent with the phase-aware poll.
Subclasses implement only the domain-specific hooks: _build_tools(),
_build_subagents(), _build_system_prompt(), _populate_skills(), and
_parse_agent_result().
All dependencies are constructor-injected (retrieval_engine,
timeout_budget_manager, llm_factory, deal_store, patterns_service,
cross_section_cache, and the create_deep_agent_fn itself). Nothing is
instantiated internally — see Dependency Injection.
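The split between shared runtime concerns and domain hooks is a template-method design with constructor injection. The sketch below is a minimal illustration under stated assumptions. BaseDeepAgentSketch and PropertyAgentSketch are hypothetical, only three of the five hooks are shown, and the keyword names passed to the injected create_deep_agent_fn are assumptions, not the SDK's confirmed signature.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional


class BaseDeepAgentSketch(ABC):
    """Shared runtime concerns live here; subclasses supply domain hooks."""

    def __init__(
        self,
        *,
        llm_factory: Callable[[], Any],
        create_deep_agent_fn: Callable[..., Any],
        deal_store: Any = None,
    ) -> None:
        # Every collaborator is injected; nothing is constructed internally.
        self._llm_factory = llm_factory
        self._create_deep_agent_fn = create_deep_agent_fn
        self._deal_store = deal_store
        self._llm: Optional[Any] = None

    @property
    def llm(self) -> Any:
        if self._llm is None:  # lazy: built on first use, then reused
            self._llm = self._llm_factory()
        return self._llm

    def build_agent(self) -> Any:
        # Template method: the shared wiring calls the domain-specific hooks.
        return self._create_deep_agent_fn(
            model=self.llm,
            tools=self._build_tools(),
            subagents=self._build_subagents(),
            system_prompt=self._build_system_prompt(),
        )

    @abstractmethod
    def _build_tools(self) -> List[Any]: ...

    @abstractmethod
    def _build_subagents(self) -> List[Dict[str, Any]]: ...

    @abstractmethod
    def _build_system_prompt(self) -> str: ...


class PropertyAgentSketch(BaseDeepAgentSketch):
    def _build_tools(self) -> List[Any]:
        return ["search_property_docs"]

    def _build_subagents(self) -> List[Dict[str, Any]]:
        return []

    def _build_system_prompt(self) -> str:
        return "You analyze the physical property."
```

Because the factory and the agent constructor are both injected, a test can pass fakes (for example create_deep_agent_fn=lambda **kw: kw) and inspect exactly what the base class wires together.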
The middleware stack (order is non-negotiable)
When BaseDeepAgent._create_agent() builds an agent, it composes the middleware
list via build_research_middleware()
(src/langchain/research/middleware/__init__.py). The result is 16 always-on
project middlewares plus conditional layers (six SDK slots, one of which may hold
several PII detectors, and two project slots), all assembled in a single fixed order.
```
-1.     AgentTelemetryMiddleware              # wall-time + counter aggregation
-0.5    BudgetSurplusReleaseMiddleware        # return unspent budget to surplus pool
 0.     DealContextInjectionMiddleware        # namespace validation + preamble injection
 1.     BudgetEnforcementMiddleware           # hard budget gate (soft + absolute limits)
 1b.    DynamicToolSuppressionMiddleware      # proactive tool filtering on low budget
 1c.    LLMToolSelectorMiddleware (SDK)       # relevance-based tool prefilter        [conditional]
 2.     ToolCallLimitMiddleware               # per-tool call count limits
 2a.    ToolInvocationTimeoutMiddleware       # per-tool wall-time cap + elapsed telemetry
 2a-bis. WriteTodosInputNormalizerMiddleware  # coerce JSON-string `todos` → list
 2b-1.  PIIMiddleware × N (SDK)               # built-in PII detectors                [conditional]
 2b.    ComplianceFilterMiddleware            # credential-leak heuristics
 2c.    QueryDiversityEnforcementMiddleware   # active diversity steering
 2d.    AnthropicPromptCachingMiddleware      # appended by the SDK tail, NOT here
 3.     ModelRetryMiddleware (SDK)            # retry transient LLM errors            [conditional]
 4.     ModelFallbackMiddleware (SDK)         # Sonnet → Haiku cascade                [conditional]
 5.     ToolRetryMiddleware                   # exponential backoff for vector retrieval
 5b.    ToolResultEnrichmentMiddleware        # metadata footer on search results
 6.     ContextCompressionMiddleware          # proactive compression at 60% capacity
 6b.    ParseQualityGateMiddleware            # stuck-agent detection
 6d.    IntermediateProgressMiddleware        # cycle events for long runs           [conditional]
 6c.    CitationIntegrityMiddleware           # strip hallucinated [SRC:n]
 7.     ContextEditingMiddleware (SDK)        # clear tool INPUTS at 50%              [conditional]
 8.     SummarizationMiddleware (SDK)         # full history archive at 85%           [conditional]
 8b.    OutputSchemaRepairMiddleware          # JSON extraction/repair before orchestrator
 9.     SubagentResultTruncationMiddleware    # head/tail truncation of final output  [conditional]
```

The always-on project layers are slots −1, −0.5, 0, 1, 2, 2a, 2a-bis, 2b, 2c,
5, 5b, 6, 6b, 6c, 8b, and 9 (the last is gated only on agent_name being set). The
conditional layers are the SDK middlewares that depend on package availability
(retry, fallback, context editing, and summarization, plus the tool selector and the
PII detectors) and the two project layers that depend on configuration
(DynamicToolSuppression, and IntermediateProgress when a progress_fn is wired).
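One way to picture "single fixed order, conditional inclusion" is a static table filtered by gates. The order in the table never changes; a gate only decides whether a slot is present. This is a hypothetical sketch, not build_research_middleware(). The gate labels ("sdk", "progress_fn", "tool_suppression", "agent_name") and the function name assemble_stack are assumptions, and the PIIMiddleware × N slot is collapsed to a single entry.

```python
from typing import Iterable, List, Optional, Tuple

# (slot, middleware, gate): gate None means always-on. Order mirrors the stack
# listing above; slot 2d is omitted because the SDK appends it itself.
STACK: List[Tuple[str, str, Optional[str]]] = [
    ("-1", "AgentTelemetryMiddleware", None),
    ("-0.5", "BudgetSurplusReleaseMiddleware", None),
    ("0", "DealContextInjectionMiddleware", None),
    ("1", "BudgetEnforcementMiddleware", None),
    ("1b", "DynamicToolSuppressionMiddleware", "tool_suppression"),
    ("1c", "LLMToolSelectorMiddleware", "sdk"),
    ("2", "ToolCallLimitMiddleware", None),
    ("2a", "ToolInvocationTimeoutMiddleware", None),
    ("2a-bis", "WriteTodosInputNormalizerMiddleware", None),
    ("2b-1", "PIIMiddleware", "sdk"),  # x N in the real stack
    ("2b", "ComplianceFilterMiddleware", None),
    ("2c", "QueryDiversityEnforcementMiddleware", None),
    ("3", "ModelRetryMiddleware", "sdk"),
    ("4", "ModelFallbackMiddleware", "sdk"),
    ("5", "ToolRetryMiddleware", None),
    ("5b", "ToolResultEnrichmentMiddleware", None),
    ("6", "ContextCompressionMiddleware", None),
    ("6b", "ParseQualityGateMiddleware", None),
    ("6d", "IntermediateProgressMiddleware", "progress_fn"),
    ("6c", "CitationIntegrityMiddleware", None),
    ("7", "ContextEditingMiddleware", "sdk"),
    ("8", "SummarizationMiddleware", "sdk"),
    ("8b", "OutputSchemaRepairMiddleware", None),
    ("9", "SubagentResultTruncationMiddleware", "agent_name"),
]


def assemble_stack(enabled_gates: Iterable[str]) -> List[str]:
    """Filter the fixed table; relative order is never changed."""
    gates = set(enabled_gates)
    return [name for _slot, name, gate in STACK if gate is None or gate in gates]
```

With only an agent_name set, this yields the 16 always-on layers. Enabling every gate yields 24 entries, still in listing order.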
Slot 2a-bis — the write_todos normalizer
WriteTodosInputNormalizerMiddleware is appended at slot 2a-bis, after
ToolInvocationTimeoutMiddleware (2a) and before the SDK PIIMiddleware (2b-1)
(src/langchain/research/middleware/__init__.py, slots 307/324). It is an always-on
awrap_tool_call layer scoped to the SDK write_todos tool, and it applies two
deterministic normalizations:
- WRITE_TODOS_COERCION: when the model serializes the whole todos argument as a JSON string, the middleware coerces that string back into a list before the tool validates it against WriteTodosInput.todos: list[Todo]. Without this, Pydantic rejects the call with a list_type error and a tool slot is wasted. Prompt hints reduce but cannot eliminate this; the boundary coercion does.
- WRITE_TODOS_ERROR_GUARD: when the most recent tool call in the turn errored, it downgrades any completed todo back to in_progress, so the agent does not mark work done on the back of a failed step.
It never raises — un-parseable input is left untouched so the real error still
surfaces. Because the stack is built per-agent and the SDK does not propagate parent
middleware to subagents, BaseDeepAgent re-injects the same normalizer into every
subagent’s hand-crafted middleware list at a single choke point
(_ensure_subagent_write_todos_normalizer), keeping both paths in sync.
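The two normalizations are pure functions over the tool-call arguments, so they can be sketched without the middleware plumbing. This is a hypothetical illustration of the described behavior, not the contents of write_todos_normalizer_middleware.py. The function names are assumptions, and the sketch assumes each todo is a dict with a "status" key.

```python
import json
from typing import Any, Dict


def coerce_todos(args: Dict[str, Any]) -> Dict[str, Any]:
    """WRITE_TODOS_COERCION: turn a JSON-string `todos` back into a list."""
    todos = args.get("todos")
    if not isinstance(todos, str):
        return args
    try:
        parsed = json.loads(todos)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return args     # leave it untouched so the real validation error surfaces
    return {**args, "todos": parsed} if isinstance(parsed, list) else args


def guard_completed(args: Dict[str, Any], last_tool_errored: bool) -> Dict[str, Any]:
    """WRITE_TODOS_ERROR_GUARD: no `completed` on the back of a failed step."""
    todos = args.get("todos")
    if not last_tool_errored or not isinstance(todos, list):
        return args
    downgraded = [
        {**t, "status": "in_progress"} if isinstance(t, dict) and t.get("status") == "completed" else t
        for t in todos
    ]
    return {**args, "todos": downgraded}


def normalize_write_todos_args(args: Dict[str, Any], last_tool_errored: bool) -> Dict[str, Any]:
    # Coerce first, so the guard also applies to todos that arrived as a string.
    return guard_completed(coerce_todos(args), last_tool_errored)
```

Both functions return new dicts instead of mutating their input, which keeps the "never raises, leave un-parseable input alone" contract easy to reason about.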
Hook firing order
abefore_agent hooks fire in forward stack order (telemetry first, then context
injection). aafter_agent hooks fire in reverse order (9 → 8b → 6c → −0.5 → −1), so
AgentTelemetryMiddleware (slot −1) runs last, reads every sibling's final counters,
and emits an accurate record. AnthropicPromptCachingMiddleware (slot 2d) is
deliberately not added by the project: the SDK appends exactly one instance to
the agent’s tail unconditionally, and a second instance trips langchain’s
middleware-name dedup assertion.
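Why reverse order matters for telemetry can be shown with a synchronous toy runner. This sketch only models the ordering rule described above. All class and method names are hypothetical, and the real hooks are the async abefore_agent / aafter_agent.

```python
from typing import Dict, List


class MiddlewareSketch:
    name = "base"

    def before_agent(self, counters: Dict[str, int], log: List[str]) -> None:
        log.append(f"before:{self.name}")

    def after_agent(self, counters: Dict[str, int], log: List[str]) -> None:
        log.append(f"after:{self.name}")


class CountingMiddleware(MiddlewareSketch):
    def __init__(self, name: str) -> None:
        self.name = name

    def after_agent(self, counters: Dict[str, int], log: List[str]) -> None:
        counters[self.name] = counters.get(self.name, 0) + 1  # final counter write
        super().after_agent(counters, log)


class TelemetryMiddleware(MiddlewareSketch):
    name = "telemetry"

    def __init__(self) -> None:
        self.snapshot: Dict[str, int] = {}

    def after_agent(self, counters: Dict[str, int], log: List[str]) -> None:
        self.snapshot = dict(counters)  # runs last, so every sibling has written
        super().after_agent(counters, log)


def run_agent(stack: List[MiddlewareSketch], counters: Dict[str, int], log: List[str]) -> None:
    for mw in stack:              # before-hooks: forward stack order
        mw.before_agent(counters, log)
    log.append("agent")
    for mw in reversed(stack):    # after-hooks: reverse stack order
        mw.after_agent(counters, log)


telemetry = TelemetryMiddleware()
stack = [telemetry, CountingMiddleware("surplus_release"), CountingMiddleware("citations")]
counters: Dict[str, int] = {}
log: List[str] = []
run_agent(stack, counters, log)
```

Placing telemetry at the head of the list is what makes it both the first before-hook and the last after-hook.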
Budget: phase envelopes and a single authority
The budget system replaced an older “everyone competes for a shrinking remaining
pool” model. Today every pipeline phase gets a guaranteed wall-clock envelope,
and a single BudgetAuthority (src/utils/budget_authority/authority.py) manages
the scope hierarchy, surplus reallocation, and degradation.
Phase envelopes
Envelopes are declared in PHASE_ENVELOPES
(src/config/phase_envelope_config.py) as frozen PhaseEnvelopeConfig dataclasses.
The headline analysis phases:
| Phase | Envelope (s) | Protected | Can borrow |
|---|---|---|---|
| phase1_research | 520 | no | no (runs first) |
| phase1b_dependent | 365 | no | yes |
| exit_strategy | 580 | yes | yes |
| synthesis | 230 | yes | yes |
| critique | 150 | yes | yes |
| coherence | 100 | no | yes |
| final_editor | 100 | yes | yes |
Key properties:
- An agent's budget is min(section_static_cap, phase_scope_remaining): no proportion math, no cross-phase competition.
- Protected phases never give up surplus. is_protected=True on synthesis, critique, exit_strategy, and final_editor means their unspent time is never released to the surplus pool, so the highest-value phases are guaranteed their envelope.
- Surplus flows forward. When a phase finishes early, reallocate_surplus() releases its remaining time to a pool that later phases draw from, weighted by surplus_priority_weight (synthesis carries the highest weight, 0.45).
- The envelope sum may exceed the workflow ceiling on purpose. The per-phase envelopes intentionally over-allocate by a small margin; the binding runtime cap is the workflow's actual remaining wall-clock (BudgetAuthority.total_budget_secs), never the envelope sum. BudgetRegistry Rule 11 validates the analysis-only envelope sum against the baseline at startup.
- Ingestion runs on its own axis. doc_processing and pre_phase1_routing carry is_ingestion_phase=True, so they are excluded from the analysis ceiling (Rule 11) and governed by their own sub-grant invariant (Rule 11b).
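The grant rule and forward surplus flow can be sketched in a few lines. This is a hypothetical model, not BudgetAuthority. PhaseScopeSketch, its fields, and the proportional-by-weight split are assumptions. Only synthesis's 0.45 weight comes from the text; the critique weight in the usage lines is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseScopeSketch:
    name: str
    envelope_secs: float
    is_protected: bool = False
    can_borrow: bool = True
    surplus_priority_weight: float = 0.0
    spent_secs: float = 0.0
    borrowed_secs: float = 0.0
    released_secs: float = 0.0

    @property
    def remaining_secs(self) -> float:
        total = self.envelope_secs + self.borrowed_secs
        return max(0.0, total - self.spent_secs - self.released_secs)


def agent_budget_secs(section_static_cap: float, phase: PhaseScopeSketch) -> float:
    # No proportion math: the tighter of the two bounds wins.
    return min(section_static_cap, phase.remaining_secs)


def reallocate_surplus(finished: PhaseScopeSketch, later: List[PhaseScopeSketch]) -> float:
    """Release a finished phase's unspent time to later borrowers, split by weight."""
    if finished.is_protected:
        return 0.0  # protected phases never give up surplus
    pool = finished.remaining_secs
    borrowers = [p for p in later if p.can_borrow and p.surplus_priority_weight > 0]
    total_weight = sum(p.surplus_priority_weight for p in borrowers)
    if pool <= 0 or total_weight <= 0:
        return 0.0
    finished.released_secs += pool
    for p in borrowers:
        p.borrowed_secs += pool * p.surplus_priority_weight / total_weight
    return pool


research = PhaseScopeSketch("phase1_research", 520, spent_secs=400)
synthesis = PhaseScopeSketch("synthesis", 230, is_protected=True, surplus_priority_weight=0.45)
critique = PhaseScopeSketch("critique", 150, is_protected=True, surplus_priority_weight=0.15)  # weight assumed
released = reallocate_surplus(research, [synthesis, critique])  # 120 s, split about 90 / 30
```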
Degradation modes
When budget thins, the workflow scope degrades through an ordered enum
(DegradationMode, authority.py):
| Mode | Meaning |
|---|---|
| normal | Full operation; all research and tool paths enabled. |
| conserve | Advisory throttle for non-protected phases; reduces tool availability as reserve headroom thins. Protected phases never enter this mode. |
| synthesis_only | Remaining budget is insufficient for new research; skip straight to synthesis / finalization. |
| terminate | Budget exhausted; the workflow must halt. |
The enum subclasses str, so ==, JSON serialization, and in {...} checks all
work, and str(mode) renders "normal" (not "DegradationMode.NORMAL") to
preserve log parity.
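A str-mixin enum gets ==, JSON, and membership behavior from str. However, str() on a member still renders "ClassName.MEMBER" unless __str__ is overridden, which is why log parity needs an explicit override. The sketch below is illustrative only. DegradationModeSketch, the severity property, and escalate() are hypothetical, and escalate() is one possible reading of "ordered" as "only moves toward terminate".

```python
import json
from enum import Enum


class DegradationModeSketch(str, Enum):
    NORMAL = "normal"
    CONSERVE = "conserve"
    SYNTHESIS_ONLY = "synthesis_only"
    TERMINATE = "terminate"

    def __str__(self) -> str:
        # Render the bare value ("normal"), not "DegradationModeSketch.NORMAL".
        return self.value

    @property
    def severity(self) -> int:
        # Declaration order doubles as the degradation order.
        return list(type(self)).index(self)


def escalate(current: DegradationModeSketch, proposed: DegradationModeSketch) -> DegradationModeSketch:
    """Hypothetical helper: modes only move toward TERMINATE, never back."""
    return proposed if proposed.severity > current.severity else current


payload = json.dumps({"mode": DegradationModeSketch.CONSERVE})  # serializes as a plain string
```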
Soft vs absolute call limits
Within a phase, each agent is also bounded by call counts in
BudgetEnforcementMiddleware (slot 1), configured from
DeepAgentBudgetConfig:
- Soft limit is advisory. The global ceiling MAX_LLM_CALLS_SOFT_LIMIT is 12 (src/langchain/research/config.py); each domain agent sets a tighter base (Financial 7, Market / Risk / Comparables 5, Construction 6, Exit Strategy 4, Property 3). A time-based override can let an agent continue past its soft limit when it still has meaningful budget; the override engages only when more than the soft_call_limit_override_threshold_secs window (12 s) remains.
- Absolute limit is the hard cap and cannot be overridden. It is derived as soft_call_limit + absolute_call_limit_extra_calls, where the extra-calls value is 3 (src/config/deep_agent_budget_config.py). When an agent crosses it, termination is forced.
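The interaction between the two limits can be sketched as a single decision function. This is a hypothetical sketch, not the derivation in budget/limits.py. The constants are taken from the text. Three choices are assumptions: clamping an agent's base to the global ceiling, treating "crosses" as calls_made >= limit, and the three return labels. For example, Financial's soft limit of 7 gives an absolute cap of 7 + 3 = 10.

```python
from typing import Tuple

MAX_LLM_CALLS_SOFT_LIMIT = 12                 # global ceiling (from the text)
ABSOLUTE_CALL_LIMIT_EXTRA_CALLS = 3           # from the text
SOFT_CALL_LIMIT_OVERRIDE_THRESHOLD_SECS = 12.0


def effective_limits(agent_soft_limit: int) -> Tuple[int, int]:
    soft = min(agent_soft_limit, MAX_LLM_CALLS_SOFT_LIMIT)  # clamp is an assumption
    return soft, soft + ABSOLUTE_CALL_LIMIT_EXTRA_CALLS


def next_call_decision(calls_made: int, agent_soft_limit: int, remaining_secs: float) -> str:
    """Return "continue", "stop" (soft limit holds), or "terminate" (hard cap)."""
    soft, absolute = effective_limits(agent_soft_limit)
    if calls_made >= absolute:
        return "terminate"  # hard cap: never overridable
    if calls_made >= soft:
        # Advisory limit: allow another call only with meaningful budget left.
        return "continue" if remaining_secs > SOFT_CALL_LIMIT_OVERRIDE_THRESHOLD_SECS else "stop"
    return "continue"
```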
Budget enforcement on the live path
The static call/time limits above are not sufficient on their own: an agent can
sit inside a single long LLM call while the surrounding phase drains. So
BaseDeepAgent invokes each agent through _invoke_with_phase_poll(), which drives
agent.astream(stream_mode="values") from a background task and interleaves
cancellation checks. Any trigger surfaces as asyncio.TimeoutError, so the existing
per-agent timeout handlers (which write fallback content and clean up state) catch
them uniformly.
The poll enforces several cancellation paths cooperatively:
- Static deadline: the agent's individual envelope cap.
- Chunk-inactivity hang detector: if no astream chunk arrives for 90 s, the agent is treated as hung and cancelled. A deep agent emits a chunk per LangGraph superstep, so a long silence means it is not making progress.
- Phase-budget poll: when the surrounding phase has less runway than the completion buffer demands, the poll cancels early so downstream protected phases keep their slack.
- Per-agent and per-phase envelope kills: when an intelligence agent exceeds its telemetry envelope, or a phase scope exceeds its cap, beyond the 110% overrun threshold (more than 1.1× its allotment), the agent is cancelled and the scope's spent_secs is stamped so cascade attribution sees the real consumer.
- Final-editor runway protection: the synthesis tail (synthesis + critique + final_editor) is protected by polling get_post_reservation_remaining_secs("final_editor") against MIN_RETRY_BUDGET_SECS.
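The polling pattern can be sketched with plain asyncio: drain the stream in a background task, wake on a short interval, and convert every trigger into asyncio.TimeoutError. This is a hypothetical sketch of the technique, not _invoke_with_phase_poll(). The function name, parameters, and the injected phase_budget_exhausted callable are assumptions. It covers the static, inactivity, and phase-budget paths; the envelope-kill and final-editor checks would be further predicates in the same loop.

```python
import asyncio
import contextlib
import time
from typing import AsyncIterator, Callable, Optional, TypeVar

T = TypeVar("T")


async def invoke_with_phase_poll_sketch(
    stream: AsyncIterator[T],
    *,
    deadline_secs: float,
    inactivity_secs: float = 90.0,
    poll_interval_secs: float = 1.0,
    phase_budget_exhausted: Callable[[], bool] = lambda: False,
) -> Optional[T]:
    """Drain `stream` in a background task and return its last chunk.

    Every cancellation path raises asyncio.TimeoutError, so one upstream
    timeout handler can catch them all uniformly.
    """
    started = last_chunk_at = time.monotonic()
    last_chunk: Optional[T] = None

    async def consume() -> None:
        nonlocal last_chunk, last_chunk_at
        async for chunk in stream:  # e.g. one chunk per superstep with stream_mode="values"
            last_chunk, last_chunk_at = chunk, time.monotonic()

    task = asyncio.create_task(consume())
    try:
        while True:
            done, _ = await asyncio.wait({task}, timeout=poll_interval_secs)
            if done:
                await task  # re-raise any error from the stream itself
                return last_chunk
            now = time.monotonic()
            if now - started > deadline_secs:
                raise asyncio.TimeoutError("static deadline exceeded")
            if now - last_chunk_at > inactivity_secs:
                raise asyncio.TimeoutError("no chunk within the inactivity window")
            if phase_budget_exhausted():
                raise asyncio.TimeoutError("surrounding phase is out of runway")
    finally:
        if not task.done():  # any trigger: cancel the background stream
            task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await task
```

Checking between waits instead of wrapping the whole stream in one asyncio.wait_for keeps the conditions independent. Each one can fire while the agent is still inside a long LLM call.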
The downstream reservation ladder
MIN_RETRY_BUDGET_SECS is the runway a research or exit agent must leave for the
synthesis tail before self-cancelling. It is 280 s, composed from explicit
named per-phase floors so the cooperative gate enforces the tail per-phase:
```python
MIN_RETRY_BUDGET_SECS = FINAL_EDITOR_RESERVATION_SECS + SYNTHESIS_RESERVATION_SECS + CRITIQUE_RESERVATION_SECS
# = 135 + 115 + 30 = 280
```

The image-intelligence analysis node is protected separately, at the grant
layer, not by this tail. IMAGE_INTELLIGENCE_RESERVATION_SECS (45 s) is
carved out of the existing reserve for the image phase only, via
BudgetAuthority._effective_reserve_target_for_scope(), so the image node opens with
grantable time instead of ≈0. A max(FINAL_EDITOR_RESERVATION_SECS, …) clamp at
the carve site guarantees the carve can never drop the reserve below final_editor’s
floor.
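The ladder and the clamped carve can be sketched as follows. The reservation constants come from the text. The function effective_reserve_target_secs, the "image_intelligence" scope name, and the base_reserve_secs input are hypothetical stand-ins for BudgetAuthority._effective_reserve_target_for_scope(), whose exact signature is not shown here.

```python
FINAL_EDITOR_RESERVATION_SECS = 135
SYNTHESIS_RESERVATION_SECS = 115
CRITIQUE_RESERVATION_SECS = 30
IMAGE_INTELLIGENCE_RESERVATION_SECS = 45

MIN_RETRY_BUDGET_SECS = (
    FINAL_EDITOR_RESERVATION_SECS + SYNTHESIS_RESERVATION_SECS + CRITIQUE_RESERVATION_SECS
)


def effective_reserve_target_secs(scope_name: str, base_reserve_secs: float) -> float:
    """Carve the image allowance out of the existing reserve, for the image scope only."""
    if scope_name != "image_intelligence":
        return base_reserve_secs  # every other scope sees the full reserve
    carved = base_reserve_secs - IMAGE_INTELLIGENCE_RESERVATION_SECS
    # Clamp: the carve may never drop the reserve below final_editor's floor.
    return max(FINAL_EDITOR_RESERVATION_SECS, carved)
```

With a 280 s reserve, the image scope sees a 235 s target. With a thin 150 s reserve, the clamp holds the target at 135 s instead of 105 s. In both cases the image node is funded by reallocating the existing reserve, not by growing the total.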
Cooperative cancel must thread the coordinator
One non-obvious invariant from production: when an operation that retries needs
cooperative cancellation, checking the cancel probe only at the top of the retry
loop is not enough if a RetryCoordinator is wired in. The coordinator owns its own
max_retries + backoff loop internally, so a probe before execute() only catches
a job cancelled before the first attempt — a job cancelled mid-retry runs the full
loop and silently ignores the cancel. The fix is to thread the probe through the
coordinator’s predicate hook (which fires after each caught retryable exception),
and treat a predicate_rejected result as a cooperative cancel, not an operation
failure. This is a silent-failure mode: the cancellation looks wired but does not
take effect on the path that matters.
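The invariant can be sketched with a toy coordinator that owns its own retry loop and exposes a predicate hook. RetryCoordinatorSketch, RetryResult, CooperativeCancel, and run_cancellable are hypothetical names; the project's RetryCoordinator API may differ. The key move is that the cancel probe is threaded into the predicate, and a predicate_rejected result is mapped to a cooperative cancel rather than a failure.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional, Tuple, Type


@dataclass
class RetryResult:
    status: str  # "ok" | "exhausted" | "predicate_rejected"
    value: Any = None
    error: Optional[BaseException] = None


class RetryCoordinatorSketch:
    """Owns its own max_retries + backoff loop, like the coordinator described above."""

    def __init__(self, max_retries: int, backoff_secs: float,
                 retryable: Tuple[Type[BaseException], ...] = (ConnectionError,)) -> None:
        self.max_retries = max_retries
        self.backoff_secs = backoff_secs
        self.retryable = retryable

    async def execute(self, op: Callable[[], Awaitable[Any]],
                      predicate: Callable[[BaseException], bool] = lambda exc: True) -> RetryResult:
        last_error: Optional[BaseException] = None
        for attempt in range(self.max_retries + 1):
            try:
                return RetryResult("ok", value=await op())
            except self.retryable as exc:
                last_error = exc
                # Predicate hook: fires after each caught retryable exception.
                if not predicate(exc):
                    return RetryResult("predicate_rejected", error=exc)
                if attempt < self.max_retries:
                    await asyncio.sleep(self.backoff_secs * (2 ** attempt))
        return RetryResult("exhausted", error=last_error)


class CooperativeCancel(Exception):
    """The job was cancelled cooperatively; it did not fail."""


async def run_cancellable(coordinator: RetryCoordinatorSketch,
                          op: Callable[[], Awaitable[Any]],
                          is_cancelled: Callable[[], bool]) -> Any:
    if is_cancelled():  # only catches a cancel that lands before the first attempt...
        raise CooperativeCancel()
    # ...so the probe is also threaded through the predicate to catch mid-retry cancels.
    result = await coordinator.execute(op, predicate=lambda exc: not is_cancelled())
    if result.status == "predicate_rejected":
        raise CooperativeCancel()  # a cancel, not an operation failure
    if result.status == "exhausted":
        raise result.error or RuntimeError("retries exhausted")
    return result.value
```

Without the predicate argument, a job cancelled after its first failed attempt would run every remaining retry and backoff before returning. That is the silent-failure mode described above.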
How the layers fit together
For a single section research pass:
- The orchestrator opens the phase scope on BudgetAuthority and routes to the domain subgraph node (typed Runtime[WorkflowContext]).
- The subgraph's BaseDeepAgent checks the circuit breaker, sets up the ToolExecutionContext, builds the middleware stack via build_research_middleware(), and creates the agent through create_deep_agent().
- _invoke_with_phase_poll() streams the agent with stream_mode="values", enforcing the static, inactivity, phase-budget, envelope, and final-editor-runway cancellation paths.
- Middleware enforces soft/absolute call limits (slot 1), suppresses tools under budget pressure (1b), compresses, edits, and summarizes context as capacity climbs (6/7/8), and strips hallucinated citations and repairs JSON before returning (6c/8b).
- The agent returns a structured result; the subgraph writes section content to the Store (manifest metadata to state), and the scope's surplus reallocates forward.
The result is a workflow that guarantees each phase its time, protects the synthesis tail and image node by reallocation rather than growth, and degrades through named modes — instead of one slow agent silently starving the memo it is supposed to finish.
Sources
- src/langchain/research/middleware/__init__.py: build_research_middleware(); the full middleware stack, slot ordering, and the slot 2a-bis WriteTodosInputNormalizerMiddleware (slots 307/324).
- src/langchain/research/middleware/write_todos_normalizer_middleware.py: the WRITE_TODOS_COERCION + WRITE_TODOS_ERROR_GUARD normalizations.
- src/langchain/research/base_deep_agent.py: BaseDeepAgent (LLM client, circuit-breaker gate, ToolExecutionContext lifecycle, memory/findings/playbook wiring, _invoke_with_phase_poll() cancellation paths, subagent normalizer re-injection).
- src/langchain/workflows/state/deal_analysis_state.py: DealAnalysisState multiple inheritance and _auto_initialize_from_annotations().
- src/langchain/workflows/state/coordination_state.py, agent_result_state.py, telemetry_state.py, output_state.py: the four sub-TypedDicts.
- src/langchain/workflows/state/workflow_context.py: WorkflowContext frozen dataclass + serialization guards.
- src/langchain/workflows/orchestrators/workflow/graph_builder.py: StateGraph(state_schema=..., context_schema=..., version="v2") and the idle_timeout / streaming rationale.
- src/config/phase_envelope_config.py: PHASE_ENVELOPES, MIN_RETRY_BUDGET_SECS decomposition, IMAGE_INTELLIGENCE_RESERVATION_SECS, CRITIQUE_/SYNTHESIS_RESERVATION_SECS.
- src/utils/budget_authority/authority.py: BudgetAuthority, DegradationMode, reserve target, surplus reallocation, 110% overrun threshold.
- src/config/deep_agent_budget_config.py: absolute_call_limit_extra_calls, soft-limit override threshold.
- src/langchain/research/config.py: MAX_LLM_CALLS_SOFT_LIMIT = 12.
- src/langchain/research/middleware/budget_middleware.py + budget/limits.py: soft/absolute effective limit derivation.
- .claude/rules/20-patterns.md: LangGraph node-signature, state-schema, and budget patterns.
- Native memory: deepagents_sdk_patterns.md (stack order, SDK pins, deliberately-not-adopted list), budget_downstream_reservations.md (reservation ladder, Rule 15, reallocate-not-grow), cooperative_cancel_coordinator_predicate.md (predicate-hook cancel invariant).