Canvas Architecture
Canvas is the FastAPI application analysts edit memos in. This page is for engineers working inside Canvas: how it boots, how its ~110 services are wired, how it becomes multi-worker-safe, and why intelligence is consumed two different ways depending on whether you are in Canvas or in the worker.
The app is a factory (create_app in src/canvas/app.py) with a lifespan context manager. The hard rule is that no startup or shutdown logic lives in app.py — every initialization step is a numbered phase in src/canvas/di/startup_phases.py. To add a service you add it to a phase, not to the lifespan.
The phase-ordered startup
Canvas boots through a fixed sequence of phases. Each phase receives the services it needs as parameters and sets its results directly on app.state. Ordering is not cosmetic: a later phase depends on the wiring an earlier phase did (Phase 3’s WebSocketManager needs Phase 2’s DealAccessService; Phase 6’s intel layer needs Phase 5’s DealIntelligenceContextService). The phases are:
| Phase | Name | What it wires | Failure mode |
|---|---|---|---|
| -1 | Monitoring | LangSmith tracing guard (init_monitoring) | non-fatal |
| 0 | Redis + auth | async Redis client, JWT service, rate limiter, pub/sub bridge object | hard-fail if Redis is unavailable |
| 1 | Deal store | DealStoreService resolved from shared infra DI; pre-warms the Postgres pool | hard-fail — no memo store, no Canvas |
| 2 | Identity | Postgres-backed UserService, OrganizationService, DealAccessService, TokenService, AuthProviderRegistry, feature gates | identity init must succeed |
| 3 | Canvas core | CanvasService, SessionManager, WebSocketManager, edit locks, collab authority (Phases C1/C2/C4) | DealQueueService + CanvasService hard-fail; rest degrade |
| 4 | Feedback | sync Redis client, feedback memory, active-learning service, classification resolver, discourse publisher | all return None-tolerant |
| 4.5 | Pinecone metric guard | validates every index’s metric/dimension before retrieval wiring | metric check fails open; dimension mismatch fails fast |
| 4.6 | Source-type audit | per-namespace source_type=null audit for the active deal (E4) | best-effort, fail-open |
| 5 | AI services | chat, brushes, deep brushes, metric-ripple engine, hero-image enhancer, FormulaGraph cache | all non-fatal |
| 6 | Pipeline services | suggestions, notifications, admin, PDF/IP export, then Phase 6b learning + intel layer | all non-fatal |
| 7 | Doc-processing warm-up | eagerly warms SaT subprocess, Voyage reranker/embedder, Pinecone pool | non-fatal, 15s sync budget |
The half-numbered phases (4.5, 4.6) and the 6b sub-phase exist because they have a strict ordering constraint but are not full peers — the Pinecone metric guard must run before Phase 5 wires retrieval, so a misconfigured index force-disables hybrid search for the process lifetime rather than 400-ing every query.
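The ordering and failure-mode rules above can be sketched in a few lines. This is a minimal illustration, not the code in startup_phases.py: the phase functions, the StartupError class, and the PHASES table are hypothetical stand-ins, and only four phases are modelled.

```python
import asyncio
from types import SimpleNamespace


class StartupError(RuntimeError):
    """Raised when a hard-fail phase cannot complete."""


async def phase_redis(state):
    # Hypothetical stand-in for Phase 0.
    state.async_redis = object()


async def phase_identity(state):
    # Hypothetical stand-in for Phase 2.
    state.deal_access_service = object()


async def phase_canvas_core(state):
    # Phase 3 reads what Phase 2 wired; running it earlier fails here.
    if getattr(state, "deal_access_service", None) is None:
        raise StartupError("WebSocketManager needs DealAccessService")
    state.websocket_manager = object()


async def phase_ai_services(state):
    # Simulated optional-service failure, to show the non-fatal path.
    raise ConnectionError("model provider unreachable")


# (label, phase function, fatal?) in boot order.
PHASES = [
    ("0", phase_redis, True),
    ("2", phase_identity, True),
    ("3", phase_canvas_core, True),
    ("5", phase_ai_services, False),
]


async def run_phases(state, phases=PHASES):
    """Run phases in order; re-raise fatal failures, record non-fatal ones."""
    degraded = []
    for label, phase, fatal in phases:
        try:
            await phase(state)
        except Exception as exc:
            if fatal:
                raise StartupError(f"phase {label} failed: {exc}") from exc
            degraded.append(label)  # non-fatal: note it and keep booting
    return degraded


state = SimpleNamespace()
degraded = asyncio.run(run_phases(state))
```

Running Phase 3 on its own, without Phase 2 before it, raises StartupError in this sketch, which is the dependency the fixed ordering protects.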
Phase 6b — the learning and intel sub-phase
Phase 6 delegates its second half to init_learning_and_pipeline_extensions() in src/canvas/di/startup_phase_learning.py. This split exists purely to keep each file under the modularization threshold; logically it is still Phase 6. It wires the revision dispatcher, the image services, the 28-module learning pipeline (signal aggregation, expertise, trends, corpus context), the Investor Packet and Gamma services, document readiness, and finally the src/intel/ service layer via _init_intel_services(). Several services constructed earlier in the lifespan are late-bound here: the IntelService facade isn’t available until _init_intel_services completes, so SuggestionService, ChatService, and SessionManager receive it through setters after the fact.
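A minimal sketch of that late-binding step, assuming a setter named set_intel_service (the real setter names are not given here) and using toy stand-ins for the consumer and the facade:

```python
class ChatServiceSketch:
    """Hypothetical consumer that is built before the intel facade exists."""

    def __init__(self):
        self._intel = None

    def set_intel_service(self, intel):
        # Late-bind: called once the facade has been constructed.
        self._intel = intel

    def answer(self, question):
        if self._intel is None:
            return f"{question} (no intel context)"
        return f"{question} (with {self._intel.name})"


class IntelFacadeSketch:
    name = "intel"


def late_bind_intel(intel, consumers):
    """Hand the facade to every consumer that was constructed before it."""
    for consumer in consumers:
        if consumer is not None:  # a consumer whose own init failed is skipped
            consumer.set_intel_service(intel)


chat = ChatServiceSketch()
before = chat.answer("q")
late_bind_intel(IntelFacadeSketch(), [chat, None])
after = chat.answer("q")
```

The consumer stays usable before the setter is called, which is what lets it be constructed in an earlier phase.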
The five DI factory modules
Canvas does not have one God-factory. Dependency construction is split across five domain-scoped factory modules, and callers import directly from the domain module; there is no re-export shim:
- factory_core_infra.py: Redis, JWT, rate limiter, pub/sub bridge, Canvas service, WebSocket/lock/event services, collab authority, claims API client.
- factory_ai_services.py: chat, brushes, deep brushes, suggestion service, metric-ripple engine, readiness/quality scorers, image services.
- factory_pipeline_services.py: deal queue, finalization, PDF/IP export, corpus, learning, portfolio insights, OMCMS.
- factory_user_management.py: org/user/deal-access/token services, invites, notifications, admin and superadmin services.
- factory_helpers.py: shared utilities (e.g. the waitlist bridge).
The discipline that holds this together is dependency injection: factory functions inject every dependency, classes never instantiate their own dependencies, and tests inject fakes. A factory returning None (missing dependency) is a first-class outcome — the calling phase null-checks it and continues.
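As an illustration of the None-as-outcome convention, here is a small sketch. The names create_feedback_memory and init_feedback_phase are hypothetical and not taken from the factory modules; the point is only the shape: the factory injects its dependency, returns None when that dependency is missing, and the phase still sets the attribute.

```python
from types import SimpleNamespace


class FeedbackMemory:
    def __init__(self, redis_client):
        # The dependency is injected; the class never builds its own client.
        self.redis = redis_client


def create_feedback_memory(redis_client):
    """Factory: return None instead of raising when a dependency is missing."""
    if redis_client is None:
        return None
    return FeedbackMemory(redis_client)


def init_feedback_phase(state, redis_client):
    # The attribute is always set, even to None, so later lookups find it.
    state.feedback_memory = create_feedback_memory(redis_client)
    return "ok" if state.feedback_memory is not None else "degraded"


healthy, degraded = SimpleNamespace(), SimpleNamespace()
status_ok = init_feedback_phase(healthy, redis_client=object())  # a test fake
status_degraded = init_feedback_phase(degraded, redis_client=None)
```

Passing object() as the client is the same move a test makes when it injects a fake.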
The wiring-validation pass
After every phase has run, the lifespan calls validate_app_state_wiring(). This is a safety net against the single most insidious Canvas bug class: a route reads getattr(request.app.state, "some_service", None), the startup wiring sets app.state.some_servce (typo), and the route silently sees None forever. The validator closes that gap with two tiers:
- Critical attrs (hard-fail). A small frozen set of 12 attributes that Canvas cannot serve a single request without: async_redis, jwt_service, rate_limiter, deal_store, canvas_service, session_manager, websocket_manager, auth_provider_registry, organization_service, deal_access_service, user_service, token_service. If any is missing or None, startup raises.
- Expected attrs (warn-only). The union of a _ROUTE_CONSUMED_ATTRS set (every attribute routes read off app.state) and an _INTERNAL_ONLY_ATTRS set (services consumed only during later-phase DI wiring). A name that is never set during startup logs a warning naming the likely mismatch. A legitimately failed optional service passes this tier because the phase set it to None explicitly, so it still satisfies hasattr.
The distinction matters: the warn-only tier catches naming mismatches even for optional services, while the hard-fail tier catches a silently-broken core. The expected set holds on the order of a hundred attributes.
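The two tiers can be sketched as follows. This is an illustrative reduction, not the real validate_app_state_wiring(): the attribute sets are abbreviated, and the function name, logger name, and message wording are assumptions.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("canvas.wiring")

# Abbreviated, illustrative sets; the real ones are larger.
CRITICAL_ATTRS = frozenset({"async_redis", "jwt_service", "canvas_service"})
EXPECTED_ATTRS = frozenset({"chat_service", "suggestion_service"})


def validate_wiring_sketch(state, critical=CRITICAL_ATTRS, expected=EXPECTED_ATTRS):
    """Raise on a missing critical attr; warn on a never-set expected attr."""
    broken = sorted(n for n in critical if getattr(state, n, None) is None)
    if broken:
        raise RuntimeError(f"critical app.state attrs missing or None: {broken}")

    # hasattr, not a None check: an explicit None means "optional service failed".
    never_set = sorted(n for n in expected if not hasattr(state, n))
    for name in never_set:
        logger.warning("app.state.%s was never set; check for a naming mismatch", name)
    return never_set


state = SimpleNamespace(
    async_redis=object(),
    jwt_service=object(),
    canvas_service=object(),
    chat_service=None,           # optional service failed: passes the warn tier
    suggestion_servce=object(),  # typo: the expected name is never set
)
warned = validate_wiring_sketch(state)
```

Here the failed chat_service passes silently while the misspelled suggestion_servce produces a warning for suggestion_service.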
The multi-worker Redis pub/sub bridge
Canvas serves live collaborative editing: multiple analysts editing the same memo, with operational-transform (OT) versioning and presence. WebSocket connections are in-memory per worker process. The moment Canvas runs more than one worker, a broadcast originating on worker A has to reach a client connected to worker B. That is the RedisPubSubBridge’s job: it relays every WebSocket broadcast through Redis pub/sub.
The bridge is wired but gated. In Phase 0 the bridge object is created only when WEB_CONCURRENCY > 1 and REDIS_URL is set; in single-worker mode app_state.pubsub_bridge is set to None explicitly (no point burning three dedicated Redis connections nobody reads). In Phase 3, once WebSocketManager exists, the bridge is started. The gating logic is symmetric: if the bridge object exists at Phase 3, then WEB_CONCURRENCY > 1 must hold, so a failed start() is fatal. Multi-worker Canvas cannot run with a dead cross-worker relay.
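A sketch of the two halves of that gate, under stated assumptions: an unset WEB_CONCURRENCY is treated as 1, and should_create_bridge, start_bridge_if_present, and FakeBridge are hypothetical names used only for illustration.

```python
def should_create_bridge(env):
    """Phase 0 half: only multi-worker deployments with Redis get a bridge."""
    workers = int(env.get("WEB_CONCURRENCY", "1"))  # assumed default of 1
    return workers > 1 and bool(env.get("REDIS_URL"))


class FakeBridge:
    """Stand-in for the real bridge; start() can be made to fail."""

    def __init__(self, start_ok=True):
        self.start_ok = start_ok
        self.started = False

    def start(self):
        if not self.start_ok:
            raise ConnectionError("cannot subscribe")
        self.started = True


def start_bridge_if_present(bridge):
    """Phase 3 half: no bridge means single-worker; a failed start is fatal."""
    if bridge is None:
        return False
    try:
        bridge.start()
    except Exception as exc:
        raise RuntimeError("multi-worker Canvas requires a live pub/sub bridge") from exc
    return True
```

Because the object only exists when the Phase 0 condition held, the Phase 3 code does not need to re-read the environment to know that a failed start must abort the boot.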
Mechanically the bridge holds three dedicated Redis connections (subscriber, publisher, presence), prevents self-echo by tagging every envelope with a per-process worker_id and string-comparing it on receipt, and tracks presence via Redis sorted sets keyed canvas:presence:{org_id}:{deal_id} with a heartbeat. The OT versioning authority is a separate Lua-backed service (CollabAuthorityService); the bridge only moves broadcasts.
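The self-echo guard and the presence key can be illustrated without a Redis connection. The envelope layout (a JSON object with worker_id and payload fields) and the helper names are assumptions; only the per-process worker_id, the string comparison, and the key format come from the description above.

```python
import json
import uuid

WORKER_ID = uuid.uuid4().hex  # one id per worker process


def wrap(payload, worker_id=WORKER_ID):
    """Tag an outgoing broadcast with the id of the worker that produced it."""
    return json.dumps({"worker_id": worker_id, "payload": payload})


def unwrap(raw, worker_id=WORKER_ID):
    """Return the payload for other workers' messages, None for our own echo."""
    envelope = json.loads(raw)
    if envelope["worker_id"] == worker_id:  # plain string comparison
        return None
    return envelope["payload"]


def presence_key(org_id, deal_id):
    """Sorted-set key under which presence heartbeats are tracked."""
    return f"canvas:presence:{org_id}:{deal_id}"


own = unwrap(wrap({"op": "insert"}))
foreign = unwrap(wrap({"op": "insert"}, worker_id="another-worker"))
```

A worker that publishes and subscribes on the same channel would otherwise deliver each local broadcast twice, once directly and once via Redis.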
The Intel Service Layer — one surface in Canvas, five services in the worker
The src/intel/ package is the unified read surface for intelligence: quality signals, the editorial playbook, document readiness, FormulaGraph provenance, and the cross-deal property graph. It is consumed two structurally different ways, and the deliberate decision is not to unify them behind a shared facade.
Canvas: the IntelService facade
In Canvas, IntelService (src/intel/service.py) is a single injection surface reachable at request.app.state.intel_service on every route, or via constructor injection on Canvas services. It is a pure namespace holder: attributes are direct references to the underlying services, with no delegation boilerplate and no enforcement layer of its own (tenancy isolation stays on the underlying services, e.g. SignalsService.query_team). One injection point replaces what would otherwise be a dozen individual service references threaded through Deep Brushes, the suggestion service, Canvas chat, web-chat intake, transform, IP export, the approval panel, playbook_tool, and QualityIntelligenceEnricher.
The facade wires fourteen services: seven required (signals, prediction, intelligence, patterns, candidates, readiness_delta, compliance_gate) and seven optional passthroughs that resolve to None if their upstream init failed (readiness, expertise, style, trend, formula_graph, deal_graph, cross_deal_resolver). _init_intel_services() only constructs the facade when all seven required services are non-None; consumers null-check the optional attributes per call so a degraded boot still functions.
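The all-or-nothing construction rule might look like the sketch below. The attribute names are the fourteen listed above; build_intel_facade and the use of SimpleNamespace as the namespace holder are illustrative choices, not the actual IntelService class.

```python
from types import SimpleNamespace

REQUIRED = ("signals", "prediction", "intelligence", "patterns",
            "candidates", "readiness_delta", "compliance_gate")
OPTIONAL = ("readiness", "expertise", "style", "trend",
            "formula_graph", "deal_graph", "cross_deal_resolver")


def build_intel_facade(services):
    """Return a holder of direct references, or None if a required service is missing."""
    if any(services.get(name) is None for name in REQUIRED):
        return None
    attrs = {name: services[name] for name in REQUIRED}
    # Optional passthroughs resolve to None when their upstream init failed.
    attrs.update({name: services.get(name) for name in OPTIONAL})
    return SimpleNamespace(**attrs)


wired = {name: object() for name in REQUIRED}
wired["style"] = "style-service"
facade = build_intel_facade(wired)

# Consumers null-check optional attributes per call.
graph_hint = "graph" if facade.deal_graph is not None else "no graph"
```

With any required entry set to None the function returns None instead of a partially populated holder, so consumers never see a facade with a missing required service.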
Why the facade is Canvas-only
IntelService is Canvas-only by design, and the reason is concrete: two of its required services, readiness_delta (DeltaService) and compliance_gate (ComplianceGate), depend on DocumentReadinessService, which depends on Canvas’s document-state machine (DRAFT → EDITING → APPROVED). That state machine does not exist outside Canvas. The worker has no documents, no approval transitions, no readiness.
Worker/web: five direct services
The worker and the API service take the exact narrow slice of intelligence they need, injected directly, never through a cross-process facade wrapper. src/di/main.py constructs five concrete services and registers them on the services dict: signals_service, trend_service, style_service, expertise_service, and patterns_service. These are threaded as keyword arguments through DealAnalysisOrchestrator into QualityIntelligenceEnricher, every BaseDeepAgent subclass, and IntakeCoordinator; playbook_tool takes patterns_service directly.
The anti-pattern this avoids is a partial IntelService with Optional[...] readiness fields. A facade whose readiness services are sometimes-None would be two contracts sharing one class — a consumer could not tell “running in the worker” from “wiring bug.” Two clean shapes (full facade in Canvas, narrow direct services in the worker) is the correct decomposition. The two contracts never meet.
The document lifecycle Canvas enforces
Canvas owns the document state machine, the constraint that makes the readiness services Canvas-only and that gates every export. It has exactly three states: DRAFT → EDITING → APPROVED. APPROVED → EDITING (reopen) is the only backward transition; otherwise APPROVED is terminal. The legacy EXPORTED and GENERATING states no longer exist; on session restore, a legacy "exported" value is silently coerced to "approved". Approval is what unlocks the Investor Packet, Canvas PDF, and OMCMS exports. The full lifecycle and what approval unlocks are documented on the approval gate page; the system-wide map is System Overview.
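A compact sketch of the three-state machine, assuming lowercase stored values (consistent with the "exported" to "approved" coercion) and hypothetical names DocState, transition, restore_state, and can_export. How other legacy values are handled on restore is not covered here.

```python
from enum import Enum


class DocState(str, Enum):
    DRAFT = "draft"
    EDITING = "editing"
    APPROVED = "approved"


ALLOWED = {
    (DocState.DRAFT, DocState.EDITING),
    (DocState.EDITING, DocState.APPROVED),
    (DocState.APPROVED, DocState.EDITING),  # reopen: the only backward move
}


def transition(current, target):
    """Return the new state, or raise if the move is not in the allowed set."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def restore_state(raw):
    """Coerce the legacy 'exported' value to APPROVED on session restore."""
    if raw == "exported":
        return DocState.APPROVED
    return DocState(raw)


def can_export(state):
    """Exports are unlocked only by approval."""
    return state is DocState.APPROVED


state = transition(DocState.DRAFT, DocState.EDITING)
state = transition(state, DocState.APPROVED)
restored = restore_state("exported")
```

Skipping straight from DRAFT to APPROVED raises ValueError in this sketch, since that pair is not in the allowed set.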
Sources
- src/canvas/di/startup_phases.py: the phase functions (init_monitoring through init_doc_processing_warm), the critical/expected attribute sets, and validate_app_state_wiring()
- src/canvas/di/startup_phase_learning.py: Phase 6b learning + _init_intel_services() (the intel-layer construction and late-binds)
- src/canvas/di/factory_core_infra.py, factory_ai_services.py, factory_pipeline_services.py, factory_user_management.py, factory_helpers.py: the five domain DI factories
- src/intel/service.py: IntelService facade (7 required + 7 optional services; the Canvas-only rationale in its docstring)
- src/di/main.py: worker/web direct intel-service registration (signals_service, trend_service, style_service, expertise_service, patterns_service) and orchestrator threading
- src/canvas/app.py: the create_app factory and lifespan that drives the phases
- memory/canvas_multi_worker_readiness.md: pub/sub bridge topology, presence, collab authority, single-vs-multi-worker history
- memory/observability_ducklake_migration.md: the WEB_CONCURRENCY=4 production reality and its Postgres-catalog consequence
- .claude/rules/10-domains.md, .claude/rules/20-patterns.md: Canvas DI, the intelligence-consumption pattern, file routing