Canvas Architecture

Canvas is the FastAPI application analysts edit memos in. This page is for engineers working inside Canvas: how it boots, how its ~110 services are wired, how it becomes multi-worker-safe, and why intelligence is consumed two different ways depending on whether you are in Canvas or in the worker.

The app is a factory (create_app in src/canvas/app.py) with a lifespan context manager. The hard rule is that no startup or shutdown logic lives in app.py — every initialization step is a numbered phase in src/canvas/di/startup_phases.py. To add a service you add it to a phase, not to the lifespan.
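The rule can be sketched with standard-library stand-ins. This is a minimal illustration, not the real app.py: the phase bodies, the PHASES list, and the SimpleNamespace "app" are assumptions made so the example runs without FastAPI, and each phase takes the state object directly as a simplification.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


# Hypothetical stand-ins for phase functions that would live in startup_phases.py.
async def init_monitoring(state):
    state.monitoring = True


async def init_redis_and_auth(state):
    state.async_redis = object()  # placeholder for a real async Redis client


# Order is significant: later phases rely on what earlier phases set.
PHASES = [init_monitoring, init_redis_and_auth]


def create_app():
    """Factory: builds the app object and attaches a lifespan that only drives phases."""
    app = SimpleNamespace(state=SimpleNamespace())

    @asynccontextmanager
    async def lifespan(app):
        # No initialization logic here, only the phase loop.
        for phase in PHASES:
            await phase(app.state)
        yield
        # Shutdown steps would be delegated the same way.

    app.lifespan = lifespan
    return app


async def _demo():
    app = create_app()
    async with app.lifespan(app):
        return sorted(vars(app.state))


booted_attrs = asyncio.run(_demo())
```

Adding a service in this sketch means editing a phase function, never the lifespan body.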

Canvas boots through a fixed sequence of phases. Each phase receives the services it needs as parameters and sets its results directly on app.state. Ordering is not cosmetic — a later phase depends on the wiring an earlier phase did (Phase 3’s WebSocketManager needs Phase 2’s DealAccessService; Phase 6’s intel layer needs Phase 5’s DealIntelligenceContextService). The phases are:

| Phase | Name | What it wires | Failure mode |
|---|---|---|---|
| -1 | Monitoring | LangSmith tracing guard (init_monitoring) | non-fatal |
| 0 | Redis + auth | async Redis client, JWT service, rate limiter, pub/sub bridge object | hard-fail if Redis is unavailable |
| 1 | Deal store | DealStoreService resolved from shared infra DI; pre-warms the Postgres pool | hard-fail: no memo store, no Canvas |
| 2 | Identity | Postgres-backed UserService, OrganizationService, DealAccessService, TokenService, AuthProviderRegistry, feature gates | identity init must succeed |
| 3 | Canvas core | CanvasService, SessionManager, WebSocketManager, edit locks, collab authority (Phases C1/C2/C4) | DealQueueService + CanvasService hard-fail; rest degrade |
| 4 | Feedback | sync Redis client, feedback memory, active-learning service, classification resolver, discourse publisher | all return None-tolerant |
| 4.5 | Pinecone metric guard | validates every index’s metric/dimension before retrieval wiring | metric check fails open; dimension mismatch fails fast |
| 4.6 | Source-type audit | per-namespace source_type=null audit for the active deal (E4) | best-effort, fail-open |
| 5 | AI services | chat, brushes, deep brushes, metric-ripple engine, hero-image enhancer, FormulaGraph cache | all non-fatal |
| 6 | Pipeline services | suggestions, notifications, admin, PDF/IP export, then Phase 6b learning + intel layer | all non-fatal |
| 7 | Doc-processing warm-up | eagerly warms SaT subprocess, Voyage reranker/embedder, Pinecone pool | non-fatal, 15s sync budget |

The half-numbered phases (4.5, 4.6) and the 6b sub-phase exist because they have a strict ordering constraint but are not full peers — the Pinecone metric guard must run before Phase 5 wires retrieval, so a misconfigured index force-disables hybrid search for the process lifetime rather than 400-ing every query.
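The guard's two failure modes can be illustrated as follows. This is a sketch under stated assumptions: IndexInfo and guard_indexes are hypothetical names, and the choice of "dotproduct" as the metric hybrid search needs is an assumption rather than something this page states.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical description of one vector index."""
    name: str
    metric: str
    dimension: int


def guard_indexes(
    indexes: Iterable[IndexInfo],
    expected_dimension: int,
    hybrid_metric: str = "dotproduct",  # assumption: the metric hybrid search requires
) -> bool:
    """Return True if hybrid search may stay enabled for this process."""
    hybrid_enabled = True
    for index in indexes:
        if index.dimension != expected_dimension:
            # Fail fast: embeddings of the wrong size can never be queried.
            raise RuntimeError(
                f"index {index.name!r}: dimension {index.dimension} != {expected_dimension}"
            )
        if index.metric != hybrid_metric:
            # Fail open: keep booting, but switch hybrid search off up front.
            hybrid_enabled = False
    return hybrid_enabled
```

Because the result is computed once before retrieval is wired, a wrong metric costs the process its hybrid search instead of surfacing as an error on every query.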

Phase 6b — the learning and intel sub-phase

Phase 6 delegates its second half to init_learning_and_pipeline_extensions() in src/canvas/di/startup_phase_learning.py. This split exists purely to keep each file under the modularization threshold; logically it is still Phase 6. It wires the revision dispatcher, the image services, the 28-module learning pipeline (signal aggregation, expertise, trends, corpus context), the Investor Packet and Gamma services, document readiness — and finally the src/intel/ service layer via _init_intel_services(). Several of these services are constructed earlier in the lifespan and then late-bound here (the IntelService facade isn’t available until _init_intel_services completes, so SuggestionService, ChatService, and SessionManager receive it through setters after the fact).
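Late binding through setters might look like the sketch below. The setter name set_intel_service and the helper late_bind_intel are hypothetical; the page only says the consumers receive the facade "through setters".

```python
class SuggestionService:
    """Stand-in for a service constructed before the intel facade exists."""

    def __init__(self):
        self._intel = None  # unknown at construction time

    def set_intel_service(self, intel):  # hypothetical setter name
        self._intel = intel

    @property
    def has_intel(self):
        return self._intel is not None


def late_bind_intel(intel, consumers):
    """Hand the facade to every already-built consumer; return how many were bound."""
    if intel is None:
        return 0  # degraded boot: consumers keep running without intel
    bound = 0
    for consumer in consumers:
        if consumer is not None:  # an earlier phase may have set this service to None
            consumer.set_intel_service(intel)
            bound += 1
    return bound
```

The consumers stay usable before binding, which is what lets them be constructed in earlier phases.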

Canvas does not have one God-factory. Dependency construction is split across five domain-scoped factory modules, and callers import directly from the domain module — there is no re-export shim:

  • factory_core_infra.py — Redis, JWT, rate limiter, pub/sub bridge, Canvas service, WebSocket/lock/event services, collab authority, claims API client.
  • factory_ai_services.py — chat, brushes, deep brushes, suggestion service, metric-ripple engine, readiness/quality scorers, image services.
  • factory_pipeline_services.py — deal queue, finalization, PDF/IP export, corpus, learning, portfolio insights, OMCMS.
  • factory_user_management.py — org/user/deal-access/token services, invites, notifications, admin and superadmin services.
  • factory_helpers.py — shared utilities (e.g. the waitlist bridge).

The discipline that holds this together is dependency injection: factory functions inject every dependency, classes never instantiate their own dependencies, and tests inject fakes. A factory returning None (missing dependency) is a first-class outcome — the calling phase null-checks it and continues.
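A minimal sketch of that discipline, with hypothetical names throughout (create_chat_service, init_ai_services, and the constructor arguments are illustrative, not the real signatures):

```python
from types import SimpleNamespace
from typing import Optional


class ChatService:
    """The class never builds its own dependencies; it only receives them."""

    def __init__(self, redis, llm_client):
        self.redis = redis
        self.llm_client = llm_client


def create_chat_service(redis, llm_client) -> Optional[ChatService]:
    """Factory: returning None for a missing dependency is a normal outcome."""
    if redis is None or llm_client is None:
        return None
    return ChatService(redis, llm_client)


def init_ai_services(state, redis, llm_client) -> None:
    """Phase: sets the result on state, even when it is None, and continues."""
    state.chat_service = create_chat_service(redis, llm_client)


# Tests inject fakes through the same factory the phase uses.
fake_state = SimpleNamespace()
init_ai_services(fake_state, redis=object(), llm_client=object())

degraded_state = SimpleNamespace()
init_ai_services(degraded_state, redis=object(), llm_client=None)
```

In the degraded case the attribute still exists on state with the value None, so later null-checks can tell a failed optional service from a name that was never set.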

After every phase runs, the lifespan calls validate_app_state_wiring(). This is a safety net against the single most insidious Canvas bug class: a route reads getattr(request.app.state, "some_service", None), the startup wiring sets app.state.some_servce (typo), and the route silently sees None forever. The validator closes that gap with two tiers:

  • Critical attrs (hard-fail). A small frozen set — async_redis, jwt_service, rate_limiter, deal_store, canvas_service, session_manager, websocket_manager, auth_provider_registry, organization_service, deal_access_service, user_service, token_service (12 in total) — that Canvas cannot serve a single request without. If any is missing or None, startup raises.
  • Expected attrs (warn-only). The union of a _ROUTE_CONSUMED_ATTRS set (every attribute routes read off app.state) and an _INTERNAL_ONLY_ATTRS set (services consumed only during later-phase DI wiring). A name that is never set during startup logs a warning naming the likely mismatch. A legitimately failed optional service passes this tier because the phase set it to None explicitly — so it still satisfies hasattr.

The distinction matters: the warn-only tier catches naming mismatches even for optional services, while the hard-fail tier catches a silently-broken core. The expected set holds on the order of a hundred attributes.
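The two tiers can be sketched as below. The critical names are the twelve listed above; the expected set is cut down to two illustrative names, and the logger name and return value are assumptions.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("canvas.startup")  # hypothetical logger name

CRITICAL_ATTRS = frozenset({
    "async_redis", "jwt_service", "rate_limiter", "deal_store",
    "canvas_service", "session_manager", "websocket_manager",
    "auth_provider_registry", "organization_service",
    "deal_access_service", "user_service", "token_service",
})

# Illustrative subset; the real expected set is far larger.
EXPECTED_ATTRS = frozenset({"chat_service", "intel_service"})


def validate_app_state_wiring(state):
    """Raise on a broken core; warn about (and return) names that were never set."""
    missing = sorted(n for n in CRITICAL_ATTRS if getattr(state, n, None) is None)
    if missing:
        raise RuntimeError(f"critical app.state attrs missing or None: {missing}")

    # hasattr, not a None check: an optional service explicitly set to None passes.
    never_set = sorted(n for n in EXPECTED_ATTRS if not hasattr(state, n))
    for name in never_set:
        logger.warning("app.state.%s was never set; check for a naming mismatch", name)
    return never_set


state = SimpleNamespace(**{name: object() for name in CRITICAL_ATTRS})
state.chat_service = None        # legitimately failed optional service
state.intel_servce = object()    # typo: the expected name is never set
warned = validate_app_state_wiring(state)
```

Here chat_service passes the warn-only tier despite being None, while the misspelled intel attribute is reported under its expected name.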

Canvas serves live collaborative editing — multiple analysts editing the same memo, with operational-transform (OT) versioning and presence. WebSocket connections are in-memory per worker process. The moment Canvas runs more than one worker, a broadcast originating on worker A has to reach a client connected to worker B. That is the RedisPubSubBridge’s job: it relays every WebSocket broadcast through Redis pub/sub.

The bridge is wired but gated. In Phase 0 the bridge object is created only when WEB_CONCURRENCY > 1 and REDIS_URL is set; in single-worker mode the phase sets app_state.pubsub_bridge = None explicitly, since there is no point holding three dedicated Redis connections that nobody reads. In Phase 3, once WebSocketManager exists, the bridge is started. The two gates agree: a bridge object that exists at Phase 3 implies WEB_CONCURRENCY > 1, so a failed start() is fatal. Multi-worker Canvas cannot run with a dead cross-worker relay.
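The gating can be sketched in two small functions. The function names are hypothetical, and treating an unset WEB_CONCURRENCY as 1 is an assumption.

```python
import asyncio
from typing import Mapping


def should_create_bridge(env: Mapping[str, str]) -> bool:
    """Phase 0 gate: only build the bridge for multi-worker runs with Redis configured."""
    workers = int(env.get("WEB_CONCURRENCY", "1"))  # assumption: unset means one worker
    return workers > 1 and bool(env.get("REDIS_URL"))


async def start_bridge_if_present(bridge) -> bool:
    """Phase 3 gate: no bridge means single-worker; a bridge that cannot start is fatal."""
    if bridge is None:
        return False
    try:
        await bridge.start()
    except Exception as exc:
        raise RuntimeError("pub/sub bridge failed to start in multi-worker mode") from exc
    return True


class FakeBridge:
    """Test double standing in for the real bridge."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.started = False

    async def start(self):
        if self.fail:
            raise ConnectionError("redis unreachable")
        self.started = True
```

Phase 3 never re-reads the environment: the presence of the object is the only signal it needs.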

Mechanically the bridge holds three dedicated Redis connections (subscriber, publisher, presence), prevents self-echo by tagging every envelope with a per-process worker_id and string-comparing it on receipt, and tracks presence via Redis sorted sets keyed canvas:presence:{org_id}:{deal_id} with a heartbeat. The OT versioning authority is a separate Lua-backed service (CollabAuthorityService); the bridge only moves broadcasts.
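The self-echo check and the presence key can be illustrated without a Redis connection. This is a sketch: the JSON envelope shape, the helper names, and the 30-second staleness window are assumptions; only the worker_id tag and the key format come from the description above.

```python
import json
import uuid
from typing import Dict, List

WORKER_ID = uuid.uuid4().hex  # one id per process


def wrap(payload: dict, worker_id: str = WORKER_ID) -> str:
    """Tag an outgoing broadcast with the id of the worker that published it."""
    return json.dumps({"worker_id": worker_id, "payload": payload})


def is_own_echo(raw: str, worker_id: str = WORKER_ID) -> bool:
    """String-compare the tag so a worker drops messages it published itself."""
    return str(json.loads(raw).get("worker_id")) == worker_id


def presence_key(org_id: str, deal_id: str) -> str:
    return f"canvas:presence:{org_id}:{deal_id}"


def live_users(heartbeats: Dict[str, float], now: float, ttl_seconds: float = 30.0) -> List[str]:
    """Model of a sorted set scored by last heartbeat: stale members drop out."""
    return sorted(user for user, seen in heartbeats.items() if now - seen <= ttl_seconds)
```

Scoring members by heartbeat time means a crashed worker's users age out on their own, with no explicit cleanup message.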

The Intel Service Layer — one surface in Canvas, five services in the worker

The src/intel/ package is the unified read surface for intelligence: quality signals, the editorial playbook, document readiness, FormulaGraph provenance, and the cross-deal property graph. It is consumed two structurally different ways, and the deliberate decision is not to unify them behind a shared facade.

In Canvas, IntelService (src/intel/service.py) is a single injection surface reachable at request.app.state.intel_service on every route, or via constructor injection on Canvas services. It is a pure namespace holder — attributes are direct references to the underlying services, with no delegation boilerplate and no enforcement layer of its own (tenancy isolation stays on the underlying services, e.g. SignalsService.query_team). One injection point replaces what would otherwise be a dozen individual service references threaded through Deep Brushes, the suggestion service, Canvas chat, web-chat intake, transform, IP export, the approval panel, playbook_tool, and QualityIntelligenceEnricher.

The facade wires fourteen services: seven required (signals, prediction, intelligence, patterns, candidates, readiness_delta, compliance_gate) and seven optional passthroughs that resolve to None if their upstream init failed (readiness, expertise, style, trend, formula_graph, deal_graph, cross_deal_resolver). _init_intel_services() only constructs the facade when all seven required services are non-None; consumers null-check the optional attributes per call so a degraded boot still functions.
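A namespace-holder facade with this required/optional split might be sketched as follows. The attribute names are the fourteen listed above; the constructor shape and the build_intel_service helper are hypothetical.

```python
REQUIRED = (
    "signals", "prediction", "intelligence", "patterns",
    "candidates", "readiness_delta", "compliance_gate",
)
OPTIONAL = (
    "readiness", "expertise", "style", "trend",
    "formula_graph", "deal_graph", "cross_deal_resolver",
)


class IntelService:
    """Pure namespace holder: attributes are direct references, no delegation."""

    def __init__(self, **services):
        for name in REQUIRED:
            setattr(self, name, services[name])      # KeyError if a required one is absent
        for name in OPTIONAL:
            setattr(self, name, services.get(name))  # None when upstream init failed


def build_intel_service(services):
    """Construct the facade only when every required service is non-None."""
    if any(services.get(name) is None for name in REQUIRED):
        return None
    return IntelService(**services)


available = {name: object() for name in REQUIRED}
intel = build_intel_service(available)

# Consumers null-check optional attributes per call.
graph_enabled = intel is not None and intel.deal_graph is not None
```

A consumer holding a non-None facade can use the seven required attributes without checks and guards only the optional ones.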

IntelService is Canvas-only by design, and the reason is concrete: two of its required services — readiness_delta (DeltaService) and compliance_gate (ComplianceGate) — depend on DocumentReadinessService, which depends on Canvas’s document-state machine (DRAFT → EDITING → APPROVED). That state machine does not exist outside Canvas. The worker has no documents, no approval transitions, no readiness.

The worker and the API service take the exact narrow slice of intelligence they need, injected directly — never a cross-process facade wrapper. src/di/main.py constructs five concrete services and registers them on the services dict: signals_service, trend_service, style_service, expertise_service, and patterns_service. These are threaded as keyword arguments through DealAnalysisOrchestrator into QualityIntelligenceEnricher, every BaseDeepAgent subclass, and IntakeCoordinator; playbook_tool takes patterns_service directly.
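The worker-side shape can be sketched like this. The five keys come from the paragraph above; the Enricher class is a stand-in for a consumer such as QualityIntelligenceEnricher, and its keyword-only signature is an assumption.

```python
INTEL_KEYS = (
    "signals_service", "trend_service", "style_service",
    "expertise_service", "patterns_service",
)


class Enricher:
    """Stand-in consumer: takes concrete services as keywords, never a facade."""

    def __init__(self, *, signals_service, trend_service, style_service,
                 expertise_service, patterns_service):
        self.signals_service = signals_service
        self.trend_service = trend_service
        self.style_service = style_service
        self.expertise_service = expertise_service
        self.patterns_service = patterns_service


def intel_kwargs(services: dict) -> dict:
    """Pick exactly the intel slice out of the wider services dict."""
    return {key: services[key] for key in INTEL_KEYS}


# The services dict would hold many unrelated entries as well.
services = {key: object() for key in INTEL_KEYS}
services["deal_store"] = object()

enricher = Enricher(**intel_kwargs(services))
```

Nothing readiness-related appears in the signature, so a consumer cannot accidentally depend on Canvas-only state.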

The anti-pattern this avoids is a partial IntelService with Optional[...] readiness fields. A facade whose readiness services are sometimes-None would be two contracts sharing one class — a consumer could not tell “running in the worker” from “wiring bug.” Two clean shapes (full facade in Canvas, narrow direct services in the worker) is the correct decomposition. The two contracts never meet.

Canvas owns the document state machine, the constraint that makes the readiness services Canvas-only and that gates every export. It has exactly three states: DRAFT → EDITING → APPROVED. APPROVED → EDITING (reopen) is the only backward transition; otherwise APPROVED is terminal. The legacy EXPORTED and GENERATING states no longer exist: on session restore, a legacy "exported" value is silently coerced to "approved". Approval is what unlocks the Investor Packet, Canvas PDF, and OMCMS exports. The full lifecycle and what approval unlocks are documented on the approval gate page; the system-wide map is System Overview.
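The transitions described here fit in a small table-driven sketch. The enum, the lowercase string values, and the function names are assumptions; so is the reading that DRAFT cannot jump straight to APPROVED. How a legacy "generating" value is handled is not stated, so this sketch leaves it unhandled.

```python
from enum import Enum


class DocState(str, Enum):
    DRAFT = "draft"
    EDITING = "editing"
    APPROVED = "approved"


# Forward path plus the single backward transition (reopen).
ALLOWED = {
    DocState.DRAFT: {DocState.EDITING},
    DocState.EDITING: {DocState.APPROVED},
    DocState.APPROVED: {DocState.EDITING},
}


def transition(current: DocState, target: DocState) -> DocState:
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def restore_state(raw: str) -> DocState:
    """Session restore: a legacy 'exported' value is coerced to approved."""
    if raw == "exported":
        return DocState.APPROVED
    return DocState(raw)  # raises ValueError for any other unknown value


def can_export(state: DocState) -> bool:
    """Exports are unlocked by approval only."""
    return state is DocState.APPROVED
```

Reopening an approved document moves it back to EDITING, which locks exports again until it is re-approved.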

  • src/canvas/di/startup_phases.py — the phase functions (init_monitoring through init_doc_processing_warm), the critical/expected attribute sets, and validate_app_state_wiring()
  • src/canvas/di/startup_phase_learning.py — Phase 6b learning + _init_intel_services() (the intel-layer construction and late-binds)
  • src/canvas/di/factory_core_infra.py, factory_ai_services.py, factory_pipeline_services.py, factory_user_management.py, factory_helpers.py — the five domain DI factories
  • src/intel/service.py — IntelService facade (7 required + 7 optional services; the Canvas-only rationale in its docstring)
  • src/di/main.py — worker/web direct intel-service registration (signals_service, trend_service, style_service, expertise_service, patterns_service) and orchestrator threading
  • src/canvas/app.py — the create_app factory and lifespan that drives the phases
  • memory/canvas_multi_worker_readiness.md — pub/sub bridge topology, presence, collab authority, single-vs-multi-worker history
  • memory/observability_ducklake_migration.md — the WEB_CONCURRENCY=4 production reality and its Postgres-catalog consequence
  • .claude/rules/10-domains.md, .claude/rules/20-patterns.md — Canvas DI, the intelligence-consumption pattern, file routing