Observability and Infrastructure

This page covers two things engineers reach for constantly: the analytical observability warehouse that every cross-deal read flows through, and the deployment substrate — the Railway services, the per-service Postgres pool budgets, and the Pinecone connection pool — that the whole system runs on.

The observability warehouse

Memosa has its own self-hosted lakehouse for telemetry: a DuckLake warehouse (DuckDB 1.5.2 engine). The catalog lives on the existing Postgres; the data files live in a native Railway Bucket (S3-compatible — not AWS). For local dev and tests, a local file catalog stands in. This replaced an earlier Parquet-glob design, and the replacement is total: the old ObservabilityReader, the read_parquet glob, and union_by_name are deleted.

Two halves: producers write, consumers read

The write path and the read path are deliberately separate.

Producers call get_observability_writer().append(table, row) from hot paths. append is synchronous, cheap, and NEVER raises — a production hot path can emit telemetry without any risk of breaking on a warehouse problem. The writer buffers rows (default flush at 100 rows or every 30 seconds ) and a single daemon flush thread serializes them into the catalog via insert_arrow. One flush thread is not an accident — serializing writes sidesteps a known multithreaded-Python DuckLake+Postgres deadlock.

Consumers read through ObservabilityLake — a process-wide singleton (get_observability_lake()), one lock-guarded, attach-once connection per process, with heal-on-error. There is also a read_only attach mode (used by the local telemetry MCP analyst surface) that attaches the catalog READ_ONLY and skips schema creation.

The `ObservabilityLake` singleton — why attach-once

The load-bearing design constraint is connection budget. DuckDB’s Postgres extension opens its own connection pool per attached database. The deleted glob reader created a fresh in-memory connection on every query() call (15+ call sites) — which would exhaust the per-service Postgres budgets the moment cross-deal reads got busy. The fix is a single shared attached connection per process, with the catalog’s Postgres pool capped at pg_pool_max_connections = 2 so each service process holds ~2 catalog connections, not dozens.

The lake heals on duckdb.Error: any error disposes the connection and the next call re-attaches (and re-runs CREATE TABLE IF NOT EXISTS for every registered table, so reads are safe before the first write). Tables are month(ts)-partitioned on first appearance, guarded so a partition spec doesn’t churn a fresh snapshot on every attach.

The registry — 15 tables, the source of truth

TABLE_SCHEMAS in src/observability/schemas.py is the registry and the source of truth. It currently holds 15 tables — among them deal_finalizations, retrieval_events, agent_discourse_events, user_activity_events, llm_cost_events, readiness_snapshots, pinecone_upsert_events, formula_graph_events, sql_invariant_findings, conflict_resolution_events, excel_zero_chunks_attribution, semantic_critique_events, and checkpoint_size_events. The count drifts upward as tables are added; _attach iterates the registry dynamically, so a new table is created and partitioned on the next attach without a migration.

The consumer pattern

Product-facing services that need cross-deal / analytical reads use the warehouse, not a fresh round of Postgres aggregation. AuditTrailService is the architectural precedent; about eleven services follow the same shape. The pattern is fixed:

Stateless class, constructor (reader: Optional[ObservabilityLake] = None) defaulting to get_observability_lake() — the process-wide singleton, never a per-request connect.
Parameter-bound SQL via reader.query(sql, params). Table and column names are hardcoded; every external value is bound through a ? placeholder.
A _safe_query wrapper catches (duckdb.Error, Exception) and returns [], logging at DEBUG. Observability reads must NEVER break a route.
ISO-stringify timestamps before returning — DuckDB returns datetime objects; FastAPI needs JSON-serializable output.

Most consumers are route-backed Canvas dashboards (PortfolioInsightsService, UserActivityInsightsService, AuditTrailService, …), but the pattern also runs on hot paths. NamespaceCountService reads SUM(chunks_uploaded) from pinecone_upsert_events as a ground-truth count during source-type recovery, and RetrievalHealthService aggregates retrieval_events per section on the worker’s live memo-generation path. Both are synchronous, shared-singleton, _safe_query-guarded reads — the live-path precedent for treating the warehouse as an active input, not just a dashboard backend.

Anonymity is enforced in Python, not SQL

PortfolioInsightsService surfaces only aggregate statistics — medians, counts, ranges — never an individual deal’s name, ID, or raw metrics. The privacy floor is N ≥ 3 deals before any statistic surfaces, and that guard lives in the Python service layer, not in a SQL HAVING clause. Every aggregation query returns a single row, and the service maps a below-threshold count to None (if deal_count < _MIN_DEALS_FOR_AGGREGATE: return None, repeated across each aggregation method). _MIN_DEALS_FOR_AGGREGATE is 3 .

This is intentional on two counts: it keeps the SQL portable (no version-sensitive HAVING semantics), and it makes the privacy guard explicit and auditable in code rather than buried in a query. The universal base filter underneath every aggregation is finalization_status = 'success' — degraded finalizations are recorded but excluded, because their numeric fields (cap rate, LTV, …) are unreliable.

Schema evolution discipline

Schema changes are native ALTER TABLE obs.<table> ADD COLUMN (run by ObservabilityLake). This is safe because old snapshots surface NULL for the new column, and DuckDB’s MEDIAN and AVG ignore NULL naturally — so a historical row still contributes to COUNT(*) but not to a median, which is the right semantic for aggregate stats. A contract test (tests/unit/observability/test_schema_additivity.py) locks this by doing an ADD COLUMN round-trip and asserting the prior row surfaces NULL.

What is forbidden: removing a column, retyping a column, or reusing a name for a different meaning. If you must change semantics, bump a new table name instead.

The Railway deployment substrate

Memosa runs on Railway across two environments — production (main branch, custom memosa.io domains) and staging (staging branch, Railway-generated URLs). Both run the same code; branch→environment mapping is via deployment triggers.

Application and infrastructure services

Production runs four application services; staging runs a fifth, Evergreen (the nightly autonomous audit worker — production is intentionally asymmetric at four). Plus two Railway-managed infrastructure services.

Dashboard name	`railway.json` key	Role
API	`web`	Backend HTTP API — health endpoint, image serving, REST endpoints for external callers
LangGraph	`worker`	Slack bot + LangGraph deal-analysis workflows (deep agents, RAG) — the heavy AI tier
Web App	`canvas`	Canvas document editor — FastAPI + ProseMirror, JWT/RBAC, brushes; runs `WEB_CONCURRENCY=4`
Marketing Site	`landing`	Public landing page + waitlist — lightweight uvicorn, Resend email
Evergreen (staging only)	`evergreen`	Nightly P-series audit worker (`python -m src.evergreen_worker.main`); APScheduler fires midnight Pacific
Redis-Checkpointer	(infra)	LangGraph `AsyncRedisSaver`; Canvas pub/sub bridge, edit locks, presence, caches
Postgres	(infra)	Structured data via `PGStoreClient`; also hosts the DuckLake observability catalog

There is also a daily DuckLake Maintenance cron service that runs scripts/ducklake_maintenance.py — flush-inlined / merge / expire / cleanup — to materialize inlined rows to Parquet and bound catalog growth.

Per-service Postgres pool budgets

Railway terminates idle Postgres connections at roughly 60 seconds, and its usable connection ceiling is 97 across all services. Earlier, every service claimed a full ~94-connection budget independently, giving a worst case of 4 × 97 = 388 connections against a 97 ceiling. It only ever worked because min_size=0 (on-demand) plus admission control held the real peak far below capacity.

The current design gives each service a budgeted slice, resolved at startup from RAILWAY_SERVICE_NAME (Railway-provided service identity — immutable per service, so it is identity, not a config knob, and does not violate the no-env-var rule). The budgets are sized so asyncpg pools + DuckLake catalog connections stay under the ceiling:

Service	asyncpg `max_size`	DuckLake catalog conns	Total
`worker` (LangGraph)	38	2 (single process)	40
`canvas` (Web App)	20	8 (4 workers × 2)	28
`web` (API)	15	0	15
`landing` (Marketing Site)	5	0	5
`evergreen` (staging only)	5	0	5
DuckLake Maintenance cron	~0	2 (transient daily)	2
Grand total			95 (≤ 97)

The canvas row is the subtle one. The DuckLake reader is a per-process singleton, and the Web App runs WEB_CONCURRENCY=4 . So Canvas holds 4 workers × 2 catalog connections = 8, not 2 — a single-worker assumption here undershot by 6 and pushed the real worst case to 101. The asyncpg budget was lowered to 20 (20 + 8 = 28 ≤ 30) to absorb it. worker is single-process (slack_bot.py), so its writer and NamespaceCountService share one lake — 2 catalog connections.

A second mechanism keeps per-process pools bounded under scale-up: _calculate_pool_max_size() divides the service budget by max(RAILWAY_REPLICA_COUNT, 1) × max(WEB_CONCURRENCY, 1), with a floor of 4 per process.

The Pinecone connection pool

Retrieval runs against Pinecone serverless through a connection pool sized in PineconeConfig (src/config/pinecone_config.py). The values are production-validated and the frozen dataclass refuses construction below their floors:

pool_size = 96 . Raised through 50 → 72 → 96, each bump a response to pool exhaustion during cascading low-chunk retrievals at 80–93% utilization. Do not lower below 96 without production log evidence; __post_init__ enforces 1 ≤ pool_size ≤ 100.
retrieval_max_parallel = 16 and ingest_max_parallel = 16 . P36 split the formerly-shared max_parallel_index_queries=24 into separate retrieval and ingest semaphores so heavy upserts stop competing with analysis-phase reads on one semaphore. Their combined ceiling of 32 stays well under the admission threshold against a 96-slot pool; __post_init__ enforces both ≥ 16 and retrieval + ingest ≤ pool_size.
connection_timeout_secs = 10.0 (must stay below the 30s query-timeout ceiling) and upsert_timeout_secs = 20.0 (per attempt, 3 retries with exponential backoff).

The primary gate is not the pool size — it is the PoolAwareAdmissionController in src/vector_db/retrieval_executor.py, which checks real-time pool utilization before admitting a query and does graduated load-shedding. The hard-reject threshold is 88% utilization (admission_hard_reject_pct = 0.88 , source-diversity exempt) — lowered from 95% after a production cascade where the pool saturated 95→100% in a single surge; at 88% (84/96 active) the remaining 12 slots cover the source-diversity exemption and the recovery-priority bypass. Above 92% utilization, shielded recovery on cascade-cancelled sections is skipped, since spawning a recovery task during pool exhaustion only deepens starvation. The ordering of these thresholds is invariant-checked at construction.

How the pool feeds the retrieval stages and reranking is covered on the Retrieval Pipeline and Reranking pages.

Sources

src/observability/ducklake.py — ObservabilityLake (attach-once singleton, pg_pool_max_connections=2, heal-on-error, read_only mode, month(ts) partitioning, run_maintenance)
src/observability/writer.py — ObservabilityWriter.append (sync, never-raises) + the single daemon flush thread; get_observability_writer()
src/observability/schemas.py — TABLE_SCHEMAS registry (15 tables; native ADD COLUMN discipline in the module docstring)
src/observability/finalize_hook.py — record_deal_finalization() canonical write path (success + degraded)
src/observability/namespace_count_service.py, src/observability/retrieval_health_service.py — live-path singleton consumers (_safe_query, _MIN_SAMPLE)
src/canvas/services/portfolio_insights_service.py + portfolio_insights_queries.py — N≥3 anonymity guard in Python (_MIN_DEALS_FOR_AGGREGATE), finalization_status='success' filter
src/utils/production_capacity_config.py — per-service Postgres budgets (POSTGRES_PER_SERVICE_MAX_SIZE, POSTGRES_DUCKLAKE_RESERVED, POSTGRES_RAILWAY_USABLE_CEILING, _RAILWAY_SERVICE_NAME_TO_KEY)
src/config/pinecone_config.py — PineconeConfig (pool size, retrieval/ingest semaphores, admission thresholds, __post_init__ invariants)
railway.json — service definitions (Web App, Marketing Site, Evergreen, DuckLake Maintenance, …)
.claude/rules/10-domains.md, .claude/rules/20-patterns.md — Railway service table, the observability-warehouse consumer pattern, the per-service pool budgets
memory/observability_ducklake_migration.md, memory/observability_warehouse_consumers.md, memory/postgres_pool_event_loop_invariants.md — DuckLake migration, the consumer census, and the pool event-loop invariants