Reranking
Retrieval over-fetches on purpose: each section pulls more chunks from Pinecone than it needs (see Retrieval Pipeline). Reranking is the precision gate that re-scores that pool by genuine query relevance and trims it to the chunks the synthesis agent actually cites. Getting this stage right is what separates “a pile of vaguely-related chunks” from “the right evidence for this claim.”
This page describes the two-tier reranker, exactly when each tier runs, the circuit breaker that switches between them, and the score floors that gate the input pool.
Two tiers: Voyage primary, BM25 fallback
The reranker is Voyage XOR SemanticBM25, not a stack. The two are mutually exclusive for any given section.
| Tier | Engine | Role | When it runs |
|---|---|---|---|
| Primary | Voyage rerank-2.5 cross-encoder | Joint query+document encoding | Whenever Voyage is available |
| Fallback | SemanticBM25 70/30 hybrid | Cosine + BM25 lexical blend | Only when Voyage is skipped or fails |
Voyage rerank-2.5 (src/reranker/providers/voyage.py, VOYAGE_DEFAULT_MODEL = "rerank-2.5") is a cross-encoder: it encodes the query and each document jointly, which is strictly more powerful than the lexical-plus-cosine scoring that SemanticBM25 can produce. It is wired as a process-wide singleton (get_production_cross_encoder_reranker() in src/di/core_services.py) so that its internal concurrency semaphore is shared across every CRE engine instance and its voyageai.AsyncClient keeps an HTTP/2 connection pool warm. The singleton returns None, which disables Voyage, when VOYAGE_API_KEY is unset or cross_encoder_reranking_enabled is False.
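The gating described above can be pictured with a small sketch. It is not the code in src/di/core_services.py: the function name get_reranker_sketch, the VoyageRerankerStub class, and the explicit arguments are hypothetical stand-ins, and the real provider would own the client and semaphore.

```python
from typing import Optional

VOYAGE_DEFAULT_MODEL = "rerank-2.5"


class VoyageRerankerStub:
    """Hypothetical stand-in for the real Voyage provider class."""

    def __init__(self, api_key: str, model: str = VOYAGE_DEFAULT_MODEL) -> None:
        self.api_key = api_key
        self.model = model


_instance: Optional[VoyageRerankerStub] = None


def get_reranker_sketch(api_key: Optional[str], enabled: bool) -> Optional[VoyageRerankerStub]:
    """Return one shared reranker, or None when Voyage is disabled."""
    global _instance
    # Both gates from the text: a missing key or a disabled flag turns Voyage off.
    if not api_key or not enabled:
        return None
    # Build once, then hand the same object to every caller in the process.
    if _instance is None:
        _instance = VoyageRerankerStub(api_key)
    return _instance
```

Callers that receive None would go straight to the SemanticBM25 tier; callers that receive an instance all share the same object.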
Why the fallback exists at all
The fallback is not decoration: a Voyage outage cannot be allowed to stall every retrieval. When the cross-encoder runs cleanly, the chunk ordering is already higher fidelity than BM25 can produce, so the BM25 stage skips itself (it checks state.cross_encoder_succeeded and returns immediately). SemanticBM25 runs only when Voyage is genuinely unavailable for that section.
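A minimal sketch of that self-skipping behaviour, combined with the 70/30 blend named in the tier table. The RerankState class, the tuple layout, and the assumption that both input scores are already normalized to [0, 1] are illustrative; only the flag name cross_encoder_succeeded and the 70/30 weights come from this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RerankState:
    # Hypothetical state holder; only the flag name appears in the text.
    cross_encoder_succeeded: bool = False


def blend_score(cosine: float, bm25: float,
                semantic_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    """Weighted blend normalized by the weight total (inputs assumed in [0, 1])."""
    total = semantic_weight + bm25_weight
    return (semantic_weight * cosine + bm25_weight * bm25) / total


def bm25_fallback(state: RerankState, chunks: List[Tuple[str, float, float]]) -> List[str]:
    """chunks are (chunk_id, cosine, normalized_bm25); returns ordered chunk ids."""
    if state.cross_encoder_succeeded:
        # Voyage already ordered the pool, so leave it untouched.
        return [chunk_id for chunk_id, _, _ in chunks]
    ranked = sorted(chunks, key=lambda c: blend_score(c[1], c[2]), reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked]
```

With these weights, a chunk with a strong lexical match can overtake one with a slightly higher cosine score, but only when Voyage did not run.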
Exactly when the fallback engages
Voyage is skipped, falling through to SemanticBM25, in precisely three cases, all handled in assembly_phase._apply_cross_encoder_reranking:
- Circuit breaker is OPEN. Voyage’s API has been failing transiently and the breaker has tripped (details below). Emits cross_encoder_skipped with reason="circuit_breaker".
- Budget is too tight. Voyage rerank-2.5 latency grows ~linearly with chunk count (~25ms/chunk in production). The phase computes an adaptive floor, max(absolute_floor, chunk_count × per_chunk_floor_ms), and if the remaining pipeline budget is below it, Voyage is skipped rather than admitted without enough budget to finish (which would waste both Voyage tokens and the SDK retry). Emits reason="budget".
- A live API failure. Voyage raises (timeout, 503, network). The exception is caught, the breaker records a failure, and the section falls through to BM25. Emits reason="api_error".
In all three cases the pipeline is never blocked — reranking is non-fatal. On any reranker failure the pipeline proceeds with the existing chunk ordering.
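The two pre-flight checks (breaker and budget) can be sketched as a single decision function. The constant values below are assumptions: the text gives ~25 ms/chunk as observed latency but does not state the configured absolute_floor or per_chunk_floor_ms, and voyage_skip_reason is a hypothetical name.

```python
from typing import Optional

# Assumed values for illustration only; the configured floors are not stated here.
ABSOLUTE_FLOOR_MS = 500.0
PER_CHUNK_FLOOR_MS = 25.0


def adaptive_floor_ms(chunk_count: int) -> float:
    """max(absolute_floor, chunk_count x per_chunk_floor_ms), as described above."""
    return max(ABSOLUTE_FLOOR_MS, chunk_count * PER_CHUNK_FLOOR_MS)


def voyage_skip_reason(breaker_open: bool, remaining_budget_ms: float,
                       chunk_count: int) -> Optional[str]:
    """Return the skip reason, or None when Voyage should be attempted."""
    if breaker_open:
        return "circuit_breaker"
    if remaining_budget_ms < adaptive_floor_ms(chunk_count):
        return "budget"
    return None
```

The third reason, api_error, cannot be decided up front: it is only known once the Voyage call itself raises.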
For pools larger than 80 chunks, Voyage runs in parallel batches of 60 (each batch sees a disjoint slice, so no post-merge dedup is needed), then the batch results are merged, re-sorted by rerank score, and truncated to the section’s keep-count. If every batch fails, the error re-raises so the circuit breaker records the failure and the BM25 fallback engages.
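The batching path might look roughly like the sketch below. The rerank callable, the type aliases, and the choice to keep surviving batches when only some fail are assumptions; the text specifies only the 80/60 thresholds, the merge-sort-truncate step, and the re-raise when every batch fails.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

BATCH_THRESHOLD = 80  # pools larger than this are split (from the text)
BATCH_SIZE = 60       # chunks per batch (from the text)

Scored = Tuple[str, float]  # (chunk_id, rerank_score)
RerankFn = Callable[[str, List[str]], Awaitable[List[Scored]]]


async def rerank_batched(query: str, chunk_ids: List[str], keep: int,
                         rerank: RerankFn) -> List[Scored]:
    if len(chunk_ids) <= BATCH_THRESHOLD:
        scored = await rerank(query, chunk_ids)
    else:
        # Disjoint slices, so merged results need no dedup.
        batches = [chunk_ids[i:i + BATCH_SIZE]
                   for i in range(0, len(chunk_ids), BATCH_SIZE)]
        results = await asyncio.gather(*(rerank(query, b) for b in batches),
                                       return_exceptions=True)
        failures = [r for r in results if isinstance(r, BaseException)]
        if len(failures) == len(results):
            # Every batch failed: re-raise so the caller can record the failure.
            raise failures[0]
        # Assumption: partial failures are tolerated and surviving batches are used.
        scored = [item for r in results
                  if not isinstance(r, BaseException) for item in r]
    # Merge, sort by rerank score, truncate to the section's keep-count.
    return sorted(scored, key=lambda s: s[1], reverse=True)[:keep]
```

Re-raising only on total failure keeps the circuit breaker's failure count tied to real outages rather than to a single slow batch.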
The 3-state circuit breaker
Voyage’s transient failures are absorbed by a lightweight in-process circuit breaker (_VoyageRerankCircuitBreaker in assembly_phase.py) with three states:

    CLOSED ──3 consecutive failures──▶ OPEN ──60s cooldown──▶ HALF_OPEN
      ▲                                                           │
      └───────────────── 2 consecutive successes ─────────────────┘
                        (a single failure re-opens)

- CLOSED: normal operation. Failures accumulate; 3 consecutive failures (_FAILURE_THRESHOLD = 3) trip it OPEN.
- OPEN: cooldown active for 60 seconds (_COOLDOWN_SECS = 60.0). Every rerank call fails fast straight to BM25, so Memosa does not pay the Voyage failure latency on each retrieval.
- HALF_OPEN: after the cooldown elapses, trial calls are allowed. A single failure re-opens immediately; 2 consecutive successes (_HALF_OPEN_SUCCESS_THRESHOLD = 2) are required to fully close. Without this two-success guard, a flaky upstream thrashes between OPEN and CLOSED on every cooldown roll: one success closes, the next failure re-opens, and the BM25 fallback flips on and off in production.
The breaker is per-process: a brief Voyage outage on one worker degrades only that worker’s retrievals to BM25 for ~60s while other workers operate normally.
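The state machine above is small enough to sketch in full. This is an illustration of the described transitions, not the _VoyageRerankCircuitBreaker source: the class name, method names, and injectable clock are assumptions, while the three constants come from this page.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class BreakerSketch:
    _FAILURE_THRESHOLD = 3
    _COOLDOWN_SECS = 60.0
    _HALF_OPEN_SUCCESS_THRESHOLD = 2

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state = State.CLOSED
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    @property
    def state(self) -> State:
        # OPEN lazily becomes HALF_OPEN once the cooldown has elapsed.
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self._COOLDOWN_SECS):
            self._state = State.HALF_OPEN
            self._successes = 0
        return self._state

    def allow_request(self) -> bool:
        return self.state is not State.OPEN

    def record_success(self) -> None:
        state = self.state
        if state is State.HALF_OPEN:
            self._successes += 1
            if self._successes >= self._HALF_OPEN_SUCCESS_THRESHOLD:
                self._state = State.CLOSED
                self._failures = 0
        elif state is State.CLOSED:
            self._failures = 0  # a success breaks the consecutive-failure streak

    def record_failure(self) -> None:
        state = self.state
        if state is State.HALF_OPEN:
            self._trip()  # a single trial failure re-opens immediately
        elif state is State.CLOSED:
            self._failures += 1
            if self._failures >= self._FAILURE_THRESHOLD:
                self._trip()

    def _trip(self) -> None:
        self._state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0
        self._successes = 0
```

Passing the clock in as a parameter is a testing convenience: it lets the 60-second cooldown be exercised without sleeping.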
Section score floors
Before reranking, each chunk must clear a minimum relevance score. These section score floors are tuned per section in src/config/retrieval_parameter_defaults.py and range from 0.10 to 0.18:
| Section | Floor | Why |
|---|---|---|
| sponsor_background | 0.10 | A sponsor’s company/team rarely self-identifies as “sponsor”, so those chunks score lower on embedding queries |
| exit_strategy | 0.10 | Needs broad evidence (cap rates, hold scenarios, disposition); lets the Voyage cross-encoder make the precision call |
| financial_analysis | 0.12 | CoStar submarket chunks score low on financial queries; the reranker provides the precision gate |
| risk_market | 0.14 | - |
| comparables_analysis | 0.15 | - |
| system default | 0.18 | The conservative baseline for unlisted and PDF-primary sections |
The floors are section-tuned, not pool-size-derived. Earlier documentation claimed the floor “scales with pool size” — that was always aspirational; the real implementation is the per-section table plus the density-adaptive layer below.
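As a sketch, the per-section table reduces to a dictionary lookup with a default. The names SECTION_SCORE_FLOORS, DEFAULT_SCORE_FLOOR, floor_for, and apply_floor are hypothetical, and treating a score equal to the floor as passing is an assumption; the values are the ones in the table.

```python
from typing import Dict, List, Tuple

SECTION_SCORE_FLOORS: Dict[str, float] = {
    "sponsor_background": 0.10,
    "exit_strategy": 0.10,
    "financial_analysis": 0.12,
    "risk_market": 0.14,
    "comparables_analysis": 0.15,
}
DEFAULT_SCORE_FLOOR = 0.18  # system default for unlisted sections


def floor_for(section: str) -> float:
    return SECTION_SCORE_FLOORS.get(section, DEFAULT_SCORE_FLOOR)


def apply_floor(section: str, scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep (chunk_id, score) pairs at or above the section floor (>= is assumed)."""
    floor = floor_for(section)
    return [(chunk_id, score) for chunk_id, score in scored if score >= floor]
```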
The density-adaptive pre-rerank floor
The static section floor is relax-only in spirit: low-quality (grade C/D/F) retrievals can relax their floor and top-k to recover recall. But a dense, high-quality corpus has the opposite problem: the static floor lets the entire raw pool through to be reranked, and Voyage pays per chunk for chunks that will obviously be demoted. The density-adaptive pre-rerank floor (Hypothesis E, May 2026) adds the symmetric tighten path.
The mechanism (compute_adaptive_hints in quality_scorer.py) detects a natural cliff in the cosine-score distribution above the must-have core, and tightens the floor to drop the cliff tail before reranking. The quality benefit is not the floor change itself — it is feeding Voyage a denser input pool.
Tightening fires only when all of the following hold:
- The retrieval graded A or B (high quality).
- The raw pre-truncation cosine scores, the static floor, and the rerank keep-count (rerank_top_n) are all available.
- The Voyage circuit breaker is CLOSED. When Voyage is degraded, BM25 is the primary signal and wants more candidates, not fewer.
- There is room above the must-have core (len(scores) > rerank_top_n).
- A cliff exists at some position past the core: the next score drops below 85% of the current one (_CLIFF_RATIO = 0.85) and the absolute gap is at least 0.05 (_CLIFF_MIN_ABSOLUTE_DROP = 0.05). The absolute-gap guard stops a relative drop of 15% or more on already-low scores (e.g. 0.20 → 0.16, a 20% relative drop but only 0.04 absolute) from being mistaken for a cliff.
- The cliff bottom improves on the static floor, so tightening never produces a floor weaker than the P11-validated section floor.
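The conditions above can be sketched as a first-cliff scan over descending cosine scores. This is not compute_adaptive_hints itself: the function names, starting the scan at the last core element, and using the last pre-cliff score as the tightened floor are assumptions made for illustration. The two constants are the ones quoted above.

```python
from typing import List, Optional

_CLIFF_RATIO = 0.85
_CLIFF_MIN_ABSOLUTE_DROP = 0.05


def find_cliff(scores: List[float], rerank_top_n: int) -> Optional[int]:
    """Index of the first score below the first cliff, or None.

    scores must be sorted in descending order.
    """
    if len(scores) <= rerank_top_n:
        return None  # no room above the must-have core
    # Assumption: a cliff directly after the last core element counts.
    for i in range(max(rerank_top_n - 1, 0), len(scores) - 1):
        cur, nxt = scores[i], scores[i + 1]
        relative = nxt < cur * _CLIFF_RATIO
        absolute = (cur - nxt) >= _CLIFF_MIN_ABSOLUTE_DROP
        if relative and absolute:
            return i + 1  # first cliff wins, no search for a "best" one
    return None


def tightened_floor(scores: List[float], rerank_top_n: int, static_floor: float,
                    grade: str, breaker_closed: bool) -> Optional[float]:
    """Return a tighter floor, or None when tightening should not fire."""
    if grade not in ("A", "B") or not breaker_closed:
        return None
    pos = find_cliff(scores, rerank_top_n)
    if pos is None:
        return None
    # Assumption: the floor becomes the lowest score still above the cliff.
    candidate = scores[pos - 1]
    return candidate if candidate > static_floor else None
```

For a pool scored [0.82, 0.80, 0.78, 0.75, 0.55, 0.52] with a keep-count of 3, the scan stops at the 0.75 to 0.55 drop, so the two tail chunks would be cut before reranking.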
The first cliff above the core wins (single-cliff semantics — no iterating to find a “best” one). Two apply-time invariants protect the result:
- Pool-size invariant: the must-have core is always preserved (len(tightened) ≥ rerank_top_n).
- Source-diversity invariant: if tightening would drop the last chunk of any present source type (CoStar tends to cluster in the score tail on non-CoStar queries), the tighten is skipped with tighten_skip_reason="source_diversity_loss".
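Both invariants can be checked in one apply step, sketched below. The function name, the chunk tuple layout, and the "pool_size" reason label are hypothetical; only the source_diversity_loss label and the two invariants themselves come from the text.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str, float]  # (chunk_id, source_type, cosine_score)


def apply_tighten(chunks: List[Chunk], effective_floor: float,
                  rerank_top_n: int) -> Tuple[List[Chunk], Optional[str]]:
    """Return (pool, skip_reason); skip_reason is None when the tighten applied."""
    tightened = [c for c in chunks if c[2] >= effective_floor]
    # Pool-size invariant: never cut into the must-have core.
    if len(tightened) < rerank_top_n:
        return chunks, "pool_size"  # hypothetical label
    # Source-diversity invariant: every source type present before must survive.
    lost_types = {c[1] for c in chunks} - {c[1] for c in tightened}
    if lost_types:
        return chunks, "source_diversity_loss"
    return tightened, None
```

When either invariant fails, the original pool is returned unchanged, so a skipped tighten behaves exactly like the static floor alone.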
The Voyage circuit-breaker state is re-checked at apply time, not just at hint-compute time: between those points a concurrent rerank call's failure could flip the breaker, and the apply-time re-check skips the tighten if so. Tighten decisions are not cached (cliff position is per-query). Every vector_search_tool_result event carries tighten_applied, tighten_effective_floor, tighten_cliff_position, tighten_dropped_count, and tighten_skip_reason for tuning; the target apply-rate band is 2–40% per section.
After reranking
The reranked pool is not the final ordering. A handful of deterministic steps run after the reranker in the assembly phase:
- Recovered-chunk protection re-admits up to 8 Stage B chunks that the reranker cut below the keep-count (so a recovered source type cannot silently vanish), and Excel-required sections enforce a per-section Excel survivor floor.
- Source-precedence tiebreaker applies tiny additive boosts, bounded below the smallest reasonable reranker margin, in canonical precedence order (USER > EXCEL > COSTAR > PDF). A 0.85 Excel chunk does not outrank a 0.95 PDF chunk on real relevance, but an Excel chunk that ties a PDF chunk at 0.850 ranks ahead of it once the boost is applied. The boosts are provider-aware: BM25-tier values are scaled to half the Voyage-tier values because the two rerankers produce different score magnitudes (Voyage ~0.6–0.95, calibrated BM25 ~0.2–0.3).
- Document fairness and source-type diversity rebalance the pool so a single dominant document or source type cannot monopolize every slot.
The provider that ran is stamped on each chunk (rerank_provider) so these downstream steps — and the citation injector — know which score regime they are in. For how the final pool becomes a cited memo, see Synthesis and Footnotes.
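To illustrate the provider-aware precedence boost, here is a sketch under stated assumptions. The step size of 0.0005, the provider labels "voyage" and "bm25", and the linear per-rank boost are all hypothetical; the text fixes only the precedence order and the half-scale rule for the BM25 tier.

```python
PRECEDENCE = ("USER", "EXCEL", "COSTAR", "PDF")

# Hypothetical step size; the text says only that the boosts are tiny.
VOYAGE_STEP = 0.0005
STEP_BY_PROVIDER = {
    "voyage": VOYAGE_STEP,
    "bm25": VOYAGE_STEP / 2,  # BM25-tier boosts are half the Voyage-tier ones
}


def boosted(score: float, source_type: str, rerank_provider: str) -> float:
    """Add a small precedence boost; higher-precedence sources get more steps."""
    step = STEP_BY_PROVIDER[rerank_provider]
    steps = len(PRECEDENCE) - 1 - PRECEDENCE.index(source_type)
    return score + step * steps
```

Under these assumed values a tie at 0.850 resolves in favour of Excel over PDF, while a 0.10 relevance gap is far too large for any boost to overturn.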
Sources
- src/reranker/providers/voyage.py: Voyage rerank-2.5 cross-encoder (VOYAGE_DEFAULT_MODEL, max_concurrency)
- src/reranker/semantic_bm25_reranker.py: SemanticBM25 fallback, 70/30 internal blend (semantic_weight / bm25_weight, normalized by total)
- src/di/core_services.py: get_production_cross_encoder_reranker() singleton, VOYAGE_API_KEY / cross_encoder_reranking_enabled gating
- src/utils/consolidated_retrieval/pipeline/phases/assembly_phase.py: _VoyageRerankCircuitBreaker (3-state, thresholds), Voyage-vs-BM25 selection, batching, skip reasons, precedence tiebreaker
- src/utils/consolidated_retrieval/quality_scorer.py: compute_adaptive_hints, cliff-detection constants (_CLIFF_RATIO, _CLIFF_MIN_ABSOLUTE_DROP)
- src/config/retrieval_parameter_defaults.py: per-section score floors (0.10–0.18)
- memory/reranker_voyage_primary.md: Voyage-primary architecture, the 70/30-is-internal distinction
- memory/density_adaptive_floor.md: Hypothesis E cliff-tighten design, invariants, telemetry