Reranking

Retrieval over-fetches on purpose: each section pulls more chunks from Pinecone than it needs (see Retrieval Pipeline). Reranking is the precision gate that re-scores that pool by genuine query relevance and trims it to the chunks the synthesis agent actually cites. Getting this stage right is what separates “a pile of vaguely-related chunks” from “the right evidence for this claim.”

This page describes the two-tier reranker, exactly when each tier runs, the circuit breaker that switches between them, and the score floors that gate the input pool.

The reranker is Voyage XOR SemanticBM25 — not a stack. They are mutually exclusive for any given section.

| Tier | Engine | Role | When it runs |
|---|---|---|---|
| Primary | Voyage rerank-2.5 cross-encoder | Joint query+document encoding | Whenever Voyage is available |
| Fallback | SemanticBM25 70/30 hybrid | Cosine + BM25 lexical blend | Only when Voyage is skipped or fails |

Voyage rerank-2.5 (src/reranker/providers/voyage.py, VOYAGE_DEFAULT_MODEL = "rerank-2.5") is a cross-encoder: it encodes the query and each document jointly, so it can model query-document interactions that the independent cosine-plus-lexical scoring of SemanticBM25 cannot. It is wired as a process-wide singleton (get_production_cross_encoder_reranker() in src/di/core_services.py) so that its internal concurrency semaphore is shared across every CRE engine instance and its voyageai.AsyncClient keeps an HTTP/2 connection pool warm. The singleton returns None, which disables Voyage, when VOYAGE_API_KEY is unset or cross_encoder_reranking_enabled is False.
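The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/di/core_services.py: RerankerSettings, VoyageReranker, _build_reranker, and get_cross_encoder_reranker are hypothetical stand-ins, and only the VOYAGE_API_KEY variable, the cross_encoder_reranking_enabled flag, and the "rerank-2.5" model name come from the text.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class RerankerSettings:
    # Hypothetical settings holder; only the flag name comes from the text.
    cross_encoder_reranking_enabled: bool = True


class VoyageReranker:
    # Stand-in for the real provider, which would own the shared
    # concurrency semaphore and the async HTTP client.
    def __init__(self, api_key: str, model: str = "rerank-2.5") -> None:
        self.api_key = api_key
        self.model = model


@lru_cache(maxsize=None)
def _build_reranker(api_key: str) -> VoyageReranker:
    # Cached per key, so every caller in the process shares one instance.
    return VoyageReranker(api_key)


def get_cross_encoder_reranker(settings: RerankerSettings) -> Optional[VoyageReranker]:
    api_key = os.environ.get("VOYAGE_API_KEY", "")
    if not api_key or not settings.cross_encoder_reranking_enabled:
        return None  # Voyage disabled: callers fall through to the fallback tier
    return _build_reranker(api_key)
```

Returning None instead of raising lets callers treat "Voyage not configured" the same way as "Voyage skipped", which keeps the fallback path uniform.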

The fallback is not decoration — a Voyage outage cannot be allowed to stall every retrieval. When the cross-encoder ran cleanly, the chunk ordering is already higher fidelity than BM25 can produce, so the BM25 stage skips itself (it checks state.cross_encoder_succeeded and returns immediately). SemanticBM25 runs only when Voyage is genuinely unavailable for that section.
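A minimal sketch of the fallback stage, assuming both the cosine and BM25 inputs are already normalized to [0, 1]. The Chunk and PipelineState classes and the function names are hypothetical; the cross_encoder_succeeded flag, the semantic_weight / bm25_weight names, and the 70/30 split normalized by the total weight are taken from this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    text: str
    cosine: float       # embedding similarity, assumed to be in [0, 1]
    bm25: float         # lexical score, assumed normalized to [0, 1]
    score: float = 0.0  # final rerank score


@dataclass
class PipelineState:
    chunks: List[Chunk] = field(default_factory=list)
    cross_encoder_succeeded: bool = False


def blend(cosine: float, bm25: float,
          semantic_weight: float = 0.7, bm25_weight: float = 0.3) -> float:
    # Dividing by the total keeps the result in range even if the
    # two weights do not sum to 1.
    total = semantic_weight + bm25_weight
    return (semantic_weight * cosine + bm25_weight * bm25) / total


def apply_bm25_fallback(state: PipelineState) -> bool:
    """Return True if the pool was re-scored, False if the stage skipped itself."""
    if state.cross_encoder_succeeded:
        return False  # the cross-encoder ordering is kept untouched
    for chunk in state.chunks:
        chunk.score = blend(chunk.cosine, chunk.bm25)
    state.chunks.sort(key=lambda c: c.score, reverse=True)
    return True
```

With these weights a chunk with cosine 0.6 and BM25 0.9 (blend 0.69) edges out one with cosine 0.9 and BM25 0.1 (blend 0.66), which shows how strong lexical overlap can offset a weaker embedding match.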

Voyage is skipped — falling through to SemanticBM25 — in precisely three cases, all handled in assembly_phase._apply_cross_encoder_reranking:

  1. Circuit breaker is OPEN. Voyage’s API has been failing transiently and the breaker has tripped (details below). Emits cross_encoder_skipped with reason="circuit_breaker".
  2. Budget is too tight. Voyage rerank-2.5 latency grows ~linearly with chunk count (~25ms/chunk in production). The phase computes an adaptive floor — max(absolute_floor, chunk_count × per_chunk_floor_ms) — and if remaining pipeline budget is below it, Voyage is skipped rather than admitted with insufficient budget to finish (which would waste both Voyage tokens and the SDK retry). Emits reason="budget".
  3. A live API failure. Voyage raises (timeout, 503, network). The exception is caught, the breaker records a failure, and the section falls through to BM25. Emits reason="api_error".

In all three cases the pipeline is never blocked — reranking is non-fatal. On any reranker failure the pipeline proceeds with the existing chunk ordering.
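The three skip cases can be sketched as a small decision function. This is an illustration under stated assumptions, not the body of _apply_cross_encoder_reranking: the function names are hypothetical, the 500 ms absolute floor is invented for the example, and the 25 ms per-chunk floor simply reuses the production latency figure quoted above. The reason strings are the ones this page lists.

```python
from typing import Callable, List, Optional, Tuple


def adaptive_budget_floor_ms(chunk_count: int,
                             absolute_floor_ms: float = 500.0,   # assumed value
                             per_chunk_floor_ms: float = 25.0    # assumed value
                             ) -> float:
    # max(absolute_floor, chunk_count * per_chunk_floor_ms), as described above.
    return max(absolute_floor_ms, chunk_count * per_chunk_floor_ms)


def choose_tier(breaker_open: bool, remaining_budget_ms: float,
                chunk_count: int) -> Tuple[str, Optional[str]]:
    """Return (tier, skip_reason); skip_reason is None when Voyage is admitted."""
    if breaker_open:
        return "bm25", "circuit_breaker"
    if remaining_budget_ms < adaptive_budget_floor_ms(chunk_count):
        return "bm25", "budget"
    return "voyage", None


def rerank_section(chunks: List[str],
                   voyage_call: Callable[[List[str]], List[str]],
                   bm25_call: Callable[[List[str]], List[str]],
                   breaker_open: bool,
                   remaining_budget_ms: float) -> Tuple[List[str], str, Optional[str]]:
    tier, reason = choose_tier(breaker_open, remaining_budget_ms, len(chunks))
    if tier == "voyage":
        try:
            return voyage_call(chunks), "voyage", None
        except Exception:
            # The real phase would also record a breaker failure here.
            reason = "api_error"
    return bm25_call(chunks), "bm25", reason
```

For a 40-chunk pool the floor in this sketch is max(500, 40 × 25) = 1000 ms, so a section with 600 ms of budget left skips Voyage with reason "budget".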

For pools larger than 80 chunks, Voyage runs in parallel batches of 60 (each batch sees a disjoint slice, so no post-merge dedup is needed), then the batch results are merged, re-sorted by rerank score, and truncated to the section’s keep-count. If every batch fails, the error re-raises so the circuit breaker records the failure and the BM25 fallback engages.
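A sketch of the batching path, assuming an async rerank callable that returns (chunk_id, score) pairs. The rerank_pool function and its signature are hypothetical, and so is the handling of partial failure: this page states only that the error re-raises when every batch fails, so keeping the surviving batches when some fail is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

Scored = Tuple[str, float]  # (chunk_id, rerank_score)
RerankFn = Callable[[str, Sequence[str]], Awaitable[List[Scored]]]

BATCH_THRESHOLD = 80  # pools larger than this are split
BATCH_SIZE = 60


async def rerank_pool(query: str, chunk_ids: Sequence[str],
                      rerank: RerankFn, keep: int) -> List[Scored]:
    if len(chunk_ids) <= BATCH_THRESHOLD:
        scored = await rerank(query, chunk_ids)
    else:
        # Disjoint slices, so the merged result needs no dedup.
        batches = [chunk_ids[i:i + BATCH_SIZE]
                   for i in range(0, len(chunk_ids), BATCH_SIZE)]
        results = await asyncio.gather(
            *(rerank(query, batch) for batch in batches),
            return_exceptions=True,
        )
        errors = [r for r in results if isinstance(r, BaseException)]
        if len(errors) == len(results):
            # Every batch failed: re-raise so the caller's breaker sees it.
            raise errors[0]
        scored = [item for r in results
                  if not isinstance(r, BaseException) for item in r]
    # Merge, re-sort by rerank score, truncate to the keep-count.
    return sorted(scored, key=lambda s: s[1], reverse=True)[:keep]
```

A 100-chunk pool would be split into one batch of 60 and one of 40 here, and the final sort runs over the merged scores rather than per batch.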

Voyage’s transient failures are absorbed by a lightweight in-process circuit breaker (_VoyageRerankCircuitBreaker in assembly_phase.py) with three states:

CLOSED ──3 consecutive failures──▶ OPEN ──60s cooldown──▶ HALF_OPEN
  ▲                                                           │
  └──────────2 consecutive successes──────────────────────────┘
  (a single failure re-opens)

  • CLOSED: normal operation. Failures accumulate; 3 consecutive failures (_FAILURE_THRESHOLD = 3) trip it OPEN.
  • OPEN: cooldown active for 60 seconds (_COOLDOWN_SECS = 60.0). Every rerank call fails fast straight to BM25, so Memosa does not pay the Voyage failure latency on each retrieval.
  • HALF_OPEN: after the cooldown elapses, trial calls are allowed. A single failure re-opens immediately; 2 consecutive successes (_HALF_OPEN_SUCCESS_THRESHOLD = 2) are required to fully close. Without this two-success guard, a flaky upstream thrashes between OPEN and CLOSED on every cooldown roll: one success closes, the next failure re-opens, and the BM25 fallback flips on and off in production.

The breaker is per-process: a brief Voyage outage on one worker degrades only that worker’s retrievals to BM25 for ~60s while other workers operate normally.
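The state machine can be sketched as follows. This is a simplified illustration rather than the _VoyageRerankCircuitBreaker source: the class and method names are hypothetical, the injectable clock exists only to make the sketch testable, and the lazy OPEN to HALF_OPEN transition on read is an assumption. The three thresholds are the ones quoted on this page.

```python
import time
from typing import Callable


class VoyageBreakerSketch:
    FAILURE_THRESHOLD = 3
    COOLDOWN_SECS = 60.0
    HALF_OPEN_SUCCESS_THRESHOLD = 2

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    @property
    def state(self) -> str:
        # OPEN lazily becomes HALF_OPEN once the cooldown has elapsed.
        if (self._state == "OPEN"
                and self._clock() - self._opened_at >= self.COOLDOWN_SECS):
            self._state = "HALF_OPEN"
            self._successes = 0
        return self._state

    def allow_request(self) -> bool:
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.HALF_OPEN_SUCCESS_THRESHOLD:
                self._state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0  # a success breaks the failure streak

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # a single trial failure re-opens immediately
            return
        self._failures += 1
        if self._failures >= self.FAILURE_THRESHOLD:
            self._trip()

    def _trip(self) -> None:
        self._state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
        self._successes = 0
```

Callers would check allow_request() before each Voyage call and report the outcome afterwards. The sketch is not thread-safe; in a single asyncio event loop that is acceptable because none of the methods await.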

Before reranking, each chunk must clear a minimum relevance score. These section score floors are tuned per section in src/config/retrieval_parameter_defaults.py and range from 0.10 to 0.18:

| Section | Floor | Why |
|---|---|---|
| sponsor_background | 0.10 | A sponsor’s company/team rarely self-identifies as “sponsor”, so those chunks score lower on embedding queries |
| exit_strategy | 0.10 | Needs broad evidence (cap rates, hold scenarios, disposition); lets the Voyage cross-encoder make the precision call |
| financial_analysis | 0.12 | CoStar submarket chunks score low on financial queries; the reranker provides the precision gate |
| risk_market | 0.14 | |
| comparables_analysis | 0.15 | |
| system default | 0.18 | The conservative baseline for unlisted and PDF-primary sections |

The floors are section-tuned, not pool-size-derived. Earlier documentation claimed the floor “scales with pool size” — that was always aspirational; the real implementation is the per-section table plus the density-adaptive layer below.
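As a rough picture of how a per-section table like this gates the pool, here is a sketch. The dictionary and function names are hypothetical and do not reflect the layout of retrieval_parameter_defaults.py; the floor values are the ones in the table, and treating the floor as inclusive (score >= floor) is an assumption.

```python
from typing import Dict, List, Tuple

SECTION_SCORE_FLOORS: Dict[str, float] = {
    "sponsor_background": 0.10,
    "exit_strategy": 0.10,
    "financial_analysis": 0.12,
    "risk_market": 0.14,
    "comparables_analysis": 0.15,
}
DEFAULT_SCORE_FLOOR = 0.18  # unlisted and PDF-primary sections


def floor_for(section: str) -> float:
    return SECTION_SCORE_FLOORS.get(section, DEFAULT_SCORE_FLOOR)


def apply_floor(section: str,
                scored_chunks: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    # Keep only chunks whose retrieval score clears the section floor.
    floor = floor_for(section)
    return [(cid, score) for cid, score in scored_chunks if score >= floor]
```

The same 0.13 chunk survives for financial_analysis but is dropped for a section that falls back to the 0.18 default.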

The static section floor is relax-only in spirit: low-quality (grade C/D/F) retrievals can relax their floor and top-k to recover recall. But a dense, high-quality corpus has the opposite problem — the static floor lets the entire raw pool through to be reranked, and Voyage pays per-chunk for chunks that will obviously be demoted. The density-adaptive pre-rerank floor (Hypothesis E, May 2026) adds the symmetric tighten path.

The mechanism (compute_adaptive_hints in quality_scorer.py) detects a natural cliff in the cosine-score distribution above the must-have core, and tightens the floor to drop the cliff tail before reranking. The quality benefit is not the floor change itself — it is feeding Voyage a denser input pool.

Tightening fires only when all of the following hold:

  • The retrieval graded A or B (high quality).
  • The raw pre-truncation cosine scores, the static floor, and the rerank keep-count (rerank_top_n) are all available.
  • The Voyage circuit breaker is CLOSED — when Voyage is degraded, BM25 is the primary signal and wants more candidates, not fewer.
  • There is room above the must-have core (len(scores) > rerank_top_n).
  • A cliff exists at some position past the core: the next score drops below 85% of the current one (_CLIFF_RATIO = 0.85) and the absolute gap is at least 0.05 (_CLIFF_MIN_ABSOLUTE_DROP = 0.05). The absolute-gap guard stops a large relative drop between already-low scores (e.g. 0.20 → 0.16, a 20% drop but only 0.04 in absolute terms) from being mistaken for a cliff.
  • The cliff bottom improves on the static floor — tightening never produces a floor weaker than the P11-validated section floor.

The first cliff above the core wins (single-cliff semantics — no iterating to find a “best” one). Two apply-time invariants protect the result:

  • Pool-size invariant — the must-have core is always preserved (len(tightened) ≥ rerank_top_n).
  • Source-diversity invariant — if tightening would drop the last chunk of any present source type (CoStar tends to cluster in the score tail on non-CoStar queries), the tighten is skipped with tighten_skip_reason="source_diversity_loss".
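The cliff test and the two invariants can be sketched as below. This is one reading of the conditions above, not the compute_adaptive_hints source: the function names and the "no_cliff" skip reason are hypothetical, "the cliff bottom improves on the static floor" is interpreted as the score just below the cliff being strictly above the static floor, and the grade and circuit-breaker preconditions are omitted. The two constants and the "source_diversity_loss" reason come from this page.

```python
from typing import List, Optional, Sequence, Tuple

CLIFF_RATIO = 0.85
CLIFF_MIN_ABSOLUTE_DROP = 0.05


def find_first_cliff(scores: Sequence[float], rerank_top_n: int,
                     static_floor: float) -> Optional[int]:
    """Index of the last score kept above the first cliff, or None.

    `scores` must be sorted in descending order.
    """
    if len(scores) <= rerank_top_n:
        return None  # no room above the must-have core
    # Start at the last core position so the core is never cut.
    for i in range(max(rerank_top_n - 1, 0), len(scores) - 1):
        top, bottom = scores[i], scores[i + 1]
        relative = bottom < CLIFF_RATIO * top
        absolute = (top - bottom) >= CLIFF_MIN_ABSOLUTE_DROP
        if relative and absolute:
            # Single-cliff semantics: the first cliff decides the outcome.
            return i if bottom > static_floor else None
    return None


def tighten(chunks: List[Tuple[float, str]], rerank_top_n: int,
            static_floor: float) -> Tuple[List[Tuple[float, str]], Optional[str]]:
    """chunks are (score, source_type) pairs sorted by descending score."""
    cut = find_first_cliff([s for s, _ in chunks], rerank_top_n, static_floor)
    if cut is None:
        return chunks, "no_cliff"  # hypothetical skip reason
    kept = chunks[:cut + 1]
    # Source-diversity invariant: never drop the last chunk of a source type.
    if {src for _, src in kept} != {src for _, src in chunks}:
        return chunks, "source_diversity_loss"
    return kept, None
```

Because the scan starts at the last core position, any accepted cut keeps at least rerank_top_n chunks, which is the pool-size invariant.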

The Voyage circuit-breaker state is re-checked at apply time, not just at hint-compute time: the breaker is per-process, so between those two points a failure in another concurrent retrieval on the same process could flip it, and the apply-time re-check skips the tighten if so. Tighten decisions are not cached, because the cliff position is per-query. Every vector_search_tool_result event carries tighten_applied, tighten_effective_floor, tighten_cliff_position, tighten_dropped_count, and tighten_skip_reason for tuning; the target apply-rate band is 2–40% per section.

The reranked pool is not the final ordering. A handful of deterministic steps run after the reranker in the assembly phase:

  • Recovered-chunk protection re-admits up to 8 Stage B chunks that the reranker cut below the keep-count (so a recovered source type cannot silently vanish), and Excel-required sections enforce a per-section Excel survivor floor.
  • Source-precedence tiebreaker applies tiny additive boosts — bounded below the smallest reasonable reranker margin — in canonical precedence order (USER > EXCEL > COSTAR > PDF). A 0.85 Excel chunk does not outrank a 0.95 PDF chunk on real relevance, but a 0.851 Excel chunk does beat a 0.850 PDF one. The boosts are provider-aware: BM25-tier values are scaled to half the Voyage-tier values because the two rerankers produce different score magnitudes (Voyage ~0.6–0.95, calibrated BM25 ~0.2–0.3).
  • Document fairness and source-type diversity rebalance the pool so a single dominant document or source type cannot monopolize every slot.

The provider that ran is stamped on each chunk (rerank_provider) so these downstream steps — and the citation injector — know which score regime they are in. For how the final pool becomes a cited memo, see Synthesis and Footnotes.
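A sketch of a provider-aware precedence tiebreaker along the lines described above. The boost magnitudes, the provider labels "voyage" and "semantic_bm25", and the function name are assumptions made for illustration; only the USER > EXCEL > COSTAR > PDF order and the half-scale BM25 tier come from this page.

```python
from typing import Dict, List, Tuple

# Hypothetical boost sizes, chosen to stay far below typical score gaps.
VOYAGE_TIER_BOOSTS: Dict[str, float] = {
    "USER": 0.003,
    "EXCEL": 0.002,
    "COSTAR": 0.001,
    "PDF": 0.0,
}
# BM25-tier boosts are half the Voyage-tier values.
PROVIDER_SCALE: Dict[str, float] = {"voyage": 1.0, "semantic_bm25": 0.5}

Chunk = Tuple[str, str, float]  # (chunk_id, source_type, rerank_score)


def apply_precedence(chunks: List[Chunk], rerank_provider: str) -> List[Chunk]:
    scale = PROVIDER_SCALE[rerank_provider]
    boosted = [
        (cid, src, score + VOYAGE_TIER_BOOSTS[src] * scale)
        for cid, src, score in chunks
    ]
    # sorted() is stable, so chunks that are still tied keep their input order.
    return sorted(boosted, key=lambda c: c[2], reverse=True)
```

With these values an Excel chunk and a PDF chunk that both score 0.85 resolve in favor of Excel, while a 0.95 PDF chunk still outranks both.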

  • src/reranker/providers/voyage.py: Voyage rerank-2.5 cross-encoder (VOYAGE_DEFAULT_MODEL, max_concurrency)
  • src/reranker/semantic_bm25_reranker.py: SemanticBM25 fallback, 70/30 internal blend (semantic_weight / bm25_weight, normalized by total)
  • src/di/core_services.py: get_production_cross_encoder_reranker() singleton, VOYAGE_API_KEY / cross_encoder_reranking_enabled gating
  • src/utils/consolidated_retrieval/pipeline/phases/assembly_phase.py: _VoyageRerankCircuitBreaker (3-state, thresholds), Voyage-vs-BM25 selection, batching, skip reasons, precedence tiebreaker
  • src/utils/consolidated_retrieval/quality_scorer.py: compute_adaptive_hints, cliff-detection constants (_CLIFF_RATIO, _CLIFF_MIN_ABSOLUTE_DROP)
  • src/config/retrieval_parameter_defaults.py: per-section score floors (0.10–0.18)
  • memory/reranker_voyage_primary.md: Voyage-primary architecture, the 70/30-is-internal distinction
  • memory/density_adaptive_floor.md: Hypothesis E cliff-tighten design, invariants, telemetry