Image Intelligence
A real-estate offering memorandum is full of visual information that plain text extraction misses: a sources-and-uses table, a capital-stack diagram, a rent-roll matrix, a site plan, a sensitivity grid. Memosa’s Image Intelligence pipeline renders, classifies, and interprets these figures so the facts inside them flow into your memo instead of being lost. It also handles the property photography — sourcing, enhancing, and placing the imagery a polished memo needs.
Reading the figures in a PDF
Memosa does not try to read figures out of a flat text dump. It works from the rendered pages themselves, so it sees the diagram the way a human reviewer does. At a high level, the pipeline:
- Renders each PDF page at high resolution.
- Detects table regions using the PDF’s own geometry, which is more reliable than guessing table boundaries from pixels.
- Classifies every element on the page with a vision model — is this a chart, a table, a diagram, a photo, a map?
- Crops each element out of the page, preferring the geometric table regions for tables and the vision model’s boxes for everything else.
- Filters the results so only genuinely useful figures move downstream.
- Maps each surviving figure to the memo section it belongs to.
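The cropping step above can be sketched as follows. This is a minimal illustration, not Memosa's implementation: the `Element` type, the `choose_crop_box` helper, and the overlap rule used to match a table to a geometric region are all assumptions made for the example.

```python
from dataclasses import dataclass

# (x0, y0, x1, y1) in page coordinates; the coordinate convention is assumed.
Box = tuple[float, float, float, float]


@dataclass
class Element:
    kind: str        # label from the vision classifier, e.g. "table" or "chart"
    vision_box: Box  # bounding box proposed by the vision model


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def choose_crop_box(element: Element, table_regions: list[Box]) -> Box:
    """Prefer a geometric table region for tables, else the vision box."""
    if element.kind == "table" and table_regions:
        # Hypothetical matching rule: take the region that overlaps most.
        best = max(table_regions, key=lambda r: overlap_area(r, element.vision_box))
        if overlap_area(best, element.vision_box) > 0:
            return best
    # Non-tables, or tables with no matching region, keep the vision box.
    return element.vision_box


regions = [(0, 0, 100, 50), (0, 60, 100, 120)]
table = Element("table", (5, 58, 95, 118))
chart = Element("chart", (5, 58, 95, 118))
print(choose_crop_box(table, regions))  # geometric region wins for the table
print(choose_crop_box(chart, regions))  # vision box is kept for the chart
```

Falling back to the vision box when no geometric region overlaps keeps a table from being dropped just because the PDF geometry missed it.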
Diagram subtypes
When the classifier identifies a diagram, Memosa goes a step further and detects its specific subtype, because a capital stack, a waterfall, and a stacking plan each need to be read differently. There are 21 recognized diagram subtypes, including sources & uses, capital stack, waterfall, org chart, process flow, stacking plan, rent-roll matrix, lease-expiry schedule, sensitivity table, debt-maturity profile, construction timeline, site plan, and demographic analysis. Each subtype has both its own keyword detection and its own specialized vision prompt, so the model is asked the right questions for the kind of diagram in front of it.
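Keyword-based subtype detection paired with a per-subtype prompt could look roughly like this. The keyword lists, prompt strings, and function names below are hypothetical and cover only a handful of subtypes; the real tables are not reproduced here.

```python
from typing import Optional

# Hypothetical keyword table: illustrative entries only, not the real list.
SUBTYPE_KEYWORDS: dict[str, tuple[str, ...]] = {
    "sources_and_uses": ("sources and uses", "sources & uses"),
    "capital_stack": ("capital stack", "mezzanine", "senior debt"),
    "waterfall": ("waterfall", "promote", "preferred return"),
    "stacking_plan": ("stacking plan",),
    "sensitivity_table": ("sensitivity",),
}

# Hypothetical prompts: each subtype gets its own question for the vision model.
SUBTYPE_PROMPTS: dict[str, str] = {
    "capital_stack": "List each capital layer with its amount and share of the total.",
    "waterfall": "List each distribution tier with its hurdle and split.",
}
GENERIC_PROMPT = "Describe the diagram and extract any labeled values."


def detect_diagram_subtype(text: str) -> Optional[str]:
    """Return the subtype whose keywords match the text most often, if any."""
    lowered = text.lower()
    scores = {
        subtype: sum(keyword in lowered for keyword in keywords)
        for subtype, keywords in SUBTYPE_KEYWORDS.items()
    }
    best = max(scores, key=lambda subtype: scores[subtype])
    return best if scores[best] > 0 else None


def prompt_for(subtype: Optional[str]) -> str:
    """Pick the specialized prompt, falling back to a generic one."""
    if subtype is None:
        return GENERIC_PROMPT
    return SUBTYPE_PROMPTS.get(subtype, GENERIC_PROMPT)


subtype = detect_diagram_subtype("Capital Stack: senior debt and mezzanine")
print(subtype, "->", prompt_for(subtype))
```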
Captions written with the whole memo in hand
Every analytical figure that reaches the memo gets a reader-facing caption: a sentence explaining what the figure shows in the context of this deal. Memosa writes these captions in a single pass at synthesis time, once all the section content is finalized.
That timing is deliberate. A caption written while a section is still being drafted has less to work with; a caption written at synthesis has the complete, final memo as context and can be specific and comparative. Generating captions once, at the end, also avoids a class of race conditions that per-section captioning would create. The result is captions that name the actual figures and metrics a reader cares about, not generic “Figure 3” labels.
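The ordering constraint can be illustrated with a small sketch. The `Section` and `Figure` types, the `finalized` flag, and the request dictionary are assumptions for the example; the actual caption model call is left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    body: str
    finalized: bool  # hypothetical flag set once drafting is complete


@dataclass
class Figure:
    figure_id: str
    section_title: str


def build_caption_requests(sections: list[Section], figures: list[Figure]) -> list[dict]:
    """Build every caption request in one pass, only after all sections are final."""
    pending = [s.title for s in sections if not s.finalized]
    if pending:
        # Captioning early would mean captioning against text that may still change.
        raise ValueError(f"sections not finalized: {pending}")
    # Every figure sees the same complete memo text as context.
    memo_text = "\n\n".join(f"{s.title}\n{s.body}" for s in sections)
    return [
        {"figure_id": f.figure_id, "section": f.section_title, "memo_context": memo_text}
        for f in figures
    ]


sections = [
    Section("Capital Structure", "Senior loan plus preferred equity.", True),
    Section("Market", "Submarket vacancy is tightening.", True),
]
figures = [Figure("fig-1", "Capital Structure")]
requests = build_caption_requests(sections, figures)
print(requests[0]["figure_id"], len(requests))
```

Because the requests are built once from a fixed snapshot of the memo, no caption can be computed against a section that another worker is still rewriting.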
Property imagery: sourcing and enhancement
Beyond the analytical figures, a finished memo needs strong property imagery: a hero banner and clean photographs. Memosa treats these as a non-negotiable part of memo quality, and the pipeline does real work to get them right:
- Web hero sourcing. When the source documents don’t supply a strong enough property image, Memosa fetches property imagery from the web to serve as the hero.
- AI upscaling. Low-resolution source images are upscaled so they hold up at banner size and in print.
- Person detection. Images with people in them are filtered out of analytical and hero roles — a headshot or a stock photo of a handshake is not property imagery. Analytical figures deliberately bypass this filter so a diagram that happens to mention a person isn’t wrongly suppressed.
- Re-classification. Images are re-examined during processing so a figure that was initially mislabeled gets corrected before it lands in the memo.
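The person-detection guard and its analytical bypass might be sketched like this. The `ImageCandidate` type, the set of analytical kinds, and the `contains_person` field (standing in for the output of an upstream detector) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of kinds treated as analytical figures.
ANALYTICAL_KINDS = {"chart", "table", "diagram"}


@dataclass
class ImageCandidate:
    name: str
    kind: str              # classifier label, e.g. "diagram" or "photo"
    contains_person: bool  # assumed result of an upstream person detector


def apply_person_guard(candidates: list[ImageCandidate]) -> list[ImageCandidate]:
    """Drop images with people, except analytical figures, which bypass the check."""
    kept = []
    for candidate in candidates:
        if candidate.kind in ANALYTICAL_KINDS:
            # Bypass: an org chart with headshots is still a useful figure.
            kept.append(candidate)
        elif not candidate.contains_person:
            kept.append(candidate)
    return kept


candidates = [
    ImageCandidate("org_chart", "diagram", True),
    ImageCandidate("broker_headshot", "photo", True),
    ImageCandidate("facade", "photo", False),
]
print([c.name for c in apply_person_guard(candidates)])
```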
The hero image
For the hero banner, Memosa filters candidates to property-type images with no people and adequate resolution (at least 120,000 pixels), then scores the survivors on resolution, visual richness, relevance, and aspect ratio, and picks the best one. The banner itself is cropped to a roughly 2.7:1 landscape ratio, biased slightly upward so building tops aren't cut off, and held to a minimum width so it stays crisp.
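A rough sketch of the filter, score, and crop steps follows. Only the 120,000-pixel floor and the 2.7:1 ratio come from the text; the interpretation of the floor as width times height, the score weights, the normalization constants, and the `upper_bias` value are assumptions, and the minimum-width rule is not modeled because its value is not stated.

```python
from dataclasses import dataclass
from typing import Optional

MIN_HERO_PIXELS = 120_000  # from the text; assumed to mean width * height
BANNER_RATIO = 2.7         # from the text


@dataclass
class HeroCandidate:
    name: str
    width: int
    height: int
    is_property: bool
    contains_person: bool
    richness: float   # assumed to be normalized to 0..1
    relevance: float  # assumed to be normalized to 0..1


def is_eligible(c: HeroCandidate) -> bool:
    """Property image, no people, and enough pixels."""
    return c.is_property and not c.contains_person and c.width * c.height >= MIN_HERO_PIXELS


def score(c: HeroCandidate) -> float:
    """Weighted score; the weights and the 2-megapixel cap are hypothetical."""
    resolution = min(c.width * c.height / 2_000_000, 1.0)
    aspect = min(c.width / c.height, BANNER_RATIO) / BANNER_RATIO
    return 0.3 * resolution + 0.25 * c.richness + 0.3 * c.relevance + 0.15 * aspect


def pick_hero(candidates: list[HeroCandidate]) -> Optional[HeroCandidate]:
    """Return the best eligible candidate, or None if nothing qualifies."""
    eligible = [c for c in candidates if is_eligible(c)]
    return max(eligible, key=score, default=None)


def banner_crop_box(width: int, height: int, upper_bias: float = 0.35) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a 2.7:1 crop shifted toward the top."""
    crop_w = min(width, round(height * BANNER_RATIO))
    crop_h = min(height, round(crop_w / BANNER_RATIO))
    left = (width - crop_w) // 2
    # A bias below 0.5 leaves less margin above the crop than below it.
    top = round((height - crop_h) * upper_bias)
    return (left, top, left + crop_w, top + crop_h)


pool = [
    HeroCandidate("thumbnail", 300, 300, True, False, 0.9, 0.9),
    HeroCandidate("team_photo", 2000, 1000, True, True, 0.9, 0.9),
    HeroCandidate("facade", 2000, 1000, True, False, 0.6, 0.8),
    HeroCandidate("lobby", 800, 600, True, False, 0.5, 0.5),
]
hero = pick_hero(pool)
print(hero.name if hero else None, banner_crop_box(2700, 2000))
```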
What you see in the memo
The end result, from your perspective as an analyst, is that the figures from your PDFs show up in the right memo sections with captions that actually describe them, the data inside tables and diagrams informs the analysis, and the memo carries clean, appropriately sized property imagery. Charts generated from your Excel model are produced by a separate system (see charts), but the captioned figures lifted from your source PDFs come from this pipeline.
Sources
- src/canvas/services/image_digitizer_service.py: the 21-subtype diagram detection (_detect_diagram_subtype, _DIAGRAM_SUBTYPE_KEYWORDS, _DIAGRAM_SUBTYPE_PROMPTS).
- src/langchain/workflows/tools/image_intelligence/policy/hero_image_selector.py: hero candidate filtering and scoring, the 120,000-pixel minimum resolution, and the 2.7:1 banner ratio.
- src/langchain/workflows/tools/image_intelligence/classification/banner_cropper.py: the banner crop (2.7:1, upper-biased, minimum width).
- src/langchain/workflows/tools/image_intelligence/curation/person_detection.py: person detection as an upstream guard, with analytical-image bypass.
- src/langchain/workflows/tools/image_intelligence/enhancement/web_property_image_fetcher.py and enhancement/ai_upscaler.py: web hero sourcing and AI upscaling.
- memory/image_intelligence_v2.md: the V2 page-render pipeline flow, the single-pass synthesis-time caption architecture, the OCR rescue principle, and the property-imagery enhancement set.