Your second brain
for closing deals.
Speak after a showing. Forward an email. Pull up a client. Lumi captures the soft signals, fills the brief, and feeds Claude — automatically.
- Voice → CRM, auto. No forms.
- Works offline. Syncs when you're back.
- Free for agents in EU · LatAm · MENA.
10-min read · Updated April 2026
Lumi · Wednesday
Good morning, Niki.
Two showings · three leads need a nudge.
Showing · Passeig de Gràcia 84
Claude on the call.
He saved the deal at minute 14.
Some objections form 8-12 seconds before they crystallise. The client's tone shifts; their phrasing slows; they hedge before the actual pushback lands. Humans are bad at catching this in real time — too busy listening to the words. AI watching the streaming transcript catches the formation, suggests two phrasings, and lets the agent walk into the objection with an answer ready.
client (just said):
"I just don't think €825 is realistic for
this area right now. We've seen three
similar listings drop 5-7% last month."
claude suggests (peripheral display):
softer:
"Three drops is real — show me the third
and we'll calibrate."
direct:
"Two of those were dated stock. Want
the comp pull?"Two suggestions. Each ≤14 words. The agent picks tone in their next breath, speaks in their voice — the client never hears the AI.
What the co-pilot watches for.
Five objection patterns (price, timing, spouse, market-fear, agent-trust), each with a measurable behavioural fingerprint in voice and tone, and each one the AI catches faster than a human can. Other objections fall to SILENT and the agent handles them as normal.
Five rules. Sub-1.5-second latency.
The discipline of the live co-pilot. Skip any one and the protocol breaks the call's rhythm, leaks the AI's presence, or creates legal exposure.
- 01
Two-party-consent disclosure is the only legal foundation.
Same as the call-log protocol: in two-party-consent jurisdictions the client must know the call is being recorded, and disclosing everywhere is the only safe default. The disclosure is a 4-second pre-roll: 'this call is recorded and I'm using AI to help with notes — let me know if you'd rather not'. Most clients say 'sure, fine'. The 1-2% who decline get the agent's full attention without the AI; their conversion doesn't suffer measurably.
- 02
Streaming Whisper, not file-batch Whisper.
The whole point is real-time, so the transcription must stream. OpenAI's Realtime API or AssemblyAI's streaming endpoint deliver partial transcripts at roughly 1.2-second latency; buffer those into 8-12 second chunks, as in the sketch below. File-batch Whisper at end-of-call defeats the protocol — you'd be reading suggestions for an objection that's already passed.
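A minimal sketch of that buffering layer, transport-agnostic: the 8 s / 12 s thresholds and the sentence-boundary heuristic are illustrative assumptions, not part of either vendor's API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ChunkAccumulator:
    """Buffers streaming transcript text into 8-12 s chunks.

    Transport-agnostic: call feed() with each final transcript event
    your streaming endpoint emits. Thresholds are illustrative.
    """
    min_s: float = 8.0
    max_s: float = 12.0
    _buf: list[str] = field(default_factory=list)
    _start: float | None = None

    def feed(self, text: str) -> str | None:
        now = time.monotonic()
        if self._start is None:
            self._start = now
        self._buf.append(text)
        elapsed = now - self._start
        ends_sentence = text.rstrip().endswith((".", "?", "!"))
        # Flush on a sentence boundary after 8 s, or hard-cut at 12 s.
        if elapsed >= self.max_s or (elapsed >= self.min_s and ends_sentence):
            chunk = " ".join(self._buf)
            self._buf, self._start = [], None
            return chunk
        return None
```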
- 03
Haiku for latency. Sonnet only for end-of-call summary.
Each chunk → Haiku call with structured output; a minimal version is sketched below. Total latency budget: 1.5s from chunk arrival to suggestion display. Sonnet adds 0.6-0.9s and the agent doesn't have that budget mid-call. Use Sonnet only for the post-call summary (the call-log protocol). For live, Haiku's weaker classification is still better than no classification — the agent fills in the gap.
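A minimal per-chunk call, assuming the anthropic Python SDK, the claude-3-5-haiku-latest model alias, and a system prompt (the one later in this guide) that demands bare JSON; the fail-open-to-SILENT behaviour on budget overrun is an assumption, not a vendor feature.

```python
import json
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."  # the co-pilot prompt from "What to feed Claude" below


def classify_chunk(chunk_payload: str) -> dict | None:
    """One Haiku call per transcript chunk. Returns the parsed decision,
    or None (treated as SILENT) when the call blows the 1.5 s budget,
    since a stale suggestion is worse than no suggestion."""
    t0 = time.monotonic()
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # alias; pin a dated version in prod
        max_tokens=200,                   # two ≤14-word suggestions fit easily
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": chunk_payload}],
    )
    if time.monotonic() - t0 > 1.5:
        return None
    return json.loads(msg.content[0].text)  # prompt must demand bare JSON
```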
- 04
Display in peripheral vision. Never in the agent's reading focus.
The suggestion shows on a side-monitor or a small phone-screen overlay — somewhere the agent can glance at in 0.4 seconds before returning to the client's eyes. Center-of-screen displays make the agent break eye contact, which the client hears in the silence. Peripheral placement is the only layout that doesn't leak the AI's presence; a formatting sketch follows below.
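One way to format a decision for that overlay. Field names follow the example payloads in this guide (the ALERT tag field is an assumption), and the glyphs and two-line layout are purely illustrative.

```python
def render_peripheral(decision: dict) -> str:
    """Two short tone-tagged lines for a side-monitor or phone overlay.
    ALERT collapses to a single tag line; SUGGEST shows both rebuttals."""
    if decision["decision"] == "ALERT":
        return "▲ " + decision["tag"]  # e.g. "price · spouse · timing"
    s = decision["suggestions"]
    return f"softer ▸ {s['softer']}\ndirect ▸ {s['direct']}"
```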
- 05
The agent picks tone. Always. No auto-fire.
Two suggestions per SUGGEST event, tagged softer/direct. The agent reads both in 0.4s and speaks one in their voice — never reading verbatim, always paraphrasing. This is the line that protects the relationship: the AI is a sidekick, not a teleprompter. The moment a client hears a script, the trust collapses.
Three suggestion failures that break the call.
Each of these has been produced by a loose prompt and used by a real agent in a live call. Each one was the moment the client felt the AI's presence — and once felt, it can't be un-felt.
“softer: 'I understand how you feel, Hugo. Other clients have felt the same way, but they found that...'”
Three things wrong: 1) the 'feel-felt-found' formula is recognisable as a sales script and breaks rapport. 2) Multi-sentence — agent has no time mid-call. 3) Doesn't address the actual objection (market drops). Generic scripts fail every test the protocol enforces.
“softer: 'Let me share three data points: comp listing X went under contract for €820 last month, the area's median price-per-sqm is up 4% YoY, and the building has had two recent renovations that justify the premium. Would you like me to walk through any of these?'”
Useful information; impossible to deliver mid-call. The agent can't read this in 0.4s. By the time they finish reading it, the client has spoken twice more. Suggestions must be ≤14 words for the protocol to work at all.
“softer: 'You're right, the price is high — let's see if we can negotiate.'”
Contradicts the agent's actual position (they're representing the seller or have committed to the listing price with the buyer). Worse: gives the client an unintended concession the agent didn't authorise. The protocol must respect the agent's prior calibration, not undo it.
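The first two failures are cheap to gate out mechanically before anything reaches the display. A sketch, with the banned-phrase list as a placeholder; the third failure can't be string-matched, which is exactly what the voice samples in the prompt are for.

```python
BANNED_SCRIPT_PHRASES = (
    "i understand how you feel",  # 'feel-felt-found' tell
    "have felt the same",
)


def valid_suggestion(text: str) -> bool:
    """Reject over-length and canned-script suggestions before display."""
    if len(text.split()) > 14:  # unreadable in a 0.4 s glance
        return False
    lowered = text.lower()
    return not any(p in lowered for p in BANNED_SCRIPT_PHRASES)
```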
The chunk input.
What the prompt receives every 8-12 seconds during a call. Voice samples are the prompt's anchor — without them the suggestions land in a generic-AI register that the agent can't speak.
# ── live transcript chunk (12.3s) ──────────────
chunk_id: "c_0048"
audio_offset: "00:14:22 → 00:14:34"
speaker: "client (Hugo Almeida)"
transcript: "Look, I just don't think
€825 is realistic for this
area right now. We've seen
three similar listings drop
by 5-7% in the last month."
context_brief:
client: "Hugo · Cascais buyer"
budget_range: "€700-820k"
showing_count: 4
prior_signals: "anchor on data, decision-
group includes wife"
agent_voice_samples:
- "Fair point — but the comp pulls I'd
pull would tell a different story.
Want me to send the deck?"
- "It's tight, agreed. But the fixed
price gives you certainty no comp
can. Move now or wait for the dip?"
- "Three drops in a month is real. Two
of those were on dated stock. Show
me the third and we'll talk."
What to feed Claude.
The system prompt that runs once per chunk. Use Haiku — the latency budget rules out Sonnet. Structured-output mode required for downstream UI.
You are a senior real-estate agent's
live call co-pilot.
INPUT
Streaming Whisper transcript of an active
phone call (chunks of 8-12 seconds), the
agent's existing brief on this client, and
the agent's own prior objection-handling
samples (2-3 examples of how this agent
typically responds to common objections).
OUTPUT
Per chunk, decide: SILENT, ALERT, or SUGGEST.
SILENT — no objection pattern detected.
Return nothing. Most chunks land
here.
ALERT — an objection pattern is forming
but the client hasn't fully voiced
it yet. Return a short tag (up to
4 words), e.g. "price · spouse · timing".
The agent sees the tag in their
peripheral vision and adjusts.
SUGGEST — the objection has crystallised.
Return 2 short rebuttals (each
≤14 words) in the agent's voice,
tagged by tone:
- softer (acknowledge + reframe)
- direct (data + close)
The agent picks tone in the next
breath.
RULES (non-negotiable)
1. Latency budget: under 1.5 seconds from
chunk arrival to suggestion display.
This means Haiku, structured output,
no chain-of-thought tokens in response.
2. Suggestions match the agent's voice
samples — sentence rhythm, vocabulary,
typical sentence length.
3. Never escalate. If the client is
getting heated, suggestions soften.
No pressure tactics.
4. Detect 5 patterns specifically: price,
timing, spouse, market-fear, agent-trust.
Other objections fall to SILENT.
5. The agent always picks. AI suggests;
the agent speaks the words.
ANTI-PATTERNS (never produce these)
- Multi-sentence rebuttals (the agent has
no time to read them mid-call)
- Generic scripts ("I understand how you
feel, others have felt the same way…")
- Suggestions that contradict the agent's
prior phrasings
- Anything that sounds like a sales script
(the client can hear scripts in tone)
The agent should glance at the suggestion
in 0.4 seconds, choose tone, speak.
For testing offline: paste a 10-minute recorded call transcript, chunked manually (a small harness is sketched below). For live: stream from the Realtime API + Haiku.
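A small offline harness under those assumptions: word-count chunking approximates the 8-12 second cadence (roughly 25-40 words of speech per chunk), and classify_chunk is the per-chunk call sketched under rule 03.

```python
def replay_transcript(path: str, words_per_chunk: int = 35) -> None:
    """Replay a recorded call transcript through the classifier,
    ~35 words per pseudo-chunk (~10 s of speech)."""
    words = open(path, encoding="utf-8").read().split()
    for i in range(0, len(words), words_per_chunk):
        chunk = " ".join(words[i : i + words_per_chunk])
        decision = classify_chunk(chunk)  # from the rule-03 sketch
        if decision and decision["decision"] != "SILENT":
            print(decision)
```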
What Claude returns.
The peripheral-display format. Two suggestions, tagged by tone, ready for the agent to pick in their next breath.
decision: SUGGEST
tags: [price, market-fear]
suggestions:
softer:
"Three drops is real — show me the
third and we'll calibrate. Send the
listings?"
direct:
"Two of those drops were dated stock.
This isn't the same market. Want the
comp pull?"
agent_picks_tone_in_next_breath
Streaming the call is step one.
Trusting the peripheral display is step two.
Lumi is the app that runs this workflow for you. You speak after a showing — Lumi captures the soft signals. You forward an email — Lumi updates the constraints. You open the app at 8am — the brief is already there, ready to feed Claude.
- Voice → structured CRM, automatically
- No forms. No data entry. No copy-paste.
- Free for agents in EU · LatAm · MENA
Lumi · Wednesday
Good morning, Niki.
Two showings · three leads need a nudge.
Showing · Passeig de Gràcia 84
Pipeline
Active
8
Warm
4
Cold
2
Clara Ruiz
Active · €1.8M · 3BR
Passeig de Gràcia showing · 11:30
Andreas Moreno
Active · €2.4M · 4BR
Send comps by 18:00
Dimitri Schneider
Warm · €900K · 2BR
Contract review today
Silent 3d · last contact 3 days ago
Sarah Mitchell
Cold · €1.2M · 3BR
Draft re-engagement
Silent 9d · last contact 9 days ago
A real-estate adaptation of the real-time AI co-pilot pattern from SaaS sales (Gong, Chorus, Outreach Kaia). Our slice: the 5 objection patterns that move RE deals — caught 8-12 seconds before the human notices.
More guides like this on @lumi.estate. Follow if any of this was useful — it's how we know to keep writing.