How it works

Chat assistant architecture

Layered stack and request flows for the portfolio RAG widget, human handoff, and operator ingest.

System layers

Each layer has a single responsibility. Data flows top-down; PubSub and webhooks connect async paths.

Client

Browser widget

ChatWidgetLive · WebSocket · en/es

LiveView

Visitor bot + operator console

Webhook

Edge

TLS / CDN

Cloudflare → misaelp.dev

HTTPS

Nginx

Proxy → 127.0.0.1:4001

Contabo VPS

Phoenix

Router

/:locale/* · /webhook/telegram

Bandit

Controllers & LiveViews

ChatWidgetLive · TelegramController

Phoenix 1.8

Domain

Chat orchestration

Persist messages · mode ai|human

MisaelParedes.Chat

RAG pipeline

Embed · retrieve · rerank · generate

MisaelParedes.AI.RAG

Handoff & Telegram

Keywords · operator /reply · /ingest

Handoff · Operator

Data

PostgreSQL

conversations · messages

Ecto

pgvector

document_chunks · cosine search

768-dim

ETS + PubSub

Rate limits · live widget push

OTP

External

Embeddings API

Query + chunk vectors

Gemini

Chat completion

Grounded answer from context

Gemini / OpenAI

Optional rerank

Top-k chunk refinement

Cohere

Solid arrows in flows below = synchronous request · Dashed paths = PubSub / Telegram webhook

Request flows

Three main flows. Most visitor traffic follows Flow A until a handoff keyword switches the conversation to Flow B.

Flow A

RAG answer

Default AI mode · locale-aware

Visitor submits question

ChatWidgetLive → Chat.handle_visitor_message/3

Rate limit check

ETS bucket per session · hourly cap

Embed the question

AI.Provider.embed/1 → 768-d vector

Retrieve similar chunks

Knowledge.search_similar/2 on pgvector

Optional rerank

Cohere rerank when configured · else top-k truncate

Augment system prompt

Locale + portfolio context + guardrails

LLM completion

Provider.chat/2 → first-person answer + sources

Persist & push

Conversations.add_message · PubSub → widget stream
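The retrieval and prompt-building steps of Flow A can be sketched as follows. This is a minimal illustration in Python rather than the app's Elixir code: the function names, the in-memory chunk list, and the prompt wording are all assumptions, and a real deployment would run the cosine search inside pgvector rather than in application code.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_similar(query_vec: Vector, chunks: list[tuple[str, Vector]], limit: int) -> list[str]:
    """Return the texts of the `limit` chunks most similar to the query vector.

    Stands in for the pgvector cosine search; here it is a plain in-memory sort.
    """
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:limit]]


def select_context(
    question: str,
    candidates: list[str],
    top_k: int,
    rerank: Optional[Callable[[str, list[str]], list[str]]] = None,
) -> list[str]:
    """Rerank when a reranker is configured, otherwise just truncate to top_k."""
    if rerank is not None:
        candidates = rerank(question, candidates)
    return candidates[:top_k]


def build_system_prompt(locale: str, context: list[str]) -> str:
    """Combine locale, retrieved context, and guardrails (hypothetical wording)."""
    joined = "\n---\n".join(context)
    return (
        f"Answer in locale '{locale}', in the first person.\n"
        "Use only the context below and do not invent facts.\n"
        f"Context:\n{joined}"
    )
```

Passing `rerank=None` models the "else top-k truncate" branch, so the same call site works whether or not a reranker is configured.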

Flow B

Human handoff

Operator via Telegram

Handoff keyword detected

e.g. «quiero hablar contigo» ("I want to talk to you") · Handoff.activate/2

Switch to human mode

conversation.mode = human · bridge message to visitor

Notify operator

Telegram: alert + full transcript · chat marked active

Operator replies

Free text or /reply ID · Telegram → store_operator_reply

Realtime delivery

PubSub conversation:ID → ChatWidgetLive stream_insert

Release to AI

/release on Telegram · mode returns to ai
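The mode switch at the heart of Flow B can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the app's Elixir implementation: the keyword list (beyond the example phrase above), the substring matching rule, and the returned route labels are all hypothetical.

```python
from dataclasses import dataclass, field

# The first phrase comes from the flow description; the second is a made-up example.
HANDOFF_KEYWORDS = ("quiero hablar contigo", "talk to a human")


@dataclass
class Conversation:
    id: int
    mode: str = "ai"  # "ai" or "human"
    messages: list = field(default_factory=list)


def wants_handoff(text: str) -> bool:
    """Case-insensitive substring match against the handoff keywords."""
    normalized = text.casefold()
    return any(keyword in normalized for keyword in HANDOFF_KEYWORDS)


def handle_visitor_message(conv: Conversation, text: str) -> str:
    """Store the message and decide where it goes next.

    Returns "rag" (answer with the AI pipeline), "handoff" (switch to human
    mode and notify the operator), or "forward_to_operator" (already in
    human mode, so relay the message).
    """
    conv.messages.append(("visitor", text))
    if conv.mode == "ai" and wants_handoff(text):
        conv.mode = "human"
        return "handoff"
    return "forward_to_operator" if conv.mode == "human" else "rag"


def release(conv: Conversation) -> None:
    """Operator releases the conversation; later messages go back to the AI."""
    conv.mode = "ai"
```

Keeping the mode on the conversation record means every incoming message is routed by a single check, regardless of which process handles it.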

Flow C

Knowledge ingest

Operator updates pgvector

Operator sends /ingest

Locale + source slug · draft or one-shot text

Split into chunks

Paragraphs · ~1200 chars max per chunk

Embed & insert

Indexer.index_chunks/1 → document_chunks

Available on next query

Flow A retrieval picks up new vectors immediately

Stats & rebuild

/kb counts by source · /reindex from Content
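One way to implement the "split into chunks" step is to split on blank lines and pack paragraphs greedily up to the size cap. The Python sketch below assumes that packing strategy and a hard split for oversized paragraphs; the source only states "paragraphs" and "~1200 chars max per chunk", so the function name and the exact rules are assumptions.

```python
import re


def split_into_chunks(text: str, max_chars: int = 1200) -> list[str]:
    """Split text into chunks of at most max_chars, respecting paragraph breaks.

    Paragraphs are separated by blank lines. Consecutive paragraphs are packed
    into one chunk while they fit; a paragraph longer than max_chars is cut
    into fixed-size pieces (an assumption, not confirmed by the source).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The piece does not fit: close the current chunk and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be embedded and inserted as one row, so the cap bounds the amount of text behind any single vector.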

Indexed sources

• Built-in portfolio chunks (Content.knowledge_chunks/0)
• Telegram /ingest — live notes per locale & source
• /reindex rebuilds from code; /kb shows chunk counts

Guardrails

• Hourly RAG rate limit per visitor session (ETS)
• Tools rate limit on LiveView events only
• First-person prompts · cite context · no invented facts
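The hourly per-session limit can be pictured as a fixed-window counter keyed by session ID. The Python sketch below uses a plain dictionary where the app uses an ETS table; the class name, the fixed-window strategy, and the limit value in the usage note are assumptions, since the source does not state them.

```python
import time
from typing import Callable


class HourlyRateLimiter:
    """Fixed-window counter per session (a dict stands in for the ETS table)."""

    def __init__(self, limit: int, window_seconds: int = 3600,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._buckets: dict[str, tuple[float, int]] = {}  # session_id -> (window_start, count)

    def allow(self, session_id: str) -> bool:
        """Return True and count the request if the session is under its cap."""
        now = self.clock()
        start, count = self._buckets.get(session_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired: start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            self._buckets[session_id] = (start, count)
            return False
        self._buckets[session_id] = (start, count + 1)
        return True
```

For example, `HourlyRateLimiter(limit=20).allow(session_id)` would be checked before the embedding call, so a rejected request never reaches the external APIs. The value 20 is illustrative only.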
