Chat assistant architecture
Layered stack and request flows for the portfolio RAG widget, human handoff, and operator ingest.
System layers
Each layer has a single responsibility. Data flows top-down; PubSub and webhooks connect async paths.
Browser widget (LiveView)
ChatWidgetLive · WebSocket · en/es
Telegram (Webhook)
Visitor bot + operator console
TLS / CDN (HTTPS)
Cloudflare → misaelp.dev
Nginx (Contabo VPS)
Proxy → 127.0.0.1:4001
Router (Bandit)
/:locale/* · /webhook/telegram
Controllers & LiveViews (Phoenix 1.8)
ChatWidgetLive · TelegramController
Chat orchestration (MisaelParedes.Chat)
Persist messages · mode ai|human
RAG pipeline (MisaelParedes.AI.RAG)
Embed · retrieve · rerank · generate
Handoff & Telegram (Handoff · Operator)
Keywords · operator /reply · /ingest
PostgreSQL (Ecto)
conversations · messages
pgvector (768-dim)
document_chunks · cosine search
ETS + PubSub (OTP)
Rate limits · live widget push
Embeddings API (Gemini)
Query + chunk vectors
Chat completion (Gemini / OpenAI)
Grounded answer from context
Optional rerank (Cohere)
Top-k chunk refinement
Request iterations
Three main loops. Most visitor traffic follows Flow A until a handoff keyword switches to Flow B.
Flow A: RAG answer
Default AI mode · locale-aware
Visitor submits question
ChatWidgetLive → Chat.handle_visitor_message/3
Rate limit check
ETS bucket per session · hourly cap
Embed the question
AI.Provider.embed/1 → 768-d vector
Retrieve similar chunks
Knowledge.search_similar/2 on pgvector
Optional rerank
Cohere rerank when configured · else top-k truncate
Augment system prompt
Locale + portfolio context + guardrails
LLM completion
Provider.chat/2 → first-person answer + sources
Persist & push
Conversations.add_message · PubSub → widget stream
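The retrieve and rerank steps above can be sketched in plain Elixir. This is a minimal in-memory stand-in, not the project's code: the real search runs as a cosine query on pgvector over 768-dimensional vectors, while the module name `RetrievalSketch`, the three-argument `search_similar/3`, the `refine/4` helper, and the chunk map shape are all assumptions made for illustration.

```elixir
defmodule RetrievalSketch do
  @moduledoc "Illustrative in-memory version of cosine retrieval with a top-k fallback."

  # Cosine similarity between two equal-length lists of floats.
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    denom = norm.(a) * norm.(b)
    if denom == 0.0, do: 0.0, else: dot / denom
  end

  # Score every chunk against the query vector and keep the best `k`.
  # Assumes each chunk is a map with an :embedding key (hypothetical shape).
  def search_similar(query_vec, chunks, k) do
    chunks
    |> Enum.map(fn chunk -> {cosine(query_vec, chunk.embedding), chunk} end)
    |> Enum.sort_by(fn {score, _chunk} -> score end, :desc)
    |> Enum.take(k)
  end

  # No reranker configured: plain top-k truncation.
  def refine(scored, _query, k, nil), do: Enum.take(scored, k)

  # Reranker configured: let it reorder the candidates, then truncate.
  def refine(scored, query, k, rerank_fun) when is_function(rerank_fun, 2) do
    rerank_fun.(scored, query) |> Enum.take(k)
  end
end

# Toy 2-d vectors stand in for the 768-d embeddings.
chunks = [
  %{id: 1, embedding: [1.0, 0.0]},
  %{id: 2, embedding: [0.0, 1.0]},
  %{id: 3, embedding: [0.7, 0.7]}
]

candidates = RetrievalSketch.search_similar([1.0, 0.0], chunks, 3)
top = RetrievalSketch.refine(candidates, "example question", 2, nil)
```

Passing `nil` as the reranker mirrors the "else top-k truncate" branch; a configured reranker would be supplied as a two-argument function instead.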
Flow B: Human handoff
Operator via Telegram
Handoff keyword detected
e.g. «quiero hablar contigo» · Handoff.activate/2
Switch to human mode
conversation.mode = human · bridge message to visitor
Notify operator
Telegram: alert + full transcript · chat marked active
Operator replies
Free text or /reply ID · Telegram → store_operator_reply
Realtime delivery
PubSub conversation:ID → ChatWidgetLive stream_insert
Release to AI
/release on Telegram · mode returns to ai
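The mode switch in this flow behaves like a small state machine. The sketch below is one way to model it, assuming a conversation is a map with a `:mode` key; `HandoffSketch`, its function names, and the second keyword are hypothetical, and only «quiero hablar contigo» comes from the flow above.

```elixir
defmodule HandoffSketch do
  # "quiero hablar contigo" is the example phrase from the flow;
  # "talk to a human" is a made-up second entry for illustration.
  @keywords ["quiero hablar contigo", "talk to a human"]

  # Case-insensitive substring match against the keyword list.
  def handoff_requested?(text) do
    normalized = String.downcase(text)
    Enum.any?(@keywords, &String.contains?(normalized, &1))
  end

  # In ai mode, a keyword flips the conversation to human mode and asks
  # the caller to notify the operator; anything else goes to the RAG flow.
  def route(%{mode: :ai} = conv, text) do
    if handoff_requested?(text) do
      {:notify_operator, %{conv | mode: :human}}
    else
      {:answer_with_rag, conv}
    end
  end

  # In human mode, every visitor message is forwarded to the operator.
  def route(%{mode: :human} = conv, _text), do: {:forward_to_operator, conv}

  # Equivalent of the operator's /release command: back to ai mode.
  def release(conv), do: %{conv | mode: :ai}
end

conv = %{id: 42, mode: :ai}
{:notify_operator, conv} = HandoffSketch.route(conv, "Hola, quiero hablar contigo")
conv = HandoffSketch.release(conv)
```

Returning a tagged tuple keeps side effects such as the Telegram alert and the PubSub broadcast outside the decision logic, which makes the transitions easy to test in isolation.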
Flow C: Knowledge ingest
Operator updates pgvector
Operator sends /ingest
Locale + source slug · draft or one-shot text
Split into chunks
Paragraphs · ~1200 chars max per chunk
Embed & insert
Indexer.index_chunks/1 → document_chunks
Available on next query
Flow A retrieval picks up new vectors immediately
Stats & rebuild
/kb counts by source · /reindex from Content
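The chunking step can be illustrated with a short sketch. It assumes paragraphs are separated by blank lines and that an oversized paragraph is simply cut every 1200 characters; the real splitter may merge short paragraphs or cut on sentence boundaries, and `ChunkerSketch` is a hypothetical name.

```elixir
defmodule ChunkerSketch do
  # Approximate cap taken from the "~1200 chars max per chunk" step.
  @max_chars 1200

  # Split on blank lines, drop empty pieces, then cap each paragraph.
  def split(text, max \\ @max_chars) do
    text
    |> String.split(~r/\n\s*\n/, trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
    |> Enum.flat_map(&slice(&1, max))
  end

  # Hard-split a paragraph into pieces of at most `max` graphemes.
  defp slice(paragraph, max) do
    paragraph
    |> String.graphemes()
    |> Enum.chunk_every(max)
    |> Enum.map(&Enum.join/1)
  end
end

chunks = ChunkerSketch.split("First note about the project.\n\nSecond note.\n")
```

Counting graphemes rather than bytes keeps accented Spanish characters from being cut in half when a paragraph is sliced.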
Indexed sources
- Built-in portfolio chunks (Content.knowledge_chunks/0)
- Telegram /ingest: live notes per locale & source
- /reindex rebuilds from code; /kb shows chunk counts
Guardrails
- Hourly RAG rate limit per visitor session (ETS)
- Tools rate limit on LiveView events only
- First-person prompts · cite context · no invented facts
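The hourly per-session limit can be sketched as a fixed-window counter in ETS. This is only one possible design: the table name, the cap of 20, and the `RateLimitSketch` module are assumptions, and the sketch never deletes old buckets, which a real limiter would need to do.

```elixir
defmodule RateLimitSketch do
  @table :rag_rate_limit_sketch
  # Hypothetical hourly cap; the actual value is not stated.
  @limit 20

  # Create the named table once; safe to call repeatedly.
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set])
    end

    :ok
  end

  # Fixed-window counter keyed by {session_id, hour bucket}.
  # update_counter/4 inserts the default row when the key is missing,
  # then atomically increments position 2 and returns the new count.
  def check(session_id, now \\ System.system_time(:second), limit \\ @limit) do
    key = {session_id, div(now, 3600)}
    count = :ets.update_counter(@table, key, {2, 1}, {key, 0})
    if count <= limit, do: :ok, else: {:error, :rate_limited}
  end
end

RateLimitSketch.init()
result = RateLimitSketch.check("session-abc")
```

Because `:ets.update_counter/4` is atomic, concurrent LiveView processes can share the table without a coordinating GenServer.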