Chat assistant architecture

Layered stack and request flows for the portfolio RAG widget, human handoff, and operator ingest.

System layers

Each layer has a single responsibility. Data flows top-down; PubSub and webhooks connect async paths.

Client

Browser widget

ChatWidgetLive · WebSocket · en/es

LiveView

Telegram

Visitor bot + operator console

Webhook

Edge

TLS / CDN

Cloudflare → misaelp.dev

HTTPS

Nginx

Proxy → 127.0.0.1:4001

Contabo VPS

Phoenix

Router

/:locale/* · /webhook/telegram

Bandit

Controllers & LiveViews

ChatWidgetLive · TelegramController

Phoenix 1.8

Domain

Chat orchestration

Persist messages · mode ai|human

MisaelParedes.Chat

RAG pipeline

Embed · retrieve · rerank · generate

MisaelParedes.AI.RAG

Handoff & Telegram

Keywords · operator /reply · /ingest

Handoff · Operator

Data

PostgreSQL

conversations · messages

Ecto

pgvector

document_chunks · cosine search

768-dim

ETS + PubSub

Rate limits · live widget push

OTP

External

Embeddings API

Query + chunk vectors

Gemini

Chat completion

Grounded answer from context

Gemini / OpenAI

Optional rerank

Top-k chunk refinement

Cohere

Solid arrows in the flows below = synchronous request
Dashed paths = PubSub / Telegram webhook
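
The pgvector card in the Data layer mentions cosine search over 768-dimensional vectors in document_chunks. As a rough illustration of what that ranking computes, here is a minimal Python sketch (the project itself is Elixir/Phoenix, and in practice the database does this work). The in-memory list of (text, vector) pairs and the function names below are hypothetical stand-ins, not the project's API.

```python
import math

EMBEDDING_DIM = 768  # dimension stated on the pgvector card


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat a zero vector as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def search_similar(query_vec, chunks, limit=5):
    """Return the texts of the `limit` chunks closest to query_vec.

    `chunks` is a list of (text, vector) pairs, a hypothetical in-memory
    stand-in for the document_chunks table.
    """
    if len(query_vec) != EMBEDDING_DIM:
        raise ValueError(f"expected a {EMBEDDING_DIM}-dimensional query vector")
    ranked = sorted(chunks, key=lambda chunk: cosine_distance(query_vec, chunk[1]))
    return [text for text, _ in ranked[:limit]]
```

In pgvector the same ordering is expressed with the `<=>` cosine-distance operator in an ORDER BY clause, so the application only sends the query vector and a limit.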

Request iterations

Three main loops. Most visitor traffic follows Flow A until a handoff keyword switches to Flow B.

Flow A

RAG answer

Default AI mode · locale-aware

1

Visitor submits question

ChatWidgetLive → Chat.handle_visitor_message/3

2

Rate limit check

ETS bucket per session · hourly cap

3

Embed the question

AI.Provider.embed/1 → 768-d vector

4

Retrieve similar chunks

Knowledge.search_similar/2 on pgvector

5

Optional rerank

Cohere rerank when configured · else top-k truncate

6

Augment system prompt

Locale + portfolio context + guardrails

7

LLM completion

Provider.chat/2 → first-person answer + sources

8

Persist & push

Conversations.add_message · PubSub → widget stream
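
Steps 2 to 7 of Flow A can be sketched as one function with every external dependency injected. This is a language-neutral illustration in Python, not the project's Elixir code: the `HourlyLimiter` class stands in for the ETS bucket, the cap value and `top_k` default are made up, and `embed`, `search`, `rerank`, and `chat` are hypothetical callables. Step 8 (persist and push) is left out.

```python
import time


class HourlyLimiter:
    """Fixed-window counter per session, standing in for the ETS bucket."""

    def __init__(self, cap, window_seconds=3600, clock=time.time):
        self.cap = cap
        self.window = window_seconds
        self.clock = clock
        self.buckets = {}  # session_id -> (window_start, count)

    def allow(self, session_id):
        now = self.clock()
        start, count = self.buckets.get(session_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.cap:
            self.buckets[session_id] = (start, count)
            return False
        self.buckets[session_id] = (start, count + 1)
        return True


def answer(question, session_id, locale, *, limiter, embed, search, chat,
           rerank=None, top_k=3):
    """Run steps 2-7 of Flow A; all providers are passed in as callables."""
    if not limiter.allow(session_id):            # step 2: rate limit check
        return {"status": "rate_limited"}
    query_vec = embed(question)                  # step 3: embed the question
    candidates = search(query_vec, locale)       # step 4: retrieve similar chunks
    if rerank is not None:                       # step 5: rerank when configured
        chunks = rerank(question, candidates)[:top_k]
    else:                                        # ...else plain top-k truncation
        chunks = candidates[:top_k]
    context = "\n\n".join(chunks)
    # Step 6, heavily simplified: locale plus retrieved context.
    system_prompt = f"Locale: {locale}\nContext:\n{context}"
    reply = chat(system_prompt, question)        # step 7: LLM completion
    return {"status": "ok", "answer": reply, "sources": chunks}
```

Passing the providers in as arguments keeps the rerank step optional and makes the flow easy to exercise with fakes.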

Flow B

Human handoff

Operator via Telegram

1

Handoff keyword detected

e.g. «quiero hablar contigo» ("I want to talk with you") · Handoff.activate/2

2

Switch to human mode

conversation.mode = human · bridge message to visitor

3

Notify operator

Telegram: alert + full transcript · chat marked active

4

Operator replies

Free text or /reply ID · Telegram → store_operator_reply

5

Realtime delivery

PubSub conversation:ID → ChatWidgetLive stream_insert

6

Release to AI

/release on Telegram · mode returns to ai
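
The mode switch in Flow B amounts to a two-state machine per conversation. The sketch below illustrates it in Python rather than the project's Elixir. Several things are assumptions: the keyword list (only the Spanish phrase comes from this page), the bridge message text, and the idea that RAG is skipped while the conversation is in human mode. `notify_operator` and `push` are hypothetical callbacks standing in for the Telegram alert and the PubSub broadcast.

```python
# Only the first phrase appears on this page; the second is a made-up example.
HANDOFF_KEYWORDS = ("quiero hablar contigo", "talk to a human")


class Conversation:
    def __init__(self, conversation_id):
        self.id = conversation_id
        self.mode = "ai"      # "ai" or "human"
        self.messages = []    # list of (role, text) pairs


def wants_handoff(text):
    """Case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in HANDOFF_KEYWORDS)


def handle_visitor_message(conv, text, notify_operator):
    """Store the message and decide which flow handles it."""
    conv.messages.append(("visitor", text))
    if conv.mode == "ai" and wants_handoff(text):
        conv.mode = "human"                                  # step 2
        conv.messages.append(("system", "Connecting you with a person."))
        notify_operator(conv.id, list(conv.messages))        # step 3: alert + transcript
        return "handoff"
    # Assumed behaviour: no RAG answer while a person is handling the chat.
    return "human" if conv.mode == "human" else "rag"


def store_operator_reply(conv, text, push):
    """Steps 4-5: persist the operator's reply and push it to the widget."""
    conv.messages.append(("operator", text))
    push(f"conversation:{conv.id}", text)


def release(conv):
    """Step 6: hand the conversation back to the AI."""
    conv.mode = "ai"
```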

Flow C

Knowledge ingest

Operator updates pgvector

1

Operator sends /ingest

Locale + source slug · draft or one-shot text

2

Split into chunks

Paragraphs · ~1200 chars max per chunk

3

Embed & insert

Indexer.index_chunks/1 → document_chunks

4

Available on next query

Flow A retrieval picks up new vectors immediately

5

Stats & rebuild

/kb counts by source · /reindex from Content
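
Step 2 of Flow C splits text on paragraphs with a cap of roughly 1200 characters per chunk. One way this could work is sketched below in Python (the project itself is Elixir). Packing several short paragraphs into one chunk and hard-splitting an oversized paragraph are assumptions; the page only states the paragraph boundary and the size limit.

```python
import re

MAX_CHUNK_CHARS = 1200  # approximate cap stated in step 2


def split_into_chunks(text, max_chars=MAX_CHUNK_CHARS):
    """Pack blank-line-separated paragraphs into chunks of at most max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        # Assumption: a single oversized paragraph is cut into max_chars slices.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate      # still fits: keep packing
            else:
                chunks.append(current)   # flush and start a new chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be embedded and inserted as one row, which is why the cap matters: it bounds both the embedding input and the amount of context a single retrieved row contributes.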

Indexed sources

  • Built-in portfolio chunks (Content.knowledge_chunks/0)
  • Telegram /ingest — live notes per locale & source
  • /reindex rebuilds from code; /kb shows chunk counts

Guardrails

  • Hourly RAG rate limit per visitor session (ETS)
  • Tools rate limit on LiveView events only
  • First-person prompts · cite context · no invented facts
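
The prompt-level guardrails above (first person, cite context, no invented facts) could be assembled into a system prompt along these lines. This is a hedged Python sketch, not the project's prompt: the rule wording, the `LANGUAGE_NAMES` mapping, the English fallback, and the numbered-context format are all illustrative assumptions.

```python
# Hypothetical wording for the three prompt guardrails listed above.
GUARDRAILS = (
    "Answer in the first person, as the portfolio owner.",
    "Cite the context passages you rely on.",
    "If the context does not contain the answer, say so; do not invent facts.",
)

# The widget supports en/es; the fallback to English is an assumption.
LANGUAGE_NAMES = {"en": "English", "es": "Spanish"}


def build_system_prompt(locale, chunks):
    """Combine locale, guardrail rules, and numbered context passages."""
    language = LANGUAGE_NAMES.get(locale, "English")
    rules = "\n".join(f"- {rule}" for rule in GUARDRAILS)
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return f"Reply in {language}.\n\nRules:\n{rules}\n\nContext:\n{numbered}"
```

Numbering the passages gives the model something concrete to cite, which makes the "cite context" rule checkable in the returned answer.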