This is the absolute essence. If you only had one minute to read this whole guide, this is what you would walk away with. Every other tab is layered detail under these five sections.
0.1 What EAII is
Human Discovery Inc is building EAII (Emotional Artificial Intelligence Infrastructure), a platform that gives AI systems a structured understanding of user emotional state. Today's LLMs respond to messages but have no representation of what the user is feeling, what they need, or whether they are engaged. EAII produces a structured emotional state representation (emotion, valence, intensity, tension, intent, response strategy, conversation dynamics, plus operational signals like escalation and disengagement) that downstream AI applications can consume through a standardized API.
0.2 What Pivot is building
Pivot-al AI is building Phase 1 of EAII: the platform around the engine, NOT the real engine itself. Pivot delivers an API service, a data pipeline, a placeholder Engine v0 (a thin LLM wrapper using Azure OpenAI), an investor-facing demo app, internal admin tools (debugger, dashboard, labeling, simulator, research playground), a deterministic safety pipeline for crisis-pattern inputs, cloud infrastructure on AWS, an evaluation framework, and a documentation handover package.
0.3 Why Phase 1 is intentionally not the real engine
Phase 2 is where the real emotional intelligence engine gets built (your domain). Phase 1 deliberately uses a placeholder engine (Engine v0, a thin LLM wrapper) so the architecture, schema, API, and infrastructure can be locked while Phase 2 designs the actual model. The whole system is built around a stable "engine interface boundary" so that swapping the placeholder for the real engine in Phase 2 does not require changing the API, the event store, the demo app, or the internal tools.
0.4 The single most important concept
The architecture has a swap point. Two function signatures:
```
engine.analyze(canonical_input) -> analysis_output
engine.respond(canonical_input, analysis_output?) -> response_output
```
Engine v0 (Phase 1, thin LLM wrapper) implements these. Engine v2.0 (Phase 2, real ML model) will implement the same signatures. Everything else in the system calls these functions and never knows the difference between v0 and v2.0. That is what makes the whole architecture future-proof.
0.5 What success looks like
When Phase 1 ends, Human Discovery has a running platform on AWS with public API endpoints, an investor demo where Patrick can show structured emotional analysis live to investors, internal admin tools for debugging and iterating, a synthetic evaluation corpus showing the structured output beats simple sentiment, and a complete handover package for taking the system over.
When you are ready for more shape and proportion, go to Tab 2 (100,000 ft). When you are ready for the full architecture and 12 deliverable groups, go to Tab 3 (Macro).
The 200k view was the elevator pitch; this is the one-page brief. It adds shape and proportion without diving into specifics.
0.6 EAII in context
The bet behind EAII: modern LLMs are stateless about emotion. They can sound empathetic in any single response but have no representation of where the user is in an emotional trajectory across turns. EAII fills that gap as an infrastructure layer: an API any AI system can call to get a structured emotional state on each user message. Anthropic's April 2026 interpretability research is supporting evidence that emotion concepts are a meaningful internal representation in modern models; EAII externalizes that representation into a standardized format any downstream AI can consume programmatically.
0.7 The two-phase build
Phase 1 (Pivot, 5 months)
- The PLATFORM around the engine
- Disposable Engine v0
- LLM wrapper on Azure OpenAI
- Investor-ready alpha
Phase 2 (you, 12+ months)
- The ENGINE itself
- Real ML engine (Engine v2.0)
- Fine-tuned models, structured state
- Production system
Phase 1 ends. Phase 2 begins. The architecture is built so the transition is just swapping the engine file behind a stable interface, not redesigning anything around it.
0.8 The 12 deliverable groups (just the list)
- Investor Demo App (public-facing React app)
- Internal Tools (admin web app: 5 sub-tools)
- Modular Backend / API Service (FastAPI)
- Event Schema and Data Pipeline (canonical envelope, 7 event types)
- Engine v0 (placeholder thin LLM wrapper)
- Safety Framework (deterministic crisis-keyword override)
- Infrastructure (AWS, Docker, observability, retention)
- Developer Interface (OpenAPI spec, Python shadow-mode reference)
- Documentation and Handover (runbooks, ADRs, replacement guide)
- UI Design Pattern Selection (3 design tiles)
- Prototype Evaluation Framework (synthetic corpus, baselines, summary)
- External-LLM-Output Scoring (additive workstream, added Apr 28)
0.9 The architecture in 3 boxes
- FRONTEND: Demo App + Admin Tools (React)
- BACKEND: FastAPI + Engine v0 (Python)
- STORAGE: Events DB + Cache + Async Queue
The frontend is two React apps (public demo, internal tools). The backend is a containerized FastAPI service that does input validation, runs a safety check, calls the engine, and writes events. The storage layer holds events, cache, and async queue. All of it runs on AWS.
0.10 The timeline at a glance
| Phase | Weeks | What |
|---|---|---|
| Architecture lock | Week 1 | API payloads, event schema, UI wireframes |
| Foundation | Weeks 1 to 6 | Backend, event store, Engine v0, safety, demo alpha |
| Internal tools | Weeks 4 to 9 | Debugger, dashboard, labeling, simulator, playground |
| Hardening | Weeks 10 to 12 | QA, staging, production verification |
| Handover | Weeks 9 to 14 | Documentation, ADRs, engine replacement guide |
About 2 months in, the team is roughly between foundation and internal tools.
0.11 The technology stack at a glance
Python (FastAPI) for the backend. React for both frontends. PostgreSQL for the event store (originally MongoDB, swapping for technical reasons). Valkey for cache (originally Redis, drop-in replacement). RabbitMQ for async queue. Azure OpenAI as the LLM provider (gpt-4.1-mini in East US). AWS for cloud hosting (EC2, ECS, S3, CloudWatch). Docker plus Docker Compose for local dev. GitHub Actions for CI/CD. OpenTofu for infrastructure-as-code (originally Terraform, drop-in replacement).
0.12 Why it matters strategically
The team gets two things from Phase 1:
- A working investor-facing demo that lets Patrick show structured emotional understanding live to investors.
- A locked architecture (API, schema, event pipeline, internal tools) that Phase 2 plugs into without redesign.
The "Phase 1 is disposable" framing is intentional: the placeholder engine gets thrown away. The platform around it does NOT.
0.13 What you need to walk into Monday with
Confident command of: what each of the 12 deliverables is, the request flow through the system end to end, the engine boundary and how it makes the Phase 2 swap clean, the taxonomy shape (~30 fields in 10 layers), the 7 event types and how they link, the safety pipeline running before the engine, the technology stack and why three components are being changed, and the technical questions David might raise with answers ready. Those are all built out in Tabs 3, 4, and 5.
1.1 What Pivot is building, in one paragraph
Pivot is building the scaffolding for the EAII platform: an API service, a data pipeline, a placeholder "Engine v0" (a thin LLM wrapper), an investor-facing demo app, internal tools (debugger, dashboard, labeling, simulator, research playground), a deterministic safety pipeline, and the cloud infrastructure to run it all. The output of the system is a structured emotional state representation that downstream AI applications can consume through a standardized API. Phase 1 deliberately stops short of real emotional intelligence so Phase 2 (the actual engine) can plug in cleanly later.
1.2 Phase 1 vs Phase 2 (the most important framing)
Phase 1 (what Pivot is building)
- The platform around the engine
- "Plumbing"
- Engine v0: thin LLM wrapper, disposable
- 14 weeks core build, 5 months total
- Investor-ready alpha
Phase 2 (the future, your domain)
- The engine itself
- "Brain"
- Engine v2.0: real ML model
- 12+ months
- Production system
Phase 1's whole purpose is to build the infrastructure so Phase 2 can plug in cleanly. The SOW says this in §3.5.1, §3.7.7, and §9.1: Engine v0 is "explicitly replaceable and not a long-term EAII intelligence system." The real intelligence comes later.
The technical complexity in Phase 1 is NOT in Engine v0. The complexity is in the surrounding infrastructure: the API, the event pipeline, the schema, the internal tools, the deployment story. Engine v0 is deliberately simple: just disciplined glue around an LLM call.
1.3 The 12 deliverable groups
Investor Demo App
Public-facing chat-with-emotional-visualization React app (§3.1)
Internal Tools
Admin web app: Debugger, Dashboard, Labeling, Simulator, Research Playground (§3.2)
Modular Backend / API
Containerized FastAPI service with /v1 endpoints (§3.3)
Event Schema and Data Pipeline
Canonical envelope plus 7 event types (§3.4)
Engine v0
The placeholder thin LLM wrapper (§3.5)
Safety Framework
Deterministic non-ML crisis-keyword override pipeline (§3.6)
Infrastructure
Environments, deploy, observability, retention (§3.7)
Developer Interface
OpenAPI spec plus Python shadow-mode reference (§3.8)
Documentation and Handover
Runbooks, ADRs, replacement guide (§3.9)
UI Design Pattern Selection
3 design tiles, you pick one (§3.10)
Prototype Evaluation Framework
Synthetic corpus, baselines, investor-readable summary (§3.11)
External-LLM-Output Scoring
Additive workstream evaluating assistant/model outputs (§3.12, added Apr 28)
1.4 The timeline (9 milestones, 14-week core build)
- M1: Architecture lock
- M2: DevOps baseline
- M3: Backend foundation
- M4: Engine v0
- M5: Respond plus safety
- M6: Demo App alpha
- M7: Internal tools
- M8: QA plus hardening
- M9: Docs plus handover
~2 months in, we are most likely between M2 and M5. M6 (Demo App alpha) and M7 (Internal Tools) are the biggest remaining workstreams.
1.5 The architecture (30,000 ft view)
```
USER ACTION (typing in the demo app)
        |
        v
[Frontend: React] sends JSON request with API key
        |
        v
[API Gateway: Kong] checks API key, applies rate limit
        |
        v
[Backend: FastAPI Service]
  1. Validates and normalizes input
  2. Runs Safety Pipeline check (BEFORE any LLM call)
  3. Calls engine.analyze() at the engine interface boundary
  4. Engine v0 calls Azure OpenAI, validates the JSON output
  5. Optionally calls engine.respond() to generate a response
  6. Writes Message Event, Analysis Event, Response Event
        |
        v
[Event Store: MongoDB or Postgres] persists every event
        |
        v
[Internal Tools] reads from event store
```
The engine sits behind a stable interface (the swap point). Engine v0 today is a thin LLM wrapper. Engine v2.0 in Phase 2 will be a fine-tuned model. The interface signature stays identical, so swapping engines does not require changing the API, the event store, the demo app, or the internal tools.
1.6 What the system actually DOES (one user request, walked through)
A user types: "I'm so frustrated with this, nothing is working and I don't know what to do."
1. Frontend sends POST to /v1/emotions/analyze with the message text and API key.
2. Kong gateway validates the API key, applies rate limiting.
3. FastAPI backend receives the request, validates JSON, normalizes whitespace.
4. Safety pipeline runs: scans for crisis keywords. This message has none, so safety does not trigger. (If it had, we would skip to a pre-approved safety template and never call the LLM.)
5. Backend calls engine.analyze() at the engine interface boundary.
6. Engine v0 builds a prompt, calls Azure OpenAI (gpt-4.1-mini in East US).
7. The LLM responds with structured JSON containing emotion, valence, intensity, tension, intent, response_strategy, conversation_dynamics, all flags, confidence, etc.
8. Engine v0 validates the JSON against the strict Pydantic schema. If the output is malformed or low confidence, fallback fires with fallback_triggered: true and the matching fallback_reason ("parse_error" or "low_confidence").
9. Backend writes 3 events: Message Event, Analysis Event, and (if /respond was called) Response Event.
10. Backend returns the structured analysis to the frontend.
11. Demo app renders the emotional state visualization next to the chat bubble.
12. The user sees the visualization and can click thumbs-up/down to record a Feedback Event.
That is one round trip through the entire Phase 1 system. Every component in the architecture earned its place in those 12 steps.
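For concreteness, here is a minimal sketch of step 1 as a caller would see it, using Python's requests library. The base URL, the X-API-Key header name, and the request body field names are assumptions for illustration; the real contract lives in the /v1 OpenAPI spec.

```python
import requests

# Hypothetical base URL and API-key header name; real values come
# from the deployed environment and the /v1 OpenAPI spec.
BASE_URL = "https://api.example-eaii.com"
API_KEY = "demo-key-123"

resp = requests.post(
    f"{BASE_URL}/v1/emotions/analyze",
    headers={"X-API-Key": API_KEY},
    json={
        "session_id": "ses-123",
        "text": "I'm so frustrated with this, nothing is working "
                "and I don't know what to do.",
    },
    timeout=10,
)
resp.raise_for_status()

analysis = resp.json()
# The response envelope carries request_id / api_version / engine_version
# alongside the structured taxonomy fields.
print(analysis["request_id"], analysis.get("engine_version"))
print(analysis.get("emotion"), analysis.get("valence"), analysis.get("intensity"))
```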
1.7 The technology stack at a glance
| Layer | Tool | Purpose |
|---|---|---|
| Frontend | React (MIT) | Public demo app and admin internal tools |
| API Gateway | Kong OSS (Apache 2.0) | API key auth, rate limiting, routing |
| Backend Framework | FastAPI (MIT) | Python async web framework for /v1 endpoints |
| Schema Validation | Pydantic (MIT) | Typed request/response validation |
| Event Store | MongoDB (changing to PostgreSQL) | Persists all 7 event types |
| Cache | Redis (changing to Valkey) | Caches engine outputs by normalized input hash |
| Async Queue | RabbitMQ (MPL 2.0) | Background jobs (exports, simulator runs) |
| AI Gateway | LiteLLM or Portkey (MIT OSS) | Abstracts LLM providers, retries, fallbacks |
| LLM Provider | Azure OpenAI (gpt-4.1-mini) | The actual model behind Engine v0 |
| Cloud Hosting | AWS (EC2, ECS, S3, CloudWatch) | Where the system runs |
| IaC | Terraform (changing to OpenTofu) | Infrastructure provisioning |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Local Dev | Docker plus Docker Compose | Reproducible local environment |
| Metrics | Prometheus plus CloudWatch | Latency, error rate, fallback rate, escalation rate |
| Baseline Eval | VADER Sentiment (MIT) | Simple sentiment baseline for the prototype eval |
| Testing | pytest (MIT) | Backend unit, integration, schema tests |
1.8 Why three stack components are being changed (technical reasoning)
MongoDB to PostgreSQL with JSONB
For the EAII workload (Message events, Analysis events, Response events), Postgres + JSONB handles flexible-document shapes just as well as Mongo, plus you get real SQL joins, transactions, and a single store for both event data and any structured metadata. JSONB query syntax (WHERE payload->>'emotion' = 'frustrated') is straightforward.
Redis to Valkey
Valkey is an Apache 2.0 fork of Redis maintained by the Linux Foundation, drop-in API-compatible. No code changes, same protocol, same client libraries, same performance characteristics.
Terraform to OpenTofu
OpenTofu is an open-source fork of Terraform. Same HCL syntax, same provider ecosystem. Drop-in replacement for the IaC layer.
The technical case for these swaps is clean: same or better functionality for the EAII workload, no operational regression. They are pragmatic choices.
1.9 The engine boundary (where Phase 1 ends, Phase 2 begins)
This is the single most important architectural concept in the whole SOW. Pivot is building Phase 1 with a stable interface around the engine:
```
engine.analyze(canonical_input) -> analysis_output
engine.respond(canonical_input, analysis_output?) -> response_output
```
- The FastAPI route does NOT change.
- The event writer does NOT change.
- The Demo App does NOT change.
- The Debugger does NOT change.
- The Labeling tool does NOT change.
Only the file engine.py changes. The function signature stays the same. This is what makes the entire system "future-proof" against the Phase 2 transition.
Technical mechanisms enforcing this boundary:
- The engine interface contract (§3.3.9): two stable function signatures.
- Engine version routing (§3.3.9): server default DEFAULT_ENGINE_VERSION plus optional X-Engine-Version header for internal-tool overrides.
- Microservice readiness (§3.7.7): engine isolation boundary required, horizontal scaling readiness.
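A minimal sketch of what that boundary could look like in Python, using a typing.Protocol so that v0 and v2.0 are interchangeable to every caller. The class and model shapes below are illustrative stand-ins, not the SOW's actual schema.

```python
from typing import Optional, Protocol

from pydantic import BaseModel


class CanonicalInput(BaseModel):   # illustrative shape, not the real schema
    text: str
    session_id: str
    context: list[str] = []


class AnalysisOutput(BaseModel):   # illustrative subset of the taxonomy
    emotion: str
    valence: str
    intensity: str
    overall_confidence: float


class ResponseOutput(BaseModel):
    response_text: str
    response_strategy: str
    response_mode: str


class Engine(Protocol):
    """The swap point: Engine v0 and Engine v2.0 both satisfy this."""

    def analyze(self, canonical_input: CanonicalInput) -> AnalysisOutput: ...

    def respond(
        self,
        canonical_input: CanonicalInput,
        analysis_output: Optional[AnalysisOutput] = None,
    ) -> ResponseOutput: ...
```

Anything typed against Engine (the route, the event writer, the internal tools) keeps working no matter which implementation DEFAULT_ENGINE_VERSION resolves to.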
1.10 What success looks like for Phase 1
When Pivot finishes, you should have:
- A FastAPI service running in AWS staging and production with /v1/emotions/analyze, /v1/emotions/respond, /health, /ready endpoints.
- A working Investor Demo App at a public URL where Patrick can take an investor through a live conversation and see the emotional visualization update in real time.
- An admin internal tools site where the team can debug interactions, see dashboards, label data, run the simulator, and use the research playground.
- An event store containing every Message, Analysis, Response, and Feedback event in canonical envelope format.
- A safety pipeline that deterministically intercepts crisis-pattern inputs.
- An evaluation summary document showing the EAII structured output beats simple sentiment on illustrative scenarios.
- A handover package: runbooks, ADRs, env vars, deployment instructions, an "engine replacement guide" describing how Phase 2 will plug in.
2.1 The architecture, drawn
Investor Demo App (React)
- Landing | Sign-in | Main Demo (chat plus visualization)

Kong API Gateway (planned)
- Validates API key, applies rate limit, routes

FastAPI Service
- Input Validation and Normalization: whitespace trim, max length, type check
- SAFETY PIPELINE (deterministic override): crisis keyword/regex check. If it triggers: return safety template, LLM is never called.
- Engine Interface (stable signatures): engine.analyze(input) -> output, engine.respond(input, analysis?) -> response
- Engine v0 (thin LLM wrapper): calls Azure OpenAI. Validates JSON via Pydantic. Falls back to neutral on parse error or low confidence. Caches by normalization_hash for demo stability.
- Event Writer: builds canonical envelope, stamps timestamps, attaches request_id/trace_id, persists. Fail equals 5xx (no silent success).

Storage sidecars
- Event store: 7 event types
- Cache: norm_hash to output
- Async queue: exports, sim runs

Internal Tools
- Debugger | Dashboard | Labeling | Simulator | Research Playground
- Kong is the planned implementation, not contractually required. SOW only requires the functions (API key validation, rate limiting). If we move off Kong, nothing in §3.3 breaks.
- Safety runs BEFORE the engine, not after. When a crisis keyword fires, the LLM is never called. No improvisation on suicide-adjacent input.
- Redis/Valkey (cache) and RabbitMQ (queue) are sidecar services, not in the main request path.
2.2 The 12 deliverable groups, deeper
1. Investor Demo App (§3.1)
A public-facing React web app for showing the system to investors.
Required screens: Landing/Onboarding (with disclaimer "structure and flow, not 'true emotional intelligence'"), Sign Up / Sign In, Main Demo (chat input, message stream, emotional visualization, generated response, feedback controls).
Visualization shows: emotion, valence, intensity, tension, intent, response_strategy, conversation_dynamics, receptivity_signal, key flags, escalation_flag.
Error states required: 401, 403, 429, 5xx must show non-blank UI with retry path.
2. Internal Tools (§3.2)
Single authenticated admin web app with 5 sub-tools using "list/table plus right-side detail drawer" UI pattern.
Debugger: search interactions by session_id/message_id/time-range plus filters across all taxonomy fields. Inspects full Message to Analysis to Response trace including raw input, optional base_llm_response, full taxonomy with confidence, operational signals, fallback status, model trace, evidence spans, latency, linked event IDs. Re-run with stored config.
Dashboard: KPIs (total messages, percent fallback, avg confidence, error rate, escalation rate), distributions across all taxonomy fields and flags, top fallback reasons, latency p50/p95/p99. All metrics computed strictly from stored events.
Labeling: annotators correct full taxonomy. Saves create Label Events.
Simulator (SEUA): generates synthetic messages (presets: stressed, angry, confused, enthusiastic). Default mix skews toward crisis-adjacent / safety-edge cases. Single-turn, multi-turn, and load-test modes. Phase 1 load targets: 50 concurrent sessions, 10 messages/sec, p95 latency under 500ms.
Research Playground: try inputs, compare two engine versions/configs side-by-side, export results.
3. Modular Backend / API Service (§3.3)
Containerized FastAPI service. The backbone.
- Health: /health (200 if process up), /ready (200 only if DB plus event persistence reachable).
- Public versioned API: all under /v1, immutable namespace, breaking changes require /v2.
- Validation/normalization on every request.
- Response envelope: request_id (UUID), api_version, engine_version.
- Auth: API keys in header, hashed storage with metadata.
- Rate limiting: keyed by API key, 429 with stable error code.
- Logging: request_id propagated through logs and event writes.
- Event Writer: if persistence fails, API call returns 5xx (no silent success).
- Engine routing: DEFAULT_ENGINE_VERSION env var, optional X-Engine-Version override header.
- Privacy: session_ids stored as HMAC, raw text not in standard request logs.
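A hedged sketch of how several of these rules could come together in one route. Helper behavior is faked inline so the example runs standalone; none of the names below are the actual codebase, and the keyword list is a placeholder.

```python
import uuid
from typing import Optional

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()

CRISIS_KEYWORDS = {"kill myself", "end it all"}   # illustrative, not the real list


class AnalyzeRequest(BaseModel):
    session_id: str
    text: str


def fake_engine_analyze(text: str) -> dict:
    # Stand-in for engine.analyze(); the real Engine v0 calls Azure OpenAI.
    return {"emotion": "frustrated", "valence": "negative",
            "intensity": "medium", "overall_confidence": 0.85}


def write_event(event_type: str, request_id: str, payload: dict) -> None:
    # Stand-in for the event writer; the real one persists to the event store.
    print(f"persisting {event_type} event for {request_id}")


@app.post("/v1/emotions/analyze")
def analyze(req: AnalyzeRequest,
            x_engine_version: Optional[str] = Header(default=None)) -> dict:
    request_id = str(uuid.uuid4())
    normalized = " ".join(req.text.split()).strip()   # whitespace normalization

    # Safety check runs BEFORE any engine/LLM call.
    if any(kw in normalized.lower() for kw in CRISIS_KEYWORDS):
        analysis = {"fallback_triggered": True, "fallback_reason": "safety_keyword"}
    else:
        analysis = fake_engine_analyze(normalized)

    # Event writes are mandatory: persistence failure must surface as a 5xx.
    try:
        write_event("message", request_id, req.model_dump())
        write_event("analysis", request_id, analysis)
    except Exception:
        raise HTTPException(status_code=500, detail="event persistence failed")

    # Response envelope fields named in the SOW.
    return {"request_id": request_id, "api_version": "v1",
            "engine_version": x_engine_version or "v0", "analysis": analysis}
```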
4. Event Schema and Data Pipeline (§3.4)
The data foundation. Every analysis flows into events; every internal tool reads from events.
Canonical envelope (every event): event_id, event_type, schema_version, occurred_at, ingested_at, environment, request_id, session_id (HMAC), conversation_id, actor (user/assistant/external_llm), client_context, engine_context, causal links (message_event_id, analysis_event_id).
7 event types: Message, Analysis, Response, Feedback, Label, External-LLM Output, External-LLM Scoring.
§3.4.4 Data Asset Ownership: all events, analysis outputs, feedback records, labeling results are proprietary Client property.
5. Engine v0 (§3.5)
The placeholder engine. Thin, LLM-based, disposable behind the stable interface.
What it does: takes user text, builds a prompt asking an LLM to produce JSON in the taxonomy shape, validates the JSON with Pydantic, applies fallback rules. Roughly 200 to 500 lines of Python.
Wrapper rules: deterministic input normalization, strict JSON schema validation, fallback to neutral on parse error or low confidence, cache by normalization_hash, golden-set regression runner.
6. Safety Framework (§3.6)
Deterministic, non-ML safety override pipeline. Runs BEFORE the engine.
- Normalize the input text.
- Check crisis keywords plus regex.
- Assign severity: critical / high / moderate.
- Force predefined safety state, return pre-approved template (NOT free-form generation).
- Record fallback_triggered=true, fallback_reason="safety_keyword", safety_severity.
Why non-ML: liability. ML systems can be unpredictable on edge cases. Deterministic keyword matching is auditable.
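A minimal sketch of what a deterministic check like this could look like. The patterns, severity mapping, and template text are placeholders; the real lists are curated and reviewed.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative patterns only; the real set is curated and reviewed.
CRISIS_PATTERNS = {
    "critical": [re.compile(r"\bkill myself\b"), re.compile(r"\bend my life\b")],
    "high": [re.compile(r"\bself[- ]harm\b")],
    "moderate": [re.compile(r"\bi can'?t go on\b")],
}

SAFETY_TEMPLATE = "It sounds like you're going through a lot..."  # pre-approved copy


@dataclass
class SafetyResult:
    triggered: bool
    severity: Optional[str] = None
    response_text: Optional[str] = None


def safety_check(text: str) -> SafetyResult:
    normalized = " ".join(text.lower().split())
    for severity in ("critical", "high", "moderate"):
        if any(p.search(normalized) for p in CRISIS_PATTERNS[severity]):
            # Force the predefined safety state; the LLM is never called.
            return SafetyResult(True, severity, SAFETY_TEMPLATE)
    return SafetyResult(False)
```

When this triggers, the backend returns the pre-approved template and the Analysis Event records fallback_triggered=true, fallback_reason="safety_keyword", safety_severity.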
7. Infrastructure (§3.7)
Local dev (Docker Compose), staging plus prod deploy, secrets management, observability (CloudWatch), DB migrations, retention policies, smoke tests.
Microservice readiness (§3.7.7): engine isolation boundary, engine version toggles, horizontal scaling (stateless API, sessions in DB).
Retention rules: raw text 90 days, analysis events 24 months, safety-flagged 12 months. Deletion certificate within 10 business days of request.
8. Developer Interface (§3.8)
OpenAPI spec for /v1 endpoints plus a Python shadow-mode reference example.
Reference example: demonstrates partner system keeping its own response unchanged while sending user message to /v1/emotions/analyze in parallel for shadow analysis. Logs EAII output but does not show it.
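A hedged sketch of that pattern from the partner's side. The URL, the header name, and the request field names are illustrative assumptions, not the actual reference example.

```python
import logging

import requests

log = logging.getLogger("eaii.shadow")

EAII_URL = "https://api.example-eaii.com/v1/emotions/analyze"  # hypothetical
EAII_KEY = "partner-key"                                        # hypothetical


def existing_chatbot_reply(user_text: str) -> str:
    # Stand-in for whatever the partner system already does today.
    return "Thanks, let me look into that for you."


def handle_user_message(user_text: str, session_id: str) -> str:
    # 1. The partner system produces its own response exactly as before.
    response_text = existing_chatbot_reply(user_text)

    # 2. The same user message is also sent to EAII for shadow analysis.
    try:
        eaii = requests.post(
            EAII_URL,
            headers={"X-API-Key": EAII_KEY},
            json={"session_id": session_id, "text": user_text},
            timeout=2,
        ).json()
        # 3. Shadow mode: log the structured analysis, never show or act on it.
        log.info("eaii shadow analysis: %s", eaii)
    except requests.RequestException:
        log.warning("eaii shadow call failed; user-facing path unaffected")

    return response_text
```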
9. Documentation and Handover (§3.9)
The package that lets your team run the system without Pivot. Includes runbooks, env vars docs, deploy instructions, ADRs, engine replacement guide (Phase 2 transition document), output taxonomy rationale, third-party inventory, Phase 2 considerations.
10. UI Design Pattern Selection (§3.10)
Pivot proposes 3 design pattern tiles (palette, typography, components). Client picks within 5 business days or supplies own style guide.
11. Prototype Evaluation Framework (§3.11)
Lightweight evaluation. NOT academic benchmarking. Synthetic corpus from SEUA, ~200 to 500 conversations.
Baselines: (a) sentiment-only delta model like VADER, (b) LLM-based per-turn classifier without structured state memory.
Deliverable: investor-readable evaluation summary, NOT a benchmark paper.
12. External-LLM-Output Scoring (§3.12)
Additive workstream (added Apr 28). Score the output of another LLM (assistant/model) for tone, calibration, escalation risk, defensiveness, hedging, receptivity, alignment.
Schema extends actor.type to {user, assistant, external_llm}. Backward-compatible, no /v1 breaking changes.
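A tiny sketch of why widening actor.type is backward-compatible: it only adds an allowed value, so events stored with the old values still validate. The model below is illustrative, not the actual envelope definition.

```python
from typing import Literal

from pydantic import BaseModel


class Actor(BaseModel):
    # Previously the allowed values were user and assistant; widening the
    # enum does not invalidate any event already stored with those values.
    type: Literal["user", "assistant", "external_llm"]
    id: str


Actor.model_validate({"type": "user", "id": "user-hmac-pseudo"})       # still valid
Actor.model_validate({"type": "external_llm", "id": "ext-model-1"})    # newly valid
```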
2.3 Key technical concepts, simplified
LLM (Large Language Model)
Models like GPT-4, Claude, Llama. Trained to predict the next token. Used here for both classification (assigning labels to the taxonomy) and generation (producing response text).
RAG (Retrieval-Augmented Generation)
The pattern where you don't put everything in the prompt. You retrieve relevant chunks from a database and add them to the prompt. The Phase 1 SOW does NOT use RAG explicitly. Engine v0 is a thin LLM wrapper, not a RAG system. (Phase 2 might use RAG-adjacent ideas, but that is Phase 2.)
Structured output / JSON Schema validation
LLMs by default return free text. Modern LLMs (GPT-4o, Claude) can return JSON conforming to a schema. The SOW requires "strict JSON schema validation on model output." Concretely: Engine v0 builds a Pydantic schema, calls the LLM with structured-output mode, and runs strict validation on top to catch any drift. If validation fails, fallback fires.
Shadow mode
Deploy a new system alongside an existing one without affecting the user-facing path. The new system gets the same inputs, produces outputs, but those outputs are observed/logged, not used. The Python reference example in §3.8.1 is a shadow-mode integration pattern.
Microservice readiness / horizontal scaling
The API service is stateless: sessions persisted in DB, not in memory. So you can run N copies of the API service behind a load balancer, and any copy can handle any request. /health and /ready endpoints let load balancers know which copies are alive.
Concretely: when 5 FastAPI containers run behind an AWS load balancer, request 1 might hit container A and request 2 might hit container B for the same conversation. If session state were in container memory, container B would not know about A's state and the conversation would break. Putting session_id state in the DB means any container can resume any conversation.
Pydantic (deeper)
Python library for declaring typed schemas:
```python
from typing import Literal

from pydantic import BaseModel


class AnalysisOutput(BaseModel):
    emotion: str
    valence: Literal["positive", "neutral", "negative"]
    intensity: Literal["low", "medium", "high"]
    confidence: float


# At validation time:
parsed = AnalysisOutput.model_validate_json(raw_llm_response)
# Raises ValidationError if the LLM returned malformed JSON
# or a value outside the allowed Literal range.
```
FastAPI integrates natively, so the same class is your API contract, your validation, AND your auto-generated docs.
Postgres with JSONB
Postgres has a JSON column type (JSONB) that stores JSON natively, indexable via GIN indexes, queryable with operators:
```sql
CREATE TABLE events (
    id UUID PRIMARY KEY,
    event_type VARCHAR(50),
    occurred_at TIMESTAMP,
    payload JSONB
);

SELECT *
FROM events
WHERE event_type = 'analysis'
  AND payload->>'emotion' = 'frustrated'
  AND (payload->>'overall_confidence')::float > 0.8;
```
You get Postgres transactions and joins (which Mongo struggles with) plus document flexibility.
Cache (Redis/Valkey)
Engine v0 caches normalization_hash to output. Why: LLMs are non-deterministic. For an investor demo where Patrick types the same example three times, three different outputs is bad theater. Caching makes the demo deterministic. Trade-off: caching can mask bugs. Re-run in the Debugger handles this by exposing cache_status: hit | miss.
Crisis keyword override
The Safety Framework (§3.6) is a deterministic, non-ML pipeline that scans user input for crisis keywords (e.g., suicide-adjacent language) and forces a pre-approved safety template response, bypassing the LLM. This exists for liability, you don't want an LLM improvising on suicide-adjacent input.
Golden set regression runner
A "golden set" is a curated set of inputs with known expected outputs. The runner runs all of them through the engine periodically and compares outputs to expected. If outputs change, the runner alerts. Used to detect drift when prompts, models, or configs change.
Architectural Decision Record (ADR)
A short doc explaining one architectural choice: what was decided, what alternatives were considered, why this one was picked, what the trade-offs are. The SOW requires an ADR pack at handover.
2.4 Engine v0 walkthrough (the disposable LLM wrapper)
Engine v0 is roughly:
```python
def analyze(canonical_input):
    text = canonical_input.text

    # Deterministic normalization
    normalized = trim_and_collapse_whitespace(text)
    norm_hash = hash(normalized)

    # Cache check (for demo stability)
    if cache.has(norm_hash):
        return cache.get(norm_hash)

    # Build prompt
    prompt = render_prompt(
        prompt_version="v0.1",
        text=normalized,
        recent_context=canonical_input.context,
    )

    # Call LLM via AI gateway
    raw_response = ai_gateway.chat(
        provider="azure_openai",
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )

    # Validate output against schema
    try:
        parsed = AnalysisOutput.model_validate_json(raw_response)
    except ValidationError as e:
        return fallback_response(reason="parse_error", error=e)

    # Confidence gating
    if parsed.overall_confidence < CONFIDENCE_THRESHOLD:
        return fallback_response(reason="low_confidence")

    # Stamp trace fields
    parsed.trace.model_id = "gpt-4.1-mini"
    parsed.trace.prompt_version = "v0.1"
    parsed.trace.normalization_hash = norm_hash
    parsed.trace.engine_version = "v0"
    parsed.trace.engine_config_id = "v0:strict-thresholds"
    parsed.trace.cache_status = "miss"
    parsed.trace.latency_ms = elapsed_ms()

    # Cache for stability
    cache.set(norm_hash, parsed)
    return parsed
```
That is roughly 30 lines of pseudocode capturing what Engine v0 actually does. Not magical, not ML, just disciplined glue around an LLM call.
In Phase 2, Engine v2.0 will replace this file. The analyze function will load a fine-tuned model with torch.load() and run inference locally instead of calling Azure. Same function signature, same return shape, same caller.
3.1 The taxonomy fields, deeply
The shape of every Analysis Event. Approximately 30 fields organized into 10 layers.
What is the user feeling?
- emotion: categorical label (NOT YET DEFINED, controlled vocabulary)
- valence: positive | neutral | negative (sign of emotion)
- intensity: low | medium | high (magnitude)
- tension: calm | tense (bodily/psychological tension)
| Utterance | valence | intensity | tension |
|---|---|---|---|
| "I'm fine, just leave me alone." | negative | low | tense |
| "OH MY GOD I LOVE THIS!!!" | positive | high | calm |
| "I really, deeply need you to listen right now." | negative | medium | tense |
| "yeah whatever" | negative | low | calm |
| "Why are we even talking about this." | negative | medium | tense |
Memorable framing: Valence equals sign. Intensity equals volume. Tension equals body-state / confrontational pressure underneath. Three orthogonal axes; you can be intensely calm or low-intensity tense.
What does the user want?
- intent: communicative-need label (NOT YET DEFINED)
- response_strategy: what the system should do (NOT YET DEFINED)
Conversation dynamics
- conversation_dynamics: stable | escalating | de_escalating | unresolved | resolving
Orthogonal flags
- confusion_flag: user appears confused
- urgency_flag: user is in a hurry / time-critical
- overload_flag: user is cognitively overwhelmed
- safety_flag: safety/crisis pattern
- repetition_flag: user is repeating themselves
Relational signals
- receptivity_signal: how open the user is to engagement
- disengagement_flag: user checking out / leaving
- testing_flag: (optional) user testing the system
- vulnerability_flag: (optional) user in vulnerable state
Why these matter: §3.8.1 calls receptivity_signal and disengagement_flag "key commercial signals." A partner can route disengagement_flag=true into a save-the-customer rule, or use receptivity_signal=low to soften a sales pitch. These are the most concrete "why pay for EAII" fields.
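A hedged sketch of how a partner might act on those two signals plus escalation_flag; the rule names and thresholds below are illustrative, not part of the SOW.

```python
def route_after_analysis(analysis: dict) -> str:
    """Decide what the partner system does next, given one EAII analysis."""
    # Commercial signal 1: the user looks like they are about to leave.
    if analysis.get("disengagement_flag"):
        return "save_the_customer_flow"   # e.g. offer help, discount, human agent

    # Commercial signal 2: the user is not receptive right now.
    if analysis.get("receptivity_signal") == "low":
        return "soften_pitch"             # drop the upsell, answer plainly

    # Operational signal: escalation means a human takes over.
    if analysis.get("escalation_flag"):
        return "handoff_to_human"

    return "continue_normal_flow"
```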
Operational signals
- escalation_flag: escalate to a human?
- response_mode: neutral | supportive | informational
response_mode vs response_strategy: strategy equals WHAT to do (validate, clarify, redirect, de-escalate); mode equals HOW to sound (neutral, supportive, informational). Independent. You can have strategy=validate, mode=informational (validate factually) or strategy=validate, mode=supportive (validate warmly).
Confidence
- confidence: overall float 0 to 1
- overall_confidence: must equal confidence (compat)
- confidence_by_field: per-field confidence object
Explainability
- keywords: max 5 keywords driving the analysis
- evidence_spans: max 3 spans with reason
Trace
- model_id: which model produced this
- prompt_version: which prompt
- normalization_hash: hash of normalized input
- engine_version: which engine version (e.g. v0, v0.1)
- engine_config_id: which runtime config (e.g. v0:strict-thresholds)
- cache_status: hit/miss
- latency_ms: how long it took
Fallback
- fallback_triggered: bool
- fallback_reason: string/enum: low_confidence | parse_error | safety_keyword
Same fallback_triggered=true field is used for two very different cases. Distinguish by fallback_reason:
- fallback_reason="low_confidence" or "parse_error": Engine v0 could not produce valid output. Returned neutral baseline. Engine WAS called.
- fallback_reason="safety_keyword": Safety pipeline hit. LLM was bypassed entirely. Pre-approved template returned. Engine was NEVER called.
Same flag, fundamentally different operational meaning.
3.2 The 7 event types
Message Event
Raw user input, source-of-truth. Even empty/over-length/unsupported-language inputs still emit one with appropriate flag.
Analysis Event
Full taxonomy plus explainability plus trace plus operational signals plus confidence plus fallback. Failed analysis still emits with status="failed".
Response Event
What the system returned plus which template/strategy. Failed response still emits with status="failed", response_type="fallback".
Feedback Event
User feedback: emotion alignment plus response helpfulness. Linked to specific analysis_event_id and response_event_id.
Label Event
Annotator's correction (used for future training data).
External-LLM Output Event
Captures the output of an external LLM (not a user message). Linked to triggering user message.
External-LLM Scoring Event
Result of EAII scoring an external LLM's output (tone/calibration/escalation-risk/receptivity/alignment).
3.3 Causal links (how Debugger reconstructs a session)
```
Message Event { event_id: M1, message_id: msg-abc }
        |
        | message_event_id=M1
        v
Analysis Event { event_id: A1, message_event_id: M1 }
        |
        | message_event_id=M1, analysis_event_id=A1
        v
Response Event { event_id: R1, message_event_id: M1, analysis_event_id: A1 }
        |
        | analysis_event_id=A1, response_event_id=R1
        v
Feedback Event { analysis_event_id: A1, response_event_id: R1 }
```
This is also what makes the export endpoint work: pull all events with the same session_id, reassemble by causal IDs.
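A minimal sketch of that reassembly over a flat list of event dicts already pulled for one session_id; the field names follow the envelope, the helper name is illustrative.

```python
def reassemble_session(events: list[dict]) -> list[dict]:
    """Group a flat list of events for one session_id into per-message traces."""
    analyses = [e for e in events if e["event_type"] == "analysis"]
    responses = [e for e in events if e["event_type"] == "response"]
    feedback = [e for e in events if e["event_type"] == "feedback"]

    traces = []
    for msg in (e for e in events if e["event_type"] == "message"):
        trace = {"message": msg, "analysis": None, "response": None, "feedback": []}
        # Follow the causal links back from each downstream event.
        for a in analyses:
            if a.get("message_event_id") == msg["event_id"]:
                trace["analysis"] = a
                trace["feedback"] = [f for f in feedback
                                     if f.get("analysis_event_id") == a["event_id"]]
        for r in responses:
            if r.get("message_event_id") == msg["event_id"]:
                trace["response"] = r
        traces.append(trace)

    return sorted(traces, key=lambda t: t["message"]["occurred_at"])
```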
3.4 The two endpoints flow
POST /v1/emotions/analyze: text -> analysis (taxonomy, no response generated)

POST /v1/emotions/respond: text + optional analysis ->

1. If analysis is not provided, run engine.analyze() first.
2. Run the safety check; if triggered, return the safety template.
3. Pick response_strategy + response_mode from the analysis.
4. Generate the response text via LLM (or template).
5. Return both the analysis AND the response.
Why two endpoints: a shadow-mode partner only needs /analyze (they keep their own response). A full integration uses /respond to get EAII-generated text.
3.5 Sample Message Event shape
```json
{
  "event_id": "evt-001",
  "event_type": "message",
  "schema_version": "1.0",
  "occurred_at": "2026-05-08T14:30:00Z",
  "ingested_at": "2026-05-08T14:30:00.123Z",
  "environment": "production",
  "request_id": "req-abc-123",
  "session_id": "ses-hmac-xyz",
  "conversation_id": "conv-456",
  "actor": { "type": "user", "id": "user-hmac-pseudo" },
  "client_context": { "platform": "web" },
  "engine_context": {
    "engine": "v0",
    "version": "0.1",
    "config_hash": "..."
  },
  "message_id": "msg-001",
  "text": "I'm so frustrated, nothing is working",
  "normalized_text": "i'm so frustrated, nothing is working",
  "language_hint": "en",
  "input_modality": "text",
  "text_hash": "sha256-abc",
  "contains_sensitive_markers": false
}
```
3.6 Technical questions David might raise (and the answers ready)
David's framing of "are you sure you can handle this?" will probably come as specific technical questions. Drill these.
"Have you thought about how the safety pipeline integrates with the engine call?"
"Do you understand how the engine interface stays stable across version swaps?"
Answer: engine.analyze(canonical_input) -> analysis_output and engine.respond(canonical_input, analysis_output?) -> response_output. The FastAPI route calls these regardless of which engine is loaded. v0 implements them with a thin LLM wrapper; v2.0 (Phase 2) will implement them with a fine-tuned model. The route, event writer, demo app, and debugger all sit on the public side of this boundary and never change.
"How are you handling the event write guarantees? What happens if the DB is down?"
"How does horizontal scaling work? Is the API stateless?"
"What is your story on cache stability for demos vs cache-masked-bugs?"
Answer: Engine v0 caches normalization_hash to output. That makes the demo deterministic (same prompt, same answer). Trade-off: bugs can be masked by stale cache. The Debugger surfaces cache_status: hit | miss in the trace; re-run in the Debugger always invokes the engine fresh. We can also disable the cache via config for test runs.
"How do you handle engine_version and engine_config_id mechanics?"
"What about the data retention rules in §3.7.9?"
"How does synthetic corpus generation work? What is the provenance trail?"
"How will you measure that the structured taxonomy beats simple sentiment?"
"How does External-LLM-output scoring extend the schema without breaking /v1?"
"What is your story on the controlled emotion / intent / response_strategy label sets?"
"How are you handling LLM provider isolation? What if Azure rate-limits you?"
"What happens to a Message Event for an empty input or over-length input?"
"How do you handle the case where Engine v0 returns malformed JSON?"
"How does the Debugger reconstruct a multi-event session?"
3.7 Glossary
Final Cheat Sheet (re-skim Sunday night)
- Phase 1 vs Phase 2: Phase 1 is the platform around the engine; Phase 2 is the engine itself. Engine interface boundary makes the swap clean.
- The 12 deliverables: Demo App, Internal Tools, Backend/API, Event Pipeline, Engine v0, Safety, Infra, Developer Interface, Docs, UI Pattern, Eval Framework, External-LLM Scoring.
- The 9 milestones: M1 (architecture lock) to M9 (handover). Roughly M3 to M5 right now.
- The architecture spine: Demo App, Kong, FastAPI (validate, normalize, safety, engine, event writer), Mongo+Redis+Rabbit, Internal Tools.
- The taxonomy: 30 fields in 10 layers. Affective (emotion, valence, intensity, tension), communicative need (intent, response_strategy), conversation_dynamics, 5 orthogonal flags, relational (receptivity, disengagement), operational (escalation, response_mode), confidence, explainability, trace, fallback.
- The 7 event types: Message, Analysis, Response, Feedback, Label, External-LLM Output, External-LLM Scoring. Linked by causal IDs.
- The taxonomy gap: emotion / intent / response_strategy label sets NOT YET DEFINED.
- The two endpoints: /v1/emotions/analyze (taxonomy only) and /v1/emotions/respond (analyze plus generate response).
- Safety pipeline runs BEFORE the engine. Crisis keywords bypass the LLM entirely.
- Stateless API service: sessions in DB, horizontal scaling, /health and /ready for load balancers.
- Two flavors of fallback: low_confidence / parse_error (engine called, returned neutral) vs safety_keyword (engine never called, template returned).
- engine_version vs engine_config_id: version is which engine implementation; config_id is which runtime config within. Both stored in trace for reproducibility.
- Caching: Engine v0 caches normalization_hash to output for demo stability. Re-run in Debugger always invokes fresh.
- The 15 likely David questions in §3.6 above. Drill those Sunday.
End of guide. Now go drill it Friday.