SOW Technical Study Guide

What Pivot is building, in deep technical detail. Five layers of depth, each building on the last. By Sunday night you should know the technology of this contract cold.

The 60-second elevator version

This is the absolute essence. If you only had one minute to read this whole guide, this is what you would walk away with. Every other tab is layered detail under these five sections.

0.1 What EAII is

Human Discovery Inc is building EAII (Emotional Artificial Intelligence Infrastructure), a platform that gives AI systems a structured understanding of user emotional state. Today's LLMs respond to messages but have no representation of what the user is feeling, what they need, or whether they are engaged. EAII produces a structured emotional state representation (emotion, valence, intensity, tension, intent, response strategy, conversation dynamics, plus operational signals like escalation and disengagement) that downstream AI applications can consume through a standardized API.

0.2 What Pivot is building

Pivot-al AI is building Phase 1 of EAII: the platform around the engine, NOT the real engine itself. Pivot delivers an API service, a data pipeline, a placeholder Engine v0 (a thin LLM wrapper using Azure OpenAI), an investor-facing demo app, internal admin tools (debugger, dashboard, labeling, simulator, research playground), a deterministic safety pipeline for crisis-pattern inputs, cloud infrastructure on AWS, an evaluation framework, and a documentation handover package.

0.3 Why Phase 1 is intentionally not the real engine

Phase 2 is where the real emotional intelligence engine gets built (your domain). Phase 1 deliberately uses a placeholder engine (Engine v0, a thin LLM wrapper) so the architecture, schema, API, and infrastructure can be locked while Phase 2 designs the actual model. The whole system is built around a stable "engine interface boundary" so that swapping the placeholder for the real engine in Phase 2 does not require changing the API, the event store, the demo app, or the internal tools.

0.4 The single most important concept

The architecture has a swap point. Two function signatures:

engine.analyze(canonical_input) -> analysis_output
engine.respond(canonical_input, analysis_output?) -> response_output

Engine v0 (Phase 1, thin LLM wrapper) implements these. Engine v2.0 (Phase 2, real ML model) will implement the same signatures. Everything else in the system calls these functions and never knows the difference between v0 and v2.0. That is what makes the whole architecture future-proof.

0.5 What success looks like

When Phase 1 ends, Human Discovery has a running platform on AWS with public API endpoints, an investor demo where Patrick can show structured emotional analysis live to investors, internal admin tools for debugging and iterating, a synthetic evaluation corpus showing the structured output beats simple sentiment, and a complete handover package for taking the system over.

That is the whole picture in 60 seconds.

When you are ready for more shape and proportion, go to Tab 2 (100,000 ft). When you are ready for the full architecture and 12 deliverable groups, go to Tab 3 (Macro).

The 5-minute summary

The 200k view is the elevator pitch; this is the one-page brief. It adds shape and proportion without diving into specifics.

0.6 EAII in context

The bet behind EAII: modern LLMs are stateless about emotion. They can sound empathetic in any single response but have no representation of where the user is in an emotional trajectory across turns. EAII fills that gap as an infrastructure layer: an API any AI system can call to get a structured emotional state on each user message. Anthropic's April 2026 interpretability research is supporting evidence that emotion concepts are a meaningful internal representation in modern models; EAII externalizes that representation into a standardized format any downstream AI can consume programmatically.

0.7 The two-phase build

Phase 1 (Pivot, 5 months)

  • The PLATFORM around the engine
  • Disposable Engine v0
  • LLM wrapper on Azure OpenAI
  • Investor-ready alpha

Phase 2 (you, 12+ months)

  • The ENGINE itself
  • Real ML engine (Engine v2.0)
  • Fine-tuned models, structured state
  • Production system

Phase 1 ends. Phase 2 begins. The architecture is built so the transition is just swapping the engine file behind a stable interface, not redesigning anything around it.

0.8 The 12 deliverable groups (just the list)

  1. Investor Demo App (public-facing React app)
  2. Internal Tools (admin web app: 5 sub-tools)
  3. Modular Backend / API Service (FastAPI)
  4. Event Schema and Data Pipeline (canonical envelope, 7 event types)
  5. Engine v0 (placeholder thin LLM wrapper)
  6. Safety Framework (deterministic crisis-keyword override)
  7. Infrastructure (AWS, Docker, observability, retention)
  8. Developer Interface (OpenAPI spec, Python shadow-mode reference)
  9. Documentation and Handover (runbooks, ADRs, replacement guide)
  10. UI Design Pattern Selection (3 design tiles)
  11. Prototype Evaluation Framework (synthetic corpus, baselines, summary)
  12. External-LLM-Output Scoring (additive workstream, added Apr 28)

0.9 The architecture in 3 boxes

Step 1

FRONTEND

Demo App + Admin Tools
(React)

Step 2

BACKEND

FastAPI + Engine v0
(Python)

Step 3

STORAGE

Events DB + Cache + Async Queue

The frontend is two React apps (public demo, internal tools). The backend is a containerized FastAPI service that does input validation, runs a safety check, calls the engine, and writes events. The storage layer holds events, cache, and async queue. All of it runs on AWS.

0.10 The timeline at a glance

Phase | Weeks | What
Architecture lock | Week 1 | API payloads, event schema, UI wireframes
Foundation | Weeks 1 to 6 | Backend, event store, Engine v0, safety, demo alpha
Internal tools | Weeks 4 to 9 | Debugger, dashboard, labeling, simulator, playground
Hardening | Weeks 10 to 12 | QA, staging, production verification
Handover | Weeks 9 to 14 | Documentation, ADRs, engine replacement guide

About 2 months in, the team is roughly between foundation and internal tools.

0.11 The technology stack at a glance

Python (FastAPI) for the backend. React for both frontends. PostgreSQL for the event store (originally MongoDB, swapping for technical reasons). Valkey for cache (originally Redis, drop-in replacement). RabbitMQ for async queue. Azure OpenAI as the LLM provider (gpt-4.1-mini in East US). AWS for cloud hosting (EC2, ECS, S3, CloudWatch). Docker plus Docker Compose for local dev. GitHub Actions for CI/CD. OpenTofu for infrastructure-as-code (originally Terraform, drop-in replacement).

0.12 Why it matters strategically

The team gets two things from Phase 1:

  1. A working investor-facing demo that lets Patrick show structured emotional understanding live to investors.
  2. A locked architecture (API, schema, event pipeline, internal tools) that Phase 2 plugs into without redesign.

The "Phase 1 is disposable" framing is intentional: the placeholder engine gets thrown away. The platform around it does NOT.

0.13 What you need to walk into Monday with

Confident command of:

  • what each of the 12 deliverables is
  • the request flow through the system end to end
  • the engine boundary and how it makes the Phase 2 swap clean
  • the taxonomy shape (~30 fields in 10 layers)
  • the 7 event types and how they link
  • the safety pipeline running before the engine
  • the technology stack and why three components are being changed
  • the technical questions David might raise, with answers ready

Those are all built out in Tabs 3, 4, and 5.

1.1 What Pivot is building, in one paragraph

Pivot is building the scaffolding for the EAII platform: an API service, a data pipeline, a placeholder "Engine v0" (a thin LLM wrapper), an investor-facing demo app, internal tools (debugger, dashboard, labeling, simulator, research playground), a deterministic safety pipeline, and the cloud infrastructure to run it all. The output of the system is a structured emotional state representation that downstream AI applications can consume through a standardized API. Phase 1 deliberately stops short of real emotional intelligence so Phase 2 (the actual engine) can plug in cleanly later.

1.2 Phase 1 vs Phase 2 (the most important framing)

Phase 1 (what Pivot is building)

  • The platform around the engine
  • "Plumbing"
  • Engine v0: thin LLM wrapper, disposable
  • 14 weeks core build, 5 months total
  • Investor-ready alpha

Phase 2 (the future, your domain)

  • The engine itself
  • "Brain"
  • Engine v2.0: real ML model
  • 12+ months
  • Production system

Phase 1's whole purpose is to build the infrastructure so Phase 2 can plug in cleanly. The SOW says this in §3.5.1, §3.7.7, and §9.1: Engine v0 is "explicitly replaceable and not a long-term EAII intelligence system." The real intelligence comes later.

Key insight

The technical complexity in Phase 1 is NOT in Engine v0. The complexity is in the surrounding infrastructure: the API, the event pipeline, the schema, the internal tools, the deployment story. Engine v0 is deliberately simple: disciplined glue around an LLM call.

1.3 The 12 deliverable groups

1

Investor Demo App

Public-facing chat-with-emotional-visualization React app (§3.1)

2

Internal Tools

Admin web app: Debugger, Dashboard, Labeling, Simulator, Research Playground (§3.2)

3

Modular Backend / API

Containerized FastAPI service with /v1 endpoints (§3.3)

4

Event Schema and Data Pipeline

Canonical envelope plus 7 event types (§3.4)

5

Engine v0

The placeholder thin LLM wrapper (§3.5)

6

Safety Framework

Deterministic non-ML crisis-keyword override pipeline (§3.6)

7

Infrastructure

Environments, deploy, observability, retention (§3.7)

8

Developer Interface

OpenAPI spec plus Python shadow-mode reference (§3.8)

9

Documentation and Handover

Runbooks, ADRs, replacement guide (§3.9)

10

UI Design Pattern Selection

3 design tiles, you pick one (§3.10)

11

Prototype Evaluation Framework

Synthetic corpus, baselines, investor-readable summary (§3.11)

12

External-LLM-Output Scoring

Additive workstream evaluating assistant/model outputs (§3.12, added Apr 28)

1.4 The timeline (9 milestones, 14-week core build)

M1 (W1): Architecture lock
M2 (W1 to 3): DevOps baseline
M3 (W1 to 4): Backend foundation
M4 (W2 to 5): Engine v0
M5 (W4 to 6): Respond plus safety
M6 (W2 to 6): Demo App alpha
M7 (W4 to 9): Internal tools
M8 (W10 to 12): QA plus hardening
M9 (W9 to 14): Docs plus handover

~2 months in, we are most likely between M2 and M5. M6 (Demo App alpha) and M7 (Internal Tools) are the biggest remaining workstreams.

1.5 The architecture in 30,000 ft view

USER ACTION (typing in the demo app)
        |
        v
[Frontend: React] sends JSON request with API key
        |
        v
[API Gateway: Kong] checks API key, applies rate limit
        |
        v
[Backend: FastAPI Service]
   1. Validates and normalizes input
   2. Runs Safety Pipeline check (BEFORE any LLM call)
   3. Calls engine.analyze() at the engine interface boundary
   4. Engine v0 calls Azure OpenAI, validates the JSON output
   5. Optionally calls engine.respond() to generate a response
   6. Writes Message Event, Analysis Event, Response Event
        |
        v
[Event Store: MongoDB or Postgres] persists every event
        |
        v
[Internal Tools] reads from event store

The key architectural property

The engine sits behind a stable interface (the swap point). Engine v0 today is a thin LLM wrapper. Engine v2.0 in Phase 2 will be a fine-tuned model. The interface signature stays identical, so swapping engines does not require changing the API, the event store, the demo app, or the internal tools.

1.6 What the system actually DOES (one user request, walked through)

A user types: "I'm so frustrated with this, nothing is working and I don't know what to do."

  1. Frontend sends POST to /v1/emotions/analyze with the message text and API key.
  2. Kong gateway validates the API key, applies rate limiting.
  3. FastAPI backend receives the request, validates JSON, normalizes whitespace.
  4. Safety pipeline runs: scans for crisis keywords. This message has none, so safety does not trigger. (If it had, we would skip to a pre-approved safety template and never call the LLM.)
  5. Backend calls engine.analyze() at the engine interface boundary.
  6. Engine v0 builds a prompt, calls Azure OpenAI (gpt-4.1-mini in East US).
  7. The LLM responds with structured JSON containing emotion, valence, intensity, tension, intent, response_strategy, conversation_dynamics, all flags, confidence, etc.
  8. Engine v0 validates the JSON against the strict Pydantic schema. If malformed or low confidence, fallback fires with fallback_triggered: true, fallback_reason: "parse_error".
  9. Backend writes 3 events: Message Event, Analysis Event, and (if /respond was called) Response Event.
  10. Backend returns the structured analysis to the frontend.
  11. Demo app renders the emotional state visualization next to the chat bubble.
  12. The user sees the visualization and can click thumbs-up/down to record a Feedback Event.

That is one round trip through the entire Phase 1 system. Every component in the architecture earned its place in those 12 steps.

1.7 The technology stack at a glance

Layer | Tool | Purpose
Frontend | React (MIT) | Public demo app and admin internal tools
API Gateway | Kong OSS (Apache 2.0) | API key auth, rate limiting, routing
Backend Framework | FastAPI (MIT) | Python async web framework for /v1 endpoints
Schema Validation | Pydantic (MIT) | Typed request/response validation
Event Store | MongoDB (changing to PostgreSQL) | Persists all 7 event types
Cache | Redis (changing to Valkey) | Caches engine outputs by normalized input hash
Async Queue | RabbitMQ (MPL 2.0) | Background jobs (exports, simulator runs)
AI Gateway | LiteLLM or Portkey (MIT OSS) | Abstracts LLM providers, retries, fallbacks
LLM Provider | Azure OpenAI (gpt-4.1-mini) | The actual model behind Engine v0
Cloud Hosting | AWS (EC2, ECS, S3, CloudWatch) | Where the system runs
IaC | Terraform (changing to OpenTofu) | Infrastructure provisioning
CI/CD | GitHub Actions | Automated testing and deployment
Local Dev | Docker plus Docker Compose | Reproducible local environment
Metrics | Prometheus plus CloudWatch | Latency, error rate, fallback rate, escalation rate
Baseline Eval | VADER Sentiment (MIT) | Simple sentiment baseline for the prototype eval
Testing | pytest (MIT) | Backend unit, integration, schema tests

1.8 Why three stack components are being changed (technical reasoning)

MongoDB to PostgreSQL with JSONB

For the EAII workload (Message events, Analysis events, Response events), Postgres + JSONB handles flexible-document shapes just as well as Mongo, plus you get real SQL joins, transactions, and a single store for both event data and any structured metadata. JSONB query syntax (WHERE payload->>'emotion' = 'frustrated') is straightforward.

Redis to Valkey

Valkey is an Apache 2.0 fork of Redis maintained by the Linux Foundation, drop-in API-compatible. No code changes, same protocol, same client libraries, same performance characteristics.

Terraform to OpenTofu

OpenTofu is an open-source fork of Terraform. Same HCL syntax, same provider ecosystem. Drop-in replacement for the IaC layer.

The technical case for these swaps is clean: same or better functionality for the EAII workload, no operational regression. They are pragmatic choices.

1.9 The engine boundary (where Phase 1 ends, Phase 2 begins)

This is the single most important architectural concept in the whole SOW. Pivot is building Phase 1 with a stable interface around the engine:

engine.analyze(canonical_input) -> analysis_output
engine.respond(canonical_input, analysis_output?) -> response_output

When you swap engines
  • The FastAPI route does NOT change.
  • The event writer does NOT change.
  • The Demo App does NOT change.
  • The Debugger does NOT change.
  • The Labeling tool does NOT change.

Only the file engine.py changes. The function signature stays the same. This is what makes the entire system "future-proof" against the Phase 2 transition.

Technical mechanisms enforcing this boundary: the two stable analyze/respond signatures, typed canonical input/output schemas (Pydantic), engine selection via the DEFAULT_ENGINE_VERSION env var and the X-Engine-Version override header, engine_version and engine_config_id stamped into every Analysis Event trace, and the engine isolation boundary called out in §3.7.7. A sketch of the boundary in Python follows.
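
A minimal sketch of what that boundary might look like, assuming Python's typing.Protocol; class names and the module path are illustrative, not SOW artifacts:

from typing import Optional, Protocol

class EngineProtocol(Protocol):
    # Any engine version (v0 today, v2.0 in Phase 2) must satisfy these two signatures.
    def analyze(self, canonical_input: dict) -> dict: ...
    def respond(self, canonical_input: dict, analysis_output: Optional[dict] = None) -> dict: ...

def get_engine(version: str) -> EngineProtocol:
    # Version comes from DEFAULT_ENGINE_VERSION or the X-Engine-Version header.
    # The module path below is illustrative; only this factory knows concrete engines.
    if version.startswith("v0"):
        from engines.v0 import EngineV0   # thin LLM wrapper behind the boundary
        return EngineV0()
    raise ValueError(f"unknown engine version: {version}")

# The FastAPI route, event writer, demo app, and internal tools all call
# get_engine(...).analyze(...) and never import a concrete engine directly.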

1.10 What success looks like for Phase 1

When Pivot finishes, you should have:

  • a running platform on AWS with public /v1 API endpoints
  • an investor demo app where Patrick can show structured emotional analysis live
  • the internal admin tools (debugger, dashboard, labeling, simulator, research playground) for debugging and iterating
  • a synthetic evaluation corpus and an investor-readable summary showing the structured output beats simple sentiment
  • a complete documentation and handover package for taking the system over

2.1 The architecture, drawn

User-facing

Investor Demo App (React)

Landing | Sign-in | Main Demo (chat plus visualization)

v
Gateway

Kong API Gateway (planned)

Validates API key, applies rate limit, routes

v
Backend

FastAPI Service

v
Step 1

Input Validation and Normalization

Whitespace trim, max length, type check

v
Step 2: BEFORE any LLM call

SAFETY PIPELINE (deterministic override)

Crisis keyword/regex check. If trigger: return safety template. LLM is never called.

v (no trigger)
Step 3: THE SWAP POINT

Engine Interface (stable signatures)

engine.analyze(input) -> output
engine.respond(input, analysis?) -> response

v
Step 4: disposable

Engine v0 (thin LLM wrapper)

Calls Azure OpenAI. Validates JSON via Pydantic. Falls back to neutral on parse error or low confidence. Caches by normalization_hash for demo stability.

v
Step 5

Event Writer

Builds canonical envelope, stamps timestamps, attaches request_id/trace_id, persists. Fail equals 5xx (no silent success).

v
Event Store (MongoDB to Postgres): 7 event types
Cache (Redis to Valkey): norm_hash to output
Async Queue (RabbitMQ): exports, sim runs
v (read-only)
Admin

Internal Tools

Debugger | Dashboard | Labeling | Simulator | Research Playground

Diagram notes
  • Kong is the planned implementation, not contractually required. SOW only requires the functions (API key validation, rate limiting). If we move off Kong, nothing in §3.3 breaks.
  • Safety runs BEFORE the engine, not after. When a crisis keyword fires, the LLM is never called. No improvisation on suicide-adjacent input.
  • Redis/Valkey (cache) and RabbitMQ (queue) are sidecar services, not in the main request path.

2.2 The 12 deliverable groups, deeper

1. Investor Demo App (§3.1)

A public-facing React web app for showing the system to investors.

Required screens: Landing/Onboarding (with disclaimer "structure and flow, not 'true emotional intelligence'"), Sign Up / Sign In, Main Demo (chat input, message stream, emotional visualization, generated response, feedback controls).

Visualization shows: emotion, valence, intensity, tension, intent, response_strategy, conversation_dynamics, receptivity_signal, key flags, escalation_flag.

Error states required: 401, 403, 429, 5xx must show non-blank UI with retry path.

2. Internal Tools (§3.2)

Single authenticated admin web app with 5 sub-tools, all using a "list/table plus right-side detail drawer" UI pattern.

Debugger: search interactions by session_id/message_id/time-range plus filters across all taxonomy fields. Inspects full Message to Analysis to Response trace including raw input, optional base_llm_response, full taxonomy with confidence, operational signals, fallback status, model trace, evidence spans, latency, linked event IDs. Re-run with stored config.

Dashboard: KPIs (total messages, percent fallback, avg confidence, error rate, escalation rate), distributions across all taxonomy fields and flags, top fallback reasons, latency p50/p95/p99. All metrics computed strictly from stored events.

Labeling: annotators correct full taxonomy. Saves create Label Events.

Simulator (SEUA): generates synthetic messages (presets: stressed, angry, confused, enthusiastic). Default mix skews toward crisis-adjacent / safety-edge cases. Single-turn, multi-turn, and load-test modes. Phase 1 load targets: 50 concurrent sessions, 10 messages/sec, p95 latency under 500ms.

Research Playground: try inputs, compare two engine versions/configs side-by-side, export results.

3. Modular Backend / API Service (§3.3)

Containerized FastAPI service. The backbone of the system. A request-path sketch follows the list below.

  • Health: /health (200 if process up), /ready (200 only if DB plus event persistence reachable).
  • Public versioned API: all under /v1, immutable namespace, breaking changes require /v2.
  • Validation/normalization on every request.
  • Response envelope: request_id (UUID), api_version, engine_version.
  • Auth: API keys in header, hashed storage with metadata.
  • Rate limiting: keyed by API key, 429 with stable error code.
  • Logging: request_id propagated through logs and event writes.
  • Event Writer: if persistence fails, API call returns 5xx (no silent success).
  • Engine routing: DEFAULT_ENGINE_VERSION env var, optional X-Engine-Version override header.
  • Privacy: session_ids stored as HMAC, raw text not in standard request logs.
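
To make the list concrete, here is a hedged sketch of how those route-level behaviors might compose in a FastAPI handler. The header and env var names follow the SOW; safety_check, write_events, and the stub engine are stand-ins for the components described elsewhere in this guide, and everything else is illustrative:

import hashlib
import hmac
import os
import uuid

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()
SESSION_HMAC_KEY = os.environ.get("SESSION_HMAC_KEY", "dev-only-key").encode()

def safety_check(text: str) -> bool:
    return False                          # stand-in for the §3.6 deterministic pipeline

def write_events(*args) -> None:
    pass                                  # stand-in for the synchronous event writer (§3.3.7)

class _StubEngine:
    def analyze(self, canonical_input: dict) -> dict:
        return {"engine_version": "v0"}   # stand-in for Engine v0

def get_engine(version: str) -> _StubEngine:
    return _StubEngine()

class AnalyzeRequest(BaseModel):
    text: str
    session_id: str

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}   # 200 if the process is up; /ready would also check the DB

@app.post("/v1/emotions/analyze")
def analyze(req: AnalyzeRequest, x_engine_version: str | None = Header(default=None)):
    request_id = str(uuid.uuid4())
    engine_version = x_engine_version or os.environ.get("DEFAULT_ENGINE_VERSION", "v0")

    # Validate/normalize input; pseudonymize the session id (HMAC, never the raw id)
    normalized = " ".join(req.text.split())
    session_hmac = hmac.new(SESSION_HMAC_KEY, req.session_id.encode(), hashlib.sha256).hexdigest()

    # 1. Safety check runs BEFORE any engine/LLM call
    if safety_check(normalized):
        analysis = {"fallback_triggered": True, "fallback_reason": "safety_keyword"}
    else:
        # 2. The engine interface boundary: v0 today, v2.0 later, same call
        analysis = get_engine(engine_version).analyze({"text": normalized})

    # 3. Event writes sit on the request path: persistence failure = 5xx, no silent success
    try:
        write_events(request_id, session_hmac, normalized, analysis)
    except Exception:
        raise HTTPException(status_code=500, detail="event persistence failed")

    return {"request_id": request_id, "api_version": "v1",
            "engine_version": engine_version, "analysis": analysis}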

4. Event Schema and Data Pipeline (§3.4)

The data foundation. Every analysis flows into events; every internal tool reads from events.

Canonical envelope (every event): event_id, event_type, schema_version, occurred_at, ingested_at, environment, request_id, session_id (HMAC), conversation_id, actor (user/assistant/external_llm), client_context, engine_context, causal links (message_event_id, analysis_event_id).

7 event types: Message, Analysis, Response, Feedback, Label, External-LLM Output, External-LLM Scoring.

§3.4.4 Data Asset Ownership: all events, analysis outputs, feedback records, labeling results are proprietary Client property.

5. Engine v0 (§3.5)

The placeholder engine. Thin, LLM-based, disposable behind the stable interface.

What it does: takes user text, builds a prompt asking an LLM to produce JSON in the taxonomy shape, validates the JSON with Pydantic, applies fallback rules. Roughly 200 to 500 lines of Python.

Wrapper rules: deterministic input normalization, strict JSON schema validation, fallback to neutral on parse error or low confidence, cache by normalization_hash, golden-set regression runner.

6. Safety Framework (§3.6)

Deterministic, non-ML safety override pipeline. Runs BEFORE the engine.

  1. Normalize the input text.
  2. Check crisis keywords plus regex.
  3. Assign severity: critical / high / moderate.
  4. Force predefined safety state, return pre-approved template (NOT free-form generation).
  5. Record fallback_triggered=true, fallback_reason="safety_keyword", safety_severity.

Why non-ML: liability. ML systems can be unpredictable on edge cases. Deterministic keyword matching is auditable.
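
A minimal sketch of the deterministic check, assuming an illustrative keyword/regex list; the real patterns, severities, and templates come from the Client, not from this guide:

import re
from dataclasses import dataclass

# Illustrative patterns only.
CRISIS_PATTERNS = {
    "critical": [re.compile(r"\b(kill myself|end my life)\b", re.I)],
    "high":     [re.compile(r"\b(hurt myself|can't go on)\b", re.I)],
    "moderate": [re.compile(r"\b(hopeless|no way out)\b", re.I)],
}

@dataclass
class SafetyResult:
    triggered: bool
    severity: str | None = None   # critical / high / moderate

def safety_check(normalized_text: str) -> SafetyResult:
    # Deterministic, auditable, non-ML: first matching severity wins.
    for severity in ("critical", "high", "moderate"):
        if any(p.search(normalized_text) for p in CRISIS_PATTERNS[severity]):
            return SafetyResult(triggered=True, severity=severity)
    return SafetyResult(triggered=False)

# On a trigger, the caller returns the pre-approved template, records
# fallback_triggered=true, fallback_reason="safety_keyword", safety_severity,
# and never calls the LLM.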

7. Infrastructure (§3.7)

Local dev (Docker Compose), staging plus prod deploy, secrets management, observability (CloudWatch), DB migrations, retention policies, smoke tests.

Microservice readiness (§3.7.7): engine isolation boundary, engine version toggles, horizontal scaling (stateless API, sessions in DB).

Retention rules: raw text 90 days, analysis events 24 months, safety-flagged 12 months. Deletion certificate within 10 business days of request.
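
A hedged sketch of what the nightly retention purge might look like; the field names and the use of occurred_at instead of a true session-end timestamp are simplifications, not SOW specifications:

from datetime import datetime, timedelta, timezone

def purge_expired_raw_text(events: list[dict], now: datetime | None = None) -> list[dict]:
    # Drop raw message text older than 90 days; leave safety-flagged interactions
    # to their own 12-month retention tier. "safety_flagged" is an illustrative field.
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=90)
    for event in events:
        if event["event_type"] != "message" or event.get("safety_flagged"):
            continue
        occurred = datetime.fromisoformat(event["occurred_at"].replace("Z", "+00:00"))
        if occurred < cutoff:
            event.pop("text", None)
            event.pop("normalized_text", None)
    return events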

8. Developer Interface (§3.8)

OpenAPI spec for /v1 endpoints plus a Python shadow-mode reference example.

Reference example: demonstrates a partner system keeping its own response unchanged while sending the user message to /v1/emotions/analyze in parallel for shadow analysis. It logs the EAII output but does not show it to the user.

9. Documentation and Handover (§3.9)

The package that lets your team run the system without Pivot. Includes runbooks, env vars docs, deploy instructions, ADRs, engine replacement guide (Phase 2 transition document), output taxonomy rationale, third-party inventory, Phase 2 considerations.

10. UI Design Pattern Selection (§3.10)

Pivot proposes 3 design pattern tiles (palette, typography, components). Client picks within 5 business days or supplies own style guide.

11. Prototype Evaluation Framework (§3.11)

Lightweight evaluation. NOT academic benchmarking. Synthetic corpus from SEUA, ~200 to 500 conversations.

Baselines: (a) sentiment-only delta model like VADER, (b) LLM-based per-turn classifier without structured state memory.
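
For orientation, baseline (a) is roughly this much code with the vaderSentiment package; the comparison point is that it yields a single polarity score with none of EAII's structured fields:

# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("I'm so frustrated with this, nothing is working")
# -> {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}: one polarity number,
# no receptivity, disengagement, escalation, or trajectory to compare against.
print(scores["compound"])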

Deliverable: investor-readable evaluation summary, NOT a benchmark paper.

12. External-LLM-Output Scoring (§3.12)

Additive workstream (added Apr 28). Score the output of another LLM (assistant/model) for tone, calibration, escalation risk, defensiveness, hedging, receptivity, alignment.

Schema extends actor.type to {user, assistant, external_llm}. Backward-compatible, no /v1 breaking changes.

2.3 Key technical concepts, simplified

LLM (Large Language Model)

Models like GPT-4, Claude, Llama. Trained to predict the next token. Used here for both classification (assigning labels to the taxonomy) and generation (producing response text).

RAG (Retrieval-Augmented Generation)

The pattern where you don't put everything in the prompt. You retrieve relevant chunks from a database and add them to the prompt. The Phase 1 SOW does NOT use RAG explicitly. Engine v0 is a thin LLM wrapper, not a RAG system. (Phase 2 might use RAG-adjacent ideas, but that is Phase 2.)

Structured output / JSON Schema validation

LLMs by default return free text. Modern LLMs (GPT-4o, Claude) can return JSON conforming to a schema. The SOW requires "strict JSON schema validation on model output." Concretely: Engine v0 builds a Pydantic schema, calls the LLM with structured-output mode, and runs strict validation on top to catch any drift. If validation fails, fallback fires.

Shadow mode

Deploy a new system alongside an existing one without affecting the user-facing path. The new system gets the same inputs, produces outputs, but those outputs are observed/logged, not used. The Python reference example in §3.8.1 is a shadow-mode integration pattern.
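
A hedged sketch of that shadow-mode pattern from the partner's side; the endpoint URL, header name, and use of the requests library are assumptions, not SOW requirements:

import logging
import threading
import requests

EAII_URL = "https://api.example.com/v1/emotions/analyze"   # placeholder URL
log = logging.getLogger("eaii.shadow")

def shadow_analyze(message_text: str, session_id: str, api_key: str) -> None:
    # Fire-and-forget: log EAII's analysis, never use it in the user-facing reply.
    def _call():
        try:
            resp = requests.post(
                EAII_URL,
                json={"text": message_text, "session_id": session_id},
                headers={"X-API-Key": api_key},   # header name is an assumption
                timeout=2,
            )
            log.info("eaii shadow analysis: %s", resp.json())
        except Exception:
            log.warning("eaii shadow call failed; user-facing path unaffected")
    threading.Thread(target=_call, daemon=True).start()

# Partner system keeps generating its own reply exactly as before, and calls
# shadow_analyze(user_message, session_id, api_key="...") in parallel.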

Microservice readiness / horizontal scaling

The API service is stateless: sessions persisted in DB, not in memory. So you can run N copies of the API service behind a load balancer, and any copy can handle any request. /health and /ready endpoints let load balancers know which copies are alive.

Concretely: when 5 FastAPI containers run behind an AWS load balancer, request 1 might hit container A and request 2 might hit container B for the same conversation. If session state lived in container memory, container B would not know about A's state and the conversation would break. Putting session_id state in the DB means any container can resume any conversation.

Pydantic (deeper)

Python library for declaring typed schemas:

from typing import Literal

from pydantic import BaseModel, ValidationError

class AnalysisOutput(BaseModel):
    emotion: str
    valence: Literal["positive", "neutral", "negative"]
    intensity: Literal["low", "medium", "high"]
    confidence: float

# At validation time:
parsed = AnalysisOutput.model_validate_json(raw_llm_response)
# Raises ValidationError if the LLM returned malformed JSON
# or a value outside the allowed Literal range.

FastAPI integrates natively, so the same class is your API contract, your validation, AND your auto-generated docs.

Postgres with JSONB

Postgres has a JSON column type (JSONB) that stores JSON natively, indexable via GIN indexes, queryable with operators:

CREATE TABLE events (
  id UUID PRIMARY KEY,
  event_type VARCHAR(50),
  occurred_at TIMESTAMP,
  payload JSONB
);

SELECT * FROM events
WHERE event_type = 'analysis'
  AND payload->>'emotion' = 'frustrated'
  AND (payload->>'overall_confidence')::float > 0.8;

You get Postgres transactions and joins (which Mongo struggles with) plus document flexibility.

Cache (Redis/Valkey)

Engine v0 caches normalization_hash to output. Why: LLMs are non-deterministic. For an investor demo where Patrick types the same example three times, three different outputs is bad theater. Caching makes the demo deterministic. Trade-off: caching can mask bugs. Re-run in the Debugger handles this by exposing cache_status: hit | miss.
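
A small sketch of how that cache key might be computed; the exact normalization rules are whatever the deterministic normalization step specifies, so the lowercasing here is illustrative:

import hashlib

def normalization_hash(text: str) -> str:
    # Deterministic normalization: trim, collapse whitespace, lowercase
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Same utterance typed with different spacing hits the same cache key:
assert normalization_hash("I'm  so frustrated ") == normalization_hash("i'm so frustrated")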

Crisis keyword override

The Safety Framework (§3.6) is a deterministic, non-ML pipeline that scans user input for crisis keywords (e.g., suicide-adjacent language) and forces a pre-approved safety template response, bypassing the LLM. This exists for liability reasons: you don't want an LLM improvising on suicide-adjacent input.

Golden set regression runner

A "golden set" is a curated set of inputs with known expected outputs. The runner runs all of them through the engine periodically and compares outputs to expected. If outputs change, the runner alerts. Used to detect drift when prompts, models, or configs change.

Architectural Decision Record (ADR)

A short doc explaining one architectural choice: what was decided, what alternatives were considered, why this one was picked, what the trade-offs are. The SOW requires an ADR pack at handover.

2.4 Engine v0 walkthrough (the disposable LLM wrapper)

Engine v0 is roughly:

def analyze(canonical_input):
    text = canonical_input.text

    # Deterministic normalization, then a stable content hash of the result
    # (e.g. sha256 of the normalized text, not Python's process-local hash())
    normalized = trim_and_collapse_whitespace(text)
    norm_hash = stable_hash(normalized)

    # Cache check (for demo stability)
    if cache.has(norm_hash):
        return cache.get(norm_hash)

    # Build prompt
    prompt = render_prompt(
        prompt_version="v0.1",
        text=normalized,
        recent_context=canonical_input.context
    )

    # Call LLM via AI gateway
    raw_response = ai_gateway.chat(
        provider="azure_openai",
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    # Validate output against schema
    try:
        parsed = AnalysisOutput.model_validate_json(raw_response)
    except ValidationError as e:
        return fallback_response(reason="parse_error", error=e)

    # Confidence gating
    if parsed.overall_confidence < CONFIDENCE_THRESHOLD:
        return fallback_response(reason="low_confidence")

    # Stamp trace fields
    parsed.trace.model_id = "gpt-4.1-mini"
    parsed.trace.prompt_version = "v0.1"
    parsed.trace.normalization_hash = norm_hash
    parsed.trace.engine_version = "v0"
    parsed.trace.engine_config_id = "v0:strict-thresholds"
    parsed.trace.cache_status = "miss"
    parsed.trace.latency_ms = elapsed_ms()

    # Cache for stability
    cache.set(norm_hash, parsed)

    return parsed

That is a few dozen lines of pseudocode capturing what Engine v0 actually does. Not magical, not ML, just disciplined glue around an LLM call.

In Phase 2, Engine v2.0 will replace this file. The analyze function will load a fine-tuned model with torch.load() and run inference locally instead of calling Azure. Same function signature, same return shape, same caller.

3.1 The taxonomy fields, deeply

The shape of every Analysis Event. Approximately 30 fields organized into 10 layers.

Layer 1: Affective dimensions

What is the user feeling?

  • emotion: categorical label (NOT YET DEFINED, controlled vocabulary)
  • valence: positive | neutral | negative (sign of emotion)
  • intensity: low | medium | high (magnitude)
  • tension: calm | tense (bodily/psychological tension)

Worked examples for valence vs intensity vs tension

Utterance | valence | intensity | tension
"I'm fine, just leave me alone." | negative | low | tense
"OH MY GOD I LOVE THIS!!!" | positive | high | calm
"I really, deeply need you to listen right now." | negative | medium | tense
"yeah whatever" | negative | low | calm
"Why are we even talking about this." | negative | medium | tense

Memorable framing: Valence equals sign. Intensity equals volume. Tension equals body-state / confrontational pressure underneath. Three orthogonal axes; you can be intensely calm or low-intensity tense.

Layer 2: Communicative need

What does the user want?

  • intent: communicative-need label (NOT YET DEFINED)
  • response_strategy: what the system should do (NOT YET DEFINED)

Layer 3: Conversation dynamics

  • conversation_dynamics: stable | escalating | de_escalating | unresolved | resolving

Layer 4: Orthogonal flags (5 booleans)

  • confusion_flag: user appears confused
  • urgency_flag: user is in a hurry / time-critical
  • overload_flag: user is cognitively overwhelmed
  • safety_flag: safety/crisis pattern
  • repetition_flag: user is repeating themselves

Layer 5: Relational signals (the COMMERCIAL signals)

  • receptivity_signal: how open the user is to engagement
  • disengagement_flag: user checking out / leaving
  • testing_flag (optional): user testing the system
  • vulnerability_flag (optional): user in vulnerable state

Why these matter: §3.8.1 calls receptivity_signal and disengagement_flag "key commercial signals." A partner can route disengagement_flag=true into a save-the-customer rule, or use receptivity_signal=low to soften a sales pitch. These are the most concrete "why pay for EAII" fields.

Layer 6: Operational signals
  • escalation_flag: escalate to human?
  • response_mode: neutral | supportive | informational

response_mode vs response_strategy: strategy equals WHAT to do (validate, clarify, redirect, de-escalate); mode equals HOW to sound (neutral, supportive, informational). Independent. You can have strategy=validate, mode=informational (validate factually) or strategy=validate, mode=supportive (validate warmly).

Layer 7: Confidence

  • confidence: overall float 0 to 1
  • overall_confidence: must equal confidence (compat)
  • confidence_by_field: per-field confidence object

Layer 8: Explainability

  • keywords: max 5 keywords driving the analysis
  • evidence_spans: max 3 spans with reason

Layer 9: Trace (debug plus audit)

  • model_id: which model produced this
  • prompt_version: which prompt
  • normalization_hash: hash of normalized input
  • engine_version: which engine version (e.g. v0, v0.1)
  • engine_config_id: which runtime config (e.g. v0:strict-thresholds)
  • cache_status: hit/miss
  • latency_ms: how long it took

Layer 10: Fallback

  • fallback_triggered: bool
  • fallback_reason: string/enum, one of low_confidence | parse_error | safety_keyword

Two flavors of fallback (important distinction)

Same fallback_triggered=true field is used for two very different cases. Distinguish by fallback_reason:

  • fallback_reason="low_confidence" or "parse_error": Engine v0 could not produce valid output. Returned neutral baseline. Engine WAS called.
  • fallback_reason="safety_keyword": Safety pipeline hit. LLM was bypassed entirely. Pre-approved template returned. Engine was NEVER called.

Same flag, fundamentally different operational meaning.
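
Pulling the layers together, a hedged Pydantic sketch of the Analysis Event payload shape, expanding the short snippet in 2.3. Field names follow this section; types for the not-yet-defined vocabularies (emotion, intent, response_strategy, receptivity_signal) are placeholder strings until the label sets are locked at M1:

from typing import Literal, Optional
from pydantic import BaseModel

class Trace(BaseModel):
    model_id: str
    prompt_version: str
    normalization_hash: str
    engine_version: str
    engine_config_id: str
    cache_status: Literal["hit", "miss"]
    latency_ms: int

class AnalysisOutput(BaseModel):
    # Layer 1: affective dimensions (emotion vocabulary not yet defined, so str here)
    emotion: str
    valence: Literal["positive", "neutral", "negative"]
    intensity: Literal["low", "medium", "high"]
    tension: Literal["calm", "tense"]
    # Layer 2: communicative need (label sets not yet defined)
    intent: str
    response_strategy: str
    # Layer 3: conversation dynamics
    conversation_dynamics: Literal["stable", "escalating", "de_escalating", "unresolved", "resolving"]
    # Layer 4: orthogonal flags
    confusion_flag: bool
    urgency_flag: bool
    overload_flag: bool
    safety_flag: bool
    repetition_flag: bool
    # Layer 5: relational signals (receptivity value set not locked)
    receptivity_signal: str
    disengagement_flag: bool
    testing_flag: Optional[bool] = None
    vulnerability_flag: Optional[bool] = None
    # Layer 6: operational signals
    escalation_flag: bool
    response_mode: Literal["neutral", "supportive", "informational"]
    # Layer 7: confidence
    confidence: float
    overall_confidence: float
    confidence_by_field: dict[str, float]
    # Layer 8: explainability
    keywords: list[str]           # max 5
    evidence_spans: list[dict]    # max 3, each with a reason
    # Layer 9: trace
    trace: Trace
    # Layer 10: fallback
    fallback_triggered: bool
    fallback_reason: Optional[Literal["low_confidence", "parse_error", "safety_keyword"]] = None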

3.2 The 7 event types

1

Message Event

Raw user input, source-of-truth. Even empty/over-length/unsupported-language inputs still emit one with appropriate flag.

2

Analysis Event

Full taxonomy plus explainability plus trace plus operational signals plus confidence plus fallback. Failed analysis still emits with status="failed".

3

Response Event

What the system returned plus which template/strategy. Failed response still emits with status="failed", response_type="fallback".

4

Feedback Event

User feedback: emotion alignment plus response helpfulness. Linked to specific analysis_event_id and response_event_id.

5

Label Event

Annotator's correction (used for future training data).

6

External-LLM Output Event

Captures the output of an external LLM (not a user message). Linked to triggering user message.

7

External-LLM Scoring Event

Result of EAII scoring an external LLM's output (tone/calibration/escalation-risk/receptivity/alignment).

3.3 Causal links (how Debugger reconstructs a session)

Message Event   { event_id: M1, message_id: msg-abc }
   |
   | message_event_id=M1
   v
Analysis Event  { event_id: A1, message_event_id: M1 }
   |
   | message_event_id=M1, analysis_event_id=A1
   v
Response Event  { event_id: R1, message_event_id: M1, analysis_event_id: A1 }
   |
   | analysis_event_id=A1, response_event_id=R1
   v
Feedback Event  { analysis_event_id: A1, response_event_id: R1 }

This is also what makes the export endpoint work: pull all events with the same session_id, reassemble by causal IDs.
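
A small sketch of that reconstruction in Python, assuming the events were already pulled by session_id and each dict carries the envelope fields shown above:

from collections import defaultdict

def reconstruct_session(events: list[dict]) -> list[dict]:
    # Group one session's events into message-anchored traces using the causal links.
    by_message = defaultdict(lambda: {"message": None, "analysis": None,
                                      "responses": [], "feedback": []})
    analysis_to_message = {}

    for e in sorted(events, key=lambda e: e["occurred_at"]):
        if e["event_type"] == "message":
            by_message[e["event_id"]]["message"] = e
        elif e["event_type"] == "analysis":
            by_message[e["message_event_id"]]["analysis"] = e
            analysis_to_message[e["event_id"]] = e["message_event_id"]
        elif e["event_type"] == "response":
            by_message[e["message_event_id"]]["responses"].append(e)
        elif e["event_type"] == "feedback":
            # Feedback links to analysis/response, so hop through the analysis event
            msg_id = analysis_to_message.get(e["analysis_event_id"])
            if msg_id:
                by_message[msg_id]["feedback"].append(e)

    return list(by_message.values())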

3.4 The two endpoints flow

POST /v1/emotions/analyze:
   text -> analysis (taxonomy, no response generated)

POST /v1/emotions/respond:
   text + optional analysis ->
   1. If analysis not provided, run engine.analyze() first
   2. Run safety check; if triggered, return safety template
   3. Pick response_strategy + response_mode from analysis
   4. Generate response text via LLM (or template)
   5. Return both analysis AND response

Why two endpoints: a shadow-mode partner only needs /analyze (they keep their own response). A full integration uses /respond to get EAII-generated text.

3.5 Sample Message Event shape

{
  "event_id": "evt-001",
  "event_type": "message",
  "schema_version": "1.0",
  "occurred_at": "2026-05-08T14:30:00Z",
  "ingested_at": "2026-05-08T14:30:00.123Z",
  "environment": "production",
  "request_id": "req-abc-123",
  "session_id": "ses-hmac-xyz",
  "conversation_id": "conv-456",
  "actor": { "type": "user", "id": "user-hmac-pseudo" },
  "client_context": { "platform": "web" },
  "engine_context": {
    "engine": "v0",
    "version": "0.1",
    "config_hash": "..."
  },
  "message_id": "msg-001",
  "text": "I'm so frustrated, nothing is working",
  "normalized_text": "i'm so frustrated, nothing is working",
  "language_hint": "en",
  "input_modality": "text",
  "text_hash": "sha256-abc",
  "contains_sensitive_markers": false
}

3.6 Technical questions David might raise (and the answers ready)

David's framing of "are you sure you can handle this?" will probably come as specific technical questions. Drill these.

Q1

"Have you thought about how the safety pipeline integrates with the engine call?"

Answer: The safety pipeline runs BEFORE the engine, not after. Per §3.6.1, when a crisis keyword fires, the LLM is never called; we return a pre-approved safety template directly and record fallback_triggered=true with fallback_reason="safety_keyword". This is intentional non-ML design for liability. The pattern is: normalize text, scan keywords/regex, assign severity, force safety state and template, log the override.
Q2

"Do you understand how the engine interface stays stable across version swaps?"

Answer: §3.3.9 defines two stable function signatures: engine.analyze(canonical_input) -> analysis_output and engine.respond(canonical_input, analysis_output?) -> response_output. The FastAPI route calls these regardless of which engine is loaded. v0 implements them with a thin LLM wrapper. v2.0 (Phase 2) will implement them with a fine-tuned model. The route, event writer, demo app, and debugger all sit on the public side of this boundary and never change.
Q3

"How are you handling the event write guarantees? What happens if the DB is down?"

Answer: Per §3.3.7, if event persistence fails, the API call fails with 5xx (no silent success). The Event Writer is internal to the FastAPI service and runs synchronously on the request path. RabbitMQ is for non-critical async work, not for the analyze/respond write path. This means analyze/respond requests are atomic: either the event was written and the user got a response, or both failed.
Q4

"How does horizontal scaling work? Is the API stateless?"

Answer: Yes, per §3.7.7(c). Sessions persisted in DB (anonymized HMAC session_id), not in container memory. N copies of the FastAPI container can run behind an AWS load balancer; any container can handle any request for any session. /health and /ready endpoints let the load balancer decide routing. Each container loads its own engine at startup; no per-container state.
Q5

"What is your story on cache stability for demos vs cache-masked-bugs?"

Answer: Per §3.5.4(d), Engine v0 caches normalization_hash to output. Makes the demo deterministic (same prompt, same answer). Trade-off: bugs can be masked by stale cache. The Debugger surfaces cache_status: hit | miss in the trace; re-run in the Debugger always invokes the engine fresh. We can also disable the cache via config for test runs.
Q6

"How do you handle engine_version and engine_config_id mechanics?"

Answer: engine_version is the logical engine implementation ("v0", "v0.1", later "v2.0"). engine_config_id is a set of runtime params within that version (thresholds, prompt versions). Server has DEFAULT_ENGINE_VERSION env var. Internal tools can override per-request via X-Engine-Version header. Re-run in the Debugger uses stored engine_version + config_id, so old results stay reproducible even after production moves on.
Q7

"What about the data retention rules in §3.7.9?"

Answer: Three retention tiers. Raw message text retained 90 days post-session, then purged. Analysis events plus aggregated metrics retained 24 months. Crisis/safety-flagged interactions retained 12 months regardless of purge. Plus client deletion-cert rights (10 business days from request) and full data export rights (10 business days). Implementation: scheduled job runs nightly to purge expired raw text from event store.
Q8

"How does synthetic corpus generation work? What is the provenance trail?"

Answer: Per §3.2.6, SEUA generates synthetic messages via Azure OpenAI behind a replaceable adapter. Each generated message includes provenance metadata in the event: provider, model_id, prompt_version, simulator_preset, scenario_category, intensity setting, random seed, generation_config_id, created_at, reference_material_used flag. Corpus must include scenario blueprints, intended labels, coverage checks, duplicate/quality checks, crisis/safety-edge coverage, lightweight curation pass.
Q9

"How will you measure that the structured taxonomy beats simple sentiment?"

Answer: Per §3.11.1, two baselines run alongside EAII: (a) sentiment-only delta model like VADER, (b) LLM-based per-turn classifier without structured state memory. Both run on the same dataset and metrics as EAII. The point is to show simple sentiment misses the structure EAII captures: receptivity_signal, disengagement_flag, conversation_dynamics trajectory, escalation_flag. The eval is illustrative not academic; deliverable is investor-readable summary, not a benchmark paper.
Q10

"How does External-LLM-output scoring extend the schema without breaking /v1?"

Answer: Per §3.12, added through backward-compatible request/schema extensions. The actor.type field expands to {user, assistant, external_llm}. New event types (External-LLM Output Event, External-LLM Scoring Event) added without changing existing types. No breaking changes to /v1; existing clients keep working. Internal tools clearly distinguish the two streams in debugger, labeling, export views.
Q11

"What is your story on the controlled emotion / intent / response_strategy label sets?"

Answer: Those label sets are not yet defined as a team artifact. The SOW says they are "controlled labels" but does not list them. This is intentionally one of the first architecture sessions: lock the value sets together with the team, because downstream tooling (Dashboard distributions, drift detection, labeling UI) all assumes a closed set. We will lock them at M1 (architecture lock).
Q12

"How are you handling LLM provider isolation? What if Azure rate-limits you?"

Answer: Per §3.5.1(c), Engine v0 keeps the LLM provider behind a replaceable adapter. The AI gateway layer (LiteLLM or Portkey, both MIT OSS) sits between Engine v0 and the actual provider. So if Azure has issues, the gateway can retry with a different model or temporarily route to a different provider without code changes; we just update config. Decision between LiteLLM and Portkey deferred to implementation.
Q13

"What happens to a Message Event for an empty input or over-length input?"

Answer: Per §3.4.2(a), Message Events still emit for corner cases. Empty/whitespace input marked rejected_empty. Over-length text stores both the original text and truncated_text used for analysis, plus metadata. Unsupported language stored anyway, with analysis likely low-confidence. The principle: every user input produces at least a Message Event for traceability, regardless of whether analysis succeeds.
Q14

"How do you handle the case where Engine v0 returns malformed JSON?"

Answer: Per §3.5.4, Engine v0 enforces strict JSON schema validation on model output via Pydantic. If parse or schema validation fails, deterministic fallback fires: returns neutral baseline analysis with fallback_triggered=true, fallback_reason="parse_error". Analysis Event still emitted with status="failed" and safe error object. Response generation switches to neutral fallback. The system never crashes on a bad LLM output; it records the failure and serves a safe response.
Q15

"How does the Debugger reconstruct a multi-event session?"

Answer: Events use causal links in the envelope: message_event_id and analysis_event_id. So an Analysis Event references its Message Event; a Response Event references both Message and Analysis events; a Feedback Event references its Analysis and Response events. To reconstruct a session: pull all events with the same session_id, sort by occurred_at, chain by causal links. This is how the Debugger builds the full Message > Analysis > Response > Feedback timeline view.

3.7 Glossary

ADR: Architectural Decision Record. One-page doc explaining one architectural choice.
Affective computing: Academic field of computational emotion recognition. Picard, MIT, 1990s.
API gateway: Service in front of your API handling auth, rate limiting, routing. Kong is one.
Beanie ODM: Python library mapping classes to MongoDB documents.
Canonical event envelope: Common metadata wrapper for all event types.
CloudWatch: AWS observability service.
Containerization: Packaging app plus dependencies into a portable Docker image.
Controlled vocabulary: Fixed enum of allowed string values, NOT free text.
Crisis keyword override: Deterministic safety pipeline triggers a safety template if input matches crisis patterns.
EAII: Emotional Artificial Intelligence Infrastructure, the platform brand.
Engine v0: Placeholder thin-LLM-wrapper engine.
Engine v2.0: Future real engine (Phase 2). Replaces v0 across the engine interface boundary.
Engine interface boundary: The engine.analyze() / engine.respond() function signatures. The swap point.
engine_version: Which logical engine implementation (e.g., "v0").
engine_config_id: Which runtime config within an engine version.
Event store: Persistent storage for all canonical events.
External-LLM-output scoring: Additive Phase 1 workstream evaluating assistant/model output behavior.
Fallback: System failure leads to neutral baseline. Two flavors: low-confidence/parse-error vs safety-keyword.
FastAPI: Python async web framework.
Feedback Event: Captures user feedback on emotion-alignment plus response-helpfulness.
Golden set: Curated input/expected-output pairs to detect drift.
HMAC: Keyed hash function. Used to anonymize session_ids.
JSONB: Postgres JSON column type, indexable and queryable.
JSON Schema validation: Enforces JSON conforms to a declared schema.
Kong API Gateway: Open-source API gateway.
Label Event: Annotator's correction of an interaction.
LiteLLM: Python AI gateway abstracting LLM providers.
LLM: Large Language Model. GPT/Claude/Llama-class.
Message Event: Raw user input, source-of-truth.
Microservice readiness: Engine swappable plus horizontally scalable plus stateless.
MongoDB: Document database.
Normalization: Deterministic text cleanup before analysis.
normalization_hash: Hash of normalized input.
OpenAPI spec: Machine-readable API documentation format.
OpenTofu: Apache-2.0 fork of Terraform.
PostgreSQL: Relational database with JSONB column.
Portkey: Alternative AI gateway.
Prometheus: Metrics collection.
Pseudonymous: Stable identifier but not user PII.
Pydantic: Python schema validation library.
RabbitMQ: Message queue.
RAG: Retrieval-Augmented Generation. NOT used in Phase 1.
Rate limiting: Restricts requests per API key.
React: Frontend framework.
receptivity_signal: How open the user is to engagement.
Redis: In-memory key-value store.
Re-run: Re-execute analysis with stored engine_version plus config; show diff.
Request ID / Trace ID: UUIDs propagated through logs and events.
Response Event: What the system returned plus which template/strategy.
response_mode: Lower-level rendering tone (neutral/supportive/informational).
response_strategy: High-level posture (validate/clarify/redirect/de-escalate).
Safety pipeline: Deterministic non-ML crisis keyword override.
schema_version: Identifies which event schema version produced this event.
SEUA: Simulated Emotional User Agent. Synthetic message generator.
Shadow mode: Deploy alongside existing system without affecting user-facing path.
Smoke test: Quick post-deploy script verifying basic functionality.
SOW: Statement of Work. This document.
Stateless service: Sessions in DB, not memory.
Structured output: LLM returns JSON conforming to schema.
Synthetic corpus: Dataset generated by an LLM-driven simulator.
Tension: Bodily/psychological tension; calm | tense.
Terraform: IaC tool.
text_hash: Hash of original input text. Used for dedup.
Trace fields: model_id, prompt_version, normalization_hash, etc.
Valence: positive | neutral | negative sign of an emotion.
Valkey: Apache 2.0 Redis fork.
VADER Sentiment: Simple lexicon-based sentiment analyzer.
X-Engine-Version header: Internal-tools header to override default engine version per request.

Final Cheat Sheet (re-skim Sunday night)

  1. Phase 1 vs Phase 2: Phase 1 is the platform around the engine; Phase 2 is the engine itself. Engine interface boundary makes the swap clean.
  2. The 12 deliverables: Demo App, Internal Tools, Backend/API, Event Pipeline, Engine v0, Safety, Infra, Developer Interface, Docs, UI Pattern, Eval Framework, External-LLM Scoring.
  3. The 9 milestones: M1 (architecture lock) to M9 (handover). Roughly M3 to M5 right now.
  4. The architecture spine: Demo App, Kong, FastAPI (validate, normalize, safety, engine, event writer), Postgres+Valkey+RabbitMQ (replacing Mongo and Redis), Internal Tools.
  5. The taxonomy: 30 fields in 10 layers. Affective (emotion, valence, intensity, tension), communicative need (intent, response_strategy), conversation_dynamics, 5 orthogonal flags, relational (receptivity, disengagement), operational (escalation, response_mode), confidence, explainability, trace, fallback.
  6. The 7 event types: Message, Analysis, Response, Feedback, Label, External-LLM Output, External-LLM Scoring. Linked by causal IDs.
  7. The taxonomy gap: emotion / intent / response_strategy label sets NOT YET DEFINED.
  8. The two endpoints: /v1/emotions/analyze (taxonomy only) and /v1/emotions/respond (analyze plus generate response).
  9. Safety pipeline runs BEFORE the engine. Crisis keywords bypass the LLM entirely.
  10. Stateless API service: sessions in DB, horizontal scaling, /health and /ready for load balancers.
  11. Two flavors of fallback: low_confidence / parse_error (engine called, returned neutral) vs safety_keyword (engine never called, template returned).
  12. engine_version vs engine_config_id: version is which engine implementation; config_id is which runtime config within. Both stored in trace for reproducibility.
  13. Caching: Engine v0 caches normalization_hash to output for demo stability. Re-run in Debugger always invokes fresh.
  14. The 15 likely David questions in §3.6 above. Drill those Sunday.

End of guide. Now go drill it Friday.