
PRD: Architectural Plan Page Explanation & Educational Details

📋 Implementation Issue: Issue #258 - AI-Powered Plan Page Explanation with Agentic Workflow

Executive Summary​

This PRD defines the requirements for enhancing architectural plan page presentation by generating AI-powered, professional plan explanations: educational markdown content that explains each plan page in depth. This feature adds a new "Details" tab to the page viewer with rich, expert-level explanations that make architectural drawings accessible to non-experts while providing enriched RAG artifacts for agentic systems.

Key Principle: Transform raw LLM-extracted text into comprehensive, professional architectural explanation that bridges the gap between expert architectural drawings and user understanding.

Phase 1 MVP: 3-tool agentic workflow (Generate → Assess with confidence → Refine) that delivers quality explanations while maintaining extensibility for future enhancements.

Framework: Google ADK Java with iterative workflow, confidence-based quality scoring, and complete trajectory observability.

Problem Statement​

Current State​

Current Plan Page UI:

┌─────────────────────────────────────────┐
│ Plan Page Viewer                        │
│ ┌─────────────────────────────────────┐ │
│ │ [Overview] [Preview] [Compliance]   │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ Overview Tab: Shows 1000-char           │
│ summary from page-summary-1000char.json │
└─────────────────────────────────────────┘

Current Page Artifacts:

projects/{projectId}/files/{file_id}/pages/{page_number}/
├── page.pdf                   # Raw PDF page (1 page extract)
├── page.md                    # LLM-extracted text (raw, looks OCR-like)
├── page-summary-1000char.json # Brief summary for Overview tab
├── metadata.json              # Page metadata
└── [compliance artifacts]     # Compliance reports

Problems with Current State:

  1. page.md is Too Raw: LLM text extraction produces output that:

    • Reads like a data dump, not a professional explanation
    • Lacks context and expert explanations
    • Doesn't explain architectural symbols, legends, or conventions
    • Is not educational or beginner-friendly
    • Misses relationships between elements (e.g., how zoning affects building height)
  2. Limited Educational Value:

    • Non-architects struggle to understand architectural drawings
    • No explanations of technical terms (setbacks, FAR, lot coverage, etc.)
    • No guidance on reading legends, symbols, or annotations
    • Beginners can't learn from the content
    • Even experienced professionals benefit from clear, digestible summaries that reduce cognitive load and accelerate comprehension, especially when reviewing unfamiliar project types or complex multi-disciplinary plans
  3. Weak RAG Artifacts:

    • Current page.md is not enriched with AI understanding
    • Missing expanded context for agentic systems
    • No semantic connections between page elements
    • Limited value for retrieval-augmented generation
    • Not optimized for hybrid search (semantic + keyword)
    • Lacks narrative richness for vector embeddings
    • Missing explicit keywords and domain terminology
  4. No Search Optimization:

    • Content not structured for semantic search
    • Missing keyword enrichment for hybrid retrieval
    • No consideration for future search/RAG features
    • Sparse content reduces discoverability
  5. No AI Quality Evaluation:

    • Can't assess how well AI understands plan pages
    • No mechanism to iterate and improve prompts
    • Can't optimize agentic workflows for page explanation

User Impact​

  • Beginners/Homeowners: Can't understand architectural drawings without expert help
  • Junior Architects: Need educational content to learn plan reading
  • Senior Architects: Benefit from rapid comprehension of unfamiliar project types without manual analysis
  • Plan Reviewers: Save time with pre-digested summaries instead of deciphering raw drawings
  • Non-Industry Experts: Overwhelmed by technical jargon and symbols
  • AI/Agentic Systems: Underperform due to weak input artifacts
  • Developers: Can't evaluate and optimize AI understanding of plan pages

Goals​

Primary Goals​

  1. Educational Details Tab: Add a "Details" tab that displays rich, AI-generated professional explanation of the page
  2. Generate Enhanced Markdown: Create page-explanation.md with:
    • Beginner-friendly explanations of all page elements
    • Context about architectural symbols and legends
    • Relationships between elements (zoning → setbacks → building dimensions)
    • Definitions of technical terms inline
    • Visual descriptions enriched with understanding
    • Narrative richness for semantic search
    • Explicit keywords and domain terminology
  3. Search-Optimized Content: Generate content optimized for hybrid search (semantic + keyword):
    • Keyword enrichment: Explicit mentions of technical terms, synonyms, variations
    • Semantic context: Expanded narrative that creates strong vector embeddings
    • Domain terminology: Architecture, zoning, construction code terms used naturally
    • Cross-references: Connections to related concepts (e.g., "lot coverage relates to setbacks and FAR")
    • Question-answer patterns: Content structured to answer likely user queries
  4. Agentic Generation Workflow: Use iterative AI workflow (ReAct loop or hardcoded) to refine content through multiple passes
  5. Enriched RAG Artifacts: Provide high-quality content for downstream agentic tasks and future search features
  6. Local Development Support: Enable rapid iteration using local projects without cloud deployments

Secondary Goals​

  1. AI Quality Evaluation: Assess AI's ability to understand architectural plans
  2. Prompt Optimization: Use generated content to refine prompts and agent harness
  3. Multi-Modal Input: Leverage both page.md text AND raw PDF with cached tokens
  4. Future Image Integration: Foundation for mixing images (legends, symbols) into markdown

Non-Goals​

  • Real-time generation (asynchronous batch processing is acceptable)
  • Editing markdown through UI (read-only display in Phase 1)
  • Generating understanding for compliance reports (focus on plan pages only)
  • Replacing existing Overview tab (add Details tab, keep Overview)
  • Supporting non-architectural document types initially

User Stories​

Story 1: View Professional Explanation in Details Tab​

As a project reviewer (beginner or non-expert)
I want to view professional architectural explanation of plan pages
So that I can understand what the page shows without expert architectural knowledge

Acceptance Criteria:

  • Page viewer has a new "Details" tab (fourth tab after Overview, Preview, Compliance)
  • Details tab displays content from page-explanation.md file
  • Content is rendered as rich markdown with:
    • Headings, lists, tables
    • Inline term definitions
    • Explanations of architectural concepts
    • Beginner-friendly language
  • Tab loads asynchronously if content is not yet generated
  • Tab shows "Generating professional explanation..." state during processing
  • Tab shows "Explanation not available" if generation fails
  • Content is scrollable and formatted for readability

Story 2: Generate Enhanced Page Explanation (Backend)​

As a system
I want to automatically generate rich, professional explanation markdown for architectural plan pages
So that users can access expert explanations and agentic systems have enriched artifacts

Acceptance Criteria:

  • New background task: GeneratePageExplanation (triggered after page.md extraction)
  • Task inputs:
    • page.md text content
    • page.pdf raw file (use prompt caching for PDF images)
    • metadata.json page metadata
  • Task outputs:
    • page-explanation.md (rich, professional explanation)
    • Updated metadata.json with generation timestamp and status
  • Task uses iterative agentic workflow:
    • Iteration 1 (Turns 1-3): Generate → Reflect → Refine
    • Iteration 2 (Turns 4-5): Reflect → Refine (if quality < threshold)
    • Iteration N: Continue until quality threshold met or max_iterations reached
  • Task handles failures gracefully (retry logic, fallback to raw text)
  • Task logs AI interactions for prompt optimization

Story 3: Iterative Agentic Refinement with Confidence-Based Quality​

As a developer
I want the system to use an iterative agentic workflow with confidence-based quality assessment
So that the generated markdown improves through self-assessment and I can trust the quality scores

Acceptance Criteria:

Phase 1 MVP - 3-Tool Workflow:

  • Tool 1: generateExplanation() - Creates comprehensive draft
  • Tool 2: assessQuality() - Returns {score, confidence, gaps}
  • Tool 3: refineExplanation() - Improves based on assessment

Iteration Structure:

  • Iteration 1 (always): Generate → Assess → Refine
  • Iteration 2+ (if needed): Assess → Refine (builds on previous)
  • Max 3 iterations or 15 total turns
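The iteration structure above can be sketched as a simple loop. This is an illustrative sketch only: the tool signatures (`generateExplanation`, `assessQuality`, `refineExplanation`), the threshold value, and the stubbed tool bodies are placeholders, not actual Google ADK APIs; a real implementation would make LLM calls inside each tool.

```java
// Sketch of the Phase 1 MVP loop: Generate -> Assess -> Refine, repeated until
// the quality threshold is met or the iteration/turn limits are hit.
public class ExplanationWorkflow {
    static final double QUALITY_THRESHOLD = 0.85; // example value
    static final int MAX_ITERATIONS = 3;
    static final int MAX_TURNS = 15;

    record Assessment(double score, double confidence, java.util.List<String> gaps) {}

    // Stubbed tools (hypothetical signatures; real versions call the LLM)
    static String generateExplanation(String pageText) { return "draft-v1"; }
    static Assessment assessQuality(String draft) {
        // Pretend quality improves with each refinement pass
        int version = Integer.parseInt(draft.substring(draft.indexOf("-v") + 2));
        return new Assessment(0.60 + 0.15 * version, 0.9, java.util.List.of());
    }
    static String refineExplanation(String draft, Assessment a) {
        int version = Integer.parseInt(draft.substring(draft.indexOf("-v") + 2));
        return "draft-v" + (version + 1);
    }

    static String run(String pageText) {
        int turns = 0;
        String draft = generateExplanation(pageText);   // Turn 1 (always)
        turns++;
        for (int iter = 1; iter <= MAX_ITERATIONS && turns < MAX_TURNS; iter++) {
            Assessment a = assessQuality(draft);        // Assess turn
            turns++;
            if (a.score() >= QUALITY_THRESHOLD) break;  // stop: quality met
            draft = refineExplanation(draft, a);        // Refine turn
            turns++;
        }
        return draft;
    }

    public static void main(String[] args) {
        System.out.println(run("page.md contents")); // prints "draft-v2"
    }
}
```

With the stubbed scoring, the workflow refines once (v1 scores 0.75, v2 scores 0.90) and stops after four turns, matching the "Iteration 2+ only if needed" rule.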

Iteration 1 (Initial Generation):

  • Turn 1 (Generate):
    • Input: page.md text + page.pdf (cached)
    • Prompt: "Generate comprehensive, professional explanation of this architectural plan page"
    • Output: page-explanation-draft-v1.md
  • Turn 2 (Reflect):
    • Input: page-explanation-draft-v1.md
    • Prompt: "Review this explanation. Identify gaps, unclear sections, missing context, or areas needing expansion."
    • Output: Reflection notes (JSON with quality score)
  • Turn 3 (Refine):
    • Input: page-explanation-draft-v1.md + reflection notes + original PDF (cached)
    • Prompt: "Improve the explanation based on reflection. Expand gaps, clarify unclear sections, add missing context."
    • Output: page-explanation-draft-v2.md

Iteration 2 (Optional - if quality score < threshold):

  • Turn 4 (Reflect): Review draft-v2
  • Turn 5 (Refine): Improve to draft-v3
  • Continue until quality threshold met or max_iterations reached

Tracking Requirements:

  • Each turn is logged with:
    • Turn number (sequential across all iterations)
    • Iteration number
    • Turn type (Generate/Reflect/Refine)
    • Prompt used
    • Token usage (input/output/cached)
    • Model used
    • Timestamp
  • Final metadata includes:
    • iterations_completed: Number of complete cycles
    • total_turns: Total LLM API calls
  • Workflow is configurable (max_iterations, quality threshold, reflection prompts, etc.)
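The tracking requirements above could be modeled roughly as follows; the record and field names mirror the list, but the types themselves are assumptions, not an existing schema.

```java
// Sketch of a per-turn log entry and the derived final metadata
// (iterations_completed, total_turns) described in the tracking requirements.
import java.time.Instant;
import java.util.List;

public class TurnTracking {
    enum TurnType { GENERATE, REFLECT, REFINE }

    record TurnLog(int turnNumber, int iterationNumber, TurnType type,
                   String prompt, long inputTokens, long outputTokens,
                   long cachedTokens, String model, Instant timestamp) {}

    record WorkflowMetadata(int iterationsCompleted, int totalTurns) {}

    // Final metadata is derived from the log: iterations = highest iteration
    // number seen, total turns = number of LLM API calls recorded.
    static WorkflowMetadata summarize(List<TurnLog> turns) {
        int iterations = turns.stream().mapToInt(TurnLog::iterationNumber).max().orElse(0);
        return new WorkflowMetadata(iterations, turns.size());
    }

    public static void main(String[] args) {
        List<TurnLog> turns = List.of(
            new TurnLog(1, 1, TurnType.GENERATE, "generate.txt", 7500, 2000, 5000,
                        "gemini-2.5-pro-latest", Instant.now()),
            new TurnLog(2, 1, TurnType.REFLECT, "reflect.txt", 2300, 500, 0,
                        "gemini-2.0-flash-exp", Instant.now()),
            new TurnLog(3, 1, TurnType.REFINE, "refine.txt", 8000, 2500, 5000,
                        "gemini-2.5-pro-latest", Instant.now()));
        WorkflowMetadata meta = summarize(turns);
        System.out.println(meta.iterationsCompleted() + " iteration(s), "
                           + meta.totalTurns() + " turn(s)");
    }
}
```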

Story 4: Search-Optimized Content for Hybrid Retrieval​

As a system preparing for future search/RAG features
I want to generate content optimized for hybrid search (semantic + keyword retrieval)
So that pages are highly discoverable through semantic search and agentic RAG experiences

Acceptance Criteria:

Keyword Enrichment:

  • Content explicitly mentions technical terms and their synonyms:
    • Example: "Floor Area Ratio (FAR), also known as Floor Space Index (FSI)"
    • Example: "Setback requirements (also called building lines or yard requirements)"
  • Terms repeated naturally throughout the narrative for keyword density
  • Both formal and informal terminology included:
    • Formal: "Required front yard setback"
    • Informal: "How far the building must be from the street"

Semantic Context Expansion:

  • Rich narrative that creates strong vector embeddings:
    • "The 25-foot front setback creates a buffer zone between the street and building, allowing for landscaping and maintaining neighborhood character. This setback works in conjunction with the lot coverage limit of 35%, ensuring adequate open space."
  • Explanations of "why" and "how", not just "what":
    • Not just: "Front setback: 25 feet"
    • Instead: "The 25-foot front setback requirement stems from the R-1 zoning designation, which prioritizes low-density residential character with spacious front yards for privacy and aesthetics."

Cross-References & Relationships:

  • Explicit connections between related concepts:
    • "Lot coverage (35%) relates to setbacks (25' front, 10' sides) and building height (35' max) to control overall building mass and density"
  • References to applicable codes:
    • "This complies with IRC Section R302 for fire separation"
    • "Follows Chapter 11A requirements for accessibility"

Question-Answer Patterns:

  • Content structured to answer likely queries:
    • "What is the maximum building height?" → "The maximum building height is 35 feet..."
    • "How much of the lot can be covered?" → "Lot coverage is limited to 35%, meaning..."
    • "What are the setback requirements?" → "Setbacks are: 25' front, 10' sides, 15' rear..."

Domain Terminology Natural Usage:

  • Architecture terms: floor plan, elevation, section, detail, legend, scale
  • Zoning terms: setback, lot coverage, FAR, density, height limit, use restrictions
  • Construction terms: foundation, framing, sheathing, roof pitch, wall assembly
  • Code terms: compliance, requirement, exception, variance, amendment

Metadata Keywords (optional structured data):

{
  "extractedConcepts": [
    "floor plan",
    "R-1 zoning",
    "setbacks",
    "lot coverage",
    "building height"
  ],
  "applicableCodes": [
    "IRC Section R302",
    "Chapter 11A"
  ],
  "relatedPages": [2, 5, 7]
}

Story 5: Local Development Support​

As a developer
I want to process architectural plan pages locally (using local project folders)
So that I can iterate quickly without deploying to Cloud Run or waiting for Cloud Run Jobs

Acceptance Criteria:

  • CLI tool: generate-page-explanation (local execution)
  • Tool accepts:
    • --project-path: Path to local project folder (e.g., projects/R2024.0091-2024-10-14)
    • --file-id: File ID to process (optional, defaults to all files)
    • --page-numbers: Page numbers to process (optional, defaults to all pages)
    • --force: Regenerate even if page-explanation.md exists
    • --verbose: Show detailed logs including prompts and responses
  • Tool reads from local filesystem:
    • projects/{projectId}/files/{file_id}/pages/{page_number}/page.md
    • projects/{projectId}/files/{file_id}/pages/{page_number}/page.pdf
  • Tool writes to local filesystem:
    • projects/{projectId}/files/{file_id}/pages/{page_number}/page-explanation.md
    • Updates metadata.json with generation status
  • Tool uses same agentic workflow as backend (code reuse)
  • Tool outputs progress and summary:
    • Pages processed
    • Pages succeeded/failed
    • Total tokens used
    • Total time taken

Story 6: Multi-File Structure Support​

As a developer
I want to convert a legacy project to multi-file structure and generate page understanding
So that I can test on realistic project data with proper file organization

Acceptance Criteria:

  • CLI tool: upgrade-project-and-generate (combined workflow)
  • Tool accepts:
    • --source-project: Path to legacy project (e.g., projects/R2024.0091-2024-10-14)
    • --target-project: Path for upgraded copy (e.g., projects/R2024.0091-test-copy)
  • Tool workflow:
    1. Copy project to new location
    2. Upgrade to multi-file structure (create files/ directory)
    3. Generate file metadata (files/{file_id}/metadata.json)
    4. Generate page explanation for all pages
  • Tool outputs:
    • Summary of upgrade (files created, pages migrated)
    • Summary of explanation generation (pages processed, tokens used)
  • Idempotent: Can be run multiple times safely

Technical Design (High-Level)​

File Naming Options​

Three candidate names for the enhanced markdown file:

  1. page-explanation.md ⭐ RECOMMENDED

    • Professional: "explanation" is architectural terminology
    • Expert framing: suggests professional analysis and judgment
    • Active vs passive: more active than "understanding"
    • Field alignment: aligns with architectural/code explanation practices
    • Parallel to existing page.md (raw) vs page-explanation.md (expert)
  2. page-understanding.md

    • Clear but more passive
    • Emphasizes comprehension over expertise
  3. page-details.md

    • Simple and clear
    • Matches "Details" tab name
    • May be too generic

Decision: Use page-explanation.md for professional framing and field alignment.

Updated Page Directory Structure​

projects/{projectId}/files/{file_id}/pages/{page_number}/
├── page.pdf                   # Raw PDF page (existing)
├── page.md                    # LLM-extracted text (existing, looks OCR-like)
├── page-explanation.md        # NEW: Rich, professional explanation
├── page-summary-1000char.json # Brief summary (existing)
├── metadata.json              # Page metadata (updated with generation status)
└── [compliance artifacts]     # Compliance reports (existing)

Metadata Updates:

{
  "pageNumber": 3,
  "fileId": "1",
  "explanation": {
    "status": "completed",
    "generatedAt": "2024-10-20T01:30:00Z",
    "primaryModel": "gemini-2.5-pro-latest",
    "modelsUsed": {
      "gemini-2.5-pro-latest": 2,
      "gemini-2.0-flash-exp": 1
    },
    "iterationsCompleted": 1,
    "totalTurns": 3,
    "filePath": "page-explanation.md",
    "costAnalysis": {
      "totalTokens": 17800,
      "estimatedTotalCostUsd": 0.101,
      "tokenBreakdown": {
        "nonCachedInput": {"tokenCount": 7800, "costUsd": 0.023},
        "cachedContent": {"tokenCount": 10000, "costUsd": 0.003, "discountPercent": 90},
        "output": {"tokenCount": 5000, "costUsd": 0.075}
      },
      "processingMetadata": {
        "durationMs": 45000,
        "cachingEfficiencyPercent": 56.2
      }
    }
  }
}
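The cachingEfficiencyPercent figure in the metadata above is simply cached tokens as a share of total input tokens. A small sketch of that arithmetic (the method name is illustrative):

```java
// Derives cachingEfficiencyPercent = cached tokens / total input tokens,
// rounded to one decimal place as in the example metadata.
public class CachingEfficiency {
    static double efficiencyPercent(long cachedTokens, long totalInputTokens) {
        return Math.round(1000.0 * cachedTokens / totalInputTokens) / 10.0;
    }

    public static void main(String[] args) {
        // 10,000 cached of 17,800 total input tokens
        System.out.println(efficiencyPercent(10_000, 17_800)); // prints 56.2
    }
}
```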

Proto Schema Changes​

Import Existing Cost Analysis Proto:

import "cost_analysis.proto";

New RPC: GeneratePageExplanation

// Request to generate AI-powered professional explanation for a plan page
message GeneratePageExplanationRequest {
string project_id = 1;
string file_id = 2;
int32 page_number = 3;

// Processing options
bool force_regenerate = 4; // Regenerate even if already exists
int32 max_iterations = 5; // Max loop iterations (default: 1 = single Generate→Reflect→Refine pass)
bool verbose_logging = 6; // Log prompts and responses
}

message GeneratePageExplanationResponse {
  bool success = 1;
  string status_message = 2;

  PageExplanationMetadata metadata = 3;

  // Performance metrics (uses existing CostAnalysisMetadata)
  CostAnalysisMetadata cost_analysis = 4;
  int32 processing_time_seconds = 5;
}

message PageExplanationMetadata {
  string status = 1;  // "pending", "processing", "completed", "failed"
  google.protobuf.Timestamp generated_at = 2;

  // Model tracking (multi-model support)
  string primary_model = 3;            // Main model used (e.g., "gemini-2.5-pro-latest")
  map<string, int32> models_used = 4;  // Model → turn count (e.g., {"gemini-2.5-pro": 2, "gemini-flash": 1})

  // Workflow tracking
  int32 iterations_completed = 5;  // Number of complete loop cycles (e.g., 2 = ran Generate→Reflect→Refine twice)
  int32 total_turns = 6;           // Total LLM API calls (e.g., 6 = 2 iterations × 3 turns each)

  // Output
  string file_path = 7;  // Relative path: "page-explanation.md"

  // Cost analysis (reuses existing CostAnalysisMetadata)
  // Note: CostAnalysisMetadata supports per-model cost breakdown for multi-model workflows
  CostAnalysisMetadata cost_analysis = 8;
}

Note: We're reusing the existing CostAnalysisMetadata message from cost_analysis.proto which provides comprehensive token tracking including:

  • Total tokens and estimated cost
  • Detailed breakdown (non-cached input, cached content, output, thinking, tool use)
  • Rate per million tokens
  • Discount percentages for cached content
  • Processing metadata (duration, caching efficiency)

This is far superior to creating a simple TokenUsage message and ensures consistency with existing task cost tracking (Issue #176).

Update Existing RPC: GetArchitecturalPlanPage

message ArchitecturalPlanPage {
  // ... existing fields ...

  // NEW: Rich professional explanation
  string explanation_markdown = 20;  // Content from page-explanation.md
  PageExplanationMetadata explanation_metadata = 21;
}

Component Architecture​

Backend Services:

  • PageExplanationService: Generate and manage page explanation
    • generatePageExplanation(projectId, fileId, pageNumber, options)
    • getPageExplanation(projectId, fileId, pageNumber)
    • regeneratePageExplanation(...) (force regeneration)

Backend Agentic Workflow:

  • AgenticPageInterpreter: Multi-turn AI workflow
    • explainPage(pageContext) → Returns final markdown
    • Internal methods:
      • generateInitialDraft(pageContext) - Uses primary model (quality-critical)
      • reflectOnDraft(draft) - Uses reflection model (analytical, can be cheaper)
      • refineWithReflection(draft, reflection, pageContext) - Uses primary model (quality-critical)
    • Configurable:
      • Model selection strategy:
        • primaryModel: For generation/refinement (e.g., Gemini 2.5 Pro)
        • reflectionModel: For analysis/scoring (e.g., Gemini Flash)
        • orchestrationModel: For workflow decisions (e.g., Gemini Flash)
      • Max iterations
      • Quality threshold for stopping
      • Reflection prompts
      • Temperature: 0.0 for all turns (maximum predictability)
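The configurable options above could be modeled roughly as follows. The class, record, and method names are hypothetical; the model IDs and defaults are taken from this PRD's examples.

```java
// Sketch of the configurable model-selection strategy: route quality-critical
// turns to the primary model, analytical turns to the cheaper reflection model.
public class InterpreterConfig {
    record ModelStrategy(String primaryModel, String reflectionModel,
                         String orchestrationModel) {}

    record WorkflowConfig(ModelStrategy models, int maxIterations,
                          double qualityThreshold, double temperature) {}

    static WorkflowConfig defaults() {
        return new WorkflowConfig(
            new ModelStrategy("gemini-2.5-pro-latest",  // generation/refinement
                              "gemini-2.0-flash-exp",   // reflection
                              "gemini-2.0-flash-exp"),  // orchestration
            1,     // maxIterations (Phase 1 default)
            0.85,  // quality threshold (example value)
            0.0);  // temperature: 0.0 for maximum predictability
    }

    // Route a turn type to the model that should execute it
    static String modelFor(String turnType, ModelStrategy s) {
        return switch (turnType) {
            case "GENERATE", "REFINE" -> s.primaryModel();
            case "REFLECT" -> s.reflectionModel();
            default -> s.orchestrationModel();
        };
    }

    public static void main(String[] args) {
        WorkflowConfig c = defaults();
        System.out.println(modelFor("REFLECT", c.models())); // prints gemini-2.0-flash-exp
    }
}
```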

Frontend Components:

  • Update PageViewerComponent:
    • Add fourth tab: "Details"
    • Lazy-load page-explanation.md content when tab is clicked
    • Show loading/error states
  • DetailsTabComponent (new):
    • Render markdown content
    • Handle empty/loading/error states
    • Responsive layout

CLI Tools:

  • generate-page-explanation.sh: Local generation script
  • upgrade-project-and-generate.sh: Legacy upgrade + generation

Environment Configuration​

No new environment variables required (uses existing Vertex AI credentials for Gemini models).

Optional Configuration (if needed):

# In .env.{environment}
PAGE_EXPLANATION_PRIMARY_MODEL="gemini-2.5-pro-latest" # Primary model for generation/refinement
PAGE_EXPLANATION_REFLECTION_MODEL="gemini-2.0-flash-exp" # Efficient model for reflection
PAGE_EXPLANATION_MAX_ITERATIONS="1" # Max workflow iterations (default: 1)
PAGE_EXPLANATION_ENABLE_CACHE="true" # Use prompt caching for PDFs

Cost Analysis​

Assumptions:

  • Average page: a single architectural drawing sheet
  • page.md: ~2,000 tokens
  • page.pdf: ~5,000 tokens (image)
  • Model Strategy (Phase 1): Single model for all turns
    • Primary: gemini-2.5-pro-latest
  • Model Strategy (Phase 2): Multi-model optimization
    • Generation/Refinement: gemini-2.5-pro-latest (quality-critical)
    • Reflection: gemini-2.0-flash-exp (cost-effective, 10x cheaper)
    • Orchestration: gemini-2.0-flash-exp (fast decisions)
  • Workflow: 1 iteration (3 turns: Generate→Reflect→Refine)

Token Usage Per Page (1 Iteration = 3 Turns):

Iteration 1:
Turn 1 (Generate):
Input: 5,000 (PDF, cached) + 2,000 (text) + 500 (prompt) = 7,500 tokens
Output: ~2,000 tokens

Turn 2 (Reflect):
Input: 2,000 (draft) + 300 (prompt) = 2,300 tokens
Output: ~500 tokens (reflection notes)

Turn 3 (Refine):
Input: 5,000 (PDF, cached) + 2,000 (draft) + 500 (reflection) + 500 (prompt) = 8,000 tokens
Output: ~2,500 tokens

Total per page (1 iteration):
Input: 17,800 tokens (10,000 cached)
Output: 5,000 tokens
Iterations: 1
Total turns: 3

If 2 iterations needed (quality < threshold):
Total turns: 5 (Turn 1-3 + Turn 4-5 for Reflect→Refine)
Additional cost: ~$0.02 (Reflect + Refine turns at single-model rates)

Cost Per Page - Single Model (Phase 1 - Gemini 2.5 Pro):

  • Cached input: 10,000 tokens × $0.315 / 1M = $0.003
  • Regular input: 7,800 tokens × $1.25 / 1M = $0.010
  • Output: 5,000 tokens × $5.00 / 1M = $0.025
  • Total: ~$0.04 per page
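As a sanity check on the arithmetic above (rates are this PRD's example rates in USD per million tokens; the class is illustrative):

```java
// Verifies the single-model per-page cost: cached input + regular input + output.
public class PageCost {
    static double cost(long tokens, double ratePerMillion) {
        return tokens * ratePerMillion / 1_000_000.0;
    }

    public static void main(String[] args) {
        double cached = cost(10_000, 0.315); // ≈ $0.003
        double input  = cost(7_800, 1.25);   // ≈ $0.010
        double output = cost(5_000, 5.00);   // = $0.025
        double total  = cached + input + output;
        System.out.printf("total = $%.3f per page%n", total); // ≈ $0.038, rounded to ~$0.04
    }
}
```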

Cost Per Page - Multi-Model (Phase 2 - Gemini Pro + Flash):

Turn 1 (Generate - Gemini 2.5 Pro):
Input: 7,500 tokens × $1.25/1M = $0.009
Cached: 0
Output: 2,000 tokens × $5.00/1M = $0.010
Subtotal: $0.019

Turn 2 (Reflect - Gemini Flash):
Input: 2,300 tokens × $0.075/1M = $0.0002
Output: 500 tokens × $0.30/1M = $0.0002
Subtotal: $0.0004

Turn 3 (Refine - Gemini 2.5 Pro):
Input: 3,000 tokens × $1.25/1M = $0.004
Cached: 5,000 tokens × $0.315/1M = $0.002
Output: 2,500 tokens × $5.00/1M = $0.013
Subtotal: $0.019

Total: ~$0.038 per page. With only three turns the saving over single-model (~$0.04) is small, since only the inexpensive reflection turn moves to Flash; the gap widens when additional reflection iterations run.

Project Cost (100-page project):

  • Single-model (Gemini Pro): $0.04 × 100 = $4.00
  • Multi-model (Pro + Flash): ~$0.038 × 100 ≈ $3.80 (savings grow as more reflection turns run on Flash)
  • Extremely cost-effective compared to other premium models

Model Selection Strategy:

  1. Quality-Critical Tasks (Generation, Refinement):

    • Use premium models (Gemini 2.5 Pro)
    • Creative, nuanced writing required
    • Gemini 2.5 Pro: Excellent quality at great cost/performance ratio (~$0.02/turn)
    • Cost: Higher, but essential for quality
  2. Analytical Tasks (Reflection, Quality Scoring):

    • Use efficient models (Gemini Flash)
    • Structured analysis, less creativity needed
    • Gemini Flash: Extremely cost-effective (~$0.0004/turn)
    • Cost: 50-100x cheaper than premium models
  3. Orchestration/Decision (Workflow control, simple logic):

    • Use fastest models (Gemini Flash)
    • Binary decisions, routing logic
    • Cost: Negligible impact (~$0.0001/turn)

Optimization Opportunities:

  • Prompt caching reduces cost by 75% for repeated PDF tokens (Gemini)
  • Batch processing multiple pages with same PDF (file-level caching)
  • Dynamic model selection based on page complexity
  • Use Gemini Flash for simple pages, Gemini Pro for complex ones
  • Gemini Flash for all reflection turns (50-100x cheaper than Pro)

Testing Strategy​

Unit Tests:

  • PageExplanationService logic
  • Agentic workflow orchestration
  • Markdown generation and validation

Integration Tests:

  • End-to-end page explanation generation
  • Multi-turn workflow correctness
  • Prompt caching effectiveness
  • Local CLI tool execution

Manual Testing Checklist:

  • Generate explanations for sample pages
  • Review markdown quality and completeness
  • Test Details tab in UI
  • Verify prompt caching reduces cost
  • Test local development workflow

Evaluation Metrics:

  • Readability: Is the markdown easy to understand for beginners?
  • Completeness: Does it explain all page elements?
  • Accuracy: Are technical terms and concepts correct?
  • Educational Value: Can a non-expert learn from it?
  • Iteration Value: Does reflection improve quality vs single-pass?

Observability and Debugging​

Agent Trajectory Tracking​

Requirement: Capture complete execution trace of agentic workflows for debugging, optimization, and quality assessment.

What to Track:

  1. All LLM Calls (turns):

    • Model name, prompt, response
    • Token usage, cost
    • Latency
    • Cache hits
  2. Tool Invocations:

    • Tool name, inputs, outputs
    • Execution time
    • Success/failure status
    • Nested LLM calls (if tool uses LLM internally)
  3. Iterations:

    • Iteration number
    • Draft version (v1, v2, v3...)
    • Quality score per iteration
    • Quality improvement trajectory
  4. Decisions:

    • Continue iterating? Why or why not?
    • Quality threshold checks
    • Max iterations reached

Export Format: JSON

{
  "trajectory_id": "traj_abc123",
  "workflow_name": "page_explanation",
  "project_id": "R2024.0091",
  "file_id": "1",
  "page_number": 3,
  "started_at": "2024-10-20T02:30:00Z",
  "completed_at": "2024-10-20T02:31:45Z",
  "total_duration_ms": 105000,
  "iterations": [
    {
      "iteration_number": 1,
      "turns": [
        {
          "turn_number": 1,
          "turn_type": "LLM_CALL",
          "phase_name": "GENERATE",
          "model_name": "gemini-2.5-pro-latest",
          "input_tokens": 7500,
          "output_tokens": 2000,
          "cost_usd": 0.019,
          "prompt_template": "generate.txt",
          "response_summary": "Generated 1200-word explanation..."
        },
        {
          "turn_number": 2,
          "turn_type": "LLM_CALL",
          "phase_name": "REFLECT",
          "model_name": "gemini-2.0-flash-exp",
          "input_tokens": 2300,
          "output_tokens": 500,
          "cost_usd": 0.0004,
          "quality_score": 0.75,
          "gaps_identified": ["Missing AHO-1 explanation", "Define FAR"]
        },
        {
          "turn_number": 3,
          "turn_type": "LLM_CALL",
          "phase_name": "REFINE",
          "model_name": "gemini-2.5-pro-latest",
          "cached_tokens": 5000,
          "cost_usd": 0.019
        }
      ],
      "quality_score": 0.85,
      "quality_threshold_met": true
    }
  ],
  "outcome": {
    "success": true,
    "status": "completed",
    "final_quality_score": 0.85,
    "stop_reason": "QUALITY_MET"
  },
  "total_turns": 3,
  "cost_analysis": {
    "total_cost_usd": 0.038,
    "models_used": {
      "gemini-2.5-pro-latest": 2,
      "gemini-2.0-flash-exp": 1
    }
  }
}

Storage:

  • BigQuery: Individual LlmTrace records (existing)
  • GCS: Complete AgentTrajectory JSON files
  • Firestore: Trajectory metadata for listing/searching

Access:

  • API: ExportAgentTrajectory(trajectory_id) → JSON
  • CLI: export-trajectory --trajectory-id=traj_123 --output=trace.json
  • UI (Future): Visual timeline of agent thought process

Use Cases:

  1. Debugging: "Why did the agent make 5 turns instead of 3?"
  2. Optimization: "Which prompts cause quality issues?"
  3. Cost Analysis: "Where are we spending tokens?"
  4. Quality: "How much does reflection improve quality?"
  5. Education: "Show me how the AI thinks through a complex page"

Success Metrics​

User Engagement​

  • % of users who view Details tab (target: >50%)
  • Time spent on Details tab vs Overview tab
  • User feedback on educational value

AI Quality​

  • Human evaluation scores (readability, completeness, accuracy)
  • Iteration improvement: quality gain per iteration
  • Average turns needed to reach quality threshold
  • Error rate (pages failing to generate)

System Performance​

  • Average generation time per page (target: less than 2 minutes)
  • Token cost per page (target: less than $0.15)
  • Cache hit rate for PDFs (target: greater than 80%)

Developer Productivity​

  • Time to iterate on prompts locally (target: less than 5 minutes per test)
  • Prompt optimization success rate

Risks and Mitigations​

| Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- |
| High generation cost | High | Medium | Use prompt caching aggressively, batch processing, cheaper models for reflection |
| Slow generation (user wait time) | Medium | High | Asynchronous background processing, show loading states, prioritize recently viewed pages |
| Poor markdown quality | High | Medium | Multi-turn refinement, extensive prompt engineering, human evaluation loop |
| Prompt caching not effective | High | Low | Validate caching behavior, ensure PDF reuse across turns |
| Local CLI performance | Low | Low | Use same cloud APIs, optimize file I/O |
| Content not educational enough | High | Medium | Extensive prompt tuning, include example outputs in prompt, human feedback |

Future Enhancements​

Phase 2: Image Integration​

  • Embed extracted legend images in markdown
  • Show symbol definitions with visual examples
  • Extract and annotate key details from PDF

Phase 3: Interactive Elements​

  • Collapsible sections for advanced details
  • Inline term glossary with popups
  • Cross-references to related pages

Phase 4: Multi-Lingual Support​

  • Generate explanations in multiple languages
  • Auto-detect user language preference

Phase 5: Custom Generation​

  • User-adjustable detail level (beginner, intermediate, expert)
  • Focus areas (zoning only, structural only, etc.)
  • Export to PDF or Word

Phase 6: Collaborative Annotations​

  • Users can add comments to Details tab
  • Share annotations with team members

Open Questions​

  1. File Naming: page-explanation.md vs page-understanding.md vs page-details.md?

    • Answer: page-explanation.md (professional framing, field alignment)
  2. Generation Trigger: Automatic after page.md extraction or on-demand?

    • Answer: Hybrid - automatic for new uploads, on-demand regeneration via API/CLI
  3. Iteration Count: Fixed 1 iteration or dynamic based on quality score?

    • Answer: Start with fixed 1 iteration (3 turns), add quality-based iteration in Phase 2
  4. Model Selection: Single model or multi-model strategy?

    • Answer:
      • Phase 1: Single model (Gemini 2.5 Pro) for simplicity
      • Phase 2: Multi-model optimization:
        • Generation/Refinement turns: Gemini 2.5 Pro (quality-critical, best cost/quality)
        • Reflection turns: Gemini Flash (cost-effective, 50-100x cheaper)
        • Orchestration logic: Gemini Flash (fast decision-making)
    • Cost savings: reflection turns cost 50-100x less on Flash; per-page savings are modest for a 3-turn pass and grow with additional reflection iterations
    • Quality impact: Minimal (reflection is analytical, not creative)
  5. Legacy Projects: Auto-generate explanation for all pages or user-opt-in?

    • Answer: User-opt-in via CLI tool (avoid surprise costs)
  6. Caching Strategy: Per-page or per-file PDF caching?

    • Answer: Per-page initially (simpler), explore file-level batching in Phase 2

References​