
PRD: Architectural Plan Page Explanation & Educational Details

📋 Implementation Issue: Issue #258 - AI-Powered Plan Page Explanation with Agentic Workflow

Executive Summary​

This PRD defines the requirements for enhancing architectural plan page presentation by generating AI-powered, professional plan explanations: educational markdown content that explains each plan page in depth. This feature adds a new "Details" tab to the page viewer with rich, expert-level explanations that make architectural drawings accessible to non-experts while providing enriched RAG artifacts for agentic systems.

Key Principle: Transform raw LLM-extracted text into comprehensive, professional architectural explanation that bridges the gap between expert architectural drawings and user understanding.

Phase 1 MVP: 3-tool agentic workflow (Generate → Assess with confidence → Refine) that delivers quality explanations while maintaining extensibility for future enhancements.

Framework: Google ADK Java with iterative workflow, confidence-based quality scoring, and complete trajectory observability.

Problem Statement​

Current State​

Current Plan Page UI:

┌─────────────────────────────────────────┐
│ Plan Page Viewer                        │
│ ┌─────────────────────────────────────┐ │
│ │ [Overview] [Preview] [Compliance]   │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ Overview Tab: Shows 1000-char           │
│ summary from page-summary-1000char.json │
└─────────────────────────────────────────┘

Current Page Artifacts:

projects/{projectId}/files/{file_id}/pages/{page_number}/
├── page.pdf                   # Raw PDF page (1 page extract)
├── page.md                    # LLM-extracted text (raw, looks OCR-like)
├── page-summary-1000char.json # Brief summary for Overview tab
├── metadata.json              # Page metadata
└── [compliance artifacts]     # Compliance reports

Problems with Current State:

  1. page.md is Too Raw: LLM text extraction produces output that:

    • Reads like a data dump, not a professional explanation
    • Lacks context and expert explanations
    • Doesn't explain architectural symbols, legends, or conventions
    • Is not educational or beginner-friendly
    • Misses relationships between elements (e.g., how zoning affects building height)
  2. Limited Educational Value:

    • Non-architects struggle to understand architectural drawings
    • No explanations of technical terms (setbacks, FAR, lot coverage, etc.)
    • No guidance on reading legends, symbols, or annotations
    • Beginners can't learn from the content
    • Even experienced professionals benefit from clear, digestible summaries that reduce cognitive load and accelerate comprehension, especially when reviewing unfamiliar project types or complex multi-disciplinary plans
  3. Weak RAG Artifacts:

    • Current page.md is not enriched with AI understanding
    • Missing expanded context for agentic systems
    • No semantic connections between page elements
    • Limited value for retrieval-augmented generation
    • Not optimized for hybrid search (semantic + keyword)
    • Lacks narrative richness for vector embeddings
    • Missing explicit keywords and domain terminology
  4. No Search Optimization:

    • Content not structured for semantic search
    • Missing keyword enrichment for hybrid retrieval
    • No consideration for future search/RAG features
    • Sparse content reduces discoverability
  5. No AI Quality Evaluation:

    • Can't assess how well AI understands plan pages
    • No mechanism to iterate and improve prompts
    • Can't optimize agentic workflows for page explanation

User Impact​

  • Beginners/Homeowners: Can't understand architectural drawings without expert help
  • Junior Architects: Need educational content to learn plan reading
  • Senior Architects: Benefit from rapid comprehension of unfamiliar project types without manual analysis
  • Plan Reviewers: Save time with pre-digested summaries instead of deciphering raw drawings
  • Non-Industry Experts: Overwhelmed by technical jargon and symbols
  • AI/Agentic Systems: Underperform due to weak input artifacts
  • Developers: Can't evaluate and optimize AI understanding of plan pages

Goals​

Primary Goals​

  1. Educational Details Tab: Add a "Details" tab that displays rich, AI-generated professional explanation of the page
  2. Generate Enhanced Markdown: Create page-explanation.md with:
    • Beginner-friendly explanations of all page elements
    • Context about architectural symbols and legends
    • Relationships between elements (zoning → setbacks → building dimensions)
    • Definitions of technical terms inline
    • Visual descriptions enriched with understanding
    • Narrative richness for semantic search
    • Explicit keywords and domain terminology
  3. Search-Optimized Content: Generate content optimized for hybrid search (semantic + keyword):
    • Keyword enrichment: Explicit mentions of technical terms, synonyms, variations
    • Semantic context: Expanded narrative that creates strong vector embeddings
    • Domain terminology: Architecture, zoning, construction code terms used naturally
    • Cross-references: Connections to related concepts (e.g., "lot coverage relates to setbacks and FAR")
    • Question-answer patterns: Content structured to answer likely user queries
  4. Agentic Generation Workflow: Use iterative AI workflow (ReAct loop or hardcoded) to refine content through multiple passes
  5. Enriched RAG Artifacts: Provide high-quality content for downstream agentic tasks and future search features
  6. Local Development Support: Enable rapid iteration using local projects without cloud deployments

Secondary Goals​

  1. AI Quality Evaluation: Assess AI's ability to understand architectural plans
  2. Prompt Optimization: Use generated content to refine prompts and agent harness
  3. Multi-Modal Input: Leverage both page.md text AND raw PDF with cached tokens
  4. Future Image Integration: Foundation for mixing images (legends, symbols) into markdown

Non-Goals​

  • Real-time generation (asynchronous batch processing is acceptable)
  • Editing markdown through UI (read-only display in Phase 1)
  • Generating understanding for compliance reports (focus on plan pages only)
  • Replacing existing Overview tab (add Details tab, keep Overview)
  • Supporting non-architectural document types initially

User Stories​

Story 1: View Professional Explanation in Details Tab​

As a project reviewer (beginner or non-expert)
I want to view professional architectural explanation of plan pages
So that I can understand what the page shows without expert architectural knowledge

Acceptance Criteria:

  • Page viewer has a new "Details" tab (fourth tab after Overview, Preview, Compliance)
  • Details tab displays content from page-explanation.md file
  • Content is rendered as rich markdown with:
    • Headings, lists, tables
    • Inline term definitions
    • Explanations of architectural concepts
    • Beginner-friendly language
  • Tab loads asynchronously if content is not yet generated
  • Tab shows "Generating professional explanation..." state during processing
  • Tab shows "Explanation not available" if generation fails
  • Content is scrollable and formatted for readability

Story 2: Generate Enhanced Page Explanation (Backend)​

As a system
I want to automatically generate rich, professional explanation markdown for architectural plan pages
So that users can access expert explanations and agentic systems have enriched artifacts

Acceptance Criteria:

  • New background task: GeneratePageExplanation (triggered after page.md extraction)
  • Task inputs:
    • page.md text content
    • page.pdf raw file (use prompt caching for PDF images)
    • metadata.json page metadata
  • Task outputs:
    • page-explanation.md (rich, professional explanation)
    • Updated metadata.json with generation timestamp and status
  • Task uses iterative agentic workflow:
    • Iteration 1 (Turns 1-3): Generate → Reflect → Refine
    • Iteration 2 (Turns 4-5): Reflect → Refine (if quality < threshold)
    • Iteration N: Continue until quality threshold met or max_iterations reached
  • Task handles failures gracefully (retry logic, fallback to raw text)
  • Task logs AI interactions for prompt optimization

Story 3: Iterative Agentic Refinement with Confidence-Based Quality​

As a developer
I want the system to use an iterative agentic workflow with confidence-based quality assessment
So that the generated markdown improves through self-assessment and I can trust the quality scores

Acceptance Criteria:

Phase 1 MVP - 3-Tool Workflow:

  • Tool 1: generateExplanation() - Creates comprehensive draft
  • Tool 2: assessQuality() - Returns {score, confidence, gaps}
  • Tool 3: refineExplanation() - Improves based on assessment

Iteration Structure:

  • Iteration 1 (always): Generate → Assess → Refine
  • Iteration 2+ (if needed): Assess → Refine (builds on previous)
  • Max 3 iterations or 15 total turns
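The iteration structure above can be sketched as a simple loop. This is an illustrative sketch only: the tool signatures (`generateExplanation`, `assessQuality`, `refineExplanation`), the threshold value, and the stubbed tool bodies are placeholders, not actual Google ADK APIs; a real implementation would make LLM calls inside each tool.

```java
// Sketch of the Phase 1 MVP loop: Generate -> Assess -> Refine, repeated until
// the quality threshold is met or the iteration/turn limits are hit.
public class ExplanationWorkflow {
    static final double QUALITY_THRESHOLD = 0.85; // example value
    static final int MAX_ITERATIONS = 3;
    static final int MAX_TURNS = 15;

    record Assessment(double score, double confidence, java.util.List<String> gaps) {}

    // Stubbed tools (hypothetical signatures; real versions call the LLM)
    static String generateExplanation(String pageText) { return "draft-v1"; }
    static Assessment assessQuality(String draft) {
        // Pretend quality improves with each refinement pass
        int version = Integer.parseInt(draft.substring(draft.indexOf("-v") + 2));
        return new Assessment(0.60 + 0.15 * version, 0.9, java.util.List.of());
    }
    static String refineExplanation(String draft, Assessment a) {
        int version = Integer.parseInt(draft.substring(draft.indexOf("-v") + 2));
        return "draft-v" + (version + 1);
    }

    static String run(String pageText) {
        int turns = 0;
        String draft = generateExplanation(pageText);   // Turn 1 (always)
        turns++;
        for (int iter = 1; iter <= MAX_ITERATIONS && turns < MAX_TURNS; iter++) {
            Assessment a = assessQuality(draft);        // Assess turn
            turns++;
            if (a.score() >= QUALITY_THRESHOLD) break;  // stop: quality met
            draft = refineExplanation(draft, a);        // Refine turn
            turns++;
        }
        return draft;
    }

    public static void main(String[] args) {
        System.out.println(run("page.md contents")); // prints "draft-v2"
    }
}
```

With the stubbed scoring, the workflow refines once (v1 scores 0.75, v2 scores 0.90) and stops after four turns, matching the "Iteration 2+ only if needed" rule.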

Iteration 1 (Initial Generation):

  • Turn 1 (Generate):
    • Input: page.md text + page.pdf (cached)
    • Prompt: "Generate comprehensive, professional explanation of this architectural plan page"
    • Output: page-explanation-draft-v1.md
  • Turn 2 (Reflect):
    • Input: page-explanation-draft-v1.md
    • Prompt: "Review this explanation. Identify gaps, unclear sections, missing context, or areas needing expansion."
    • Output: Reflection notes (JSON with quality score)
  • Turn 3 (Refine):
    • Input: page-explanation-draft-v1.md + reflection notes + original PDF (cached)
    • Prompt: "Improve the explanation based on reflection. Expand gaps, clarify unclear sections, add missing context."
    • Output: page-explanation-draft-v2.md

Iteration 2 (Optional - if quality score < threshold):

  • Turn 4 (Reflect): Review draft-v2
  • Turn 5 (Refine): Improve to draft-v3
  • Continue until quality threshold met or max_iterations reached

Tracking Requirements:

  • Each turn is logged with:
    • Turn number (sequential across all iterations)
    • Iteration number
    • Turn type (Generate/Reflect/Refine)
    • Prompt used
    • Token usage (input/output/cached)
    • Model used
    • Timestamp
  • Final metadata includes:
    • iterations_completed: Number of complete cycles
    • total_turns: Total LLM API calls
  • Workflow is configurable (max_iterations, quality threshold, reflection prompts, etc.)
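The tracking requirements above could be modeled roughly as follows; the record and field names mirror the list, but the types themselves are assumptions, not an existing schema.

```java
// Sketch of a per-turn log entry and the derived final metadata
// (iterations_completed, total_turns) described in the tracking requirements.
import java.time.Instant;
import java.util.List;

public class TurnTracking {
    enum TurnType { GENERATE, REFLECT, REFINE }

    record TurnLog(int turnNumber, int iterationNumber, TurnType type,
                   String prompt, long inputTokens, long outputTokens,
                   long cachedTokens, String model, Instant timestamp) {}

    record WorkflowMetadata(int iterationsCompleted, int totalTurns) {}

    // Final metadata is derived from the log: iterations = highest iteration
    // number seen, total turns = number of LLM API calls recorded.
    static WorkflowMetadata summarize(List<TurnLog> turns) {
        int iterations = turns.stream().mapToInt(TurnLog::iterationNumber).max().orElse(0);
        return new WorkflowMetadata(iterations, turns.size());
    }

    public static void main(String[] args) {
        List<TurnLog> turns = List.of(
            new TurnLog(1, 1, TurnType.GENERATE, "generate.txt", 7500, 2000, 5000,
                        "gemini-2.5-pro-latest", Instant.now()),
            new TurnLog(2, 1, TurnType.REFLECT, "reflect.txt", 2300, 500, 0,
                        "gemini-2.0-flash-exp", Instant.now()),
            new TurnLog(3, 1, TurnType.REFINE, "refine.txt", 8000, 2500, 5000,
                        "gemini-2.5-pro-latest", Instant.now()));
        WorkflowMetadata meta = summarize(turns);
        System.out.println(meta.iterationsCompleted() + " iteration(s), "
                           + meta.totalTurns() + " turn(s)");
    }
}
```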

Story 4: Search-Optimized Content for Hybrid Retrieval​

As a system preparing for future search/RAG features
I want to generate content optimized for hybrid search (semantic + keyword retrieval)
So that pages are highly discoverable through semantic search and agentic RAG experiences

Acceptance Criteria:

Keyword Enrichment:

  • Content explicitly mentions technical terms and their synonyms:
    • Example: "Floor Area Ratio (FAR), also known as Floor Space Index (FSI)"
    • Example: "Setback requirements (also called building lines or yard requirements)"
  • Terms repeated naturally throughout the narrative for keyword density
  • Both formal and informal terminology included:
    • Formal: "Required front yard setback"
    • Informal: "How far the building must be from the street"

Semantic Context Expansion:

  • Rich narrative that creates strong vector embeddings:
    • "The 25-foot front setback creates a buffer zone between the street and building, allowing for landscaping and maintaining neighborhood character. This setback works in conjunction with the lot coverage limit of 35%, ensuring adequate open space."
  • Explanations of "why" and "how", not just "what":
    • Not just: "Front setback: 25 feet"
    • Instead: "The 25-foot front setback requirement stems from the R-1 zoning designation, which prioritizes low-density residential character with spacious front yards for privacy and aesthetics."

Cross-References & Relationships:

  • Explicit connections between related concepts:
    • "Lot coverage (35%) relates to setbacks (25' front, 10' sides) and building height (35' max) to control overall building mass and density"
  • References to applicable codes:
    • "This complies with IRC Section R302 for fire separation"
    • "Follows Chapter 11A requirements for accessibility"

Question-Answer Patterns:

  • Content structured to answer likely queries:
    • "What is the maximum building height?" → "The maximum building height is 35 feet..."
    • "How much of the lot can be covered?" → "Lot coverage is limited to 35%, meaning..."
    • "What are the setback requirements?" → "Setbacks are: 25' front, 10' sides, 15' rear..."

Domain Terminology Natural Usage:

  • Architecture terms: floor plan, elevation, section, detail, legend, scale
  • Zoning terms: setback, lot coverage, FAR, density, height limit, use restrictions
  • Construction terms: foundation, framing, sheathing, roof pitch, wall assembly
  • Code terms: compliance, requirement, exception, variance, amendment

Metadata Keywords (optional structured data):

{
  "extractedConcepts": [
    "floor plan",
    "R-1 zoning",
    "setbacks",
    "lot coverage",
    "building height"
  ],
  "applicableCodes": [
    "IRC Section R302",
    "Chapter 11A"
  ],
  "relatedPages": [2, 5, 7]
}

Story 5: Local Development Support​

As a developer
I want to process architectural plan pages locally (using local project folders)
So that I can iterate quickly without deploying to Cloud Run or waiting for Cloud Run Jobs

Acceptance Criteria:

  • CLI tool: generate-page-explanation (local execution)
  • Tool accepts:
    • --project-path: Path to local project folder (e.g., projects/R2024.0091-2024-10-14)
    • --file-id: File ID to process (optional, defaults to all files)
    • --page-numbers: Page numbers to process (optional, defaults to all pages)
    • --force: Regenerate even if page-explanation.md exists
    • --verbose: Show detailed logs including prompts and responses
  • Tool reads from local filesystem:
    • projects/{projectId}/files/{file_id}/pages/{page_number}/page.md
    • projects/{projectId}/files/{file_id}/pages/{page_number}/page.pdf
  • Tool writes to local filesystem:
    • projects/{projectId}/files/{file_id}/pages/{page_number}/page-explanation.md
    • Updates metadata.json with generation status
  • Tool uses same agentic workflow as backend (code reuse)
  • Tool outputs progress and summary:
    • Pages processed
    • Pages succeeded/failed
    • Total tokens used
    • Total time taken

Story 6: Multi-File Structure Support​

As a developer
I want to convert a legacy project to multi-file structure and generate page understanding
So that I can test on realistic project data with proper file organization

Acceptance Criteria:

  • CLI tool: upgrade-project-and-generate (combined workflow)
  • Tool accepts:
    • --source-project: Path to legacy project (e.g., projects/R2024.0091-2024-10-14)
    • --target-project: Path for upgraded copy (e.g., projects/R2024.0091-test-copy)
  • Tool workflow:
    1. Copy project to new location
    2. Upgrade to multi-file structure (create files/ directory)
    3. Generate file metadata (files/{file_id}/metadata.json)
    4. Generate page explanation for all pages
  • Tool outputs:
    • Summary of upgrade (files created, pages migrated)
    • Summary of explanation generation (pages processed, tokens used)
  • Idempotent: Can be run multiple times safely

Technical Design (High-Level)​

File Naming Options​

Three candidate names for the enhanced markdown file:

  1. page-explanation.md ⭐ RECOMMENDED

    • Professional: "explanation" is architectural terminology
    • Expert framing: suggests professional analysis and judgment
    • Active vs passive: more active than "understanding"
    • Field alignment: aligns with architectural/code explanation practices
    • Parallel to existing page.md (raw) vs page-explanation.md (expert)
  2. page-understanding.md

    • Clear but more passive
    • Emphasizes comprehension over expertise
  3. page-details.md

    • Simple and clear
    • Matches "Details" tab name
    • May be too generic

Decision: Use page-explanation.md for professional framing and field alignment.

Updated Page Directory Structure​

projects/{projectId}/files/{file_id}/pages/{page_number}/
├── page.pdf                   # Raw PDF page (existing)
├── page.md                    # LLM-extracted text (existing, looks OCR-like)
├── page-explanation.md        # NEW: Rich, professional explanation
├── page-summary-1000char.json # Brief summary (existing)
├── metadata.json              # Page metadata (updated with generation status)
└── [compliance artifacts]     # Compliance reports (existing)

Metadata Updates:

{
  "pageNumber": 3,
  "fileId": "1",
  "explanation": {
    "status": "completed",
    "generatedAt": "2024-10-20T01:30:00Z",
    "primaryModel": "gemini-2.5-pro-latest",
    "modelsUsed": {
      "gemini-2.5-pro-latest": 2,
      "gemini-2.0-flash-exp": 1
    },
    "iterationsCompleted": 1,
    "totalTurns": 3,
    "filePath": "page-explanation.md",
    "costAnalysis": {
      "totalTokens": 17800,
      "estimatedTotalCostUsd": 0.101,
      "tokenBreakdown": {
        "nonCachedInput": {"tokenCount": 7800, "costUsd": 0.023},
        "cachedContent": {"tokenCount": 10000, "costUsd": 0.003, "discountPercent": 90},
        "output": {"tokenCount": 5000, "costUsd": 0.075}
      },
      "processingMetadata": {
        "durationMs": 45000,
        "cachingEfficiencyPercent": 56.2
      }
    }
  }
}
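The cachingEfficiencyPercent figure in the metadata above is simply cached tokens as a share of total input tokens. A small sketch of that arithmetic (the method name is illustrative):

```java
// Derives cachingEfficiencyPercent = cached tokens / total input tokens,
// rounded to one decimal place as in the example metadata.
public class CachingEfficiency {
    static double efficiencyPercent(long cachedTokens, long totalInputTokens) {
        return Math.round(1000.0 * cachedTokens / totalInputTokens) / 10.0;
    }

    public static void main(String[] args) {
        // 10,000 cached of 17,800 total input tokens
        System.out.println(efficiencyPercent(10_000, 17_800)); // prints 56.2
    }
}
```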

Proto Schema Changes​

Import Existing Cost Analysis Proto:

import "cost_analysis.proto";

New RPC: GeneratePageExplanation

// Request to generate AI-powered professional explanation for a plan page
message GeneratePageExplanationRequest {
string project_id = 1;
string file_id = 2;
int32 page_number = 3;

// Processing options
bool force_regenerate = 4; // Regenerate even if already exists
int32 max_iterations = 5; // Max loop iterations (default: 1 = single Generate→Reflect→Refine pass)
bool verbose_logging = 6; // Log prompts and responses
}

message GeneratePageExplanationResponse {
  bool success = 1;
  string status_message = 2;

  PageExplanationMetadata metadata = 3;

  // Performance metrics (uses existing CostAnalysisMetadata)
  CostAnalysisMetadata cost_analysis = 4;
  int32 processing_time_seconds = 5;
}

message PageExplanationMetadata {
  string status = 1;  // "pending", "processing", "completed", "failed"
  google.protobuf.Timestamp generated_at = 2;

  // Model tracking (multi-model support)
  string primary_model = 3;            // Main model used (e.g., "gemini-2.5-pro-latest")
  map<string, int32> models_used = 4;  // Model → turn count (e.g., {"gemini-2.5-pro": 2, "gemini-flash": 1})

  // Workflow tracking
  int32 iterations_completed = 5;  // Number of complete loop cycles (e.g., 2 = ran Generate→Reflect→Refine twice)
  int32 total_turns = 6;           // Total LLM API calls (e.g., 6 = 2 iterations × 3 turns each)

  // Output
  string file_path = 7;  // Relative path: "page-explanation.md"

  // Cost analysis (reuses existing CostAnalysisMetadata)
  // Note: CostAnalysisMetadata supports per-model cost breakdown for multi-model workflows
  CostAnalysisMetadata cost_analysis = 8;
}

Note: We're reusing the existing CostAnalysisMetadata message from cost_analysis.proto which provides comprehensive token tracking including:

  • Total tokens and estimated cost
  • Detailed breakdown (non-cached input, cached content, output, thinking, tool use)
  • Rate per million tokens
  • Discount percentages for cached content
  • Processing metadata (duration, caching efficiency)

This is far superior to creating a simple TokenUsage message and ensures consistency with existing task cost tracking (Issue #176).

Update Existing RPC: GetArchitecturalPlanPage

message ArchitecturalPlanPage {
  // ... existing fields ...

  // NEW: Rich professional explanation
  string explanation_markdown = 20;  // Content from page-explanation.md
  PageExplanationMetadata explanation_metadata = 21;
}

Component Architecture​

Backend Services:

  • PageExplanationService: Generate and manage page explanation
    • generatePageExplanation(projectId, fileId, pageNumber, options)
    • getPageExplanation(projectId, fileId, pageNumber)
    • regeneratePageExplanation(...) (force regeneration)

Backend Agentic Workflow:

  • AgenticPageInterpreter: Multi-turn AI workflow
    • explainPage(pageContext) → Returns final markdown
    • Internal methods:
      • generateInitialDraft(pageContext) - Uses primary model (quality-critical)
      • reflectOnDraft(draft) - Uses reflection model (analytical, can be cheaper)
      • refineWithReflection(draft, reflection, pageContext) - Uses primary model (quality-critical)
    • Configurable:
      • Model selection strategy:
        • primaryModel: For generation/refinement (e.g., Gemini 2.5 Pro)
        • reflectionModel: For analysis/scoring (e.g., Gemini Flash)
        • orchestrationModel: For workflow decisions (e.g., Gemini Flash)
      • Max iterations
      • Quality threshold for stopping
      • Reflection prompts
      • Temperature: 0.0 for all turns (maximum predictability)
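The configurable options above could be modeled roughly as follows. The class, record, and method names are hypothetical; the model IDs and defaults are taken from this PRD's examples.

```java
// Sketch of the configurable model-selection strategy: route quality-critical
// turns to the primary model, analytical turns to the cheaper reflection model.
public class InterpreterConfig {
    record ModelStrategy(String primaryModel, String reflectionModel,
                         String orchestrationModel) {}

    record WorkflowConfig(ModelStrategy models, int maxIterations,
                          double qualityThreshold, double temperature) {}

    static WorkflowConfig defaults() {
        return new WorkflowConfig(
            new ModelStrategy("gemini-2.5-pro-latest",  // generation/refinement
                              "gemini-2.0-flash-exp",   // reflection
                              "gemini-2.0-flash-exp"),  // orchestration
            1,     // maxIterations (Phase 1 default)
            0.85,  // quality threshold (example value)
            0.0);  // temperature: 0.0 for maximum predictability
    }

    // Route a turn type to the model that should execute it
    static String modelFor(String turnType, ModelStrategy s) {
        return switch (turnType) {
            case "GENERATE", "REFINE" -> s.primaryModel();
            case "REFLECT" -> s.reflectionModel();
            default -> s.orchestrationModel();
        };
    }

    public static void main(String[] args) {
        WorkflowConfig c = defaults();
        System.out.println(modelFor("REFLECT", c.models())); // prints gemini-2.0-flash-exp
    }
}
```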

Frontend Components:

  • Update PageViewerComponent:
    • Add fourth tab: "Details"
    • Lazy-load page-explanation.md content when tab is clicked
    • Show loading/error states
  • DetailsTabComponent (new):
    • Render markdown content
    • Handle empty/loading/error states
    • Responsive layout

CLI Tools:

  • generate-page-explanation.sh: Local generation script
  • upgrade-project-and-generate.sh: Legacy upgrade + generation

Environment Configuration​

No new environment variables required (uses existing Vertex AI credentials for Gemini models).

Optional Configuration (if needed):

# In .env.{environment}
PAGE_EXPLANATION_PRIMARY_MODEL="gemini-2.5-pro-latest" # Primary model for generation/refinement
PAGE_EXPLANATION_REFLECTION_MODEL="gemini-2.0-flash-exp" # Efficient model for reflection
PAGE_EXPLANATION_MAX_ITERATIONS="1" # Max workflow iterations (default: 1)
PAGE_EXPLANATION_ENABLE_CACHE="true" # Use prompt caching for PDFs

Cost Analysis​

Assumptions:

  • Average page: a single architectural drawing sheet
  • page.md: ~2,000 tokens
  • page.pdf: ~5,000 tokens (image)
  • Model Strategy (Phase 1): Single model for all turns
    • Primary: gemini-2.5-pro-latest
  • Model Strategy (Phase 2): Multi-model optimization
    • Generation/Refinement: gemini-2.5-pro-latest (quality-critical)
    • Reflection: gemini-2.0-flash-exp (cost-effective, 10x cheaper)
    • Orchestration: gemini-2.0-flash-exp (fast decisions)
  • Workflow: 1 iteration (3 turns: Generate→Reflect→Refine)

Token Usage Per Page (1 Iteration = 3 Turns):

Iteration 1:
Turn 1 (Generate):
Input: 5,000 (PDF, cached) + 2,000 (text) + 500 (prompt) = 7,500 tokens
Output: ~2,000 tokens

Turn 2 (Reflect):
Input: 2,000 (draft) + 300 (prompt) = 2,300 tokens
Output: ~500 tokens (reflection notes)

Turn 3 (Refine):
Input: 5,000 (PDF, cached) + 2,000 (draft) + 500 (reflection) + 500 (prompt) = 8,000 tokens
Output: ~2,500 tokens

Total per page (1 iteration):
Input: 17,800 tokens (10,000 cached)
Output: 5,000 tokens
Iterations: 1
Total turns: 3

If 2 iterations needed (quality < threshold):
Total turns: 5 (Turn 1-3 + Turn 4-5 for Reflect→Refine)
Additional cost: ~$0.02 (Reflect + Refine turns at single-model rates)

Cost Per Page - Single Model (Phase 1 - Gemini 2.5 Pro):

  • Cached input: 10,000 tokens × $0.315 / 1M = $0.003
  • Regular input: 7,800 tokens × $1.25 / 1M = $0.010
  • Output: 5,000 tokens × $5.00 / 1M = $0.025
  • Total: ~$0.04 per page
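As a sanity check on the arithmetic above (rates are this PRD's example rates in USD per million tokens; the class is illustrative):

```java
// Verifies the single-model per-page cost: cached input + regular input + output.
public class PageCost {
    static double cost(long tokens, double ratePerMillion) {
        return tokens * ratePerMillion / 1_000_000.0;
    }

    public static void main(String[] args) {
        double cached = cost(10_000, 0.315); // ≈ $0.003
        double input  = cost(7_800, 1.25);   // ≈ $0.010
        double output = cost(5_000, 5.00);   // = $0.025
        double total  = cached + input + output;
        System.out.printf("total = $%.3f per page%n", total); // ≈ $0.038, rounded to ~$0.04
    }
}
```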

Cost Per Page - Multi-Model (Phase 2 - Gemini Pro + Flash):

Turn 1 (Generate - Gemini 2.5 Pro):
Input: 7,500 tokens × $1.25/1M = $0.009
Cached: 0
Output: 2,000 tokens × $5.00/1M = $0.010
Subtotal: $0.019

Turn 2 (Reflect - Gemini Flash):
Input: 2,300 tokens × $0.075/1M = $0.0002
Output: 500 tokens × $0.30/1M = $0.0002
Subtotal: $0.0004

Turn 3 (Refine - Gemini 2.5 Pro):
Input: 3,000 tokens × $1.25/1M = $0.004
Cached: 5,000 tokens × $0.315/1M = $0.002
Output: 2,500 tokens × $5.00/1M = $0.013
Subtotal: $0.019

Total: ~$0.038 per page. With only three turns the saving over single-model (~$0.04) is small, since only the inexpensive reflection turn moves to Flash; the gap widens when additional reflection iterations run.

Project Cost (100-page project):

  • Single-model (Gemini Pro): $0.04 × 100 = $4.00
  • Multi-model (Pro + Flash): ~$0.038 × 100 ≈ $3.80 (savings grow as more reflection turns run on Flash)
  • Extremely cost-effective compared to other premium models

Model Selection Strategy:

  1. Quality-Critical Tasks (Generation, Refinement):

    • Use premium models (Gemini 2.5 Pro)
    • Creative, nuanced writing required
    • Gemini 2.5 Pro: Excellent quality at great cost/performance ratio (~$0.02/turn)
    • Cost: Higher, but essential for quality
  2. Analytical Tasks (Reflection, Quality Scoring):

    • Use efficient models (Gemini Flash)
    • Structured analysis, less creativity needed
    • Gemini Flash: Extremely cost-effective (~$0.0004/turn)
    • Cost: 50-100x cheaper than premium models
  3. Orchestration/Decision (Workflow control, simple logic):

    • Use fastest models (Gemini Flash)
    • Binary decisions, routing logic
    • Cost: Negligible impact (~$0.0001/turn)

Optimization Opportunities:

  • Prompt caching reduces cost by 75% for repeated PDF tokens (Gemini)
  • Batch processing multiple pages with same PDF (file-level caching)
  • Dynamic model selection based on page complexity
  • Use Gemini Flash for simple pages, Gemini Pro for complex ones
  • Gemini Flash for all reflection turns (50-100x cheaper than Pro)

Testing Strategy​

Unit Tests:

  • PageExplanationService logic
  • Agentic workflow orchestration
  • Markdown generation and validation

Integration Tests:

  • End-to-end page explanation generation
  • Multi-turn workflow correctness
  • Prompt caching effectiveness
  • Local CLI tool execution

Manual Testing Checklist:

  • Generate explanations for sample pages
  • Review markdown quality and completeness
  • Test Details tab in UI
  • Verify prompt caching reduces cost
  • Test local development workflow

Evaluation Metrics:

  • Readability: Is the markdown easy to understand for beginners?
  • Completeness: Does it explain all page elements?
  • Accuracy: Are technical terms and concepts correct?
  • Educational Value: Can a non-expert learn from it?
  • Iteration Value: Does reflection improve quality vs single-pass?

Observability and Debugging​

Agent Trajectory Tracking​

Requirement: Capture complete execution trace of agentic workflows for debugging, optimization, and quality assessment.

What to Track:

  1. All LLM Calls (turns):

    • Model name, prompt, response
    • Token usage, cost
    • Latency
    • Cache hits
  2. Tool Invocations:

    • Tool name, inputs, outputs
    • Execution time
    • Success/failure status
    • Nested LLM calls (if tool uses LLM internally)
  3. Iterations:

    • Iteration number
    • Draft version (v1, v2, v3...)
    • Quality score per iteration
    • Quality improvement trajectory
  4. Decisions:

    • Continue iterating? Why or why not?
    • Quality threshold checks
    • Max iterations reached

Export Format: JSON

{
  "trajectory_id": "traj_abc123",
  "workflow_name": "page_explanation",
  "project_id": "R2024.0091",
  "file_id": "1",
  "page_number": 3,
  "started_at": "2024-10-20T02:30:00Z",
  "completed_at": "2024-10-20T02:31:45Z",
  "total_duration_ms": 105000,
  "iterations": [
    {
      "iteration_number": 1,
      "turns": [
        {
          "turn_number": 1,
          "turn_type": "LLM_CALL",
          "phase_name": "GENERATE",
          "model_name": "gemini-2.5-pro-latest",
          "input_tokens": 7500,
          "output_tokens": 2000,
          "cost_usd": 0.019,
          "prompt_template": "generate.txt",
          "response_summary": "Generated 1200-word explanation..."
        },
        {
          "turn_number": 2,
          "turn_type": "LLM_CALL",
          "phase_name": "REFLECT",
          "model_name": "gemini-2.0-flash-exp",
          "input_tokens": 2300,
          "output_tokens": 500,
          "cost_usd": 0.0004,
          "quality_score": 0.75,
          "gaps_identified": ["Missing AHO-1 explanation", "Define FAR"]
        },
        {
          "turn_number": 3,
          "turn_type": "LLM_CALL",
          "phase_name": "REFINE",
          "model_name": "gemini-2.5-pro-latest",
          "cached_tokens": 5000,
          "cost_usd": 0.019
        }
      ],
      "quality_score": 0.85,
      "quality_threshold_met": true
    }
  ],
  "outcome": {
    "success": true,
    "status": "completed",
    "final_quality_score": 0.85,
    "stop_reason": "QUALITY_MET"
  },
  "total_turns": 3,
  "cost_analysis": {
    "total_cost_usd": 0.038,
    "models_used": {
      "gemini-2.5-pro-latest": 2,
      "gemini-2.0-flash-exp": 1
    }
  }
}

Storage:

  • BigQuery: Individual LlmTrace records (existing)
  • GCS: Complete AgentTrajectory JSON files
  • Firestore: Trajectory metadata for listing/searching

Access:

  • API: ExportAgentTrajectory(trajectory_id) → JSON
  • CLI: export-trajectory --trajectory-id=traj_123 --output=trace.json
  • UI (Future): Visual timeline of agent thought process

Use Cases:

  1. Debugging: "Why did the agent make 5 turns instead of 3?"
  2. Optimization: "Which prompts cause quality issues?"
  3. Cost Analysis: "Where are we spending tokens?"
  4. Quality: "How much does reflection improve quality?"
  5. Education: "Show me how the AI thinks through a complex page"

Success Metrics​

User Engagement​

  • % of users who view Details tab (target: >50%)
  • Time spent on Details tab vs Overview tab
  • User feedback on educational value

AI Quality​

  • Human evaluation scores (readability, completeness, accuracy)
  • Iteration improvement: quality gain per iteration
  • Average turns needed to reach quality threshold
  • Error rate (pages failing to generate)

System Performance​

  • Average generation time per page (target: less than 2 minutes)
  • Token cost per page (target: less than $0.15)
  • Cache hit rate for PDFs (target: greater than 80%)

Developer Productivity​

  • Time to iterate on prompts locally (target: less than 5 minutes per test)
  • Prompt optimization success rate

Risks and Mitigations​

| Risk | Impact | Likelihood | Mitigation |
| --- | --- | --- | --- |
| High generation cost | High | Medium | Use prompt caching aggressively, batch processing, cheaper models for reflection |
| Slow generation (user wait time) | Medium | High | Asynchronous background processing, show loading states, prioritize recently viewed pages |
| Poor markdown quality | High | Medium | Multi-turn refinement, extensive prompt engineering, human evaluation loop |
| Prompt caching not effective | High | Low | Validate caching behavior, ensure PDF reuse across turns |
| Local CLI performance | Low | Low | Use same cloud APIs, optimize file I/O |
| Content not educational enough | High | Medium | Extensive prompt tuning, include example outputs in prompt, human feedback |

Future Enhancements​

Phase 2: Image Integration​

  • Embed extracted legend images in markdown
  • Show symbol definitions with visual examples
  • Extract and annotate key details from PDF

Phase 3: Interactive Elements​

  • Collapsible sections for advanced details
  • Inline term glossary with popups
  • Cross-references to related pages

Phase 4: Multi-Lingual Support​

  • Generate explanations in multiple languages
  • Auto-detect user language preference

Phase 5: Custom Generation​

  • User-adjustable detail level (beginner, intermediate, expert)
  • Focus areas (zoning only, structural only, etc.)
  • Export to PDF or Word

Phase 6: Collaborative Annotations​

  • Users can add comments to Details tab
  • Share annotations with team members

Open Questions​

  1. File Naming: page-explanation.md vs page-understanding.md vs page-details.md?

    • Answer: page-explanation.md (professional framing, field alignment)
  2. Generation Trigger: Automatic after page.md extraction or on-demand?

    • Answer: Hybrid - automatic for new uploads, on-demand regeneration via API/CLI
  3. Iteration Count: Fixed 1 iteration or dynamic based on quality score?

    • Answer: Start with fixed 1 iteration (3 turns), add quality-based iteration in Phase 2
  4. Model Selection: Single model or multi-model strategy?

    • Answer:
      • Phase 1: Single model (Gemini 2.5 Pro) for simplicity
      • Phase 2: Multi-model optimization:
        • Generation/Refinement turns: Gemini 2.5 Pro (quality-critical, best cost/quality)
        • Reflection turns: Gemini Flash (cost-effective, 50-100x cheaper)
        • Orchestration logic: Gemini Flash (fast decision-making)
    • Cost savings: reflection turns cost 50-100x less on Flash; per-page savings are modest for a 3-turn pass and grow with additional reflection iterations
    • Quality impact: Minimal (reflection is analytical, not creative)
  5. Legacy Projects: Auto-generate explanation for all pages or user-opt-in?

    • Answer: User-opt-in via CLI tool (avoid surprise costs)
  6. Caching Strategy: Per-page or per-file PDF caching?

    • Answer: Per-page initially (simpler), explore file-level batching in Phase 2

References​