PRD: Architectural Plan Page Explanation & Educational Details
Implementation Issue: Issue #258 - AI-Powered Plan Page Explanation with Agentic Workflow
Executive Summary
This PRD defines the requirements for enhancing architectural plan page presentation with AI-powered, professional explanations: educational markdown content that explains plan pages in depth. This feature adds a new "Details" tab to the page viewer with rich, expert-level explanations that make architectural drawings accessible to non-experts while providing enriched RAG artifacts for agentic systems.
Key Principle: Transform raw LLM-extracted text into comprehensive, professional architectural explanations that bridge the gap between expert architectural drawings and user understanding.
Phase 1 MVP: 3-tool agentic workflow (Generate → Assess with confidence → Refine) that delivers quality explanations while maintaining extensibility for future enhancements.
Framework: Google ADK Java with iterative workflow, confidence-based quality scoring, and complete trajectory observability.
Problem Statement
Current State
Current Plan Page UI:
┌─────────────────────────────────────────┐
│ Plan Page Viewer                        │
│ ┌─────────────────────────────────────┐ │
│ │ [Overview] [Preview] [Compliance]   │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ Overview Tab: Shows 1000-char           │
│ summary from page-summary-1000char      │
│ .json                                   │
└─────────────────────────────────────────┘
Current Page Artifacts:
projects/{projectId}/files/{file_id}/pages/{page_number}/
├── page.pdf                    # Raw PDF page (1 page extract)
├── page.md                     # LLM-extracted text (raw, looks OCR-like)
├── page-summary-1000char.json  # Brief summary for Overview tab
├── metadata.json               # Page metadata
└── [compliance artifacts]      # Compliance reports
Problems with Current State:
- `page.md` is Too Raw: LLM text extraction produces output that:
  - Reads like a data dump, not a professional explanation
  - Lacks context and expert explanations
  - Doesn't explain architectural symbols, legends, or conventions
  - Is not educational or beginner-friendly
  - Misses relationships between elements (e.g., how zoning affects building height)
- Limited Educational Value:
  - Non-architects struggle to understand architectural drawings
  - No explanations of technical terms (setbacks, FAR, lot coverage, etc.)
  - No guidance on reading legends, symbols, or annotations
  - Beginners can't learn from the content
  - Even experienced professionals benefit from clear, digestible summaries that reduce cognitive load and accelerate comprehension, especially when reviewing unfamiliar project types or complex multi-disciplinary plans
- Weak RAG Artifacts:
  - Current `page.md` is not enriched with AI understanding
  - Missing expanded context for agentic systems
  - No semantic connections between page elements
  - Limited value for retrieval-augmented generation
  - Not optimized for hybrid search (semantic + keyword)
  - Lacks narrative richness for vector embeddings
  - Missing explicit keywords and domain terminology
- No Search Optimization:
  - Content not structured for semantic search
  - Missing keyword enrichment for hybrid retrieval
  - No consideration for future search/RAG features
  - Sparse content reduces discoverability
- No AI Quality Evaluation:
  - Can't assess how well AI understands plan pages
  - No mechanism to iterate and improve prompts
  - Can't optimize agentic workflows for page explanation
User Impact
- Beginners/Homeowners: Can't understand architectural drawings without expert help
- Junior Architects: Need educational content to learn plan reading
- Senior Architects: Benefit from rapid comprehension of unfamiliar project types without manual analysis
- Plan Reviewers: Save time with pre-digested summaries instead of deciphering raw drawings
- Non-Industry Experts: Overwhelmed by technical jargon and symbols
- AI/Agentic Systems: Underperform due to weak input artifacts
- Developers: Can't evaluate and optimize AI understanding of plan pages
Goals
Primary Goals
- Educational Details Tab: Add a "Details" tab that displays rich, AI-generated professional explanation of the page
- Generate Enhanced Markdown: Create `page-explanation.md` with:
  - Beginner-friendly explanations of all page elements
  - Context about architectural symbols and legends
  - Relationships between elements (zoning → setbacks → building dimensions)
  - Definitions of technical terms inline
  - Visual descriptions enriched with understanding
  - Narrative richness for semantic search
  - Explicit keywords and domain terminology
- Search-Optimized Content: Generate content optimized for hybrid search (semantic + keyword):
- Keyword enrichment: Explicit mentions of technical terms, synonyms, variations
- Semantic context: Expanded narrative that creates strong vector embeddings
- Domain terminology: Architecture, zoning, construction code terms used naturally
- Cross-references: Connections to related concepts (e.g., "lot coverage relates to setbacks and FAR")
- Question-answer patterns: Content structured to answer likely user queries
- Agentic Generation Workflow: Use iterative AI workflow (ReAct loop or hardcoded) to refine content through multiple passes
- Enriched RAG Artifacts: Provide high-quality content for downstream agentic tasks and future search features
- Local Development Support: Enable rapid iteration using local projects without cloud deployments
Secondary Goals
- AI Quality Evaluation: Assess AI's ability to understand architectural plans
- Prompt Optimization: Use generated content to refine prompts and agent harness
- Multi-Modal Input: Leverage both `page.md` text AND raw PDF with cached tokens
- Future Image Integration: Foundation for mixing images (legends, symbols) into markdown
Non-Goals
- Real-time generation (asynchronous batch processing is acceptable)
- Editing markdown through UI (read-only display in Phase 1)
- Generating understanding for compliance reports (focus on plan pages only)
- Replacing existing Overview tab (add Details tab, keep Overview)
- Supporting non-architectural document types initially
User Stories
Story 1: View Professional Explanation in Details Tab
As a project reviewer (beginner or non-expert)
I want to view professional architectural explanation of plan pages
So that I can understand what the page shows without expert architectural knowledge
Acceptance Criteria:
- Page viewer has a new "Details" tab (fourth tab after Overview, Preview, Compliance)
- Details tab displays content from the `page-explanation.md` file
- Content is rendered as rich markdown with:
  - Headings, lists, tables
  - Inline term definitions
  - Explanations of architectural concepts
  - Beginner-friendly language
- Tab loads asynchronously if content is not yet generated
- Tab shows "Generating professional explanation..." state during processing
- Tab shows "Explanation not available" if generation fails
- Content is scrollable and formatted for readability
Story 2: Generate Enhanced Page Explanation (Backend)
As a system
I want to automatically generate rich, professional explanation markdown for architectural plan pages
So that users can access expert explanations and agentic systems have enriched artifacts
Acceptance Criteria:
- New background task: `GeneratePageExplanation` (triggered after `page.md` extraction)
- Task inputs:
  - `page.md` text content
  - `page.pdf` raw file (use prompt caching for PDF images)
  - `metadata.json` page metadata
- Task outputs:
  - `page-explanation.md` (rich, professional explanation)
  - Updated `metadata.json` with generation timestamp and status
- Task uses iterative agentic workflow:
  - Iteration 1 (Turns 1-3): Generate → Reflect → Refine
  - Iteration 2 (Turns 4-5): Reflect → Refine (if quality < threshold)
  - Iteration N: Continue until quality threshold met or max_iterations reached
- Task handles failures gracefully (retry logic, fallback to raw text)
- Task logs AI interactions for prompt optimization
Story 3: Iterative Agentic Refinement with Confidence-Based Quality
As a developer
I want the system to use an iterative agentic workflow with confidence-based quality assessment
So that the generated markdown improves through self-assessment and I can trust the quality scores
Acceptance Criteria:
Phase 1 MVP - 3-Tool Workflow:
- Tool 1: `generateExplanation()` - Creates comprehensive draft
- Tool 2: `assessQuality()` - Returns `{score, confidence, gaps}`
- Tool 3: `refineExplanation()` - Improves based on assessment
Iteration Structure:
- Iteration 1 (always): Generate → Assess → Refine
- Iteration 2+ (if needed): Assess → Refine (builds on previous)
- Max 3 iterations or 15 total turns
Iteration 1 (Initial Generation):
- Turn 1 (Generate):
  - Input: `page.md` text + `page.pdf` (cached)
  - Prompt: "Generate comprehensive, professional explanation of this architectural plan page"
  - Output: `page-explanation-draft-v1.md`
- Turn 2 (Reflect):
  - Input: `page-explanation-draft-v1.md`
  - Prompt: "Review this explanation. Identify gaps, unclear sections, missing context, or areas needing expansion."
  - Output: Reflection notes (JSON with quality score)
- Turn 3 (Refine):
  - Input: `page-explanation-draft-v1.md` + reflection notes + original PDF (cached)
  - Prompt: "Improve the explanation based on reflection. Expand gaps, clarify unclear sections, add missing context."
  - Output: `page-explanation-draft-v2.md`
Iteration 2 (Optional - if quality score < threshold):
- Turn 4 (Reflect): Review draft-v2
- Turn 5 (Refine): Improve to draft-v3
- Continue until quality threshold met or max_iterations reached
Tracking Requirements:
- Each turn is logged with:
- Turn number (sequential across all iterations)
- Iteration number
- Turn type (Generate/Reflect/Refine)
- Prompt used
- Token usage (input/output/cached)
- Model used
- Timestamp
- Final metadata includes:
  - `iterations_completed`: Number of complete cycles
  - `total_turns`: Total LLM API calls
- Workflow is configurable (max_iterations, quality threshold, reflection prompts, etc.)
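The iteration structure above can be sketched as a small control loop. This is a minimal illustration, not the ADK implementation: the three tool methods are stubbed with canned outputs, and the threshold and limit values follow the numbers stated in this story.

```java
import java.util.List;

/**
 * Sketch of the Phase 1 workflow: Generate -> Assess -> Refine, then
 * Assess -> Refine per extra iteration until quality is met or limits hit.
 * LLM-backed tools are stubbed; limits follow this story (3 iterations, 15 turns).
 */
public class ExplanationWorkflowSketch {

    record Assessment(double score, double confidence, List<String> gaps) {}

    static final double QUALITY_THRESHOLD = 0.8;
    static final int MAX_ITERATIONS = 3;
    static final int MAX_TOTAL_TURNS = 15;

    int totalTurns = 0;
    int iterationsCompleted = 0;

    // Tool 1 (stub): real version calls the primary model with page.md + cached PDF.
    String generateExplanation(String pageMd) { totalTurns++; return "draft-v1"; }

    // Tool 2 (stub): real version asks the reflection model for {score, confidence, gaps}.
    Assessment assessQuality(String draft) {
        totalTurns++;
        double score = 0.6 + 0.15 * iterationsCompleted; // stub: quality rises per pass
        return new Assessment(score, 0.9, List.of());
    }

    // Tool 3 (stub): real version refines the draft using the assessment's gaps.
    String refineExplanation(String draft, Assessment a) { totalTurns++; return draft + "+refined"; }

    String run(String pageMd) {
        String draft = generateExplanation(pageMd);      // iteration 1 always runs all three tools
        Assessment a = assessQuality(draft);
        draft = refineExplanation(draft, a);
        iterationsCompleted = 1;
        while (iterationsCompleted < MAX_ITERATIONS && totalTurns + 2 <= MAX_TOTAL_TURNS) {
            a = assessQuality(draft);                    // iteration 2+: Assess -> Refine only
            if (a.score() >= QUALITY_THRESHOLD) break;   // stop once the quality threshold is met
            draft = refineExplanation(draft, a);
            iterationsCompleted++;
        }
        return draft;
    }

    public static void main(String[] args) {
        ExplanationWorkflowSketch w = new ExplanationWorkflowSketch();
        w.run("raw page.md text");
        System.out.println(w.iterationsCompleted + " iterations, " + w.totalTurns + " turns");
        // prints: 2 iterations, 6 turns
    }
}
```

With the stub scores, the loop completes two iterations (six turns) before the 0.8 threshold is met; a real assessor's confidence field could additionally gate whether a low score is trusted enough to trigger another pass.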
Story 4: Search-Optimized Content for Hybrid Retrieval
As a system preparing for future search/RAG features
I want to generate content optimized for hybrid search (semantic + keyword retrieval)
So that pages are highly discoverable through semantic search and agentic RAG experiences
Acceptance Criteria:
Keyword Enrichment:
- Content explicitly mentions technical terms and their synonyms:
- Example: "Floor Area Ratio (FAR), also known as Floor Space Index (FSI)"
- Example: "Setback requirements (also called building lines or yard requirements)"
- Terms repeated naturally throughout the narrative for keyword density
- Both formal and informal terminology included:
- Formal: "Required front yard setback"
- Informal: "How far the building must be from the street"
Semantic Context Expansion:
- Rich narrative that creates strong vector embeddings:
- "The 25-foot front setback creates a buffer zone between the street and building, allowing for landscaping and maintaining neighborhood character. This setback works in conjunction with the lot coverage limit of 35%, ensuring adequate open space."
- Explanations of "why" and "how", not just "what":
- Not just: "Front setback: 25 feet"
- Instead: "The 25-foot front setback requirement stems from the R-1 zoning designation, which prioritizes low-density residential character with spacious front yards for privacy and aesthetics."
Cross-References & Relationships:
- Explicit connections between related concepts:
- "Lot coverage (35%) relates to setbacks (25' front, 10' sides) and building height (35' max) to control overall building mass and density"
- References to applicable codes:
- "This complies with IRC Section R302 for fire separation"
- "Follows Chapter 11A requirements for accessibility"
Question-Answer Patterns:
- Content structured to answer likely queries:
  - "What is the maximum building height?" → "The maximum building height is 35 feet..."
  - "How much of the lot can be covered?" → "Lot coverage is limited to 35%, meaning..."
  - "What are the setback requirements?" → "Setbacks are: 25' front, 10' sides, 15' rear..."
Domain Terminology Natural Usage:
- Architecture terms: floor plan, elevation, section, detail, legend, scale
- Zoning terms: setback, lot coverage, FAR, density, height limit, use restrictions
- Construction terms: foundation, framing, sheathing, roof pitch, wall assembly
- Code terms: compliance, requirement, exception, variance, amendment
Metadata Keywords (optional structured data):
{
"extractedConcepts": [
"floor plan",
"R-1 zoning",
"setbacks",
"lot coverage",
"building height"
],
"applicableCodes": [
"IRC Section R302",
"Chapter 11A"
],
"relatedPages": [2, 5, 7]
}
Story 5: Local Development Support
As a developer
I want to process architectural plan pages locally (using local project folders)
So that I can iterate quickly without deploying to Cloud Run or waiting for Cloud Run Jobs
Acceptance Criteria:
- CLI tool: `generate-page-explanation` (local execution)
- Tool accepts:
  - `--project-path`: Path to local project folder (e.g., `projects/R2024.0091-2024-10-14`)
  - `--file-id`: File ID to process (optional, defaults to all files)
  - `--page-numbers`: Page numbers to process (optional, defaults to all pages)
  - `--force`: Regenerate even if `page-explanation.md` exists
  - `--verbose`: Show detailed logs including prompts and responses
- Tool reads from local filesystem:
  - `projects/{projectId}/files/{file_id}/pages/{page_number}/page.md`
  - `projects/{projectId}/files/{file_id}/pages/{page_number}/page.pdf`
- Tool writes to local filesystem:
  - `projects/{projectId}/files/{file_id}/pages/{page_number}/page-explanation.md`
  - Updates `metadata.json` with generation status
- Tool uses same agentic workflow as backend (code reuse)
- Tool outputs progress and summary:
- Pages processed
- Pages succeeded/failed
- Total tokens used
- Total time taken
Story 6: Multi-File Structure Support
As a developer
I want to convert a legacy project to multi-file structure and generate page understanding
So that I can test on realistic project data with proper file organization
Acceptance Criteria:
- CLI tool: `upgrade-project-and-generate` (combined workflow)
- Tool accepts:
  - `--source-project`: Path to legacy project (e.g., `projects/R2024.0091-2024-10-14`)
  - `--target-project`: Path for upgraded copy (e.g., `projects/R2024.0091-test-copy`)
- Tool workflow:
  - Copy project to new location
  - Upgrade to multi-file structure (create `files/` directory)
  - Generate file metadata (`files/{file_id}/metadata.json`)
  - Generate page explanation for all pages
- Tool outputs:
- Summary of upgrade (files created, pages migrated)
- Summary of explanation generation (pages processed, tokens used)
- Idempotent: Can be run multiple times safely
Technical Design (High-Level)
File Naming Options
Three candidate names for the enhanced markdown file:
- `page-explanation.md` (RECOMMENDED)
  - Professional: "explanation" is architectural terminology
  - Expert framing: suggests professional analysis and judgment
  - Active vs passive: more active than "understanding"
  - Field alignment: aligns with architectural/code explanation practices
  - Parallel to existing `page.md` (raw) vs `page-explanation.md` (expert)
- `page-understanding.md`
  - Clear but more passive
  - Emphasizes comprehension over expertise
- `page-details.md`
  - Simple and clear
  - Matches "Details" tab name
  - May be too generic

Decision: Use `page-explanation.md` for professional framing and field alignment.
Updated Page Directory Structure
projects/{projectId}/files/{file_id}/pages/{page_number}/
├── page.pdf                    # Raw PDF page (existing)
├── page.md                     # LLM-extracted text (existing, looks OCR-like)
├── page-explanation.md         # NEW: Rich, professional explanation
├── page-summary-1000char.json  # Brief summary (existing)
├── metadata.json               # Page metadata (updated with generation status)
└── [compliance artifacts]      # Compliance reports (existing)
Metadata Updates:
{
"pageNumber": 3,
"fileId": "1",
"explanation": {
"status": "completed",
"generatedAt": "2024-10-20T01:30:00Z",
"primaryModel": "gemini-2.5-pro-latest",
"modelsUsed": {
"gemini-2.5-pro-latest": 2,
"gemini-2.0-flash-exp": 1
},
"iterationsCompleted": 1,
"totalTurns": 3,
"filePath": "page-explanation.md",
"costAnalysis": {
"totalTokens": 22800,
"estimatedTotalCostUsd": 0.038,
"tokenBreakdown": {
"nonCachedInput": {"tokenCount": 7800, "costUsd": 0.010},
"cachedContent": {"tokenCount": 10000, "costUsd": 0.003, "discountPercent": 75},
"output": {"tokenCount": 5000, "costUsd": 0.025}
},
"processingMetadata": {
"durationMs": 45000,
"cachingEfficiencyPercent": 56.2
}
}
}
}
Proto Schema Changes
Import Existing Cost Analysis Proto:
import "cost_analysis.proto";
New RPC: GeneratePageExplanation
// Request to generate AI-powered professional explanation for a plan page
message GeneratePageExplanationRequest {
string project_id = 1;
string file_id = 2;
int32 page_number = 3;
// Processing options
bool force_regenerate = 4; // Regenerate even if already exists
int32 max_iterations = 5; // Max loop iterations (default: 1 = single GenerateβReflectβRefine pass)
bool verbose_logging = 6; // Log prompts and responses
}
message GeneratePageExplanationResponse {
bool success = 1;
string status_message = 2;
PageExplanationMetadata metadata = 3;
// Performance metrics (uses existing CostAnalysisMetadata)
CostAnalysisMetadata cost_analysis = 4;
int32 processing_time_seconds = 5;
}
message PageExplanationMetadata {
string status = 1; // "pending", "processing", "completed", "failed"
google.protobuf.Timestamp generated_at = 2;
// Model tracking (multi-model support)
string primary_model = 3; // Main model used (e.g., "gemini-2.5-pro-latest")
map<string, int32> models_used = 4; // Model → turn count (e.g., {"gemini-2.5-pro": 2, "gemini-flash": 1})
// Workflow tracking
int32 iterations_completed = 5; // Number of complete loop cycles (e.g., 2 = ran Generate→Reflect→Refine twice)
int32 total_turns = 6; // Total LLM API calls (e.g., 6 = 2 iterations Γ 3 turns each)
// Output
string file_path = 7; // Relative path: "page-explanation.md"
// Cost analysis (reuses existing CostAnalysisMetadata)
// Note: MetaCostAnalysis supports per-model cost breakdown for multi-model workflows
CostAnalysisMetadata cost_analysis = 8;
}
Note: We're reusing the existing CostAnalysisMetadata message from cost_analysis.proto which provides comprehensive token tracking including:
- Total tokens and estimated cost
- Detailed breakdown (non-cached input, cached content, output, thinking, tool use)
- Rate per million tokens
- Discount percentages for cached content
- Processing metadata (duration, caching efficiency)
This is far superior to creating a simple TokenUsage message and ensures consistency with existing task cost tracking (Issue #176).
Update Existing RPC: GetArchitecturalPlanPage
message ArchitecturalPlanPage {
// ... existing fields ...
// NEW: Rich professional explanation
string explanation_markdown = 20; // Content from page-explanation.md
PageExplanationMetadata explanation_metadata = 21;
}
Component Architecture
Backend Services:
- `PageExplanationService`: Generate and manage page explanations
  - `generatePageExplanation(projectId, fileId, pageNumber, options)`
  - `getPageExplanation(projectId, fileId, pageNumber)`
  - `regeneratePageExplanation(...)` (force regeneration)
Backend Agentic Workflow:
- `AgenticPageInterpreter`: Multi-turn AI workflow
  - `explainPage(pageContext)` → Returns final markdown
  - Internal methods:
    - `generateInitialDraft(pageContext)` - Uses primary model (quality-critical)
    - `reflectOnDraft(draft)` - Uses reflection model (analytical, can be cheaper)
    - `refineWithReflection(draft, reflection, pageContext)` - Uses primary model (quality-critical)
  - Configurable:
    - Model selection strategy:
      - `primaryModel`: For generation/refinement (e.g., Gemini 2.5 Pro)
      - `reflectionModel`: For analysis/scoring (e.g., Gemini Flash)
      - `orchestrationModel`: For workflow decisions (e.g., Gemini Flash)
    - Max iterations
    - Quality threshold for stopping
    - Reflection prompts
    - Temperature: 0.0 for all turns (maximum predictability)
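One way to realize this configuration is to isolate the turn-type-to-model routing in a tiny selector. A minimal sketch; the model IDs are the defaults proposed in this PRD and would normally come from configuration:

```java
/** Sketch: route each turn type to its configured model (Phase 2 strategy). */
public class ModelSelector {

    public enum TurnType { GENERATE, REFLECT, REFINE, ORCHESTRATE }

    // All three are meant to be configurable; defaults follow this PRD.
    private final String primaryModel;
    private final String reflectionModel;
    private final String orchestrationModel;

    public ModelSelector(String primary, String reflection, String orchestration) {
        this.primaryModel = primary;
        this.reflectionModel = reflection;
        this.orchestrationModel = orchestration;
    }

    public static ModelSelector defaults() {
        return new ModelSelector("gemini-2.5-pro-latest", "gemini-2.0-flash-exp", "gemini-2.0-flash-exp");
    }

    public String modelFor(TurnType turn) {
        return switch (turn) {
            case GENERATE, REFINE -> primaryModel;   // quality-critical writing
            case REFLECT -> reflectionModel;         // analytical scoring, cheaper
            case ORCHESTRATE -> orchestrationModel;  // fast workflow decisions
        };
    }

    public static void main(String[] args) {
        ModelSelector s = defaults();
        System.out.println(s.modelFor(TurnType.REFLECT)); // prints: gemini-2.0-flash-exp
    }
}
```

Keeping the routing in one place makes the Phase 1 single-model setup trivial (pass the same model ID for all three roles) and the Phase 2 split a pure configuration change.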
Frontend Components:
- Update `PageViewerComponent`:
  - Add fourth tab: "Details"
  - Lazy-load `page-explanation.md` content when tab is clicked
  - Show loading/error states
- `DetailsTabComponent` (new):
  - Render markdown content
  - Handle empty/loading/error states
  - Responsive layout
CLI Tools:
- `generate-page-explanation.sh`: Local generation script
- `upgrade-project-and-generate.sh`: Legacy upgrade + generation
Environment Configuration
No new environment variables required (uses existing Vertex AI credentials for Gemini models).
Optional Configuration (if needed):
# In .env.{environment}
PAGE_EXPLANATION_PRIMARY_MODEL="gemini-2.5-pro-latest" # Primary model for generation/refinement
PAGE_EXPLANATION_REFLECTION_MODEL="gemini-2.0-flash-exp" # Efficient model for reflection
PAGE_EXPLANATION_MAX_ITERATIONS="1" # Max workflow iterations (default: 1)
PAGE_EXPLANATION_ENABLE_CACHE="true" # Use prompt caching for PDFs
Cost Analysis
Assumptions:
- Average page: ~1-2 pages of architectural drawing
  - `page.md`: ~2,000 tokens
  - `page.pdf`: ~5,000 tokens (image)
- Model Strategy (Phase 1): Single model for all turns
  - Primary: `gemini-2.5-pro-latest`
- Model Strategy (Phase 2): Multi-model optimization
  - Generation/Refinement: `gemini-2.5-pro-latest` (quality-critical)
  - Reflection: `gemini-2.0-flash-exp` (cost-effective, 10x cheaper)
  - Orchestration: `gemini-2.0-flash-exp` (fast decisions)
- Workflow: 1 iteration (3 turns: Generate → Reflect → Refine)
Token Usage Per Page (1 Iteration = 3 Turns):
Iteration 1:
Turn 1 (Generate):
Input: 5,000 (PDF, cached) + 2,000 (text) + 500 (prompt) = 7,500 tokens
Output: ~2,000 tokens
Turn 2 (Reflect):
Input: 2,000 (draft) + 300 (prompt) = 2,300 tokens
Output: ~500 tokens (reflection notes)
Turn 3 (Refine):
Input: 5,000 (PDF, cached) + 2,000 (draft) + 500 (reflection) + 500 (prompt) = 8,000 tokens
Output: ~2,500 tokens
Total per page (1 iteration):
Input: 17,800 tokens (10,000 cached)
Output: 5,000 tokens
Iterations: 1
Total turns: 3
If 2 iterations needed (quality < threshold):
Total turns: 5 (Turn 1-3 + Turn 4-5 for ReflectβRefine)
Additional cost: ~$0.04
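The per-page arithmetic above can be checked with a few lines. The rates are the illustrative per-million-token prices used in the cost tables below, not authoritative pricing:

```java
/** Sketch: single-model per-page cost from the token estimates above. */
public class PageCostSketch {

    // Illustrative USD-per-1M-token rates used in this PRD's cost tables.
    static final double INPUT_RATE = 1.25;
    static final double CACHED_RATE = 0.315;
    static final double OUTPUT_RATE = 5.00;

    static double costUsd(int regularInputTokens, int cachedTokens, int outputTokens) {
        return (regularInputTokens * INPUT_RATE
                + cachedTokens * CACHED_RATE
                + outputTokens * OUTPUT_RATE) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // One iteration: 17,800 input tokens (10,000 of them cached) + 5,000 output.
        double perPage = costUsd(7_800, 10_000, 5_000);
        System.out.printf("Per page: ~$%.3f%n", perPage); // prints: Per page: ~$0.038
    }
}
```

At these rates one iteration works out to about $0.038, matching the ~$0.04 per-page figure in the table below.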
Cost Per Page - Single Model (Phase 1 - Gemini 2.5 Pro):
- Cached input: 10,000 tokens Γ $0.315 / 1M = $0.003
- Regular input: 7,800 tokens Γ $1.25 / 1M = $0.010
- Output: 5,000 tokens Γ $5.00 / 1M = $0.025
- Total: ~$0.04 per page
Cost Per Page - Multi-Model (Phase 2 - Gemini Pro + Flash):
Turn 1 (Generate - Gemini 2.5 Pro):
Input: 7,500 tokens Γ $1.25/1M = $0.009
Cached: 0
Output: 2,000 tokens Γ $5.00/1M = $0.010
Subtotal: $0.019
Turn 2 (Reflect - Gemini Flash):
Input: 2,300 tokens Γ $0.075/1M = $0.0002
Output: 500 tokens Γ $0.30/1M = $0.0002
Subtotal: $0.0004
Turn 3 (Refine - Gemini 2.5 Pro):
Input: 3,000 tokens Γ $1.25/1M = $0.004
Cached: 5,000 tokens Γ $0.315/1M = $0.002
Output: 2,500 tokens Γ $5.00/1M = $0.013
Subtotal: $0.019
Total: ~$0.038 per page (the savings sit in the reflection turn and grow when additional reflection iterations are needed)
Project Cost (100-page project):
- Single-model (Gemini Pro): $0.04 Γ 100 = $4.00
- Multi-model (Pro + Flash): ~$0.038 × 100 = ~$3.80 (savings grow with additional reflection iterations)
- Extremely cost-effective compared to other premium models
Model Selection Strategy:
- Quality-Critical Tasks (Generation, Refinement):
  - Use premium models (Gemini 2.5 Pro)
  - Creative, nuanced writing required
  - Gemini 2.5 Pro: Excellent quality at great cost/performance ratio (~$0.02/turn)
  - Cost: Higher, but essential for quality
- Analytical Tasks (Reflection, Quality Scoring):
  - Use efficient models (Gemini Flash)
  - Structured analysis, less creativity needed
  - Gemini Flash: Extremely cost-effective (~$0.0004/turn)
  - Cost: 50-100x cheaper than premium models
- Orchestration/Decision (Workflow control, simple logic):
  - Use fastest models (Gemini Flash)
  - Binary decisions, routing logic
  - Cost: Negligible impact (~$0.0001/turn)
Optimization Opportunities:
- Prompt caching reduces cost by 75% for repeated PDF tokens (Gemini)
- Batch processing multiple pages with same PDF (file-level caching)
- Dynamic model selection based on page complexity
- Use Gemini Flash for simple pages, Gemini Pro for complex ones
- Gemini Flash for all reflection turns (50-100x cheaper than Pro)
Testing Strategy
Unit Tests:
- `PageExplanationService` logic
- Agentic workflow orchestration
- Markdown generation and validation
Integration Tests:
- End-to-end page understanding generation
- Multi-turn workflow correctness
- Prompt caching effectiveness
- Local CLI tool execution
Manual Testing Checklist:
- Generate understanding for sample pages
- Review markdown quality and completeness
- Test Details tab in UI
- Verify prompt caching reduces cost
- Test local development workflow
Evaluation Metrics:
- Readability: Is the markdown easy to understand for beginners?
- Completeness: Does it explain all page elements?
- Accuracy: Are technical terms and concepts correct?
- Educational Value: Can a non-expert learn from it?
- Iteration Value: Does reflection improve quality vs single-pass?
Observability and Debugging
Agent Trajectory Tracking
Requirement: Capture complete execution trace of agentic workflows for debugging, optimization, and quality assessment.
What to Track:
- All LLM Calls (turns):
  - Model name, prompt, response
  - Token usage, cost
  - Latency
  - Cache hits
- Tool Invocations:
  - Tool name, inputs, outputs
  - Execution time
  - Success/failure status
  - Nested LLM calls (if tool uses LLM internally)
- Iterations:
  - Iteration number
  - Draft version (v1, v2, v3...)
  - Quality score per iteration
  - Quality improvement trajectory
- Decisions:
  - Continue iterating? Why or why not?
  - Quality threshold checks
  - Max iterations reached
Export Format: JSON
{
"trajectory_id": "traj_abc123",
"workflow_name": "page_explanation",
"project_id": "R2024.0091",
"file_id": "1",
"page_number": 3,
"started_at": "2024-10-20T02:30:00Z",
"completed_at": "2024-10-20T02:31:45Z",
"total_duration_ms": 105000,
"iterations": [
{
"iteration_number": 1,
"turns": [
{
"turn_number": 1,
"turn_type": "LLM_CALL",
"phase_name": "GENERATE",
"model_name": "gemini-2.5-pro-latest",
"input_tokens": 7500,
"output_tokens": 2000,
"cost_usd": 0.019,
"prompt_template": "generate.txt",
"response_summary": "Generated 1200-word explanation..."
},
{
"turn_number": 2,
"turn_type": "LLM_CALL",
"phase_name": "REFLECT",
"model_name": "gemini-2.0-flash-exp",
"input_tokens": 2300,
"output_tokens": 500,
"cost_usd": 0.0004,
"quality_score": 0.75,
"gaps_identified": ["Missing AHO-1 explanation", "Define FAR"]
},
{
"turn_number": 3,
"turn_type": "LLM_CALL",
"phase_name": "REFINE",
"model_name": "gemini-2.5-pro-latest",
"cached_tokens": 5000,
"cost_usd": 0.019
}
],
"quality_score": 0.85,
"quality_threshold_met": true
}
],
"outcome": {
"success": true,
"status": "completed",
"final_quality_score": 0.85,
"stop_reason": "QUALITY_MET"
},
"total_turns": 3,
"cost_analysis": {
"total_cost_usd": 0.038,
"models_used": {
"gemini-2.5-pro-latest": 2,
"gemini-2.0-flash-exp": 1
}
}
}
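The top-level totals in this export (`total_turns`, `cost_analysis`) are derivable from the per-turn records. A minimal aggregation sketch, with field names mirroring the JSON above:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: derive trajectory totals from per-turn records (names mirror the export above). */
public class TrajectoryAggregator {

    record Turn(int turnNumber, String phaseName, String modelName, double costUsd) {}

    record Totals(int totalTurns, double totalCostUsd, Map<String, Integer> modelsUsed) {}

    static Totals aggregate(List<Turn> turns) {
        Map<String, Integer> modelsUsed = new TreeMap<>();
        double cost = 0.0;
        for (Turn t : turns) {
            modelsUsed.merge(t.modelName(), 1, Integer::sum); // count turns per model
            cost += t.costUsd();
        }
        return new Totals(turns.size(), cost, modelsUsed);
    }

    public static void main(String[] args) {
        List<Turn> turns = List.of(
                new Turn(1, "GENERATE", "gemini-2.5-pro-latest", 0.019),
                new Turn(2, "REFLECT", "gemini-2.0-flash-exp", 0.0004),
                new Turn(3, "REFINE", "gemini-2.5-pro-latest", 0.019));
        Totals totals = aggregate(turns);
        System.out.println(totals.totalTurns() + " turns, models: " + totals.modelsUsed());
        // prints: 3 turns, models: {gemini-2.0-flash-exp=1, gemini-2.5-pro-latest=2}
    }
}
```

Computing totals from the turn records (rather than storing them separately) keeps the export internally consistent, which matters when trajectories are compared across prompt revisions.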
Storage:
- BigQuery: Individual `LlmTrace` records (existing)
- GCS: Complete `AgentTrajectory` JSON files
- Firestore: Trajectory metadata for listing/searching
Access:
- API: `ExportAgentTrajectory(trajectory_id)` → JSON
- CLI: `export-trajectory --trajectory-id=traj_123 --output=trace.json`
- UI (Future): Visual timeline of agent thought process
Use Cases:
- Debugging: "Why did the agent make 5 turns instead of 3?"
- Optimization: "Which prompts cause quality issues?"
- Cost Analysis: "Where are we spending tokens?"
- Quality: "How much does reflection improve quality?"
- Education: "Show me how the AI thinks through a complex page"
Success Metrics
User Engagement
- % of users who view Details tab (target: >50%)
- Time spent on Details tab vs Overview tab
- User feedback on educational value
AI Quality
- Human evaluation scores (readability, completeness, accuracy)
- Iteration improvement: quality gain per iteration
- Average turns needed to reach quality threshold
- Error rate (pages failing to generate)
System Performance
- Average generation time per page (target: less than 2 minutes)
- Token cost per page (target: less than $0.15)
- Cache hit rate for PDFs (target: greater than 80%)
Developer Productivity
- Time to iterate on prompts locally (target: less than 5 minutes per test)
- Prompt optimization success rate
Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| High generation cost | High | Medium | Use prompt caching aggressively, batch processing, cheaper models for reflection |
| Slow generation (user wait time) | Medium | High | Asynchronous background processing, show loading states, prioritize recently viewed pages |
| Poor markdown quality | High | Medium | Multi-turn refinement, extensive prompt engineering, human evaluation loop |
| Prompt caching not effective | High | Low | Validate caching behavior, ensure PDF reuse across turns |
| Local CLI performance | Low | Low | Use same cloud APIs, optimize file I/O |
| Content not educational enough | High | Medium | Extensive prompt tuning, include example outputs in prompt, human feedback |
Future Enhancements
Phase 2: Image Integration
- Embed extracted legend images in markdown
- Show symbol definitions with visual examples
- Extract and annotate key details from PDF
Phase 3: Interactive Elements
- Collapsible sections for advanced details
- Inline term glossary with popups
- Cross-references to related pages
Phase 4: Multi-Lingual Support
- Generate understanding in multiple languages
- Auto-detect user language preference
Phase 5: Custom Generation
- User-adjustable detail level (beginner, intermediate, expert)
- Focus areas (zoning only, structural only, etc.)
- Export to PDF or Word
Phase 6: Collaborative Annotations
- Users can add comments to Details tab
- Share annotations with team members
Open Questions
- File Naming: `page-explanation.md` vs `page-understanding.md` vs `page-details.md`?
  - Answer: `page-explanation.md` (professional framing, field alignment)
- Generation Trigger: Automatic after `page.md` extraction or on-demand?
  - Answer: Hybrid - automatic for new uploads, on-demand regeneration via API/CLI
- Iteration Count: Fixed 1 iteration or dynamic based on quality score?
  - Answer: Start with fixed 1 iteration (3 turns), add quality-based iteration in Phase 2
- Model Selection: Single model or multi-model strategy?
  - Answer:
    - Phase 1: Single model (Gemini 2.5 Pro) for simplicity
    - Phase 2: Multi-model optimization:
      - Generation/Refinement turns: Gemini 2.5 Pro (quality-critical, best cost/quality)
      - Reflection turns: Gemini Flash (cost-effective, 50-100x cheaper per turn)
      - Orchestration logic: Gemini Flash (fast decision-making)
      - Cost savings: reflection turns become 50-100x cheaper, trimming overall per-page cost
      - Quality impact: Minimal (reflection is analytical, not creative)
- Legacy Projects: Auto-generate explanation for all pages or user-opt-in?
  - Answer: User-opt-in via CLI tool (avoid surprise costs)
- Caching Strategy: Per-page or per-file PDF caching?
  - Answer: Per-page initially (simpler), explore file-level batching in Phase 2
Related Documentation
- File Structure Reorganization PRD: Multi-file structure foundation
- Developer Playbook: Build and deployment workflows
- PRD/TDD Workflow: Feature development process
- Background Tasks Architecture: Async task processing