TDD: Architectural Plan Page Explanation & Educational Details
Product Requirements: Plan Page Explanation PRD
Implementation Issue: Issue #258 - AI-Powered Plan Page Explanation with Agentic Workflow
Overview
This Technical Design Document details the implementation of AI-powered page explanation generation that transforms raw LLM-extracted text and PDF images into comprehensive, professional explanation markdown. The system uses an agentic multi-turn workflow (ReAct pattern) to iteratively refine explanations through self-reflection.
Architecture Overview
System Components
Frontend (Angular)
└── PageViewerComponent
    ├── [Overview] [Preview] [Compliance] [Details] ← NEW
    └── DetailsTabComponent ← NEW
          │ gRPC-Web
          ▼
Backend (Java/Spring)
└── ArchitecturalPlanService (Facade)
    └── getArchitecturalPlanPage(...) ← Enhanced
          │
     ┌────┴─────────────────────────────┐
     ▼                                  ▼
PageExplanationService (NEW)    Agentic Workflow Engine (NEW)
├── generate()                  ├── AgenticPageInterpreter
├── get()                       └── IterativeRefiner
└── regenerate()
     │                                  │
     └────┬─────────────────────────────┘
          ▼
LLM Integration Layer (Vertex AI - Gemini Models)
+ Prompt Caching Manager
          │
          ▼
Cloud Storage (GCS)
projects/{projectId}/files/{file_id}/pages/{pageNumber}/
├── page.pdf            ← INPUT (cached)
├── page.md             ← INPUT
├── page-explanation.md ← OUTPUT (NEW)
└── metadata.json       ← UPDATED
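The storage layout above can be mirrored by a small path helper. This is a minimal sketch; the class and method names are illustrative, not the actual ProjectPathResolver API:

```java
// Hypothetical helper mirroring the GCS layout above.
// Names are illustrative, not the real ProjectPathResolver API.
public class PagePaths {
    public static String pageFolder(String projectId, String fileId, int pageNumber) {
        return String.format("projects/%s/files/%s/pages/%d", projectId, fileId, pageNumber);
    }

    public static String explanationPath(String projectId, String fileId, int pageNumber) {
        // page-explanation.md is the new OUTPUT artifact from this TDD
        return pageFolder(projectId, fileId, pageNumber) + "/page-explanation.md";
    }
}
```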
Data Flow: Page Explanation Generation
User/System Request
  │
  ▼
PageExplanationService.generatePageExplanation(...)
  │
  ▼
Load Page Context
  ├── Read page.md
  ├── Read page.pdf
  └── Read metadata.json
  │
  ▼
AgenticPageInterpreter.explainPage(pageContext)
  │
  ├──────────────► Prompt Caching Manager (caches page.pdf)
  ▼
TURN 1: Generate Initial Draft
  Input:  page.pdf (image), page.md (text), generation prompt
  Output: draft-v1.md
  │
  ▼
TURN 2: Reflect on Draft Quality
  Input:  draft-v1.md, reflection prompt
  Output: reflection.json (gaps, issues)
  │
  ▼
TURN 3: Refine with Reflection
  Input:  page.pdf (cached), draft-v1.md, reflection.json, refinement prompt
  Output: page-explanation.md (FINAL)
  │
  ▼
Optional: Additional Iterations (if max_iter > 3)
  │
  ▼
Save Artifacts
  ├── Write page-explanation.md
  ├── Update metadata.json
  └── Log generation metrics
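The three-turn flow above can be sketched as a plain loop, with the LLM calls stubbed out as functions so the control flow is visible on its own. All names here are illustrative:

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Minimal sketch of the Generate -> Reflect -> Refine loop above.
// The three LLM calls are stubbed as plain functions; names are illustrative.
public class RefineLoop {
    public static String run(String pageText,
                             Function<String, String> generate,  // TURN 1
                             Function<String, String> reflect,   // TURN 2, 4, ...
                             BinaryOperator<String> refine,      // TURN 3, 5, ...
                             int iterations) {
        String draft = generate.apply(pageText);           // initial draft
        for (int i = 0; i < iterations; i++) {
            String reflection = reflect.apply(draft);      // identify gaps
            draft = refine.apply(draft, reflection);       // improve with reflection
        }
        return draft;
    }
}
```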
Proto Message Definitions
Import Existing Cost Analysis
import "cost_analysis.proto";
import "google/protobuf/timestamp.proto";
New Messages for Page Interpretation
syntax = "proto3";
package codetricks.construction.api;
import "google/protobuf/timestamp.proto";
import "cost_analysis.proto";
// ============================================================================
// PAGE INTERPRETATION - Request/Response Messages
// ============================================================================
// Request to generate AI-powered professional explanation for a plan page
message GeneratePageExplanationRequest {
// Project identification
string project_id = 1;
string file_id = 2;
int32 page_number = 3;
// Processing options
bool force_regenerate = 4; // Regenerate even if already exists
int32 max_iterations = 5; // Max agentic turns (default: 3)
bool verbose_logging = 6; // Log prompts and responses
// Model configuration (optional, uses defaults if not specified)
// Multi-model strategy enables cost optimization:
// - Use premium models (Gemini Pro) for quality-critical tasks (Generation, Refinement)
// - Use efficient models (Gemini Flash) for analytical tasks (Reflection, Scoring)
// - Gemini Flash is 50-100x cheaper than Pro with minimal quality impact for reflection
string primary_model = 7; // Primary model for generation/refinement (default: "gemini-2.5-pro-latest")
string reflection_model = 8; // Model for reflection/analysis (default: same as primary, or "gemini-2.0-flash-exp" for 50% cost savings)
bool enable_caching = 9; // Use prompt caching (default: true)
// Advanced: Per-turn model override for experimentation
map<string, string> turn_models = 10; // Turn type → model (e.g., {"REFLECT": "gemini-flash", "GENERATE": "gemini-pro"})
}
message GeneratePageExplanationResponse {
bool success = 1;
string status_message = 2;
// Metadata about generated explanation
PageExplanationMetadata metadata = 3;
// Performance metrics (reuses existing CostAnalysisMetadata)
CostAnalysisMetadata cost_analysis = 4;
int32 processing_time_seconds = 5;
// Error details (if success = false)
string error_code = 6;
string error_details = 7;
}
// Metadata about page explanation generation
message PageExplanationMetadata {
// Generation status
string status = 1; // "pending", "processing", "completed", "failed"
google.protobuf.Timestamp generated_at = 2;
// Model tracking (multi-model support for cost optimization)
// Phase 1: Single model (primary_model only)
// Phase 2: Multi-model (different models for different turn types)
string primary_model = 3; // Main model used (e.g., "gemini-2.5-pro-latest")
map<string, int32> models_used = 4; // Model → turn count (e.g., {"gemini-2.5-pro": 2, "gemini-flash": 1})
// Workflow tracking
int32 iterations_completed = 5; // Number of complete loop cycles
int32 total_turns = 6; // Total LLM API calls
// Cost analysis (reuses existing CostAnalysisMetadata)
// Provides comprehensive token tracking:
// - Total tokens and estimated cost
// - Detailed breakdown (non-cached input, cached content, output)
// - Rate per million tokens
// - Discount percentages for cached content
// - Processing metadata (duration, caching efficiency)
CostAnalysisMetadata cost_analysis = 7;
// Output file
string file_path = 8; // Relative path: "page-explanation.md"
// Quality metrics (optional, for evaluation)
float quality_score = 9; // 0.0-1.0 (future: human evaluation)
}
Note: This reuses the existing CostAnalysisMetadata message from cost_analysis.proto (Issue #176) which is already used for task cost tracking. This ensures:
- Consistency: Same cost tracking across all LLM operations
- Richness: Comprehensive token breakdown with caching metrics
- Integration: Works with existing Firestore task tracking
- No Duplication: Avoids creating redundant proto messages
Multi-Model Support: The MetaCostAnalysis message (also in cost_analysis.proto) supports per-model cost breakdown for workflows that use multiple models:
message MetaCostAnalysis {
map<string, CostAnalysisMetadata> model_costs = 1; // Per-model breakdown
double total_cost_usd = 2; // Aggregated total
int32 total_tokens = 3;
string primary_model = 4;
}
This is well suited to multi-model workflows where:
- Turn 1 (Generate): premium model → tracked separately
- Turn 2 (Reflect): efficient model → tracked separately
- Turn 3 (Refine): premium model → tracked separately
- Final: aggregated cost shows the total across all models
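The per-model aggregation MetaCostAnalysis describes can be sketched as below; plain doubles stand in for CostAnalysisMetadata, and the class name is illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of per-model cost aggregation in the spirit of MetaCostAnalysis.
// Plain doubles stand in for CostAnalysisMetadata; names are illustrative.
public class CostAggregator {
    private final Map<String, Double> modelCosts = new LinkedHashMap<>();

    // Record one turn's cost under the model that executed it
    public void addTurn(String model, double costUsd) {
        modelCosts.merge(model, costUsd, Double::sum);
    }

    // Aggregated total across all models (MetaCostAnalysis.total_cost_usd)
    public double totalCostUsd() {
        return modelCosts.values().stream().mapToDouble(Double::doubleValue).sum();
    }
}
```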
// ============================================================================
// EXISTING MESSAGE UPDATES
// ============================================================================
// Update to existing ArchitecturalPlanPage message
message ArchitecturalPlanPage {
// ... existing fields (pageNumber, fileId, title, summary, etc.) ...
// NEW: Rich explanation content
string explanation_markdown = 20; // Content from page-explanation.md
PageExplanationMetadata explanation_metadata = 21;
// Indicates if an explanation is available
bool has_explanation = 22;
}
// Update to existing GetArchitecturalPlanPageRequest
message GetArchitecturalPlanPageRequest {
string project_id = 1;
string file_id = 2;
int32 page_number = 3;
// NEW: Control whether to include the explanation
bool include_explanation = 4; // Default: true
}
New gRPC Service Methods
service ArchitecturalPlanService {
// ... existing methods ...
// NEW: Generate page explanation
rpc GeneratePageExplanation(GeneratePageExplanationRequest)
returns (GeneratePageExplanationResponse);
// NEW: Get page explanation status
rpc GetPageExplanationStatus(GetPageExplanationStatusRequest)
returns (PageExplanationMetadata);
// NEW: Batch generate for multiple pages
rpc BatchGeneratePageExplanation(BatchGeneratePageExplanationRequest)
returns (stream GeneratePageExplanationResponse);
}
message GetPageExplanationStatusRequest {
string project_id = 1;
string file_id = 2;
int32 page_number = 3;
}
message BatchGeneratePageExplanationRequest {
string project_id = 1;
string file_id = 2;
repeated int32 page_numbers = 3; // Pages to process
GeneratePageExplanationRequest options = 4; // Shared options
}
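On the client side, a caller might pre-chunk a large page range before issuing batch requests so each stream stays bounded. The helper and batch size below are illustrative assumptions, not part of the API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative client-side helper: split a large page range into
// bounded batches before calling BatchGeneratePageExplanation.
// Batch size and names are assumptions, not part of the proto API.
public class BatchPlanner {
    public static List<List<Integer>> chunk(List<Integer> pageNumbers, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < pageNumbers.size(); i += batchSize) {
            batches.add(pageNumbers.subList(i, Math.min(i + batchSize, pageNumbers.size())));
        }
        return batches;
    }
}
```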
Multi-Model Strategy (Detailed)
Model Selection by Task Type
The agentic workflow performs different types of tasks, each with different requirements and optimal model choices:
| Task Type | Turn(s) | Requirements | Best Models | Cost/Turn | Quality Impact |
|---|---|---|---|---|---|
| Generation | 1 | Creative writing, comprehensive coverage, professional tone | Gemini 2.5 Pro, other premium models, GPT-4 | ~$0.02 | Critical - use premium |
| Reflection | 2, 4, 6... | Analytical assessment, gap identification, scoring | Gemini Flash, other efficient models | ~$0.0004 | Minimal - use efficient |
| Refinement | 3, 5, 7... | Creative improvement, tone consistency, gap filling | Gemini 2.5 Pro (same as Turn 1) | ~$0.02 | Critical - use premium |
| Orchestration | N/A | Quality threshold checks, iteration decisions | Gemini Flash | ~$0.0001 | None - use fastest |
Cost Optimization Examples
Example 1: Single-Model (Phase 1 Implementation)
PageExplanationConfig config = PageExplanationConfig.builder()
.primaryModel("gemini-2.5-pro")
.reflectionModel(null) // Use primary for all turns
.maxIterations(1)
.build();
// Iteration 1:
// - Turn 1 (Generate): Gemini Pro → $0.019
// - Turn 2 (Reflect): Gemini Pro → $0.003
// - Turn 3 (Refine): Gemini Pro → $0.019
// Total: ~$0.04/page
Example 2: Multi-Model (Phase 2 Optimization)
PageExplanationConfig config = PageExplanationConfig.builder()
.primaryModel("gemini-2.5-pro") // For Generate/Refine
.reflectionModel("gemini-2.5-flash") // For Reflect (far cheaper per turn)
.maxIterations(1)
.build();
// Iteration 1:
// - Turn 1 (Generate): Gemini Pro → $0.019
// - Turn 2 (Reflect): Gemini Flash → $0.0004 (~50x cheaper per turn)
// - Turn 3 (Refine): Gemini Pro → $0.019
// Total: ~$0.038/page (savings grow with additional reflection turns)
Example 3: Dynamic Model Selection (Advanced)
// Use cheap model for simple pages, premium for complex ones
PageExplanationConfig config = PageExplanationConfig.builder()
.primaryModel(pageComplexity > 0.7
? "gemini-2.5-pro" // Complex: Use premium
: "gemini-2.5-flash") // Simple: Use efficient
.reflectionModel("gemini-2.5-flash") // Always cheap for reflection
.maxIterations(pageComplexity > 0.7 ? 2 : 1) // More iterations for complex pages
.build();
Model Capabilities Matrix
| Model | Input Cost | Output Cost | Cached Cost | Strengths | Best For |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25/1M | $5.00/1M | $0.315/1M | Excellent quality, best cost/perf ratio | Generation, Refinement |
| Gemini Flash | $0.075/1M | $0.30/1M | $0.01875/1M | Extremely fast and cheap, good analysis | Reflection, Simple pages |
| other premium models | $3.00/1M | $15.00/1M | $0.30/1M | Superior creative writing | Complex creative tasks |
| other efficient models | $0.25/1M | $1.25/1M | $0.03/1M | Fast, cost-effective | Reflection, Scoring |
| GPT-4 Turbo | $10.00/1M | $30.00/1M | N/A | Superior reasoning | Very complex pages only |
Recommended: the Gemini 2.5 Pro + Flash combination offers the best balance of quality and cost.
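As a worked check on the numbers above, the snippet below prices one Generate/Reflect/Refine cycle from the matrix. The token counts (10k input, 2k output per turn) are illustrative assumptions; actual savings depend on how many tokens each turn consumes:

```java
// Worked cost arithmetic from the capabilities matrix above.
// Token counts per turn are illustrative assumptions.
public class CostEstimate {
    static double turnCost(double inPerM, double outPerM, int inTok, int outTok) {
        return inPerM * inTok / 1_000_000.0 + outPerM * outTok / 1_000_000.0;
    }

    // All three turns on Gemini 2.5 Pro ($1.25/1M in, $5.00/1M out)
    public static double singleModelCycle() {
        return 3 * turnCost(1.25, 5.00, 10_000, 2_000);
    }

    // Generate + Refine on Pro, Reflect on Flash ($0.075/1M in, $0.30/1M out)
    public static double multiModelCycle() {
        return 2 * turnCost(1.25, 5.00, 10_000, 2_000)
             + turnCost(0.075, 0.30, 10_000, 2_000);
    }
}
```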
Turn-Specific Model Recommendations
Turn 1 (Initial Generation):
- Recommended: Gemini 2.5 Pro
- Why: First impression sets tone and structure, quality-critical
- Advantages: Excellent quality, multimodal, great cost/perf ratio ($1.25/1M input)
- Avoid: Gemini Flash for initial generation - quality matters more than speed here
Turn 2, 4, 6... (Reflection):
- Recommended: Gemini Flash
- Why: Analytical task, structured output (JSON), minimal creativity needed
- Advantages: Extremely cheap ($0.075/1M vs $1.25/1M for Pro) = 95% savings
- Quality Impact: Minimal - reflection is analysis, not creative writing
Turn 3, 5, 7... (Refinement):
- Recommended: Gemini 2.5 Pro (same as Turn 1)
- Why: Must maintain consistent tone, style, and quality
- Avoid: Switching between Pro and Flash for generation tasks - causes style inconsistency
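The recommendations above reduce to a small turn-type-to-model mapping. The enum and defaults below are a sketch of that mapping, not an existing class in the codebase:

```java
// Sketch of the turn-type -> model mapping recommended above.
// Model IDs come from the tables in this section; the enum is illustrative.
public class ModelSelector {
    public enum TurnType { GENERATE, REFLECT, REFINE }

    public static String modelFor(TurnType turn) {
        switch (turn) {
            case REFLECT:
                return "gemini-2.5-flash"; // analytical, cost-sensitive
            case GENERATE:
            case REFINE:
            default:
                return "gemini-2.5-pro";   // quality-critical; keep style consistent
        }
    }
}
```

Keeping Generate and Refine on the same model follows the style-consistency guidance above.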
ADK (Agent Development Kit) Integration
Phase 1 MVP: Pragmatic 3-Tool Implementation
Goal: Ship working feature fast, design for future expansion
Philosophy: Start simple, build sophisticated
- 3 core tools (not 11)
- 1 agent (not 8)
- Sequential execution (parallel later)
- Full observability from day 1
- Extensible architecture (add tools incrementally)
Timeline: 1 week to working feature
Overview
This feature will be implemented using Google's Agent Development Kit (ADK) for Java to leverage proven agent orchestration patterns already established in the codebase.
Maven Dependency:
<dependency>
<groupId>com.google.adk</groupId>
<artifactId>google-adk</artifactId>
<version>0.3.0</version>
</dependency>
Related:
- ADK Java Source Reference: github/adk-java/ (downloaded for source code reference only, not used directly)
- Maven Central: google-adk
- Documentation: ADK Java Docs
- Issue #257: Custom OpenAPI Toolset and ADK integration
- Existing Usage: ArchitecturalPlanReviewAgent.java, MultiToolAgent.java
Note: We use the Maven dependency for the actual implementation. The github/adk-java/ folder is downloaded for source code reference and documentation purposes only.
Why ADK for Page Explanation?
- Proven Framework: Already used successfully in ArchitecturalPlanReviewAgent for building code compliance
- Multi-Turn Support: Native support for iterative workflows (Generate → Reflect → Refine)
- Tool Integration: FunctionTool for custom methods, easy reflection/refinement orchestration
- State Management: Built-in session and memory management for multi-turn conversations
- Gemini Native: Designed for Gemini models with optimal integration
- Event Streaming: RxJava Flowable<Event> for reactive progress tracking
- Callbacks: Before/after hooks for logging, cost tracking, quality scoring
Phase 1 MVP: 3-Tool Architecture
PageExplanationAgent (LlmAgent)
├── Model: gemini-2.5-pro-latest (primary)
├── Temperature: 0.0 (maximum consistency)
├── Instruction: "Orchestrate Generate → Assess → Refine workflow"
│
├── Tools (Phase 1 - Core 3):
│   ├── generateExplanation() - Uses Pro, creates initial draft
│   ├── assessQuality() - Uses Flash, returns {score, confidence, gaps}
│   └── refineExplanation() - Uses Pro, improves draft
│
├── Tools (Phase 1.5 - Easy Additions):
│   ├── extractKeyInsights() - Flash, understand page first
│   └── checkCompleteness() - Flash, validate coverage
│       (Just add FunctionTool, no architecture change)
│
├── Tools (Phase 2+ - Future):
│   ├── searchBuildingCodes() - Flash + RAG
│   ├── analyzeVisualElements() - Pro, multimodal
│   ├── validateTechnicalTerms() - Flash
│   └── findRelatedPages() - Flash
│       (Add as needed, architecture supports it)
│
├── Callbacks (Observability):
│   ├── afterModelCallback - TrajectoryTrackingCallback
│   └── afterToolCallback - CostTrackingCallback
│
└── Session Management: InMemorySessionService

Extensibility Pattern: The tools array is the only change needed to add features.
ADK Implementation Pattern
Following the established pattern from ArchitecturalPlanReviewAgent:
// Similar to ArchitecturalPlanReviewAgent.java
public class PageExplanationAgent {
// Public static for ADK Dev UI compatibility
public static BaseAgent ROOT_AGENT = initAgent();
public static BaseAgent initAgent() {
return LlmAgent.builder()
.name("page_explanation_agent")
.model("gemini-2.5-pro-latest") // Primary model
.generateContentConfig(
GenerateContentConfig.builder()
.temperature(0.0F) // Maximum predictability and consistency
.build())
.description("Generates professional explanations of architectural plan pages")
.instruction(AGENT_INSTRUCTION)
.tools(
FunctionTool.create(PageExplanationAgent.class, "generateInitialDraft"),
FunctionTool.create(PageExplanationAgent.class, "reflectOnQuality"),
FunctionTool.create(PageExplanationAgent.class, "refineExplanation")
)
.afterModelCallback(new CostTrackingCallback())
.build();
}
}
Multi-Model Support with ADK
ADK doesn't natively support per-turn model switching, but we can implement it using custom tools with embedded model calls:
public class PageExplanationTools {
private final VertexAiClient vertexAi;
/**
* Tool for reflection using efficient Gemini Flash model.
* This bypasses the agent's primary model to use a cheaper model.
*/
public ReflectionResult reflectOnQuality(
@Schema(description = "The draft explanation to review") String draftMarkdown) throws IOException {
// Use Gemini Flash for cost savings (not the agent's primary Gemini Pro model)
GenerativeModel flashModel = new GenerativeModel.Builder()
.setModelName("gemini-2.0-flash-exp")
.setVertexAi(vertexAi)
.build();
String reflectionPrompt = buildReflectionPrompt(draftMarkdown);
GenerateContentResponse response = flashModel.generateContent(reflectionPrompt);
// Extract the text payload with ResponseHandler, then parse the reflection JSON
return ReflectionResult.fromJson(ResponseHandler.getText(response));
}
}
Iterative Workflow with ADK
The ADK agent naturally supports our Generate → Reflect → Refine workflow:
Iteration 1:
1. Turn 1: Agent calls generateInitialDraft() tool
2. Turn 2: Agent calls reflectOnQuality() tool (uses Flash model internally)
3. Turn 3: Agent calls refineExplanation() tool with reflection results
Iteration 2+ (if quality < threshold):
4. Turn 4: Agent calls reflectOnQuality() again
5. Turn 5: Agent calls refineExplanation() again
The agent decides when to stop based on:
- Quality score from reflection
- Max iterations reached
- Instruction-based stopping criteria
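The stopping decision described above can be captured in a tiny policy function. The threshold and names are illustrative assumptions, not ADK API:

```java
// Sketch of the agent's stopping criteria: stop once quality is good
// enough or the iteration budget is spent. Names are illustrative.
public class StoppingPolicy {
    public static boolean shouldStop(double qualityScore,
                                     int iterationsDone,
                                     double qualityThreshold,
                                     int maxIterations) {
        return qualityScore >= qualityThreshold || iterationsDone >= maxIterations;
    }
}
```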
ADK Callbacks for Tracking
public class CostTrackingCallback implements AfterModelCallbackSync {
private final CostAnalysisBuilder costBuilder = new CostAnalysisBuilder();
@Override
public Maybe<Content> call(CallbackContext callbackContext) {
// Extract token usage from Gemini response
UsageMetadata usage = callbackContext.modelResponse().usageMetadata();
// Track per-model costs
costBuilder.addTurn(
callbackContext.invocationContext().getAgent().model(),
usage.getPromptTokenCount(),
usage.getCandidatesTokenCount(),
usage.getCachedContentTokenCount()
);
// Log turn completion
logger.info("Turn {}: {} tokens ({} cached)",
costBuilder.getTurnCount(),
usage.getTotalTokenCount(),
usage.getCachedContentTokenCount());
return Maybe.empty(); // Don't modify content
}
}
Comparison: ADK vs Custom Workflow
| Aspect | Custom Implementation | ADK Implementation |
|---|---|---|
| Agent Loop | Manual orchestration | Built-in AutoFlow |
| Tool Calling | Custom logic | Native FunctionTool |
| Model Switching | Direct API calls | Tools with embedded models |
| State Management | Manual tracking | SessionService + MemoryService |
| Event Streaming | Custom events | RxJava Flowable<Event> |
| Cost Tracking | Custom | Callbacks + UsageMetadata |
| Testing | Custom harness | ADK Dev UI |
| Debugging | Logs only | Dev UI + Event traces |
Maven Configuration
Add to pom.xml:
<dependencies>
<!-- ADK Core - For agent orchestration -->
<dependency>
<groupId>com.google.adk</groupId>
<artifactId>google-adk</artifactId>
<version>0.3.0</version>
</dependency>
<!-- ADK Dev UI - For local testing (optional) -->
<dependency>
<groupId>com.google.adk</groupId>
<artifactId>google-adk-dev</artifactId>
<version>0.3.0</version>
<scope>provided</scope>
</dependency>
<!-- Vertex AI SDK - For Gemini models -->
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-aiplatform</artifactId>
<version>3.x.x</version>
</dependency>
</dependencies>
Build Command:
export JAVA_HOME=/usr/lib/jvm/temurin-23-jdk-arm64
mvn clean install
ADK Best Practices (from existing code)
Based on ArchitecturalPlanReviewAgent and Issue #257:
1. Use FunctionTool for Custom Logic:
   FunctionTool.create(PageExplanationAgent.class, "generateInitialDraft")
2. Leverage OpenAPI for External Services (if needed):
   OpenApiToolset toolset = OpenApiToolset.builder()
       .addOpenApiSpecFromFile("openapi.yaml")
       .baseUrl("http://localhost:8082")
       .build();
3. Use GenerateContentConfig for Model Settings:
   .generateContentConfig(
       GenerateContentConfig.builder()
           .temperature(0.0F) // Maximum predictability
           .build())
4. Expose ROOT_AGENT for Dev UI:
   public static BaseAgent ROOT_AGENT = initAgent();
5. Use InMemoryRunner for Execution:
   InMemoryRunner runner = new InMemoryRunner(ROOT_AGENT);
6. Handle Events with RxJava:
   Flowable<Event> events = runner.runAsync(userId, sessionId, userMsg);
   events.blockingForEach(event -> processEvent(event));
Implementation Details
Backend Implementation
1. PageExplanationService
File: src/main/java/org/codetricks/construction/code/assistant/understanding/PageExplanationService.java
package org.codetricks.construction.code.assistant.understanding;
import com.google.protobuf.Timestamp;
import org.codetricks.construction.code.assistant.FileSystemHandler;
import org.codetricks.construction.code.assistant.proto.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import java.time.Instant;
import java.util.Optional;
/**
* Service for generating and managing AI-powered page explanation.
*
* This service orchestrates the agentic workflow that transforms raw plan page
* content (OCR text + PDF) into comprehensive, educational markdown.
*
* Related:
* - PRD: docs/04-prd/plan-page-explanation.md
* - TDD: docs/05-tdd/plan-page-explanation.md
*/
@Service
public class PageExplanationService {
private static final Logger logger = LoggerFactory.getLogger(PageExplanationService.class);
private final FileSystemHandler fileSystemHandler;
private final ProjectPathResolver pathResolver;
private final AgenticPageInterpreter agenticInterpreter;
private final PromptCachingManager cachingManager;
public PageExplanationService(
FileSystemHandler fileSystemHandler,
ProjectPathResolver pathResolver,
AgenticPageInterpreter agenticInterpreter,
PromptCachingManager cachingManager) {
this.fileSystemHandler = fileSystemHandler;
this.pathResolver = pathResolver;
this.agenticInterpreter = agenticInterpreter;
this.cachingManager = cachingManager;
}
/**
* Generate page explanation with agentic workflow.
*/
public GeneratePageExplanationResponse generatePageExplanation(
GeneratePageExplanationRequest request) {
String projectId = request.getProjectId();
String fileId = request.getFileId();
int pageNumber = request.getPageNumber();
logger.info("Starting page explanation generation: project={}, file={}, page={}",
projectId, fileId, pageNumber);
try {
// 1. Check if explanation already exists (unless force regenerate)
if (!request.getForceRegenerate() && explanationExists(projectId, fileId, pageNumber)) {
logger.info("Page explanation already exists, skipping generation");
return GeneratePageExplanationResponse.newBuilder()
.setSuccess(true)
.setStatusMessage("Page explanation already exists")
.setMetadata(loadExistingMetadata(projectId, fileId, pageNumber))
.build();
}
// 2. Load page context (page.md, page.pdf, metadata.json)
PageContext pageContext = loadPageContext(projectId, fileId, pageNumber);
// 3. Update metadata to "processing" status
updateMetadataStatus(projectId, fileId, pageNumber, "processing");
// 4. Run agentic workflow
long startTime = System.currentTimeMillis();
AgenticInterpretationResult result = agenticInterpreter.explainPage(
pageContext,
request.getMaxIterations() > 0 ? request.getMaxIterations() : 3,
request.getPrimaryModel().isEmpty() ? null : request.getPrimaryModel(),
request.getEnableCaching()
);
long processingTimeMs = System.currentTimeMillis() - startTime;
// 5. Save page-explanation.md
String pageFolderPath = pathResolver.resolvePageFolderPath(projectId, pageNumber, fileId);
String explanationPath = pageFolderPath + "/page-explanation.md";
fileSystemHandler.writeFile(explanationPath, result.getFinalMarkdown());
// 6. Update metadata with results
PageExplanationMetadata metadata = buildMetadata(
result,
request.getPrimaryModel().isEmpty() ? "gemini-2.5-pro-latest" : request.getPrimaryModel(),
(int) (processingTimeMs / 1000)
);
saveMetadata(projectId, fileId, pageNumber, metadata);
logger.info("Page explanation generated successfully: tokens={}, time={}s",
result.getTotalTokensUsed(), processingTimeMs / 1000);
return GeneratePageExplanationResponse.newBuilder()
.setSuccess(true)
.setStatusMessage("Page explanation generated successfully")
.setMetadata(metadata)
.setCostAnalysis(result.getCostAnalysis()) // Aggregated across all turns
.setProcessingTimeSeconds((int) (processingTimeMs / 1000))
.build();
} catch (Exception e) {
logger.error("Failed to generate page explanation", e);
// Update metadata to "failed" status
updateMetadataStatus(projectId, fileId, pageNumber, "failed");
return GeneratePageExplanationResponse.newBuilder()
.setSuccess(false)
.setStatusMessage("Failed to generate page explanation")
.setErrorCode("GENERATION_FAILED")
.setErrorDetails(e.getMessage())
.build();
}
}
/**
* Get page explanation content (for Details tab).
*/
public Optional<String> getPageExplanation(String projectId, String fileId, int pageNumber)
throws PageNotFoundException {
String pageFolderPath = pathResolver.resolvePageFolderPath(projectId, pageNumber, fileId);
String explanationPath = pageFolderPath + "/page-explanation.md";
if (fileSystemHandler.exists(explanationPath)) {
return Optional.of(fileSystemHandler.readFile(explanationPath));
}
return Optional.empty();
}
/**
* Check if page explanation exists.
*/
private boolean explanationExists(String projectId, String fileId, int pageNumber)
throws PageNotFoundException {
String pageFolderPath = pathResolver.resolvePageFolderPath(projectId, pageNumber, fileId);
String explanationPath = pageFolderPath + "/page-explanation.md";
return fileSystemHandler.exists(explanationPath);
}
/**
* Load page context (inputs for agentic workflow).
*
* Uses ProjectPathResolver for consistent path resolution.
*/
private PageContext loadPageContext(String projectId, String fileId, int pageNumber)
throws PageNotFoundException {
// Use ProjectPathResolver for consistent path handling
String pageFolderPath = pathResolver.resolvePageFolderPath(projectId, pageNumber, fileId);
// Load page.md using ProjectPathResolver
String pageMarkdownPath = pathResolver.getPageMarkdownPath(projectId, pageNumber, fileId);
String pageMarkdown = fileSystemHandler.readFile(pageMarkdownPath);
// Load page.pdf using ProjectPathResolver
String pagePdfPath = pathResolver.getPagePdfPath(projectId, pageNumber, fileId);
byte[] pagePdfBytes = fileSystemHandler.readFileBytes(pagePdfPath);
// Load metadata.json using ProjectPathResolver
String metadataPath = pathResolver.getPageMetadataPath(projectId, pageNumber, fileId);
String metadataJson = fileSystemHandler.exists(metadataPath)
? fileSystemHandler.readFile(metadataPath)
: "{}";
return PageContext.builder()
.projectId(projectId)
.fileId(fileId)
.pageNumber(pageNumber)
.pageMarkdown(pageMarkdown)
.pagePdfBytes(pagePdfBytes)
.metadataJson(metadataJson)
.build();
}
/**
* Build metadata from agentic result.
*/
private PageExplanationMetadata buildMetadata(
AgenticInterpretationResult result,
String modelName,
int processingTimeSeconds) {
return PageExplanationMetadata.newBuilder()
.setStatus("completed")
.setGeneratedAt(Timestamp.newBuilder()
.setSeconds(Instant.now().getEpochSecond())
.build())
.setPrimaryModel(modelName)
.setIterationsCompleted(result.getIterationCount())
// Aggregated token/cost breakdown (reuses CostAnalysisMetadata)
.setCostAnalysis(result.getCostAnalysis())
.setFilePath("page-explanation.md")
.build();
}
/**
* Save metadata to metadata.json.
*/
private void saveMetadata(String projectId, String fileId, int pageNumber,
PageExplanationMetadata metadata) throws PageNotFoundException {
String metadataPath = pathResolver.getPageMetadataPath(projectId, pageNumber, fileId);
// Read existing metadata, update understanding section, write back
// (Implementation uses JSON merging logic - omitted for brevity)
logger.info("Saved page explanation metadata: {}", metadataPath);
}
/**
* Update metadata status only.
*/
private void updateMetadataStatus(String projectId, String fileId, int pageNumber, String status) {
// Similar to saveMetadata but only updates status field
logger.info("Updated page explanation status to: {}", status);
}
/**
* Load existing metadata (if already generated).
*/
private PageExplanationMetadata loadExistingMetadata(String projectId, String fileId, int pageNumber) {
// Load from metadata.json and parse understanding section
// (Implementation omitted for brevity)
return PageExplanationMetadata.newBuilder()
.setStatus("completed")
.build();
}
}
2. AgenticPageInterpreter (Agentic Workflow Engine)
File: src/main/java/org/codetricks/construction/code/assistant/understanding/AgenticPageInterpreter.java
package org.codetricks.construction.code.assistant.understanding;
import org.codetricks.construction.code.assistant.llm.LLMClient;
import org.codetricks.construction.code.assistant.llm.PromptCachingManager;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
/**
* Agentic workflow engine for iterative page explanation generation.
*
* Implements a multi-turn ReAct-inspired loop:
* 1. Generate initial understanding draft
* 2. Reflect on draft quality (identify gaps)
* 3. Refine draft with reflection insights
*
* Uses prompt caching for PDF images across turns to reduce costs.
*/
@Component
public class AgenticPageInterpreter {
private static final Logger logger = LoggerFactory.getLogger(AgenticPageInterpreter.class);
private final LLMClient llmClient;
private final PromptCachingManager cachingManager;
private final PromptTemplateLoader promptLoader;
public AgenticPageInterpreter(
LLMClient llmClient,
PromptCachingManager cachingManager,
PromptTemplateLoader promptLoader) {
this.llmClient = llmClient;
this.cachingManager = cachingManager;
this.promptLoader = promptLoader;
}
/**
* Interpret architectural plan page with multi-turn agentic workflow.
*
* @param pageContext Input context (page.md, page.pdf, metadata)
* @param maxIterations Maximum number of agentic turns (default: 3)
* @param primaryModel Model for generation/refinement (null = use default)
* @param enableCaching Use prompt caching for PDF (default: true)
* @return Final explanation markdown and metrics
*/
public AgenticInterpretationResult explainPage(
PageContext pageContext,
int maxIterations,
String modelName,
boolean enableCaching) {
logger.info("Starting agentic explanation: project={}, file={}, page={}, maxIter={}",
pageContext.getProjectId(), pageContext.getFileId(),
pageContext.getPageNumber(), maxIterations);
// Initialize result tracking
AgenticInterpretationResult.Builder resultBuilder = AgenticInterpretationResult.builder();
List<CostAnalysisMetadata> turnCostsList = new ArrayList<>();
String currentDraft = null;
String reflectionNotes = null;
try {
// TURN 1: Generate initial understanding
logger.info("TURN 1: Generating initial explanation draft");
TurnResult turn1 = generateInitialDraft(pageContext, modelName, enableCaching);
currentDraft = turn1.getOutput();
turnCostsList.add(turn1.getCostAnalysis());
logger.info("TURN 1 completed: {} tokens ({} cached), cost: ${}",
turn1.getCostAnalysis().getTotalTokens(),
turn1.getCostAnalysis().getTokenBreakdown().getCachedContent().getTokenCount(),
turn1.getCostAnalysis().getEstimatedTotalCostUsd());
// If maxIterations is 1, skip reflection and refinement
if (maxIterations <= 1) {
logger.info("maxIterations = 1, returning initial draft");
return buildFinalResult(currentDraft, turnCostsList);
}
// TURN 2: Reflect on draft quality
logger.info("TURN 2: Reflecting on draft quality");
TurnResult turn2 = reflectOnDraft(currentDraft, modelName); // a cheaper reflection model could be substituted here
reflectionNotes = turn2.getOutput();
turnCostsList.add(turn2.getCostAnalysis());
logger.info("TURN 2 completed: identified improvement areas, cost: ${}",
turn2.getCostAnalysis().getEstimatedTotalCostUsd());
// If maxIterations is 2, return draft with reflection logged
if (maxIterations <= 2) {
logger.info("maxIterations = 2, returning draft after reflection");
return buildFinalResult(currentDraft, turnCostsList);
}
// TURN 3: Refine with reflection insights
logger.info("TURN 3: Refining draft with reflection insights");
TurnResult turn3 = refineWithReflection(
pageContext, currentDraft, reflectionNotes, modelName, enableCaching);
currentDraft = turn3.getOutput();
turnCostsList.add(turn3.getCostAnalysis());
logger.info("TURN 3 completed: final explanation generated, cost: ${}",
turn3.getCostAnalysis().getEstimatedTotalCostUsd());
// Additional iterations (if maxIterations > 3)
for (int i = 4; i <= maxIterations; i++) {
logger.info("TURN {}: Additional refinement iteration", i);
// Reflect again
TurnResult reflectAgain = reflectOnDraft(currentDraft, modelName);
reflectionNotes = reflectAgain.getOutput();
turnCostsList.add(reflectAgain.getCostAnalysis());
// Refine again
TurnResult refineAgain = refineWithReflection(
pageContext, currentDraft, reflectionNotes, modelName, enableCaching);
currentDraft = refineAgain.getOutput();
turnCostsList.add(refineAgain.getCostAnalysis());
logger.info("TURN {} completed", i);
}
return buildFinalResult(currentDraft, turnCostsList);
} catch (Exception e) {
logger.error("Agentic explanation failed", e);
throw new RuntimeException("Failed to interpret page", e);
}
}
/**
* TURN 1: Generate initial understanding draft.
*/
private TurnResult generateInitialDraft(
PageContext pageContext,
String modelName,
boolean enableCaching) {
// Load prompt template
String promptTemplate = promptLoader.loadTemplate("page-understanding-generate.txt");
// Build prompt with page context
String prompt = promptTemplate
.replace("{{PAGE_MARKDOWN}}", pageContext.getPageMarkdown())
.replace("{{PROJECT_ID}}", pageContext.getProjectId())
.replace("{{FILE_ID}}", pageContext.getFileId())
.replace("{{PAGE_NUMBER}}", String.valueOf(pageContext.getPageNumber()));
// Prepare LLM request with PDF image
// Uses the primary model for quality-critical generation
LLMRequest request = LLMRequest.builder()
.modelName(modelName)
.systemPrompt(promptLoader.loadTemplate("page-understanding-system.txt"))
.userPrompt(prompt)
.imageBytes(pageContext.getPagePdfBytes())
.imageMediaType("application/pdf")
.enableCaching(enableCaching)
.cacheImageMarker(true) // Mark PDF for caching
.maxTokens(4000)
.temperature(0.0) // Maximum predictability and consistency
.build();
// Call LLM
LLMResponse response = llmClient.generate(request);
// Return result with CostAnalysisMetadata
return TurnResult.builder()
.turnNumber(1)
.output(response.getContent())
.costAnalysis(response.getCostAnalysisMetadata()) // From LLM response
.build();
}
/**
* TURN 2: Reflect on draft quality.
*
* MODEL SELECTION: Uses REFLECTION model (cost-optimizable)
* - This turn performs analytical quality assessment
* - Requires: Structured analysis, gap identification, scoring
* - Best models: efficient-tier models such as Gemini Flash
* - Cost: ~$0.002/turn (10-20x cheaper than premium)
*
* OPTIMIZATION OPPORTUNITY:
* Using Gemini Flash instead of Pro for reflection saves ~50% on total cost
* with minimal quality impact (reflection is analytical, not creative).
*/
private TurnResult reflectOnDraft(String draft, String reflectionModel) {
// Load reflection prompt
String promptTemplate = promptLoader.loadTemplate("page-understanding-reflect.txt");
String prompt = promptTemplate.replace("{{DRAFT_MARKDOWN}}", draft);
// Prepare LLM request (no image needed for reflection)
// Uses REFLECTION model (can be cheaper than primary for cost savings)
LLMRequest request = LLMRequest.builder()
.modelName(reflectionModel)
.systemPrompt(promptLoader.loadTemplate("page-understanding-reflect-system.txt"))
.userPrompt(prompt)
.maxTokens(2000)
.temperature(0.0) // Consistent temperature for predictability
.build();
// Call LLM
LLMResponse response = llmClient.generate(request);
// Return result with CostAnalysisMetadata
return TurnResult.builder()
.turnNumber(2)
.output(response.getContent())
.costAnalysis(response.getCostAnalysisMetadata()) // From LLM response
.build();
}
/**
* TURN 3+: Refine draft with reflection insights.
*
* MODEL SELECTION: Uses PRIMARY model (quality-critical)
* - This turn improves explanation based on reflection feedback
* - Requires: Creative refinement, maintaining tone, addressing gaps
* - Best models: premium-tier models such as Gemini Pro
* - Cost: ~$0.05/turn (expensive but essential for quality)
*
* Note: Must use same model as Turn 1 (Generate) to maintain consistent
* writing style, tone, and quality throughout the explanation.
*/
private TurnResult refineWithReflection(
PageContext pageContext,
String draft,
String reflectionNotes,
String primaryModel,
boolean enableCaching) {
// Load refinement prompt
String promptTemplate = promptLoader.loadTemplate("page-understanding-refine.txt");
String prompt = promptTemplate
.replace("{{DRAFT_MARKDOWN}}", draft)
.replace("{{REFLECTION_NOTES}}", reflectionNotes)
.replace("{{PAGE_MARKDOWN}}", pageContext.getPageMarkdown());
// Prepare LLM request with PDF image (reuse cache)
// Uses PRIMARY model for quality-critical refinement
LLMRequest request = LLMRequest.builder()
.modelName(primaryModel)
.systemPrompt(promptLoader.loadTemplate("page-understanding-system.txt"))
.userPrompt(prompt)
.imageBytes(pageContext.getPagePdfBytes())
.imageMediaType("application/pdf")
.enableCaching(enableCaching)
.cacheImageMarker(true) // Reuse cached PDF
.maxTokens(5000)
.temperature(0.0) // Maximum predictability
.build();
// Call LLM
LLMResponse response = llmClient.generate(request);
// Return result with CostAnalysisMetadata
return TurnResult.builder()
.turnNumber(3)
.output(response.getContent())
.costAnalysis(response.getCostAnalysisMetadata()) // From LLM response
.build();
}
/**
* Build final result from turn cost analyses.
*
* Aggregates CostAnalysisMetadata from all turns using MetaCostAnalysis
* for proper per-model cost tracking.
*/
private AgenticInterpretationResult buildFinalResult(
String finalMarkdown,
List<CostAnalysisMetadata> turnCostsList) {
// Use MetaCostAnalysis to aggregate per-model costs
MetaCostAnalysis.Builder metaBuilder = MetaCostAnalysis.newBuilder();
double totalCost = 0.0;
int totalTokens = 0;
// Aggregate by model
Map<String, CostAnalysisMetadata.Builder> modelCosts = new HashMap<>();
for (CostAnalysisMetadata turnCost : turnCostsList) {
String model = turnCost.getModel();
totalCost += turnCost.getEstimatedTotalCostUsd();
totalTokens += turnCost.getTotalTokens();
// Merge into per-model aggregation
// (Implementation omitted for brevity - would merge token breakdowns)
}
MetaCostAnalysis metaCost = metaBuilder
.setTotalCostUsd(totalCost)
.setTotalTokens(totalTokens)
.setPrimaryModel(turnCostsList.get(0).getModel())
.build();
return AgenticInterpretationResult.builder()
.finalMarkdown(finalMarkdown)
.iterationCount(turnCostsList.size())
.metaCostAnalysis(metaCost)
.turnCosts(turnCostsList)
.build();
}
}
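The per-model merge that `buildFinalResult` leaves "omitted for brevity" could look roughly like the sketch below. `TurnCost` and `ModelCost` are simplified stand-ins for the proto types (`CostAnalysisMetadata`, the per-model entries of `MetaCostAnalysis`), introduced here purely for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CostAggregationSketch {
    // Simplified stand-in for CostAnalysisMetadata (illustration only).
    record TurnCost(String model, int totalTokens, double costUsd) {}

    // Aggregated per-model view (stand-in for a MetaCostAnalysis entry).
    record ModelCost(int totalTokens, double costUsd) {}

    /** Merge turn-level costs into one entry per model, preserving first-seen order. */
    static Map<String, ModelCost> aggregateByModel(List<TurnCost> turns) {
        Map<String, ModelCost> byModel = new LinkedHashMap<>();
        for (TurnCost turn : turns) {
            byModel.merge(
                turn.model(),
                new ModelCost(turn.totalTokens(), turn.costUsd()),
                (a, b) -> new ModelCost(a.totalTokens() + b.totalTokens(),
                                        a.costUsd() + b.costUsd()));
        }
        return byModel;
    }

    public static void main(String[] args) {
        List<TurnCost> turns = new ArrayList<>(List.of(
            new TurnCost("gemini-pro", 50_000, 0.050),    // Turn 1: generate
            new TurnCost("gemini-flash", 8_000, 0.002),   // Turn 2: reflect
            new TurnCost("gemini-pro", 55_000, 0.055)));  // Turn 3: refine
        Map<String, ModelCost> merged = aggregateByModel(turns);
        System.out.println(merged.get("gemini-pro").totalTokens()); // 105000
    }
}
```

In the real implementation the same fold would merge full token breakdowns (input, output, cached) rather than a single token count.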
3. Prompt Templates
File: src/main/resources/prompts/page-understanding-generate.txt
You are an expert architectural plan interpreter. Your task is to generate a comprehensive,
educational explanation of an architectural plan page that makes it accessible to beginners
and non-industry experts.
# Input Context
**Project ID:** {{PROJECT_ID}}
**File ID:** {{FILE_ID}}
**Page Number:** {{PAGE_NUMBER}}
## Raw OCR Text
{{PAGE_MARKDOWN}}
## PDF Image
[Attached: Full-resolution PDF page image]
# Your Task
Generate a rich, educational markdown document that explains this plan page comprehensively.
Your explanation should:
1. **Be Beginner-Friendly**: Use simple language, define technical terms inline, and explain
architectural concepts as if teaching someone new to the field.
2. **Be Comprehensive**: Cover all major elements visible on the page - drawings, tables,
legends, annotations, title blocks, etc.
3. **Explain Relationships**: Show how elements connect (e.g., how zoning requirements affect
building setbacks, how legends map to drawing symbols).
4. **Provide Context**: Explain what each section means in the broader context of construction
and building codes.
5. **Use Rich Markdown**: Structure with headings, lists, tables, blockquotes for definitions,
and emphasis where helpful.
6. **Include Visual Descriptions**: Describe what the drawings show, not just the text.
# Output Format
Generate markdown with the following structure:
```markdown
# [Page Title/Name]
## Overview
Brief introduction to what this page contains and its purpose.
## Key Information
### [Section 1]
Detailed explanation of the first major section...
**Technical Term**: Definition inline for beginners.
### [Section 2]
...
## Understanding the Drawings
Describe visual elements, symbols, and what they represent...
## Architectural Concepts Explained
Explain any complex concepts for beginners...
## Code Compliance Considerations
If relevant, explain how this relates to building codes...
## Summary
Recap the most important takeaways from this page.
```

# Guidelines
- Assume the reader has NO architectural background
- Define ALL technical terms when first used
- Use analogies when explaining complex concepts
- Be thorough but not overwhelming
- Focus on understanding, not just description
- Make it educational and engaging
Generate the educational markdown now:
**File**: `src/main/resources/prompts/page-understanding-reflect.txt`
```text
You are a quality reviewer for educational architectural content. Your task is to review
the following page explanation draft and identify areas for improvement.
# Draft to Review
{{DRAFT_MARKDOWN}}
# Your Task
Analyze this draft and identify:
1. **Gaps in Coverage**: What important elements from the page are missing or under-explained?
2. **Clarity Issues**: Where is the language unclear, too technical, or confusing for beginners?
3. **Missing Context**: Where could relationships between elements be explained better?
4. **Definition Gaps**: Are there technical terms that need inline definitions?
5. **Structure Issues**: Could the organization be improved for better readability?
6. **Educational Value**: Where could the content be more engaging or educational?
# Output Format
Provide your reflection as structured JSON:
```json
{
"gaps": [
"Missing explanation of X",
"Section Y needs more detail on Z"
],
"clarity_issues": [
"Term 'ABC' is not defined",
"Paragraph about DEF is too technical"
],
"missing_context": [
"Relationship between X and Y not explained",
"How Z affects building design unclear"
],
"structure_suggestions": [
"Consider adding a subsection for X",
"Reorder sections Y and Z for better flow"
],
"overall_assessment": "Brief summary of draft quality and main improvement areas"
}
```

Generate your reflection now:
```

**File**: `src/main/resources/prompts/page-understanding-refine.txt`
```text
You are an expert architectural plan interpreter refining an educational explanation based
on quality feedback.
# Original Draft
{{DRAFT_MARKDOWN}}
# Reflection and Improvement Areas
{{REFLECTION_NOTES}}
# Original Raw Content (for reference)
{{PAGE_MARKDOWN}}
## PDF Image (for reference)
[Attached: Full-resolution PDF page image]
# Your Task
Improve the draft by addressing the identified issues:
1. Fill gaps in coverage
2. Clarify unclear sections
3. Add missing context and relationships
4. Define missing technical terms inline
5. Improve structure if needed
6. Enhance educational value
# Guidelines
- Keep what works well in the original draft
- Focus improvements on the identified issues
- Maintain beginner-friendly language
- Ensure comprehensive coverage
- Make it engaging and educational
# Output Format
Generate the improved markdown (full document, not just changes):
```markdown
[Your improved, comprehensive page explanation here]
```

Generate the refined understanding now:
```
### Frontend Implementation
#### 1. Update PageViewerComponent
**File**: `web-ng-m3/src/app/components/page-viewer/page-viewer.component.ts`
```typescript
import { Component, OnInit, Input } from '@angular/core';
import { ArchitecturalPlanService } from '../../shared/architectural-plan.service';
import { ArchitecturalPlanPage, PageExplanationMetadata } from '../../shared/proto/api';
@Component({
selector: 'app-page-viewer',
templateUrl: './page-viewer.component.html',
styleUrls: ['./page-viewer.component.scss']
})
export class PageViewerComponent implements OnInit {
@Input() projectId!: string;
@Input() fileId!: string;
@Input() pageNumber!: number;
// Tab state
selectedTabIndex = 0; // 0: Overview, 1: Preview, 2: Compliance, 3: Details (NEW)
// Page data
page?: ArchitecturalPlanPage;
// Details tab state (NEW)
understandingMarkdown?: string;
understandingMetadata?: PageExplanationMetadata;
understandingLoading = false;
understandingError?: string;
constructor(private planService: ArchitecturalPlanService) {}
ngOnInit(): void {
this.loadPage();
}
/**
* Load page data (including understanding if available).
*/
loadPage(): void {
this.planService.getArchitecturalPlanPage(
this.projectId,
this.fileId,
this.pageNumber,
true // include_understanding = true
).subscribe({
next: (page) => {
this.page = page;
// Check if understanding is available
if (page.hasUnderstanding && page.understandingMarkdown) {
this.understandingMarkdown = page.understandingMarkdown;
this.understandingMetadata = page.understandingMetadata;
}
},
error: (err) => {
console.error('Failed to load page', err);
}
});
}
/**
* Handle Details tab selection (lazy load if needed).
*/
onTabChange(tabIndex: number): void {
this.selectedTabIndex = tabIndex;
// If Details tab (index 3) and understanding not loaded yet
if (tabIndex === 3 && !this.understandingMarkdown && !this.understandingError) {
this.loadUnderstanding();
}
}
/**
* Load page explanation (triggered by Details tab selection).
*/
loadUnderstanding(): void {
// If metadata says it's processing, show loading state
if (this.understandingMetadata?.status === 'processing') {
this.understandingLoading = true;
// Poll for completion (or use WebSocket for real-time updates)
this.pollUnderstandingStatus();
return;
}
// If metadata says it's failed, show error
if (this.understandingMetadata?.status === 'failed') {
this.understandingError = 'Failed to generate page explanation';
return;
}
// Otherwise, trigger generation if not exists
if (!this.page?.hasUnderstanding) {
this.generateUnderstanding();
}
}
/**
* Trigger page explanation generation.
*/
generateUnderstanding(): void {
this.understandingLoading = true;
this.understandingError = undefined;
this.planService.generatePageExplanation(
this.projectId,
this.fileId,
this.pageNumber
).subscribe({
next: (response) => {
if (response.success) {
// Reload page to get understanding content
this.loadPage();
} else {
this.understandingError = response.statusMessage;
}
this.understandingLoading = false;
},
error: (err) => {
console.error('Failed to generate understanding', err);
this.understandingError = 'Failed to generate page explanation';
this.understandingLoading = false;
}
});
}
/**
* Poll for understanding generation completion.
*/
pollUnderstandingStatus(): void {
const pollInterval = setInterval(() => {
this.planService.getPageExplanationStatus(
this.projectId,
this.fileId,
this.pageNumber
).subscribe({
next: (metadata) => {
if (metadata.status === 'completed') {
clearInterval(pollInterval);
this.loadPage(); // Reload to get content
this.understandingLoading = false;
} else if (metadata.status === 'failed') {
clearInterval(pollInterval);
this.understandingError = 'Failed to generate page explanation';
this.understandingLoading = false;
}
},
error: (err) => {
console.error('Failed to poll status', err);
clearInterval(pollInterval);
this.understandingError = 'Failed to check generation status';
this.understandingLoading = false;
}
});
}, 5000); // Poll every 5 seconds
}
}
File: web-ng-m3/src/app/components/page-viewer/page-viewer.component.html
<mat-card class="page-viewer-card">
<!-- Tab Group with NEW Details tab -->
<mat-tab-group [(selectedIndex)]="selectedTabIndex" (selectedTabChange)="onTabChange($event.index)">
<!-- Overview Tab (existing) -->
<mat-tab label="Overview">
<div class="tab-content">
<app-page-overview [page]="page"></app-page-overview>
</div>
</mat-tab>
<!-- Preview Tab (existing) -->
<mat-tab label="Preview">
<div class="tab-content">
<app-page-preview [page]="page"></app-page-preview>
</div>
</mat-tab>
<!-- Compliance Tab (existing) -->
<mat-tab label="Compliance">
<div class="tab-content">
<app-page-compliance [page]="page"></app-page-compliance>
</div>
</mat-tab>
<!-- Details Tab (NEW) -->
<mat-tab label="Details">
<div class="tab-content details-tab">
<!-- Loading State -->
<div *ngIf="understandingLoading" class="loading-state">
<mat-spinner diameter="40"></mat-spinner>
<p>Generating detailed page explanation...</p>
<p class="loading-hint">This may take 1-2 minutes. AI is analyzing the page content.</p>
</div>
<!-- Error State -->
<div *ngIf="understandingError && !understandingLoading" class="error-state">
<mat-icon color="warn">error</mat-icon>
<p>{{ understandingError }}</p>
<button mat-raised-button color="primary" (click)="generateUnderstanding()">
<mat-icon>refresh</mat-icon> Retry
</button>
</div>
<!-- Content State -->
<div *ngIf="understandingMarkdown && !understandingLoading" class="understanding-content">
<!-- Metadata Banner -->
<div class="metadata-banner">
<mat-icon>auto_awesome</mat-icon>
<span>AI-generated explanation</span>
<span class="metadata-details">
Generated {{ understandingMetadata?.generatedAt | date:'short' }} |
{{ understandingMetadata?.iterationsCompleted }} iterations
</span>
</div>
<!-- Markdown Content -->
<markdown [data]="understandingMarkdown" class="markdown-content"></markdown>
</div>
<!-- Empty State (no understanding available, not generating) -->
<div *ngIf="!understandingMarkdown && !understandingLoading && !understandingError" class="empty-state">
<mat-icon>description</mat-icon>
<h3>Details Not Yet Generated</h3>
<p>AI-powered page explanation has not been generated for this page yet.</p>
<button mat-raised-button color="primary" (click)="generateUnderstanding()">
<mat-icon>auto_awesome</mat-icon> Generate Details
</button>
</div>
</div>
</mat-tab>
</mat-tab-group>
</mat-card>
File: web-ng-m3/src/app/components/page-viewer/page-viewer.component.scss
.page-viewer-card {
margin: 16px;
}
.tab-content {
padding: 24px;
min-height: 400px;
}
.details-tab {
.loading-state,
.error-state,
.empty-state {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
min-height: 400px;
text-align: center;
mat-icon {
font-size: 48px;
width: 48px;
height: 48px;
margin-bottom: 16px;
}
p {
margin: 8px 0;
color: #666;
}
.loading-hint {
font-size: 0.875rem;
font-style: italic;
}
button {
margin-top: 16px;
}
}
.understanding-content {
.metadata-banner {
display: flex;
align-items: center;
gap: 8px;
padding: 12px 16px;
background-color: #e3f2fd;
border-left: 4px solid #2196f3;
margin-bottom: 24px;
border-radius: 4px;
mat-icon {
color: #2196f3;
}
.metadata-details {
margin-left: auto;
font-size: 0.875rem;
color: #666;
}
}
.markdown-content {
// Markdown styling
font-size: 1rem;
line-height: 1.6;
h1, h2, h3, h4, h5, h6 {
margin-top: 1.5em;
margin-bottom: 0.5em;
font-weight: 600;
}
h1 { font-size: 2rem; border-bottom: 2px solid #e0e0e0; padding-bottom: 0.3em; }
h2 { font-size: 1.5rem; border-bottom: 1px solid #e0e0e0; padding-bottom: 0.3em; }
h3 { font-size: 1.25rem; }
h4 { font-size: 1.1rem; }
p {
margin-bottom: 1em;
}
ul, ol {
margin-bottom: 1em;
padding-left: 2em;
}
li {
margin-bottom: 0.5em;
}
table {
width: 100%;
border-collapse: collapse;
margin-bottom: 1em;
th, td {
border: 1px solid #e0e0e0;
padding: 8px 12px;
text-align: left;
}
th {
background-color: #f5f5f5;
font-weight: 600;
}
}
blockquote {
border-left: 4px solid #2196f3;
padding-left: 16px;
margin-left: 0;
color: #666;
font-style: italic;
}
code {
background-color: #f5f5f5;
padding: 2px 6px;
border-radius: 3px;
font-family: 'Courier New', monospace;
font-size: 0.9em;
}
pre {
background-color: #f5f5f5;
padding: 12px;
border-radius: 4px;
overflow-x: auto;
code {
background-color: transparent;
padding: 0;
}
}
strong, b {
font-weight: 600;
color: #000;
}
}
}
}
CLI Tools
Local Generation Script
File: scripts/generate-page-explanation.sh
#!/bin/bash
################################################################################
# Generate Page Understanding (Local Development)
#
# Generates AI-powered page explanation for architectural plan pages using
# local project folders. Supports rapid iteration without cloud deployments.
#
# Usage:
# ./scripts/generate-page-explanation.sh --project-path=PATH [OPTIONS]
#
# Examples:
# # Generate for all pages in a project
# ./scripts/generate-page-explanation.sh \
# --project-path=projects/R2024.0091-2024-10-14
#
# # Generate for specific file and pages
# ./scripts/generate-page-explanation.sh \
# --project-path=projects/R2024.0091-2024-10-14 \
# --file-id=1 \
# --page-numbers=1,2,3
#
# # Force regeneration with verbose logging
# ./scripts/generate-page-explanation.sh \
# --project-path=projects/R2024.0091-2024-10-14 \
# --force \
# --verbose
#
# Prerequisites:
# - Java 17+ (Temurin 23 in dev container)
# - Maven 3.8+
# - Vertex AI credentials configured
# - Project structure: files/{file_id}/pages/{page_number}/
#
# What it does:
# 1. Validates project path and structure
# 2. Discovers pages to process
# 3. Calls PageExplanationService for each page
# 4. Generates page-explanation.md files
# 5. Updates metadata.json with generation status
# 6. Outputs summary (pages processed, tokens used, time)
################################################################################
set -e # Exit on any error
# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Helper functions
log_info() { echo -e "${BLUE}ℹ️  $1${NC}"; }
log_success() { echo -e "${GREEN}✅ $1${NC}"; }
log_warning() { echo -e "${YELLOW}⚠️  $1${NC}"; }
log_error() { echo -e "${RED}❌ $1${NC}"; }
log_section() {
echo ""
echo -e "${BLUE}================================================${NC}"
echo -e "${BLUE}$1${NC}"
echo -e "${BLUE}================================================${NC}"
echo ""
}
# Default values
PROJECT_PATH=""
FILE_ID=""
PAGE_NUMBERS=""
FORCE=false
VERBOSE=false
MAX_ITERATIONS=3
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--project-path=*)
PROJECT_PATH="${1#*=}"
shift
;;
--file-id=*)
FILE_ID="${1#*=}"
shift
;;
--page-numbers=*)
PAGE_NUMBERS="${1#*=}"
shift
;;
--force)
FORCE=true
shift
;;
--verbose)
VERBOSE=true
shift
;;
--max-iterations=*)
MAX_ITERATIONS="${1#*=}"
shift
;;
*)
log_error "Unknown argument: $1"
exit 1
;;
esac
done
# Validate required arguments
if [ -z "$PROJECT_PATH" ]; then
log_error "Missing required argument: --project-path"
echo "Usage: $0 --project-path=PATH [OPTIONS]"
exit 1
fi
# Validate project path exists
if [ ! -d "$PROJECT_PATH" ]; then
log_error "Project path does not exist: $PROJECT_PATH"
exit 1
fi
log_section "Page Understanding Generation"
log_info "Project: $PROJECT_PATH"
log_info "File ID: ${FILE_ID:-all files}"
log_info "Page Numbers: ${PAGE_NUMBERS:-all pages}"
log_info "Force Regenerate: $FORCE"
log_info "Verbose Logging: $VERBOSE"
log_info "Max Iterations: $MAX_ITERATIONS"
# Build Java command
log_section "Building Maven Project"
export JAVA_HOME=/usr/lib/jvm/temurin-23-jdk-arm64
mvn clean install -DskipTests
# Run generation (using Spring Boot CLI runner or direct service call)
log_section "Generating Page Understanding"
java -cp "target/classes:target/dependency/*" \
org.codetricks.construction.code.assistant.cli.GeneratePageExplanationCLI \
--project-path="$PROJECT_PATH" \
--file-id="$FILE_ID" \
--page-numbers="$PAGE_NUMBERS" \
--force="$FORCE" \
--verbose="$VERBOSE" \
--max-iterations="$MAX_ITERATIONS"
log_success "Page understanding generation complete!"
Project Upgrade and Generation Script
File: scripts/upgrade-project-and-generate.sh
#!/bin/bash
################################################################################
# Upgrade Project to Multi-File Structure and Generate Understanding
#
# Combines project upgrade with page explanation generation for testing.
#
# Usage:
# ./scripts/upgrade-project-and-generate.sh \
# --source-project=SOURCE \
# --target-project=TARGET
#
# Example:
# ./scripts/upgrade-project-and-generate.sh \
# --source-project=projects/R2024.0091-2024-10-14 \
# --target-project=projects/R2024.0091-test-copy
################################################################################
set -e
# ... (Similar structure to above, omitted for brevity) ...
# 1. Copy project
log_section "Copying Project"
cp -r "$SOURCE_PROJECT" "$TARGET_PROJECT"
# 2. Upgrade to multi-file structure
log_section "Upgrading to Multi-File Structure"
./scripts/migrate-to-multi-file.sh --project-path="$TARGET_PROJECT"
# 3. Generate page explanation
log_section "Generating Page Understanding"
./scripts/generate-page-explanation.sh --project-path="$TARGET_PROJECT"
log_success "Project upgraded and understanding generated!"
Deployment Guide
Step 1: Build and Test Locally
# 1. Set Java environment
export JAVA_HOME=/usr/lib/jvm/temurin-23-jdk-arm64
# 2. Build project
mvn clean install
# 3. Run unit tests
mvn test -Dtest=PageExplanationServiceTest
# 4. Test local generation
./scripts/generate-page-explanation.sh \
--project-path=projects/R2024.0091-2024-10-14 \
--file-id=1 \
--page-numbers=3 \
--verbose
Step 2: Deploy Backend to Cloud Run
# 1. Build Docker image
gcloud builds submit --tag gcr.io/PROJECT_ID/architectural-plan-service
# 2. Deploy to Cloud Run
gcloud run deploy architectural-plan-service \
--image gcr.io/PROJECT_ID/architectural-plan-service \
--platform managed \
--region us-central1 \
--allow-unauthenticated
# 3. Verify deployment
curl https://YOUR_CLOUD_RUN_URL/health
Step 3: Deploy Frontend to Cloud Storage
# 1. Build Angular app
cd web-ng-m3
npm run build
# 2. Deploy to Cloud Storage
gsutil -m rsync -r -d dist/web-ng-m3 gs://YOUR_BUCKET/
# 3. Invalidate CDN cache (if using Cloud CDN)
gcloud compute url-maps invalidate-cdn-cache URL_MAP_NAME --path "/*"
Step 4: Test End-to-End
- Open application in browser
- Navigate to a plan page
- Click "Details" tab
- Verify understanding generation or display
- Check browser console for errors
- Verify markdown rendering
Performance Optimizations
1. Prompt Caching
// Cache PDF image across all turns to reduce costs by 90%
LLMRequest request = LLMRequest.builder()
.imageBytes(pagePdfBytes)
.enableCaching(true)
.cacheImageMarker(true) // Mark for caching
.build();
// Subsequent turns reuse cached image
// Cost: $0.30/1M tokens (cached) vs $3.00/1M (non-cached)
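To make the quoted rates concrete, the arithmetic below works through a three-turn run for a hypothetical 50,000-token PDF page; the token count is an assumption for illustration, while the $3.00/1M and $0.30/1M rates are the figures quoted above:

```java
public class CachingCostSketch {
    static final double UNCACHED_PER_M = 3.00; // $/1M input tokens (rate quoted above)
    static final double CACHED_PER_M = 0.30;   // $/1M cached tokens (rate quoted above)

    /** Cost of resending the PDF tokens in full on every one of n turns (no cache). */
    static double uncachedCost(int pdfTokens, int turns) {
        return turns * pdfTokens / 1_000_000.0 * UNCACHED_PER_M;
    }

    /** With caching: the first turn pays the full rate, later turns read the cache. */
    static double cachedCost(int pdfTokens, int turns) {
        return pdfTokens / 1_000_000.0 * UNCACHED_PER_M
             + (turns - 1) * pdfTokens / 1_000_000.0 * CACHED_PER_M;
    }

    public static void main(String[] args) {
        int pdfTokens = 50_000; // hypothetical page size
        System.out.printf("uncached: $%.3f%n", uncachedCost(pdfTokens, 3)); // $0.450
        System.out.printf("cached:   $%.3f%n", cachedCost(pdfTokens, 3));   // $0.180
    }
}
```

Each cached read is 10x cheaper per token (the 90% per-token saving above); the blended saving for a 3-turn run is lower because the first turn still pays full rate.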
2. Batch Processing
// Process multiple pages with same PDF (file-level batching)
public void batchGenerateForFile(String projectId, String fileId, List<Integer> pageNumbers) {
// Load PDF once
byte[] filePdfBytes = loadFilePdf(projectId, fileId);
// Cache PDF at file level
String cacheKey = cachingManager.cachePdf(filePdfBytes);
// Process each page with cached PDF
for (int pageNumber : pageNumbers) {
processPage(projectId, fileId, pageNumber, cacheKey);
}
}
3. Asynchronous Processing
// Use Cloud Run Jobs for background processing
@Async
public CompletableFuture<GeneratePageExplanationResponse> generateAsync(
GeneratePageExplanationRequest request) {
GeneratePageExplanationResponse response = generatePageExplanation(request);
return CompletableFuture.completedFuture(response);
}
Security Implementation
1. Access Control
// Verify user has access to project before generating
@PreAuthorize("hasProjectAccess(#request.projectId)")
public GeneratePageExplanationResponse generatePageExplanation(
GeneratePageExplanationRequest request) {
// ...
}
2. Rate Limiting
// Limit generation requests per user to prevent abuse
@RateLimited(maxRequests = 10, windowSeconds = 3600)
public GeneratePageExplanationResponse generatePageExplanation(
GeneratePageExplanationRequest request) {
// ...
}
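`@RateLimited` appears to be a custom annotation; one minimal backing strategy it could delegate to is a per-user fixed-window counter. The sketch below is an illustration under that assumption, not the project's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal fixed-window rate limiter that a custom @RateLimited
 * interceptor could delegate to (sketch only).
 */
public class FixedWindowRateLimiter {
    private record Window(long windowStartMs, int count) {}

    private final Map<String, Window> windows = new ConcurrentHashMap<>();
    private final int maxRequests;
    private final long windowMs;

    public FixedWindowRateLimiter(int maxRequests, long windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMs = windowSeconds * 1000;
    }

    /** Returns true if this user's request is allowed in the current window. */
    public boolean tryAcquire(String userId, long nowMs) {
        Window updated = windows.compute(userId, (id, w) -> {
            if (w == null || nowMs - w.windowStartMs() >= windowMs) {
                return new Window(nowMs, 1); // window expired: start a fresh one
            }
            return new Window(w.windowStartMs(), w.count() + 1);
        });
        return updated.count() <= maxRequests;
    }
}
```

With `maxRequests = 10, windowSeconds = 3600` this matches the annotation parameters shown above; a sliding-window or token-bucket variant would smooth out bursts at window boundaries.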
Observability and Trajectory Tracking
Agent Trajectory Capture
Implementation: Capture complete agent execution trace using AgentTrajectory proto.
File: src/main/proto/features/agent_trajectory.proto
Integration with Existing Infrastructure:
- Reuses existing `LlmTrace` for individual LLM calls
- Extends with iteration and tool tracking
- Stores in BigQuery + GCS for dual access patterns
Trajectory Builder
public class AgentTrajectoryBuilder {
private final String trajectoryId;
private final List<AgentIteration> iterations = new ArrayList<>();
private final List<AgentTurn> currentIterationTurns = new ArrayList<>();
private AgentIteration currentIteration;
private final Instant startedAt;
public void startIteration(int iterationNumber) {
currentIteration = AgentIteration.newBuilder()
.setIterationNumber(iterationNumber)
.setStartedAt(Timestamps.fromMillis(System.currentTimeMillis()))
.build();
}
public void recordLlmTurn(int turnNumber, String phaseName, LlmTrace llmTrace) {
AgentTurn turn = AgentTurn.newBuilder()
.setTurnNumber(turnNumber)
.setIterationNumber(currentIteration.getIterationNumber())
.setTurnType(TurnType.LLM_CALL)
.setLlmTurn(LlmTurn.newBuilder()
.setModelName(llmTrace.getModelName())
.setPhaseName(phaseName)
.setLlmTrace(llmTrace)
.setInputTokens(llmTrace.getUsageMetadata().getPromptTokenCount())
.setOutputTokens(llmTrace.getUsageMetadata().getCandidatesTokenCount())
.setCachedTokens(llmTrace.getUsageMetadata().getCachedContentTokenCount())
.build())
.build();
currentIterationTurns.add(turn);
}
public AgentTrajectory build() {
return AgentTrajectory.newBuilder()
.setTrajectoryId(trajectoryId)
.addAllIterations(iterations)
.setTotalTurns(getTotalTurnCount())
.setCostAnalysis(aggregateCosts())
.build();
}
}
ADK Callbacks for Trajectory Capture
public class TrajectoryTrackingCallback implements AfterModelCallbackSync {
private final AgentTrajectoryBuilder trajectoryBuilder;
private int turnCounter = 0;
@Override
public Maybe<Content> call(CallbackContext ctx) {
turnCounter++;
// Extract LLM trace from context
LlmTrace llmTrace = buildLlmTraceFromContext(ctx);
// Determine phase from agent state
String phase = determinePhaseName(ctx); // "GENERATE", "REFLECT", "REFINE"
// Record in trajectory
trajectoryBuilder.recordLlmTurn(turnCounter, phase, llmTrace);
// Also log to BigQuery (existing infrastructure)
llmLogTracer.logTrace(llmTrace);
return Maybe.empty();
}
    private String determinePhaseName(CallbackContext ctx) {
        // Infer the phase from the most recent tool call (or prompt content).
        // Expected keywords: "generate", "reflect", "refine".
        String lastToolCalled = ctx.invocationContext().getLastToolName();
        if (lastToolCalled == null) return "UNKNOWN";
        String name = lastToolCalled.toLowerCase();
        if (name.contains("generate")) return "GENERATE";
        if (name.contains("reflect")) return "REFLECT";
        if (name.contains("refine")) return "REFINE";
        return "UNKNOWN";
    }
}
Trajectory Storage
GCS Path: projects/{projectId}/traces/page-explanation/{trajectory_id}.json
JSON Export:
public String exportTrajectoryAsJson(AgentTrajectory trajectory, boolean prettyPrint)
    throws InvalidProtocolBufferException {
  // JsonFormat.printer() pretty-prints by default; compact output must
  // explicitly omit insignificant whitespace.
  JsonFormat.Printer printer = JsonFormat.printer();
  if (!prettyPrint) {
    printer = printer.omittingInsignificantWhitespace();
  }
  return printer.print(trajectory);
}
Firestore Index (for searching):
Collection: agent_trajectories
Document ID: {trajectory_id}
Fields:
- workflow_name: "page_explanation"
- project_id: "R2024.0091"
- file_id: "1"
- page_number: 3
- started_at: timestamp
- total_duration_ms: 105000
- total_turns: 3
- iterations_completed: 1
- final_quality_score: 0.85
- total_cost_usd: 0.038
- gcs_path: "projects/.../traces/..."
CLI Tool: Export Trajectory
#!/bin/bash
# cli/codeproof.sh export-trajectory
set -euo pipefail

TRAJECTORY_ID="$1"
OUTPUT_FILE="${2:-trajectory.json}"

# Call the gRPC API (drop -plaintext if the endpoint uses TLS)
grpcurl -plaintext -d '{
  "trajectory_id": "'"$TRAJECTORY_ID"'",
  "include_full_llm_traces": true,
  "pretty_print": true
}' \
  localhost:8080 \
  PageExplanationService/ExportAgentTrajectory \
  | jq -r '.trajectory_json' > "$OUTPUT_FILE"

echo "Trajectory exported to: $OUTPUT_FILE"
Future: Trajectory Visualization UI
Component: TrajectoryViewer (Angular)
Features:
- Timeline view of all turns
- Expandable sections for each iteration
- Diff view between draft versions
- Cost breakdown visualization
- Quality score progression chart
- Search/filter by phase, model, cost
Mockup:
βββββββββββββββββββββββββββββββββββββββββββββββββββ
β Agent Trajectory: page_explanation β
β Project: R2024.0091 | Page: 3 | File: 1 β
β Duration: 1m 45s | Cost: $0.038 | Quality: 0.85β
βββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π Quality Score: [ββββββββββ] 85% β
β π° Cost Breakdown: β
β ββ Gemini Pro (2 turns): $0.038 β
β ββ Gemini Flash (1 turn): $0.0004 β
βββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Iteration 1 βΌ β
β ββ Turn 1 (GENERATE) - Gemini Pro β
β β Input: 7500 tokens | Output: 2000 β
β β Cost: $0.019 | Time: 35s β
β β π View Prompt | View Response β
β ββ Turn 2 (REFLECT) - Gemini Flash β
β β Quality: 0.75 | Gaps: 3 identified β
β β Cost: $0.0004 | Time: 5s β
β β π View Reflection JSON β
β ββ Turn 3 (REFINE) - Gemini Pro β
β Cached: 5000 tokens | Cost: $0.019 β
β Quality: 0.85 β Threshold met β
βββββββββββββββββββββββββββββββββββββββββββββββββββ
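The cost figures in the mockup come from summing per-turn token usage against per-model pricing. The sketch below shows that aggregation; the prices are illustrative placeholders (not actual Vertex AI rates), and the token counts mirror the mockup's three turns. Cached input tokens are billed at a discounted rate, which is what makes the REFINE turn cheaper than a cold call of the same size.

```java
import java.util.List;
import java.util.Map;

public class CostAggregator {
    // Illustrative per-1M-token prices in USD: {input, output, cached input}.
    // Real Vertex AI rates differ; look them up per model and region.
    static final Map<String, double[]> PRICES = Map.of(
        "gemini-pro",   new double[]{1.25, 5.00, 0.3125},
        "gemini-flash", new double[]{0.075, 0.30, 0.01875});

    record Turn(String model, long inputTokens, long outputTokens, long cachedTokens) {}

    /** Sums turn costs; cached tokens are billed at the discounted rate. */
    static double totalCostUsd(List<Turn> turns) {
        double total = 0;
        for (Turn t : turns) {
            double[] p = PRICES.get(t.model());
            long freshInput = t.inputTokens() - t.cachedTokens();
            total += freshInput * p[0] / 1_000_000
                   + t.cachedTokens() * p[2] / 1_000_000
                   + t.outputTokens() * p[1] / 1_000_000;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Turn> turns = List.of(
            new Turn("gemini-pro",   7500, 2000, 0),     // GENERATE
            new Turn("gemini-flash", 9500,  300, 0),     // REFLECT
            new Turn("gemini-pro",   9800, 1800, 5000)); // REFINE (cached prefix)
        System.out.printf("Total cost: $%.4f%n", totalCostUsd(turns));
    }
}
```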
Troubleshooting
Issue: High Token Costs
Symptoms: Token usage exceeds budget; costs are higher than expected.
Solutions:
- Verify prompt caching is enabled and working
- Check the cache hit rate in logs
- Use cheaper models for reflection turns (e.g., Gemini Flash)
- Reduce max_phases_completed from 3 to 2
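A quick way to verify caching is paying off is to compute the cache hit rate directly from the token counts already recorded on each turn's LlmTurn. A minimal sketch, assuming those counts have been pulled out into plain arrays:

```java
public class CacheStats {
    /** Fraction of prompt tokens served from the prompt cache, across all turns. */
    static double cacheHitRate(long[] promptTokens, long[] cachedTokens) {
        long prompt = 0, cached = 0;
        for (int i = 0; i < promptTokens.length; i++) {
            prompt += promptTokens[i];
            cached += cachedTokens[i];
        }
        return prompt == 0 ? 0.0 : (double) cached / prompt;
    }

    public static void main(String[] args) {
        // Three turns; only the REFINE turn reuses the cached prefix.
        long[] prompt = {7500, 9500, 9800};
        long[] cached = {0, 0, 5000};
        System.out.printf("Cache hit rate: %.0f%%%n", 100 * cacheHitRate(prompt, cached));
    }
}
```

A rate near zero on multi-turn trajectories is the clearest sign that caching is misconfigured, since the REFINE turn should always reuse the GENERATE prompt prefix.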
Issue: Slow Generation
Symptoms: Generation takes more than 5 minutes per page.
Solutions:
- Check LLM API latency
- Reduce image resolution for PDF
- Use async processing (don't block user)
- Optimize prompts to reduce output tokens
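The "don't block the user" suggestion amounts to running generation on a background executor and returning a handle immediately, with the client polling or subscribing for the result. A minimal sketch with CompletableFuture; `generateExplanation` here is a stand-in for the real multi-turn workflow:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncExplanation {
    /** Stand-in for the real multi-turn generation; may take minutes. */
    static String generateExplanation(String pageId) {
        return "# Explanation for " + pageId;
    }

    /** Returns immediately; the caller polls or is notified when done. */
    static CompletableFuture<String> generateAsync(String pageId) {
        // supplyAsync uses the common ForkJoinPool (daemon threads) by default;
        // a dedicated bounded executor would cap concurrent LLM calls.
        return CompletableFuture.supplyAsync(() -> generateExplanation(pageId));
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = generateAsync("page-3");
        // The request thread is free here; join() only for the demo.
        System.out.println(future.join());
    }
}
```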
Issue: Poor Quality Output
Symptoms: The generated markdown is not educational or contains errors.
Solutions:
- Review and improve prompt templates
- Increase max_phases_completed to 4 or 5
- Add example outputs to prompts
- Use higher temperature for creativity
- Collect human feedback for prompt tuning