File Structure Reorganization
Product Requirements: File Structure Reorganization PRD
Implementation Issue: Issue #167
Overview
This Technical Design Document details the implementation of a hierarchical file structure with rich metadata, replacing the flat pages/ directory with files/{file_id}/pages/ while maintaining full backward compatibility with legacy projects.
Architecture Overview
System Components
┌───────────────────────────────────────────────────────────────┐
│                      Frontend (Angular)                       │
│  ┌─────────────────────┐      ┌───────────────────────────┐   │
│  │  FileMetadataList   │      │   LegacyUpgradeBanner     │   │
│  │     Component       │      │       Component           │   │
│  └──────────┬──────────┘      └────────────┬──────────────┘   │
│             └──────────────┬───────────────┘                  │
│                        gRPC-Web                               │
└────────────────────────────┼──────────────────────────────────┘
                             ▼
┌───────────────────────────────────────────────────────────────┐
│                     gRPC Gateway (Envoy)                      │
└────────────────────────────┬──────────────────────────────────┘
                             ▼
┌───────────────────────────────────────────────────────────────┐
│                    Backend (Java/Spring)                      │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │          ArchitecturalPlanService (Facade)              │  │
│  └─────────┬───────────────────────────────────┬───────────┘  │
│            ▼                                   ▼              │
│  ┌─────────────────────┐      ┌─────────────────────────┐     │
│  │  InputFileMetadata  │      │ FileStructureMigration  │     │
│  │      Service        │      │        Service          │     │
│  └─────────┬───────────┘      └────────────┬────────────┘     │
│            ▼                               ▼                  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                  ProjectPathResolver                    │  │
│  │     (Transparent Legacy Fallback + Path Caching)        │  │
│  └─────────┬───────────────────────────────────────────────┘  │
└────────────┼──────────────────────────────────────────────────┘
             ▼
┌───────────────────────────────────────────────────────────────┐
│                     Cloud Storage (GCS)                       │
│  projects/{projectId}/                                        │
│  ├── files/{file_id}/            ← NEW                        │
│  │   ├── metadata.json           ← NEW                        │
│  │   └── pages/{pageNumber}/     ← NEW                        │
│  ├── pages/{pageNumber}/         ← LEGACY                     │
│  └── inputs/(unknown)            ← UNCHANGED                  │
└───────────────────────────────────────────────────────────────┘
Data Flow: Read Operations (Simplified Strategy)
User Request (projectId, pageNum, optional fileId)
        │
        ▼
┌───────────────────────────────────────────────┐
│             ProjectPathResolver               │
│ .resolvePagePath(projectId, pageNum, fileId?) │
└────────┬──────────────────────────────────────┘
         │
         ▼
  Is fileId provided?
         │
    ┌────┴────┐
   Yes        No (Legacy Fallback Only)
    ▼          ▼
┌─────────────────────────┐   Is path cached?
│  Modern: Direct Path    │          │
│  (String Construction)  │     ┌────┴────┐
│  files/{fileId}/pages/  │    Yes        No
│                         │     ▼          ▼
│  Performance: 0ms       │  Return   ┌─────────────────────────────┐
│  No I/O, No Cache       │  cached   │ Legacy Structure:           │
└──────────┬──────────────┘  path     │ buildLegacyPageFolderPath() │
           │                          │ + exists() check            │
           │                          └────────┬────────────────────┘
           │                                   │
           │                             File exists?
           │                                   │
           │                              ┌────┴────┐
           │                             Yes        No
           │                              ▼          ▼
           │                           Return     Throw
           │                           legacy     PageNotFound
           │                           path       (Modern projects
           │                           (cache)    require file_id)
           └───────────────────────────────┘
                        │
                        ▼
                   Return path
Key Design Decisions:
- Modern projects MUST provide file_id (page numbers are file-scoped)
- No expensive scanning (the listSubdirectories() call was removed)
- Simple caching (one GCS exists() call on first access vs an instant HashMap lookup afterwards)
- Clear contract: "Want the modern structure? Provide file_id!"
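The fast path in the diagram reduces to pure string operations. A minimal sketch of that contract (class and method names here are hypothetical, not the real `ProjectPathResolver` API):

```java
public class PathContractSketch {
    /** Direct construction when fileId is present: no I/O, no cache lookup. */
    public static String directPath(String projectId, String fileId, int pageNumber) {
        // Zero-padded page folder, matching the format used elsewhere in this document
        return String.format("projects/%s/files/%s/pages/%03d", projectId, fileId, pageNumber);
    }

    /** Cache key used on the slow (legacy fallback) path. */
    public static String cacheKey(String projectId, int pageNumber) {
        return projectId + ":" + pageNumber;
    }
}
```

Because the fast path never touches storage, its cost is independent of project size, which is what makes the "provide file_id" contract worth enforcing.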
Data Flow: Write Operations (Selective Write)
New Page Ingestion
        │
        ▼
┌────────────────────────────────┐
│    IngestArchitecturalPlan     │
│ (projectId, fileId, pageData)  │
└────────┬───────────────────────┘
         │
         ▼
┌──────────────────────────────────┐
│ Detect Project Structure Version │
└────────┬─────────────────────────┘
         │
    ┌────┴─────┐
    │ Version? │
    └────┬─────┘
         │
  ┌──────┴──────┬─────────────┐
LEGACY      TRANSITIONAL    MODERN
  │              │             │
  ▼              ▼             ▼
┌──────────┐ ┌────────────┐ ┌──────────────┐
│ Write to │ │ Write to   │ │ Write to     │
│ pages/   │ │ files/     │ │ files/ ONLY  │
│ (compat) │ │ (new path) │ └──────────────┘
└──────────┘ └────────────┘
     │             │
     └──────┬──────┘
            ▼
 ┌────────────────────┐
 │ Update             │
 │ plan-metadata.json │
 │ (backward compat)  │
 └────────────────────┘
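The selective-write dispatch above can be sketched as a switch over the detected structure version. `SelectiveWriteSketch` and `writeTargets` are hypothetical names; the path format assumes the zero-padded page folders used elsewhere in this document:

```java
import java.util.List;

public class SelectiveWriteSketch {
    public enum ProjectStructureVersion { UNKNOWN, LEGACY, TRANSITIONAL, MODERN }

    /** Returns the page folder(s) a newly ingested page should be written to. */
    public static List<String> writeTargets(ProjectStructureVersion version,
                                            String projectId, String fileId, int pageNumber) {
        String legacy = String.format("projects/%s/pages/%03d", projectId, pageNumber);
        String modern = String.format("projects/%s/files/%s/pages/%03d",
                projectId, fileId, pageNumber);
        switch (version) {
            case LEGACY:       return List.of(legacy);  // compat: keep legacy layout
            case TRANSITIONAL: return List.of(modern);  // new pages go to the new path
            case MODERN:       return List.of(modern);  // files/ only
            default:           throw new IllegalStateException("Unknown structure version");
        }
    }
}
```

In all three branches, plan-metadata.json is still updated afterwards for backward compatibility, as shown in the diagram.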
Proto Definitions
Note: Proto definitions already exist in api.proto (lines 225-274). No changes needed!
Enum Naming Consideration: The current enums use prefixed naming (e.g., DOCUMENT_TYPE_ARCHITECTURAL_PLAN, PROCESSING_STATUS_COMPLETED), which doesn't follow our best practice of using dedicated packages with clean enum values (e.g., ARCHITECTURAL_PLAN in package ...file.metadata). However, since these enums are already deployed and used in production:
- For this issue: keep the existing enum structure (no breaking changes)
- Future enhancement: consider moving to file_metadata.proto with clean enums (separate refactoring issue)
This aligns with our pragmatic approach: work with what exists, improve incrementally.
Existing Proto Messages (For Reference)
// Already exists in api.proto (line 225)
import "google/protobuf/timestamp.proto";
message InputFileMetadata {
// Basic file information
string file_id = 1; // Auto-increment ID (e.g., "1", "2", "3")
string file_name = 2;
string file_path = 3;
string mime_type = 4;
int64 file_size_bytes = 5;
google.protobuf.Timestamp upload_date = 6; // When file was uploaded
// Document classification
DocumentType document_type = 7;
int32 page_count = 8;
// Processing metadata
ProcessingStatus processing_status = 9;
google.protobuf.Timestamp processed_date = 10; // When processing completed
repeated string extracted_pages = 11;
// Content insights
string content_summary = 12;
// Technical metadata
string checksum_md5 = 13;
}
enum DocumentType {
DOCUMENT_TYPE_UNKNOWN = 0;
DOCUMENT_TYPE_ARCHITECTURAL_PLAN = 1;
DOCUMENT_TYPE_MECHANICAL_PLAN = 2;
DOCUMENT_TYPE_ELECTRICAL_PLAN = 3;
DOCUMENT_TYPE_STRUCTURAL_PLAN = 4;
DOCUMENT_TYPE_INSPECTOR_FEEDBACK = 5;
DOCUMENT_TYPE_PERMIT_APPLICATION = 6;
DOCUMENT_TYPE_CODE_COMPLIANCE_REPORT = 7;
DOCUMENT_TYPE_SITE_PLAN = 8;
DOCUMENT_TYPE_ELEVATION_DRAWING = 9;
DOCUMENT_TYPE_SECTION_DRAWING = 10;
}
enum ProcessingStatus {
PROCESSING_STATUS_UNKNOWN = 0;
PROCESSING_STATUS_UPLOADED = 1;
PROCESSING_STATUS_PROCESSING = 2;
PROCESSING_STATUS_COMPLETED = 3;
PROCESSING_STATUS_FAILED = 4;
}
New Proto Messages for Migration (Add to api.proto)
// Request to migrate a legacy project to new file structure
message MigrateProjectFileStructureRequest {
// The unique identifier of the project to migrate
string project_id = 1;
// Whether to preserve the legacy pages/ folder after migration
// (default: true for safety)
bool preserve_legacy_structure = 2;
// Whether to run in dry-run mode (preview changes without applying)
bool dry_run = 3;
// User ID initiating the migration (for audit trail)
string initiated_by = 4;
}
// Response from file structure migration
message MigrateProjectFileStructureResponse {
// The unique identifier of the project
string project_id = 1;
// Whether the migration was successful
bool success = 2;
// List of files created with metadata
repeated InputFileMetadata migrated_files = 3;
// Number of pages migrated per file
map<string, int32> pages_per_file = 4;
// Total number of pages migrated
int32 total_pages_migrated = 5;
// Error message if migration failed
string error_message = 6;
// Warnings or informational messages
repeated string warnings = 7;
// Timestamp when migration completed
google.protobuf.Timestamp completed_at = 8;
}
// Request to analyze a project's migration readiness
// Performs a comprehensive check to determine if a project can be safely migrated
// from legacy (flat pages/) structure to modern (hierarchical files/) structure.
message AnalyzeProjectMigrationRequest {
// The unique identifier of the project to analyze
string project_id = 1;
}
// Response containing detailed migration readiness analysis
// Provides all information needed to decide if/when to migrate a project.
message AnalyzeProjectMigrationResponse {
// The unique identifier of the project
string project_id = 1;
// Current project structure version (LEGACY, TRANSITIONAL, or MODERN)
// - LEGACY: Only has pages/ folder β needs migration
// - TRANSITIONAL: Has both pages/ and files/ β migration in progress or partially complete
// - MODERN: Only has files/ folder β already migrated
ProjectStructureVersion current_version = 2;
// Whether project needs migration (true for LEGACY projects only)
// If false, project is already migrated or in transition
bool needs_migration = 3;
// Number of input files found in inputs/ folder
// Used to estimate how many file metadata entries will be created
// Good readiness: > 0 (at least one source file exists)
int32 estimated_file_count = 4;
// Number of existing pages in pages/ folder
// Used to estimate migration workload
// Good readiness: matches actual page count in plan-metadata.json
int32 estimated_page_count = 5;
// Estimated time to complete migration in seconds
// Calculation: (page_count * 1s) + (file_count * 5s)
// Good readiness: < 300s (5 minutes) for typical projects
int32 estimated_duration_seconds = 6;
// Potential issues or blockers that could prevent successful migration
// Examples:
// - "No input files found in inputs/ folder"
// - "Page numbering gaps detected (missing pages 3, 5)"
// - "Insufficient storage space for migration"
// - "Project has no pages to migrate"
// Good readiness: Empty array (no issues)
repeated string issues = 7;
// Human-readable migration readiness assessment
// Examples: "READY", "READY_WITH_WARNINGS", "NOT_READY", "ALREADY_MIGRATED"
// Good readiness: "READY" or "READY_WITH_WARNINGS"
string readiness_status = 8;
// Detailed explanation of readiness status
// Provides context and recommendations
// Example: "Project is ready to migrate. Found 3 input files and 45 pages.
// Estimated time: 2 minutes. No blockers detected."
string readiness_message = 9;
}
// Enum for project structure version
enum ProjectStructureVersion {
PROJECT_STRUCTURE_VERSION_UNKNOWN = 0;
PROJECT_STRUCTURE_VERSION_LEGACY = 1; // Only pages/
PROJECT_STRUCTURE_VERSION_TRANSITIONAL = 2; // Both pages/ and files/
PROJECT_STRUCTURE_VERSION_MODERN = 3; // Only files/
}
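The estimated_duration_seconds formula documented in AnalyzeProjectMigrationResponse is simple enough to pin down in code. `MigrationEstimateSketch` is a hypothetical helper that mirrors the comment, not part of the service implementation:

```java
public class MigrationEstimateSketch {
    /** Mirrors the proto comment: (page_count * 1s) + (file_count * 5s). */
    public static int estimatedDurationSeconds(int pageCount, int fileCount) {
        return pageCount * 1 + fileCount * 5;
    }

    /** Good readiness per the proto comment: under 300s (5 minutes). */
    public static boolean isTypicalDuration(int pageCount, int fileCount) {
        return estimatedDurationSeconds(pageCount, fileCount) < 300;
    }
}
```

For the example project cited later (3 input files, 45 pages) this yields 60 seconds, consistent with the "estimated time: 2 minutes" order of magnitude in the readiness_message example.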
Add New RPCs to ArchitecturalPlanService
service ArchitecturalPlanService {
// ... existing RPCs ...
// Migrates a legacy project to new file structure
// Requires OWNER permissions
rpc MigrateProjectFileStructure(MigrateProjectFileStructureRequest)
returns (MigrateProjectFileStructureResponse) {
option (google.api.http) = {
post: "/v1/architectural-plans/{project_id}/migrate-file-structure"
body: "*"
};
}
// Analyzes a project's migration readiness
rpc AnalyzeProjectMigration(AnalyzeProjectMigrationRequest)
returns (AnalyzeProjectMigrationResponse) {
option (google.api.http) = {
get: "/v1/architectural-plans/{project_id}/migration-analysis"
};
}
}
Update Existing RPC Request Messages (Backward Compatible)
Critical Update: Existing page-related RPCs must be extended to support the new file structure while maintaining backward compatibility with legacy projects.
Strategy: Optional file_id Field
Add an optional file_id field to all page-related request messages. This allows:
- Modern projects: pass file_id for direct page access in files/{file_id}/pages/
- Legacy projects: omit file_id; the system uses ProjectPathResolver to fall back to pages/
- Zero breaking changes: existing clients continue to work without modifications
Request Messages Requiring Updates
1. Code Applicability Analysis (api.proto)
message GetApplicableCodeSectionsRequest {
// The unique identifier of the architectural plan to analyze.
string architectural_plan_id = 1;
// The page number of the architectural plan to analyze.
int32 page_number = 2;
string icc_book_id = 3; // Example: 2217 for ICC IBC 2021
// NEW FIELD: Optional file ID for direct file access in modern structure
// If provided, page is accessed via files/{file_id}/pages/{page_number}/
// If omitted, system uses ProjectPathResolver to check files/ first, then pages/ (legacy)
// Example: "1", "2", "3" (auto-incrementing IDs)
string file_id = 4 [deprecated = false]; // Optional, for modern structure support
}
2. Compliance Report Generation (plan.reviewer.proto)
message GetPageSectionComplianceReportRequest {
string architectural_plan_id = 1;
int32 page_number = 2;
string icc_book_id = 3;
string icc_section_id = 4;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 5 [deprecated = false]; // Optional
}
message GetPageComplianceReportRequest {
string architectural_plan_id = 1;
int32 page_number = 2;
string icc_book_id = 3;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 4 [deprecated = false]; // Optional
}
3. Async Compliance Report Task (compliance_report.proto)
message StartPageSectionComplianceReportTaskRequest {
string architectural_plan_id = 1;
int32 page_number = 2;
string icc_book_id = 3;
string icc_section_id = 4;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 5 [deprecated = false]; // Optional
}
4. Analysis Availability Check (analysis_availability.proto)
message GetAvailableAnalysisRequest {
string project_id = 1;
int32 page_number = 2;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 3 [deprecated = false]; // Optional
}
5. File Ingestion Response (api.proto)
Note: IngestFileIntoProjectRequest already has filename and doesn't need file_id as input. However, the response should return the assigned file_id:
message IngestFileIntoProjectResponse {
string project_id = 1;
string filename = 2;
int32 pages_processed = 3;
bool success = 4;
// NEW FIELD: Assigned file ID for the ingested file
// This allows UI to immediately navigate to files/{file_id}/ structure
// Example: "1", "2", "3"
string file_id = 5; // REQUIRED in modern projects
}
Similarly for StartAsyncIngestFileResponse (task.proto):
message StartAsyncIngestFileResponse {
string task_id = 1;
string project_id = 2;
string filename = 3;
int32 page_number = 4;
bool success = 5;
string message = 6;
string completed_at = 7;
// NEW FIELD: Assigned file ID for the ingested file
string file_id = 8; // REQUIRED when ingestion completes
}
Backend Service Implementation Pattern
When processing requests with the new optional file_id:
public PageApplicabilityAnalysisList getApplicableCodeSections(
GetApplicableCodeSectionsRequest request) {
String projectId = request.getArchitecturalPlanId();
int pageNumber = request.getPageNumber();
String fileId = request.getFileId(); // May be empty/null
// Resolve page path using ProjectPathResolver (handles file_id automatically)
// - If fileId provided → direct path construction (fast, no filesystem checks)
// - If fileId null/empty → cached legacy fallback (modern projects must pass file_id)
String pagePath = pathResolver.resolvePageFolderPath(projectId, pageNumber, fileId);
// Continue with existing logic using resolved path
// ...
}
Key Benefits:
- Single source of truth: all path resolution logic is centralized in ProjectPathResolver
- Automatic optimization: fast path when file_id is provided, cached fallback when it is not
- Consistent behavior: same logic across all services
- Easy testing: mock ProjectPathResolver in unit tests
CLI Updates Required
The CLI commands (grpcurl, custom scripts) must also be updated to support the new optional parameter:
Example: Legacy CLI call (still works)
grpcurl -d '{
"architectural_plan_id": "project-123",
"page_number": 5,
"icc_book_id": "2217"
}' \
localhost:8080 ArchitecturalPlanReviewService/GetApplicableCodeSections
Example: Modern CLI call (with file_id)
grpcurl -d '{
"architectural_plan_id": "project-123",
"page_number": 5,
"icc_book_id": "2217",
"file_id": "2"
}' \
localhost:8080 ArchitecturalPlanReviewService/GetApplicableCodeSections
Frontend/UI Updates Required
1. Update API Service Clients (e.g., web-ng-m3/src/app/shared/api.service.ts):
getApplicableCodeSections(
projectId: string,
pageNumber: number,
iccBookId: string,
fileId?: string // NEW optional parameter
): Observable<PageApplicabilityAnalysisList> {
const request: GetApplicableCodeSectionsRequest = {
architectural_plan_id: projectId,
page_number: pageNumber,
icc_book_id: iccBookId
};
// Include file_id only if available (modern projects)
if (fileId) {
request.file_id = fileId;
}
return this.grpcClient.getApplicableCodeSections(request);
}
2. Pass file_id from Components:
When displaying page-specific analysis, components need to know which file the page belongs to. This information comes from:
- Modern projects: InputFileMetadata.file_id and InputFileMetadata.extracted_pages
- Legacy projects: file_id is undefined/null; the system falls back automatically
// In compliance.component.ts or similar
loadPageAnalysis(pageNumber: number) {
const fileId = this.getFileIdForPage(pageNumber); // NEW method
this.apiService.getApplicableCodeSections(
this.projectId,
pageNumber,
this.iccBookId,
fileId // Pass file_id if available
).subscribe(/* ... */);
}
private getFileIdForPage(pageNumber: number): string | undefined {
// Look up file_id from InputFileMetadata list
const fileMetadata = this.inputFiles.find(
f => f.extracted_pages.includes(String(pageNumber))
);
return fileMetadata?.file_id;
}
Testing Backward Compatibility
Test Cases:
1. Legacy Project (no file_id):
- Request without file_id → system uses ProjectPathResolver → falls back to pages/ → success
2. Modern Project (with file_id):
- Request with file_id → direct access to files/{file_id}/pages/ → success
3. Modern Project (omit file_id):
- Request without file_id → legacy fallback finds no page → PageNotFound directing the caller to pass file_id (expected: page numbers are file-scoped in modern projects)
4. Invalid file_id:
- Request with an invalid file_id → 404 Page Not Found (expected behavior)
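These cases can be exercised against an in-memory fake of the resolver contract. `BackwardCompatSketch` is hypothetical; it only mimics the documented behavior, not the real `ProjectPathResolver`:

```java
import java.util.Set;

public class BackwardCompatSketch {
    public static class PageNotFound extends RuntimeException {}

    /**
     * Fake resolver over an in-memory set of existing paths, following the
     * documented contract: direct path when fileId is given, legacy fallback
     * otherwise, and PageNotFound when the legacy page is missing.
     */
    public static String resolve(Set<String> existing, String projectId,
                                 int pageNumber, String fileId) {
        if (fileId != null && !fileId.isEmpty()) {
            return String.format("projects/%s/files/%s/pages/%03d",
                    projectId, fileId, pageNumber);
        }
        String legacy = String.format("projects/%s/pages/%03d", projectId, pageNumber);
        if (existing.contains(legacy)) {
            return legacy;
        }
        throw new PageNotFound(); // modern projects must pass file_id
    }
}
```

A real test suite would drive the same four scenarios through the actual resolver with a mocked FileSystemHandler.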
Migration Impact
Phase 1: Deploy Proto Changes
- Add optional file_id fields to all request messages
- Deploy backend changes (proto regeneration)
- No frontend changes yet → existing clients continue working (backward compatible)
Phase 2: Update Backend Services
- Modify service implementations to honor file_id when provided
- Maintain fallback behavior via ProjectPathResolver
- No frontend changes yet → still backward compatible
Phase 3: Update Frontend (Optional)
- Add file_id tracking in UI state
- Pass file_id in modern projects for performance optimization
- Legacy projects continue working without changes
Performance Considerations
With file_id (modern projects):
- Direct path access: no filesystem checks needed
- No cache lookups: skips the ProjectPathResolver cache entirely
- Faster response: ~50-100ms saved per request
Without file_id (legacy or omitted):
- ProjectPathResolver overhead: cache check plus potential filesystem existence checks
- Acceptable performance: <10ms overhead for cached paths, <100ms for uncached
Recommendation: the frontend should pass file_id whenever it is available for optimal performance.
Backend Implementation
1. ProjectPathResolver
Purpose: Resolves file paths transparently across both legacy (pages/) and modern (files/{file_id}/pages/) structures, providing backward compatibility during migration.
Location: src/main/java/org/codetricks/construction/code/assistant/ProjectPathResolver.java
package org.codetricks.construction.code.assistant;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;
/**
 * Resolves page folder paths across both modern (files/{file_id}/pages/) and
 * legacy (pages/) project structures with transparent fallback.
 *
 * <p>Path Resolution Strategy:
 * 1. If a file ID is provided, construct the modern path directly (no I/O)
 * 2. Otherwise, check the in-memory cache (avoid repeated filesystem checks)
 * 3. Fall back to legacy: pages/{page_number}/ (with an exists() check)
 * 4. Cache the resolved legacy path for future reads
 *
 * <p>Thread-safe and optimized for read-heavy workloads.
 *
 * <p>Instantiation: Create via constructor, pass to service implementations
 */
public class ProjectPathResolver {
private static final Logger logger = Logger.getLogger(
ProjectPathResolver.class.getName());
private final FileSystemHandler fileSystemHandler;
// Cache: projectId + pageNumber -> resolved path
private final Cache<String, String> pathCache;
public ProjectPathResolver(FileSystemHandler fileSystemHandler) {
this.fileSystemHandler = fileSystemHandler;
this.pathCache = CacheBuilder.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(1, TimeUnit.HOURS)
.build();
}
/**
* Resolves the page folder path with optional file ID for performance optimization.
*
 * <p><b>Path Resolution Strategy:</b>
 * <ul>
 *   <li>If fileId provided: Direct path construction (FAST - no filesystem checks)</li>
 *   <li>If fileId null/empty: Cached legacy fallback (modern projects must pass a file ID)</li>
 * </ul>
*
* @param projectId The unique identifier of the project
* @param pageNumber The page number (1-based)
* @param fileId Optional file ID for direct access (null or empty for auto-detect)
* @return The resolved page folder path
* @throws PageNotFoundException if page doesn't exist in either structure
*/
public String resolvePageFolderPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
// Fast path: If file ID is provided, construct path directly
if (fileId != null && !fileId.isEmpty()) {
return String.format("projects/%s/files/%s/pages/%03d",
projectId, fileId, pageNumber);
}
// Slow path: Auto-detect structure with caching
String cacheKey = getCacheKey(projectId, pageNumber);
// Check cache first
String cachedPath = pathCache.getIfPresent(cacheKey);
if (cachedPath != null) {
return cachedPath;
}
// Try legacy structure only (modern projects MUST provide file_id)
// Page numbers in modern projects are file-scoped, so we can't auto-detect
try {
String legacyPath = buildLegacyPageFolderPath(projectId, pageNumber);
if (fileSystemHandler.exists(legacyPath)) {
pathCache.put(cacheKey, legacyPath);
logger.info(String.format(
"Using legacy page path for project %s, page %d: %s",
projectId, pageNumber, legacyPath));
return legacyPath;
}
// Not found in legacy structure
throw new PageNotFoundException(projectId, pageNumber,
"Page not found in legacy structure. Modern projects require file_id parameter.");
} catch (IOException e) {
throw new PageNotFoundException(projectId, pageNumber, e);
}
}
/**
* Convenience overload for backward compatibility.
* Delegates to main method with fileId = null.
*/
public String resolvePageFolderPath(String projectId, int pageNumber)
throws PageNotFoundException {
return resolvePageFolderPath(projectId, pageNumber, null);
}
/**
* Detects the project structure version.
*
* @param projectId The unique identifier of the project
* @return The detected project structure version
*/
public ProjectStructureVersion detectProjectVersion(String projectId) throws IOException {
boolean hasLegacyPages = fileSystemHandler.exists(
String.format("projects/%s/pages/", projectId));
boolean hasFiles = fileSystemHandler.exists(
String.format("projects/%s/files/", projectId));
if (hasFiles && !hasLegacyPages) {
return ProjectStructureVersion.MODERN;
} else if (hasFiles && hasLegacyPages) {
return ProjectStructureVersion.TRANSITIONAL;
} else if (hasLegacyPages) {
return ProjectStructureVersion.LEGACY;
} else {
return ProjectStructureVersion.UNKNOWN;
}
}
/**
 * Clears cached paths for a specific project.
 * Call this after migration to ensure fresh path resolution.
 */
public void clearCacheForProject(String projectId) {
// Invalidate only this project's entries; cache keys are "{projectId}:{pageNumber}"
pathCache.asMap().keySet().removeIf(key -> key.startsWith(projectId + ":"));
logger.info("Cleared path cache for project: " + projectId);
}
// Private helper methods
private String getCacheKey(String projectId, int pageNumber) {
return projectId + ":" + pageNumber;
}
public enum ProjectStructureVersion {
UNKNOWN,
LEGACY, // Only has pages/
TRANSITIONAL, // Has both pages/ and files/
MODERN // Only has files/
}
public static class PageNotFoundException extends Exception {
public PageNotFoundException(String projectId, int pageNumber) {
super(String.format("Page %d not found in project %s", pageNumber, projectId));
}
public PageNotFoundException(String projectId, int pageNumber, String detail) {
super(String.format("Page %d not found in project %s: %s", pageNumber, projectId, detail));
}
public PageNotFoundException(String projectId, int pageNumber, Throwable cause) {
super(String.format("Page %d not found in project %s", pageNumber, projectId), cause);
}
}
}
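Since cache keys are "{projectId}:{pageNumber}", per-project invalidation can be done by key prefix. A minimal sketch over a plain concurrent map (the resolver above uses a Guava Cache, whose asMap() view supports the same removeIf on its key set):

```java
import java.util.concurrent.ConcurrentHashMap;

public class CacheInvalidationSketch {
    // Stand-in for the resolver's path cache: "{projectId}:{pageNumber}" -> path
    public static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    /** Removes only the entries belonging to one project, leaving the rest intact. */
    public static void clearProject(String projectId) {
        cache.keySet().removeIf(key -> key.startsWith(projectId + ":"));
    }
}
```

Prefix invalidation keeps other projects' warm cache entries alive after a single project's migration, which a blanket invalidateAll() would not.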
2. InputFileMetadataService
Purpose: Generate, retrieve, and manage file metadata
Location: src/main/java/org/codetricks/construction/code/assistant/service/InputFileMetadataService.java
package org.codetricks.construction.code.assistant.service;
import com.google.gson.Gson;
import com.google.protobuf.util.JsonFormat;
import org.codetricks.construction.code.assistant.FileSystemHandler;
import java.io.IOException;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
/**
* Service for managing Project Input File metadata and the hierarchical file structure.
*
* <p>Handles metadata generation, persistence, and retrieval for user-uploaded input files
* (PDFs, images, etc.) that get processed into individual pages.
*
* <p><b>GCS Path Structure:</b>
* <ul>
* <li>Input files: {@code projects/{projectId}/inputs/filename.pdf}</li>
* <li>File metadata: {@code projects/{projectId}/files/{file_id}/metadata.json}</li>
* <li>Extracted pages: {@code projects/{projectId}/files/{file_id}/pages/{page_num}/}</li>
* <li>Legacy pages: {@code projects/{projectId}/pages/{page_num}/} (backward compatibility)</li>
* <li>File index: {@code projects/{projectId}/files/index.json} (file ID counter + mappings)</li>
* </ul>
*
* <p><b>Responsibilities:</b>
* <ul>
* <li>Generate rich metadata (InputFileMetadata proto) for uploaded files</li>
* <li>Assign auto-incrementing file IDs via {@code files/index.json}</li>
* <li>Classify document types (plans, specifications, reports, etc.)</li>
* <li>Persist and retrieve metadata from GCS</li>
* <li>Track processing status and page associations</li>
* </ul>
*
* <p>Thread-safe and idempotent.
*
* <p>Instantiation: Create via constructor, pass FileSystemHandler and DocumentClassificationService
*/
public class InputFileMetadataService {
private static final Logger logger = Logger.getLogger(
InputFileMetadataService.class.getName());
private final FileSystemHandler fileSystemHandler;
private final DocumentClassificationService classificationService;
public InputFileMetadataService(
FileSystemHandler fileSystemHandler,
DocumentClassificationService classificationService) {
this.fileSystemHandler = fileSystemHandler;
this.classificationService = classificationService;
}
/**
* Generates comprehensive metadata for an input file.
*
* @param projectId The project ID
* @param inputFilePath Path to the input file (e.g., "inputs/plans.pdf")
* @param forceRegenerate Whether to overwrite existing metadata
* @return Generated metadata
*/
public InputFileMetadata generateMetadata(
String projectId,
String inputFilePath,
boolean forceRegenerate) throws IOException {
String fullPath = String.format("projects/%s/%s", projectId, inputFilePath);
// Check if metadata already exists
String fileId = extractOrGenerateFileId(projectId, inputFilePath);
String metadataPath = getMetadataPath(projectId, fileId);
if (!forceRegenerate && fileSystemHandler.exists(metadataPath)) {
logger.info("Metadata already exists for file: " + inputFilePath);
return loadMetadata(projectId, fileId);
}
logger.info("Generating metadata for file: " + inputFilePath);
// Build metadata
InputFileMetadata.Builder builder = InputFileMetadata.newBuilder()
.setFileId(fileId)
.setFileName(extractFileName(inputFilePath))
.setFilePath(inputFilePath)
.setMimeType(detectMimeType(fullPath))
.setFileSizeBytes(fileSystemHandler.getFileSize(fullPath))
.setUploadDate(com.google.protobuf.Timestamp.newBuilder()
.setSeconds(Instant.now().getEpochSecond())
.build())
.setProcessingStatus(ProcessingStatus.PROCESSING_STATUS_UPLOADED);
// For PDF files, extract page count
if (fullPath.endsWith(".pdf")) {
int pageCount = extractPageCount(fullPath);
builder.setPageCount(pageCount);
}
// Classify document type (heuristic-based for now, AI later)
DocumentType docType = classificationService.classifyDocument(
builder.getFileName(), fullPath);
builder.setDocumentType(docType);
// Generate checksum
String checksum = generateChecksum(fullPath);
builder.setChecksumMd5(checksum);
InputFileMetadata metadata = builder.build();
// Save metadata to disk
saveMetadata(projectId, fileId, metadata);
logger.info("Generated metadata for file: " + fileId);
return metadata;
}
/**
* Loads existing metadata from disk.
*/
public InputFileMetadata loadMetadata(String projectId, String fileId)
throws IOException {
String metadataPath = getMetadataPath(projectId, fileId);
String metadataJson = fileSystemHandler.readFileAsString(metadataPath);
InputFileMetadata.Builder builder = InputFileMetadata.newBuilder();
JsonFormat.parser().merge(metadataJson, builder);
return builder.build();
}
/**
* Updates metadata with processing results.
*/
public InputFileMetadata updateProcessingStatus(
String projectId,
String fileId,
ProcessingStatus status,
List<String> extractedPages) throws IOException {
InputFileMetadata existing = loadMetadata(projectId, fileId);
InputFileMetadata.Builder builder = existing.toBuilder()
.setProcessingStatus(status)
.setProcessedDate(com.google.protobuf.Timestamp.newBuilder()
.setSeconds(Instant.now().getEpochSecond())
.build())
.clearExtractedPages()
.addAllExtractedPages(extractedPages);
InputFileMetadata updated = builder.build();
saveMetadata(projectId, fileId, updated);
return updated;
}
/**
* Lists all file metadata in a project.
*/
public List<InputFileMetadata> listAllMetadata(String projectId)
throws IOException {
String filesBasePath = String.format("projects/%s/files/", projectId);
if (!fileSystemHandler.exists(filesBasePath)) {
return new ArrayList<>();
}
List<String> fileIds = fileSystemHandler.listDirectories(filesBasePath);
List<InputFileMetadata> metadataList = new ArrayList<>();
for (String fileId : fileIds) {
try {
InputFileMetadata metadata = loadMetadata(projectId, fileId);
metadataList.add(metadata);
} catch (IOException e) {
logger.warning("Failed to load metadata for file: " + fileId);
}
}
return metadataList;
}
// Private helper methods
/**
* Generates a new file ID using auto-increment counter.
* Maintains counter in projects/{projectId}/files/index.json
*/
private String extractOrGenerateFileId(String projectId, String inputFilePath)
throws IOException {
String indexPath = String.format("projects/%s/files/index.json", projectId);
// Load index or create new one
FileIndex index;
if (fileSystemHandler.exists(indexPath)) {
String indexJson = fileSystemHandler.readFileAsString(indexPath);
index = new Gson().fromJson(indexJson, FileIndex.class);
} else {
index = new FileIndex();
index.nextFileId = 1;
index.files = new ArrayList<>();
}
// Generate new file ID
String fileId = String.valueOf(index.nextFileId);
index.nextFileId++;
// Add to index
index.files.add(new FileIndexEntry(fileId, extractFileName(inputFilePath)));
// Save index
String updatedJson = new Gson().toJson(index);
fileSystemHandler.writeFile(indexPath, updatedJson);
return fileId;
}
// Helper classes for file index
private static class FileIndex {
int nextFileId;
List<FileIndexEntry> files;
}
private static class FileIndexEntry {
String fileId;
String fileName;
FileIndexEntry(String fileId, String fileName) {
this.fileId = fileId;
this.fileName = fileName;
}
}
private String getMetadataPath(String projectId, String fileId) {
return String.format("projects/%s/files/%s/metadata.json", projectId, fileId);
}
private String extractFileName(String filePath) {
int lastSlash = filePath.lastIndexOf('/');
return lastSlash >= 0 ? filePath.substring(lastSlash + 1) : filePath;
}
private String detectMimeType(String filePath) {
if (filePath.endsWith(".pdf")) {
return "application/pdf";
}
return "application/octet-stream";
}
private int extractPageCount(String pdfPath) throws IOException {
// Use Apache PDFBox to get page count
try (org.apache.pdfbox.pdmodel.PDDocument document =
org.apache.pdfbox.pdmodel.PDDocument.load(
fileSystemHandler.readFileAsBytes(pdfPath))) {
return document.getNumberOfPages();
}
}
private String generateChecksum(String filePath) throws IOException {
try {
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] fileBytes = fileSystemHandler.readFileAsBytes(filePath);
byte[] hashBytes = md.digest(fileBytes);
StringBuilder sb = new StringBuilder();
for (byte b : hashBytes) {
sb.append(String.format("%02x", b));
}
return sb.toString();
} catch (Exception e) {
logger.warning("Failed to generate checksum: " + e.getMessage());
return "";
}
}
private void saveMetadata(
String projectId,
String fileId,
InputFileMetadata metadata) throws IOException {
String metadataPath = getMetadataPath(projectId, fileId);
String metadataJson = JsonFormat.printer()
.preservingProtoFieldNames()
.print(metadata);
fileSystemHandler.writeFile(metadataPath, metadataJson);
}
}
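Given the serialization logic above, the on-disk projects/{projectId}/files/index.json would look roughly like this after two uploads (the file names here are hypothetical; field names assume Gson's default field-name mapping, which the code relies on):

```json
{
  "nextFileId": 3,
  "files": [
    { "fileId": "1", "fileName": "floor-plan.pdf" },
    { "fileId": "2", "fileName": "electrical.pdf" }
  ]
}
```

Note that extractOrGenerateFileId performs a read-modify-write on this file without locking, so single-writer access per project is assumed; concurrent uploads to the same project could race on the counter.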
3. Migration Readiness Assessment
Purpose: Determine whether a project is safe to migrate and provide actionable recommendations
Readiness Status Values
| Status | Condition | Can Migrate? | Description |
|---|---|---|---|
| READY | Legacy project, has input files and pages, no issues | ✅ Yes | Ideal state for migration |
| READY_WITH_WARNINGS | Legacy project, has pages, but minor issues (e.g., no input files) | ✅ Yes | Can proceed, but review warnings |
| NOT_READY | Legacy project with critical blockers (e.g., no pages) | ❌ No | Fix issues before migrating |
| ALREADY_MIGRATED | Modern structure detected | N/A | No action needed |
| MIGRATION_IN_PROGRESS | Transitional state (both structures exist) | ⚠️ Caution | Likely interrupted migration |
Assessment Logic
// Pseudo-code for readiness assessment
if (currentVersion == MODERN) {
return "ALREADY_MIGRATED";
} else if (currentVersion == TRANSITIONAL) {
return "MIGRATION_IN_PROGRESS"; // Investigate before retrying
} else if (legacyPageCount == 0) {
return "NOT_READY"; // Nothing to migrate
} else if (inputFileCount == 0) {
return "READY_WITH_WARNINGS"; // Will create default file entry
} else {
return "READY"; // Green light!
}
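The decision ladder above can also be exercised as plain code. A minimal, self-contained sketch, in which the Version enum and raw counts stand in for the real ProjectPathResolver types:

```java
public class ReadinessSketch {
    enum Version { LEGACY, TRANSITIONAL, MODERN }

    // Mirrors the pseudo-code: structure version is checked first,
    // then page count (blocker), then input-file count (warning).
    static String assess(Version version, int legacyPageCount, int inputFileCount) {
        if (version == Version.MODERN) {
            return "ALREADY_MIGRATED";
        } else if (version == Version.TRANSITIONAL) {
            return "MIGRATION_IN_PROGRESS"; // investigate before retrying
        } else if (legacyPageCount == 0) {
            return "NOT_READY"; // nothing to migrate
        } else if (inputFileCount == 0) {
            return "READY_WITH_WARNINGS"; // a default file entry will be created
        }
        return "READY";
    }
}
```

The ordering matters: a TRANSITIONAL project with zero pages still reports MIGRATION_IN_PROGRESS, because an interrupted migration should be investigated before any other issue is acted on.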
Common Issues and Resolutions
| Issue | Severity | Resolution |
|---|---|---|
| No input files in inputs/ | Warning | Create default file entry with ID unknown-source |
| No pages in pages/ | Blocker | Cannot migrate an empty project |
| Page numbering gaps | Warning | Proceed anyway; gaps are preserved |
| Both structures exist | Warning | Likely interrupted migration; investigate before retrying |
| Insufficient storage | Blocker | Free up space or increase quota |
4. FileStructureMigrationService
Purpose: Migrate legacy projects to new structure
Location: src/main/java/org/codetricks/construction/code/assistant/service/FileStructureMigrationService.java
package org.codetricks.construction.code.assistant.service;
import java.io.IOException;
import java.util.*;
import java.util.logging.Logger;
/**
* Service for migrating legacy projects from flat pages/ structure to
* hierarchical files/{file_id}/pages/ structure.
*
* <p>Migration Strategy:
* 1. Analyze inputs/ folder to identify source files
* 2. Generate file metadata for each input file
* 3. Associate existing pages with source files (best effort heuristic)
* 4. Copy pages to new files/{file_id}/pages/ structure
* 5. Keep legacy pages/ intact for rollback
* 6. Update plan-metadata.json with new paths
*
* <p>Thread-safe and idempotent.
*
* <p>Instantiation: Create via constructor, pass FileSystemHandler,
* InputFileMetadataService, and ProjectPathResolver
*/
public class FileStructureMigrationService {
private static final Logger logger = Logger.getLogger(
FileStructureMigrationService.class.getName());
private final FileSystemHandler fileSystemHandler;
private final InputFileMetadataService metadataService;
private final ProjectPathResolver pathResolver;
public FileStructureMigrationService(
FileSystemHandler fileSystemHandler,
InputFileMetadataService metadataService,
ProjectPathResolver pathResolver) {
this.fileSystemHandler = fileSystemHandler;
this.metadataService = metadataService;
this.pathResolver = pathResolver;
}
/**
* Analyzes a project to determine migration readiness.
* Provides comprehensive assessment including blockers, estimates, and recommendations.
*/
public MigrationAnalysis analyzeProject(String projectId) throws IOException {
logger.info("Analyzing project for migration: " + projectId);
MigrationAnalysis analysis = new MigrationAnalysis();
analysis.projectId = projectId;
analysis.currentVersion = pathResolver.detectProjectVersion(projectId);
analysis.issues = new ArrayList<>();
// Count input files
String inputsPath = String.format("projects/%s/inputs/", projectId);
if (fileSystemHandler.exists(inputsPath)) {
analysis.inputFileCount = fileSystemHandler.listFiles(inputsPath).size();
}
// Count legacy pages
String pagesPath = String.format("projects/%s/pages/", projectId);
if (fileSystemHandler.exists(pagesPath)) {
analysis.legacyPageCount = fileSystemHandler.listDirectories(pagesPath).size();
}
// Determine if migration is needed
analysis.needsMigration = (analysis.currentVersion ==
ProjectPathResolver.ProjectStructureVersion.LEGACY);
// Estimate duration (rough estimate: 1 second per page + 5 seconds per file)
analysis.estimatedDurationSeconds =
(analysis.legacyPageCount * 1) + (analysis.inputFileCount * 5);
// Check for blockers and warnings
if (analysis.inputFileCount == 0) {
analysis.issues.add("No input files found in inputs/ folder - will create default file entry");
}
if (analysis.legacyPageCount == 0) {
analysis.issues.add("No pages found in pages/ folder - nothing to migrate");
}
// Assess readiness status
if (analysis.currentVersion == ProjectPathResolver.ProjectStructureVersion.MODERN) {
analysis.readinessStatus = "ALREADY_MIGRATED";
analysis.readinessMessage = "Project has already been migrated to the new file structure.";
} else if (analysis.currentVersion == ProjectPathResolver.ProjectStructureVersion.TRANSITIONAL) {
analysis.readinessStatus = "MIGRATION_IN_PROGRESS";
analysis.readinessMessage = "Project migration is in progress or partially complete. " +
"Both legacy and modern structures exist.";
} else if (!analysis.issues.isEmpty() && analysis.legacyPageCount == 0) {
analysis.readinessStatus = "NOT_READY";
analysis.readinessMessage = "Project cannot be migrated: no pages found.";
} else if (!analysis.issues.isEmpty()) {
analysis.readinessStatus = "READY_WITH_WARNINGS";
analysis.readinessMessage = String.format(
"Project can be migrated with warnings. Found %d input files and %d pages. " +
"Estimated time: %d seconds. Issues: %s",
analysis.inputFileCount, analysis.legacyPageCount,
analysis.estimatedDurationSeconds, String.join("; ", analysis.issues));
} else {
analysis.readinessStatus = "READY";
analysis.readinessMessage = String.format(
"Project is ready to migrate. Found %d input files and %d pages. " +
"Estimated time: %d seconds. No blockers detected.",
analysis.inputFileCount, analysis.legacyPageCount,
analysis.estimatedDurationSeconds);
}
logger.info("Analysis complete: " + analysis);
return analysis;
}
/**
* Migrates a project to new file structure.
*
* @param projectId The project to migrate
* @param preserveLegacy Whether to keep pages/ folder after migration
* @param dryRun If true, only preview changes without applying
* @return Migration result
*/
public MigrationResult migrateProject(
String projectId,
boolean preserveLegacy,
boolean dryRun) throws IOException {
logger.info(String.format(
"Starting migration for project %s (dryRun=%s, preserve=%s)",
projectId, dryRun, preserveLegacy));
MigrationResult result = new MigrationResult();
result.projectId = projectId;
result.startTime = System.currentTimeMillis();
try {
// Step 1: Analyze input files
List<String> inputFiles = discoverInputFiles(projectId);
logger.info("Discovered " + inputFiles.size() + " input files");
if (inputFiles.isEmpty()) {
// No input files - create a default file for all pages
inputFiles.add(createDefaultFileEntry(projectId));
}
// Step 2: Generate metadata for each input file
Map<String, InputFileMetadata> fileMetadataMap = new HashMap<>();
for (String inputFile : inputFiles) {
InputFileMetadata metadata = metadataService.generateMetadata(
projectId, inputFile, false /* don't force regenerate */);
fileMetadataMap.put(metadata.getFileId(), metadata);
result.migratedFiles.add(metadata);
}
// Step 3: Associate pages with files (heuristic-based)
Map<String, List<Integer>> fileToPages = associatePagesWithFiles(
projectId, fileMetadataMap);
// Step 4: Migrate pages to new structure
for (Map.Entry<String, List<Integer>> entry : fileToPages.entrySet()) {
String fileId = entry.getKey();
List<Integer> pageNumbers = entry.getValue();
for (int pageNumber : pageNumbers) {
if (!dryRun) {
migratePageToNewStructure(projectId, fileId, pageNumber);
}
result.totalPagesMigrated++;
}
result.pagesPerFile.put(fileId, pageNumbers.size());
}
// Step 5: Update metadata with extracted pages
if (!dryRun) {
for (Map.Entry<String, List<Integer>> entry : fileToPages.entrySet()) {
String fileId = entry.getKey();
List<String> pageIds = entry.getValue().stream()
.map(String::valueOf)
.toList();
metadataService.updateProcessingStatus(
projectId, fileId, ProcessingStatus.PROCESSING_STATUS_COMPLETED, pageIds);
}
}
// Step 6: Optionally remove legacy pages/ folder
if (!preserveLegacy && !dryRun) {
String legacyPagesPath = String.format("projects/%s/pages/", projectId);
fileSystemHandler.deleteDirectory(legacyPagesPath);
logger.info("Removed legacy pages/ folder");
}
// Step 7: Clear path cache to force re-resolution
if (!dryRun) {
pathResolver.clearCacheForProject(projectId);
}
result.success = true;
logger.info("Migration completed successfully");
} catch (Exception e) {
result.success = false;
result.errorMessage = e.getMessage();
logger.log(java.util.logging.Level.SEVERE, "Migration failed: " + e.getMessage(), e);
}
result.endTime = System.currentTimeMillis();
return result;
}
// Private helper methods
private List<String> discoverInputFiles(String projectId) throws IOException {
String inputsPath = String.format("projects/%s/inputs/", projectId);
if (!fileSystemHandler.exists(inputsPath)) {
return new ArrayList<>();
}
return fileSystemHandler.listFiles(inputsPath).stream()
.map(filename -> "inputs/" + filename)
.toList();
}
private String createDefaultFileEntry(String projectId) {
// For projects with no input files, create a placeholder
return "inputs/unknown-source.pdf";
}
private Map<String, List<Integer>> associatePagesWithFiles(
String projectId,
Map<String, InputFileMetadata> fileMetadataMap) throws IOException {
// Simple heuristic: Distribute pages evenly across files based on page count
Map<String, List<Integer>> fileToPages = new HashMap<>();
// Get list of all legacy pages
String legacyPagesPath = String.format("projects/%s/pages/", projectId);
List<Integer> allPages = fileSystemHandler.listDirectories(legacyPagesPath).stream()
.map(Integer::parseInt)
.sorted()
.toList();
if (allPages.isEmpty()) {
return fileToPages;
}
// If only one file, assign all pages to it
if (fileMetadataMap.size() == 1) {
String fileId = fileMetadataMap.keySet().iterator().next();
fileToPages.put(fileId, new ArrayList<>(allPages));
return fileToPages;
}
// Otherwise, distribute based on page count in each file
List<InputFileMetadata> sortedFiles = fileMetadataMap.values().stream()
.sorted(Comparator.comparingInt(InputFileMetadata::getPageCount))
.toList();
int currentPageIndex = 0;
for (InputFileMetadata metadata : sortedFiles) {
int pageCount = metadata.getPageCount();
List<Integer> assignedPages = new ArrayList<>();
for (int i = 0; i < pageCount && currentPageIndex < allPages.size(); i++) {
assignedPages.add(allPages.get(currentPageIndex++));
}
fileToPages.put(metadata.getFileId(), assignedPages);
}
// Assign remaining pages to last file (edge case)
if (currentPageIndex < allPages.size()) {
String lastFileId = sortedFiles.get(sortedFiles.size() - 1).getFileId();
List<Integer> lastFilePages = fileToPages.get(lastFileId);
while (currentPageIndex < allPages.size()) {
lastFilePages.add(allPages.get(currentPageIndex++));
}
}
return fileToPages;
}
private void migratePageToNewStructure(
String projectId,
String fileId,
int pageNumber) throws IOException {
String legacyPath = String.format("projects/%s/pages/%03d/", projectId, pageNumber);
String newPath = String.format("projects/%s/files/%s/pages/%03d/",
projectId, fileId, pageNumber);
// Copy entire page folder to new location
fileSystemHandler.copyDirectory(legacyPath, newPath);
logger.info(String.format("Migrated page %d to file %s", pageNumber, fileId));
}
// Data classes
public static class MigrationAnalysis {
public String projectId;
public ProjectPathResolver.ProjectStructureVersion currentVersion;
public boolean needsMigration;
public int inputFileCount;
public int legacyPageCount;
public int estimatedDurationSeconds;
public List<String> issues;
public String readinessStatus; // READY, READY_WITH_WARNINGS, NOT_READY, ALREADY_MIGRATED
public String readinessMessage;
@Override
public String toString() {
return String.format(
"MigrationAnalysis{project=%s, version=%s, readiness=%s, " +
"input_files=%d, pages=%d, est_duration=%ds, issues=%d}",
projectId, currentVersion, readinessStatus,
inputFileCount, legacyPageCount, estimatedDurationSeconds,
issues != null ? issues.size() : 0);
}
}
public static class MigrationResult {
public String projectId;
public boolean success;
public List<InputFileMetadata> migratedFiles = new ArrayList<>();
public Map<String, Integer> pagesPerFile = new HashMap<>();
public int totalPagesMigrated;
public String errorMessage;
public long startTime;
public long endTime;
public long getDurationSeconds() {
return (endTime - startTime) / 1000;
}
}
}
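The page-association heuristic is the least obvious step above, and it can be isolated and tested without any storage dependencies. A simplified sketch, where each file's declared page count is passed in directly instead of being read from InputFileMetadata (the real service additionally sorts files by ascending page count before distributing):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageAssociationSketch {
    /**
     * Distributes sorted legacy page numbers across files according to each
     * file's declared page count; leftover pages spill into the last file,
     * mirroring the edge-case handling in associatePagesWithFiles.
     */
    static Map<String, List<Integer>> associate(
            LinkedHashMap<String, Integer> filePageCounts, List<Integer> allPages) {
        Map<String, List<Integer>> fileToPages = new LinkedHashMap<>();
        int idx = 0;
        String lastFileId = null;
        for (Map.Entry<String, Integer> entry : filePageCounts.entrySet()) {
            List<Integer> assigned = new ArrayList<>();
            for (int i = 0; i < entry.getValue() && idx < allPages.size(); i++) {
                assigned.add(allPages.get(idx++));
            }
            fileToPages.put(entry.getKey(), assigned);
            lastFileId = entry.getKey();
        }
        // Edge case: more legacy pages exist than the files' declared counts.
        while (lastFileId != null && idx < allPages.size()) {
            fileToPages.get(lastFileId).add(allPages.get(idx++));
        }
        return fileToPages;
    }
}
```

For example, with file "1" declaring 2 pages and file "2" declaring 3, legacy pages 1..6 split as {1,2} and {3,4,5,6}: the sixth page has no declared owner and spills into the last file.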
5. DocumentClassificationService
Purpose: Classify document type (heuristic-based initially)
Location: src/main/java/org/codetricks/construction/code/assistant/service/DocumentClassificationService.java
package org.codetricks.construction.code.assistant.service;
import java.util.regex.Pattern;
/**
* Service for classifying document types based on filename and content.
* Uses heuristic rules initially, can be enhanced with AI classification later.
*
* <p>Instantiation: Stateless utility, can be instantiated with default constructor
*/
public class DocumentClassificationService {
// Patterns for document type detection
private static final Pattern ARCHITECTURAL_PATTERN = Pattern.compile(
"(?i).*(architectural|arch|floor[\\s-]?plan|site[\\s-]?plan|elevation|section).*");
private static final Pattern ELECTRICAL_PATTERN = Pattern.compile(
"(?i).*(electrical|elec|power|lighting).*");
private static final Pattern MECHANICAL_PATTERN = Pattern.compile(
"(?i).*(mechanical|mech|hvac|plumbing|mep).*");
private static final Pattern STRUCTURAL_PATTERN = Pattern.compile(
"(?i).*(structural|struct|foundation|framing).*");
private static final Pattern PERMIT_PATTERN = Pattern.compile(
"(?i).*(permit|application|approval).*");
private static final Pattern INSPECTION_PATTERN = Pattern.compile(
"(?i).*(inspector|inspection|feedback|corrections).*");
/**
* Classifies a document based on filename and optional content analysis.
*
* @param filename The name of the file
* @param filePath Optional path to file for content analysis (future enhancement)
* @return The classified document type
*/
public DocumentType classifyDocument(String filename, String filePath) {
// Heuristic-based classification using filename patterns
if (ARCHITECTURAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN;
}
if (ELECTRICAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_ELECTRICAL_PLAN;
}
if (MECHANICAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_MECHANICAL_PLAN;
}
if (STRUCTURAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_STRUCTURAL_PLAN;
}
if (PERMIT_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_PERMIT_APPLICATION;
}
if (INSPECTION_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_INSPECTOR_FEEDBACK;
}
// Default to unknown if no pattern matches
return DocumentType.DOCUMENT_TYPE_UNKNOWN;
}
// Future enhancement: AI-based classification using LLM
public DocumentType classifyDocumentWithAI(String filePath) {
// TODO: Implement LLM-based classification
// 1. Extract first page or sample of content
// 2. Call LLM with prompt: "Classify this construction document..."
// 3. Parse LLM response to DocumentType enum
// 4. Fall back to heuristic if LLM fails
throw new UnsupportedOperationException("AI classification not yet implemented");
}
}
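Because every pattern has the form (?i).*(…).*, classification reduces to ordered, case-insensitive substring checks. A reduced sketch using two of the service's patterns verbatim (string labels stand in for the DocumentType enum):

```java
import java.util.regex.Pattern;

public class ClassificationSketch {
    // Two of the service's patterns, copied verbatim; the order of checks is significant.
    private static final Pattern ARCHITECTURAL = Pattern.compile(
        "(?i).*(architectural|arch|floor[\\s-]?plan|site[\\s-]?plan|elevation|section).*");
    private static final Pattern ELECTRICAL = Pattern.compile(
        "(?i).*(electrical|elec|power|lighting).*");

    static String classify(String filename) {
        if (ARCHITECTURAL.matcher(filename).matches()) {
            return "ARCHITECTURAL_PLAN";
        }
        if (ELECTRICAL.matcher(filename).matches()) {
            return "ELECTRICAL_PLAN";
        }
        return "UNKNOWN";
    }
}
```

Note that pattern order matters in the service as well: the architectural pattern is tested first and includes "section", so a filename like "electrical-section-details.pdf" classifies as architectural rather than electrical. This is a known limitation of the heuristic approach.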
Frontend Implementation
1. File Metadata List Component
Purpose: Display list of files with rich metadata
Location: web-ng-m3/src/app/components/project/settings/file-metadata-list/file-metadata-list.component.ts
import { Component, Input, OnInit } from '@angular/core';
import { InputFileMetadata, DocumentType, ProcessingStatus } from '@generated/api_pb';
import { ArchitecturalPlanService } from '@app/services/architectural-plan.service';
@Component({
selector: 'app-file-metadata-list',
templateUrl: './file-metadata-list.component.html',
styleUrls: ['./file-metadata-list.component.scss']
})
export class FileMetadataListComponent implements OnInit {
@Input() projectId: string = '';
files: InputFileMetadata[] = [];
loading: boolean = true;
error: string | null = null;
// Enum references for template
DocumentType = DocumentType;
ProcessingStatus = ProcessingStatus;
constructor(private planService: ArchitecturalPlanService) {}
ngOnInit(): void {
this.loadFileMetadata();
}
private loadFileMetadata(): void {
this.loading = true;
this.error = null;
this.planService.listInputFileMetadata(this.projectId)
.subscribe({
next: (response) => {
this.files = response.files;
this.loading = false;
},
error: (err) => {
this.error = 'Failed to load file metadata';
this.loading = false;
console.error('Error loading file metadata:', err);
}
});
}
getDocumentTypeLabel(type: DocumentType): string {
switch (type) {
case DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN:
return 'Architectural Plan';
case DocumentType.DOCUMENT_TYPE_ELECTRICAL_PLAN:
return 'Electrical Plan';
case DocumentType.DOCUMENT_TYPE_MECHANICAL_PLAN:
return 'Mechanical Plan';
case DocumentType.DOCUMENT_TYPE_STRUCTURAL_PLAN:
return 'Structural Plan';
case DocumentType.DOCUMENT_TYPE_INSPECTOR_FEEDBACK:
return 'Inspector Feedback';
case DocumentType.DOCUMENT_TYPE_PERMIT_APPLICATION:
return 'Permit Application';
case DocumentType.DOCUMENT_TYPE_SITE_PLAN:
return 'Site Plan';
case DocumentType.DOCUMENT_TYPE_ELEVATION_DRAWING:
return 'Elevation Drawing';
case DocumentType.DOCUMENT_TYPE_SECTION_DRAWING:
return 'Section Drawing';
default:
return 'Unknown';
}
}
getProcessingStatusLabel(status: ProcessingStatus): string {
switch (status) {
case ProcessingStatus.PROCESSING_STATUS_UPLOADED:
return 'Uploaded';
case ProcessingStatus.PROCESSING_STATUS_PROCESSING:
return 'Processing';
case ProcessingStatus.PROCESSING_STATUS_COMPLETED:
return 'Completed';
case ProcessingStatus.PROCESSING_STATUS_FAILED:
return 'Failed';
default:
return 'Unknown';
}
}
getProcessingStatusColor(status: ProcessingStatus): string {
switch (status) {
case ProcessingStatus.PROCESSING_STATUS_COMPLETED:
return 'success';
case ProcessingStatus.PROCESSING_STATUS_PROCESSING:
return 'primary';
case ProcessingStatus.PROCESSING_STATUS_FAILED:
return 'warn';
default:
return 'accent';
}
}
formatFileSize(bytes: number): string {
if (bytes === 0) return '0 B';
const k = 1024;
const sizes = ['B', 'KB', 'MB', 'GB'];
const i = Math.floor(Math.log(bytes) / Math.log(k));
return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
}
formatDate(isoDate: string): string {
if (!isoDate) return 'N/A';
return new Date(isoDate).toLocaleDateString();
}
}
Template: file-metadata-list.component.html
<div class="file-metadata-list">
<h3>Project Files</h3>
<div *ngIf="loading" class="loading-spinner">
<mat-spinner diameter="40"></mat-spinner>
<p>Loading file metadata...</p>
</div>
<div *ngIf="error" class="error-message">
<mat-icon>error</mat-icon>
<p>{{ error }}</p>
</div>
<div *ngIf="!loading && !error && files.length === 0" class="empty-state">
<mat-icon>description</mat-icon>
<p>No files found in this project.</p>
</div>
<mat-card *ngFor="let file of files" class="file-card">
<mat-card-header>
<mat-icon mat-card-avatar>description</mat-icon>
<mat-card-title>{{ file.fileName }}</mat-card-title>
<mat-card-subtitle>{{ getDocumentTypeLabel(file.documentType) }}</mat-card-subtitle>
</mat-card-header>
<mat-card-content>
<div class="file-details">
<div class="detail-row">
<span class="label">File ID:</span>
<span class="value">{{ file.fileId }}</span>
</div>
<div class="detail-row">
<span class="label">Size:</span>
<span class="value">{{ formatFileSize(file.fileSizeBytes) }}</span>
</div>
<div class="detail-row">
<span class="label">Pages:</span>
<span class="value">{{ file.pageCount }}</span>
</div>
<div class="detail-row">
<span class="label">Upload Date:</span>
<span class="value">{{ formatDate(file.uploadDate) }}</span>
</div>
<div class="detail-row">
<span class="label">Status:</span>
<mat-chip [color]="getProcessingStatusColor(file.processingStatus)" selected>
{{ getProcessingStatusLabel(file.processingStatus) }}
</mat-chip>
</div>
<div *ngIf="file.contentSummary" class="detail-row">
<span class="label">Summary:</span>
<p class="summary">{{ file.contentSummary }}</p>
</div>
</div>
</mat-card-content>
<mat-card-actions align="end">
<button mat-button color="primary">View Pages</button>
<button mat-button>Reprocess</button>
</mat-card-actions>
</mat-card>
</div>
2. Hierarchical Page Navigation Component
Purpose: Display pages organized hierarchically by source file in table of contents
Location: web-ng-m3/src/app/components/project/pages/page-toc-hierarchical/page-toc-hierarchical.component.ts
UI Pattern: Mirrors the compliance tab's collapsible section hierarchy
import { Component, Input, Output, EventEmitter, OnInit } from '@angular/core';
import { InputFileMetadata, DocumentType } from '@generated/api_pb';
import { ArchitecturalPlanService } from '@app/services/architectural-plan.service';
import { trigger, state, style, transition, animate } from '@angular/animations';
export interface PageTreeNode {
type: 'file' | 'page';
fileId?: string;
fileName?: string;
documentType?: DocumentType;
pageCount?: number;
pageNumber?: number;
pageTitle?: string;
depth: number;
children?: PageTreeNode[];
}
@Component({
selector: 'app-page-toc-hierarchical',
templateUrl: './page-toc-hierarchical.component.html',
styleUrls: ['./page-toc-hierarchical.component.scss'],
animations: [
trigger('rowAnimation', [
transition(':enter', [
style({ height: '0px', opacity: 0, transform: 'translateY(-10px)', overflow: 'hidden' }),
animate('150ms ease-out', style({ height: '*', opacity: 1, transform: 'translateY(0)' }))
]),
transition(':leave', [
animate('150ms ease-in', style({ height: '0px', opacity: 0, transform: 'translateY(-10px)', overflow: 'hidden' }))
])
])
]
})
export class PageTocHierarchicalComponent implements OnInit {
@Input() projectId: string = '';
@Input() selectedPageNumber: number | null = null;
@Output() pageSelected = new EventEmitter<number>();
treeNodes: PageTreeNode[] = [];
expandedFileIds = new Set<string>();
loading: boolean = true;
constructor(private planService: ArchitecturalPlanService) {}
ngOnInit(): void {
this.loadHierarchy();
}
private async loadHierarchy(): Promise<void> {
this.loading = true;
try {
// Load file metadata
const response = await this.planService.listInputFileMetadata(this.projectId).toPromise();
const files = response?.files || [];
// Load plan pages
const plan = await this.planService.getArchitecturalPlan(this.projectId).toPromise();
const pages = plan?.pages || [];
// Build tree structure
this.treeNodes = files.map(file => ({
type: 'file' as const,
fileId: file.fileId,
fileName: file.fileName,
documentType: file.documentType,
pageCount: file.pageCount,
depth: 0,
children: pages
.filter(page => this.pagesBelongsToFile(page, file))
.map(page => ({
type: 'page' as const,
pageNumber: page.pageNumber,
pageTitle: page.title,
depth: 1
}))
}));
// Expand all by default
files.forEach(file => this.expandedFileIds.add(file.fileId));
} catch (error) {
console.error('Error loading hierarchy:', error);
} finally {
this.loading = false;
}
}
isExpanded(fileId: string): boolean {
return this.expandedFileIds.has(fileId);
}
toggleFile(fileId: string): void {
if (this.expandedFileIds.has(fileId)) {
this.expandedFileIds.delete(fileId);
} else {
this.expandedFileIds.add(fileId);
}
}
expandAll(): void {
this.treeNodes.forEach(node => {
if (node.fileId) {
this.expandedFileIds.add(node.fileId);
}
});
}
collapseAll(): void {
this.expandedFileIds.clear();
}
selectPage(pageNumber: number): void {
this.selectedPageNumber = pageNumber;
this.pageSelected.emit(pageNumber);
}
getNodePadding(depth: number): string {
return `${depth * 24}px`;
}
getDocumentTypeIcon(type: DocumentType): string {
switch (type) {
case DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN:
return 'architecture';
case DocumentType.DOCUMENT_TYPE_ELECTRICAL_PLAN:
return 'electrical_services';
case DocumentType.DOCUMENT_TYPE_MECHANICAL_PLAN:
return 'hvac';
case DocumentType.DOCUMENT_TYPE_STRUCTURAL_PLAN:
return 'foundation';
default:
return 'description';
}
}
private pagesBelongsToFile(page: any, file: InputFileMetadata): boolean {
// For now, check if page number is in extracted_pages
// This will be more sophisticated once migration is complete
return file.extractedPages?.includes(page.pageNumber.toString()) || false;
}
}
Template: page-toc-hierarchical.component.html
<div class="page-toc-hierarchical">
<div class="toc-header">
<h3>Table of Contents</h3>
<div class="toc-actions">
<button mat-icon-button (click)="expandAll()" title="Expand All">
<mat-icon>unfold_more</mat-icon>
</button>
<button mat-icon-button (click)="collapseAll()" title="Collapse All">
<mat-icon>unfold_less</mat-icon>
</button>
</div>
</div>
<div *ngIf="loading" class="loading-state">
<mat-spinner diameter="30"></mat-spinner>
</div>
<div *ngIf="!loading" class="toc-tree">
<ng-container *ngFor="let node of treeNodes">
<!-- File Node (Parent) -->
<div class="tree-node file-node"
[style.padding-left]="getNodePadding(node.depth)"
(click)="toggleFile(node.fileId!)"
[@rowAnimation]>
<mat-icon class="expand-icon">
{{ isExpanded(node.fileId!) ? 'expand_more' : 'chevron_right' }}
</mat-icon>
<mat-icon class="file-icon">{{ getDocumentTypeIcon(node.documentType!) }}</mat-icon>
<span class="file-name">{{ node.fileName }}</span>
<mat-chip class="document-type-chip" size="small">
{{ getDocumentTypeLabel(node.documentType!) }}
</mat-chip>
<span class="page-count">{{ node.pageCount }} pages</span>
</div>
<!-- Page Nodes (Children) - only shown when file is expanded -->
<ng-container *ngIf="isExpanded(node.fileId!)">
<div *ngFor="let child of node.children"
class="tree-node page-node"
[class.selected]="child.pageNumber === selectedPageNumber"
[style.padding-left]="getNodePadding(child.depth)"
(click)="selectPage(child.pageNumber!)"
[@rowAnimation]>
<span class="expander-placeholder"></span>
<mat-icon class="page-icon">article</mat-icon>
<span class="page-label">Page {{ child.pageNumber }}: {{ child.pageTitle }}</span>
</div>
</ng-container>
</ng-container>
</div>
</div>
Styling (similar to compliance tab):
.page-toc-hierarchical {
.tree-node {
display: flex;
align-items: center;
padding: 8px;
cursor: pointer;
transition: background-color 150ms ease;
&:hover {
background-color: rgba(0, 0, 0, 0.04);
}
&.selected {
background-color: rgba(63, 81, 181, 0.1);
border-left: 3px solid #3f51b5;
}
}
.file-node {
font-weight: 500;
border-bottom: 1px solid rgba(0, 0, 0, 0.12);
}
.page-node {
font-weight: 400;
}
.expand-icon {
margin-right: 8px;
}
.expander-placeholder {
display: inline-block;
width: 32px;
}
.file-icon, .page-icon {
margin-right: 8px;
color: rgba(0, 0, 0, 0.54);
}
}
Integration: This component replaces or enhances the existing flat page list in the TOC sidebar.
3. Legacy Project Upgrade Banner Component
Purpose: Prompt users to upgrade legacy projects
Location: web-ng-m3/src/app/components/project/settings/legacy-upgrade-banner/legacy-upgrade-banner.component.ts
import { Component, Input, OnInit, Output, EventEmitter } from '@angular/core';
import { MatDialog } from '@angular/material/dialog';
import { ArchitecturalPlanService } from '@app/services/architectural-plan.service';
import { FileStructureMigrationDialogComponent } from './file-structure-migration-dialog.component';
@Component({
selector: 'app-legacy-upgrade-banner',
templateUrl: './legacy-upgrade-banner.component.html',
styleUrls: ['./legacy-upgrade-banner.component.scss']
})
export class LegacyUpgradeBannerComponent implements OnInit {
@Input() projectId: string = '';
@Output() upgraded = new EventEmitter<void>();
isLegacyProject: boolean = false;
showBanner: boolean = false;
checking: boolean = true;
constructor(
private planService: ArchitecturalPlanService,
private dialog: MatDialog
) {}
ngOnInit(): void {
this.checkIfLegacyProject();
}
private checkIfLegacyProject(): void {
this.checking = true;
this.planService.analyzeProjectMigration(this.projectId)
.subscribe({
next: (analysis) => {
this.isLegacyProject = analysis.needsMigration;
this.showBanner = this.isLegacyProject && !this.isDismissed();
this.checking = false;
},
error: (err) => {
console.error('Error checking project version:', err);
this.checking = false;
}
});
}
openUpgradeDialog(): void {
const dialogRef = this.dialog.open(FileStructureMigrationDialogComponent, {
width: '600px',
data: { projectId: this.projectId }
});
dialogRef.afterClosed().subscribe(result => {
if (result === 'upgraded') {
this.showBanner = false;
this.upgraded.emit();
}
});
}
dismissBanner(): void {
this.showBanner = false;
this.markAsDismissed();
}
private isDismissed(): boolean {
const key = `legacy-upgrade-dismissed-${this.projectId}`;
return localStorage.getItem(key) === 'true';
}
private markAsDismissed(): void {
const key = `legacy-upgrade-dismissed-${this.projectId}`;
localStorage.setItem(key, 'true');
}
}
Template: legacy-upgrade-banner.component.html
<mat-card *ngIf="showBanner" class="legacy-upgrade-banner" appearance="outlined">
<mat-card-content>
<div class="banner-content">
<mat-icon class="info-icon">info</mat-icon>
<div class="banner-text">
<h3>Upgrade Available</h3>
<p>
Upgrade your project to the new file structure for better organization,
rich file metadata, and improved search capabilities.
</p>
</div>
<div class="banner-actions">
<button mat-raised-button color="primary" (click)="openUpgradeDialog()">
Upgrade Project
</button>
<button mat-button (click)="dismissBanner()">Dismiss</button>
</div>
</div>
</mat-card-content>
</mat-card>
CLI Tools
Bulk Upgrade Command
Purpose: Admin tool for bulk upgrading legacy projects
Location: cli/codeproof.sh upgrade-file-structure
#!/bin/bash
# Bulk upgrade legacy projects to new file structure
set -e
# Configuration
GRPC_HOST="${GRPC_HOST:-localhost:8080}"
PROTO_PATH="src/main/proto"
GOOGLEAPIS_PATH="env/dependencies/googleapis"
# Parse arguments
DRY_RUN="false"
USER_ID=""
PROJECT_IDS=""
ALL="false"
while [[ $# -gt 0 ]]; do
case $1 in
--dry-run)
DRY_RUN="$2"
shift 2
;;
--user-id)
USER_ID="$2"
shift 2
;;
--project-ids)
PROJECT_IDS="$2"
shift 2
;;
--all)
ALL="true"
shift
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
# Validate inputs
if [ -z "$USER_ID" ]; then
echo "Error: --user-id is required"
exit 1
fi
echo "=========================================="
echo "File Structure Bulk Upgrade Tool"
echo "=========================================="
echo "User ID: $USER_ID"
echo "Dry Run: $DRY_RUN"
echo "All Projects: $ALL"
echo ""
# Function to migrate a single project
migrate_project() {
local project_id=$1
echo "Migrating project: $project_id"
RESPONSE=$(grpcurl -plaintext \
-import-path "${PROTO_PATH}" \
-import-path "${GOOGLEAPIS_PATH}" \
-proto "${PROTO_PATH}/api.proto" \
-d '{
"project_id": "'"${project_id}"'",
"preserve_legacy_structure": true,
"dry_run": '"${DRY_RUN}"',
"initiated_by": "'"${USER_ID}"'"
}' \
"${GRPC_HOST}" \
org.codetricks.construction.code.assistant.service.ArchitecturalPlanService/MigrateProjectFileStructure)
echo "$RESPONSE" | jq .
SUCCESS=$(echo "$RESPONSE" | jq -r '.success')
  if [ "$SUCCESS" == "true" ]; then
    echo "✅ Successfully migrated: $project_id"
    echo ""
    return 0
  else
    ERROR=$(echo "$RESPONSE" | jq -r '.error_message')
    echo "❌ Failed to migrate $project_id: $ERROR"
    echo ""
    return 1
  fi
}
# Get list of projects to migrate
if [ "$ALL" == "true" ]; then
  echo "Fetching all projects for user..."
  LIST_RESPONSE=$(grpcurl -plaintext \
    -import-path "${PROTO_PATH}" \
    -import-path "${GOOGLEAPIS_PATH}" \
    -proto "${PROTO_PATH}/api.proto" \
    -d '{"account_id": "'"${USER_ID}"'"}' \
    "${GRPC_HOST}" \
    org.codetricks.construction.code.assistant.service.ArchitecturalPlanService/ListArchitecturalPlanIds)
  # jq emits one ID per line; join with commas so the split below sees every ID
  PROJECT_IDS=$(echo "$LIST_RESPONSE" | jq -r '.architectural_plan_ids[]' | paste -sd, -)
fi
# Convert comma-separated list to an array
IFS=',' read -ra PROJECTS <<< "$PROJECT_IDS"
# Migrate each project, tracking successes and failures
TOTAL=${#PROJECTS[@]}
SUCCESS_COUNT=0
FAIL_COUNT=0
echo "Found $TOTAL projects to migrate"
echo ""
for project_id in "${PROJECTS[@]}"; do
  # Call inside `if` so a failing migration doesn't trip `set -e`
  if migrate_project "$project_id"; then
    SUCCESS_COUNT=$((SUCCESS_COUNT + 1))
  else
    FAIL_COUNT=$((FAIL_COUNT + 1))
  fi
done
echo "=========================================="
echo "Migration Complete"
echo "=========================================="
echo "Total Projects: $TOTAL"
echo "Successful: $SUCCESS_COUNT"
echo "Failed: $FAIL_COUNT"
echo "=========================================="
Testing Strategy
1. Unit Tests
Test Coverage:
- `ProjectPathResolver`: path resolution logic, caching, fallback
- `InputFileMetadataService`: metadata generation, persistence
- `FileStructureMigrationService`: migration logic, page association
- `DocumentClassificationService`: heuristic classification rules
Example Test (ProjectPathResolverTest.java):
@Test
public void testResolvePagePath_NewStructureExists_ReturnsNewPath() {
// Arrange
String projectId = "test-project";
int pageNumber = 1;
String expectedPath = "projects/test-project/files/file-123/pages/001/";
when(fileSystemHandler.exists(expectedPath)).thenReturn(true);
// Act
String actualPath = dualReadHandler.resolvePageFolderPath(projectId, pageNumber);
// Assert
assertEquals(expectedPath, actualPath);
verify(fileSystemHandler).exists(expectedPath);
}
@Test
public void testResolvePagePath_LegacyFallback_ReturnsLegacyPath() {
// Arrange
String projectId = "legacy-project";
int pageNumber = 1;
String newPath = "projects/legacy-project/files/.../pages/001/";
String legacyPath = "projects/legacy-project/pages/001/";
when(fileSystemHandler.exists(newPath)).thenReturn(false);
when(fileSystemHandler.exists(legacyPath)).thenReturn(true);
// Act
String actualPath = dualReadHandler.resolvePageFolderPath(projectId, pageNumber);
// Assert
assertEquals(legacyPath, actualPath);
}
2. Integration Tests
Test Scenarios:
- Legacy Project Read: Verify existing functionality unchanged
- Modern Project Read: Verify new structure works
- Migration End-to-End: Migrate project, verify pages accessible
- Rollback Safety: Ensure legacy structure preserved
Example Test (FileStructureMigrationIntegrationTest.java):
@Test
public void testMigrateProject_LegacyToModern_Success() {
// Arrange: Create legacy project
String projectId = createLegacyTestProject();
// Act: Migrate project
MigrationResult result = migrationService.migrateProject(
projectId, true /* preserve legacy */, false /* not dry run */);
// Assert: Migration successful
assertTrue(result.success);
assertEquals(3, result.totalPagesMigrated);
assertTrue(result.migratedFiles.size() > 0);
// Assert: Pages readable from new structure
for (int i = 1; i <= 3; i++) {
String path = dualReadHandler.resolvePageFolderPath(projectId, i);
assertTrue(path.contains("/files/"));
}
// Assert: Legacy structure still exists
String legacyPath = "projects/" + projectId + "/pages/";
assertTrue(fileSystemHandler.exists(legacyPath));
}
3. Backward Compatibility Tests
Critical Tests:
- Legacy project reads work unchanged
- All existing API calls return correct data
- Performance not degraded for legacy projects
- Legacy `plan-metadata.json` still updated
Test Matrix:
| Project Type | Read Pages | Write Pages | List Files | Get Metadata |
|---|---|---|---|---|
| Legacy | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass |
| Transitional | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass |
| Modern | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass |
Refactoring: Path Resolution Consolidation
Problem Statement
Currently, ArchitecturalPlanReviewer contains 15+ path-related methods that:
- Assume the legacy flat structure (`pages/{pageNum}/`)
- Mix concerns (domain logic + path utilities)
- Have duplicate static/instance method pairs
- Lack support for the new hierarchical structure (`files/{fileId}/pages/`)
- Have an unclear class responsibility: is it project-level or file-level?
Current Path Methods in ArchitecturalPlanReviewer
// Project-level paths
public static String getDefaultProjectHomeDir(String planId)
public String getProjectHomeDir(String planId)
public String getProjectHomeDir()
public static String getDefaultProjectsRootFolder()
public String getProjectsRootFolder()
// Legacy page paths (flat structure)
public static String getProjectPagesBasePath(String planId)
public String getPageFolderPath(int pageNumber)
public static String getPageFolderPath(String planId, int pageNumber)
// Metadata file paths
public String getPlanPageMetadataFilePath(int pageNumber)
public static String getPlanPageMetadataFilePath(String planId, int pageNumber)
public String getArchitecturalPlanMetadataFilePath()
public static String getArchitecturalPlanMetadataFilePath(String planId)
// Other paths
private String getProjectOverviewPath()
private String getFullProjectContentPath()
public String getProjectSourcePdfPath()
Issues:
- ❌ No `file_id` support
- ❌ Hardcoded legacy structure
- ❌ Scattered across a business logic class
- ❌ Static methods don't have access to a `ProjectPathResolver` instance
Semantic Clarity: What is ArchitecturalPlanReviewer?
Current Reality:
- Name suggests "single plan" (one file)
- Implementation is project-scoped (has `planId`, loads all pages in the project)
- Historically: 1 plan = 1 project (no ambiguity)
- Future: 1 project = N files (plans, electricals, mechanicals, inspector feedback)
Design Decision: Keep ArchitecturalPlanReviewer as project-scoped with optional file filtering.
Rationale:
- Minimal Breaking Changes: Existing code expects project-level operations
- Backward Compatible: Can operate on entire project (legacy) or single file (modern)
- Incremental Evolution: Can split into file/project classes later if needed
- Naming: "Plan" historically meant "project" in our domain
Proposed Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Service Layer β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ArchitecturalPlanServiceImpl β β
β β (gRPC service, orchestrates reviewers) β β
β βββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ β
β β β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ
β Business Logic Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ArchitecturalPlanReviewer β β
β β - Operates at PROJECT level by default β β
β β - Optional fileId filter for single-file mode β β
β β - Delegates ALL path logic to ProjectPathResolver β β
β ββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ β
β β β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β Utility/Helper Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ProjectPathResolver β β
β β - ALL path construction logic β β
β β - Supports modern & legacy structures β β
β β - Dual-read with file_id optimization β β
β β - Caching for performance β β
β ββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ β
β β β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β Infrastructure Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β FileSystemHandler (GCS/Local) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Refactoring Plan: Four Phases
Phase 1: Extract Path Logic to ProjectPathResolver ✅
Goal: Centralize ALL path construction logic in ProjectPathResolver
New Methods in ProjectPathResolver:
public class ProjectPathResolver {
private final FileSystemHandler fileSystemHandler;
private final String projectsRootFolder; // "projects" by default
// Constructor
public ProjectPathResolver(FileSystemHandler fileSystemHandler) {
this(fileSystemHandler, "projects");
}
public ProjectPathResolver(FileSystemHandler fileSystemHandler, String projectsRootFolder) {
this.fileSystemHandler = fileSystemHandler;
this.projectsRootFolder = projectsRootFolder;
}
// ========================================
// Project-Level Paths
// ========================================
/**
* Returns the projects root folder path.
* @return "projects" by default
*/
public String getProjectsRootFolder() {
return projectsRootFolder;
}
/**
* Returns the project home directory path.
* @param projectId The project ID
* @return "projects/{projectId}"
*/
public String getProjectHomeDir(String projectId) {
return projectsRootFolder + "/" + projectId;
}
/**
* Returns the project inputs folder path.
* @param projectId The project ID
* @return "projects/{projectId}/inputs"
*/
public String getProjectInputsPath(String projectId) {
return getProjectHomeDir(projectId) + "/inputs";
}
/**
* Returns the plan metadata file path (plan-metadata.json).
* @param projectId The project ID
* @return "projects/{projectId}/plan-metadata.json"
*/
public String getPlanMetadataFilePath(String projectId) {
return getProjectHomeDir(projectId) + "/plan-metadata.json";
}
/**
* Returns the project metadata file path (project-metadata.json).
* @param projectId The project ID
* @return "projects/{projectId}/project-metadata.json"
*/
public String getProjectMetadataFilePath(String projectId) {
return getProjectHomeDir(projectId) + "/project-metadata.json";
}
/**
* Returns the project overview file path.
* @param projectId The project ID
* @return "projects/{projectId}/overview.md"
*/
public String getProjectOverviewPath(String projectId) {
return getProjectHomeDir(projectId) + "/overview.md";
}
/**
* Returns the full project content file path.
* @param projectId The project ID
* @return "projects/{projectId}/project-content.md"
*/
public String getFullProjectContentPath(String projectId) {
return getProjectHomeDir(projectId) + "/project-content.md";
}
// ========================================
// File-Level Paths (Modern Structure)
// ========================================
/**
* Returns the files folder path.
* @param projectId The project ID
* @return "projects/{projectId}/files"
*/
public String getFilesBasePath(String projectId) {
return getProjectHomeDir(projectId) + "/files";
}
/**
* Returns the file index path.
* @param projectId The project ID
* @return "projects/{projectId}/files/index.json"
*/
public String getFileIndexPath(String projectId) {
return getFilesBasePath(projectId) + "/index.json";
}
/**
* Returns the file folder path.
* @param projectId The project ID
* @param fileId The file ID (e.g., "1", "2", "3")
* @return "projects/{projectId}/files/{fileId}"
*/
public String getFileFolderPath(String projectId, String fileId) {
return getFilesBasePath(projectId) + "/" + fileId;
}
/**
* Returns the file metadata path.
* @param projectId The project ID
* @param fileId The file ID
* @return "projects/{projectId}/files/{fileId}/metadata.json"
*/
public String getFileMetadataPath(String projectId, String fileId) {
return getFileFolderPath(projectId, fileId) + "/metadata.json";
}
/**
* Returns the file pages folder path.
* @param projectId The project ID
* @param fileId The file ID
* @return "projects/{projectId}/files/{fileId}/pages"
*/
public String getFilePagesBasePath(String projectId, String fileId) {
return getFileFolderPath(projectId, fileId) + "/pages";
}
// ========================================
// Page-Level Paths (Dual-Read Support)
// ========================================
/**
* Returns the legacy pages folder path.
* @param projectId The project ID
* @return "projects/{projectId}/pages"
*/
public String getLegacyPagesBasePath(String projectId) {
return getProjectHomeDir(projectId) + "/pages";
}
/**
* Returns the page folder path with optional file_id.
*
* <p><b>Path Resolution Strategy:</b>
* <ul>
* <li>If fileId provided: Direct modern path (FAST)</li>
* <li>If fileId null: Dual-read logic (cache β modern β legacy)</li>
* </ul>
*
* @param projectId The project ID
* @param pageNumber The page number (1-based)
* @param fileId Optional file ID for direct access (null for auto-detect)
* @return Resolved page folder path
* @throws PageNotFoundException if page doesn't exist in either structure
*/
public String resolvePageFolderPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
// Implementation already covered earlier in this TDD
// ... (see lines 699-745)
}
/**
* Returns the page metadata file path.
* @param projectId The project ID
* @param pageNumber The page number
* @param fileId Optional file ID
* @return "projects/{projectId}/files/{fileId}/pages/{pageNum}/metadata.json"
* or "projects/{projectId}/pages/{pageNum}/metadata.json" (legacy)
*/
public String getPageMetadataPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
String pageFolderPath = resolvePageFolderPath(projectId, pageNumber, fileId);
return pageFolderPath + "/metadata.json";
}
/**
* Returns the page PDF file path.
* @param projectId The project ID
* @param pageNumber The page number
* @param fileId Optional file ID
* @return Path to page.pdf
*/
public String getPagePdfPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
String pageFolderPath = resolvePageFolderPath(projectId, pageNumber, fileId);
return pageFolderPath + "/page.pdf";
}
/**
* Returns the page markdown file path.
* @param projectId The project ID
* @param pageNumber The page number
* @param fileId Optional file ID
* @return Path to page.md
*/
public String getPageMarkdownPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
String pageFolderPath = resolvePageFolderPath(projectId, pageNumber, fileId);
return pageFolderPath + "/page.md";
}
// ========================================
// Utility Methods
// ========================================
/**
* Checks if project uses modern file structure.
* @param projectId The project ID
* @return true if files/ directory exists
*/
public boolean isModernStructure(String projectId) throws IOException {
return fileSystemHandler.exists(getFilesBasePath(projectId));
}
/**
* Checks if project uses legacy structure.
* @param projectId The project ID
* @return true if pages/ directory exists but files/ doesn't
*/
public boolean isLegacyStructure(String projectId) throws IOException {
String filesPath = getFilesBasePath(projectId);
String pagesPath = getLegacyPagesBasePath(projectId);
return fileSystemHandler.exists(pagesPath) && !fileSystemHandler.exists(filesPath);
}
/**
* Atomically increments and returns the next file ID for a project.
* Thread-safe for concurrent file uploads using optimistic locking (CAS).
*
* @param projectId The project ID
* @return The assigned file ID (guaranteed unique within project)
* @throws IOException if max retries exceeded or I/O error
*/
public int getAndIncrementFileId(String projectId) throws IOException {
String indexPath = getFileIndexPath(projectId);
int maxRetries = 10;
for (int attempt = 0; attempt < maxRetries; attempt++) {
try {
// Read current index with version (GCS generation number)
Long currentVersion = fileSystemHandler.getFileVersion(indexPath);
JSONObject index;
if (currentVersion == null) {
// File doesn't exist - initialize new index
index = new JSONObject();
index.put("next_file_id", 1);
index.put("files", new JSONArray());
} else {
// File exists - read and parse
String content = fileSystemHandler.readFile(indexPath);
index = new JSONObject(content);
}
// Get current ID and increment for next time
int assignedId = index.optInt("next_file_id", 1);
index.put("next_file_id", assignedId + 1);
// Atomic write: Only succeeds if version matches
try {
fileSystemHandler.writeFileAtomic(indexPath, index.toString(2), currentVersion);
return assignedId;
} catch (AtomicWriteConflictException e) {
// Another thread modified - retry with exponential backoff
Thread.sleep(50 + (attempt * 10));
continue;
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException("Interrupted while assigning file ID", e);
}
}
throw new IOException("Failed to assign file ID after " + maxRetries + " retries");
}
}
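For reference, the index document that `getAndIncrementFileId()` manipulates is small. A representative `files/index.json` is sketched below; the field names `next_file_id` and `files` come from the code above, while the per-file entries and their fields are illustrative only:

```json
{
  "next_file_id": 3,
  "files": [
    { "file_id": "1", "original_filename": "architectural-plans.pdf" },
    { "file_id": "2", "original_filename": "electrical-drawings.pdf" }
  ]
}
```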
FileSystemHandler Atomic Operations
New Abstract Methods for thread-safe file ID generation:
/**
* Atomically writes a file if the expected version/generation matches.
* Provides compare-and-set (CAS) semantics for concurrent-safe updates.
*
* @param path The file path
* @param content The content to write
* @param expectedVersion The expected version/generation (null for "must not exist")
* @return The new version/generation after write
* @throws AtomicWriteConflictException if version doesn't match
* @throws IOException if there's an I/O error
*/
public abstract long writeFileAtomic(String path, String content, Long expectedVersion)
throws IOException, AtomicWriteConflictException;
/**
* Gets the current version/generation of a file.
*
* @param path The file path
* @return The current version/generation, or null if file doesn't exist
* @throws IOException if there's an error accessing the file
*/
public abstract Long getFileVersion(String path) throws IOException;
GcsFileSystemHandler Implementation:
- Uses native GCS generation numbers
- `BlobTargetOption.generationMatch(expectedVersion)` for CAS
- `BlobTargetOption.doesNotExist()` for new files
- Returns HTTP 412 on conflicts → `AtomicWriteConflictException`
LocalFileSystemHandler Implementation:
- Uses last-modified time as the version (milliseconds)
- Synchronized locks per file path (single-instance only)
- Note: Doesn't scale horizontally (use GCS in production)
AtomicWriteConflictException:
- Custom exception for CAS failures
- Contains: path, expectedVersion, actualVersion
- Signals that a retry is needed in `getAndIncrementFileId()`
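The CAS-with-retry contract above can be exercised without GCS. The following is a minimal in-memory sketch; `VersionedStore` is an illustrative stand-in for `FileSystemHandler` (it is not the real class), and `IllegalStateException` stands in for `AtomicWriteConflictException`:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// In-memory stand-in for the versioned-write contract: a write succeeds
// only if the caller holds the current version (null = "must not exist").
class VersionedStore {
    private final AtomicReference<String> content = new AtomicReference<>();
    private final AtomicLong version = new AtomicLong(0);

    // Mirrors getFileVersion(): null means the "file" does not exist yet.
    synchronized Long getVersion() {
        return content.get() == null ? null : version.get();
    }

    synchronized String read() { return content.get(); }

    // Compare-and-set write, mirroring writeFileAtomic().
    synchronized long writeAtomic(String newContent, Long expectedVersion) {
        Long current = getVersion();
        boolean match = (expectedVersion == null) ? (current == null)
                                                  : expectedVersion.equals(current);
        if (!match) {
            // Stands in for AtomicWriteConflictException
            throw new IllegalStateException("version conflict");
        }
        content.set(newContent);
        return version.incrementAndGet();
    }
}

public class CasRetryDemo {
    // Retry loop shaped like getAndIncrementFileId():
    // read version, compute next ID, CAS-write, retry on conflict.
    static int getAndIncrement(VersionedStore store) {
        for (int attempt = 0; attempt < 10; attempt++) {
            Long v = store.getVersion();
            int next = (v == null) ? 1 : Integer.parseInt(store.read());
            try {
                store.writeAtomic(String.valueOf(next + 1), v);
                return next;
            } catch (IllegalStateException conflict) {
                // Another writer won the race; re-read and retry
            }
        }
        throw new RuntimeException("retries exhausted");
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        System.out.println(getAndIncrement(store)); // 1
        System.out.println(getAndIncrement(store)); // 2
    }
}
```

The key property, as in the GCS implementation, is that a stale `expectedVersion` always fails the write rather than silently clobbering a concurrent update.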
Phase 2: Update ArchitecturalPlanReviewer
Goal: Delegate all path logic to ProjectPathResolver, add optional fileId support
Changes to ArchitecturalPlanReviewer:
public class ArchitecturalPlanReviewer {
private final String planId; // Actually projectId
private final String projectsRootFolder;
private final FileSystemHandler fileSystemHandler;
// NEW: ProjectPathResolver instance
private final ProjectPathResolver pathResolver;
// NEW: Optional file ID for single-file mode
private final String fileId; // null for project-wide mode
// Constructor with optional fileId
public ArchitecturalPlanReviewer(
String planId,
FileSystemHandler fileSystemHandler,
String projectSourcePdfPath,
ModelClient modelClient,
List<Integer> pageList,
boolean forceReprocess,
boolean enableOrientationDetection,
List<String> iccDocumentIds,
ProgressCallback progressCallback,
String projectsRootFolder,
String fileId) throws IOException { // NEW parameter
this.planId = planId;
this.projectsRootFolder = projectsRootFolder;
this.fileSystemHandler = fileSystemHandler;
this.fileId = fileId; // NEW field
// NEW: Initialize ProjectPathResolver
this.pathResolver = new ProjectPathResolver(fileSystemHandler, projectsRootFolder);
// ... rest of initialization
}
// ========================================
// Updated Path Methods (delegate to ProjectPathResolver)
// ========================================
public String getProjectHomeDir() {
return pathResolver.getProjectHomeDir(planId);
}
public static String getDefaultProjectHomeDir(String planId) {
// Backward compatibility: use default root folder
ProjectPathResolver resolver = new ProjectPathResolver(
FileSystemHandlerFactory.createDefaultFileSystemHandler());
return resolver.getProjectHomeDir(planId);
}
public String getPageFolderPath(int pageNumber) throws PageNotFoundException {
return pathResolver.resolvePageFolderPath(planId, pageNumber, fileId);
}
public static String getPageFolderPath(String planId, int pageNumber) {
ProjectPathResolver resolver = new ProjectPathResolver(
FileSystemHandlerFactory.createDefaultFileSystemHandler());
try {
return resolver.resolvePageFolderPath(planId, pageNumber, null);
} catch (PageNotFoundException e) {
throw new RuntimeException(e);
}
}
public String getPlanPageMetadataFilePath(int pageNumber) throws PageNotFoundException {
return pathResolver.getPageMetadataPath(planId, pageNumber, fileId);
}
private String getArchitecturalPlanMetadataFilePath() {
return pathResolver.getPlanMetadataFilePath(planId);
}
private String getProjectOverviewPath() {
return pathResolver.getProjectOverviewPath(planId);
}
private String getFullProjectContentPath() {
return pathResolver.getFullProjectContentPath(planId);
}
// ... all other path methods delegate to pathResolver
// ========================================
// Optional: Convenience Getters
// ========================================
public String getFileId() {
return fileId;
}
public boolean isSingleFileMode() {
return fileId != null && !fileId.isEmpty();
}
public ProjectPathResolver getPathResolver() {
return pathResolver;
}
}
Backward Compatibility:
- All existing constructors remain unchanged
- New `fileId` parameter added to the most flexible constructor only
- Default behavior (`fileId = null`) maintains the current project-wide operation
- Static methods continue to work via temporary ProjectPathResolver instances
Phase 3: Service Layer Updates
Goal: Update gRPC service implementations to extract and pass file_id to ArchitecturalPlanReviewer
Key Services to Update:
- `ArchitecturalPlanServiceImpl` (main facade)
- `ArchitecturalPlanReviewServiceImpl` (compliance analysis)
- `ArchitecturalPlanAnalysisServiceImpl` (analysis availability)
- `ComplianceReportAsyncServiceImpl` (async tasks)
Example: ArchitecturalPlanServiceImpl:
public class ArchitecturalPlanServiceImpl {
@Override
public PageApplicabilityAnalysisList getApplicableCodeSections(
GetApplicableCodeSectionsRequest request) {
String projectId = request.getArchitecturalPlanId();
int pageNumber = request.getPageNumber();
String fileId = request.getFileId(); // From updated proto
// Create reviewer with optional fileId
// If fileId provided, reviewer operates in single-file mode
// If fileId null, reviewer operates in project-wide mode (legacy)
ArchitecturalPlanReviewer reviewer = createReviewer(projectId, fileId);
// Reviewer automatically uses fileId for path resolution
// ...
}
private ArchitecturalPlanReviewer createReviewer(String projectId, String fileId) {
// Use constructor with fileId parameter
// ...
}
}
Phase 4: Frontend/UI Updates
Goal: Update Angular frontend to track file_id and pass it in RPC requests
Key Components to Update:
- API Service (`api.service.ts`): add `file_id` parameter to RPC methods
- Compliance Component: look up `file_id` from `InputFileMetadata`
- Page Navigation Component: track file-to-page mappings
- File Metadata Service: fetch and cache the `InputFileMetadata` list
Implementation Pattern:
// 1. Fetch InputFileMetadata for the project
this.fileMetadataService.listInputFiles(projectId).subscribe(files => {
this.inputFiles = files;
});
// 2. Look up file_id for a given page number
private getFileIdForPage(pageNumber: number): string | undefined {
const fileMetadata = this.inputFiles.find(
f => f.extracted_pages.includes(String(pageNumber))
);
return fileMetadata?.file_id;
}
// 3. Pass file_id when making RPC calls
loadPageAnalysis(pageNumber: number) {
const fileId = this.getFileIdForPage(pageNumber);
this.apiService.getApplicableCodeSections(
this.projectId,
pageNumber,
this.iccBookId,
fileId // Pass file_id (undefined for legacy projects)
).subscribe(/* ... */);
}
Migration Path (Refactoring Phases)
Phase 1: Backend Infrastructure (Week 1-2) ✅ COMPLETE
- ✅ Implement `ProjectPathResolver` with all path methods
- ✅ Add optional `file_id` to RPC proto definitions
- ✅ Write comprehensive unit tests (40 tests)
- ✅ Implement atomic `getAndIncrementFileId()` with GCS CAS
- ✅ Add `FileSystemHandler` atomic operations
Phase 2: ArchitecturalPlanReviewer Refactoring (Week 3) ✅ COMPLETE
- ✅ Update `ArchitecturalPlanReviewer` to use `ProjectPathResolver`
- ✅ Add optional `fileId` parameter to constructor
- ✅ Delegate all 8 path methods to `pathResolver`
- ✅ Add static path builders (no FileSystemHandler overhead)
- ✅ Maintain backward compatibility (all existing tests pass)
Phase 3: Service Layer Updates (Week 3-4) 🔜 NEXT
- Update `ArchitecturalPlanServiceImpl` to pass `fileId`
- Update `ArchitecturalPlanReviewServiceImpl` to pass `fileId`
- Update `ArchitecturalPlanAnalysisServiceImpl` to pass `fileId`
- Update `ComplianceReportAsyncServiceImpl` to pass `fileId`
- Extract `fileId` from RPC requests and pass it to the reviewer constructor
- Integration testing
Phase 4: Frontend/UI Updates (Week 4-5)
- Update `api.service.ts`: add `file_id` parameter to RPC methods
- Update compliance component: look up `file_id` from `InputFileMetadata`
- Create file metadata service: fetch and cache the `InputFileMetadata` list
- Update page navigation: display a hierarchical file tree
- UI testing and polish
Benefits
- Single Source of Truth: All path logic in one place
- DRY Principle: No duplication between static/instance methods
- Testability: Easy to mock `ProjectPathResolver`
- Flexibility: Supports modern, legacy, and hybrid structures
- Performance: Caching and optimization in one place
- Backward Compatible: Existing code continues to work
- Future-Proof: Easy to add new path types (e.g., `reports/{fileId}/`)
Testing Strategy
Unit Tests for ProjectPathResolver:
@Test
public void testResolvePagePath_WithFileId_DirectPath() {
ProjectPathResolver resolver = new ProjectPathResolver(mockFileSystemHandler);
String path = resolver.resolvePageFolderPath("project-1", 5, "2");
assertEquals("projects/project-1/files/2/pages/005", path);
// Should NOT call fileSystemHandler (no filesystem checks)
}
@Test
public void testResolvePagePath_WithoutFileId_ModernStructure() throws Exception {
when(mockFileSystemHandler.exists("projects/project-1/files/")).thenReturn(true);
when(mockFileSystemHandler.listDirectories("projects/project-1/files/"))
.thenReturn(Arrays.asList("1", "2", "3"));
when(mockFileSystemHandler.exists("projects/project-1/files/2/pages/005/"))
.thenReturn(true);
String path = resolver.resolvePageFolderPath("project-1", 5, null);
assertEquals("projects/project-1/files/2/pages/005", path);
}
@Test
public void testResolvePagePath_WithoutFileId_LegacyFallback() throws Exception {
when(mockFileSystemHandler.exists("projects/project-1/files/")).thenReturn(false);
when(mockFileSystemHandler.exists("projects/project-1/pages/005/")).thenReturn(true);
String path = resolver.resolvePageFolderPath("project-1", 5, null);
assertEquals("projects/project-1/pages/005", path);
}
Open Questions for Discussion
1. Naming: Should we rename `planId` → `projectId` throughout the codebase for clarity?
   - Recommendation: Yes, but as a separate refactoring (Issue #XXX)
2. Static Methods: Keep static methods in `ArchitecturalPlanReviewer` for backward compatibility?
   - Recommendation: Yes, but mark them `@Deprecated` after Phase 2
3. File Index Performance: Should `ProjectPathResolver` cache `files/index.json` in memory?
   - Recommendation: Yes, with a TTL of 5 minutes (balances freshness against performance)
4. Future Split: Should we eventually split `ArchitecturalPlanReviewer` into file/project classes?
   - Recommendation: Monitor usage patterns; split only if a clear need emerges
Deployment Strategy
Phase 1: Backend Infrastructure (Week 1)
Deliverables:
- `ProjectPathResolver` with fallback logic
- `InputFileMetadataService` basic implementation
- Unit tests passing
- Feature flag: `enable_dual_read_filesystem` (default: `true`)
Deployment: Deploy to dev, run integration tests, promote to staging
Risk: Low (read-only, backward compatible)
Phase 2: Migration Service (Week 2)
Deliverables:
- `FileStructureMigrationService` with dry-run support
- CLI tool for bulk upgrades
- Integration tests with real legacy projects
- Feature flag: `enable_file_structure_migration` (default: `false`)
Deployment: Deploy to dev, test migration on cloned projects
Risk: Medium (write operations, but preserves legacy structure)
Phase 3: Frontend Integration (Week 3)
Deliverables:
- `FileMetadataListComponent` showing the file list (project settings)
- `PageTocHierarchicalComponent` for hierarchical navigation (TOC sidebar)
- `LegacyUpgradeBannerComponent` prompting users
- User-initiated migration workflow
- E2E tests in Cypress
Deployment: Deploy to dev, user acceptance testing
Risk: Low (UI only, backend already deployed)
Phase 4: Production Rollout (Week 4)
Deliverables:
- Enable feature flags in production
- Monitor error rates and performance
- Gradual rollout: 10% → 50% → 100% of users
- Rollback plan prepared
Deployment: Canary deployment, monitor metrics
Risk: Low (extensive testing, rollback available)
Monitoring and Observability
Key Metrics
- Read Performance:
  - `page_read_latency_ms` (p50, p95, p99)
  - `path_cache_hit_rate` (target: > 80%)
  - `legacy_fallback_rate` (should decrease over time)
- Migration Success:
  - `migrations_total` (count)
  - `migrations_successful` (count)
  - `migrations_failed` (count)
  - `migration_duration_seconds` (histogram)
- File Metadata:
  - `files_with_metadata_percent` (target: 100%)
  - `classification_accuracy` (manual validation)
Alerts
- Critical:
  - `legacy_fallback_rate > 50%` (indicates the new structure is not working)
  - `page_read_latency_p99 > 2000ms` (performance regression)
  - `migrations_failed / migrations_total > 0.05` (5% failure rate)
- Warning:
  - `path_cache_hit_rate < 60%` (cache ineffective)
  - `files_without_metadata > 10` (metadata generation failing)
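Each alert condition above reduces to a threshold check on a ratio of two counters. A minimal sketch of the evaluation logic (the class and method names are illustrative, not part of the monitoring stack; thresholds are taken from the bullets above):

```java
// Evaluates the alert conditions listed above from raw counters.
public class MigrationAlerts {
    // Critical: more than 5% of migrations failed.
    public static boolean migrationFailureAlert(long failed, long total) {
        return total > 0 && (double) failed / total > 0.05;
    }

    // Critical: legacy fallback served more than half of page reads.
    public static boolean legacyFallbackAlert(long fallbackReads, long totalReads) {
        return totalReads > 0 && (double) fallbackReads / totalReads > 0.5;
    }

    // Warning: path cache hit rate below 60%.
    public static boolean cacheHitRateWarning(long hits, long lookups) {
        return lookups > 0 && (double) hits / lookups < 0.6;
    }
}
```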
Rollback Plan
Immediate Rollback (< 1 hour)
Scenario: Critical bug detected in production
Steps:
- Disable the feature flag: `enable_dual_read_filesystem = false`
- Revert to the previous deployment
- All reads go directly to the legacy `pages/` structure
- No data loss (legacy structure preserved)
Partial Rollback (Specific Projects)
Scenario: Migration failed for specific projects
Steps:
- Identify the affected projects
- Delete the `files/` folder for those projects
- Pages automatically fall back to the legacy `pages/` structure
- No functionality lost
Data Recovery
Scenario: Accidental data loss (unlikely, since the legacy structure is preserved)
Steps:
- The legacy `pages/` folder is never deleted (configured via `preserve_legacy_structure = true`)
- Restore from Cloud Storage versioning if needed
- Re-run the migration with fixed logic
Performance Considerations
Path Caching
- In-memory cache with 1-hour TTL
- Reduces filesystem checks by 80%+
- Cache invalidation on migration
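A TTL-bounded path cache can be as simple as a timestamped concurrent map. The sketch below is illustrative (the `PathCache` class is not the real implementation); `resolve` stands in for the filesystem probe, and keys are assumed to be of the form `projectId + ":" + pageNumber`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal TTL cache for resolved page paths. Entries expire after
// ttlMillis; invalidateProject() clears a project's entries on migration.
public class PathCache {
    private record Entry(String path, long storedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public PathCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached path when fresh, otherwise re-resolves and stores it.
    public String get(String key, Function<String, String> resolve) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(key);
        if (e != null && now - e.storedAt() < ttlMillis) {
            return e.path(); // hit: no filesystem check
        }
        String path = resolve.apply(key); // miss: probe the filesystem
        cache.put(key, new Entry(path, now));
        return path;
    }

    // Drop every cached path for a project (called after migration).
    public void invalidateProject(String projectId) {
        cache.keySet().removeIf(k -> k.startsWith(projectId + ":"));
    }
}
```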
Lazy Metadata Loading
- Metadata loaded on-demand, not preemptively
- List operations return minimal metadata
- Full metadata fetched when needed
Parallel Migration
- Multiple projects can be migrated concurrently
- Pages within a project migrated sequentially (safer)
- Configurable concurrency limit
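The concurrency model above (parallel across projects, sequential within a project) maps directly onto a bounded thread pool: one task per project, pages iterated inside the task. A minimal sketch, assuming a hypothetical `migratePage` callback standing in for the real per-page migration work:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Projects migrate in parallel under a configurable limit; pages within
// a project stay sequential because each project is a single task.
public class ParallelMigration {
    public static void migrateAll(List<String> projectIds,
                                  int maxConcurrent,
                                  BiConsumer<String, Integer> migratePage,
                                  int pagesPerProject) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (String projectId : projectIds) {
            pool.submit(() -> {
                // Sequential page migration within one project (safer)
                for (int page = 1; page <= pagesPerProject; page++) {
                    migratePage.accept(projectId, page);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```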
Security Considerations
RBAC Integration
- Migration requires `OWNER` permissions
- File metadata respects project-level permissions
- Admin bulk upgrades are logged for an audit trail
Data Integrity
- Checksums verified during migration
- Transactional migrations (all-or-nothing where possible)
- Legacy structure preserved for rollback
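The checksum verification step can be sketched with a simple byte-for-byte digest comparison. This is illustrative only (the `MigrationChecksum` class is hypothetical, and the real implementation may instead rely on GCS-provided CRC32C/MD5 object hashes):

```java
import java.util.zip.CRC32;

// A migrated page copy is accepted only when its checksum matches the source.
public class MigrationChecksum {
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Returns true when the migrated copy is byte-identical to the source.
    public static boolean verifyCopy(byte[] source, byte[] copy) {
        return checksum(source) == checksum(copy);
    }
}
```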
✅ Implementation Status (October 2025)
COMPLETED FEATURES
All core functionality has been successfully implemented and is working in production:
Backend Infrastructure ✅
- `InputFileMetadataService`: Complete metadata generation and management
- `ProjectPathResolver`: Intelligent dual-read with caching (modern → legacy fallback)
- Atomic File Operations: GCS generation-based compare-and-set for race-condition prevention
- File-Aware gRPC API: Enhanced `GetArchitecturalPlanPageRequest` with `file_id` parameter
- Comprehensive Logging: Detailed debugging information throughout the system
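The generation-based Compare-and-Set with exponential backoff can be sketched as below. `GenerationStore` is a hypothetical abstraction standing in for the GCS client's generation-match preconditions; the production service talks to Cloud Storage directly:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.UnaryOperator;

/** Sketch of the CAS retry loop used for thread-safe metadata updates. */
public class CasUpdater {
    /** Hypothetical stand-in for GCS reads/writes with generation preconditions. */
    public interface GenerationStore {
        Versioned read(String path);
        /** Writes only if the object's generation still matches; false otherwise. */
        boolean writeIfGenerationMatches(String path, String content, long generation);
    }

    public record Versioned(String content, long generation) {}

    public static String update(GenerationStore store, String path,
                                UnaryOperator<String> mutation, int maxAttempts)
            throws InterruptedException {
        long backoffMs = 50;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned current = store.read(path);
            String updated = mutation.apply(current.content());
            if (store.writeIfGenerationMatches(path, updated, current.generation())) {
                return updated; // CAS succeeded
            }
            // Lost the race: back off exponentially with jitter, then re-read.
            Thread.sleep(backoffMs + ThreadLocalRandom.current().nextLong(backoffMs));
            backoffMs *= 2;
        }
        throw new IllegalStateException("CAS failed after " + maxAttempts + " attempts: " + path);
    }
}
```

The key property: every retry re-reads the latest content before reapplying the mutation, so concurrent writers never silently overwrite each other.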
Frontend Integration ✅
- Hierarchical Table of Contents: Expandable file containers with nested pages
- File-Aware Navigation: URLs include file ID (`/files/{file_id}/pages/{page_number}/{tab}`)
- Enhanced File Headers: Two-line layout with document type, visual emphasis, and proper spacing
- File-Aware Page Selection: Correct highlighting and content loading per file
- Page Overlap Detection: Scoped to individual files (not project-wide)
- Automatic UI Refresh: Updates after background ingestion task completion
- Intelligent Caching: Prevents unnecessary data reloads and race conditions
Multi-File Support ✅
- File-Aware PDF Loading: Backend correctly serves PDFs from specific files
- Concurrent Ingestion: Multiple files can be processed simultaneously without conflicts
- File-Specific Operations: Page ingestion, overlap detection, and metadata updates per file
- Backward Compatibility: Legacy single-file projects continue to work seamlessly
KEY ARCHITECTURAL DECISIONS MADE
- File ID Strategy: Auto-incrementing integers (1, 2, 3...) for readable URLs
- Path Resolution: Modern structure first, legacy fallback with caching
- Metadata Updates: Atomic operations using GCS object generations
- UI Pattern: Angular Material expansion panels for hierarchical navigation
- URL Structure: File-aware routes with backward compatibility redirects
- Caching Strategy: Path-based caching with file-aware cache keys
PRODUCTION DEPLOYMENT STATUS
- ✅ Backend Services: Deployed and operational
- ✅ Frontend UI: Hierarchical navigation working
- ✅ gRPC API: File-aware endpoints functional
- ✅ Database Schema: Metadata structure implemented
- ✅ Migration Support: Dual-read compatibility active
Future Enhancements
- AI Document Classification: Use LLM to classify document types with higher accuracy
- Content-Based Summarization: Generate AI summaries of file contents
- Automatic Metadata Refresh: Periodically update metadata for stale files
- Advanced Search: Full-text search across file metadata and content
- File Versioning: Track changes to input files over time
- Multi-File Coordination: Batch upload with relationship tracking
- Custom Metadata Fields: User-defined tags and labels
- Analytics Dashboard: Visualize file types, processing times, storage usage
File Index Structure
Purpose
The `files/index.json` file serves a single purpose:
- File ID Generation: Maintains auto-increment counter for new files
Schema
Location: `projects/{projectId}/files/index.json`
```json
{
  "next_file_id": 4,
  "files": [
    { "file_id": "1", "file_name": "architectural-plans.pdf" },
    { "file_id": "2", "file_name": "electrical-plans.pdf" },
    { "file_id": "3", "file_name": "structural-plans.pdf" }
  ]
}
```
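A sketch of how `next_file_id` drives ID allocation. `FileIndex` here is an in-memory model of the schema above, with illustrative method names; in production the updated index would be written back to GCS under a generation-match precondition:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of auto-increment file ID allocation backed by files/index.json. */
public class FileIndex {
    public record FileEntry(String fileId, String fileName) {}

    private int nextFileId;
    private final List<FileEntry> files = new ArrayList<>();

    public FileIndex(int nextFileId) {
        this.nextFileId = nextFileId;
    }

    /** Allocates the next file ID, records the new file, and bumps the counter. */
    public String register(String fileName) {
        String fileId = String.valueOf(nextFileId++);
        files.add(new FileEntry(fileId, fileName));
        return fileId;
    }

    public List<FileEntry> files() { return files; }
    public int nextFileId() { return nextFileId; }
}
```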
Why No `page_to_file_map`?
We initially considered mapping page numbers to file IDs, but this is fundamentally flawed for multi-file projects:
- ❌ Ambiguous: page "1" exists in multiple files (architectural, electrical, structural)
- ❌ Not scalable: file-scoped page numbers cannot be mapped back to files
- ✅ Solution: the frontend/API must always pass both `file_id` AND `page_number`
Page Number Semantics:
- Modern projects: Page numbers are file-scoped (each file has pages 1, 2, 3...)
- Legacy projects: Page numbers are project-global (single file, sequential)
- Migration: Legacy global pages → modern file-scoped pages (e.g., page 46 → file 2, page 1)
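The legacy-to-modern mapping can be sketched as a prefix-sum walk over per-file page counts. `PageMapper` is an illustrative name, and the counts in the test (file 1 has 45 pages) are an assumption chosen to reproduce the "page 46 → file 2, page 1" example:

```java
import java.util.List;

/** Sketch: translate a project-global legacy page number to (file_id, file-scoped page). */
public class PageMapper {
    public record FilePage(String fileId, int pageNumber) {}

    /** pageCounts.get(i) is the page count of file (i + 1), in upload order. */
    public static FilePage toFileScoped(List<Integer> pageCounts, int globalPage) {
        int remaining = globalPage;
        for (int i = 0; i < pageCounts.size(); i++) {
            if (remaining <= pageCounts.get(i)) {
                // Auto-incrementing file IDs start at "1".
                return new FilePage(String.valueOf(i + 1), remaining);
            }
            remaining -= pageCounts.get(i);
        }
        throw new IllegalArgumentException("Global page out of range: " + globalPage);
    }
}
```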
Benefits:
- ✅ Simple, unambiguous schema
- ✅ Single source of truth for the next file ID
- ✅ Small file size (a few KB even with hundreds of files)
- ✅ No page number collisions
Update Strategy:
- Updated when new files are uploaded
- Updated when files are deleted
- Read-only for page lookups (frontend tracks file_id separately)
Open Questions
- Q: How do we track which pages belong to which file?
  A: Use the `InputFileMetadata.extracted_pages` field (stored in `files/{file_id}/metadata.json`)
- Q: What if users upload duplicate files?
  A: Detect duplicates using an MD5 checksum and prompt the user to replace or keep both
- Q: How do we handle pages that don't belong to any input file?
  A: Create a "miscellaneous" file entry with ID `unknown-source`
- Q: Should migration be reversible (downgrade from modern to legacy)?
  A: Not initially; the legacy structure is preserved, so deleting `files/` effectively "downgrades"
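The duplicate-upload check mentioned in the second question can be sketched with the JDK's `MessageDigest`. `DuplicateDetector` and its methods are illustrative names, not the real service API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Set;

/** Sketch: flag an upload as a duplicate if its MD5 matches a stored checksum. */
public class DuplicateDetector {
    public static String md5Hex(byte[] content) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("MD5").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e); // MD5 is required by the JDK spec
        }
    }

    /** knownChecksums would come from existing file metadata entries. */
    public static boolean isDuplicate(byte[] upload, Set<String> knownChecksums) {
        return knownChecksums.contains(md5Hex(upload));
    }
}
```

MD5 is fine here because the goal is duplicate detection, not security; a collision-resistant hash could be swapped in without changing the flow.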
Success Criteria
Phase 1 (Infrastructure):
- All legacy projects continue to work unchanged
- No performance regression (< 5% latency increase)
- 100% backward compatibility test coverage
Phase 2 (Migration):
- \> 95% migration success rate
- Zero data loss incidents
- Legacy structure preserved in all cases
Phase 3 (Frontend):
- File metadata visible in UI (project settings page)
- Hierarchical page navigation implemented (TOC sidebar)
- User-initiated upgrades working
- Positive user feedback on new features
Phase 4 (Adoption):
- \> 50% of active projects upgraded within 3 months
- File metadata used in search/filter features
- Reduced support tickets about file organization
References
- PRD: File Structure Reorganization
- Issue #167
- Issue #227: Project Metadata Management
- Protocol Buffer Definitions: `src/main/proto/api.proto`
- ArchitecturalPlanReviewer: `src/main/java/org/codetricks/construction/code/assistant/ArchitecturalPlanReviewer.java`
- Developer Playbook