File Structure Reorganization
Product Requirements: File Structure Reorganization PRD
Implementation Issue: Issue #167
Overview
This Technical Design Document details the implementation of a hierarchical file structure with rich metadata, replacing the flat pages/ directory with files/{file_id}/pages/ while maintaining full backward compatibility with legacy projects.
Architecture Overview
System Components
┌───────────────────────────────────────────────────────────────┐
│                      Frontend (Angular)                       │
│  ┌─────────────────────┐      ┌───────────────────────────┐   │
│  │  FileMetadataList   │      │   LegacyUpgradeBanner     │   │
│  │     Component       │      │       Component           │   │
│  └──────────┬──────────┘      └────────────┬──────────────┘   │
│             └──────────────┬───────────────┘                  │
│                        gRPC-Web                               │
└────────────────────────────┼──────────────────────────────────┘
                             ▼
┌───────────────────────────────────────────────────────────────┐
│                     gRPC Gateway (Envoy)                      │
└────────────────────────────┬──────────────────────────────────┘
                             ▼
┌───────────────────────────────────────────────────────────────┐
│                    Backend (Java/Spring)                      │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │          ArchitecturalPlanService (Facade)              │  │
│  └─────────┬───────────────────────────────────┬───────────┘  │
│            ▼                                   ▼              │
│  ┌─────────────────────┐      ┌─────────────────────────┐     │
│  │  InputFileMetadata  │      │ FileStructureMigration  │     │
│  │      Service        │      │        Service          │     │
│  └─────────┬───────────┘      └────────────┬────────────┘     │
│            ▼                               ▼                  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                  ProjectPathResolver                    │  │
│  │     (Transparent Legacy Fallback + Path Caching)        │  │
│  └─────────┬───────────────────────────────────────────────┘  │
└────────────┼──────────────────────────────────────────────────┘
             ▼
┌───────────────────────────────────────────────────────────────┐
│                     Cloud Storage (GCS)                       │
│  projects/{projectId}/                                        │
│  ├── files/{file_id}/            ← NEW                        │
│  │   ├── metadata.json           ← NEW                        │
│  │   └── pages/{pageNumber}/     ← NEW                        │
│  ├── pages/{pageNumber}/         ← LEGACY                     │
│  └── inputs/(unknown)            ← UNCHANGED                  │
└───────────────────────────────────────────────────────────────┘
Data Flow: Read Operations (Simplified Strategy)
User Request (projectId, pageNum, optional fileId)
        │
        ▼
┌───────────────────────────────────────────────┐
│             ProjectPathResolver               │
│ .resolvePagePath(projectId, pageNum, fileId?) │
└────────┬──────────────────────────────────────┘
         │
         ▼
  Is fileId provided?
         │
    ┌────┴────┐
   Yes        No (Legacy Fallback Only)
    ▼          ▼
┌─────────────────────────┐   Is path cached?
│  Modern: Direct Path    │          │
│  (String Construction)  │     ┌────┴────┐
│  files/{fileId}/pages/  │    Yes        No
│                         │     ▼          ▼
│  Performance: 0ms       │  Return   ┌─────────────────────────────┐
│  No I/O, No Cache       │  cached   │ Legacy Structure:           │
└──────────┬──────────────┘  path     │ buildLegacyPageFolderPath() │
           │                          │ + exists() check            │
           │                          └────────┬────────────────────┘
           │                                   │
           │                             File exists?
           │                                   │
           │                              ┌────┴────┐
           │                             Yes        No
           │                              ▼          ▼
           │                           Return     Throw
           │                           legacy     PageNotFound
           │                           path       (Modern projects
           │                           (cache)    require file_id)
           └───────────────────────────────┘
                        │
                        ▼
                   Return path
Key Design Decisions:
- Modern projects MUST provide file_id (page numbers are file-scoped)
- No expensive scanning (the listSubdirectories() call was removed)
- Simple caching (one GCS exists() call on first access vs an instant HashMap lookup afterwards)
- Clear contract: "Want the modern structure? Provide file_id!"
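The fast path in the diagram reduces to pure string operations. A minimal sketch of that contract (class and method names here are hypothetical, not the real `ProjectPathResolver` API):

```java
public class PathContractSketch {
    /** Direct construction when fileId is present: no I/O, no cache lookup. */
    public static String directPath(String projectId, String fileId, int pageNumber) {
        // Zero-padded page folder, matching the format used elsewhere in this document
        return String.format("projects/%s/files/%s/pages/%03d", projectId, fileId, pageNumber);
    }

    /** Cache key used on the slow (legacy fallback) path. */
    public static String cacheKey(String projectId, int pageNumber) {
        return projectId + ":" + pageNumber;
    }
}
```

Because the fast path never touches storage, its cost is independent of project size, which is what makes the "provide file_id" contract worth enforcing.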
Data Flow: Write Operations (Selective Write)
New Page Ingestion
        │
        ▼
┌────────────────────────────────┐
│    IngestArchitecturalPlan     │
│ (projectId, fileId, pageData)  │
└────────┬───────────────────────┘
         │
         ▼
┌──────────────────────────────────┐
│ Detect Project Structure Version │
└────────┬─────────────────────────┘
         │
    ┌────┴─────┐
    │ Version? │
    └────┬─────┘
         │
  ┌──────┴──────┬─────────────┐
LEGACY      TRANSITIONAL    MODERN
  │              │             │
  ▼              ▼             ▼
┌──────────┐ ┌────────────┐ ┌──────────────┐
│ Write to │ │ Write to   │ │ Write to     │
│ pages/   │ │ files/     │ │ files/ ONLY  │
│ (compat) │ │ (new path) │ └──────────────┘
└──────────┘ └────────────┘
     │             │
     └──────┬──────┘
            ▼
 ┌────────────────────┐
 │ Update             │
 │ plan-metadata.json │
 │ (backward compat)  │
 └────────────────────┘
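The selective-write dispatch above can be sketched as a switch over the detected structure version. `SelectiveWriteSketch` and `writeTargets` are hypothetical names; the path format assumes the zero-padded page folders used elsewhere in this document:

```java
import java.util.List;

public class SelectiveWriteSketch {
    public enum ProjectStructureVersion { UNKNOWN, LEGACY, TRANSITIONAL, MODERN }

    /** Returns the page folder(s) a newly ingested page should be written to. */
    public static List<String> writeTargets(ProjectStructureVersion version,
                                            String projectId, String fileId, int pageNumber) {
        String legacy = String.format("projects/%s/pages/%03d", projectId, pageNumber);
        String modern = String.format("projects/%s/files/%s/pages/%03d",
                projectId, fileId, pageNumber);
        switch (version) {
            case LEGACY:       return List.of(legacy);  // compat: keep legacy layout
            case TRANSITIONAL: return List.of(modern);  // new pages go to the new path
            case MODERN:       return List.of(modern);  // files/ only
            default:           throw new IllegalStateException("Unknown structure version");
        }
    }
}
```

In all three branches, plan-metadata.json is still updated afterwards for backward compatibility, as shown in the diagram.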
Proto Definitions
Note: Proto definitions already exist in api.proto (lines 225-274). No changes needed!
Enum Naming Consideration: The current enums use prefixed naming (e.g., DOCUMENT_TYPE_ARCHITECTURAL_PLAN, PROCESSING_STATUS_COMPLETED), which doesn't follow our best practice of using dedicated packages with clean enum values (e.g., ARCHITECTURAL_PLAN in package ...file.metadata). However, since these enums are already deployed and used in production:
- For this issue: keep the existing enum structure (no breaking changes)
- Future enhancement: consider moving to file_metadata.proto with clean enums (separate refactoring issue)
This aligns with our pragmatic approach: work with what exists, improve incrementally.
Existing Proto Messages (For Reference)
// Already exists in api.proto (line 225)
import "google/protobuf/timestamp.proto";
message InputFileMetadata {
// Basic file information
string file_id = 1; // Auto-increment ID (e.g., "1", "2", "3")
string file_name = 2;
string file_path = 3;
string mime_type = 4;
int64 file_size_bytes = 5;
google.protobuf.Timestamp upload_date = 6; // When file was uploaded
// Document classification
DocumentType document_type = 7;
int32 page_count = 8;
// Processing metadata
ProcessingStatus processing_status = 9;
google.protobuf.Timestamp processed_date = 10; // When processing completed
repeated string extracted_pages = 11;
// Content insights
string content_summary = 12;
// Technical metadata
string checksum_md5 = 13;
}
enum DocumentType {
DOCUMENT_TYPE_UNKNOWN = 0;
DOCUMENT_TYPE_ARCHITECTURAL_PLAN = 1;
DOCUMENT_TYPE_MECHANICAL_PLAN = 2;
DOCUMENT_TYPE_ELECTRICAL_PLAN = 3;
DOCUMENT_TYPE_STRUCTURAL_PLAN = 4;
DOCUMENT_TYPE_INSPECTOR_FEEDBACK = 5;
DOCUMENT_TYPE_PERMIT_APPLICATION = 6;
DOCUMENT_TYPE_CODE_COMPLIANCE_REPORT = 7;
DOCUMENT_TYPE_SITE_PLAN = 8;
DOCUMENT_TYPE_ELEVATION_DRAWING = 9;
DOCUMENT_TYPE_SECTION_DRAWING = 10;
}
enum ProcessingStatus {
PROCESSING_STATUS_UNKNOWN = 0;
PROCESSING_STATUS_UPLOADED = 1;
PROCESSING_STATUS_PROCESSING = 2;
PROCESSING_STATUS_COMPLETED = 3;
PROCESSING_STATUS_FAILED = 4;
}
New Proto Messages for Migration (Add to api.proto)
// Request to migrate a legacy project to new file structure
message MigrateProjectFileStructureRequest {
// The unique identifier of the project to migrate
string project_id = 1;
// Whether to preserve the legacy pages/ folder after migration
// (default: true for safety)
bool preserve_legacy_structure = 2;
// Whether to run in dry-run mode (preview changes without applying)
bool dry_run = 3;
// User ID initiating the migration (for audit trail)
string initiated_by = 4;
}
// Response from file structure migration
message MigrateProjectFileStructureResponse {
// The unique identifier of the project
string project_id = 1;
// Whether the migration was successful
bool success = 2;
// List of files created with metadata
repeated InputFileMetadata migrated_files = 3;
// Number of pages migrated per file
map<string, int32> pages_per_file = 4;
// Total number of pages migrated
int32 total_pages_migrated = 5;
// Error message if migration failed
string error_message = 6;
// Warnings or informational messages
repeated string warnings = 7;
// Timestamp when migration completed
google.protobuf.Timestamp completed_at = 8;
}
// Request to analyze a project's migration readiness
// Performs a comprehensive check to determine if a project can be safely migrated
// from legacy (flat pages/) structure to modern (hierarchical files/) structure.
message AnalyzeProjectMigrationRequest {
// The unique identifier of the project to analyze
string project_id = 1;
}
// Response containing detailed migration readiness analysis
// Provides all information needed to decide if/when to migrate a project.
message AnalyzeProjectMigrationResponse {
// The unique identifier of the project
string project_id = 1;
// Current project structure version (LEGACY, TRANSITIONAL, or MODERN)
// - LEGACY: Only has pages/ folder β needs migration
// - TRANSITIONAL: Has both pages/ and files/ β migration in progress or partially complete
// - MODERN: Only has files/ folder β already migrated
ProjectStructureVersion current_version = 2;
// Whether project needs migration (true for LEGACY projects only)
// If false, project is already migrated or in transition
bool needs_migration = 3;
// Number of input files found in inputs/ folder
// Used to estimate how many file metadata entries will be created
// Good readiness: > 0 (at least one source file exists)
int32 estimated_file_count = 4;
// Number of existing pages in pages/ folder
// Used to estimate migration workload
// Good readiness: matches actual page count in plan-metadata.json
int32 estimated_page_count = 5;
// Estimated time to complete migration in seconds
// Calculation: (page_count * 1s) + (file_count * 5s)
// Good readiness: < 300s (5 minutes) for typical projects
int32 estimated_duration_seconds = 6;
// Potential issues or blockers that could prevent successful migration
// Examples:
// - "No input files found in inputs/ folder"
// - "Page numbering gaps detected (missing pages 3, 5)"
// - "Insufficient storage space for migration"
// - "Project has no pages to migrate"
// Good readiness: Empty array (no issues)
repeated string issues = 7;
// Human-readable migration readiness assessment
// Examples: "READY", "READY_WITH_WARNINGS", "NOT_READY", "ALREADY_MIGRATED"
// Good readiness: "READY" or "READY_WITH_WARNINGS"
string readiness_status = 8;
// Detailed explanation of readiness status
// Provides context and recommendations
// Example: "Project is ready to migrate. Found 3 input files and 45 pages.
// Estimated time: 2 minutes. No blockers detected."
string readiness_message = 9;
}
// Enum for project structure version
enum ProjectStructureVersion {
PROJECT_STRUCTURE_VERSION_UNKNOWN = 0;
PROJECT_STRUCTURE_VERSION_LEGACY = 1; // Only pages/
PROJECT_STRUCTURE_VERSION_TRANSITIONAL = 2; // Both pages/ and files/
PROJECT_STRUCTURE_VERSION_MODERN = 3; // Only files/
}
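The estimated_duration_seconds formula documented in AnalyzeProjectMigrationResponse is simple enough to pin down in code. `MigrationEstimateSketch` is a hypothetical helper that mirrors the comment, not part of the service implementation:

```java
public class MigrationEstimateSketch {
    /** Mirrors the proto comment: (page_count * 1s) + (file_count * 5s). */
    public static int estimatedDurationSeconds(int pageCount, int fileCount) {
        return pageCount * 1 + fileCount * 5;
    }

    /** Good readiness per the proto comment: under 300s (5 minutes). */
    public static boolean isTypicalDuration(int pageCount, int fileCount) {
        return estimatedDurationSeconds(pageCount, fileCount) < 300;
    }
}
```

For the example project cited later (3 input files, 45 pages) this yields 60 seconds, consistent with the "estimated time: 2 minutes" order of magnitude in the readiness_message example.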
Add New RPCs to ArchitecturalPlanService
service ArchitecturalPlanService {
// ... existing RPCs ...
// Migrates a legacy project to new file structure
// Requires OWNER permissions
rpc MigrateProjectFileStructure(MigrateProjectFileStructureRequest)
returns (MigrateProjectFileStructureResponse) {
option (google.api.http) = {
post: "/v1/architectural-plans/{project_id}/migrate-file-structure"
body: "*"
};
}
// Analyzes a project's migration readiness
rpc AnalyzeProjectMigration(AnalyzeProjectMigrationRequest)
returns (AnalyzeProjectMigrationResponse) {
option (google.api.http) = {
get: "/v1/architectural-plans/{project_id}/migration-analysis"
};
}
}
Update Existing RPC Request Messages (Backward Compatible)
Critical Update: Existing page-related RPCs must be extended to support the new file structure while maintaining backward compatibility with legacy projects.
Strategy: Optional file_id Field
Add an optional file_id field to all page-related request messages. This allows:
- Modern projects: pass file_id for direct page access in files/{file_id}/pages/
- Legacy projects: omit file_id; the system uses ProjectPathResolver to fall back to pages/
- Zero breaking changes: existing clients continue to work without modifications
Request Messages Requiring Updates
1. Code Applicability Analysis (api.proto)
message GetApplicableCodeSectionsRequest {
// The unique identifier of the architectural plan to analyze.
string architectural_plan_id = 1;
// The page number of the architectural plan to analyze.
int32 page_number = 2;
string icc_book_id = 3; // Example: 2217 for ICC IBC 2021
// NEW FIELD: Optional file ID for direct file access in modern structure
// If provided, page is accessed via files/{file_id}/pages/{page_number}/
// If omitted, system uses ProjectPathResolver to check files/ first, then pages/ (legacy)
// Example: "1", "2", "3" (auto-incrementing IDs)
string file_id = 4 [deprecated = false]; // Optional, for modern structure support
}
2. Compliance Report Generation (plan.reviewer.proto)
message GetPageSectionComplianceReportRequest {
string architectural_plan_id = 1;
int32 page_number = 2;
string icc_book_id = 3;
string icc_section_id = 4;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 5 [deprecated = false]; // Optional
}
message GetPageComplianceReportRequest {
string architectural_plan_id = 1;
int32 page_number = 2;
string icc_book_id = 3;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 4 [deprecated = false]; // Optional
}
3. Async Compliance Report Task (compliance_report.proto)
message StartPageSectionComplianceReportTaskRequest {
string architectural_plan_id = 1;
int32 page_number = 2;
string icc_book_id = 3;
string icc_section_id = 4;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 5 [deprecated = false]; // Optional
}
4. Analysis Availability Check (analysis_availability.proto)
message GetAvailableAnalysisRequest {
string project_id = 1;
int32 page_number = 2;
// NEW FIELD: Optional file ID for hierarchical file structure
string file_id = 3 [deprecated = false]; // Optional
}
5. File Ingestion Response (api.proto)
Note: IngestFileIntoProjectRequest already has filename and doesn't need file_id as input. However, the response should return the assigned file_id:
message IngestFileIntoProjectResponse {
string project_id = 1;
string filename = 2;
int32 pages_processed = 3;
bool success = 4;
// NEW FIELD: Assigned file ID for the ingested file
// This allows UI to immediately navigate to files/{file_id}/ structure
// Example: "1", "2", "3"
string file_id = 5; // REQUIRED in modern projects
}
Similarly for StartAsyncIngestFileResponse (task.proto):
message StartAsyncIngestFileResponse {
string task_id = 1;
string project_id = 2;
string filename = 3;
int32 page_number = 4;
bool success = 5;
string message = 6;
string completed_at = 7;
// NEW FIELD: Assigned file ID for the ingested file
string file_id = 8; // REQUIRED when ingestion completes
}
Backend Service Implementation Pattern
When processing requests with the new optional file_id:
public PageApplicabilityAnalysisList getApplicableCodeSections(
GetApplicableCodeSectionsRequest request) {
String projectId = request.getArchitecturalPlanId();
int pageNumber = request.getPageNumber();
String fileId = request.getFileId(); // May be empty/null
// Resolve page path using ProjectPathResolver (handles file_id automatically)
// - If fileId provided → direct path construction (fast, no filesystem checks)
// - If fileId null/empty → cached legacy fallback (modern projects must pass file_id)
String pagePath = pathResolver.resolvePageFolderPath(projectId, pageNumber, fileId);
// Continue with existing logic using resolved path
// ...
}
Key Benefits:
- Single source of truth: all path resolution logic is centralized in ProjectPathResolver
- Automatic optimization: fast path when file_id is provided, cached fallback when it is not
- Consistent behavior: same logic across all services
- Easy testing: mock ProjectPathResolver in unit tests
CLI Updates Required
The CLI commands (grpcurl, custom scripts) must also be updated to support the new optional parameter:
Example: Legacy CLI call (still works)
grpcurl -d '{
"architectural_plan_id": "project-123",
"page_number": 5,
"icc_book_id": "2217"
}' \
localhost:8080 ArchitecturalPlanReviewService/GetApplicableCodeSections
Example: Modern CLI call (with file_id)
grpcurl -d '{
"architectural_plan_id": "project-123",
"page_number": 5,
"icc_book_id": "2217",
"file_id": "2"
}' \
localhost:8080 ArchitecturalPlanReviewService/GetApplicableCodeSections
Frontend/UI Updates Required
1. Update API Service Clients (e.g., web-ng-m3/src/app/shared/api.service.ts):
getApplicableCodeSections(
projectId: string,
pageNumber: number,
iccBookId: string,
fileId?: string // NEW optional parameter
): Observable<PageApplicabilityAnalysisList> {
const request: GetApplicableCodeSectionsRequest = {
architectural_plan_id: projectId,
page_number: pageNumber,
icc_book_id: iccBookId
};
// Include file_id only if available (modern projects)
if (fileId) {
request.file_id = fileId;
}
return this.grpcClient.getApplicableCodeSections(request);
}
2. Pass file_id from Components:
When displaying page-specific analysis, components need to know which file the page belongs to. This information comes from:
- Modern projects: InputFileMetadata.file_id and InputFileMetadata.extracted_pages
- Legacy projects: file_id is undefined/null; the system falls back automatically
// In compliance.component.ts or similar
loadPageAnalysis(pageNumber: number) {
const fileId = this.getFileIdForPage(pageNumber); // NEW method
this.apiService.getApplicableCodeSections(
this.projectId,
pageNumber,
this.iccBookId,
fileId // Pass file_id if available
).subscribe(/* ... */);
}
private getFileIdForPage(pageNumber: number): string | undefined {
// Look up file_id from InputFileMetadata list
const fileMetadata = this.inputFiles.find(
f => f.extracted_pages.includes(String(pageNumber))
);
return fileMetadata?.file_id;
}
Testing Backward Compatibility
Test Cases:
1. Legacy Project (no file_id):
- Request without file_id → system uses ProjectPathResolver → falls back to pages/ → success
2. Modern Project (with file_id):
- Request with file_id → direct access to files/{file_id}/pages/ → success
3. Modern Project (omit file_id):
- Request without file_id → legacy fallback finds no page → PageNotFound directing the caller to pass file_id (expected: page numbers are file-scoped in modern projects)
4. Invalid file_id:
- Request with an invalid file_id → 404 Page Not Found (expected behavior)
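These cases can be exercised against an in-memory fake of the resolver contract. `BackwardCompatSketch` is hypothetical; it only mimics the documented behavior, not the real `ProjectPathResolver`:

```java
import java.util.Set;

public class BackwardCompatSketch {
    public static class PageNotFound extends RuntimeException {}

    /**
     * Fake resolver over an in-memory set of existing paths, following the
     * documented contract: direct path when fileId is given, legacy fallback
     * otherwise, and PageNotFound when the legacy page is missing.
     */
    public static String resolve(Set<String> existing, String projectId,
                                 int pageNumber, String fileId) {
        if (fileId != null && !fileId.isEmpty()) {
            return String.format("projects/%s/files/%s/pages/%03d",
                    projectId, fileId, pageNumber);
        }
        String legacy = String.format("projects/%s/pages/%03d", projectId, pageNumber);
        if (existing.contains(legacy)) {
            return legacy;
        }
        throw new PageNotFound(); // modern projects must pass file_id
    }
}
```

A real test suite would drive the same four scenarios through the actual resolver with a mocked FileSystemHandler.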
Migration Impact
Phase 1: Deploy Proto Changes
- Add optional file_id fields to all request messages
- Deploy backend changes (proto regeneration)
- No frontend changes yet → existing clients continue working (backward compatible)
Phase 2: Update Backend Services
- Modify service implementations to honor file_id when provided
- Maintain fallback behavior via ProjectPathResolver
- No frontend changes yet → still backward compatible
Phase 3: Update Frontend (Optional)
- Add file_id tracking in UI state
- Pass file_id in modern projects for performance optimization
- Legacy projects continue working without changes
Performance Considerations
With file_id (modern projects):
- Direct path access: no filesystem checks needed
- No cache lookups: skips the ProjectPathResolver cache entirely
- Faster response: ~50-100ms saved per request
Without file_id (legacy or omitted):
- ProjectPathResolver overhead: cache check plus potential filesystem existence checks
- Acceptable performance: <10ms overhead for cached paths, <100ms for uncached
Recommendation: the frontend should pass file_id whenever it is available for optimal performance.
Backend Implementation
1. ProjectPathResolver
Purpose: Resolves file paths transparently across both legacy (pages/) and modern (files/{file_id}/pages/) structures, providing backward compatibility during migration.
Location: src/main/java/org/codetricks/construction/code/assistant/ProjectPathResolver.java
package org.codetricks.construction.code.assistant;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;
/**
 * Resolves page folder paths across both modern (files/{file_id}/pages/) and
 * legacy (pages/) project structures with transparent fallback.
 *
 * <p>Path Resolution Strategy:
 * 1. If a file ID is provided, construct the modern path directly (no I/O)
 * 2. Otherwise, check the in-memory cache (avoid repeated filesystem checks)
 * 3. Fall back to legacy: pages/{page_number}/ (with an exists() check)
 * 4. Cache the resolved legacy path for future reads
 *
 * <p>Thread-safe and optimized for read-heavy workloads.
 *
 * <p>Instantiation: Create via constructor, pass to service implementations
 */
public class ProjectPathResolver {
private static final Logger logger = Logger.getLogger(
ProjectPathResolver.class.getName());
private final FileSystemHandler fileSystemHandler;
// Cache: projectId + pageNumber -> resolved path
private final Cache<String, String> pathCache;
public ProjectPathResolver(FileSystemHandler fileSystemHandler) {
this.fileSystemHandler = fileSystemHandler;
this.pathCache = CacheBuilder.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(1, TimeUnit.HOURS)
.build();
}
/**
* Resolves the page folder path with optional file ID for performance optimization.
*
 * <p><b>Path Resolution Strategy:</b>
 * <ul>
 *   <li>If fileId provided: Direct path construction (FAST - no filesystem checks)</li>
 *   <li>If fileId null/empty: Cached legacy fallback (modern projects must pass a file ID)</li>
 * </ul>
*
* @param projectId The unique identifier of the project
* @param pageNumber The page number (1-based)
* @param fileId Optional file ID for direct access (null or empty for auto-detect)
* @return The resolved page folder path
* @throws PageNotFoundException if page doesn't exist in either structure
*/
public String resolvePageFolderPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
// Fast path: If file ID is provided, construct path directly
if (fileId != null && !fileId.isEmpty()) {
return String.format("projects/%s/files/%s/pages/%03d",
projectId, fileId, pageNumber);
}
// Slow path: Auto-detect structure with caching
String cacheKey = getCacheKey(projectId, pageNumber);
// Check cache first
String cachedPath = pathCache.getIfPresent(cacheKey);
if (cachedPath != null) {
return cachedPath;
}
// Try legacy structure only (modern projects MUST provide file_id)
// Page numbers in modern projects are file-scoped, so we can't auto-detect
try {
String legacyPath = buildLegacyPageFolderPath(projectId, pageNumber);
if (fileSystemHandler.exists(legacyPath)) {
pathCache.put(cacheKey, legacyPath);
logger.info(String.format(
"Using legacy page path for project %s, page %d: %s",
projectId, pageNumber, legacyPath));
return legacyPath;
}
// Not found in legacy structure
throw new PageNotFoundException(projectId, pageNumber,
"Page not found in legacy structure. Modern projects require file_id parameter.");
} catch (IOException e) {
throw new PageNotFoundException(projectId, pageNumber, e);
}
}
/**
* Convenience overload for backward compatibility.
* Delegates to main method with fileId = null.
*/
public String resolvePageFolderPath(String projectId, int pageNumber)
throws PageNotFoundException {
return resolvePageFolderPath(projectId, pageNumber, null);
}
/**
* Detects the project structure version.
*
* @param projectId The unique identifier of the project
* @return The detected project structure version
*/
public ProjectStructureVersion detectProjectVersion(String projectId) throws IOException {
boolean hasLegacyPages = fileSystemHandler.exists(
String.format("projects/%s/pages/", projectId));
boolean hasFiles = fileSystemHandler.exists(
String.format("projects/%s/files/", projectId));
if (hasFiles && !hasLegacyPages) {
return ProjectStructureVersion.MODERN;
} else if (hasFiles && hasLegacyPages) {
return ProjectStructureVersion.TRANSITIONAL;
} else if (hasLegacyPages) {
return ProjectStructureVersion.LEGACY;
} else {
return ProjectStructureVersion.UNKNOWN;
}
}
/**
 * Clears cached paths for a specific project.
 * Call this after migration to ensure fresh path resolution.
 */
public void clearCacheForProject(String projectId) {
// Invalidate only this project's entries; cache keys are "{projectId}:{pageNumber}"
pathCache.asMap().keySet().removeIf(key -> key.startsWith(projectId + ":"));
logger.info("Cleared path cache for project: " + projectId);
}
// Private helper methods
private String getCacheKey(String projectId, int pageNumber) {
return projectId + ":" + pageNumber;
}
public enum ProjectStructureVersion {
UNKNOWN,
LEGACY, // Only has pages/
TRANSITIONAL, // Has both pages/ and files/
MODERN // Only has files/
}
public static class PageNotFoundException extends Exception {
public PageNotFoundException(String projectId, int pageNumber) {
super(String.format("Page %d not found in project %s", pageNumber, projectId));
}
public PageNotFoundException(String projectId, int pageNumber, String detail) {
super(String.format("Page %d not found in project %s: %s", pageNumber, projectId, detail));
}
public PageNotFoundException(String projectId, int pageNumber, Throwable cause) {
super(String.format("Page %d not found in project %s", pageNumber, projectId), cause);
}
}
}
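Since cache keys are "{projectId}:{pageNumber}", per-project invalidation can be done by key prefix. A minimal sketch over a plain concurrent map (the resolver above uses a Guava Cache, whose asMap() view supports the same removeIf on its key set):

```java
import java.util.concurrent.ConcurrentHashMap;

public class CacheInvalidationSketch {
    // Stand-in for the resolver's path cache: "{projectId}:{pageNumber}" -> path
    public static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    /** Removes only the entries belonging to one project, leaving the rest intact. */
    public static void clearProject(String projectId) {
        cache.keySet().removeIf(key -> key.startsWith(projectId + ":"));
    }
}
```

Prefix invalidation keeps other projects' warm cache entries alive after a single project's migration, which a blanket invalidateAll() would not.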
2. InputFileMetadataService
Purpose: Generate, retrieve, and manage file metadata
Location: src/main/java/org/codetricks/construction/code/assistant/service/InputFileMetadataService.java
package org.codetricks.construction.code.assistant.service;
import com.google.gson.Gson;
import com.google.protobuf.util.JsonFormat;
import org.codetricks.construction.code.assistant.FileSystemHandler;
import java.io.IOException;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
/**
* Service for managing Project Input File metadata and the hierarchical file structure.
*
* <p>Handles metadata generation, persistence, and retrieval for user-uploaded input files
* (PDFs, images, etc.) that get processed into individual pages.
*
* <p><b>GCS Path Structure:</b>
* <ul>
* <li>Input files: {@code projects/{projectId}/inputs/filename.pdf}</li>
* <li>File metadata: {@code projects/{projectId}/files/{file_id}/metadata.json}</li>
* <li>Extracted pages: {@code projects/{projectId}/files/{file_id}/pages/{page_num}/}</li>
* <li>Legacy pages: {@code projects/{projectId}/pages/{page_num}/} (backward compatibility)</li>
* <li>File index: {@code projects/{projectId}/files/index.json} (file ID counter + mappings)</li>
* </ul>
*
* <p><b>Responsibilities:</b>
* <ul>
* <li>Generate rich metadata (InputFileMetadata proto) for uploaded files</li>
* <li>Assign auto-incrementing file IDs via {@code files/index.json}</li>
* <li>Classify document types (plans, specifications, reports, etc.)</li>
* <li>Persist and retrieve metadata from GCS</li>
* <li>Track processing status and page associations</li>
* </ul>
*
* <p>Thread-safe and idempotent.
*
* <p>Instantiation: Create via constructor, pass FileSystemHandler and DocumentClassificationService
*/
public class InputFileMetadataService {
private static final Logger logger = Logger.getLogger(
InputFileMetadataService.class.getName());
private final FileSystemHandler fileSystemHandler;
private final DocumentClassificationService classificationService;
public InputFileMetadataService(
FileSystemHandler fileSystemHandler,
DocumentClassificationService classificationService) {
this.fileSystemHandler = fileSystemHandler;
this.classificationService = classificationService;
}
/**
* Generates comprehensive metadata for an input file.
*
* @param projectId The project ID
* @param inputFilePath Path to the input file (e.g., "inputs/plans.pdf")
* @param forceRegenerate Whether to overwrite existing metadata
* @return Generated metadata
*/
public InputFileMetadata generateMetadata(
String projectId,
String inputFilePath,
boolean forceRegenerate) throws IOException {
String fullPath = String.format("projects/%s/%s", projectId, inputFilePath);
// Check if metadata already exists
String fileId = extractOrGenerateFileId(projectId, inputFilePath);
String metadataPath = getMetadataPath(projectId, fileId);
if (!forceRegenerate && fileSystemHandler.exists(metadataPath)) {
logger.info("Metadata already exists for file: " + inputFilePath);
return loadMetadata(projectId, fileId);
}
logger.info("Generating metadata for file: " + inputFilePath);
// Build metadata
InputFileMetadata.Builder builder = InputFileMetadata.newBuilder()
.setFileId(fileId)
.setFileName(extractFileName(inputFilePath))
.setFilePath(inputFilePath)
.setMimeType(detectMimeType(fullPath))
.setFileSizeBytes(fileSystemHandler.getFileSize(fullPath))
.setUploadDate(com.google.protobuf.Timestamp.newBuilder()
.setSeconds(Instant.now().getEpochSecond())
.build())
.setProcessingStatus(ProcessingStatus.PROCESSING_STATUS_UPLOADED);
// For PDF files, extract page count
if (fullPath.endsWith(".pdf")) {
int pageCount = extractPageCount(fullPath);
builder.setPageCount(pageCount);
}
// Classify document type (heuristic-based for now, AI later)
DocumentType docType = classificationService.classifyDocument(
builder.getFileName(), fullPath);
builder.setDocumentType(docType);
// Generate checksum
String checksum = generateChecksum(fullPath);
builder.setChecksumMd5(checksum);
InputFileMetadata metadata = builder.build();
// Save metadata to disk
saveMetadata(projectId, fileId, metadata);
logger.info("Generated metadata for file: " + fileId);
return metadata;
}
/**
* Loads existing metadata from disk.
*/
public InputFileMetadata loadMetadata(String projectId, String fileId)
throws IOException {
String metadataPath = getMetadataPath(projectId, fileId);
String metadataJson = fileSystemHandler.readFileAsString(metadataPath);
InputFileMetadata.Builder builder = InputFileMetadata.newBuilder();
JsonFormat.parser().merge(metadataJson, builder);
return builder.build();
}
/**
* Updates metadata with processing results.
*/
public InputFileMetadata updateProcessingStatus(
String projectId,
String fileId,
ProcessingStatus status,
List<String> extractedPages) throws IOException {
InputFileMetadata existing = loadMetadata(projectId, fileId);
InputFileMetadata.Builder builder = existing.toBuilder()
.setProcessingStatus(status)
.setProcessedDate(com.google.protobuf.Timestamp.newBuilder()
.setSeconds(Instant.now().getEpochSecond())
.build())
.clearExtractedPages()
.addAllExtractedPages(extractedPages);
InputFileMetadata updated = builder.build();
saveMetadata(projectId, fileId, updated);
return updated;
}
/**
* Lists all file metadata in a project.
*/
public List<InputFileMetadata> listAllMetadata(String projectId)
throws IOException {
String filesBasePath = String.format("projects/%s/files/", projectId);
if (!fileSystemHandler.exists(filesBasePath)) {
return new ArrayList<>();
}
List<String> fileIds = fileSystemHandler.listDirectories(filesBasePath);
List<InputFileMetadata> metadataList = new ArrayList<>();
for (String fileId : fileIds) {
try {
InputFileMetadata metadata = loadMetadata(projectId, fileId);
metadataList.add(metadata);
} catch (IOException e) {
logger.warning("Failed to load metadata for file: " + fileId);
}
}
return metadataList;
}
// Private helper methods
/**
* Generates a new file ID using auto-increment counter.
* Maintains counter in projects/{projectId}/files/index.json
*/
private String extractOrGenerateFileId(String projectId, String inputFilePath)
throws IOException {
String indexPath = String.format("projects/%s/files/index.json", projectId);
// Load index or create new one
FileIndex index;
if (fileSystemHandler.exists(indexPath)) {
String indexJson = fileSystemHandler.readFileAsString(indexPath);
index = new Gson().fromJson(indexJson, FileIndex.class);
} else {
index = new FileIndex();
index.nextFileId = 1;
index.files = new ArrayList<>();
}
// Generate new file ID
String fileId = String.valueOf(index.nextFileId);
index.nextFileId++;
// Add to index
index.files.add(new FileIndexEntry(fileId, extractFileName(inputFilePath)));
// Save index
String updatedJson = new Gson().toJson(index);
fileSystemHandler.writeFile(indexPath, updatedJson);
return fileId;
}
// Helper classes for file index
private static class FileIndex {
int nextFileId;
List<FileIndexEntry> files;
}
private static class FileIndexEntry {
String fileId;
String fileName;
FileIndexEntry(String fileId, String fileName) {
this.fileId = fileId;
this.fileName = fileName;
}
}
private String getMetadataPath(String projectId, String fileId) {
return String.format("projects/%s/files/%s/metadata.json", projectId, fileId);
}
private String extractFileName(String filePath) {
int lastSlash = filePath.lastIndexOf('/');
return lastSlash >= 0 ? filePath.substring(lastSlash + 1) : filePath;
}
private String detectMimeType(String filePath) {
if (filePath.endsWith(".pdf")) {
return "application/pdf";
}
return "application/octet-stream";
}
private int extractPageCount(String pdfPath) throws IOException {
// Use Apache PDFBox to get page count
try (org.apache.pdfbox.pdmodel.PDDocument document =
org.apache.pdfbox.pdmodel.PDDocument.load(
fileSystemHandler.readFileAsBytes(pdfPath))) {
return document.getNumberOfPages();
}
}
private String generateChecksum(String filePath) throws IOException {
try {
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] fileBytes = fileSystemHandler.readFileAsBytes(filePath);
byte[] hashBytes = md.digest(fileBytes);
StringBuilder sb = new StringBuilder();
for (byte b : hashBytes) {
sb.append(String.format("%02x", b));
}
return sb.toString();
} catch (Exception e) {
logger.warning("Failed to generate checksum: " + e.getMessage());
return "";
}
}
private void saveMetadata(
String projectId,
String fileId,
InputFileMetadata metadata) throws IOException {
String metadataPath = getMetadataPath(projectId, fileId);
String metadataJson = JsonFormat.printer()
.preservingProtoFieldNames()
.print(metadata);
fileSystemHandler.writeFile(metadataPath, metadataJson);
}
}
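Given the serialization logic above, the on-disk projects/{projectId}/files/index.json would look roughly like this after two uploads (the file names here are hypothetical; field names assume Gson's default field-name mapping, which the code relies on):

```json
{
  "nextFileId": 3,
  "files": [
    { "fileId": "1", "fileName": "floor-plan.pdf" },
    { "fileId": "2", "fileName": "electrical.pdf" }
  ]
}
```

Note that extractOrGenerateFileId performs a read-modify-write on this file without locking, so single-writer access per project is assumed; concurrent uploads to the same project could race on the counter.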
3. Migration Readiness Assessment
Purpose: Determine whether a project is safe to migrate and provide actionable recommendations
Readiness Status Values
| Status | Condition | Can Migrate? | Description |
|---|---|---|---|
| READY | Legacy project, has input files and pages, no issues | ✅ Yes | Ideal state for migration |
| READY_WITH_WARNINGS | Legacy project, has pages, but minor issues (e.g., no input files) | ✅ Yes | Can proceed, but review warnings |
| NOT_READY | Legacy project with critical blockers (e.g., no pages) | ❌ No | Fix issues before migrating |
| ALREADY_MIGRATED | Modern structure detected | N/A | No action needed |
| MIGRATION_IN_PROGRESS | Transitional state (both structures exist) | ⚠️ Caution | Likely interrupted migration |
Assessment Logic
// Pseudo-code for readiness assessment
if (currentVersion == MODERN) {
return "ALREADY_MIGRATED";
} else if (currentVersion == TRANSITIONAL) {
return "MIGRATION_IN_PROGRESS"; // Investigate before retrying
} else if (legacyPageCount == 0) {
return "NOT_READY"; // Nothing to migrate
} else if (inputFileCount == 0) {
return "READY_WITH_WARNINGS"; // Will create default file entry
} else {
return "READY"; // Green light!
}
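The decision ladder above can also be exercised as plain code. A minimal, self-contained sketch, in which the Version enum and raw counts stand in for the real ProjectPathResolver types:

```java
public class ReadinessSketch {
    enum Version { LEGACY, TRANSITIONAL, MODERN }

    // Mirrors the pseudo-code: structure version is checked first,
    // then page count (blocker), then input-file count (warning).
    static String assess(Version version, int legacyPageCount, int inputFileCount) {
        if (version == Version.MODERN) {
            return "ALREADY_MIGRATED";
        } else if (version == Version.TRANSITIONAL) {
            return "MIGRATION_IN_PROGRESS"; // investigate before retrying
        } else if (legacyPageCount == 0) {
            return "NOT_READY"; // nothing to migrate
        } else if (inputFileCount == 0) {
            return "READY_WITH_WARNINGS"; // a default file entry will be created
        }
        return "READY";
    }
}
```

The ordering matters: a TRANSITIONAL project with zero pages still reports MIGRATION_IN_PROGRESS, because an interrupted migration should be investigated before any other issue is acted on.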
Common Issues and Resolutions
| Issue | Severity | Resolution |
|---|---|---|
| No input files in inputs/ | Warning | Create default file entry with ID unknown-source |
| No pages in pages/ | Blocker | Cannot migrate an empty project |
| Page numbering gaps | Warning | Proceed anyway; gaps are preserved |
| Both structures exist | Warning | Likely interrupted migration; investigate before retrying |
| Insufficient storage | Blocker | Free up space or increase quota |
4. FileStructureMigrationService
Purpose: Migrate legacy projects to new structure
Location: src/main/java/org/codetricks/construction/code/assistant/service/FileStructureMigrationService.java
package org.codetricks.construction.code.assistant.service;
import java.io.IOException;
import java.util.*;
import java.util.logging.Logger;
/**
* Service for migrating legacy projects from flat pages/ structure to
* hierarchical files/{file_id}/pages/ structure.
*
* <p>Migration Strategy:
* 1. Analyze inputs/ folder to identify source files
* 2. Generate file metadata for each input file
* 3. Associate existing pages with source files (best effort heuristic)
* 4. Copy pages to new files/{file_id}/pages/ structure
* 5. Keep legacy pages/ intact for rollback
* 6. Update plan-metadata.json with new paths
*
* <p>Thread-safe and idempotent.
*
* <p>Instantiation: Create via constructor, pass FileSystemHandler,
* InputFileMetadataService, and ProjectPathResolver
*/
public class FileStructureMigrationService {
private static final Logger logger = Logger.getLogger(
FileStructureMigrationService.class.getName());
private final FileSystemHandler fileSystemHandler;
private final InputFileMetadataService metadataService;
private final ProjectPathResolver pathResolver;
public FileStructureMigrationService(
FileSystemHandler fileSystemHandler,
InputFileMetadataService metadataService,
ProjectPathResolver pathResolver) {
this.fileSystemHandler = fileSystemHandler;
this.metadataService = metadataService;
this.pathResolver = pathResolver;
}
/**
* Analyzes a project to determine migration readiness.
* Provides comprehensive assessment including blockers, estimates, and recommendations.
*/
public MigrationAnalysis analyzeProject(String projectId) throws IOException {
logger.info("Analyzing project for migration: " + projectId);
MigrationAnalysis analysis = new MigrationAnalysis();
analysis.projectId = projectId;
analysis.currentVersion = pathResolver.detectProjectVersion(projectId);
analysis.issues = new ArrayList<>();
// Count input files
String inputsPath = String.format("projects/%s/inputs/", projectId);
if (fileSystemHandler.exists(inputsPath)) {
analysis.inputFileCount = fileSystemHandler.listFiles(inputsPath).size();
}
// Count legacy pages
String pagesPath = String.format("projects/%s/pages/", projectId);
if (fileSystemHandler.exists(pagesPath)) {
analysis.legacyPageCount = fileSystemHandler.listDirectories(pagesPath).size();
}
// Determine if migration is needed
analysis.needsMigration = (analysis.currentVersion ==
ProjectPathResolver.ProjectStructureVersion.LEGACY);
// Estimate duration (rough estimate: 1 second per page + 5 seconds per file)
analysis.estimatedDurationSeconds =
(analysis.legacyPageCount * 1) + (analysis.inputFileCount * 5);
// Check for blockers and warnings
if (analysis.inputFileCount == 0) {
analysis.issues.add("No input files found in inputs/ folder - will create default file entry");
}
if (analysis.legacyPageCount == 0) {
analysis.issues.add("No pages found in pages/ folder - nothing to migrate");
}
// Assess readiness status
if (analysis.currentVersion == ProjectPathResolver.ProjectStructureVersion.MODERN) {
analysis.readinessStatus = "ALREADY_MIGRATED";
analysis.readinessMessage = "Project has already been migrated to the new file structure.";
} else if (analysis.currentVersion == ProjectPathResolver.ProjectStructureVersion.TRANSITIONAL) {
analysis.readinessStatus = "MIGRATION_IN_PROGRESS";
analysis.readinessMessage = "Project migration is in progress or partially complete. " +
"Both legacy and modern structures exist.";
} else if (!analysis.issues.isEmpty() && analysis.legacyPageCount == 0) {
analysis.readinessStatus = "NOT_READY";
analysis.readinessMessage = "Project cannot be migrated: no pages found.";
} else if (!analysis.issues.isEmpty()) {
analysis.readinessStatus = "READY_WITH_WARNINGS";
analysis.readinessMessage = String.format(
"Project can be migrated with warnings. Found %d input files and %d pages. " +
"Estimated time: %d seconds. Issues: %s",
analysis.inputFileCount, analysis.legacyPageCount,
analysis.estimatedDurationSeconds, String.join("; ", analysis.issues));
} else {
analysis.readinessStatus = "READY";
analysis.readinessMessage = String.format(
"Project is ready to migrate. Found %d input files and %d pages. " +
"Estimated time: %d seconds. No blockers detected.",
analysis.inputFileCount, analysis.legacyPageCount,
analysis.estimatedDurationSeconds);
}
logger.info("Analysis complete: " + analysis);
return analysis;
}
/**
* Migrates a project to new file structure.
*
* @param projectId The project to migrate
* @param preserveLegacy Whether to keep pages/ folder after migration
* @param dryRun If true, only preview changes without applying
* @return Migration result
*/
public MigrationResult migrateProject(
String projectId,
boolean preserveLegacy,
boolean dryRun) throws IOException {
logger.info(String.format(
"Starting migration for project %s (dryRun=%s, preserve=%s)",
projectId, dryRun, preserveLegacy));
MigrationResult result = new MigrationResult();
result.projectId = projectId;
result.startTime = System.currentTimeMillis();
try {
// Step 1: Analyze input files
List<String> inputFiles = discoverInputFiles(projectId);
logger.info("Discovered " + inputFiles.size() + " input files");
if (inputFiles.isEmpty()) {
// No input files - create a default file for all pages
inputFiles.add(createDefaultFileEntry(projectId));
}
// Step 2: Generate metadata for each input file
Map<String, InputFileMetadata> fileMetadataMap = new HashMap<>();
for (String inputFile : inputFiles) {
InputFileMetadata metadata = metadataService.generateMetadata(
projectId, inputFile, false /* don't force regenerate */);
fileMetadataMap.put(metadata.getFileId(), metadata);
result.migratedFiles.add(metadata);
}
// Step 3: Associate pages with files (heuristic-based)
Map<String, List<Integer>> fileToPages = associatePagesWithFiles(
projectId, fileMetadataMap);
// Step 4: Migrate pages to new structure
for (Map.Entry<String, List<Integer>> entry : fileToPages.entrySet()) {
String fileId = entry.getKey();
List<Integer> pageNumbers = entry.getValue();
for (int pageNumber : pageNumbers) {
if (!dryRun) {
migratePageToNewStructure(projectId, fileId, pageNumber);
}
result.totalPagesMigrated++;
}
result.pagesPerFile.put(fileId, pageNumbers.size());
}
// Step 5: Update metadata with extracted pages
if (!dryRun) {
for (Map.Entry<String, List<Integer>> entry : fileToPages.entrySet()) {
String fileId = entry.getKey();
List<String> pageIds = entry.getValue().stream()
.map(String::valueOf)
.toList();
metadataService.updateProcessingStatus(
projectId, fileId, ProcessingStatus.PROCESSING_STATUS_COMPLETED, pageIds);
}
}
// Step 6: Optionally remove legacy pages/ folder
if (!preserveLegacy && !dryRun) {
String legacyPagesPath = String.format("projects/%s/pages/", projectId);
fileSystemHandler.deleteDirectory(legacyPagesPath);
logger.info("Removed legacy pages/ folder");
}
// Step 7: Clear path cache to force re-resolution
if (!dryRun) {
pathResolver.clearCacheForProject(projectId);
}
result.success = true;
logger.info("Migration completed successfully");
} catch (Exception e) {
result.success = false;
result.errorMessage = e.getMessage();
logger.log(java.util.logging.Level.SEVERE, "Migration failed: " + e.getMessage(), e);
}
result.endTime = System.currentTimeMillis();
return result;
}
// Private helper methods
private List<String> discoverInputFiles(String projectId) throws IOException {
String inputsPath = String.format("projects/%s/inputs/", projectId);
if (!fileSystemHandler.exists(inputsPath)) {
return new ArrayList<>();
}
return fileSystemHandler.listFiles(inputsPath).stream()
.map(filename -> "inputs/" + filename)
.toList();
}
private String createDefaultFileEntry(String projectId) {
// For projects with no input files, create a placeholder
return "inputs/unknown-source.pdf";
}
private Map<String, List<Integer>> associatePagesWithFiles(
String projectId,
Map<String, InputFileMetadata> fileMetadataMap) throws IOException {
// Simple heuristic: Distribute pages evenly across files based on page count
Map<String, List<Integer>> fileToPages = new HashMap<>();
// Get list of all legacy pages
String legacyPagesPath = String.format("projects/%s/pages/", projectId);
List<Integer> allPages = fileSystemHandler.listDirectories(legacyPagesPath).stream()
.map(Integer::parseInt)
.sorted()
.toList();
if (allPages.isEmpty()) {
return fileToPages;
}
// If only one file, assign all pages to it
if (fileMetadataMap.size() == 1) {
String fileId = fileMetadataMap.keySet().iterator().next();
fileToPages.put(fileId, new ArrayList<>(allPages));
return fileToPages;
}
// Otherwise, distribute based on page count in each file
List<InputFileMetadata> sortedFiles = fileMetadataMap.values().stream()
.sorted(Comparator.comparingInt(InputFileMetadata::getPageCount))
.toList();
int currentPageIndex = 0;
for (InputFileMetadata metadata : sortedFiles) {
int pageCount = metadata.getPageCount();
List<Integer> assignedPages = new ArrayList<>();
for (int i = 0; i < pageCount && currentPageIndex < allPages.size(); i++) {
assignedPages.add(allPages.get(currentPageIndex++));
}
fileToPages.put(metadata.getFileId(), assignedPages);
}
// Assign remaining pages to last file (edge case)
if (currentPageIndex < allPages.size()) {
String lastFileId = sortedFiles.get(sortedFiles.size() - 1).getFileId();
List<Integer> lastFilePages = fileToPages.get(lastFileId);
while (currentPageIndex < allPages.size()) {
lastFilePages.add(allPages.get(currentPageIndex++));
}
}
return fileToPages;
}
private void migratePageToNewStructure(
String projectId,
String fileId,
int pageNumber) throws IOException {
String legacyPath = String.format("projects/%s/pages/%03d/", projectId, pageNumber);
String newPath = String.format("projects/%s/files/%s/pages/%03d/",
projectId, fileId, pageNumber);
// Copy entire page folder to new location
fileSystemHandler.copyDirectory(legacyPath, newPath);
logger.info(String.format("Migrated page %d to file %s", pageNumber, fileId));
}
// Data classes
public static class MigrationAnalysis {
public String projectId;
public ProjectPathResolver.ProjectStructureVersion currentVersion;
public boolean needsMigration;
public int inputFileCount;
public int legacyPageCount;
public int estimatedDurationSeconds;
public List<String> issues;
public String readinessStatus; // READY, READY_WITH_WARNINGS, NOT_READY, ALREADY_MIGRATED
public String readinessMessage;
@Override
public String toString() {
return String.format(
"MigrationAnalysis{project=%s, version=%s, readiness=%s, " +
"input_files=%d, pages=%d, est_duration=%ds, issues=%d}",
projectId, currentVersion, readinessStatus,
inputFileCount, legacyPageCount, estimatedDurationSeconds,
issues != null ? issues.size() : 0);
}
}
public static class MigrationResult {
public String projectId;
public boolean success;
public List<InputFileMetadata> migratedFiles = new ArrayList<>();
public Map<String, Integer> pagesPerFile = new HashMap<>();
public int totalPagesMigrated;
public String errorMessage;
public long startTime;
public long endTime;
public long getDurationSeconds() {
return (endTime - startTime) / 1000;
}
}
}
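The page-association heuristic is the least obvious step above, and it can be isolated and tested without any storage dependencies. A simplified sketch, where each file's declared page count is passed in directly instead of being read from InputFileMetadata (the real service additionally sorts files by ascending page count before distributing):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageAssociationSketch {
    /**
     * Distributes sorted legacy page numbers across files according to each
     * file's declared page count; leftover pages spill into the last file,
     * mirroring the edge-case handling in associatePagesWithFiles.
     */
    static Map<String, List<Integer>> associate(
            LinkedHashMap<String, Integer> filePageCounts, List<Integer> allPages) {
        Map<String, List<Integer>> fileToPages = new LinkedHashMap<>();
        int idx = 0;
        String lastFileId = null;
        for (Map.Entry<String, Integer> entry : filePageCounts.entrySet()) {
            List<Integer> assigned = new ArrayList<>();
            for (int i = 0; i < entry.getValue() && idx < allPages.size(); i++) {
                assigned.add(allPages.get(idx++));
            }
            fileToPages.put(entry.getKey(), assigned);
            lastFileId = entry.getKey();
        }
        // Edge case: more legacy pages exist than the files' declared counts.
        while (lastFileId != null && idx < allPages.size()) {
            fileToPages.get(lastFileId).add(allPages.get(idx++));
        }
        return fileToPages;
    }
}
```

For example, with file "1" declaring 2 pages and file "2" declaring 3, legacy pages 1..6 split as {1,2} and {3,4,5,6}: the sixth page has no declared owner and spills into the last file.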
5. DocumentClassificationService
Purpose: Classify document type (heuristic-based initially)
Location: src/main/java/org/codetricks/construction/code/assistant/service/DocumentClassificationService.java
package org.codetricks.construction.code.assistant.service;
import java.util.regex.Pattern;
/**
* Service for classifying document types based on filename and content.
* Uses heuristic rules initially, can be enhanced with AI classification later.
*
* <p>Instantiation: Stateless utility, can be instantiated with default constructor
*/
public class DocumentClassificationService {
// Patterns for document type detection
private static final Pattern ARCHITECTURAL_PATTERN = Pattern.compile(
"(?i).*(architectural|arch|floor[\\s-]?plan|site[\\s-]?plan|elevation|section).*");
private static final Pattern ELECTRICAL_PATTERN = Pattern.compile(
"(?i).*(electrical|elec|power|lighting).*");
private static final Pattern MECHANICAL_PATTERN = Pattern.compile(
"(?i).*(mechanical|mech|hvac|plumbing|mep).*");
private static final Pattern STRUCTURAL_PATTERN = Pattern.compile(
"(?i).*(structural|struct|foundation|framing).*");
private static final Pattern PERMIT_PATTERN = Pattern.compile(
"(?i).*(permit|application|approval).*");
private static final Pattern INSPECTION_PATTERN = Pattern.compile(
"(?i).*(inspector|inspection|feedback|corrections).*");
/**
* Classifies a document based on filename and optional content analysis.
*
* @param filename The name of the file
* @param filePath Optional path to file for content analysis (future enhancement)
* @return The classified document type
*/
public DocumentType classifyDocument(String filename, String filePath) {
// Heuristic-based classification using filename patterns
if (ARCHITECTURAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN;
}
if (ELECTRICAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_ELECTRICAL_PLAN;
}
if (MECHANICAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_MECHANICAL_PLAN;
}
if (STRUCTURAL_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_STRUCTURAL_PLAN;
}
if (PERMIT_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_PERMIT_APPLICATION;
}
if (INSPECTION_PATTERN.matcher(filename).matches()) {
return DocumentType.DOCUMENT_TYPE_INSPECTOR_FEEDBACK;
}
// Default to unknown if no pattern matches
return DocumentType.DOCUMENT_TYPE_UNKNOWN;
}
// Future enhancement: AI-based classification using LLM
public DocumentType classifyDocumentWithAI(String filePath) {
// TODO: Implement LLM-based classification
// 1. Extract first page or sample of content
// 2. Call LLM with prompt: "Classify this construction document..."
// 3. Parse LLM response to DocumentType enum
// 4. Fall back to heuristic if LLM fails
throw new UnsupportedOperationException("AI classification not yet implemented");
}
}
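Because every pattern has the form (?i).*(…).*, classification reduces to ordered, case-insensitive substring checks. A reduced sketch using two of the service's patterns verbatim (string labels stand in for the DocumentType enum):

```java
import java.util.regex.Pattern;

public class ClassificationSketch {
    // Two of the service's patterns, copied verbatim; the order of checks is significant.
    private static final Pattern ARCHITECTURAL = Pattern.compile(
        "(?i).*(architectural|arch|floor[\\s-]?plan|site[\\s-]?plan|elevation|section).*");
    private static final Pattern ELECTRICAL = Pattern.compile(
        "(?i).*(electrical|elec|power|lighting).*");

    static String classify(String filename) {
        if (ARCHITECTURAL.matcher(filename).matches()) {
            return "ARCHITECTURAL_PLAN";
        }
        if (ELECTRICAL.matcher(filename).matches()) {
            return "ELECTRICAL_PLAN";
        }
        return "UNKNOWN";
    }
}
```

Note that pattern order matters in the service as well: the architectural pattern is tested first and includes "section", so a filename like "electrical-section-details.pdf" classifies as architectural rather than electrical. This is a known limitation of the heuristic approach.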
Frontend Implementation
1. File Metadata List Component
Purpose: Display list of files with rich metadata
Location: web-ng-m3/src/app/components/project/settings/file-metadata-list/file-metadata-list.component.ts
import { Component, Input, OnInit } from '@angular/core';
import { InputFileMetadata, DocumentType, ProcessingStatus } from '@generated/api_pb';
import { ArchitecturalPlanService } from '@app/services/architectural-plan.service';
@Component({
selector: 'app-file-metadata-list',
templateUrl: './file-metadata-list.component.html',
styleUrls: ['./file-metadata-list.component.scss']
})
export class FileMetadataListComponent implements OnInit {
@Input() projectId: string = '';
files: InputFileMetadata[] = [];
loading: boolean = true;
error: string | null = null;
// Enum references for template
DocumentType = DocumentType;
ProcessingStatus = ProcessingStatus;
constructor(private planService: ArchitecturalPlanService) {}
ngOnInit(): void {
this.loadFileMetadata();
}
private loadFileMetadata(): void {
this.loading = true;
this.error = null;
this.planService.listInputFileMetadata(this.projectId)
.subscribe({
next: (response) => {
this.files = response.files;
this.loading = false;
},
error: (err) => {
this.error = 'Failed to load file metadata';
this.loading = false;
console.error('Error loading file metadata:', err);
}
});
}
getDocumentTypeLabel(type: DocumentType): string {
switch (type) {
case DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN:
return 'Architectural Plan';
case DocumentType.DOCUMENT_TYPE_ELECTRICAL_PLAN:
return 'Electrical Plan';
case DocumentType.DOCUMENT_TYPE_MECHANICAL_PLAN:
return 'Mechanical Plan';
case DocumentType.DOCUMENT_TYPE_STRUCTURAL_PLAN:
return 'Structural Plan';
case DocumentType.DOCUMENT_TYPE_INSPECTOR_FEEDBACK:
return 'Inspector Feedback';
case DocumentType.DOCUMENT_TYPE_PERMIT_APPLICATION:
return 'Permit Application';
case DocumentType.DOCUMENT_TYPE_SITE_PLAN:
return 'Site Plan';
case DocumentType.DOCUMENT_TYPE_ELEVATION_DRAWING:
return 'Elevation Drawing';
case DocumentType.DOCUMENT_TYPE_SECTION_DRAWING:
return 'Section Drawing';
default:
return 'Unknown';
}
}
getProcessingStatusLabel(status: ProcessingStatus): string {
switch (status) {
case ProcessingStatus.PROCESSING_STATUS_UPLOADED:
return 'Uploaded';
case ProcessingStatus.PROCESSING_STATUS_PROCESSING:
return 'Processing';
case ProcessingStatus.PROCESSING_STATUS_COMPLETED:
return 'Completed';
case ProcessingStatus.PROCESSING_STATUS_FAILED:
return 'Failed';
default:
return 'Unknown';
}
}
getProcessingStatusColor(status: ProcessingStatus): string {
switch (status) {
case ProcessingStatus.PROCESSING_STATUS_COMPLETED:
return 'success';
case ProcessingStatus.PROCESSING_STATUS_PROCESSING:
return 'primary';
case ProcessingStatus.PROCESSING_STATUS_FAILED:
return 'warn';
default:
return 'accent';
}
}
formatFileSize(bytes: number): string {
if (bytes === 0) return '0 B';
const k = 1024;
const sizes = ['B', 'KB', 'MB', 'GB'];
const i = Math.floor(Math.log(bytes) / Math.log(k));
return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
}
formatDate(isoDate: string): string {
if (!isoDate) return 'N/A';
return new Date(isoDate).toLocaleDateString();
}
}
Template: file-metadata-list.component.html
<div class="file-metadata-list">
<h3>Project Files</h3>
<div *ngIf="loading" class="loading-spinner">
<mat-spinner diameter="40"></mat-spinner>
<p>Loading file metadata...</p>
</div>
<div *ngIf="error" class="error-message">
<mat-icon>error</mat-icon>
<p>{{ error }}</p>
</div>
<div *ngIf="!loading && !error && files.length === 0" class="empty-state">
<mat-icon>description</mat-icon>
<p>No files found in this project.</p>
</div>
<mat-card *ngFor="let file of files" class="file-card">
<mat-card-header>
<mat-icon mat-card-avatar>description</mat-icon>
<mat-card-title>{{ file.fileName }}</mat-card-title>
<mat-card-subtitle>{{ getDocumentTypeLabel(file.documentType) }}</mat-card-subtitle>
</mat-card-header>
<mat-card-content>
<div class="file-details">
<div class="detail-row">
<span class="label">File ID:</span>
<span class="value">{{ file.fileId }}</span>
</div>
<div class="detail-row">
<span class="label">Size:</span>
<span class="value">{{ formatFileSize(file.fileSizeBytes) }}</span>
</div>
<div class="detail-row">
<span class="label">Pages:</span>
<span class="value">{{ file.pageCount }}</span>
</div>
<div class="detail-row">
<span class="label">Upload Date:</span>
<span class="value">{{ formatDate(file.uploadDate) }}</span>
</div>
<div class="detail-row">
<span class="label">Status:</span>
<mat-chip [color]="getProcessingStatusColor(file.processingStatus)" selected>
{{ getProcessingStatusLabel(file.processingStatus) }}
</mat-chip>
</div>
<div *ngIf="file.contentSummary" class="detail-row">
<span class="label">Summary:</span>
<p class="summary">{{ file.contentSummary }}</p>
</div>
</div>
</mat-card-content>
<mat-card-actions align="end">
<button mat-button color="primary">View Pages</button>
<button mat-button>Reprocess</button>
</mat-card-actions>
</mat-card>
</div>
2. Hierarchical Page Navigation Component
Purpose: Display pages organized hierarchically by source file in table of contents
Location: web-ng-m3/src/app/components/project/pages/page-toc-hierarchical/page-toc-hierarchical.component.ts
UI Pattern: Mirrors the compliance tab's collapsible section hierarchy
import { Component, Input, Output, EventEmitter, OnInit } from '@angular/core';
import { InputFileMetadata, DocumentType } from '@generated/api_pb';
import { ArchitecturalPlanService } from '@app/services/architectural-plan.service';
import { trigger, state, style, transition, animate } from '@angular/animations';
export interface PageTreeNode {
type: 'file' | 'page';
fileId?: string;
fileName?: string;
documentType?: DocumentType;
pageCount?: number;
pageNumber?: number;
pageTitle?: string;
depth: number;
children?: PageTreeNode[];
}
@Component({
selector: 'app-page-toc-hierarchical',
templateUrl: './page-toc-hierarchical.component.html',
styleUrls: ['./page-toc-hierarchical.component.scss'],
animations: [
trigger('rowAnimation', [
transition(':enter', [
style({ height: '0px', opacity: 0, transform: 'translateY(-10px)', overflow: 'hidden' }),
animate('150ms ease-out', style({ height: '*', opacity: 1, transform: 'translateY(0)' }))
]),
transition(':leave', [
animate('150ms ease-in', style({ height: '0px', opacity: 0, transform: 'translateY(-10px)', overflow: 'hidden' }))
])
])
]
})
export class PageTocHierarchicalComponent implements OnInit {
@Input() projectId: string = '';
@Input() selectedPageNumber: number | null = null;
@Output() pageSelected = new EventEmitter<number>();
treeNodes: PageTreeNode[] = [];
expandedFileIds = new Set<string>();
loading: boolean = true;
constructor(private planService: ArchitecturalPlanService) {}
ngOnInit(): void {
this.loadHierarchy();
}
private async loadHierarchy(): Promise<void> {
this.loading = true;
try {
// Load file metadata
const response = await this.planService.listInputFileMetadata(this.projectId).toPromise();
const files = response?.files || [];
// Load plan pages
const plan = await this.planService.getArchitecturalPlan(this.projectId).toPromise();
const pages = plan?.pages || [];
// Build tree structure
this.treeNodes = files.map(file => ({
type: 'file' as const,
fileId: file.fileId,
fileName: file.fileName,
documentType: file.documentType,
pageCount: file.pageCount,
depth: 0,
children: pages
.filter(page => this.pagesBelongsToFile(page, file))
.map(page => ({
type: 'page' as const,
pageNumber: page.pageNumber,
pageTitle: page.title,
depth: 1
}))
}));
// Expand all by default
files.forEach(file => this.expandedFileIds.add(file.fileId));
} catch (error) {
console.error('Error loading hierarchy:', error);
} finally {
this.loading = false;
}
}
isExpanded(fileId: string): boolean {
return this.expandedFileIds.has(fileId);
}
toggleFile(fileId: string): void {
if (this.expandedFileIds.has(fileId)) {
this.expandedFileIds.delete(fileId);
} else {
this.expandedFileIds.add(fileId);
}
}
expandAll(): void {
this.treeNodes.forEach(node => {
if (node.fileId) {
this.expandedFileIds.add(node.fileId);
}
});
}
collapseAll(): void {
this.expandedFileIds.clear();
}
selectPage(pageNumber: number): void {
this.selectedPageNumber = pageNumber;
this.pageSelected.emit(pageNumber);
}
getNodePadding(depth: number): string {
return `${depth * 24}px`;
}
getDocumentTypeIcon(type: DocumentType): string {
switch (type) {
case DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN:
return 'architecture';
case DocumentType.DOCUMENT_TYPE_ELECTRICAL_PLAN:
return 'electrical_services';
case DocumentType.DOCUMENT_TYPE_MECHANICAL_PLAN:
return 'hvac';
case DocumentType.DOCUMENT_TYPE_STRUCTURAL_PLAN:
return 'foundation';
default:
return 'description';
}
}
private pagesBelongsToFile(page: any, file: InputFileMetadata): boolean {
// For now, check if page number is in extracted_pages
// This will be more sophisticated once migration is complete
return file.extractedPages?.includes(page.pageNumber.toString()) || false;
}
}
Template: page-toc-hierarchical.component.html
<div class="page-toc-hierarchical">
<div class="toc-header">
<h3>Table of Contents</h3>
<div class="toc-actions">
<button mat-icon-button (click)="expandAll()" title="Expand All">
<mat-icon>unfold_more</mat-icon>
</button>
<button mat-icon-button (click)="collapseAll()" title="Collapse All">
<mat-icon>unfold_less</mat-icon>
</button>
</div>
</div>
<div *ngIf="loading" class="loading-state">
<mat-spinner diameter="30"></mat-spinner>
</div>
<div *ngIf="!loading" class="toc-tree">
<ng-container *ngFor="let node of treeNodes">
<!-- File Node (Parent) -->
<div class="tree-node file-node"
[style.padding-left]="getNodePadding(node.depth)"
(click)="toggleFile(node.fileId!)"
[@rowAnimation]>
<mat-icon class="expand-icon">
{{ isExpanded(node.fileId!) ? 'expand_more' : 'chevron_right' }}
</mat-icon>
<mat-icon class="file-icon">{{ getDocumentTypeIcon(node.documentType!) }}</mat-icon>
<span class="file-name">{{ node.fileName }}</span>
<mat-chip class="document-type-chip" size="small">
{{ getDocumentTypeLabel(node.documentType!) }}
</mat-chip>
<span class="page-count">{{ node.pageCount }} pages</span>
</div>
<!-- Page Nodes (Children) - only shown when file is expanded -->
<ng-container *ngIf="isExpanded(node.fileId!)">
<div *ngFor="let child of node.children"
class="tree-node page-node"
[class.selected]="child.pageNumber === selectedPageNumber"
[style.padding-left]="getNodePadding(child.depth)"
(click)="selectPage(child.pageNumber!)"
[@rowAnimation]>
<span class="expander-placeholder"></span>
<mat-icon class="page-icon">article</mat-icon>
<span class="page-label">Page {{ child.pageNumber }}: {{ child.pageTitle }}</span>
</div>
</ng-container>
</ng-container>
</div>
</div>
Styling (similar to compliance tab):
.page-toc-hierarchical {
.tree-node {
display: flex;
align-items: center;
padding: 8px;
cursor: pointer;
transition: background-color 150ms ease;
&:hover {
background-color: rgba(0, 0, 0, 0.04);
}
&.selected {
background-color: rgba(63, 81, 181, 0.1);
border-left: 3px solid #3f51b5;
}
}
.file-node {
font-weight: 500;
border-bottom: 1px solid rgba(0, 0, 0, 0.12);
}
.page-node {
font-weight: 400;
}
.expand-icon {
margin-right: 8px;
}
.expander-placeholder {
display: inline-block;
width: 32px;
}
.file-icon, .page-icon {
margin-right: 8px;
color: rgba(0, 0, 0, 0.54);
}
}
Integration: This component replaces or enhances the existing flat page list in the TOC sidebar.
3. Legacy Project Upgrade Banner Component
Purpose: Prompt users to upgrade legacy projects
Location: web-ng-m3/src/app/components/project/settings/legacy-upgrade-banner/legacy-upgrade-banner.component.ts
import { Component, Input, OnInit, Output, EventEmitter } from '@angular/core';
import { MatDialog } from '@angular/material/dialog';
import { ArchitecturalPlanService } from '@app/services/architectural-plan.service';
import { FileStructureMigrationDialogComponent } from './file-structure-migration-dialog.component';
@Component({
selector: 'app-legacy-upgrade-banner',
templateUrl: './legacy-upgrade-banner.component.html',
styleUrls: ['./legacy-upgrade-banner.component.scss']
})
export class LegacyUpgradeBannerComponent implements OnInit {
@Input() projectId: string = '';
@Output() upgraded = new EventEmitter<void>();
isLegacyProject: boolean = false;
showBanner: boolean = false;
checking: boolean = true;
constructor(
private planService: ArchitecturalPlanService,
private dialog: MatDialog
) {}
ngOnInit(): void {
this.checkIfLegacyProject();
}
private checkIfLegacyProject(): void {
this.checking = true;
this.planService.analyzeProjectMigration(this.projectId)
.subscribe({
next: (analysis) => {
this.isLegacyProject = analysis.needsMigration;
this.showBanner = this.isLegacyProject && !this.isDismissed();
this.checking = false;
},
error: (err) => {
console.error('Error checking project version:', err);
this.checking = false;
}
});
}
openUpgradeDialog(): void {
const dialogRef = this.dialog.open(FileStructureMigrationDialogComponent, {
width: '600px',
data: { projectId: this.projectId }
});
dialogRef.afterClosed().subscribe(result => {
if (result === 'upgraded') {
this.showBanner = false;
this.upgraded.emit();
}
});
}
dismissBanner(): void {
this.showBanner = false;
this.markAsDismissed();
}
private isDismissed(): boolean {
const key = `legacy-upgrade-dismissed-${this.projectId}`;
return localStorage.getItem(key) === 'true';
}
private markAsDismissed(): void {
const key = `legacy-upgrade-dismissed-${this.projectId}`;
localStorage.setItem(key, 'true');
}
}
Template: legacy-upgrade-banner.component.html
<mat-card *ngIf="showBanner" class="legacy-upgrade-banner" appearance="outlined">
<mat-card-content>
<div class="banner-content">
<mat-icon class="info-icon">info</mat-icon>
<div class="banner-text">
<h3>Upgrade Available</h3>
<p>
Upgrade your project to the new file structure for better organization,
rich file metadata, and improved search capabilities.
</p>
</div>
<div class="banner-actions">
<button mat-raised-button color="primary" (click)="openUpgradeDialog()">
Upgrade Project
</button>
<button mat-button (click)="dismissBanner()">Dismiss</button>
</div>
</div>
</mat-card-content>
</mat-card>
CLI Tools
Bulk Upgrade Command
Purpose: Admin tool for bulk upgrading legacy projects
Location: cli/codeproof.sh upgrade-file-structure
#!/bin/bash
# Bulk upgrade legacy projects to new file structure
set -e
# Configuration
GRPC_HOST="${GRPC_HOST:-localhost:8080}"
PROTO_PATH="src/main/proto"
GOOGLEAPIS_PATH="env/dependencies/googleapis"
# Parse arguments
DRY_RUN="false"
USER_ID=""
PROJECT_IDS=""
ALL="false"
while [[ $# -gt 0 ]]; do
case $1 in
--dry-run)
DRY_RUN="$2"
shift 2
;;
--user-id)
USER_ID="$2"
shift 2
;;
--project-ids)
PROJECT_IDS="$2"
shift 2
;;
--all)
ALL="true"
shift
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
# Validate inputs
if [ -z "$USER_ID" ]; then
echo "Error: --user-id is required"
exit 1
fi
echo "=========================================="
echo "File Structure Bulk Upgrade Tool"
echo "=========================================="
echo "User ID: $USER_ID"
echo "Dry Run: $DRY_RUN"
echo "All Projects: $ALL"
echo ""
# Function to migrate a single project
migrate_project() {
local project_id=$1
echo "Migrating project: $project_id"
RESPONSE=$(grpcurl -plaintext \
-import-path "${PROTO_PATH}" \
-import-path "${GOOGLEAPIS_PATH}" \
-proto "${PROTO_PATH}/api.proto" \
-d '{
"project_id": "'"${project_id}"'",
"preserve_legacy_structure": true,
"dry_run": '"${DRY_RUN}"',
"initiated_by": "'"${USER_ID}"'"
}' \
"${GRPC_HOST}" \
org.codetricks.construction.code.assistant.service.ArchitecturalPlanService/MigrateProjectFileStructure)
echo "$RESPONSE" | jq .
SUCCESS=$(echo "$RESPONSE" | jq -r '.success')
  if [ "$SUCCESS" == "true" ]; then
    echo "✅ Successfully migrated: $project_id"
    echo ""
    return 0
  else
    ERROR=$(echo "$RESPONSE" | jq -r '.error_message')
    echo "❌ Failed to migrate $project_id: $ERROR"
    echo ""
    return 1
  fi
}
# Get list of projects to migrate
if [ "$ALL" == "true" ]; then
  echo "Fetching all projects for user..."
  LIST_RESPONSE=$(grpcurl -plaintext \
    -import-path "${PROTO_PATH}" \
    -import-path "${GOOGLEAPIS_PATH}" \
    -proto "${PROTO_PATH}/api.proto" \
    -d '{"account_id": "'"${USER_ID}"'"}' \
    "${GRPC_HOST}" \
    org.codetricks.construction.code.assistant.service.ArchitecturalPlanService/ListArchitecturalPlanIds)
  # jq emits one ID per line; join with commas so the split below sees every ID
  PROJECT_IDS=$(echo "$LIST_RESPONSE" | jq -r '.architectural_plan_ids[]' | paste -sd, -)
fi
# Convert comma-separated list to an array
IFS=',' read -ra PROJECTS <<< "$PROJECT_IDS"
# Migrate each project, tracking successes and failures
TOTAL=${#PROJECTS[@]}
SUCCESS_COUNT=0
FAIL_COUNT=0
echo "Found $TOTAL projects to migrate"
echo ""
for project_id in "${PROJECTS[@]}"; do
  # Call inside `if` so a failing migration doesn't trip `set -e`
  if migrate_project "$project_id"; then
    SUCCESS_COUNT=$((SUCCESS_COUNT + 1))
  else
    FAIL_COUNT=$((FAIL_COUNT + 1))
  fi
done
echo "=========================================="
echo "Migration Complete"
echo "=========================================="
echo "Total Projects: $TOTAL"
echo "Successful: $SUCCESS_COUNT"
echo "Failed: $FAIL_COUNT"
echo "=========================================="
Testing Strategy
1. Unit Tests
Test Coverage:
- `ProjectPathResolver`: path resolution logic, caching, fallback
- `InputFileMetadataService`: metadata generation, persistence
- `FileStructureMigrationService`: migration logic, page association
- `DocumentClassificationService`: heuristic classification rules
Example Test (ProjectPathResolverTest.java):
@Test
public void testResolvePagePath_NewStructureExists_ReturnsNewPath() {
// Arrange
String projectId = "test-project";
int pageNumber = 1;
String expectedPath = "projects/test-project/files/file-123/pages/001/";
when(fileSystemHandler.exists(expectedPath)).thenReturn(true);
// Act
String actualPath = dualReadHandler.resolvePageFolderPath(projectId, pageNumber);
// Assert
assertEquals(expectedPath, actualPath);
verify(fileSystemHandler).exists(expectedPath);
}
@Test
public void testResolvePagePath_LegacyFallback_ReturnsLegacyPath() {
// Arrange
String projectId = "legacy-project";
int pageNumber = 1;
String newPath = "projects/legacy-project/files/.../pages/001/";
String legacyPath = "projects/legacy-project/pages/001/";
when(fileSystemHandler.exists(newPath)).thenReturn(false);
when(fileSystemHandler.exists(legacyPath)).thenReturn(true);
// Act
String actualPath = dualReadHandler.resolvePageFolderPath(projectId, pageNumber);
// Assert
assertEquals(legacyPath, actualPath);
}
2. Integration Tests
Test Scenarios:
- Legacy Project Read: Verify existing functionality unchanged
- Modern Project Read: Verify new structure works
- Migration End-to-End: Migrate project, verify pages accessible
- Rollback Safety: Ensure legacy structure preserved
Example Test (FileStructureMigrationIntegrationTest.java):
@Test
public void testMigrateProject_LegacyToModern_Success() {
// Arrange: Create legacy project
String projectId = createLegacyTestProject();
// Act: Migrate project
MigrationResult result = migrationService.migrateProject(
projectId, true /* preserve legacy */, false /* not dry run */);
// Assert: Migration successful
assertTrue(result.success);
assertEquals(3, result.totalPagesMigrated);
assertTrue(result.migratedFiles.size() > 0);
// Assert: Pages readable from new structure
for (int i = 1; i <= 3; i++) {
String path = dualReadHandler.resolvePageFolderPath(projectId, i);
assertTrue(path.contains("/files/"));
}
// Assert: Legacy structure still exists
String legacyPath = "projects/" + projectId + "/pages/";
assertTrue(fileSystemHandler.exists(legacyPath));
}
3. Backward Compatibility Tests
Critical Tests:
- Legacy project reads work unchanged
- All existing API calls return correct data
- Performance not degraded for legacy projects
- Legacy `plan-metadata.json` still updated
Test Matrix:
| Project Type | Read Pages | Write Pages | List Files | Get Metadata |
|---|---|---|---|---|
| Legacy | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass |
| Transitional | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass |
| Modern | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass |
Refactoring: Path Resolution Consolidation
Problem Statement
Currently, ArchitecturalPlanReviewer contains 15+ path-related methods that:
- Assume the legacy flat structure (`pages/{pageNum}/`)
- Mix concerns (domain logic + path utilities)
- Have duplicate static/instance method pairs
- Lack support for the new hierarchical structure (`files/{fileId}/pages/`)
- Have an unclear class responsibility: is it project-level or file-level?
Current Path Methods in ArchitecturalPlanReviewer
// Project-level paths
public static String getDefaultProjectHomeDir(String planId)
public String getProjectHomeDir(String planId)
public String getProjectHomeDir()
public static String getDefaultProjectsRootFolder()
public String getProjectsRootFolder()
// Legacy page paths (flat structure)
public static String getProjectPagesBasePath(String planId)
public String getPageFolderPath(int pageNumber)
public static String getPageFolderPath(String planId, int pageNumber)
// Metadata file paths
public String getPlanPageMetadataFilePath(int pageNumber)
public static String getPlanPageMetadataFilePath(String planId, int pageNumber)
public String getArchitecturalPlanMetadataFilePath()
public static String getArchitecturalPlanMetadataFilePath(String planId)
// Other paths
private String getProjectOverviewPath()
private String getFullProjectContentPath()
public String getProjectSourcePdfPath()
Issues:
- ❌ No `file_id` support
- ❌ Hardcoded legacy structure
- ❌ Scattered across a business logic class
- ❌ Static methods don't have access to a `ProjectPathResolver` instance
Semantic Clarity: What is ArchitecturalPlanReviewer?
Current Reality:
- Name suggests "single plan" (one file)
- Implementation is project-scoped (has `planId`, loads all pages in the project)
- Historically: 1 plan = 1 project (no ambiguity)
- Future: 1 project = N files (plans, electricals, mechanicals, inspector feedback)
Design Decision: Keep ArchitecturalPlanReviewer as project-scoped with optional file filtering.
Rationale:
- Minimal Breaking Changes: Existing code expects project-level operations
- Backward Compatible: Can operate on entire project (legacy) or single file (modern)
- Incremental Evolution: Can split into file/project classes later if needed
- Naming: "Plan" historically meant "project" in our domain
Proposed Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Service Layer β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ArchitecturalPlanServiceImpl β β
β β (gRPC service, orchestrates reviewers) β β
β βββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ β
β β β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ
β Business Logic Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ArchitecturalPlanReviewer β β
β β - Operates at PROJECT level by default β β
β β - Optional fileId filter for single-file mode β β
β β - Delegates ALL path logic to ProjectPathResolver β β
β ββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ β
β β β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β Utility/Helper Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β ProjectPathResolver β β
β β - ALL path construction logic β β
β β - Supports modern & legacy structures β β
β β - Dual-read with file_id optimization β β
β β - Caching for performance β β
β ββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββ β
β β β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
β Infrastructure Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β FileSystemHandler (GCS/Local) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Refactoring Plan: Four Phases
Phase 1: Extract Path Logic to ProjectPathResolver ✅
Goal: Centralize ALL path construction logic in ProjectPathResolver
New Methods in ProjectPathResolver:
public class ProjectPathResolver {
private final FileSystemHandler fileSystemHandler;
private final String projectsRootFolder; // "projects" by default
// Constructor
public ProjectPathResolver(FileSystemHandler fileSystemHandler) {
this(fileSystemHandler, "projects");
}
public ProjectPathResolver(FileSystemHandler fileSystemHandler, String projectsRootFolder) {
this.fileSystemHandler = fileSystemHandler;
this.projectsRootFolder = projectsRootFolder;
}
// ========================================
// Project-Level Paths
// ========================================
/**
* Returns the projects root folder path.
* @return "projects" by default
*/
public String getProjectsRootFolder() {
return projectsRootFolder;
}
/**
* Returns the project home directory path.
* @param projectId The project ID
* @return "projects/{projectId}"
*/
public String getProjectHomeDir(String projectId) {
return projectsRootFolder + "/" + projectId;
}
/**
* Returns the project inputs folder path.
* @param projectId The project ID
* @return "projects/{projectId}/inputs"
*/
public String getProjectInputsPath(String projectId) {
return getProjectHomeDir(projectId) + "/inputs";
}
/**
* Returns the plan metadata file path (plan-metadata.json).
* @param projectId The project ID
* @return "projects/{projectId}/plan-metadata.json"
*/
public String getPlanMetadataFilePath(String projectId) {
return getProjectHomeDir(projectId) + "/plan-metadata.json";
}
/**
* Returns the project metadata file path (project-metadata.json).
* @param projectId The project ID
* @return "projects/{projectId}/project-metadata.json"
*/
public String getProjectMetadataFilePath(String projectId) {
return getProjectHomeDir(projectId) + "/project-metadata.json";
}
/**
* Returns the project overview file path.
* @param projectId The project ID
* @return "projects/{projectId}/overview.md"
*/
public String getProjectOverviewPath(String projectId) {
return getProjectHomeDir(projectId) + "/overview.md";
}
/**
* Returns the full project content file path.
* @param projectId The project ID
* @return "projects/{projectId}/project-content.md"
*/
public String getFullProjectContentPath(String projectId) {
return getProjectHomeDir(projectId) + "/project-content.md";
}
// ========================================
// File-Level Paths (Modern Structure)
// ========================================
/**
* Returns the files folder path.
* @param projectId The project ID
* @return "projects/{projectId}/files"
*/
public String getFilesBasePath(String projectId) {
return getProjectHomeDir(projectId) + "/files";
}
/**
* Returns the file index path.
* @param projectId The project ID
* @return "projects/{projectId}/files/index.json"
*/
public String getFileIndexPath(String projectId) {
return getFilesBasePath(projectId) + "/index.json";
}
/**
* Returns the file folder path.
* @param projectId The project ID
* @param fileId The file ID (e.g., "1", "2", "3")
* @return "projects/{projectId}/files/{fileId}"
*/
public String getFileFolderPath(String projectId, String fileId) {
return getFilesBasePath(projectId) + "/" + fileId;
}
/**
* Returns the file metadata path.
* @param projectId The project ID
* @param fileId The file ID
* @return "projects/{projectId}/files/{fileId}/metadata.json"
*/
public String getFileMetadataPath(String projectId, String fileId) {
return getFileFolderPath(projectId, fileId) + "/metadata.json";
}
/**
* Returns the file pages folder path.
* @param projectId The project ID
* @param fileId The file ID
* @return "projects/{projectId}/files/{fileId}/pages"
*/
public String getFilePagesBasePath(String projectId, String fileId) {
return getFileFolderPath(projectId, fileId) + "/pages";
}
// ========================================
// Page-Level Paths (Dual-Read Support)
// ========================================
/**
* Returns the legacy pages folder path.
* @param projectId The project ID
* @return "projects/{projectId}/pages"
*/
public String getLegacyPagesBasePath(String projectId) {
return getProjectHomeDir(projectId) + "/pages";
}
/**
* Returns the page folder path with optional file_id.
*
* <p><b>Path Resolution Strategy:</b>
* <ul>
* <li>If fileId provided: Direct modern path (FAST)</li>
* <li>If fileId null: Dual-read logic (cache β modern β legacy)</li>
* </ul>
*
* @param projectId The project ID
* @param pageNumber The page number (1-based)
* @param fileId Optional file ID for direct access (null for auto-detect)
* @return Resolved page folder path
* @throws PageNotFoundException if page doesn't exist in either structure
*/
public String resolvePageFolderPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
// Implementation already covered earlier in this TDD
// ... (see lines 699-745)
}
/**
* Returns the page metadata file path.
* @param projectId The project ID
* @param pageNumber The page number
* @param fileId Optional file ID
* @return "projects/{projectId}/files/{fileId}/pages/{pageNum}/metadata.json"
* or "projects/{projectId}/pages/{pageNum}/metadata.json" (legacy)
*/
public String getPageMetadataPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
String pageFolderPath = resolvePageFolderPath(projectId, pageNumber, fileId);
return pageFolderPath + "/metadata.json";
}
/**
* Returns the page PDF file path.
* @param projectId The project ID
* @param pageNumber The page number
* @param fileId Optional file ID
* @return Path to page.pdf
*/
public String getPagePdfPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
String pageFolderPath = resolvePageFolderPath(projectId, pageNumber, fileId);
return pageFolderPath + "/page.pdf";
}
/**
* Returns the page markdown file path.
* @param projectId The project ID
* @param pageNumber The page number
* @param fileId Optional file ID
* @return Path to page.md
*/
public String getPageMarkdownPath(String projectId, int pageNumber, String fileId)
throws PageNotFoundException {
String pageFolderPath = resolvePageFolderPath(projectId, pageNumber, fileId);
return pageFolderPath + "/page.md";
}
// ========================================
// Utility Methods
// ========================================
/**
* Checks if project uses modern file structure.
* @param projectId The project ID
* @return true if files/ directory exists
*/
public boolean isModernStructure(String projectId) throws IOException {
return fileSystemHandler.exists(getFilesBasePath(projectId));
}
/**
* Checks if project uses legacy structure.
* @param projectId The project ID
* @return true if pages/ directory exists but files/ doesn't
*/
public boolean isLegacyStructure(String projectId) throws IOException {
String filesPath = getFilesBasePath(projectId);
String pagesPath = getLegacyPagesBasePath(projectId);
return fileSystemHandler.exists(pagesPath) && !fileSystemHandler.exists(filesPath);
}
/**
* Atomically increments and returns the next file ID for a project.
* Thread-safe for concurrent file uploads using optimistic locking (CAS).
*
* @param projectId The project ID
* @return The assigned file ID (guaranteed unique within project)
* @throws IOException if max retries exceeded or I/O error
*/
public int getAndIncrementFileId(String projectId) throws IOException {
String indexPath = getFileIndexPath(projectId);
int maxRetries = 10;
for (int attempt = 0; attempt < maxRetries; attempt++) {
try {
// Read current index with version (GCS generation number)
Long currentVersion = fileSystemHandler.getFileVersion(indexPath);
JSONObject index;
if (currentVersion == null) {
// File doesn't exist - initialize new index
index = new JSONObject();
index.put("next_file_id", 1);
index.put("files", new JSONArray());
} else {
// File exists - read and parse
String content = fileSystemHandler.readFile(indexPath);
index = new JSONObject(content);
}
// Get current ID and increment for next time
int assignedId = index.optInt("next_file_id", 1);
index.put("next_file_id", assignedId + 1);
// Atomic write: Only succeeds if version matches
try {
fileSystemHandler.writeFileAtomic(indexPath, index.toString(2), currentVersion);
return assignedId;
} catch (AtomicWriteConflictException e) {
// Another thread modified - retry with exponential backoff
Thread.sleep(50 + (attempt * 10));
continue;
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException("Interrupted while assigning file ID", e);
}
}
throw new IOException("Failed to assign file ID after " + maxRetries + " retries");
}
}
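For reference, the index document that `getAndIncrementFileId()` manipulates is small. A representative `files/index.json` is sketched below; the field names `next_file_id` and `files` come from the code above, while the per-file entries and their fields are illustrative only:

```json
{
  "next_file_id": 3,
  "files": [
    { "file_id": "1", "original_filename": "architectural-plans.pdf" },
    { "file_id": "2", "original_filename": "electrical-drawings.pdf" }
  ]
}
```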
FileSystemHandler Atomic Operations
New Abstract Methods for thread-safe file ID generation:
/**
* Atomically writes a file if the expected version/generation matches.
* Provides compare-and-set (CAS) semantics for concurrent-safe updates.
*
* @param path The file path
* @param content The content to write
* @param expectedVersion The expected version/generation (null for "must not exist")
* @return The new version/generation after write
* @throws AtomicWriteConflictException if version doesn't match
* @throws IOException if there's an I/O error
*/
public abstract long writeFileAtomic(String path, String content, Long expectedVersion)
throws IOException, AtomicWriteConflictException;
/**
* Gets the current version/generation of a file.
*
* @param path The file path
* @return The current version/generation, or null if file doesn't exist
* @throws IOException if there's an error accessing the file
*/
public abstract Long getFileVersion(String path) throws IOException;
GcsFileSystemHandler Implementation:
- Uses native GCS generation numbers
- `BlobTargetOption.generationMatch(expectedVersion)` for CAS
- `BlobTargetOption.doesNotExist()` for new files
- Returns HTTP 412 on conflicts → `AtomicWriteConflictException`
LocalFileSystemHandler Implementation:
- Uses last-modified time as the version (milliseconds)
- Synchronized locks per file path (single-instance only)
- Note: Doesn't scale horizontally (use GCS in production)
AtomicWriteConflictException:
- Custom exception for CAS failures
- Contains: path, expectedVersion, actualVersion
- Signals that a retry is needed in `getAndIncrementFileId()`
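The CAS-with-retry contract above can be exercised without GCS. The following is a minimal in-memory sketch; `VersionedStore` is an illustrative stand-in for `FileSystemHandler` (it is not the real class), and `IllegalStateException` stands in for `AtomicWriteConflictException`:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// In-memory stand-in for the versioned-write contract: a write succeeds
// only if the caller holds the current version (null = "must not exist").
class VersionedStore {
    private final AtomicReference<String> content = new AtomicReference<>();
    private final AtomicLong version = new AtomicLong(0);

    // Mirrors getFileVersion(): null means the "file" does not exist yet.
    synchronized Long getVersion() {
        return content.get() == null ? null : version.get();
    }

    synchronized String read() { return content.get(); }

    // Compare-and-set write, mirroring writeFileAtomic().
    synchronized long writeAtomic(String newContent, Long expectedVersion) {
        Long current = getVersion();
        boolean match = (expectedVersion == null) ? (current == null)
                                                  : expectedVersion.equals(current);
        if (!match) {
            // Stands in for AtomicWriteConflictException
            throw new IllegalStateException("version conflict");
        }
        content.set(newContent);
        return version.incrementAndGet();
    }
}

public class CasRetryDemo {
    // Retry loop shaped like getAndIncrementFileId():
    // read version, compute next ID, CAS-write, retry on conflict.
    static int getAndIncrement(VersionedStore store) {
        for (int attempt = 0; attempt < 10; attempt++) {
            Long v = store.getVersion();
            int next = (v == null) ? 1 : Integer.parseInt(store.read());
            try {
                store.writeAtomic(String.valueOf(next + 1), v);
                return next;
            } catch (IllegalStateException conflict) {
                // Another writer won the race; re-read and retry
            }
        }
        throw new RuntimeException("retries exhausted");
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        System.out.println(getAndIncrement(store)); // 1
        System.out.println(getAndIncrement(store)); // 2
    }
}
```

The key property, as in the GCS implementation, is that a stale `expectedVersion` always fails the write rather than silently clobbering a concurrent update.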
Phase 2: Update ArchitecturalPlanReviewer
Goal: Delegate all path logic to ProjectPathResolver, add optional fileId support
Changes to ArchitecturalPlanReviewer:
public class ArchitecturalPlanReviewer {
private final String planId; // Actually projectId
private final String projectsRootFolder;
private final FileSystemHandler fileSystemHandler;
// NEW: ProjectPathResolver instance
private final ProjectPathResolver pathResolver;
// NEW: Optional file ID for single-file mode
private final String fileId; // null for project-wide mode
// Constructor with optional fileId
public ArchitecturalPlanReviewer(
String planId,
FileSystemHandler fileSystemHandler,
String projectSourcePdfPath,
ModelClient modelClient,
List<Integer> pageList,
boolean forceReprocess,
boolean enableOrientationDetection,
List<String> iccDocumentIds,
ProgressCallback progressCallback,
String projectsRootFolder,
String fileId) throws IOException { // NEW parameter
this.planId = planId;
this.projectsRootFolder = projectsRootFolder;
this.fileSystemHandler = fileSystemHandler;
this.fileId = fileId; // NEW field
// NEW: Initialize ProjectPathResolver
this.pathResolver = new ProjectPathResolver(fileSystemHandler, projectsRootFolder);
// ... rest of initialization
}
// ========================================
// Updated Path Methods (delegate to ProjectPathResolver)
// ========================================
public String getProjectHomeDir() {
return pathResolver.getProjectHomeDir(planId);
}
public static String getDefaultProjectHomeDir(String planId) {
// Backward compatibility: use default root folder
ProjectPathResolver resolver = new ProjectPathResolver(
FileSystemHandlerFactory.createDefaultFileSystemHandler());
return resolver.getProjectHomeDir(planId);
}
public String getPageFolderPath(int pageNumber) throws PageNotFoundException {
return pathResolver.resolvePageFolderPath(planId, pageNumber, fileId);
}
public static String getPageFolderPath(String planId, int pageNumber) {
ProjectPathResolver resolver = new ProjectPathResolver(
FileSystemHandlerFactory.createDefaultFileSystemHandler());
try {
return resolver.resolvePageFolderPath(planId, pageNumber, null);
} catch (PageNotFoundException e) {
throw new RuntimeException(e);
}
}
public String getPlanPageMetadataFilePath(int pageNumber) throws PageNotFoundException {
return pathResolver.getPageMetadataPath(planId, pageNumber, fileId);
}
private String getArchitecturalPlanMetadataFilePath() {
return pathResolver.getPlanMetadataFilePath(planId);
}
private String getProjectOverviewPath() {
return pathResolver.getProjectOverviewPath(planId);
}
private String getFullProjectContentPath() {
return pathResolver.getFullProjectContentPath(planId);
}
// ... all other path methods delegate to pathResolver
// ========================================
// Optional: Convenience Getters
// ========================================
public String getFileId() {
return fileId;
}
public boolean isSingleFileMode() {
return fileId != null && !fileId.isEmpty();
}
public ProjectPathResolver getPathResolver() {
return pathResolver;
}
}
Backward Compatibility:
- All existing constructors remain unchanged
- New `fileId` parameter added to the most flexible constructor only
- Default behavior (`fileId = null`) maintains the current project-wide operation
- Static methods continue to work via temporary ProjectPathResolver instances
Phase 3: Service Layer Updates
Goal: Update gRPC service implementations to extract and pass file_id to ArchitecturalPlanReviewer
Key Services to Update:
- `ArchitecturalPlanServiceImpl` (main facade)
- `ArchitecturalPlanReviewServiceImpl` (compliance analysis)
- `ArchitecturalPlanAnalysisServiceImpl` (analysis availability)
- `ComplianceReportAsyncServiceImpl` (async tasks)
Example: ArchitecturalPlanServiceImpl:
public class ArchitecturalPlanServiceImpl {
@Override
public PageApplicabilityAnalysisList getApplicableCodeSections(
GetApplicableCodeSectionsRequest request) {
String projectId = request.getArchitecturalPlanId();
int pageNumber = request.getPageNumber();
String fileId = request.getFileId(); // From updated proto
// Create reviewer with optional fileId
// If fileId provided, reviewer operates in single-file mode
// If fileId null, reviewer operates in project-wide mode (legacy)
ArchitecturalPlanReviewer reviewer = createReviewer(projectId, fileId);
// Reviewer automatically uses fileId for path resolution
// ...
}
private ArchitecturalPlanReviewer createReviewer(String projectId, String fileId) {
// Use constructor with fileId parameter
// ...
}
}
Phase 4: Frontend/UI Updates
Goal: Update Angular frontend to track file_id and pass it in RPC requests
Key Components to Update:
- API Service (`api.service.ts`): add `file_id` parameter to RPC methods
- Compliance Component: look up `file_id` from `InputFileMetadata`
- Page Navigation Component: track file-to-page mappings
- File Metadata Service: fetch and cache the `InputFileMetadata` list
Implementation Pattern:
// 1. Fetch InputFileMetadata for the project
this.fileMetadataService.listInputFiles(projectId).subscribe(files => {
this.inputFiles = files;
});
// 2. Look up file_id for a given page number
private getFileIdForPage(pageNumber: number): string | undefined {
const fileMetadata = this.inputFiles.find(
f => f.extracted_pages.includes(String(pageNumber))
);
return fileMetadata?.file_id;
}
// 3. Pass file_id when making RPC calls
loadPageAnalysis(pageNumber: number) {
const fileId = this.getFileIdForPage(pageNumber);
this.apiService.getApplicableCodeSections(
this.projectId,
pageNumber,
this.iccBookId,
fileId // Pass file_id (undefined for legacy projects)
).subscribe(/* ... */);
}
Migration Path (Refactoring Phases)
Phase 1: Backend Infrastructure (Week 1-2) ✅ COMPLETE
- ✅ Implement `ProjectPathResolver` with all path methods
- ✅ Add optional `file_id` to RPC proto definitions
- ✅ Write comprehensive unit tests (40 tests)
- ✅ Implement atomic `getAndIncrementFileId()` with GCS CAS
- ✅ Add `FileSystemHandler` atomic operations
Phase 2: ArchitecturalPlanReviewer Refactoring (Week 3) ✅ COMPLETE
- ✅ Update `ArchitecturalPlanReviewer` to use `ProjectPathResolver`
- ✅ Add optional `fileId` parameter to constructor
- ✅ Delegate all 8 path methods to `pathResolver`
- ✅ Add static path builders (no FileSystemHandler overhead)
- ✅ Maintain backward compatibility (all existing tests pass)
Phase 3: Service Layer Updates (Week 3-4) 🔜 NEXT
- Update `ArchitecturalPlanServiceImpl` to pass `fileId`
- Update `ArchitecturalPlanReviewServiceImpl` to pass `fileId`
- Update `ArchitecturalPlanAnalysisServiceImpl` to pass `fileId`
- Update `ComplianceReportAsyncServiceImpl` to pass `fileId`
- Extract `fileId` from RPC requests and pass it to the reviewer constructor
- Integration testing
Phase 4: Frontend/UI Updates (Week 4-5)
- Update `api.service.ts`: add `file_id` parameter to RPC methods
- Update compliance component: look up `file_id` from `InputFileMetadata`
- Create file metadata service: fetch and cache the `InputFileMetadata` list
- Update page navigation: display a hierarchical file tree
- UI testing and polish
Benefits
- Single Source of Truth: All path logic in one place
- DRY Principle: No duplication between static/instance methods
- Testability: Easy to mock `ProjectPathResolver`
- Flexibility: Supports modern, legacy, and hybrid structures
- Performance: Caching and optimization in one place
- Backward Compatible: Existing code continues to work
- Future-Proof: Easy to add new path types (e.g., `reports/{fileId}/`)
Testing Strategy
Unit Tests for ProjectPathResolver:
@Test
public void testResolvePagePath_WithFileId_DirectPath() {
ProjectPathResolver resolver = new ProjectPathResolver(mockFileSystemHandler);
String path = resolver.resolvePageFolderPath("project-1", 5, "2");
assertEquals("projects/project-1/files/2/pages/005", path);
// Should NOT call fileSystemHandler (no filesystem checks)
}
@Test
public void testResolvePagePath_WithoutFileId_ModernStructure() throws Exception {
when(mockFileSystemHandler.exists("projects/project-1/files/")).thenReturn(true);
when(mockFileSystemHandler.listDirectories("projects/project-1/files/"))
.thenReturn(Arrays.asList("1", "2", "3"));
when(mockFileSystemHandler.exists("projects/project-1/files/2/pages/005/"))
.thenReturn(true);
String path = resolver.resolvePageFolderPath("project-1", 5, null);
assertEquals("projects/project-1/files/2/pages/005", path);
}
@Test
public void testResolvePagePath_WithoutFileId_LegacyFallback() throws Exception {
when(mockFileSystemHandler.exists("projects/project-1/files/")).thenReturn(false);
when(mockFileSystemHandler.exists("projects/project-1/pages/005/")).thenReturn(true);
String path = resolver.resolvePageFolderPath("project-1", 5, null);
assertEquals("projects/project-1/pages/005", path);
}
Open Questions for Discussion
1. Naming: Should we rename `planId` → `projectId` throughout the codebase for clarity?
   - Recommendation: Yes, but as a separate refactoring (Issue #XXX)
2. Static Methods: Keep static methods in `ArchitecturalPlanReviewer` for backward compatibility?
   - Recommendation: Yes, but mark them `@Deprecated` after Phase 2
3. File Index Performance: Should `ProjectPathResolver` cache `files/index.json` in memory?
   - Recommendation: Yes, with a TTL of 5 minutes (balances freshness against performance)
4. Future Split: Should we eventually split `ArchitecturalPlanReviewer` into file/project classes?
   - Recommendation: Monitor usage patterns; split only if a clear need emerges
Deployment Strategy
Phase 1: Backend Infrastructure (Week 1)
Deliverables:
- `ProjectPathResolver` with fallback logic
- `InputFileMetadataService` basic implementation
- Unit tests passing
- Feature flag: `enable_dual_read_filesystem` (default: `true`)
Deployment: Deploy to dev, run integration tests, promote to staging
Risk: Low (read-only, backward compatible)
Phase 2: Migration Service (Week 2)
Deliverables:
- `FileStructureMigrationService` with dry-run support
- CLI tool for bulk upgrades
- Integration tests with real legacy projects
- Feature flag: `enable_file_structure_migration` (default: `false`)
Deployment: Deploy to dev, test migration on cloned projects
Risk: Medium (write operations, but preserves legacy structure)
Phase 3: Frontend Integration (Week 3)
Deliverables:
- `FileMetadataListComponent` showing the file list (project settings)
- `PageTocHierarchicalComponent` for hierarchical navigation (TOC sidebar)
- `LegacyUpgradeBannerComponent` prompting users
- User-initiated migration workflow
- E2E tests in Cypress
Deployment: Deploy to dev, user acceptance testing
Risk: Low (UI only, backend already deployed)
Phase 4: Production Rollout (Week 4)
Deliverables:
- Enable feature flags in production
- Monitor error rates and performance
- Gradual rollout: 10% → 50% → 100% of users
- Rollback plan prepared
Deployment: Canary deployment, monitor metrics
Risk: Low (extensive testing, rollback available)
Monitoring and Observability
Key Metrics
- Read Performance:
  - `page_read_latency_ms` (p50, p95, p99)
  - `path_cache_hit_rate` (target: > 80%)
  - `legacy_fallback_rate` (should decrease over time)
- Migration Success:
  - `migrations_total` (count)
  - `migrations_successful` (count)
  - `migrations_failed` (count)
  - `migration_duration_seconds` (histogram)
- File Metadata:
  - `files_with_metadata_percent` (target: 100%)
  - `classification_accuracy` (manual validation)
Alerts
- Critical:
  - `legacy_fallback_rate > 50%` (indicates the new structure is not working)
  - `page_read_latency_p99 > 2000ms` (performance regression)
  - `migrations_failed / migrations_total > 0.05` (5% failure rate)
- Warning:
  - `path_cache_hit_rate < 60%` (cache ineffective)
  - `files_without_metadata > 10` (metadata generation failing)
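Each alert condition above reduces to a threshold check on a ratio of two counters. A minimal sketch of the evaluation logic (the class and method names are illustrative, not part of the monitoring stack; thresholds are taken from the bullets above):

```java
// Evaluates the alert conditions listed above from raw counters.
public class MigrationAlerts {
    // Critical: more than 5% of migrations failed.
    public static boolean migrationFailureAlert(long failed, long total) {
        return total > 0 && (double) failed / total > 0.05;
    }

    // Critical: legacy fallback served more than half of page reads.
    public static boolean legacyFallbackAlert(long fallbackReads, long totalReads) {
        return totalReads > 0 && (double) fallbackReads / totalReads > 0.5;
    }

    // Warning: path cache hit rate below 60%.
    public static boolean cacheHitRateWarning(long hits, long lookups) {
        return lookups > 0 && (double) hits / lookups < 0.6;
    }
}
```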
Rollback Plan
Immediate Rollback (< 1 hour)
Scenario: Critical bug detected in production
Steps:
- Disable the feature flag: `enable_dual_read_filesystem = false`
- Revert to the previous deployment
- All reads go directly to the legacy `pages/` structure
- No data loss (legacy structure preserved)
Partial Rollback (Specific Projects)
Scenario: Migration failed for specific projects
Steps:
- Identify the affected projects
- Delete the `files/` folder for those projects
- Pages automatically fall back to the legacy `pages/` structure
- No functionality lost
Data Recovery
Scenario: Accidental data loss (unlikely, since the legacy structure is preserved)
Steps:
- The legacy `pages/` folder is never deleted (configured via `preserve_legacy_structure = true`)
- Restore from Cloud Storage versioning if needed
- Re-run the migration with fixed logic
Performance Considerations
Path Caching
- In-memory cache with 1-hour TTL
- Reduces filesystem checks by 80%+
- Cache invalidation on migration
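A TTL-bounded path cache can be as simple as a timestamped concurrent map. The sketch below is illustrative (the `PathCache` class is not the real implementation); `resolve` stands in for the filesystem probe, and keys are assumed to be of the form `projectId + ":" + pageNumber`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal TTL cache for resolved page paths. Entries expire after
// ttlMillis; invalidateProject() clears a project's entries on migration.
public class PathCache {
    private record Entry(String path, long storedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public PathCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached path when fresh, otherwise re-resolves and stores it.
    public String get(String key, Function<String, String> resolve) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(key);
        if (e != null && now - e.storedAt() < ttlMillis) {
            return e.path(); // hit: no filesystem check
        }
        String path = resolve.apply(key); // miss: probe the filesystem
        cache.put(key, new Entry(path, now));
        return path;
    }

    // Drop every cached path for a project (called after migration).
    public void invalidateProject(String projectId) {
        cache.keySet().removeIf(k -> k.startsWith(projectId + ":"));
    }
}
```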
Lazy Metadata Loading
- Metadata loaded on-demand, not preemptively
- List operations return minimal metadata
- Full metadata fetched when needed
Parallel Migration
- Multiple projects can be migrated concurrently
- Pages within a project migrated sequentially (safer)
- Configurable concurrency limit
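The concurrency model above (parallel across projects, sequential within a project) maps directly onto a bounded thread pool: one task per project, pages iterated inside the task. A minimal sketch, assuming a hypothetical `migratePage` callback standing in for the real per-page migration work:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Projects migrate in parallel under a configurable limit; pages within
// a project stay sequential because each project is a single task.
public class ParallelMigration {
    public static void migrateAll(List<String> projectIds,
                                  int maxConcurrent,
                                  BiConsumer<String, Integer> migratePage,
                                  int pagesPerProject) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (String projectId : projectIds) {
            pool.submit(() -> {
                // Sequential page migration within one project (safer)
                for (int page = 1; page <= pagesPerProject; page++) {
                    migratePage.accept(projectId, page);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```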
Security Considerations
RBAC Integration
- Migration requires `OWNER` permissions
- File metadata respects project-level permissions
- Admin bulk upgrades are logged for an audit trail
Data Integrity
- Checksums verified during migration
- Transactional migrations (all-or-nothing where possible)
- Legacy structure preserved for rollback
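The checksum verification step can be sketched with a simple byte-for-byte digest comparison. This is illustrative only (the `MigrationChecksum` class is hypothetical, and the real implementation may instead rely on GCS-provided CRC32C/MD5 object hashes):

```java
import java.util.zip.CRC32;

// A migrated page copy is accepted only when its checksum matches the source.
public class MigrationChecksum {
    public static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Returns true when the migrated copy is byte-identical to the source.
    public static boolean verifyCopy(byte[] source, byte[] copy) {
        return checksum(source) == checksum(copy);
    }
}
```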
✅ Implementation Status (October 2025)
COMPLETED FEATURES
All core functionality has been successfully implemented and is working in production:
Backend Infrastructure ✅
- `InputFileMetadataService`: Complete metadata generation and management
- `ProjectPathResolver`: Intelligent dual-read with caching (modern → legacy fallback)
- Atomic File Operations: GCS generation-based compare-and-set for race-condition prevention
- File-Aware gRPC API: Enhanced `GetArchitecturalPlanPageRequest` with `file_id` parameter
- Comprehensive Logging: Detailed debugging information throughout the system
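The generation-based Compare-and-Set with exponential backoff can be sketched as below. `GenerationStore` is a hypothetical abstraction standing in for the GCS client's generation-match preconditions; the production service talks to Cloud Storage directly:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.UnaryOperator;

/** Sketch of the CAS retry loop used for thread-safe metadata updates. */
public class CasUpdater {
    /** Hypothetical stand-in for GCS reads/writes with generation preconditions. */
    public interface GenerationStore {
        Versioned read(String path);
        /** Writes only if the object's generation still matches; false otherwise. */
        boolean writeIfGenerationMatches(String path, String content, long generation);
    }

    public record Versioned(String content, long generation) {}

    public static String update(GenerationStore store, String path,
                                UnaryOperator<String> mutation, int maxAttempts)
            throws InterruptedException {
        long backoffMs = 50;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned current = store.read(path);
            String updated = mutation.apply(current.content());
            if (store.writeIfGenerationMatches(path, updated, current.generation())) {
                return updated; // CAS succeeded
            }
            // Lost the race: back off exponentially with jitter, then re-read.
            Thread.sleep(backoffMs + ThreadLocalRandom.current().nextLong(backoffMs));
            backoffMs *= 2;
        }
        throw new IllegalStateException("CAS failed after " + maxAttempts + " attempts: " + path);
    }
}
```

The key property: every retry re-reads the latest content before reapplying the mutation, so concurrent writers never silently overwrite each other.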
Frontend Integration ✅
- Hierarchical Table of Contents: Expandable file containers with nested pages
- File-Aware Navigation: URLs include file ID (`/files/{file_id}/pages/{page_number}/{tab}`)
- Enhanced File Headers: Two-line layout with document type, visual emphasis, and proper spacing
- File-Aware Page Selection: Correct highlighting and content loading per file
- Page Overlap Detection: Scoped to individual files (not project-wide)
- Automatic UI Refresh: Updates after background ingestion task completion
- Intelligent Caching: Prevents unnecessary data reloads and race conditions
Multi-File Support ✅
- File-Aware PDF Loading: Backend correctly serves PDFs from specific files
- Concurrent Ingestion: Multiple files can be processed simultaneously without conflicts
- File-Specific Operations: Page ingestion, overlap detection, and metadata updates per file
- Backward Compatibility: Legacy single-file projects continue to work seamlessly
KEY ARCHITECTURAL DECISIONS MADE
- File ID Strategy: Auto-incrementing integers (1, 2, 3...) for readable URLs
- Path Resolution: Modern structure first, legacy fallback with caching
- Metadata Updates: Atomic operations using GCS object generations
- UI Pattern: Angular Material expansion panels for hierarchical navigation
- URL Structure: File-aware routes with backward compatibility redirects
- Caching Strategy: Path-based caching with file-aware cache keys
PRODUCTION DEPLOYMENT STATUS
- ✅ Backend Services: Deployed and operational
- ✅ Frontend UI: Hierarchical navigation working
- ✅ gRPC API: File-aware endpoints functional
- ✅ Database Schema: Metadata structure implemented
- ✅ Migration Support: Dual-read compatibility active
Future Enhancements
- AI Document Classification: Use LLM to classify document types with higher accuracy
- Content-Based Summarization: Generate AI summaries of file contents
- Automatic Metadata Refresh: Periodically update metadata for stale files
- Advanced Search: Full-text search across file metadata and content
- File Versioning: Track changes to input files over time
- Multi-File Coordination: Batch upload with relationship tracking
- Custom Metadata Fields: User-defined tags and labels
- Analytics Dashboard: Visualize file types, processing times, storage usage
File Index Structure
Purpose
The `files/index.json` file serves a single purpose:
- File ID Generation: Maintains auto-increment counter for new files
Schema
Location: `projects/{projectId}/files/index.json`
```json
{
  "next_file_id": 4,
  "files": [
    { "file_id": "1", "file_name": "architectural-plans.pdf" },
    { "file_id": "2", "file_name": "electrical-plans.pdf" },
    { "file_id": "3", "file_name": "structural-plans.pdf" }
  ]
}
```
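A sketch of how `next_file_id` drives ID allocation. `FileIndex` here is an in-memory model of the schema above, with illustrative method names; in production the updated index would be written back to GCS under a generation-match precondition:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of auto-increment file ID allocation backed by files/index.json. */
public class FileIndex {
    public record FileEntry(String fileId, String fileName) {}

    private int nextFileId;
    private final List<FileEntry> files = new ArrayList<>();

    public FileIndex(int nextFileId) {
        this.nextFileId = nextFileId;
    }

    /** Allocates the next file ID, records the new file, and bumps the counter. */
    public String register(String fileName) {
        String fileId = String.valueOf(nextFileId++);
        files.add(new FileEntry(fileId, fileName));
        return fileId;
    }

    public List<FileEntry> files() { return files; }
    public int nextFileId() { return nextFileId; }
}
```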
Why No `page_to_file_map`?
We initially considered mapping page numbers to file IDs, but this is fundamentally flawed for multi-file projects:
- ❌ Ambiguous: page "1" exists in multiple files (architectural, electrical, structural)
- ❌ Not scalable: file-scoped page numbers cannot be mapped back to files
- ✅ Solution: the frontend/API must always pass both `file_id` AND `page_number`
Page Number Semantics:
- Modern projects: Page numbers are file-scoped (each file has pages 1, 2, 3...)
- Legacy projects: Page numbers are project-global (single file, sequential)
- Migration: Legacy global pages → modern file-scoped pages (e.g., page 46 → file 2, page 1)
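The legacy-to-modern mapping can be sketched as a prefix-sum walk over per-file page counts. `PageMapper` is an illustrative name, and the counts in the test (file 1 has 45 pages) are an assumption chosen to reproduce the "page 46 → file 2, page 1" example:

```java
import java.util.List;

/** Sketch: translate a project-global legacy page number to (file_id, file-scoped page). */
public class PageMapper {
    public record FilePage(String fileId, int pageNumber) {}

    /** pageCounts.get(i) is the page count of file (i + 1), in upload order. */
    public static FilePage toFileScoped(List<Integer> pageCounts, int globalPage) {
        int remaining = globalPage;
        for (int i = 0; i < pageCounts.size(); i++) {
            if (remaining <= pageCounts.get(i)) {
                // Auto-incrementing file IDs start at "1".
                return new FilePage(String.valueOf(i + 1), remaining);
            }
            remaining -= pageCounts.get(i);
        }
        throw new IllegalArgumentException("Global page out of range: " + globalPage);
    }
}
```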
Benefits:
- ✅ Simple, unambiguous schema
- ✅ Single source of truth for the next file ID
- ✅ Small file size (a few KB even with hundreds of files)
- ✅ No page number collisions
Update Strategy:
- Updated when new files are uploaded
- Updated when files are deleted
- Read-only for page lookups (frontend tracks file_id separately)
Open Questions
- Q: How do we track which pages belong to which file?
  A: Use the `InputFileMetadata.extracted_pages` field (stored in `files/{file_id}/metadata.json`)
- Q: What if users upload duplicate files?
  A: Detect duplicates using an MD5 checksum and prompt the user to replace or keep both
- Q: How do we handle pages that don't belong to any input file?
  A: Create a "miscellaneous" file entry with ID `unknown-source`
- Q: Should migration be reversible (downgrade from modern to legacy)?
  A: Not initially; the legacy structure is preserved, so deleting `files/` effectively "downgrades"
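The duplicate-upload check mentioned in the second question can be sketched with the JDK's `MessageDigest`. `DuplicateDetector` and its methods are illustrative names, not the real service API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Set;

/** Sketch: flag an upload as a duplicate if its MD5 matches a stored checksum. */
public class DuplicateDetector {
    public static String md5Hex(byte[] content) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("MD5").digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e); // MD5 is required by the JDK spec
        }
    }

    /** knownChecksums would come from existing file metadata entries. */
    public static boolean isDuplicate(byte[] upload, Set<String> knownChecksums) {
        return knownChecksums.contains(md5Hex(upload));
    }
}
```

MD5 is fine here because the goal is duplicate detection, not security; a collision-resistant hash could be swapped in without changing the flow.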
Success Criteria
Phase 1 (Infrastructure):
- All legacy projects continue to work unchanged
- No performance regression (< 5% latency increase)
- 100% backward compatibility test coverage
Phase 2 (Migration):
- \> 95% migration success rate
- Zero data loss incidents
- Legacy structure preserved in all cases
Phase 3 (Frontend):
- File metadata visible in UI (project settings page)
- Hierarchical page navigation implemented (TOC sidebar)
- User-initiated upgrades working
- Positive user feedback on new features
Phase 4 (Adoption):
- \> 50% of active projects upgraded within 3 months
- File metadata used in search/filter features
- Reduced support tickets about file organization
References
- PRD: File Structure Reorganization
- Issue #167
- Issue #227: Project Metadata Management
- Protocol Buffer Definitions: `src/main/proto/api.proto`
- ArchitecturalPlanReviewer: `src/main/java/org/codetricks/construction/code/assistant/ArchitecturalPlanReviewer.java`
- Developer Playbook