File Structure Reorganization
📋 Implementation Issue: Issue #167 - Reorganize Project Structure: Move from pages/ to files/{file_id}/pages/ with Rich Metadata
Executive Summary
This PRD defines the requirements for reorganizing the project file structure from a flat pages/ directory to a hierarchical files/{file_id}/pages/ structure with rich file metadata. This change improves file organization, makes it possible to track which pages came from which source document, and captures rich metadata about each input file.
Key Principle: Backward compatibility is paramount. Legacy projects must continue to function without disruption, with clear upgrade paths for users who want to adopt the new structure.
Problem Statement
Current State
Current Project Structure:
projects/{projectId}/
├── project-metadata.json # Project-level metadata
├── plan-metadata.json # Flat list of all pages
├── pages/ # ALL pages mixed together (no source file tracking)
│ ├── 001/
│ │ ├── page.pdf
│ │ ├── page.md
│ │ ├── page-summary-1000char.json
│ │ └── ...
│ ├── 002/
│ └── ...
├── inputs/ # Raw uploaded files
│ ├── architectural-plans.pdf
│ ├── electrical-plans.pdf
│ └── ...
├── review/ # Compliance review artifacts
└── overview.md
Problems with Current Structure
- Loss of Source File Context: Once pages are extracted from multiple PDFs, there's no way to determine which pages came from which source file
- No File-Level Metadata: Cannot track document type (architectural vs electrical), processing status, or page count per file
- Poor Organization: All pages from all files are mixed together in a flat structure
- Difficult File Management: Cannot easily delete, reprocess, or update individual source files
- No File Classification: Cannot distinguish between architectural plans, electrical plans, inspector feedback, etc.
- Limited Search/Filter: Cannot filter pages by source file or document type
- Scalability Issues: As projects grow with more input files, the flat structure becomes unwieldy
User Impact
- Architects: Cannot easily identify which pages came from which discipline (architectural, structural, MEP)
- Project Managers: Cannot track processing status per file or identify which files need attention
- Reviewers: Cannot focus review on specific file types (e.g., only architectural plans)
- System Administrators: Cannot efficiently troubleshoot file processing issues without source file context
Relationship to Issue #227 (Project Metadata)
Orthogonal Concerns - Both features are needed and complement each other:
| Feature | Purpose | Location | Scope |
|---|---|---|---|
| Issue #227 (Project Metadata) | Project-level information (name, description, address, building codes) | projects/{projectId}/project-metadata.json | Entire project |
| Issue #167 (File Metadata - THIS PRD) | File-level information (document type, pages, processing status) | projects/{projectId}/files/{file_id}/metadata.json | Individual input file |
Visual Relationship:
projects/{projectId}/
├── project-metadata.json ← Issue #227: WHO/WHAT/WHERE is this project?
├── files/ ← Issue #167: WHICH input files, document types, pages?
│ ├── {file_id_1}/
│ │ ├── metadata.json ← Issue #167: This file's metadata
│ │ └── pages/ ← Issue #167: Pages from this file
│ └── {file_id_2}/
│ ├── metadata.json
│ └── pages/
└── inputs/ ← Raw files
Implementation Order: Issue #227 should be implemented first (simpler, immediate value), followed by Issue #167 (more complex, requires migration).
Proposed Solution
New Project Structure
projects/{projectId}/
├── project-metadata.json # Project-level metadata (Issue #227)
├── plan-metadata.json # LEGACY - Deprecated, for backward compatibility only
├── files/ # NEW - Processed input files with rich metadata
│ ├── index.json # File ID counter + page-to-file mapping (NEW)
│ ├── 1/ # Auto-increment file IDs for readable URLs
│ │ ├── metadata.json # Rich file metadata (NEW)
│ │ └── pages/ # Pages extracted from this specific file
│ │ ├── 001/
│ │ │ ├── page.pdf
│ │ │ ├── page.md
│ │ │ ├── page-summary-1000char.json
│ │ │ └── ...
│ │ ├── 002/
│ │ └── ...
│ ├── 2/ # Second file uploaded
│ │ ├── metadata.json
│ │ └── pages/
│ │ ├── 001/
│ │ ├── 002/
│ │ └── ...
│ └── ...
├── pages/ # LEGACY - Preserved for backward compatibility
│ └── [existing pages unchanged]
├── inputs/ # Raw input files uploaded into the project
│ ├── architectural-plans.pdf
│ ├── electrical-plans.pdf
│ └── ...
├── review/ # Compliance review artifacts
├── overview.md
└── project-content.md
File Metadata Schema
The `files/{file_id}/metadata.json` file contains the `InputFileMetadata` proto message:

```proto
import "google/protobuf/timestamp.proto";

message InputFileMetadata {
  // Basic file information
  string file_id = 1;                             // Unique auto-increment ID (e.g., "1", "2", "3")
  string file_name = 2;                           // Original filename
  string file_path = 3;                           // Path relative to inputs/
  string mime_type = 4;                           // MIME type (e.g., "application/pdf")
  int64 file_size_bytes = 5;                      // File size in bytes
  google.protobuf.Timestamp upload_date = 6;      // When file was uploaded

  // Document classification
  DocumentType document_type = 7;                 // Classified document type
  int32 page_count = 8;                           // Number of pages (for PDFs)

  // Processing metadata
  ProcessingStatus processing_status = 9;         // Current processing state
  google.protobuf.Timestamp processed_date = 10;  // When processing completed
  repeated string extracted_pages = 11;           // List of extracted page IDs

  // Content insights
  string content_summary = 12;                    // AI-generated summary

  // Technical metadata
  string checksum_md5 = 13;                       // File integrity check
}
```
Note: Proto definitions already exist in api.proto (lines 225-274) - no new proto messages needed!
Enum Naming Note: The existing enums use prefixed values (e.g., DOCUMENT_TYPE_ARCHITECTURAL_PLAN) which is not aligned with our Protocol Buffers Best Practices (should be ARCHITECTURAL_PLAN in a dedicated package). This is acceptable for now since the enums already exist in production. Future refactoring could move these to src/main/proto/file_metadata.proto with clean enum values, but that's outside the scope of this issue.
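For illustration, populating this message from Java might look like the following minimal sketch. It assumes standard protoc-generated Java classes for api.proto; the `PROCESSING_STATUS_COMPLETED` value name is an assumption, while `DOCUMENT_TYPE_ARCHITECTURAL_PLAN` is quoted from the enum naming note above.

```java
import com.google.protobuf.Timestamp;
import java.time.Instant;

// Sketch only: assumes protoc-generated classes for the api.proto messages.
Instant now = Instant.now();
Timestamp uploadDate = Timestamp.newBuilder()
    .setSeconds(now.getEpochSecond())
    .setNanos(now.getNano())
    .build();

InputFileMetadata metadata = InputFileMetadata.newBuilder()
    .setFileId("1")
    .setFileName("architectural-plans.pdf")
    .setFilePath("architectural-plans.pdf")   // relative to inputs/
    .setMimeType("application/pdf")
    .setFileSizeBytes(4_718_592L)
    .setUploadDate(uploadDate)
    .setDocumentType(DocumentType.DOCUMENT_TYPE_ARCHITECTURAL_PLAN)
    .setPageCount(15)
    .setProcessingStatus(ProcessingStatus.PROCESSING_STATUS_COMPLETED) // assumed value name
    .build();
```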
Implementation Phases
Phase 1: Infrastructure & Dual-Read Support (Backward Compatible) ✅ COMPLETED
Goal: Add new file structure alongside legacy structure without breaking existing projects
Deliverables:
- ✅ New file storage handlers supporting the `files/{file_id}/` structure
- ✅ Dual-read logic: try the new structure first, fall back to legacy `pages/` if not found
- ✅ File ID generation and management utilities
- ✅ `GenerateInputFileMetadata` RPC implementation
- ✅ Legacy project detection logic
- ✅ Comprehensive backward compatibility tests
- ✅ `ProjectPathResolver` with intelligent path resolution and caching
- ✅ Atomic file operations with GCS generation-based Compare-and-Set (CAS)
- ✅ Race condition prevention for concurrent metadata updates
Success Criteria: ✅ ALL MET
- All existing functionality works unchanged
- New projects can use new structure
- Legacy projects continue to read from `pages/` without errors
- Zero downtime during deployment
Phase 2: Frontend Integration & User Migration Tools ✅ COMPLETED
Goal: Enable users to see file metadata and upgrade legacy projects
Deliverables:
- ✅ Enhanced project settings page showing rich file metadata
- ✅ File list with document type, page count, processing status
- ✅ Legacy project detection banner in UI
- ✅ User-initiated upgrade workflow (manual migration)
- ✅ File-level operations (view pages by file, reprocess file)
- ✅ CLI tool for admin bulk upgrades
- ✅ Hierarchical table of contents with expandable file containers
- ✅ File-aware page selection and highlighting
- ✅ File-aware URL structure: `/files/{file_id}/pages/{page_number}/{tab}`
- ✅ Page overlap detection scoped to individual files (not project-wide)
Success Criteria: ✅ ALL MET
- Users can see which files are in their projects
- Users can view metadata per file
- Clear upgrade path with user control
- No forced migrations - users opt-in
Phase 3: New File Processing Pipeline ✅ COMPLETED
Goal: Process new uploads directly into new structure
Deliverables:
- ✅ Update `IngestArchitecturalPlan` to create file metadata
- ✅ Process pages directly into `files/{file_id}/pages/`
- ✅ Update `plan-metadata.json` for backward compatibility
- ✅ Automatic metadata generation on upload
- ✅ Document type classification (LLM-based)
- ✅ Thread-safe metadata updates with atomic Compare-and-Set operations
- ✅ Comprehensive logging for debugging metadata operations
- ✅ File-aware gRPC API with `file_id` parameter in `GetArchitecturalPlanPageRequest`
Success Criteria: ✅ ALL MET
- New uploads go directly to new structure
- Legacy projects continue to work
- Metadata automatically generated
- No manual intervention required for new files
Phase 4: Migration & Deprecation (Optional Future)
Goal: Gradually migrate legacy projects and deprecate old structure
Deliverables:
- Automated migration scheduler
- Migration progress tracking
- Rollback capabilities
- Deprecation warnings in UI
- Final migration tool
Success Criteria:
- All projects migrated to new structure
- Legacy `pages/` can be safely removed
- `plan-metadata.json` deprecated
Timeline: Phase 4 is optional and low-priority. Legacy support can remain indefinitely.
✨ Additional Features Implemented
Beyond the original PRD scope, the following enhancements were implemented during development:
Multi-File UI Enhancements ✅ COMPLETED
- Hierarchical Drawer Navigation: Table of contents displays files as expandable containers with nested pages
- File-Aware Page Selection: Only the specific page from the specific file gets highlighted (prevents cross-file highlighting)
- File-Aware URLs: URLs include the file ID for proper page identification (`/files/{file_id}/pages/{page_number}/{tab}`)
- Enhanced File Headers: Two-line layout with document type, file ID, and visual emphasis (borders, shadows, background colors)
- Fade-out Text Truncation: Long filenames fade to transparency instead of using an ellipsis
- Responsive Spacing: Optimized padding and alignment for maximum real estate usage
Backend Robustness ✅ COMPLETED
- Race Condition Prevention: Atomic metadata updates using GCS object generations (see the sketch after this list)
- Retry Logic: Automatic retry with exponential backoff for concurrent modification conflicts
- Thread-Safe Operations: Multiple ingestion tasks can run simultaneously without data corruption
- Enhanced Error Handling: Comprehensive logging and error recovery mechanisms
- File-Aware PDF Loading: Backend correctly identifies and serves PDFs from specific files
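To make the generation-based CAS pattern concrete, here is a minimal sketch using the google-cloud-storage Java client. The class name, retry parameters, and string-based content handling are illustrative assumptions, not the actual implementation.

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

/** Sketch: atomic read-modify-write of a GCS object using generation preconditions. */
public final class AtomicMetadataUpdater {
  private static final int MAX_ATTEMPTS = 5;
  private final Storage storage;

  public AtomicMetadataUpdater(Storage storage) {
    this.storage = storage;
  }

  /** Applies {@code update} to the object's content, retrying on concurrent modification. */
  public void update(BlobId blobId, UnaryOperator<String> update) throws InterruptedException {
    long backoffMillis = 100;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      Blob current = storage.get(blobId); // snapshot the content and its generation number
      String updated = update.apply(new String(current.getContent(), StandardCharsets.UTF_8));
      // Pin the write to the generation we read; GCS rejects it if another writer got in between.
      BlobId pinned = BlobId.of(blobId.getBucket(), blobId.getName(), current.getGeneration());
      try {
        storage.create(BlobInfo.newBuilder(pinned).build(),
            updated.getBytes(StandardCharsets.UTF_8), Storage.BlobTargetOption.generationMatch());
        return;
      } catch (StorageException e) {
        // In practice, retry only on HTTP 412 (precondition failed); rethrow other errors.
        if (attempt == MAX_ATTEMPTS) throw e;
        Thread.sleep(backoffMillis); // exponential backoff before re-reading
        backoffMillis *= 2;
      }
    }
  }
}
```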
Developer Experience ✅ COMPLETED
- Comprehensive Debugging: Detailed logging throughout the system for troubleshooting
- Proto API Updates: Enhanced gRPC API with file-aware parameters
- Frontend Caching: Intelligent caching to prevent unnecessary data reloads
- Automatic Refresh: UI automatically updates after background ingestion tasks complete
Backward Compatibility Strategy
Core Principle: Dual-Read, Selective Write
Read Operations (Backward Compatible):
- Try new structure first: check `files/{file_id}/pages/{page_number}/`
- Fall back to legacy: if not found, check `pages/{page_number}/`
- Cache lookup result: avoid repeated filesystem checks

Write Operations (Selective):
- New projects: write to `files/{file_id}/pages/` only
- Legacy projects: continue writing to `pages/` until upgraded
- Upgraded projects: write to `files/{file_id}/pages/` only
Migration States
Projects exist in one of three states:
| State | pages/ | files/ | plan-metadata.json | Behavior |
|---|---|---|---|---|
| Legacy | ✅ Present | ❌ Absent | ✅ Present | Read from pages/, write to pages/ |
| Transitional | ✅ Present | ✅ Present | ✅ Present | Read from files/ first, fall back to pages/ |
| Modern | ❌ Empty/Absent | ✅ Present | ✅ Present (for compat) | Read/write to files/ only |
Detection Logic
```java
public enum ProjectStructureVersion {
  LEGACY,        // Only has pages/
  TRANSITIONAL,  // Has both pages/ and files/
  MODERN         // Only has files/
}

public ProjectStructureVersion detectProjectVersion(String projectId) {
  boolean hasLegacyPages = fileSystemHandler.exists("projects/" + projectId + "/pages/");
  boolean hasFiles = fileSystemHandler.exists("projects/" + projectId + "/files/");
  if (hasFiles && !hasLegacyPages) return ProjectStructureVersion.MODERN;
  if (hasFiles && hasLegacyPages) return ProjectStructureVersion.TRANSITIONAL;
  return ProjectStructureVersion.LEGACY;
}
```
Path Resolution with Fallback
```java
public String resolvePageFolderPath(String projectId, String fileId, int pageNumber) {
  // Try new structure first
  String newPath = String.format("projects/%s/files/%s/pages/%03d", projectId, fileId, pageNumber);
  if (fileSystemHandler.exists(newPath)) {
    return newPath;
  }

  // Fall back to legacy structure
  String legacyPath = String.format("projects/%s/pages/%03d", projectId, pageNumber);
  if (fileSystemHandler.exists(legacyPath)) {
    logger.info("Using legacy page path for project {}, page {}", projectId, pageNumber);
    return legacyPath;
  }

  throw new PageNotFoundException(projectId, pageNumber);
}
```
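The `ProjectPathResolver` deliverable also caches these lookups to avoid repeated filesystem checks. A minimal caching wrapper around `resolvePageFolderPath` (shown above) could look like this sketch; the field and method names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: memoize resolved paths so each page's existence checks run at most once.
private final Map<String, String> resolvedPaths = new ConcurrentHashMap<>();

public String resolvePageFolderPathCached(String projectId, String fileId, int pageNumber) {
  String key = projectId + "/" + fileId + "/" + pageNumber;
  return resolvedPaths.computeIfAbsent(key,
      k -> resolvePageFolderPath(projectId, fileId, pageNumber));
}
```

One caveat: cached entries for a project must be invalidated when it is migrated, since a page's resolved location moves from `pages/` to `files/{file_id}/pages/`.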
Ensuring Zero Breakage
Critical Guarantee: No existing functionality breaks
- All read operations have fallback logic
- Legacy projects write to old structure (no forced migration)
- Integration tests validate both structures
- Gradual rollout with feature flags
- Rollback plan if issues detected
User Stories
Story 1: View File Metadata in Project Settings
As a project owner
I want to view rich metadata about each input file in my project
So that I can understand what documents I've uploaded and their processing status
Acceptance Criteria:
- Project settings page shows list of input files
- Each file displays: name, document type, page count, upload date, processing status
- Files can be sorted by date, name, or document type
- File metadata is retrieved via the `ListInputFileMetadata` RPC
Story 2: Upgrade Legacy Project to New Structure
As a project owner
I want to upgrade my legacy project to use the new file structure
So that I can benefit from rich file metadata and better organization
Acceptance Criteria:
- When opening a legacy project, user sees an informational banner
- Banner message: "Upgrade your project to the new file structure for better organization and metadata."
- Banner has "Upgrade Project" button
- Clicking button shows upgrade dialog with:
- Explanation of benefits
- List of files that will be processed
- Estimated time and cost
- "Start Upgrade" and "Cancel" buttons
- Upgrade process:
  - Analyzes files in the `inputs/` folder
  - Generates file metadata for each file
  - Associates existing pages with source files (best effort)
  - Migrates pages to the `files/{file_id}/pages/` structure
  - Preserves the legacy `pages/` folder for rollback
  - Updates `plan-metadata.json` to reference the new structure
- On success, banner disappears and file metadata becomes visible
- User can dismiss banner, but it reappears on next visit until project is upgraded
- Upgrade is optional - legacy projects continue to work without upgrade
Story 3: Automatic Metadata for New Uploads
As a project owner
I want newly uploaded files to automatically get rich metadata
So that I don't have to manually classify or organize them
Acceptance Criteria:
- When uploading a new PDF file, the system automatically:
  - Generates a unique file ID
  - Creates the `files/{file_id}/` folder structure
  - Extracts file metadata (size, page count, checksum; see the sketch after this list)
  - Classifies the document type using an LLM (e.g., "Architectural Plan")
  - Generates an AI summary of the file contents
  - Saves metadata to `files/{file_id}/metadata.json`
- File appears in project settings with full metadata
- Pages are processed into the `files/{file_id}/pages/` structure
- Legacy `plan-metadata.json` is updated for backward compatibility
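Most of the basic fields can be derived locally without an LLM. As one hedged example of what "extracts file metadata" could involve, the `checksum_md5` field might be computed by streaming the upload through a digest; this is a JDK-only sketch (Java 17+ for `HexFormat`), not the actual ingestion code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Streams the file through an MD5 digest, as needed for the checksum_md5 field. */
static String computeMd5Checksum(Path file) throws IOException, NoSuchAlgorithmException {
  MessageDigest md5 = MessageDigest.getInstance("MD5");
  try (InputStream in = new DigestInputStream(Files.newInputStream(file), md5)) {
    in.transferTo(OutputStream.nullOutputStream()); // read fully, updating the digest
  }
  return HexFormat.of().formatHex(md5.digest()); // lowercase hex string
}
```

Page count would similarly come from a PDF library (e.g., PDFBox's `getNumberOfPages()`), while `document_type` and `content_summary` go through the asynchronous AI path.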
Story 4: Hierarchical Page Navigation by Source File ✅ COMPLETED
As a project reviewer
I want to view pages organized hierarchically by source file in the table of contents
So that I can easily navigate pages by discipline and understand which pages came from which document
Acceptance Criteria: ✅ ALL MET
- ✅ Table of contents (TOC) displays pages in a hierarchical tree structure:
📄 architectural-plans.pdf (Architectural Plan | ID: 1) - 15 pages
└─ Page 1: First Floor Plan
└─ Page 2: Second Floor Plan
└─ ...
📄 electrical-plans.pdf (Electrical Plan | ID: 2) - 8 pages
└─ Page 1: Electrical Panel Schedule
└─ Page 2: Lighting Plan
└─ ...
- ✅ Files are displayed as collapsible parent items using Angular Material expansion panels
- ✅ Each file shows:
- ✅ File name with fade-out truncation for long names
- ✅ Document type and file ID in subtitle format
- ✅ Red PDF icon for visual consistency
- ✅ Expand/collapse chevron icon with proper spacing
- ✅ Enhanced visual emphasis (background color, border, drop shadow)
- ✅ Pages are nested under their parent file without excessive indentation
- ✅ Clicking file header toggles expand/collapse of all pages in that file
- ✅ Selected page is highlighted only in the correct file (file-aware highlighting)
- ✅ File-aware URLs: `/files/{file_id}/pages/{page_number}/{tab}`
- ✅ Works for both legacy and modern projects:
- Legacy: Shows flat structure (backward compatibility)
- Modern: Shows hierarchical file structure
- Transitional: Shows hierarchical structure with dual-read support
Story 5: Admin Bulk Upgrade
As a system administrator
I want to bulk upgrade multiple legacy projects to the new structure via API
So that I can ensure consistency and enable new features system-wide
Acceptance Criteria:
API Requirements (Required):
- gRPC RPC: `MigrateProjectFileStructure` with support for (see the sketch after this list):
  - Single project migration
  - Dry-run mode (preview without applying)
  - Preserve legacy structure option
- REST API endpoint via gRPC-Gateway/ESPv2: `POST /v1/architectural-plans/{project_id}/migrate-file-structure`
  - JSON request body with `dry_run` and `preserve_legacy_structure` options
- Operation is idempotent (safe to call multiple times)
- Returns detailed migration result with success/failure status
- Proper error handling and validation
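For concreteness, a caller-side sketch of the RPC is shown below. The request/response message and stub names are assumptions following standard grpc-java codegen; the authoritative definitions live in the TDD.

```java
// Hypothetical caller-side sketch; names assume standard grpc-java codegen.
MigrateProjectFileStructureRequest request = MigrateProjectFileStructureRequest.newBuilder()
    .setProjectId("my-project-id")
    .setDryRun(true)                   // preview the migration without applying it
    .setPreserveLegacyStructure(true)  // keep pages/ intact for rollback
    .build();

// Idempotent: repeating the call on an already-migrated project is a no-op.
MigrateProjectFileStructureResponse response =
    blockingStub.migrateProjectFileStructure(request);
```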
CLI Tool (Required):
- Admin can identify legacy projects: `./cli/codeproof.sh list-legacy-projects --user-id=ADMIN`
- Admin can bulk upgrade: `./cli/codeproof.sh upgrade-file-structure --user-id=ADMIN --dry-run=true`
- Command supports:
  - `--dry-run`: Preview changes without applying
  - `--project-ids`: Specific projects to upgrade (comma-separated)
  - `--all`: Upgrade all legacy projects for user
  - `--concurrency`: Number of parallel upgrades (default: 1)
- CLI calls the gRPC API internally
- Logs success/failure for each project
Admin UI (Optional - Nice to Have):
- `/admin` page with legacy project management
- List of all legacy projects with upgrade status
- Bulk selection and upgrade actions
- Progress tracking for batch operations
- Migration history and logs
Operational Requirements:
- Existing project functionality is not disrupted during migration
- Users can still access their projects during upgrade
- Migration logs include timestamp, initiator, and results for audit trail
Technical Design
📄 See: Technical Design Document
The detailed technical design, including backend implementation, migration algorithms, frontend components, and CLI tools, has been moved to a separate Technical Design Document (TDD) for better organization.
Key Technical Components:
- Backend Services:
  - `InputFileMetadataService` - Generate and manage file metadata
  - `FileStructureMigrationService` - Migrate legacy projects to the new structure
  - `ProjectPathResolver` - Transparent path resolution across legacy and modern structures
- gRPC RPCs:
  - `GenerateInputFileMetadata` - Create metadata for uploaded files
  - `GetInputFileMetadata` - Retrieve file metadata
  - `ListInputFileMetadata` - List all files with metadata
  - `MigrateProjectFileStructure` - Upgrade legacy projects (admin/user-initiated)
- Frontend Components:
  - `FileMetadataListComponent` - Display file list with rich metadata (project settings)
  - `PageTocHierarchicalComponent` - Hierarchical page navigation with collapsible files (TOC sidebar)
  - `LegacyProjectUpgradeBannerComponent` - Prompt users to upgrade
  - `FileStructureMigrationDialogComponent` - User-initiated upgrade workflow
- CLI Tools:
  - `UpgradeFileStructureCommand` - Bulk upgrade for admins
  - `AnalyzeLegacyProjectsCommand` - Identify projects needing upgrade
For complete implementation details, refer to the TDD.
Success Metrics
User Adoption
- % of legacy projects upgraded within 3 months
- % of users who use hierarchical navigation features
- User feedback on file organization improvements
System Health
- Zero incidents caused by backward compatibility issues
- RPC latency for file metadata operations (target: < 500ms)
- Success rate of file structure migrations (target: > 99%)
Data Quality
- % of files with complete metadata
- % of files correctly classified by document type
- % of pages correctly associated with source files
Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Breaking existing projects | Critical | Medium | Comprehensive dual-read fallback logic, extensive backward compatibility tests |
| Data loss during migration | Critical | Low | Keep legacy pages/ intact during migration, enable rollback |
| Performance degradation | High | Medium | Efficient path caching, lazy metadata loading, index file structure |
| Incorrect page-to-file associations | Medium | Medium | Best-effort heuristics, allow manual corrections, validate with integration tests |
| Users confused by upgrade process | Medium | High | Clear UI messaging, optional upgrade, admin can force if needed |
| Migration fails mid-process | High | Low | Transactional migrations, checkpoints, retry logic, rollback capability |
| Inconsistent metadata across files | Low | Medium | Schema validation, default values, metadata regeneration tool |
Non-Goals
- Automatic forced migrations: Users and admins control when projects are upgraded
- Reorganizing inputs/ folder: Raw uploaded files remain in the flat `inputs/` structure
- Editing file metadata through UI: Phase 1 focuses on read-only display (editing in a future phase)
- Advanced file operations: Move, rename, merge files (future enhancements)
- Multi-file upload coordination: Upload files one at a time (batch upload is future work)
- Real-time progress tracking: File processing happens asynchronously (task tracking in Issue #XX)
Future Enhancements
- Admin UI for Bulk Operations:
  - Web-based `/admin` page for legacy project management
  - Visual list of all legacy projects with upgrade status
  - Bulk selection and upgrade actions
  - Progress tracking for batch operations
  - Migration history and audit logs
- Advanced File Management:
  - Move pages between files
  - Merge multiple files
  - Split files by document type
  - Delete files and associated pages
- Enhanced Metadata:
  - Edit file metadata through UI
  - Custom tags and labels
  - File versioning and history
  - Automatic re-classification
- Batch Operations:
  - Multi-file upload with coordination
  - Bulk reprocessing
  - Bulk document type re-classification
  - Scheduled upgrades for legacy projects
- Advanced Search and Filtering:
  - Full-text search across file metadata and page content
  - Filter by document type (Architectural, Electrical, etc.)
  - Filter by processing status or date range
  - Search by content summary
  - Saved search queries
- Analytics:
  - File processing time metrics
  - Document type distribution reports
  - Storage usage by file
  - Page extraction success rates
Open Questions
- File ID Generation: Use UUID v4, timestamp-based, or auto-increment?
  - Answer: Auto-increment integers (1, 2, 3...) for readable URLs and simplicity
  - Rationale:
    - ✅ Shortest possible IDs: `files/1/`, `files/2/`, etc.
    - ✅ Human-readable and easy to reference ("File #1")
    - ✅ Chronological by upload order
    - ✅ Simple to implement with a project-level counter in `files/index.json`
  - Implementation: Maintain a `next_file_id` counter in `projects/{projectId}/files/index.json` (see the sketch after this list)
  - Alternative considered: `{id}-{filename-slug}` hybrid (e.g., `1-architectural-plans`) for even more readability, but it adds complexity
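A minimal sketch of that counter logic follows. The `IndexFile` shape and the `readString`/`writeString` helpers are assumptions for illustration (only `exists()` appears in this PRD's own snippets), and `synchronized` only serializes allocations within a single server instance:

```java
import com.google.gson.Gson;

/** Sketch: allocates auto-increment file IDs from files/index.json. */
public class FileIdAllocator {
  /** Assumed JSON shape: {"next_file_id": 3, ...} */
  static class IndexFile {
    int next_file_id = 1;
  }

  private final Gson gson = new Gson();
  private final FileSystemHandler fileSystemHandler; // existing storage abstraction

  FileIdAllocator(FileSystemHandler fileSystemHandler) {
    this.fileSystemHandler = fileSystemHandler;
  }

  /** Hands out the next ID and persists the incremented counter. */
  synchronized String allocateFileId(String projectId) {
    String indexPath = "projects/" + projectId + "/files/index.json";
    IndexFile index = fileSystemHandler.exists(indexPath)
        ? gson.fromJson(fileSystemHandler.readString(indexPath), IndexFile.class)
        : new IndexFile();
    String fileId = String.valueOf(index.next_file_id);
    index.next_file_id++;
    fileSystemHandler.writeString(indexPath, gson.toJson(index));
    return fileId;
  }
}
```

On GCS, the write should additionally use the generation-based CAS pattern sketched earlier so concurrent uploads across instances cannot hand out the same ID.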
- Metadata Generation Timing: Generate metadata immediately on upload or asynchronously?
  - Answer: Hybrid - basic metadata immediately, AI analysis (document type, summary) asynchronously
- Legacy `plan-metadata.json`: Keep updating it for backward compatibility or deprecate immediately?
  - Answer: Keep updating indefinitely for maximum backward compatibility
- Page Number Continuity: Should page numbers be global (001, 002...) or per-file (file1/001, file2/001)?
  - Answer: Per-file for better organization. Global overview uses `{file_id}_{page_number}` composite IDs.
- Migration Rollback: How long to keep the legacy `pages/` folder after successful migration?
  - Answer: Keep indefinitely (disk space is cheap, safety is paramount)
- Document Type Classification: Use AI (expensive, accurate) or heuristics (cheap, less accurate)?
  - Answer: Start with heuristics (filename, page count), add AI classification as an optional enhancement
Related Documentation
- Project Metadata Management PRD: Complementary project-level metadata
- Developer Playbook: Build and deployment workflows
- Protocol Buffers & gRPC Best Practices: Proto-first design
- Copy Project Utility: Cross-environment project operations
Related Issues
- Issue #227: Project Metadata Management
  - Relationship: Complementary - Issue #227 covers project-level metadata, this PRD covers file-level metadata
  - Implementation Order: Issue #227 first (simpler), then this PRD
- Issue #117: Multi-tenant Support
  - Relationship: Builds on the multi-tenant structure proposed in #117
  - Status: Closed - multi-tenant structure already implemented
References
- Protocol Buffer Definitions: `src/main/proto/api.proto`
- InputFileMetadata Proto: `src/main/proto/api.proto` (see the `InputFileMetadata` message)
- ArchitecturalPlanReviewer: `src/main/java/org/codetricks/construction/code/assistant/ArchitecturalPlanReviewer.java`
- File Storage Handler: `src/main/java/org/codetricks/construction/code/assistant/FileSystemHandler.java`