Chatbot Test Prompts & Evaluation Cases
Overview
This document tracks test prompts used during ad-hoc testing of the chatbot feature. These prompts will serve as the basis for future automated integration tests and evaluation metrics.
Purpose
- Capture real-world user queries encountered during development
- Document expected behavior for each prompt
- Track edge cases and failure modes
- Enable reproducible testing across iterations
- Feed automated test suites and evaluation frameworks
Test Case Format
Each test case should include:
- Prompt: The exact user query
- Test Project: Which project/files to use
- Expected Behavior: What the bot should do
- Verification Points: Specific things to check
- Status: ✅ Pass, ❌ Fail, ⏳ Pending
- Notes: Additional context or issues found
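A possible machine-readable shape for these fields, useful once the cases feed the automated suites described later in this document (the type and field names below are illustrative, not taken from an existing codebase):

```typescript
// Hypothetical structured representation of a test case record.
// Field names mirror the format above; nothing here is an existing API.
type TestStatus = "pass" | "fail" | "pending";

interface ChatbotTestCase {
  id: string;                    // e.g., "TC-001"
  prompt: string;                // the exact user query
  testProject: string;           // which project/files to use
  expectedBehavior: string[];    // what the bot should do
  verificationPoints: string[];  // specific things to check
  status: TestStatus;
  notes?: string[];              // additional context or issues found
}
```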
Test Cases
TC-001: Page Reference Query - Second Floor
Prompt: "Which files and pages have information about the Second Floor?"
Test Project: San Jose Sonora (3 files)
Expected Behavior:
- Bot should identify all files containing "Second Floor" references
- Bot should provide specific page numbers where information appears
- Response decorator should add clickable links to the referenced pages
Verification Points:
- Response mentions all relevant files
- Page numbers are accurate
- Links are properly formatted and clickable
- Links navigate to correct page in viewer
- Response is concise and well-organized
Status: ⏳ Pending
Notes:
- This tests basic document search and reference extraction
- Tests the response decorator's link generation
- Critical for user experience - links must work correctly
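A minimal sketch of how TC-001 might be automated with Jest, assuming a hypothetical `askChatbot` test client and response shape (neither exists yet; the link format in the assertions is also an assumption):

```typescript
// Sketch only: `askChatbot`, the response shape, and the link format are assumptions.
import { askChatbot } from "./chatbotTestClient"; // hypothetical test client

describe("TC-001: page reference query", () => {
  it("returns page references with clickable links for 'Second Floor'", async () => {
    const response = await askChatbot({
      project: "San Jose Sonora",
      prompt: "Which files and pages have information about the Second Floor?",
    });

    // The response should actually talk about the queried term.
    expect(response.text).toContain("Second Floor");

    // Links produced by the response decorator should point at concrete pages.
    for (const link of response.links) {
      expect(link.href).toMatch(/page=\d+/); // assumed link format
      expect(link.pageNumber).toBeGreaterThan(0);
    }
  });
});
```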
TC-002: Code Violations Check - Context-Aware
Prompt: "Do we have any violations of the code on this page?"
Test Project: San Jose Sonora (multi-file)
Test Page: File 2, Page 3 (Compliance tab with existing reports)
Expected Behavior:
- Bot should call `GetAvailableAnalysis` to check existing reports FIRST
- Bot should inform user about existing analysis (e.g., "CBC 2022: 3 violations found")
- Bot should list the violations or summarize findings
- Bot should ask if user wants analysis for other codes
- Bot should NOT run expensive analysis without confirmation
Verification Points:
- Calls `GetAvailableAnalysis` API first
- Mentions existing book (e.g., "CBC 2022")
- Reports violation count accurately
- Provides details about violations found
- Asks before running new expensive analysis
- Does NOT call `StartPageSectionComplianceReportTask` without confirmation
Status: ⏳ Pending
Notes:
- Tests cost-awareness and existing analysis detection
- Critical for avoiding redundant expensive operations
- Should reference screenshot showing existing compliance reports
- Agent must be smart about not re-running already-completed analysis
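A minimal sketch of how TC-002 might be automated, assuming the test harness records the agent's tool calls in order (`toolCalls`, like the `askChatbot` client itself, is an assumption; the tool names come from the verification points above):

```typescript
// Sketch only: the client and the captured tool-call trace are assumptions.
import { askChatbot } from "./chatbotTestClient"; // hypothetical test client

describe("TC-002: code violations check, context-aware", () => {
  it("checks existing reports first and does not start expensive analysis", async () => {
    const response = await askChatbot({
      project: "San Jose Sonora",
      file: 2,
      page: 3,
      prompt: "Do we have any violations of the code on this page?",
    });

    const toolNames = response.toolCalls.map((call) => call.name);

    // Existing analysis must be checked before anything else ...
    expect(toolNames[0]).toBe("GetAvailableAnalysis");
    // ... and no expensive compliance task may start without user confirmation.
    expect(toolNames).not.toContain("StartPageSectionComplianceReportTask");

    // The reply should surface the existing report (e.g., "CBC 2022").
    expect(response.text).toMatch(/CBC 2022/);
  });
});
```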
Test Categories
1. Document Search & Reference
Tests that verify the bot can find and reference specific information in documents.
Examples:
- TC-001: Page Reference Query - Second Floor
2. Multi-File Analysis
Tests that require synthesizing information across multiple files.
Examples:
- (To be added)
3. Technical Detail Extraction
Tests that require extracting specific technical details (dimensions, materials, codes).
Examples:
- (To be added)
4. Comparative Analysis
Tests that require comparing information across sections or documents.
Examples:
- (To be added)
5. Code Compliance Questions
Tests related to building codes, regulations, and compliance.
Examples:
- TC-002: Code Violations Check - Context-Aware
6. Clarification & Ambiguity
Tests with ambiguous queries that require clarification or intelligent interpretation.
Examples:
- (To be added)
7. Edge Cases & Error Handling
Tests for unusual inputs, missing data, or error conditions.
Examples:
- (To be added)
Adding New Test Cases
When adding a new test case:
- Assign a unique ID (TC-XXX format)
- Use the exact prompt you tested with
- Document the test project and its characteristics (number of files, size, etc.)
- Be specific about expected behavior - what should happen?
- List concrete verification points - how do you know it worked?
- Update the status as you test
- Add notes about interesting findings, bugs, or improvements needed
- Categorize the test case appropriately
Future Automation
This document will be used to generate:
- Jasmine/Jest integration tests for frontend chat interactions
- JUnit tests for backend RAG pipeline
- Evaluation metrics (precision, recall, link accuracy)
- Regression test suite for releases
- Performance benchmarks (response time, token usage)
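As one example of the evaluation metrics listed above, page-reference precision and recall could be computed roughly as follows (the function name and input shape are assumptions; the metric definitions are the standard ones):

```typescript
// Sketch of a page-reference precision/recall calculation.
function pageReferenceScores(
  expected: Set<string>, // ground-truth "file:page" references
  found: Set<string>,    // references extracted from the bot's response
): { precision: number; recall: number } {
  const truePositives = [...found].filter((ref) => expected.has(ref)).length;
  return {
    precision: found.size === 0 ? 0 : truePositives / found.size,
    recall: expected.size === 0 ? 0 : truePositives / expected.size,
  };
}

// Example with placeholder references: the bot cites 3 pages, 2 of which are in
// the 4-page ground truth, giving precision ≈ 0.67 and recall = 0.5.
const scores = pageReferenceScores(
  new Set(["file1:3", "file1:7", "file2:5", "file3:2"]),
  new Set(["file1:3", "file2:5", "file3:9"]),
);
console.log(scores);
```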
Test Data Requirements
For automated testing, we'll need:
- Sample projects with known content (ground truth)
- Expected response templates or validation rules
- Link validation test harness
- Response quality scoring rubric
- Performance baselines
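A sketch of what a ground-truth fixture for one sample project might look like (the structure, file names, and values below are placeholders, not real project data):

```typescript
// Hypothetical ground-truth fixture format; values are illustrative only.
interface GroundTruthFixture {
  project: string;
  // For each probe term, the "file:page" locations where it must be found.
  expectedReferences: Record<string, string[]>;
  // Upper bound on acceptable response latency for performance baselines (ms).
  maxResponseTimeMs: number;
}

const sampleFixture: GroundTruthFixture = {
  project: "San Jose Sonora",
  expectedReferences: {
    "Second Floor": ["file1:3", "file2:5"], // placeholder values
  },
  maxResponseTimeMs: 15_000,
};
```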
Changelog
- 2025-10-27: Initial document created with TC-001
- 2025-10-27: Added TC-002 - Code violations check with context awareness