
Declarative Provisioning (Terraform)

Deployment Scripts

We provide a wrapper script to simplify Terraform deployments and handle secret injection automatically:

./cli/sdlc/cloud-run-grpc/terraform-deploy.sh [env]

This script:

  1. Validates the environment.
  2. Sources secrets from env/${ENV}/firebase/m3/setvars.secrets.sh (or setvars.secrets.sh), injecting TF_VAR_ environment variables.
  3. Runs terraform plan and asks for confirmation.
  4. Runs terraform apply.

Usage:

./cli/sdlc/cloud-run-grpc/terraform-deploy.sh stg
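
The four steps above can be sketched roughly as follows. This is a minimal illustration and not the contents of terraform-deploy.sh: the function names, the LIVE_ROOT override, the tfplan file name, and the environment-name check are all assumptions made for the example.

```shell
# Hypothetical sketch of a plan/confirm/apply wrapper (not the real script).

# Step 1: accept only simple lowercase names that have a live directory.
validate_env() {
  case "$1" in
    ""|*[!a-z0-9-]*) return 1 ;;
  esac
  [ -d "${LIVE_ROOT:-terraform/live}/$1" ]
}

# Step 2: source the environment's secrets file, if present, so that its
# TF_VAR_* exports are visible to Terraform. Returns 1 if the file is missing.
source_secrets() {
  secrets_file="env/$1/firebase/m3/setvars.secrets.sh"
  [ -f "$secrets_file" ] || return 1
  . "$secrets_file"
}

# Steps 3 and 4: save the plan, ask for confirmation, apply exactly that plan.
deploy_env() {
  validate_env "$1" || { echo "unknown environment: $1" >&2; return 1; }
  source_secrets "$1" || echo "warning: no secrets file found" >&2
  terraform -chdir="${LIVE_ROOT:-terraform/live}/$1" plan -out=tfplan || return 1
  printf 'Apply this plan to %s? [y/N] ' "$1"
  read -r answer
  [ "$answer" = "y" ] || { echo "aborted"; return 0; }
  terraform -chdir="${LIVE_ROOT:-terraform/live}/$1" apply tfplan
}
```

Applying the saved plan file, rather than re-planning, means the changes you confirmed are the changes that get applied.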

Declarative Environment Provisioning (Terraform)

This guide details the declarative approach to provisioning environments with Terraform. It is the preferred method for Staging (stg) and Production; the test environment is also managed via Terraform.

Overview

We use a modular Terraform structure to manage infrastructure as code (IaC). This ensures consistency, reproducibility, and drift detection.

Architecture Overview

Our GCP infrastructure follows a hub-and-spoke pattern with centralized shared services and isolated environment projects:

┌─────────────────────────────────────────────────────────────────────────────┐
│                            SHARED INFRASTRUCTURE                            │
├─────────────────────────────────┬───────────────────────────────────────────┤
│ construction-code-expert-admin  │ construction-code-expert-repo             │
│ ─────────────────────────────   │ ────────────────────────────              │
│ • Terraform state buckets       │ • Central Artifact Registry               │
│ • Billing/quota project         │ • Docker images (gRPC backend, ESPv2)     │
│ • Cloud Build API               │ • Shared across all environments          │
│ • Bootstrap scripts             │                                           │
└─────────────────────────────────┴───────────────────────────────────────────┘

               ┌───────────────────────┼───────────────────────┐
               │                       │                       │
               ▼                       ▼                       ▼
    ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
    │ construction-code-  │ │ construction-code-  │ │ construction-code-  │
    │ expert-dev          │ │ expert-stg          │ │ expert-test         │
    │ ─────────────────   │ │ ─────────────────   │ │ ─────────────────   │
    │ • Cloud Run         │ │ • Cloud Run         │ │ • Cloud Run         │
    │ • Firestore         │ │ • Firestore         │ │ • Firestore         │
    │ • GCS Buckets       │ │ • GCS Buckets       │ │ • GCS Buckets       │
    │ • Firebase          │ │ • Firebase          │ │ • Firebase          │
    │ • Secrets           │ │ • Secrets           │ │ • Secrets           │
    └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
          ENVIRONMENT             ENVIRONMENT             ENVIRONMENT

Centralized Projects

| Project | Purpose | Key Resources |
|---|---|---|
| construction-code-expert-admin | Infrastructure management | Terraform state buckets (gs://construction-code-expert-tf-state-{env}), billing/quota for gcloud commands, Cloud Build API |
| construction-code-expert-repo | Artifact storage & Build Factory | Central Artifact Registry (custom-docker-image-repo), Cloud Build execution environment |

Centralized Build Factory Pattern

We strictly separate Build concerns from Runtime concerns.

  • Builds run in the -repo project.
  • Runtime (Cloud Run) runs in -dev, -stg, -test.

Why?

  1. Supply Chain Security: The -repo project owns the "software factory". It creates signed artifacts.
  2. Simplified IAM: The Cloud Build Service Account in -repo automatically has push access to the Artifact Registry in -repo.
  3. Auditability: All build logs are centralized.

Manual Setup Required: Because -repo and -admin projects are long-lived and established before the environment Terraform runs, certain APIs must be enabled manually if not already present:

# Enable the APIs the repo project needs to support builds
gcloud services enable cloudbuild.googleapis.com --project=construction-code-expert-repo
gcloud services enable artifactregistry.googleapis.com --project=construction-code-expert-repo

Why This Pattern?

  1. State Isolation: Each environment has its own Terraform state bucket, preventing accidental cross-environment changes
  2. Centralized Artifacts: Docker images are built once and promoted through environments (test → stg → prod)
  3. Billing Control: The admin project handles API quota/billing, solving the "chicken-and-egg" problem when bootstrapping new projects
  4. Access Control: Environment projects can be granted different IAM permissions without affecting shared infrastructure

Cross-Project Permissions

When provisioning a new environment, the following cross-project permissions are required:

# Grant Cloud Run Service Agent access to pull images from central repo
gcloud artifacts repositories add-iam-policy-binding custom-docker-image-repo \
--project=construction-code-expert-repo \
--location=us-central1 \
--member="serviceAccount:service-NEW_PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com" \
--role="roles/artifactregistry.reader"
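
The NEW_PROJECT_NUMBER placeholder above is the numeric project number, not the project ID. A small sketch of how the lookup and the binding could be combined is shown below; the two function names are hypothetical, while the member format and the gcloud commands are the ones used in this guide (plus `gcloud projects describe` for the lookup).

```shell
# Hypothetical helpers; only the member format and gcloud commands are from the guide.

# Build the IAM member string for a project's Cloud Run Service Agent.
cloud_run_service_agent() {
  printf 'serviceAccount:service-%s@serverless-robot-prod.iam.gserviceaccount.com\n' "$1"
}

# Look up the project number for a project ID, then grant read access
# on the central Artifact Registry repository.
grant_image_pull_access() {
  project_number=$(gcloud projects describe "$1" --format='value(projectNumber)') || return 1
  gcloud artifacts repositories add-iam-policy-binding custom-docker-image-repo \
    --project=construction-code-expert-repo \
    --location=us-central1 \
    --member="$(cloud_run_service_agent "$project_number")" \
    --role="roles/artifactregistry.reader"
}
```

For example, `grant_image_pull_access construction-code-expert-ENVNAME` would resolve the number and add the binding in one step.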

Directory Structure

  • terraform/live/: Contains environment-specific configurations (e.g., dev, stg, test).
    • main.tf: The primary entry point, instantiating the environment module.
    • outputs.tf: Exposes key resource IDs (e.g., espv2_uri, web_api_key).
    • variables.tf: Environment-specific inputs.
    • backend.tf: GCS backend configuration for state storage.
  • terraform/modules/: Reusable modules.
    • environment/: The core module that bundles Project, Storage, Cloud Run, and Firebase resources.
    • project/: Project creation and API enablement.
  • terraform/shared_vars.yaml: Centralized configuration for project IDs, regions, and image tags.

Prerequisites

  1. Terraform: v1.5+ installed.
    • macOS (Homebrew):
      brew tap hashicorp/tap
      brew install hashicorp/tap/terraform
    • DevContainer / Linux (Debian/Ubuntu):
      wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
      echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
      sudo apt update && sudo apt install terraform
  2. GCP Access: Owner or Editor role on the GCP Project (or folder).
    • Note: This applies to the Application Default Credentials (ADC) of the entity running Terraform (e.g., your personal account on the host machine, or the Service Account inside the DevContainer).
  3. Admin Setup: Ensure the bootstrap scripts have run (if creating a fresh project).

Step-by-Step Usage

1. Initialize

Navigate to the specific environment directory:

cd terraform/live/stg
terraform init

Tip: Use terraform init -upgrade if you see errors about provider version mismatches or if you want to update to the latest allowed versions of providers/modules.

2. Plan

Review the changes Terraform will make. This step is critical to avoid unintended destruction of resources.

terraform plan

3. Apply

Provision the resources.

terraform apply
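
When the plan step runs in a script or CI job, Terraform's `-detailed-exitcode` flag makes the outcome machine-readable: exit status 0 means no changes, 1 means an error, and 2 means changes are pending. The sketch below is one possible way to use it; the function names and the tfplan file name are assumptions, not part of the repository.

```shell
# Map the exit status of `terraform plan -detailed-exitcode` to a short label.
describe_plan_status() {
  case "$1" in
    0) echo "no-changes" ;;
    1) echo "error" ;;
    2) echo "changes-pending" ;;
    *) echo "unknown" ;;
  esac
}

# Save a plan for one environment and report what it found.
plan_env() {
  plan_status=0
  terraform -chdir="terraform/live/$1" plan -detailed-exitcode -out=tfplan || plan_status=$?
  describe_plan_status "$plan_status"
}
```

After reviewing a "changes-pending" plan, `terraform -chdir=terraform/live/stg apply tfplan` applies exactly the saved plan.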

Provisioning a New Environment

This playbook covers end-to-end provisioning of a completely new environment (e.g., phil, crsr).

Phase 1: Terraform Configuration

Recommended: Use the Automation Script

We provide a script to scaffold the new environment from a standard template. This ensures all naming conventions and boilerplates are correct.

# Basic usage (defaults to 'construction-code-expert' prefix)
./cli/utils/create-terraform-env.sh ENVNAME

# Custom project prefix
./cli/utils/create-terraform-env.sh ENVNAME --project-prefix=my-custom-prefix

This will:

  1. Create terraform/live/ENVNAME.
  2. Populate it with main.tf, backend.tf, terraform.tfvars, etc.
  3. Replace placeholders ({{ENV}}, {{PROJECT_PREFIX}}) with your values.

Manual Steps (if script is not used):

  1. Create Terraform state bucket (in admin project):

    gsutil mb -p construction-code-expert-admin -l us-central1 \
      gs://construction-code-expert-tf-state-ENVNAME
  2. Create environment directory:

    mkdir -p terraform/live/ENVNAME
  3. Copy and customize configuration:

    • Copy contents of terraform/templates/env-app/ to terraform/live/ENVNAME/
    • Replace {{ENV}}, {{PROJECT_PREFIX}}, and {{BILLING_ACCOUNT}} with real values in main.tf, backend.tf, and terraform.tfvars.
  4. Key terraform.tfvars settings:

    env_suffix = "newenv"
    billing_account = "018A1F-2219A5-D47906"
    enable_stripe = false # Unless Stripe is configured
    enable_google_group_allowlist_check = false # For dev/agent envs
    grpc_max_instance_count = 10 # Lower for dev envs
    hierarchical_namespace_enabled = true
    gcs_cors_allow_localhost = true # For local development
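
If you do the placeholder replacement from step 3 by hand, a pair of small filters like the ones below can reduce typos. This is a sketch under assumptions: the function names are hypothetical, and the substituted values must not contain `|`, `&`, or backslashes, because sed would interpret them.

```shell
# Fill template placeholders read from stdin and write the result to stdout.
# Arguments: ENV, PROJECT_PREFIX, BILLING_ACCOUNT.
render_template() {
  sed -e "s|{{ENV}}|$1|g" \
      -e "s|{{PROJECT_PREFIX}}|$2|g" \
      -e "s|{{BILLING_ACCOUNT}}|$3|g"
}

# List any {{PLACEHOLDER}} tokens still present (prints nothing when clean).
# Reads the named files, or stdin when no file is given.
find_unreplaced() {
  grep -o '{{[A-Z_]*}}' "$@" | sort -u
}
```

For example, `render_template ENVNAME construction-code-expert YOUR_BILLING_ACCOUNT < terraform/templates/env-app/main.tf > terraform/live/ENVNAME/main.tf`, followed by `find_unreplaced terraform/live/ENVNAME/*.tf` to confirm that nothing was missed.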

Phase 2: Brownfield Imports (Optional)

If resources were created imperatively before Terraform adoption, create imports.tf:

# Import existing GCP project
import {
  id = "construction-code-expert-ENVNAME"
  to = module.environment.module.project.google_project.main
}

# Import existing service account
import {
  id = "projects/construction-code-expert-ENVNAME/serviceAccounts/cce-app-service@construction-code-expert-ENVNAME.iam.gserviceaccount.com"
  to = module.environment.google_service_account.app_service_account
}

# Import existing APIs (example)
import {
  id = "construction-code-expert-ENVNAME/firestore.googleapis.com"
  to = module.environment.module.project.google_project_service.enabled_apis["firestore.googleapis.com"]
}

Tip: Run terraform plan first. If Terraform tries to create a resource that already exists, the error message will tell you the import ID.
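
A brownfield project usually has many APIs already enabled, and each one needs its own import block. The helper below is a hypothetical sketch that prints one block per API, reusing the resource address from the example above; confirm that address against your own module layout before relying on it.

```shell
# Print an import block for one already-enabled API.
# Arguments: PROJECT_ID, API (e.g. firestore.googleapis.com).
api_import_block() {
  printf 'import {\n  id = "%s/%s"\n  to = module.environment.module.project.google_project_service.enabled_apis["%s"]\n}\n' \
    "$1" "$2" "$2"
}
```

A loop such as `for api in run.googleapis.com firestore.googleapis.com; do api_import_block construction-code-expert-ENVNAME "$api"; done >> imports.tf` would append one block per API.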

Phase 3: Environment Shims (Simplified)

Modern deployment scripts automatically fall back to the env/common shims if environment-specific ones are not found. This means you do not need to create shim files manually in env/ENVNAME/.

The scripts will automatically delegate to env/common and load configuration using the environment name passed as an argument (e.g., ./deploy.sh gcli).

Custom Overrides (Optional): Only create specific shims in env/ENVNAME/ if you need to explicitly override the standard behavior or values provided by Terraform.

Legacy: Manual Shim Creation (Historical Reference)

If you are maintaining older environments or need to manually configure shims, here is the procedure:

Create shim scripts that source configuration from Terraform outputs:

  1. Root shim (env/ENVNAME/setvars.sh):

    #!/bin/bash
    # SHIM: Sources dynamic configuration from Terraform outputs
    # This file lives two levels below the repo root (env/ENVNAME/), so go up twice.
    REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
    source "$REPO_ROOT/env/terraform-sourced/load.sh" ENVNAME
  2. Per-service shims (create in env/ENVNAME/gcp/cloud-run/{grpc,endpoints,websocket}/setvars.sh):

    #!/bin/bash
    source "$REPO_ROOT/env/terraform-sourced/load.sh" ENVNAME
  3. Firebase shim (env/ENVNAME/firebase/m3/setvars.sh):

    #!/bin/bash
    source "$REPO_ROOT/env/terraform-sourced/load.sh" ENVNAME

    # Map Terraform outputs to Angular environment variables
    export CODEPROOF_API_SERVER="${ESPV2_URI#https://}"
    export CODEPROOF_WS_SERVER="${WEBSOCKET_URI#https://}"
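
The `${VAR#https://}` expansions in the Firebase shim remove a leading `https://` so that only the host remains. The sketch below isolates that behavior; the function name and the example host are hypothetical.

```shell
# Strip a leading "https://" to get the bare host, as the Firebase shim does.
# Values without that exact prefix are returned unchanged.
strip_https() {
  printf '%s\n' "${1#https://}"
}
```

For example, `strip_https "https://espv2-example-uc.a.run.app"` prints `espv2-example-uc.a.run.app`, while a value that starts with `http://` passes through untouched.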

Phase 4: Frontend Configuration

  1. Add Firebase hosting target to web-ng-m3/.firebaserc:

    {
      "targets": {
        "construction-code-expert-ENVNAME": {
          "hosting": {
            "ENVNAME": [
              "construction-code-expert-ENVNAME"
            ]
          }
        }
      }
    }
  2. Add hosting target to web-ng-m3/firebase.json:

    {
      "target": "ENVNAME",
      "public": "dist",
      "ignore": [
        "firebase.json",
        "**/.*",
        "**/node_modules/**"
      ],
      "rewrites": [
        {
          "source": "**",
          "destination": "/index.html"
        }
      ]
    }

Phase 5: Apply and Deploy

  1. Initialize and apply Terraform:

    terraform -chdir=terraform/live/ENVNAME init
    terraform -chdir=terraform/live/ENVNAME apply
  2. Create OAuth credentials (see Manual Steps section below)

  3. Deploy backend services:

    cli/sdlc/cloud-run-grpc/deploy.sh ENVNAME
    cd env && ./deploy-websocket.sh ENVNAME && cd ..
    cli/sdlc/cloud-run-job/deploy.sh ENVNAME
  4. Deploy frontend:

    cd web-ng-m3 && ./deploy.sh ENVNAME && cd ..

    ⚠️ IMPORTANT: The frontend deploy script (web-ng-m3/deploy.sh) MUST be run from within the Devcontainer to avoid esbuild platform mismatch errors (See Issue #399).

Managed Resources

Terraform currently manages:

  1. Google Cloud Project: Labels, API enablement (e.g., run.googleapis.com, aiplatform.googleapis.com).
  2. IAM: Service Account creation and role assignments (web-api-key secret access, Cloud Run invoker).
  3. Cloud Run Services:
    • default_grpc_backend: The main gRPC application.
      • Note: Uses an environment-agnostic Docker image (promoted from Test -> Staging -> Prod).
    • espv2: The Envoy proxy for gRPC-Web transcoding (Infrastructure Shell).
      • Note: Uses an environment-specific image (baked with the specific backend URL).
    • websocket: The real-time chat service.
  4. Firebase: Project linking and Web App creation.
  5. Secrets: API Keys (web_api_key, maps_api_key) and Stripe keys.

Backend Service Promotion Strategy

The default_grpc_backend uses a centralized image catalog defined in terraform/shared_vars.yaml. This allows for safe, explicit promotion of immutable artifacts.

To promote an image (e.g., Test -> Staging):

  1. Identify the Image:
    • Find the verified image tag currently running in test (or from the CI/CD build output).
    • Example: 1cc4057
  2. Update Catalog:
    • Edit terraform/shared_vars.yaml.
    • Update image_catalog.grpc_backend.tags.stg_current with the new tag.
    image_catalog:
      grpc_backend:
        tags:
          test_current: "1cc4057" # Verified
          stg_current: "1cc4057" # Promoting this tag
  3. Deploy:
    • Run terraform apply in the target environment (terraform/live/stg).
    • Terraform will detect the configuration change and redeploy the Cloud Run service with the new image.
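
Before editing the catalog, it can help to confirm that the tag you are about to promote is the one test is actually running. The sketch below reads tags with a plain text match; it assumes the one-line `key: "value"` layout shown above, is not a YAML parser, and both function names are hypothetical.

```shell
# Print the value of one tag key (e.g. stg_current) from shared_vars.yaml.
# Arguments: FILE, KEY. Text match only; assumes `key: "value"` on one line.
catalog_tag() {
  sed -n "s/^[[:space:]]*$2:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Succeed only when TAG is non-empty and equals the tag recorded as test_current.
# Arguments: FILE, TAG.
is_verified_in_test() {
  [ -n "$2" ] && [ "$(catalog_tag "$1" test_current)" = "$2" ]
}
```

For example, `is_verified_in_test terraform/shared_vars.yaml 1cc4057 && echo "safe to promote"`.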

⚠️ Manual Steps & Caveats

While Terraform handles the bulk of the infrastructure, some steps remain manual or imperative due to provider limitations or security constraints:

1. Firebase/GCP OAuth Credentials

After the first terraform apply, you must create OAuth credentials for Firebase Google Sign-In:

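  1. Create OAuth Consent Screen: In the GCP Console, go to APIs & Services → OAuth consent screen and set the app name, support email, and authorized domains.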

  2. Create OAuth Client ID:

    • Go to GCP Console → APIs & Services → Credentials
    • Create OAuth client ID → Web application
    • Authorized JavaScript origins:
      • https://construction-code-expert-ENVNAME.web.app
    • Authorized redirect URIs:
      • https://construction-code-expert-ENVNAME.firebaseapp.com/__/auth/handler
  3. Store Client Secret:

    echo -n "YOUR_CLIENT_SECRET" | gcloud secrets versions add firebase-login-oauth2-client-secret \
    --project=construction-code-expert-ENVNAME --data-file=-
  4. Update Terraform:

    • Add to terraform.tfvars:
      firebase_login_oauth2_client_id = "YOUR_CLIENT_ID.apps.googleusercontent.com"
    • Re-run terraform apply to enable Google Sign-In in Identity Platform

2. Service Account Domain-Wide Delegation

Granting the service account the rights it needs to check Google Workspace group membership requires Google Workspace Admin privileges and is not done via GCP Terraform.

  • Action: Manually configure Domain-Wide Delegation in the Google Admin Console if utilizing group-based RBAC.

3. ESPv2 Image Build

Unlike the backend, the ESPv2 proxy image is environment-specific because it bakes in the OpenAPI configuration, which points to that environment's backend URL. We use an "Infrastructure Shell" pattern: Terraform first provisions the Cloud Run service with a placeholder so that the service URL exists.

To build and apply a new ESPv2 image:

  1. Prerequisites:

    • Ensure Terraform has applied successfully (so the backend service exists and has a URL).
    • Ensure you are authenticated with gcloud and docker.
  2. Run Build Script:

    • Execute the helper script from the env directory:
    cd env
    ./build-espv2-image.sh stg
    • What this does:
      1. Downloads Service & Google APIs.
      2. Deploys Cloud Endpoints configuration.
      3. Builds a new Docker image with the specific backend URL.
      4. Promotes the image to the Central Artifact Registry.
  3. Update Terraform (Optional but Recommended):

    • The script outputs the new image URI (e.g., ...:2024-01-01r0).
    • Update terraform/live/stg/main.tf if you are pinning specific versions.
    • Note: Currently, Terraform may use a placeholder or specific tag. If the tag changes, run terraform apply.

4. Database Indexes (Firestore)

Firestore composite indexes are defined in web-ng-m3/firestore.indexes.json and must be deployed via the Firebase CLI. They are not managed by Terraform.

Action: Deploy indexes to each environment after making changes:

cd web-ng-m3

# Deploy to all environments
firebase deploy --only firestore:indexes --project construction-code-expert-dev
firebase deploy --only firestore:indexes --project construction-code-expert-test
firebase deploy --only firestore:indexes --project construction-code-expert-stg

Note: Index builds may take 2-5 minutes. Check the Firebase Console for build status.

5. OAuth Consent Screen

The OAuth Consent Screen must be configured manually before Identity Platform Google Sign-In can work.

  • Action: Go to GCP Console → APIs & Services → OAuth consent screen
  • Configure: App name, Support email (can use a Google Group like info@permitproof.com), Authorized domains
  • Note: To use a Google Group email, the GCP project owner must have manager privileges on that group

6. Secret Values (Stripe, etc.)

Terraform creates secret containers but not the actual values (versions). Secrets must be populated separately.

  • Action: Use gcloud secrets versions add or the provision-secrets script
  • Placeholder workaround (for staging without real keys):
    echo -n "sk_test_placeholder_not_configured" | gcloud secrets versions add stripe-secret-key \
    --project=construction-code-expert-stg --data-file=-

🔧 Common Issues & Troubleshooting

API Enablement Race Condition

Symptom: First terraform apply fails with "API has not been used in project... or it is disabled" errors

Cause: GCP API enablement is eventually consistent. Terraform enables the API but doesn't wait long enough before creating resources that depend on it.

Fix: Simply re-run terraform apply:

terraform -chdir=terraform/live/ENVNAME apply

The API should be fully propagated on the second run.

Note: This commonly affects Secret Manager, Firestore, and Identity Platform on new environments.
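
Re-running by hand is usually enough. For unattended runs, a generic retry wrapper is one option; the sketch below is a hypothetical helper, not part of the repository's scripts, and the attempt count and delay are arbitrary.

```shell
# Run a command up to MAX times, sleeping DELAY seconds between failures.
# Usage: retry MAX DELAY command [args...]
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  while :; do
    "$@" && return 0
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      return 1
    fi
    echo "attempt $retry_attempt failed; retrying in ${retry_delay}s" >&2
    sleep "$retry_delay"
    retry_attempt=$((retry_attempt + 1))
  done
}
```

For example, `retry 3 60 terraform -chdir=terraform/live/ENVNAME apply` gives the APIs a minute to propagate between attempts. Note that this retries on any failure, so check the error output before assuming the race condition was the cause.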

Cross-Project Artifact Registry Access

Symptom: Cloud Run deployment fails with Permission "artifactregistry.repositories.downloadArtifacts" denied

Cause: New environments need their Cloud Run Service Agent granted access to the central Artifact Registry.

Fix:

gcloud artifacts repositories add-iam-policy-binding custom-docker-image-repo \
--project=construction-code-expert-repo \
--location=us-central1 \
--member="serviceAccount:service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com" \
--role="roles/artifactregistry.reader"

Note: Replace PROJECT_NUMBER with the new environment's project number (e.g., 381589830306 for stg).

Tainted Resource Recovery

Symptom: After a failed deployment, Terraform wants to destroy/recreate a resource but deletion_protection=true blocks it.

Fix:

terraform untaint 'module.environment.google_cloud_run_v2_service.default_grpc_backend'

Force New Revision After Permission Fix

Symptom: After fixing permissions, the Cloud Run service still shows the old error (revision stuck in failed state).

Fix: Force a new revision deployment:

gcloud run services update SERVICE_NAME \
--region=us-central1 \
--project=PROJECT_ID \
--update-labels=force-redeploy=$(date +%s)

Backend Configuration Changed

Symptom: terraform init fails with "Backend configuration changed"

Fix: Use -reconfigure to reinitialize with the new backend:

terraform init -reconfigure

Cloud Build API on Admin Project

Symptom: gcloud builds submit fails or hangs asking to enable API on admin project

Cause: The admin project is used as the billing/quota project for gcloud commands.

Fix:

gcloud services enable cloudbuild.googleapis.com --project=construction-code-expert-admin

Image Promotion Hanging (gcloud container images add-tag)

Symptom: The build-espv2-image.sh script hangs at the promotion step despite --quiet flag.

Cause: gcloud container images add-tag has issues with interactive prompts when copying between GCR and Artifact Registry.

Fix: The script now uses Docker pull/tag/push instead:

gcloud auth configure-docker us-central1-docker.pkg.dev,gcr.io --quiet
docker pull "SOURCE_IMAGE"
docker tag "SOURCE_IMAGE" "DEST_IMAGE"
docker push "DEST_IMAGE"

GCS Hierarchical Namespace + Versioning Conflict

Symptom: Error: Versioning is not supported for hierarchical namespace buckets

Cause: GCS buckets with Hierarchical Namespace enabled cannot have versioning.

Fix: In the Terraform module, versioning is automatically disabled when HNS is enabled:

versioning = var.hierarchical_namespace != null && try(var.hierarchical_namespace.enabled, false) ? false : true

Firebase Auth Authorized Domains

Symptom: Frontend shows auth/unauthorized-domain error after deployment.

Cause: New hosting domains must be manually added to Firebase Auth settings.

Fix: Go to Firebase Console → Authentication → Settings → Authorized Domains → Add the new domain.

Note: Terraform cannot manage Firebase Auth Authorized Domains (provider limitation).

Related Issues

  • #323 - Main Terraform provisioning issue
  • #327 - ESPv2 stock image refactoring (future improvement)
  • #324 - Dev environment brownfield import
  • #326 - Test environment brownfield import