Cloud Run Jobs Pattern

Overview

Cloud Run Jobs provide a separate, dedicated container execution environment designed for tasks that run to completion and are not tied to an HTTP request. This is the primary strategy for background task processing in PermitProof.

Architecture Design

The architecture separates request handling from long-running work:

  1. Client Request: The client makes a gRPC call to the Cloud Run Service
  2. Service Orchestration: The gRPC service creates a Firestore tracking document and triggers a Cloud Run Job (a sketch of the tracking document follows this list)
  3. Job Execution: A new container instance spins up, isolated from the request handler
  4. Job Processing: The job runs with full CPU allocation and writes progress updates to Firestore
  5. UI Progress Tracking: The UI subscribes to the Firestore document for real-time progress
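
A minimal sketch of step 2's tracking document, assuming a Firestore tasks collection and illustrative field names (the actual PermitProof schema may differ):

import com.google.cloud.Timestamp;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Create the tracking document that the job updates and the UI subscribes to
Firestore firestore = FirestoreOptions.getDefaultInstance().getService();
String taskId = UUID.randomUUID().toString();
Map<String, Object> task = new HashMap<>();
task.put("type", "code-applicability"); // illustrative field names
task.put("status", "pending");
task.put("progress", 0);
task.put("createdAt", Timestamp.now());
firestore.collection("tasks").document(taskId).set(task).get(); // block until the write lands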

Benefits:

  • Full CPU allocation throughout execution
  • Scalable task processing with parallel instances
  • Complete isolation from request handler
  • Optimal for tasks exceeding 60 seconds

Implementation

Component Structure

The implementation requires two distinct Java applications:

  1. gRPC Service - Request handler and job orchestrator
  2. Job Worker - Background task processor

Java Job Worker Application

The job worker is a standard Java application packaged as an executable JAR. It does not run a web server; it only needs a main method that performs the work and exits.

CodeApplicabilityProcessorJobMain.java (example pattern)

public class CodeApplicabilityProcessorJobMain {

  public static void main(String[] args) throws Exception {
    System.out.println("Cloud Run Job started!");

    // 1. Read arguments passed to the job
    if (args.length < 4) {
      System.err.println("Error: Missing required arguments");
      System.exit(1);
    }
    String taskId = args[0];
    String projectId = args[1];
    String pageNumber = args[2];
    String iccBookId = args[3];

    System.out.println("Processing task: " + taskId);

    // 2. Initialize services (Firestore, etc.)
    TaskServiceImpl taskService = new TaskServiceImpl();

    // 3. Perform the long-running work
    try {
      taskService.updateTaskProgress(taskId, "processing", 10, "Starting analysis...");

      // Execute actual work...
      CodeApplicabilityTaskExecutor.ExecutionResult result =
          CodeApplicabilityTaskExecutor.executeCodeApplicabilityAnalysis(
              taskId, projectId, Integer.parseInt(pageNumber),
              iccBookId, null, null, null, null, null, taskService);

      // 4. Mark as complete
      if (result.success) {
        System.out.println("Job finished successfully!");
        System.exit(0);
      } else {
        System.err.println("Job failed: " + result.message);
        System.exit(1);
      }

    } catch (Exception e) {
      taskService.updateTaskFailed(taskId, "Job failed: " + e.getMessage());
      System.err.println("Job failed: " + e.getMessage());
      System.exit(1);
    }
  }
}

Dockerfile for the Job:

src/main/docker/Dockerfile

# Use a base image with Java
FROM openjdk:17-slim

# Set the working directory
WORKDIR /app

# Copy the compiled JAR file from your build process
COPY target/code-applicability-processor.jar app.jar

# The command to run when the container starts
# The args from the job execution will be appended here
ENTRYPOINT ["java", "-jar", "app.jar"]

Job Execution

Argument Passing and Job Initiation

Jobs are initiated from the gRPC service using the CloudRunTaskTrigger class. Arguments are passed as a list of strings.

Maven Dependency Configuration

Add this to your gRPC service's pom.xml:

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-run</artifactId>
  <version>0.15.0</version>
</dependency>

gRPC Service Orchestration

The gRPC service implementation executes jobs using CloudRunTaskTrigger:

public class CodeApplicabilityServiceImpl
    extends CodeApplicabilityServiceGrpc.CodeApplicabilityServiceImplBase {

  // taskService, currentUserEmail, and logger are assumed to be fields initialized elsewhere
  private final CloudRunTaskTrigger jobTrigger;

  public CodeApplicabilityServiceImpl() {
    // Initialize Cloud Run Job trigger
    String env = getEnvironmentSuffix();
    String projectId = System.getenv("GCP_PROJECT_ID");
    String region = System.getenv("GCP_LOCATION");
    this.jobTrigger = new CloudRunTaskTrigger(
        projectId, region, "code-applicability-processor-" + env);
  }

  @Override
  public void startAsyncCodeApplicabilityAnalysis(
      StartCodeApplicabilityAnalysisRequest request,
      StreamObserver<StartCodeApplicabilityAnalysisResponse> responseObserver) {

    try {
      // Create task in Firestore (JsonFormat.printer().print() throws a checked exception)
      String taskId = taskService.createTask(
          "code-applicability",
          JsonFormat.printer().print(request),
          currentUserEmail,
          null
      );

      // Trigger Cloud Run Job
      String[] jobArgs = {
          taskId,
          request.getArchitecturalProjectId(),
          String.valueOf(request.getPageNumber()),
          request.getIccBookId()
      };

      logger.info("🚀 Executing Cloud Run Job with args: " + String.join(", ", jobArgs));
      String executionName = jobTrigger.triggerJob(taskId, Arrays.asList(jobArgs));

      // Return response immediately; the job runs independently
      StartCodeApplicabilityAnalysisResponse response =
          StartCodeApplicabilityAnalysisResponse.newBuilder()
              .setTaskId(taskId)
              .setSuccess(true)
              .build();
      responseObserver.onNext(response);
      responseObserver.onCompleted();
    } catch (Exception e) {
      responseObserver.onError(
          io.grpc.Status.INTERNAL.withDescription(e.getMessage()).asRuntimeException());
    }
  }
}

The arguments passed to triggerJob() become the elements of the String[] args array in the Job's main method.
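
The CloudRunTaskTrigger class itself is not shown in this document. Below is a minimal sketch of what triggerJob() might look like with the google-cloud-run v2 client; per-execution argument overrides may require a newer library version than the one pinned above, and the class shape here is an assumption:

import com.google.cloud.run.v2.JobName;
import com.google.cloud.run.v2.JobsClient;
import com.google.cloud.run.v2.RunJobRequest;
import java.util.List;

public class CloudRunTaskTrigger {

  private final String projectId;
  private final String region;
  private final String jobName;

  public CloudRunTaskTrigger(String projectId, String region, String jobName) {
    this.projectId = projectId;
    this.region = region;
    this.jobName = jobName;
  }

  /** Starts one execution of the job, overriding the container args for this run. */
  public String triggerJob(String taskId, List<String> args) {
    // taskId is available here for logging or labeling the execution
    try (JobsClient jobsClient = JobsClient.create()) {
      RunJobRequest request = RunJobRequest.newBuilder()
          .setName(JobName.of(projectId, region, jobName).toString())
          .setOverrides(RunJobRequest.Overrides.newBuilder()
              .addContainerOverrides(RunJobRequest.Overrides.ContainerOverride.newBuilder()
                  .addAllArgs(args)))
          .build();
      // runJobAsync returns an OperationFuture; the execution name is in its metadata
      return jobsClient.runJobAsync(request).getMetadata().get().getName();
    } catch (Exception e) {
      throw new RuntimeException("Failed to trigger Cloud Run Job: " + e.getMessage(), e);
    }
  }
}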

Task Parallelism

Execution Model

Cloud Run Jobs do not use a traditional queue-based worker pool model. Each job execution is the task itself. There is no persistent pool of workers pulling from a queue like RabbitMQ or Pub/Sub.

Parallel Task Processing

Cloud Run Jobs provide parallelism through the --tasks flag:

  • Task Count: Specify the number of container instances for an execution (e.g., gcloud run jobs execute my-job --tasks=50)
  • Concurrent Execution: Tasks run concurrently up to the job's --parallelism setting; with no limit configured, Cloud Run runs as many of the 50 at once as capacity allows
  • Task Indexing: Each instance receives a unique CLOUD_RUN_TASK_INDEX environment variable (values 0 to 49), and CLOUD_RUN_TASK_COUNT carries the total

Work Distribution Pattern

The task index can be used to partition work across parallel instances:

// CLOUD_RUN_TASK_INDEX is injected automatically into each task's container
int index = Integer.parseInt(System.getenv().getOrDefault("CLOUD_RUN_TASK_INDEX", "0"));

// Example: Process 1,000 users across 50 tasks
// Task 0: users 1-20
// Task 1: users 21-40
// ...
int usersPerTask = 20;
int startUser = (index * usersPerTask) + 1;
int endUser = startUser + usersPerTask - 1;

This approach achieves large-scale parallelism without external queue management infrastructure.

Deployment Configuration

Environment Variables

Each Cloud Run Job requires these environment variables (see the example command after this list):

  • GCP_PROJECT_ID: Google Cloud Project ID
  • GCP_LOCATION: Cloud Run region (e.g., us-central1)
  • GOOGLE_APPLICATION_CREDENTIALS: For Firestore access when running outside Cloud Run (e.g., locally); on Cloud Run itself, the job's runtime service account supplies Application Default Credentials
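
They might be set like this, for example (project values hypothetical):

gcloud run jobs update code-applicability-processor-dev \
  --region us-central1 \
  --set-env-vars GCP_PROJECT_ID=my-project,GCP_LOCATION=us-central1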

Job Naming Convention

Jobs are named with environment suffix for isolation:

  • code-applicability-processor-{env} (Code Applicability Analysis)
  • plan-ingestion-processor-{env} (Architectural Plan Ingestion)
  • compliance-report-processor-{env} (Compliance Report Generation)

Where {env} is one of: dev, demo, prod
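
The getEnvironmentSuffix() helper referenced in the service constructor is not shown in this document; one plausible sketch, assuming the environment is supplied via an ENVIRONMENT variable (the variable name is an assumption):

// Resolves the {env} suffix used in job names; defaults to dev
static String getEnvironmentSuffix() {
  String env = System.getenv().getOrDefault("ENVIRONMENT", "dev");
  if (!java.util.Set.of("dev", "demo", "prod").contains(env)) {
    throw new IllegalStateException("Unknown environment: " + env);
  }
  return env;
}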

Resource Configuration

Configure appropriate resources in the Cloud Run Job definition (an example deploy command follows the list):

  • Memory: 2GB minimum for code applicability tasks
  • CPU: 2 vCPU for optimal performance
  • Timeout: 30 minutes for complex analyses
  • Max Retries: 2 for transient failures
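
Applied at deploy time, these settings might look like this (image path hypothetical):

gcloud run jobs deploy code-applicability-processor-dev \
  --image us-central1-docker.pkg.dev/MY_PROJECT/jobs/code-applicability-processor:latest \
  --region us-central1 \
  --memory 2Gi \
  --cpu 2 \
  --task-timeout 30m \
  --max-retries 2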

Monitoring and Debugging

Cloud Logging

Monitor job execution via Cloud Logging:

gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=code-applicability-processor-dev" --limit 50
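
Individual executions and their per-task status can also be inspected:

gcloud run jobs executions list --job code-applicability-processor-dev --region us-central1
gcloud run jobs executions describe EXECUTION_NAME --region us-central1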

Task Status Tracking

Query task status via Firestore or gRPC:

import org.codetricks.construction.code.assistant.service.GetTaskStatusRequest;
import org.codetricks.construction.code.assistant.service.GetTaskStatusResponse;

GetTaskStatusRequest request = GetTaskStatusRequest.newBuilder()
    .setTaskId(taskId)
    .build();
GetTaskStatusResponse response = taskServiceStub.getTaskStatus(request);
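
For the real-time path (step 5 of the architecture), a client can instead subscribe to the tracking document; a sketch assuming the illustrative tasks collection and field names from the overview:

import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;

Firestore firestore = FirestoreOptions.getDefaultInstance().getService();
firestore.collection("tasks").document(taskId)
    .addSnapshotListener((snapshot, error) -> {
      if (error != null) {
        System.err.println("Listen failed: " + error.getMessage());
        return;
      }
      if (snapshot != null && snapshot.exists()) {
        System.out.println("Status: " + snapshot.getString("status")
            + ", progress: " + snapshot.getLong("progress"));
      }
    });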

Common Issues

  1. Job Timeout: Increase timeout in job configuration
  2. Permission Errors: Verify service account has required IAM roles
  3. Container Startup Failures: Check Docker image and entrypoint configuration
  4. Firestore Access: Ensure service account has roles/datastore.user

Best Practices

  1. Use for Long Tasks: Reserve Cloud Run Jobs for tasks exceeding 60 seconds
  2. Idempotent Design: Jobs may be retried; ensure operations are idempotent
  3. Graceful Shutdown: Handle SIGTERM signals for clean termination (see the sketch after this list)
  4. Resource Cleanup: Release resources properly in finally blocks
  5. Comprehensive Logging: Log progress for debugging and monitoring
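
A minimal sketch of practice 3, registering a JVM shutdown hook in the job worker (Cloud Run sends SIGTERM before stopping a task; the cleanup body is illustrative):

// In the job worker's main method, before starting work
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
  // Runs on SIGTERM; keep it fast, since Cloud Run allows only a short grace period
  System.out.println("SIGTERM received, flushing state before exit...");
  // e.g., mark the Firestore task as interrupted, close clients, flush logs
}));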