What is the minimum test coverage required before establishing CI/CD enforcement?

30% overall, with 60%+ for business logic modules. Enforcing a threshold the codebase does not yet meet fails the build on the first commit and blocks development, so reach the baseline before switching the gate on. 30% is the minimum at which automated regression detection becomes reliable enough to enforce.

How do preservation markers interact with AI tools like Cursor?

At two levels. Soft enforcement: .cursorrules instructs the AI tool to respect the markers. Hard enforcement: the CI/CD preservation check fails the build if a protected region was modified, regardless of whether the AI tool respected the instruction.

How long does it take to establish the full production safety layer?

For a 20k–50k LOC codebase with no existing tests: 5–8 days total. Test baseline (Priority 1): 3–5 days. CI/CD configuration: 1–2 hours. Preservation markers: 1–2 hours. Rollback setup and first drill: 2 hours.

Production Safety Layer: Making AI-Generated Code Safe to Deploy

The Production Safety Layer is a structural technique for AI-generated codebases that establishes the automated feedback loop between code changes and correctness verification. It consists of three elements: a test baseline that covers the highest-risk code paths, a CI/CD pipeline that enforces the baseline on every commit, and a rollback mechanism that can revert a production deployment in under five minutes.

Without a production safety layer, every deployment from an AI-generated codebase is a manual verification exercise. Regressions reach production. Regeneration losses are discovered by users. The cost of every change includes an invisible manual QA tax that grows with the codebase. The production safety layer eliminates that tax by making correctness verification automatic.

This page explains the three elements of the production safety layer, how to implement each one, and what the expected structural outcome looks like.

What the Production Safety Layer Solves

The production safety layer directly addresses the structural conditions that make AI-generated codebases unsafe to deploy at velocity:

No automated regression detection — changes that break existing behavior reach production because no test exists for the affected code path
False pipeline confidence — a CI/CD pipeline that runs only a build step reports green on every commit while providing no protection against regressions or structural violations
Regeneration losses in production — prompt-driven regeneration overwrites custom logic; without a preservation check, the loss is deployed and discovered by users
Slow, expensive rollbacks — when a production incident occurs, the rollback takes 30–60 minutes because the procedure has never been practiced and the tooling is not configured for fast recovery

The production safety layer does not prevent all bugs. It prevents the specific class of failures that are structurally inevitable in AI-generated codebases developed without automated verification.

Element 1: The Test Baseline

The test baseline is the minimum test coverage required to make the codebase safe to deploy continuously. It is not 100% coverage — it is coverage of the highest-risk code paths, established in a dedicated sprint before any further structural work.

Priority Order

Priority 1 — Business logic (write first, highest risk):
  ✓ Pricing and discount calculations
  ✓ Permission and authorization checks
  ✓ Data validation logic (email, phone, date ranges, amounts)
  ✓ State transition logic (order status, payment flow, subscription changes)
  ✓ Any calculation that affects money, access, or data integrity

Priority 2 — Integration boundaries (medium risk):
  ✓ API endpoint contracts — input validation, response shape, error codes
  ✓ Repository layer — query correctness against a test database
  ✓ External service adapters — mocked, verifying the adapter contract

Priority 3 — UI behavior (lower risk for backend-heavy systems):
  ✓ Critical user flows — login, checkout, account creation
  ✓ Form validation feedback
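
To make the Priority 1 items concrete, here is a minimal sketch of state transition logic written so that it can be tested in isolation. The status names and the ALLOWED_TRANSITIONS table are hypothetical, chosen for illustration rather than taken from any particular codebase.

```python
# Hypothetical order statuses and allowed moves (illustrative assumption).
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"paid", "cancelled"},
    "paid": {"shipped", "refunded"},
    "shipped": {"delivered"},
    "delivered": set(),
    "cancelled": set(),
    "refunded": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if an order may move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


# Tests: pure functions, so no database or framework is needed.
def test_pending_order_can_be_paid():
    assert can_transition("pending", "paid")


def test_delivered_order_cannot_be_refunded():
    assert not can_transition("delivered", "refunded")


def test_unknown_status_is_rejected():
    assert not can_transition("archived", "paid")
```

Keeping the rules in one table means a regenerated handler that silently allows a new transition shows up as a failing test rather than as a production incident.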

Testable Code Pattern

The test baseline requires that the highest-risk modules are architecturally testable — that they can be instantiated without loading the full application stack. This requires dependency injection:

# ✅ Testable: dependency injected, no direct DB access
class CalculateOverageService:
    def __init__(self, repo: CalculateOverageRepository) -> None:
        self.repo = repo

    def execute(self, request: CalculateOverageRequest) -> CalculateOverageResponse:
        credits = self.repo.get_credits(request.user_id)
        overage = max(0, int(credits) - 100) * 0.005
        return CalculateOverageResponse(amount=round(overage, 2))

# Test — no database required
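class MockRepository:
    """In-memory stand-in for the repository; returns a fixed credit count."""
    def __init__(self, credits: str) -> None:
        self.credits = credits

    def get_credits(self, user_id: str) -> str:
        return self.credits

class TestCalculateOverageService: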
    def test_pro_user_no_overage(self):
        repo = MockRepository(credits="100")
        service = CalculateOverageService(repo)
        result = service.execute(CalculateOverageRequest(user_id="u1"))
        assert result.amount == 0.0

    def test_overage_calculated_correctly(self):
        repo = MockRepository(credits="150")
        service = CalculateOverageService(repo)
        result = service.execute(CalculateOverageRequest(user_id="u1"))
        assert result.amount == 0.25  # (150 - 100) * 0.005

// ✅ Testable: dependency injected
export class CalculateOverageService {
  constructor(private repo: CalculateOverageRepository) {}

  execute(request: CalculateOverageRequest): CalculateOverageResponse {
    const credits = this.repo.getCredits(request.userId)
    const overage = Math.max(0, parseInt(credits, 10) - 100) * 0.005
    return { amount: Math.round(overage * 100) / 100 }
  }
}

// Test — no database required
describe('CalculateOverageService', () => {
  it('returns zero when credits at limit', () => {
    const repo = { getCredits: () => '100' }
    const service = new CalculateOverageService(repo)
    expect(service.execute({ userId: 'u1' }).amount).toBe(0)
  })

  it('calculates overage correctly', () => {
    const repo = { getCredits: () => '150' }
    const service = new CalculateOverageService(repo)
    expect(service.execute({ userId: 'u1' }).amount).toBe(0.25)
  })
})

Element 2: CI/CD Enforcement

The CI/CD enforcement layer makes the test baseline durable — it ensures that coverage cannot drop below the established threshold and that structural violations cannot merge.

# .github/workflows/ci.yml — production safety layer enforcement
name: Production Safety Layer

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  safety:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci  # or: pip install -r requirements.txt

      # Layer 1: Structural integrity (fastest — fail early)
      - name: Boundary linter
        run: npx depcruise --config .dependency-cruiser.js src/
        # Fails if cross-layer imports or circular deps detected

      - name: Type check
        run: npx tsc --noEmit  # or: mypy .

      # Layer 2: Test enforcement
      - name: Run tests with coverage threshold
        run: |
          npx jest --coverage \
            --coverageThreshold='{"global":{"lines":30},"./src/domains/":{"lines":60}}'
          # or: pytest --cov=. --cov-fail-under=30

      # Layer 3: Preservation integrity
      - name: Check preservation markers
        run: python scripts/check_preservation_markers.py
        # Fails if protected regions were modified by regeneration

      # Layer 4: Build verification
      - name: Build
        run: npm run build  # or: python -m build
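
The Jest command above enforces two thresholds: 30% globally and 60% for ./src/domains/. On the Python side, --cov-fail-under applies to the total only, so a per-directory threshold needs a separate check. One possible sketch, assuming the JSON report written by `coverage json` (a `files` mapping whose entries carry a `summary` with `covered_lines` and `num_statements`); the function names and the `src/domains/` prefix are illustrative assumptions:

```python
def coverage_percent(report: dict, prefix: str = "") -> float:
    """Line coverage (0-100) for all files whose path starts with `prefix`."""
    covered = 0
    statements = 0
    for path, data in report["files"].items():
        if path.startswith(prefix):
            covered += data["summary"]["covered_lines"]
            statements += data["summary"]["num_statements"]
    # A prefix that matches no statements is treated as fully covered.
    return 100.0 * covered / statements if statements else 100.0


def check_thresholds(report: dict, thresholds: dict[str, float]) -> list[str]:
    """Return one message per threshold that is not met (empty list = pass)."""
    failures = []
    for prefix, minimum in thresholds.items():
        actual = coverage_percent(report, prefix)
        if actual < minimum:
            label = prefix or "global"
            failures.append(f"{label}: {actual:.1f}% < {minimum:.0f}%")
    return failures
```

In CI this could run after the test step: load coverage.json with json.load, call check_thresholds(report, {"": 30, "src/domains/": 60}), print any messages, and exit non-zero when the list is not empty.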

Branch protection configuration:

GitHub → Settings → Branches → Branch protection rules → main:
  ✓ Require status checks to pass before merging
  ✓ Require branches to be up to date before merging
  ✓ Status checks required: Production Safety Layer / safety
  ✓ Do not allow bypassing the above settings

This configuration makes it structurally impossible to merge a PR that fails any safety check — regardless of who authored the change or whether the reviewer approved it.
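
The workflow's Layer 3 step calls scripts/check_preservation_markers.py, whose contents are not shown here. As a rough sketch of what such a check might do, the functions below compare the protected regions of a file on the base branch with the same file in the proposed change. The comparison strategy and the function names are assumptions; only the marker strings come from the implementation steps on this page.

```python
BEGIN = "=== BEGIN USER CODE ==="
END = "=== END USER CODE ==="


def extract_protected_regions(text: str) -> list[str]:
    """Return the content between each BEGIN/END marker pair, in file order."""
    regions: list[str] = []
    current: list[str] | None = None
    for line in text.splitlines():
        if BEGIN in line:
            current = []  # start collecting a new region
        elif END in line and current is not None:
            regions.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        raise ValueError("unterminated protected region")
    return regions


def find_violations(base_text: str, head_text: str) -> list[str]:
    """Compare protected regions of the base version against the new version."""
    base = extract_protected_regions(base_text)
    head = extract_protected_regions(head_text)
    violations = []
    if len(head) < len(base):
        violations.append(f"{len(base) - len(head)} protected region(s) removed")
    for index, (before, after) in enumerate(zip(base, head)):
        if before != after:
            violations.append(f"protected region {index} was modified")
    return violations
```

A real script would still need to obtain both versions of each file, for example with `git show origin/main:<path>` for the base (which requires the base branch to be fetched in the checkout step), and to exit with a non-zero status when any violation is reported. Intentional edits to protected code would need an explicit override path, which this sketch does not model.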

Element 3: Rollback Mechanism

The rollback mechanism is the recovery layer — the ability to revert a production deployment in under five minutes when an incident occurs.

Minimum Viable Rollback

The minimum viable rollback mechanism requires no blue/green infrastructure. It requires three things: deployment tags, a documented procedure, and a practiced drill.

Step 1: Deployment tagging

# Tag every production deployment before it goes live
git tag -a "deploy-$(date +%Y%m%d-%H%M)" -m "Production deployment — $(git log -1 --format='%s')"
git push origin --tags

# List recent deployment tags
git tag --list "deploy-*" | sort | tail -10

Step 2: Rollback procedure (documented)

# ROLLBACK PROCEDURE
# Time target: under 5 minutes from incident detection to stable production

# 1. Identify the last stable deployment tag
git tag --list "deploy-*" | sort | tail -5
# Example output (sorted oldest first, so the current deployment is last):
#   deploy-20260218-1445  ← last stable
#   deploy-20260219-0830  ← current (broken)

# 2. Create a revert commit (never force-push to main)
# The range is <last-stable-tag>..HEAD: every commit made after the stable tag
git revert --no-commit deploy-20260218-1445..HEAD
git commit -m "revert: rollback to deploy-20260218-1445 — [incident description]"

# 3. Push and deploy using the normal deployment process
git push origin main
# Deployment triggers automatically (Vercel/Railway/etc.) or manually

# 4. Verify stability
# Run smoke tests against production
# Confirm the incident is resolved
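
Step 1 of the procedure relies on the tag format: because deploy-YYYYMMDD-HHMM is zero-padded, plain string sorting is also chronological. Below is a small sketch of picking the rollback target from a list of tags. It assumes the newest tag is the broken deployment and the one before it was stable, which a human should confirm during a real incident. previous_deploy_tag is a hypothetical helper, not part of any tool.

```python
def previous_deploy_tag(tags: list[str], prefix: str = "deploy-") -> str:
    """Return the second-newest deployment tag (the assumed rollback target)."""
    # Zero-padded timestamps sort chronologically as plain strings.
    deploy_tags = sorted(tag for tag in tags if tag.startswith(prefix))
    if len(deploy_tags) < 2:
        raise ValueError("need at least two deployment tags to roll back")
    return deploy_tags[-2]
```

The input could be the output of `git tag --list "deploy-*"` split into lines.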

Step 3: Rollback drill

The rollback drill is the critical component. A procedure that has never been practiced takes 30–60 minutes under incident pressure. A practiced procedure takes 2–3 minutes.

Rollback drill schedule:
  First drill: before the first production launch
  Subsequent drills: quarterly, or after any significant deployment process change

Drill procedure:
  1. Deploy a known-bad commit to a staging environment
  2. Execute the rollback procedure against staging
  3. Verify the rollback restored the previous state
  4. Measure the time from "incident detected" to "stable state confirmed"
  5. Document the result and update the procedure if needed
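
Step 4 of the drill is easier to repeat if the measurement is recorded the same way every time. Here is a minimal sketch that compares a drill's elapsed time with the five-minute target; the function name and the report wording are illustrative assumptions.

```python
from datetime import datetime, timedelta

TARGET = timedelta(minutes=5)  # recovery target stated in the rollback procedure


def drill_report(detected_at: datetime, stable_at: datetime) -> str:
    """Summarize one drill: elapsed time and whether it met the target."""
    elapsed = stable_at - detected_at
    if elapsed < timedelta(0):
        raise ValueError("stable_at must not be earlier than detected_at")
    minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
    verdict = "within target" if elapsed < TARGET else "over target"
    return f"rollback took {minutes}m {seconds:02d}s ({verdict})"
```

The returned line can be pasted into the runbook next to the drill date, so later drills can be compared against earlier ones.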

What Changes After the Production Safety Layer Is Established

Before the production safety layer, every deployment carries unquantified risk. After it is established:

Regressions are caught before deployment — the test suite runs on every commit; a regression fails the build before it reaches production
Coverage cannot silently degrade — the coverage threshold enforcement fails the build if coverage drops below the baseline
Regeneration losses cannot reach production — the preservation marker check catches overwritten custom logic before the commit merges
Structural violations cannot accumulate — the boundary linter blocks cross-layer imports and circular dependencies at the merge gate
Production incidents are recoverable in minutes — the rollback procedure is documented, practiced, and executable in under 5 minutes

The production safety layer does not eliminate all production incidents. It eliminates the class of incidents that are structurally preventable — regressions, regeneration losses, and structural violations that reach production because no automated check existed to catch them.

Implementation Sequence

Step 1 (2–4 hours): Establish architectural testability
  → Identify the highest-risk modules (pricing, auth, validation)
  → Verify they use dependency injection (not direct DB access)
  → If not: extract to service layer first (see Boundary Enforcement)

Step 2 (3–5 days): Write the test baseline
  → Priority 1: business logic tests (pricing, auth, validation)
  → Priority 2: API contract tests (input/output shapes)
  → Target: 30% overall, 60%+ for business logic modules
  → Do not write tests for UI components or utilities first

Step 3 (1–2 hours): Configure CI/CD enforcement
  → Add the workflow above to .github/workflows/ci.yml
  → Set coverage thresholds in jest.config.js or pytest config
  → Configure branch protection

Step 4 (1–2 hours): Add preservation markers
  → Identify files with custom logic that must not be regenerated
  → Add === BEGIN USER CODE === / === END USER CODE === markers
  → Add preservation check script to CI/CD

Step 5 (2 hours): Establish rollback mechanism
  → Add deployment tagging to the deployment process
  → Document the rollback procedure in RUNBOOK.md
  → Schedule and execute the first rollback drill