GCP Billing Kill Switch with Terraform (Stop Cloud Costs Fast)

Stop unexpected GCP bills before they explode. Learn how to build a Terraform-powered billing kill switch that automatically disables cloud spending when limits are crossed.

Introduction

Last month, one of our dev teams spun up multiple GKE clusters on GCP for a client proof-of-concept. Standard stuff — a few nodes, some test workloads, a Cloud SQL instance for the backend. The PoC wrapped up but the clusters didn't get torn down.

By the time anyone noticed, the bill had crossed $1,200. The budget alert email had landed in a shared inbox that nobody checks.

This wasn't a one-off. We run several GCP dev and sandbox projects across our customers. Developers spin up resources to test infrastructure patterns, try out new GCP services, or build demos. Most of the time they clean up. Sometimes they don't. And GCP's built-in budget alerts — while useful — only send notifications. They don't actually stop anything.

We needed something that would act, not just alert: a hard spending cap that automatically disables billing on a project when costs cross a threshold. Yes, disabling billing is aggressive. It stops all paid services cold (Compute, GKE, Cloud SQL, Cloud Run, everything). But for dev and sandbox environments, a temporarily broken project is far better than a surprise bill.

So we built one. A Cloud Function that listens for billing budget notifications via Pub/Sub and calls the Cloud Billing API to detach the billing account when spend exceeds a configurable threshold. The entire thing is managed with Terraform, so we can stamp it across every dev project in minutes.

Here's how it works, and how you can set it up for your own GCP projects.

Architecture

The system has four moving parts: a Cloud Billing Budget that monitors spend, a Pub/Sub topic that carries budget notifications, an Eventarc trigger that routes messages to the function, and a Cloud Function (Gen 2) that decides whether to pull the plug.

[Figure: GCP Billing Kill Switch Architecture (visual flow)]

The flow is straightforward: GCP's billing system publishes a JSON notification to the Pub/Sub topic several times a day with the current spend, not only at the moment a threshold is crossed. The Cloud Function receives each message, compares the reported costAmount against budgetAmount * kill_switch_threshold, and if spend is at or above that value, it calls the Cloud Billing API's projects.updateBillingInfo method with an empty billing account name, which is GCP's documented way to detach billing from a project.
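
The comparison described above can be sketched in isolation as a small pure function. This is a minimal illustration, not the deployed code: the helper name should_kill and the sample amounts are assumptions, and the guard for a missing budget is an extra safety check added here.

```python
from typing import Any, Mapping


def should_kill(notification: Mapping[str, Any], threshold: float) -> bool:
    """Return True when reported spend has reached budgetAmount * threshold."""
    cost_amount = float(notification.get("costAmount", 0))
    budget_amount = float(notification.get("budgetAmount", 0))
    if budget_amount <= 0:
        # Hypothetical guard: without a usable budget there is nothing to compare against.
        return False
    return cost_amount >= budget_amount * threshold


# Illustrative payload fragments; the amounts are made up.
below = {"costAmount": 42.5, "budgetAmount": 100.0}
at_cap = {"costAmount": 100.0, "budgetAmount": 100.0}

print(should_kill(below, 1.0))   # False: 42.50 is under the 100.00 cap
print(should_kill(at_cap, 1.0))  # True: spend equals the cap
print(should_kill(below, 0.4))   # True: 42.50 is over 40.00
```

Keeping the decision separate from the API call makes it easy to unit test without touching a real billing account.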

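Once billing is detached, all paid services in the project stop. GKE nodes shut down, Cloud SQL instances become inaccessible, Compute VMs stop. The project itself remains in place, but Google's documentation warns that resources may not shut down gracefully and can be irretrievably deleted while billing is off, so treat anything in these projects as disposable. Nothing runs until billing is manually re-enabled.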

Prerequisites

  • Terraform >= 1.9
  • gcloud CLI authenticated
  • roles/billing.admin on the billing account (for the Terraform executor)
  • roles/owner on the target GCP project (roles/editor covers most of the resources here but cannot set the project-level IAM bindings created in step 3)

Step-by-Step Implementation

1. Enable the required APIs

The kill switch depends on several GCP services. Terraform handles this, but it's worth knowing what's being enabled:

resource "google_project_service" "apis" {
  for_each = toset([
    "cloudbilling.googleapis.com",
    "cloudfunctions.googleapis.com",
    "cloudbuild.googleapis.com",
    "pubsub.googleapis.com",
    "eventarc.googleapis.com",
    "run.googleapis.com",
    "artifactregistry.googleapis.com",
    "storage.googleapis.com",
    "billingbudgets.googleapis.com",
  ])

  project            = var.gcp_project_id
  service            = each.key
  disable_on_destroy = false
}

Setting disable_on_destroy = false prevents terraform destroy from disabling APIs that other resources in the project might depend on.

2. Create the Pub/Sub topic and billing budget

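The Cloud Billing Budget publishes JSON messages to a Pub/Sub topic: periodically with the current spend, and whenever a threshold is crossed. When the budget is connected to the topic, GCP grants its billing notification service account pubsub.publisher on that topic, so no manual IAM binding is needed as long as the identity running Terraform is allowed to set IAM policy on the topic.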

resource "google_pubsub_topic" "budget_alerts" {
  name    = "${var.project}-${var.environment}-budget-alerts"
  project = var.gcp_project_id
}

resource "google_billing_budget" "kill_switch" {
  billing_account = var.billing_account_id
  display_name    = "${var.project}-${var.environment} Kill Switch Budget"

  budget_filter {
    projects = ["projects/${data.google_project.this.number}"]
  }

  amount {
    specified_amount {
      currency_code = var.budget_currency_code
      units         = tostring(var.budget_amount)
    }
  }

  dynamic "threshold_rules" {
    for_each = toset(concat(
      [for t in var.alert_thresholds : tostring(t)],
      [tostring(var.kill_switch_threshold)]
    ))
    content {
      threshold_percent = tonumber(threshold_rules.value)
      spend_basis       = "CURRENT_SPEND"
    }
  }

  all_updates_rule {
    pubsub_topic   = google_pubsub_topic.budget_alerts.id
    schema_version = "1.0"
  }
}

A few things worth noting:

  • The dynamic "threshold_rules" block merges alert thresholds (notification-only) with the kill switch threshold into a deduplicated set. The budget API rejects duplicate threshold_percent values, so toset() handles that.
  • spend_basis = "CURRENT_SPEND" tracks actual spend, not forecasted spend. Forecasted budgets can trigger prematurely.
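
To make the merge concrete, here is a rough Python equivalent of what the toset(concat(...)) expression produces. The threshold values and budget amount are assumed examples, and merge_thresholds is a hypothetical helper; the Terraform version converts the values to strings first, while this sketch works on floats directly.

```python
def merge_thresholds(alert_thresholds, kill_switch_threshold):
    """Combine alert and kill thresholds into a sorted list without duplicates."""
    return sorted(set(alert_thresholds) | {kill_switch_threshold})


alert_thresholds = [0.5, 0.8, 1.0]  # assumed example values (1.0 means 100%)
kill_switch_threshold = 1.0         # duplicates the last alert threshold on purpose
budget_amount = 100                 # assumed budget in the budget currency

for pct in merge_thresholds(alert_thresholds, kill_switch_threshold):
    print(f"{pct:.0%} -> notify at {budget_amount * pct:.2f}")
# 50% -> notify at 50.00
# 80% -> notify at 80.00
# 100% -> notify at 100.00
```

The duplicated 1.0 collapses into a single rule, which is the case the deduplication is there to handle.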

3. Create the service account with billing permissions

The Cloud Function needs a service account with roles/billing.admin on the billing account, not just a role on the project. Detaching billing means calling the Cloud Billing API's projects.updateBillingInfo method, and this setup authorizes that call at the billing-account level.

resource "google_service_account" "kill_switch" {
  account_id   = "${var.project}-${var.environment}-bks"
  display_name = "Billing Kill Switch"
  project      = var.gcp_project_id
}

resource "google_billing_account_iam_member" "kill_switch_billing" {
  billing_account_id = var.billing_account_id
  role               = "roles/billing.admin"
  member             = "serviceAccount:${google_service_account.kill_switch.email}"
}

The function SA also needs roles/run.invoker and roles/eventarc.eventReceiver on the project so Eventarc can route Pub/Sub messages to it. Additionally, the GCP-managed Pub/Sub service account needs roles/iam.serviceAccountTokenCreator to mint OIDC tokens for authenticated push delivery:

resource "google_project_iam_member" "pubsub_token_creator" {
  project = var.gcp_project_id
  role    = "roles/iam.serviceAccountTokenCreator"
  member  = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-pubsub.iam.gserviceaccount.com"
}

This one is easy to miss and will silently break the trigger if omitted.

4. Deploy the Cloud Function

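The function source is a Python script packaged as a zip and uploaded to a GCS bucket (the google_storage_bucket and google_storage_bucket_object resources referenced below are omitted for brevity). Terraform's archive_file data source handles the zipping, and the object name includes the zip's MD5 hash, so a code change produces a new object name and Cloud Functions picks it up on the next apply.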

data "archive_file" "function_source" {
  type        = "zip"
  source_dir  = "${path.module}/function"
  output_path = "${path.module}/.terraform/function-source.zip"
}

resource "google_cloudfunctions2_function" "kill_switch" {
  name     = "${var.project}-${var.environment}-billing-kill-switch"
  location = var.gcp_region
  project  = var.gcp_project_id

  build_config {
    runtime     = "python312"
    entry_point = "kill_billing"

    source {
      storage_source {
        bucket = google_storage_bucket.function_source.name
        object = google_storage_bucket_object.function_source.name
      }
    }
  }

  service_config {
    max_instance_count    = 1
    available_memory      = "256M"
    timeout_seconds       = 60
    service_account_email = google_service_account.kill_switch.email

    environment_variables = {
      GCP_PROJECT_ID        = var.gcp_project_id
      KILL_SWITCH_THRESHOLD = tostring(var.kill_switch_threshold)
      DRY_RUN               = "false"
    }
  }

  event_trigger {
    event_type            = "google.cloud.pubsub.topic.v1.messagePublished"
    pubsub_topic          = google_pubsub_topic.budget_alerts.id
    retry_policy          = "RETRY_POLICY_DO_NOT_RETRY"
    service_account_email = google_service_account.kill_switch.email
  }
}
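
The reason for putting a content hash in the object name can be illustrated outside Terraform. The sketch below is an assumption-laden stand-in, not what archive_file does internally: build_zip, object_name, and the "function-source-<md5>.zip" naming pattern are all hypothetical.

```python
import hashlib
import io
import zipfile


def build_zip(files: dict[str, str]) -> bytes:
    """Zip the given {filename: content} mapping into deterministic bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            # A fixed timestamp keeps the archive bytes stable across runs.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            zf.writestr(info, files[name])
    return buf.getvalue()


def object_name(archive: bytes) -> str:
    """Derive a bucket object name that changes whenever the archive changes."""
    return f"function-source-{hashlib.md5(archive).hexdigest()}.zip"


v1 = object_name(build_zip({"main.py": "print('v1')"}))
v1_again = object_name(build_zip({"main.py": "print('v1')"}))
v2 = object_name(build_zip({"main.py": "print('v2')"}))

print(v1 == v1_again)  # True: unchanged source keeps the same object name
print(v1 == v2)        # False: edited source yields a new object name
```

Because the function resource references the object by name, a new name is what signals that there is something to redeploy.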

Key design choices:

  • max_instance_count = 1 — there's no benefit to parallel execution; billing can only be disabled once.
  • RETRY_POLICY_DO_NOT_RETRY — if the function fails, retrying could cause a loop. Better to alert and investigate.
  • DRY_RUN — set to "true" in staging so you can validate the pipeline without actually killing billing.

5. Write the Cloud Function

The Python function receives the Pub/Sub budget notification via Eventarc, compares the reported cost against the threshold, and calls the Cloud Billing API to detach the billing account if the threshold is exceeded.

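import base64
import json
import logging
import os

import functions_framework
from cloudevents.http import CloudEvent
from google.cloud import billing_v1

logger = logging.getLogger(__name__)


@functions_framework.cloud_event
def kill_billing(cloud_event: CloudEvent) -> None:
    # Eventarc delivers the Pub/Sub message; the budget JSON is base64-encoded.
    raw = base64.b64decode(cloud_event.data["message"]["data"]).decode("utf-8")
    notification = json.loads(raw)

    cost_amount = float(notification.get("costAmount", 0))
    budget_amount = float(notification.get("budgetAmount", 0))
    threshold = float(os.environ.get("KILL_SWITCH_THRESHOLD", "1.0"))
    project_id = os.environ.get("GCP_PROJECT_ID")
    dry_run = os.environ.get("DRY_RUN", "false").lower() == "true"

    kill_at = budget_amount * threshold

    # A missing budgetAmount would make kill_at 0 and trip the switch at $0 spend.
    if budget_amount <= 0 or cost_amount < kill_at:
        logger.info("Spend %.2f is below threshold %.2f: no action.", cost_amount, kill_at)
        return

    logger.warning("KILL SWITCH TRIGGERED: cost=%.2f >= kill_at=%.2f", cost_amount, kill_at)

    if dry_run:
        logger.warning("DRY_RUN is set: leaving billing attached to %s.", project_id)
        return

    # Detach the billing account: an empty billing_account_name disables billing.
    client = billing_v1.CloudBillingClient()
    client.update_project_billing_info(
        name=f"projects/{project_id}",
        project_billing_info=billing_v1.ProjectBillingInfo(billing_account_name=""),
    )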

The function follows GCP's official pattern for programmatically managing billing. A few important behaviors:

  • GCP sends budget notifications multiple times daily, even when spend is $0. The function ignores these by comparing against the kill threshold.
  • The full function also checks whether billing is still enabled before detaching, and that check fails open on permission errors: if it can't verify, it assumes billing is enabled and attempts to disable it anyway. This matches the behavior of Google's published sample.
  • The requirements.txt needs google-cloud-billing, functions-framework, google-cloud-logging, and cloudevents.
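
The fail-open behavior in the list above can be sketched without any GCP dependency by injecting the lookup. This is an illustrative outline only: is_billing_enabled and the two stand-in lookups are hypothetical, and in a real deployment the injected callable would presumably wrap CloudBillingClient.get_project_billing_info and read its billing_enabled field.

```python
import logging
from typing import Callable

logger = logging.getLogger("kill_switch")


def is_billing_enabled(project_name: str, get_billing_enabled: Callable[[str], bool]) -> bool:
    """Return the project's billing state, assuming 'enabled' if the lookup fails."""
    try:
        return bool(get_billing_enabled(project_name))
    except Exception:
        # Fail open: an unverifiable state is treated as "billing is on",
        # so the caller still attempts to detach it.
        logger.warning("Could not verify billing on %s; assuming it is enabled.", project_name)
        return True


# Stand-ins for the real API lookup (assumptions for this sketch).
def lookup_already_disabled(name: str) -> bool:
    return False


def lookup_denied(name: str) -> bool:
    raise PermissionError("permission denied")


print(is_billing_enabled("projects/example", lookup_already_disabled))  # False
print(is_billing_enabled("projects/example", lookup_denied))            # True
```

The trade-off is deliberate: a redundant detach attempt is harmless, while skipping one because of a permissions glitch would leave spend uncapped.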

6. Deploy and test

cp terraform.tfvars.example terraform.tfvars
# Set billing_account_id, gcp_project_id, budget_amount

terraform init
terraform plan
terraform apply

To test safely, set DRY_RUN = "true" in the environment variables and set a low budget_amount (e.g., $1). The function will log what it would do without actually disabling billing. Check logs in Cloud Logging:

gcloud functions logs read YOUR_FUNCTION_NAME --region us-central1 --gen2
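
Before waiting for a real budget notification, it can help to confirm the decoding path with a synthetic message. The sketch below builds the envelope shape the function reads (data["message"]["data"]) and decodes it again. The helper names and the sample values are assumptions, and real notifications carry more fields than the subset shown here.

```python
import base64
import json


def make_event_data(cost_amount: float, budget_amount: float) -> dict:
    """Build a Pub/Sub-style envelope with a base64-encoded budget notification."""
    notification = {
        "budgetDisplayName": "example Kill Switch Budget",  # made-up name
        "costAmount": cost_amount,
        "budgetAmount": budget_amount,
        "currencyCode": "USD",
    }
    encoded = base64.b64encode(json.dumps(notification).encode("utf-8")).decode("ascii")
    return {"message": {"data": encoded}}


def decode_event_data(data: dict) -> dict:
    """Reverse the encoding the same way the Cloud Function does."""
    raw = base64.b64decode(data["message"]["data"]).decode("utf-8")
    return json.loads(raw)


event_data = make_event_data(cost_amount=1.25, budget_amount=1.0)
decoded = decode_event_data(event_data)
print(decoded["costAmount"], decoded["budgetAmount"])  # 1.25 1.0
```

With a $1 budget and DRY_RUN enabled, a payload like this one should produce the "KILL SWITCH TRIGGERED" log line without touching billing.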

Re-enabling billing

Billing can only be re-enabled manually. This is by design — an automated re-enable would defeat the purpose of the kill switch and could create a disable/enable loop.

gcloud billing projects link YOUR_PROJECT_ID \
  --billing-account=XXXXXX-XXXXXX-XXXXXX

What we learned after deploying this

We've been running this kill switch across our dev and sandbox GCP projects for a few months now. A few observations:

The dry-run mode is essential for testing. We initially deployed with DRY_RUN = "true" on all projects to validate the pipeline end-to-end. This caught an IAM permission issue — the Pub/Sub token creator binding was missing, and Eventarc was silently dropping messages. Without dry-run, we wouldn't have discovered this until a real incident.

Budget notifications are noisy. GCP sends budget alerts multiple times a day, even when spend is $0. The function handles this gracefully (it just compares and exits), but if you're watching Cloud Logging, expect to see a lot of "below threshold — no action" entries. This is normal.

Don't use this in production. This should go without saying, but disabling billing is a scorched-earth response. For production projects, use the alert thresholds for Slack/email notifications and set up proper cost anomaly detection instead. The kill switch is for environments where availability doesn't matter: dev, sandbox, PoC, and demo projects.

Conclusion

GCP doesn't offer a native hard spending cap. Budget alerts notify you, but they don't act. This kill switch fills that gap for environments where runaway costs are a bigger risk than temporary downtime.

The entire stack is about 200 lines of Terraform and 50 lines of Python. You can stamp it across every dev project in your organization by changing a few tfvars.

If you're managing multiple GCP projects and want to set up spending controls, cost alerting, or broader FinOps automation, we'd be happy to help.

FAQs

Does GCP have a spending limit?

No, GCP only provides alerts and does not stop billing automatically.

What happens when billing is disabled?

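All paid services stop. The project itself remains, but Google warns that resources can be irretrievably deleted while billing is disabled, so don't rely on data surviving.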

Can I use this in production?

No, it is recommended only for dev and test environments.

Why use Terraform for this?

Terraform allows you to automate and replicate this setup across multiple projects.