rockingham-homelab GCP project

Everything the lab needs from a cloud provider lives in one GCP project, rockingham-homelab. The project is created by terraform/gcp/ and holds, in a single bounded blast radius:

  • One Cloud DNS managed zone (lab.jackhall.dev). The apex jackhall.dev lives in Cloudflare (managed by terraform/cloudflare/) — see ADR-0003’s 2026-05-12 amendment for why.
  • The Secret Manager (GSM) containers that External Secrets Operator reads from.
  • Two IAM service accounts (cert-manager-dns01, external-secrets) that workloads inside the cluster authenticate as, and one plan-only CI SA (tf-ci-plan) that GitHub Actions impersonates from outside.
  • A Workload Identity Federation pool that trusts GitHub OIDC tokens scoped to this repository.
  • The GCS bucket holding Terraform state for both roots.

terraform/gcp/ is the only place these resources are declared. terraform/bootstrap/ reads remote-state outputs from it (zone name, SA emails, project ID) but creates nothing on the cloud side; ArgoCD never touches GCP at all.

Project-level isolation is the only blast-radius boundary GCP enforces cheaply. Everything in rockingham-homelab can be torn down atomically with tofu destroy — the project resource has deletion_policy = "DELETE", which overrides the v6 provider default of PREVENT. A gcloud projects undelete <project_id> window covers the next 30 days if the destroy was accidental. This convention is captured in ADR-0001 and surfaces in CONTEXT.md as the rockingham-homelab entry.
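As a minimal sketch of the project resource described above (not the actual contents of terraform/gcp/main.tf), the deletion policy might be declared like this. The resource label `homelab` is hypothetical, and the project name is assumed to equal the project ID.

```hcl
variable "billing_account" {
  type        = string
  description = "Billing account ID, supplied via terraform.tfvars."
}

# Resource label "homelab" is a placeholder for illustration.
resource "google_project" "homelab" {
  name            = "rockingham-homelab"
  project_id      = "rockingham-homelab"
  billing_account = var.billing_account

  # Google provider v6 defaults this to "PREVENT", which makes
  # `tofu destroy` fail on the project. "DELETE" opts in to teardown.
  deletion_policy = "DELETE"
}
```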

Only the lab.jackhall.dev subzone lives in Cloud DNS. The apex jackhall.dev is on Cloudflare (managed by terraform/cloudflare/); the NS records that delegate lab.jackhall.dev to Cloud DNS live inside the CF apex zone, not here. See ADR-0003’s 2026-05-12 amendment for why the apex moved.

| Zone | Holds | Why it’s in Cloud DNS |
| --- | --- | --- |
| lab.jackhall.dev | ACME DNS-01 TXT records, written by cert-manager. No A/AAAA; internal hostnames are served by AdGuard Home in-cluster. | cert-manager needs write access via a service account; that’s only possible against a zone in a provider Terraform owns. NS-delegated from the CF apex via a cloudflare_dns_record.lab_delegation set in terraform/cloudflare/. |

The CAA record permitting Let’s Encrypt to issue for jackhall.dev and below lives at the apex too — see terraform/cloudflare/ and the jackhall.dev (apex) entry in CONTEXT.md.

The lab.jackhall.dev zone is the public half of the lab’s split-horizon DNS setup — it publishes only ACME challenges, never the LAN IPs of cluster services. LAN-side resolution is owned by AdGuard Home on 192.168.1.200.

Outputs to consume:

  • lab_zone_name → terraform/bootstrap/ passes it to cert-manager’s ClusterIssuer so the DNS-01 solver targets exactly this zone.
  • lab_zone_name_servers → consumed by terraform/cloudflare/ via terraform_remote_state to populate the lab.jackhall.dev NS records inside the CF apex zone.
  • apex_dns_name → consumed by terraform/cloudflare/ via terraform_remote_state to locate the CF apex zone.
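On the consuming side, reading these outputs could look roughly like the sketch below. It assumes the consuming root reads state from the same bucket and prefix that this page gives for the terraform/gcp/ root; the data source label `gcp` and the local names are hypothetical.

```hcl
# Read-only view of the terraform/gcp/ root's outputs.
data "terraform_remote_state" "gcp" {
  backend = "gcs"

  config = {
    bucket = "rockingham-homelab-tfstate"
    prefix = "terraform/gcp"
  }
}

locals {
  # Name servers that the NS delegation records should point at.
  lab_name_servers = data.terraform_remote_state.gcp.outputs.lab_zone_name_servers

  # Used to look up the apex zone on the Cloudflare side.
  apex_dns_name = data.terraform_remote_state.gcp.outputs.apex_dns_name
}
```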

GSM is the storage half of the cluster’s secrets pipeline. Every credential the lab needs at runtime lives here as a container declared by Terraform; the actual versions are uploaded out of band via gcloud secrets versions add. This split keeps the plaintext values out of Terraform state — pulling them through TF would land them in the GCS-backed state file (doubling the surface) and force every operator without a local copy to either plumb a CI env var or trigger a spurious “version will be destroyed” diff on every plan.
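A container-only declaration might look like the following sketch. The `for_each` shape, the resource label `container`, and automatic replication are assumptions; only the secret IDs come from this page.

```hcl
locals {
  # A subset of the secret IDs listed on this page, for illustration.
  gsm_secret_ids = toset([
    "talos-cluster-secrets",
    "argocd-repo-ssh-key",
  ])
}

resource "google_secret_manager_secret" "container" {
  for_each  = local.gsm_secret_ids
  project   = "rockingham-homelab"
  secret_id = each.key

  # Replication policy is an assumption; the page does not state it.
  replication {
    auto {}
  }
}

# Note the absence of any google_secret_manager_secret_version resource:
# versions are uploaded out of band, so plaintext never enters state.
```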

The containers currently provisioned:

| GSM secret ID | Purpose | Uploader |
| --- | --- | --- |
| talos-cluster-secrets | The whole of talos/_out/secrets.yaml: cluster CA private key, etcd bootstrap token, friends. The one irreplaceable file in the lab. | Operator, after talosctl gen secrets. See talos/README.md. |
| argocd-repo-ssh-key | Private half of the GitHub deploy key ArgoCD uses to clone this repo. | Operator, after registering the public half as a read-only deploy key on the repo. |
| adguard-home-admin | JSON {username, password_hash} for AdGuard Home’s admin user. password_hash is the bcrypt hash; the matching plaintext lives in the homepage-adguard-* pair below. | Operator. |
| homepage-adguard-username / homepage-adguard-password | Plaintext credentials for the Homepage AdGuard widget (the live-status API doesn’t accept the bcrypt hash). Must match the password in adguard-home-admin. | Operator. |
| homepage-argocd-token | ArgoCD readonly API token for the Homepage ArgoCD widget. | Operator. |
| arc-app-id / arc-app-private-key | GitHub App ID and PEM private key for the Rockingham Homelab ARC App that backs both runner scale sets. | Operator, after creating/rotating the App. |
| arc-installation-id-raptgroup / arc-installation-id-brazostech | Per-org installation IDs for the two ARC pools (raptgroup and brazostech). One App, two installations. | Operator. |

How cluster addons read these: every credential transits through External Secrets Operator. Each addon ships an ExternalSecret manifest under kubernetes/apps/<addon>/ that names the GSM secret by ID; ESO polls every hour, materialises the value as a Kubernetes Secret in the addon’s namespace, and the addon consumes it normally. The mapping is one-to-one: a GSM container above corresponds to exactly one ExternalSecret somewhere in the repo.

Rotating a credential is a one-shot gcloud secrets versions add <id> against the container — no Terraform run, no ArgoCD sync. ESO picks up the new version on its next refresh and the consuming addon picks it up on its next pod restart (or sooner, if the addon watches the Secret).

Three service accounts. Each is scoped tightly enough that compromise of its credentials does not give general access to the project.

cert-manager-dns01 is used by cert-manager’s DNS-01 solver to write _acme-challenge TXT records into the lab.jackhall.dev zone. The SA holds roles/dns.admin scoped to that one managed zone, not project-wide: there is no roles/dns.admin binding at the project level, so the SA cannot list, read, or modify the apex zone or any future zone. The matching SA key is minted by terraform/bootstrap/, uploaded to GSM, and synced into the cluster by ESO so the ClusterIssuer can read it as a normal Kubernetes Secret.

This is the “scope the SA, then name the zone explicitly” pattern: the ClusterIssuer in cert-manager names hostedZoneName: lab.jackhall.dev explicitly rather than letting cert-manager auto-discover, because the SA lacks the project-level dns.managedZones.list permission auto-discovery would call. See cert-manager for the full chain.
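The zone-scoped binding described above can be sketched as follows. The resource labels and the managed zone name `lab-jackhall-dev` are hypothetical; the page gives only the zone's DNS name, not its Cloud DNS resource name.

```hcl
resource "google_service_account" "cert_manager_dns01" {
  project      = "rockingham-homelab"
  account_id   = "cert-manager-dns01"
  display_name = "cert-manager DNS-01 solver"
}

# The binding is attached to the managed zone itself, not to the project,
# so the role applies to this one zone only.
resource "google_dns_managed_zone_iam_member" "cert_manager_lab_zone" {
  project      = "rockingham-homelab"
  managed_zone = "lab-jackhall-dev" # hypothetical managed zone name
  role         = "roles/dns.admin"
  member       = "serviceAccount:${google_service_account.cert_manager_dns01.email}"
}
```

Because no project-level DNS role exists in this sketch, zone auto-discovery would be denied, which is why the ClusterIssuer has to name the zone itself.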

external-secrets is used by ESO to read every GSM container above. It holds roles/secretmanager.secretAccessor at the project level: read-only, all secrets in the project. Project scope is deliberate: each new container that comes online is automatically reachable by ESO without a per-secret IAM resource, which matters because containers are added opportunistically as addons land.
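For contrast with a per-secret binding, a project-level grant is a single resource, roughly as in this sketch. Resource labels are hypothetical.

```hcl
resource "google_service_account" "external_secrets" {
  project      = "rockingham-homelab"
  account_id   = "external-secrets"
  display_name = "External Secrets Operator"
}

# One project-level binding covers every current and future secret,
# so adding a container needs no accompanying IAM resource.
resource "google_project_iam_member" "eso_secret_accessor" {
  project = "rockingham-homelab"
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.external_secrets.email}"
}
```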

The SA key is minted by terraform/bootstrap/, written into one Kubernetes Secret (gcp-sm-credentials in the external-secrets namespace), and every other credential in the cluster flows through this single bootstrap. That property is load-bearing for the ADR-0001 boundary — see External Secrets Operator for the chicken-and-egg argument and the alternatives that were rejected.

tf-ci-plan is used by .github/workflows/terraform-plan.yml to run tofu plan on pull requests. It holds roles/viewer at the project level, which is enough to read state, the live resource graph, and the contents of the tfstate bucket but structurally insufficient to apply changes. tofu apply remains operator-only on a workstation with ADC. The SA is impersonated via Workload Identity Federation (see below) rather than via a long-lived JSON key.

The CI plan workflow runs on GitHub-hosted runners and authenticates to GCP via OIDC instead of a JSON key. The pieces:

GitHub Actions runner
│ emits OIDC token w/ assertion.repository == "RaptGroup/homelab"
Workload Identity Pool: github-actions
│ provider "github" trusts token.actions.githubusercontent.com
│ attribute_condition rejects any other repository
principalSet → attribute.repository/RaptGroup/homelab
│ roles/iam.workloadIdentityUser → tf-ci-plan SA
tf-ci-plan@rockingham-homelab.iam.gserviceaccount.com (roles/viewer)

The lock is on the provider, not the pool: a pool can have many providers and the trust scope of each is enforced in the provider’s attribute_condition. Here the condition is assertion.repository == "RaptGroup/homelab", so an OIDC token issued for any other repo, even one in the same org, is rejected before it can reach the SA. There is no branch or actor restriction; any push, PR, or manual run from this repo can run tofu plan, but what that allows is bounded by the viewer-only SA.
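The chain above might be declared along the lines of this sketch. The pool ID, provider ID, condition, and role come from this page; the resource labels and the exact attribute mapping are assumptions.

```hcl
resource "google_service_account" "tf_ci_plan" {
  project    = "rockingham-homelab"
  account_id = "tf-ci-plan"
}

resource "google_iam_workload_identity_pool" "github" {
  project                   = "rockingham-homelab"
  workload_identity_pool_id = "github-actions"
}

resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = "rockingham-homelab"
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github"

  # Assumed mapping: expose the token's repository claim as an attribute.
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }

  # Tokens from any other repository are rejected here, at the provider.
  attribute_condition = "assertion.repository == \"RaptGroup/homelab\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

# Only identities carrying the matching repository attribute may
# impersonate the plan-only SA.
resource "google_service_account_iam_member" "ci_plan_wif" {
  service_account_id = google_service_account.tf_ci_plan.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/RaptGroup/homelab"
}
```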

What CI is wired up to consume:

| GitHub Actions repository variable / secret | Value source |
| --- | --- |
| GCP_WIF_PROVIDER (variable) | tofu output -raw ci_workload_identity_provider |
| GCP_CI_SA (variable) | tofu output -raw ci_service_account_email |
| GCP_BILLING_ACCOUNT (secret) | The same billing account ID used in terraform.tfvars |

The third value is a secret rather than a variable because Terraform needs it to set the billing_account variable; the workflow refuses to plan without it. See terraform/gcp/README.md for the operator-side wire-up commands, and Automation / terraform-plan for the workflow side — what runs against this SA on each PR and how the bootstrap root’s plan handles having no live cluster to refresh against.

gs://rockingham-homelab-tfstate holds the Terraform state for both roots:

  • gs://rockingham-homelab-tfstate/terraform/gcp/ — this root’s state.
  • gs://rockingham-homelab-tfstate/terraform/bootstrap/ — the bootstrap root’s state.

Versioning is on (10 newer versions retained, older ones expire after 30 noncurrent-days), uniform bucket-level access is enforced, and public access is structurally prevented. The bucket location is a single region (US-CENTRAL1) — multi-region is overkill for a homelab and roughly doubles the storage cost.
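A bucket with these properties could be sketched as below. How the two retention figures combine is an assumption: here they share one lifecycle rule, so a noncurrent version is deleted only once it has at least 10 newer versions and has been noncurrent for 30 days. The resource label is hypothetical.

```hcl
resource "google_storage_bucket" "tfstate" {
  project  = "rockingham-homelab"
  name     = "rockingham-homelab-tfstate"
  location = "US-CENTRAL1"

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  versioning {
    enabled = true
  }

  # Assumed shape: both conditions in one rule, so both must hold
  # before a noncurrent object version is deleted.
  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      num_newer_versions         = 10
      days_since_noncurrent_time = 30
    }
  }
}
```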

The GCS backend uses object generations for native state locking — no separate lock table required (unlike S3 + DynamoDB). Concurrent plans from a workstation and CI serialize at the bucket level.

Bootstrap-from-scratch caveat: on the very first apply (or after a tofu destroy), the bucket doesn’t yet exist, so tofu init against the GCS backend can’t read anything and fails. The recipe is to temporarily comment out the backend "gcs" block in versions.tf, apply locally, then uncomment and tofu init -migrate-state to push the state into the bucket. See terraform/gcp/README.md for the full sequence.
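The backend block in question would look something like this sketch; the prefix is inferred from the state paths listed on this page. With the GCS backend, the default workspace's state lands at `<prefix>/default.tfstate`.

```hcl
terraform {
  # Comment this block out for the very first apply, then restore it
  # and run `tofu init -migrate-state` once the bucket exists.
  backend "gcs" {
    bucket = "rockingham-homelab-tfstate"
    prefix = "terraform/gcp"
  }
}
```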

terraform/gcp/README.md is the operator-facing runbook: billing prerequisites, the manual gcloud services enable serviceusage.googleapis.com step on a fresh project, the registrar handoff after the first apply, and the teardown flow that reverses the bootstrap. The main.tf in that directory is short enough to read end-to-end; each resource carries an inline comment explaining why it’s shaped the way it is.