Introduction to Workload Identity
For a GCP Professional Cloud Security Engineer (PSE), managing secrets is one of the highest-risk activities. Traditional application authentication relied on downloading JSON keys for Service Accounts and mounting them as "secrets" into containers. This was a security nightmare: keys were often checked into Git, leaked in logs, or stolen from compromised pods.
Workload Identity (for GKE) and Workload Identity Federation (for external clouds) represent the "Gold Standard" of modern security. They allow your applications to authenticate to Google Cloud services using their existing identity (K8s Service Account, AWS IAM Role, etc.) without ever needing a static, long-lived secret key.
Plain English Explanation
1. The Automatic Badge Maker (Workload Identity)
Imagine a high-security lab. Instead of giving every researcher a permanent physical key, there is an "Automatic Badge Maker" at the entrance. When a researcher (a GKE Pod) approaches, the machine verifies their lab coat and ID (K8s Service Account) and spits out a temporary, 1-hour badge. When the badge expires, it's useless. This is the GKE Metadata Server in action.
2. The Currency Exchange (Workload Identity Federation)
Imagine you are traveling from the UK to the USA. You can't spend Pounds in New York, but you can go to a Currency Exchange. You show your valid Pounds (AWS Token), and the clerk gives you an equivalent amount of Dollars (Google Access Token) based on a pre-agreed exchange rate (Attribute Mapping). You never needed a "Universal Credit Card" (JSON Key).
3. The Trusted Envoy (OIDC Federation)
Think of a peace treaty between two kingdoms. Kingdom A doesn't know the soldiers of Kingdom B, but they trust Kingdom B's King. If a soldier carries a letter signed by Kingdom B's King (OIDC Token), Kingdom A treats them as a guest. Google Cloud trusts the "Issuer" (AWS/Azure/Okta) to say who the workload is.
Workload Identity for GKE
This is the recommended way for applications running on GKE to access Google Cloud services.
How it Works
- You create a Google Service Account (GSA) with the required IAM roles.
- You create a Kubernetes Service Account (KSA) in your GKE cluster.
- You "bind" the two together by granting the KSA the roles/iam.workloadIdentityUser role on the GSA and annotating the KSA with the GSA's email.
- The GKE Metadata Server intercepts token requests from the pod and exchanges the KSA's token for a GSA access token.
Workload Identity is the GKE feature that bridges the gap between Kubernetes identities and Google Cloud IAM identities, eliminating the need for managing Service Account keys within clusters.
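From the application's side, the exchange is just an HTTP call to the metadata server. The following is a minimal Python sketch of that request, assuming the standard Compute Engine metadata token endpoint, which the GKE Metadata Server answers on behalf of the pod. The function names build_token_request and fetch_access_token are illustrative, and the fetch only succeeds from inside a pod on a Workload Identity-enabled node pool.

```python
import json
import urllib.request

# Standard metadata token endpoint. With Workload Identity enabled, the
# GKE Metadata Server answers it with a token for the GSA bound to the KSA.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request() -> urllib.request.Request:
    """Build the HTTP request for a short-lived access token."""
    # The Metadata-Flavor header is mandatory; requests without it are rejected.
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def fetch_access_token(timeout: float = 5.0) -> dict:
    """Call the metadata server. Only works from inside a GKE pod or GCE VM."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        # The JSON response carries access_token, expires_in, and token_type.
        return json.load(resp)
```

In practice the Google Cloud client libraries make this call for you, so application code rarely needs it directly.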
Workload Identity Federation for External Workloads
This extends the "keyless" concept to workloads running outside of Google Cloud (AWS, Azure, GitHub Actions, On-prem).
Key Components
- Workload Identity Pool: A container for external identities (e.g., "AWS-Production-Pool").
- Workload Identity Provider: Defines the relationship between Google and the external IdP (e.g., AWS, Azure, or OIDC/SAML).
- Attribute Mapping: Maps external claims (e.g., the AWS assertion.arn) to Google attributes such as google.subject.
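These components nest inside each other, and their full resource names show up as the audience of the token exchange and as the member in IAM bindings. A small Python sketch of how those strings are assembled follows. The helper names and the sample project number, pool, and provider IDs are hypothetical.

```python
def pool_resource_name(project_number: str, pool_id: str) -> str:
    """Resource name of a Workload Identity Pool (pools live in 'global')."""
    return (
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}"
    )


def provider_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Audience string that identifies one provider inside a pool."""
    pool = pool_resource_name(project_number, pool_id)
    return f"//iam.googleapis.com/{pool}/providers/{provider_id}"


def principal_set_member(
    project_number: str, pool_id: str, attribute: str, value: str
) -> str:
    """IAM member matching every pool identity whose mapped attribute equals value."""
    pool = pool_resource_name(project_number, pool_id)
    return f"principalSet://iam.googleapis.com/{pool}/attribute.{attribute}/{value}"


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(provider_audience("1234567890", "my-aws-pool", "my-aws-provider"))
    print(principal_set_member("1234567890", "my-aws-pool", "aws_role", "MyRole"))
```

Note that these names use the project number, not the project ID.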
Federation for AWS/Azure
Instead of storing Google JSON keys in AWS Secrets Manager, your AWS workload uses its native IAM role credentials (an EC2 instance profile or a Lambda execution role) to request a Google token.
Workload Identity Federation is the recommended, keyless way to authenticate GitHub Actions to Google Cloud for CI/CD pipelines, and it is the answer PSE scenarios expect.
Mapping K8s Service Accounts to GSAs
The binding is a two-way street:
- K8s Annotation: The KSA must be annotated with the GSA email.
- IAM Binding: The GSA must have a policy binding that allows the specific KSA (identified by namespace and name) to act as it.
Example K8s Annotation
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: [email protected]
  name: my-app-ksa
  namespace: default
The KSA-to-GSA bind is two-sided and PSE scenarios frequently test this. The iam.gke.io/gcp-service-account annotation on the KSA alone is not sufficient — you must also grant roles/iam.workloadIdentityUser on the GSA with member serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]. Missing either half produces a silent 403 from the GKE Metadata Server.
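To make the two halves concrete, here is a minimal Python sketch that builds the member string and reports which half is missing. It assumes the KSA manifest and the GSA's IAM policy have already been loaded as dictionaries (for example from YAML or JSON output). The function names and the sample project and account names are hypothetical.

```python
from typing import List

ANNOTATION_KEY = "iam.gke.io/gcp-service-account"
WI_ROLE = "roles/iam.workloadIdentityUser"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Build the IAM member string for a Kubernetes Service Account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def missing_halves(
    ksa_manifest: dict, gsa_policy: dict, project_id: str, gsa_email: str
) -> List[str]:
    """Return which halves of the KSA-to-GSA bind are missing (empty if none)."""
    problems = []
    meta = ksa_manifest.get("metadata", {})

    # Half 1: the KSA must be annotated with the GSA email.
    annotations = meta.get("annotations") or {}
    if annotations.get(ANNOTATION_KEY) != gsa_email:
        problems.append("annotation")

    # Half 2: the GSA policy must grant workloadIdentityUser to this KSA.
    member = workload_identity_member(
        project_id, meta.get("namespace", "default"), meta.get("name", "")
    )
    bound = any(
        binding.get("role") == WI_ROLE and member in binding.get("members", [])
        for binding in gsa_policy.get("bindings", [])
    )
    if not bound:
        problems.append("iam_binding")
    return problems


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(workload_identity_member("my-project", "default", "my-app-ksa"))
```

An empty list means both halves are in place. A result of ["iam_binding"] corresponds to the common case where only the annotation was applied.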
Eliminating the Need for Static JSON Keys
Why is this a PSE requirement?
- Rotation: Short-lived tokens rotate automatically every hour.
- Revocation: If a pod is compromised, deleting the KSA or the IAM binding stops new tokens from being issued, and any token already minted expires within the hour.
- Compliance: Audits are cleaner because there are no "secret files" to track.
Attribute Mapping and Conditions
In Federation, you don't always want to trust every workload from your AWS account.
Mapping Claims
You map an incoming AWS claim like arn:aws:sts::123456789012:assumed-role/MyRole to a Google attribute like google.subject.
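As an illustration of what a mapping does, the Python sketch below mimics it outside of Google Cloud: it copies the ARN into google.subject and derives two custom attributes from an assumed-role ARN. This only imitates the idea in plain Python; real mappings are written as CEL expressions on the provider, and the attribute names aws_account and aws_role_name used here are hypothetical.

```python
import re
from typing import Dict

# Assumed-role ARNs look like:
#   arn:aws:sts::<account>:assumed-role/<role>[/<session>]
_ASSUMED_ROLE = re.compile(r"^arn:aws:sts::(\d{12}):assumed-role/([^/]+)(?:/|$)")


def map_attributes(assertion: Dict[str, str]) -> Dict[str, str]:
    """Map an incoming AWS assertion to Google-side attributes (illustrative)."""
    arn = assertion["arn"]
    mapped = {"google.subject": arn}
    match = _ASSUMED_ROLE.match(arn)
    if match:
        # Custom attributes are only derived for assumed-role sessions.
        mapped["attribute.aws_account"] = match.group(1)
        mapped["attribute.aws_role_name"] = match.group(2)
    return mapped


if __name__ == "__main__":
    print(map_attributes({"arn": "arn:aws:sts::123456789012:assumed-role/MyRole"}))
```

Mapped attributes such as these are what IAM bindings and conditions can later refer to.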
Conditional Access
You can add a CEL condition to the Identity Provider:
assertion.arn.startsWith('arn:aws:sts::123456789012:assumed-role/Production')
Result: Only AWS workloads that have assumed a role whose name starts with 'Production' can exchange tokens. The prefix uses the STS assumed-role form because that is how AWS role sessions appear in assertion.arn, as in the mapping example above.
Tighten the Workload Identity Provider with a CEL attribute_condition (e.g., assertion.arn.startsWith('arn:aws:sts::123456789012:assumed-role/Production') or assertion.repository == 'my-org/my-repo' for GitHub Actions) so that STS rejects non-matching credentials before it mints a federated token. This narrows the blast radius far earlier than IAM-policy-only filtering and keeps unwanted AWS roles or forked repos from ever reaching google.subject.
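The gating behavior can be sketched in plain Python. This is not CEL and not how STS is implemented; it only mirrors the logic of a prefix check and an equality check against sample assertions. The helper names are hypothetical, and the AWS prefix uses the arn:aws:sts::ACCOUNT:assumed-role/ROLE/SESSION form in which AWS role sessions are presented.

```python
from typing import Callable, Dict

Condition = Callable[[Dict[str, str]], bool]


def aws_role_prefix(prefix: str) -> Condition:
    """Python stand-in for: assertion.arn.startsWith(prefix)."""
    return lambda assertion: assertion.get("arn", "").startswith(prefix)


def github_repository(repo: str) -> Condition:
    """Python stand-in for: assertion.repository == repo."""
    return lambda assertion: assertion.get("repository") == repo


def exchange_allowed(assertion: Dict[str, str], condition: Condition) -> bool:
    """True if the condition passes, so the token exchange may proceed."""
    return condition(assertion)


if __name__ == "__main__":
    prod_only = aws_role_prefix("arn:aws:sts::123456789012:assumed-role/Production")
    for arn in (
        "arn:aws:sts::123456789012:assumed-role/ProductionDeploy/i-0abc",
        "arn:aws:sts::123456789012:assumed-role/Staging/i-0abc",
    ):
        print(arn, exchange_allowed({"arn": arn}, prod_only))
```

Because the check is a prefix match, a role named ProductionDeploy also passes. Append a trailing slash to the prefix if only the exact role name should match.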
GKE Metadata Server Protection
The Metadata Server is the "heart" of Workload Identity on GKE.
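- GKE Sandbox: For high-security environments, use GKE Sandbox (gVisor) to add a layer of isolation between pod workloads and the node kernel, which reduces the chance that a compromised container reaches node-level credentials.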
- Metadata Concealment: In older GKE versions, this was used to hide the default GCE metadata from pods; Workload Identity replaces and improves upon this.
If you don't enable Workload Identity, pods may fall back to the node's default Service Account, which often has broad 'Editor' permissions. This is a common PSE exam trap.
Troubleshooting Identity Token Exchange
- Check Annotations: Is the KSA correctly annotated?
- Verify IAM Binding: Does the GSA have roles/iam.workloadIdentityUser for the KSA?
- STS Logs: For federation, check the Security Token Service (STS) logs to see if the external token was rejected.
- Token Format: Ensure the OIDC provider is sending a valid JWT (JSON Web Token).
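When checking the token format, it helps to look at the claims the external provider actually sent. The Python sketch below decodes a JWT payload for inspection only. It deliberately does not verify the signature, so it is a debugging aid and not a validation step. The function name is illustrative.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload = parts[1]
    # JWTs strip base64url padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Typical claims to compare against the provider configuration are iss (issuer), aud (audience), and sub (subject).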
Best Practices for Multi-Cloud Workload Identity
- One Pool per Environment: Use separate Identity Pools for Dev, Stage, and Prod.
- Least Privilege Mapping: Only map the attributes you actually need for IAM decisions.
- Use Short-Lived Tokens: Set the expiresIn parameter to the minimum required for the task.
- Monitor Token Exchange: Audit the iam.googleapis.com/WorkloadIdentityPool resource usage.
CLI Commands for Workload Identity
Creating a Workload Identity Pool
gcloud iam workload-identity-pools create "my-aws-pool" \
  --location="global" \
  --display-name="AWS Production Pool"
Adding an AWS Provider
gcloud iam workload-identity-pools providers create-aws "my-aws-provider" \
  --workload-identity-pool="my-aws-pool" \
  --account-id="123456789012" \
  --location="global"
Granting the KSA permission to act as GSA
gcloud iam service-accounts add-iam-policy-binding \
  [email protected] \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[my-namespace/my-ksa]"
The member format for GKE Workload Identity is:
serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]
Security Best Practices for PSE
- Disable Node Metadata: Ensure --workload-metadata=GKE_METADATA is set on all node pools.
- Use Managed Identities: In Azure, use "System-assigned Managed Identity" for federation.
- Audit Token Usage: Periodically check for successful vs. failed GenerateAccessToken calls.
- Namespace Isolation: Never share KSAs across namespaces if those pods require different GSA permissions.
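The token-usage audit can be approximated with a short script over exported audit log entries. This Python sketch assumes the entries were exported as JSON objects with the usual Cloud Audit Logs shape (protoPayload.methodName and an optional protoPayload.status.code, where a missing or zero code means success). The function name and the sample entries in the usage example are hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_token_calls(entries: Iterable[dict]) -> Counter:
    """Count succeeded vs. failed GenerateAccessToken calls in audit log entries."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if payload.get("methodName") != "GenerateAccessToken":
            continue  # ignore unrelated methods
        # Assumption: an absent or zero status code means the call succeeded.
        code = (payload.get("status") or {}).get("code", 0)
        counts["failed" if code else "succeeded"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sample = [
        {"protoPayload": {"methodName": "GenerateAccessToken"}},
        {"protoPayload": {"methodName": "GenerateAccessToken", "status": {"code": 7}}},
    ]
    print(tally_token_calls(sample))
```

A sudden rise in failed calls for one service account is a reasonable trigger for a closer look at its bindings.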
Troubleshooting Scenarios
Scenario: Pod gets "403 Forbidden" when calling BigQuery.
Diagnosis: The KSA is annotated, but the GSA lacks the BigQuery Data Viewer role, or the GSA doesn't have the Workload Identity binding for the KSA.
Fix: Run gcloud iam service-accounts get-iam-policy on the GSA and verify the id.goog member binding.
Scenario: GitHub Actions can't authenticate to Google Cloud.
Diagnosis: The issuer in the Workload Identity Provider doesn't match GitHub's OIDC URL, or the subject mapping is incorrect.
Fix: Verify the provider settings against the official GitHub OIDC documentation.
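A quick way to narrow this down is to compare the token's claims with the provider configuration. The Python sketch below assumes the claims have already been decoded into a dictionary and that google.subject is mapped from the sub claim. The function name, the problem labels, and the sample repository are hypothetical, while the issuer URL is GitHub's published OIDC issuer.

```python
from typing import List

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def diagnose_github_token(
    claims: dict, provider_issuer: str, subject_prefix: str
) -> List[str]:
    """Return a list of mismatches between a GitHub OIDC token and the provider."""
    problems = []
    # The provider must be configured with GitHub's issuer URL.
    if provider_issuer != GITHUB_ISSUER:
        problems.append("provider_issuer_mismatch")
    # The token's iss claim must equal the issuer configured on the provider.
    if claims.get("iss") != provider_issuer:
        problems.append("token_issuer_mismatch")
    # GitHub subjects look like repo:<org>/<repo>:ref:refs/heads/<branch>.
    if not str(claims.get("sub", "")).startswith(subject_prefix):
        problems.append("subject_mismatch")
    return problems


if __name__ == "__main__":
    claims = {
        "iss": GITHUB_ISSUER,
        "sub": "repo:my-org/my-repo:ref:refs/heads/main",
    }
    print(diagnose_github_token(claims, GITHUB_ISSUER, "repo:my-org/my-repo:"))
```

An empty list means the issuer and subject line up, so the next place to look is the attribute condition or the IAM binding on the service account.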
PSE Exam Scenarios
Scenario 1: Key Management Reduction
"Your company has 500 microservices across 3 clouds. What is the most secure way to manage their Google Cloud credentials?" Answer: Implement Workload Identity Federation for all 3 clouds to eliminate the need to store, rotate, and manage 500 JSON keys.
Scenario 2: GKE Multi-tenancy
"Two different teams are running pods in the same GKE cluster but different namespaces. How do you ensure Team A's pods cannot use Team B's permissions?" Answer: Create separate KSAs in each namespace and bind them to different GSAs. Use Kubernetes Network Policies to further isolate the namespaces.
FAQ
Q1: Does Workload Identity work with Autopilot clusters? A1: Yes, Workload Identity is enabled by default in GKE Autopilot.
Q2: Can I federate with an on-premises LDAP server? A2: Not directly. You must use an OIDC-compliant wrapper like Okta, Keycloak, or Ping Identity as the bridge.
Q3: What is the cost of Workload Identity Federation? A3: Google Cloud does not charge for the federation service itself, but standard API usage costs apply.
Summary Checklist
- Explain the flow of a token exchange in GKE Workload Identity.
- List the steps to configure Workload Identity Federation for AWS.
- Define a "Workload Identity Pool" and its purpose.
- Annotate a KSA for Workload Identity.
- Describe the security benefits of short-lived tokens over JSON keys.