Cloud Workflows for Data Operations

Practical PDE study notes on Cloud Workflows for data ops: YAML syntax, HTTP connectors, parallel steps, sub-workflows, retry, IAM, OIDC auth, and Composer comparison.

Introduction to Cloud Workflows for Data Ops

Cloud Workflows is a serverless orchestrator built for stitching Google Cloud APIs and HTTP services together with a declarative YAML definition. For a data engineer, it fills the awkward middle ground between a single scheduled BigQuery query and a full Apache Airflow deployment. You write steps, the platform runs them, and you pay only for executions. There are no clusters to size, no Python virtualenvs to patch, and no DAG parser to crash at 3 a.m.

This study note walks through the parts of Cloud Workflows that show up on the PDE exam: YAML and JSON syntax, the HTTP and Google Cloud connector model, parallel branches and iteration, sub-workflows, retry and error handling, scheduling, IAM with service accounts, and OIDC token authentication. It also positions Workflows against Cloud Composer and Cloud Run jobs so you know which tool to recommend in a scenario question.

Plain English Explanation

Workflows as a kitchen ticket rail

Picture a busy diner. The waiter clips a paper ticket to a metal rail above the pass. The cook reads line one ("two eggs over easy"), flips the ticket up, reads line two ("hash browns crispy"), and so on. If the eggs burn, the cook starts that line again instead of the whole order. If the rail is empty, nobody is on the clock waiting for tickets. Cloud Workflows is the rail. Each YAML step is a ticket line. The platform tracks what is done, what failed, and what to retry, and you only pay for the seconds the cook is actually working a ticket.

Workflows as a relay race baton

A relay race has runners standing in lanes with one baton. Runner one finishes their leg, hands the baton to runner two, who hands it to runner three. The baton carries the state. In Cloud Workflows, the result of one step is the baton handed to the next, available through the result keyword and bound to a variable. Sub-workflows are like sending the baton into a side track for a specialist runner before it comes back to the main relay. Parallel steps are like several relay teams racing in adjacent lanes, each with its own baton, all meeting at the finish line.

Workflows as a hotel concierge

You ask the concierge to book dinner, schedule a taxi, and confirm a spa appointment. The concierge calls the restaurant, calls the taxi service, calls the spa, retries when one line is busy, and reports back with one tidy summary. The concierge does not cook your meal or drive your car. Cloud Workflows is the concierge: it does not run your data transformation, it calls Dataflow, BigQuery, and Cloud Functions on your behalf, handles the busy signals, and gives you a single execution record at the end.

Core Concepts of Cloud Workflows

Cloud Workflows revolves around a few primitives. A workflow is the deployed definition. An execution is one run of that definition with its own input, state, and output. A step is a unit of work, usually an HTTP call, an assignment, a switch, or a control flow keyword like next, return, or raise.

Definitions live in YAML or JSON. The two are interchangeable, and the runtime parses both. Most teams pick YAML because comments are allowed and indentation makes the step graph readable. Variables are typed loosely, in the same JSON spirit as JavaScript: numbers, strings, booleans, lists, maps, and null. You assign with assign: blocks and reference with ${...} expressions. The expression language supports arithmetic, string concatenation, list and map access, and a small standard library under namespaces such as text, json, list, map, time, and sys.
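As a minimal sketch of assign: blocks and ${...} expressions working together, the workflow below builds a label from a list and a map. The variable names and sample values are illustrative and not taken from any real pipeline.

```yaml
main:
  params: [input]
  steps:
    - init:
        assign:
          - tables: ["orders", "customers"]          # list
          - run: {"dry_run": false, "attempt": 1}     # map
          - table_count: ${len(tables)}               # built-in len()
          - first_table: ${tables[0]}                 # list access by index
          - label: ${"load-" + first_table + "-" + string(run.attempt)}
    - decide:
        switch:
          # Stop early when the run is flagged as a dry run.
          - condition: ${run.dry_run}
            return: "skipped"
    - done:
        return: ${label}
```

Assignments inside one assign: step are applied in order, which is why label can refer to first_table defined two lines earlier.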

Every workflow runs as a service account. That identity, not the caller's identity, decides what Google Cloud APIs the workflow can touch. This single fact answers a surprising number of "why did my workflow get a 403" support tickets.

A connector is a packaged way to call a Google Cloud API from Workflows without writing the full HTTP plumbing. Connectors handle authentication, polling for long-running operations, and pagination. The list lives at https://cloud.google.com/workflows/docs/reference/googleapis.

An OIDC token is an OpenID Connect ID token signed by Google for a specific service account, with an audience claim that matches the receiving service URL. Cloud Run, Cloud Functions, and other private endpoints accept OIDC tokens as proof that the caller is allowed to invoke them. See https://cloud.google.com/workflows/docs/authentication.

Workflows YAML and JSON Syntax

A minimal workflow has a main block with a list of steps. Each step is a map whose only key is the step name. Inside that map you place exactly one of assign, call, switch, for, parallel, try, return, raise, or next.

main:
  params: [input]
  steps:
    - log_input:
        call: sys.log
        args:
          text: ${"Received " + json.encode_to_string(input)}
          severity: INFO
    - read_table:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          body:
            query: "SELECT COUNT(*) AS c FROM dataset.events WHERE DATE(ts) = CURRENT_DATE()"
            useLegacySql: false
        result: query_result
    - return_count:
        return: ${query_result.rows[0].f[0].v}

The ${} expression syntax is the same in YAML and JSON. The params line declares accepted inputs. Steps run top to bottom unless next: jumps explicitly. The final value of the workflow is whatever a return step emits, which is then visible in the execution detail page and in the result field returned by the Executions API (executions.get).

JSON form is straightforward but verbose because every map key needs quotes. Generated CI tooling sometimes prefers JSON because it is easier to validate with a JSON schema. Hand-written workflows almost always stay in YAML.

Use sys.get_env("GOOGLE_CLOUD_PROJECT_ID") instead of hardcoding the project ID. The same workflow definition then deploys to dev, staging, and prod without edits. Reference: https://cloud.google.com/workflows/docs/reference/stdlib/sys/get_env

HTTP Connectors and Google Cloud Connectors

Workflows speaks HTTP natively. A call: http.get, http.post, http.put, http.patch, or http.delete step targets any reachable URL. The args block carries url, headers, body, query, timeout, and an auth block that asks Workflows to attach a Google-issued OAuth2 access token or OIDC ID token automatically.

- call_internal_api:
    call: http.post
    args:
      url: https://invoice-service-abc123-uc.a.run.app/process
      auth:
        type: OIDC
        audience: https://invoice-service-abc123-uc.a.run.app
      headers:
        Content-Type: application/json
      body:
        invoice_id: ${invoice_id}
      timeout: 1800
    result: process_response

Google Cloud connectors wrap that pattern for first-party services. Calling googleapis.dataflow.v1b3.projects.locations.flexTemplates.launch from a connector is shorter than constructing the equivalent JSON body and waiting on the long-running operation by hand. Connectors also handle two annoying problems for you: long-running operation polling (the connector blocks until the operation reports done) and pagination (some connectors automatically follow nextPageToken).
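A connector call for the Flex Template launch mentioned above might look like the following sketch. The region, job name, template spec path, and parameter are placeholders chosen for illustration; only the connector name comes from the text.

```yaml
- launch_flex_template:
    call: googleapis.dataflow.v1b3.projects.locations.flexTemplates.launch
    args:
      projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
      location: "us-central1"                                     # assumed region
      body:
        launchParameter:
          jobName: "orders-etl"                                    # hypothetical job name
          containerSpecGcsPath: "gs://templates/orders-etl.json"   # hypothetical spec path
          parameters:
            input_date: "2026-05-12"
    result: launch_response
```

No auth: block or Authorization header appears here; the connector attaches credentials for the workflow's service account on its own.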

The catch is that connector calls count against your workflow execution quota and against the underlying API quota. A connector polling a Dataflow job every fifteen seconds for an hour is two hundred forty steps. That is fine; just remember when you read the bill.

HTTP requests from Workflows are capped at 1800 seconds (30 minutes) per call, and the timeout argument cannot raise that ceiling. If a downstream service can run longer, switch to a fire-and-forget pattern: trigger the job, then poll its status with a separate connector or http.get loop. Reference: https://cloud.google.com/workflows/quotas

Parallel Steps and Iteration

A parallel: block runs branches concurrently up to a configurable degree. There are two shapes. The first is a fixed list of named branches:

- fan_out_extracts:
    parallel:
      shared: [results]
      branches:
        - extract_orders:
            steps:
              - run_orders:
                  call: extract_table
                  args: {table: "orders"}
                  result: orders_out
              - save_orders:
                  assign:
                    - results.orders: ${orders_out}
        - extract_customers:
            steps:
              - run_customers:
                  call: extract_table
                  args: {table: "customers"}
                  result: customers_out
              - save_customers:
                  assign:
                    - results.customers: ${customers_out}

The second is a for: loop nested inside parallel:, which fans out one branch per item in a list. The concurrency_limit and shared keys control how many run at once and which variables can be written from inside the branches.

- parallel_load_partitions:
    parallel:
      shared: [load_results]
      concurrency_limit: 5
      for:
        value: partition_date
        in: ${partition_dates}
        steps:
          - load_one:
              call: googleapis.bigquery.v2.jobs.insert
              args:
                projectId: ${project_id}
                body:
                  configuration:
                    load:
                      sourceUris: ['${"gs://bucket/data/" + partition_date + "/*.parquet"}']
                      destinationTable:
                        projectId: ${project_id}
                        datasetId: "warehouse"
                        tableId: ${"events$" + text.replace_all(partition_date, "-", "")}
                      sourceFormat: "PARQUET"
                      writeDisposition: "WRITE_TRUNCATE"
              result: load_result
          - record:
              assign:
                - load_results: ${list.concat(load_results, load_result)}

Variables read inside parallel branches see the snapshot from before the parallel block started. Writes only land in variables listed under shared, and even then each update should be a single assignment (setting a fresh map key, or appending to a list in one expression) so that concurrent branches do not overwrite each other. This isolation is part of why parallel steps are safe; it is also why beginners get confused when they write to a regular variable inside a branch and find it empty afterward.

A common mistake is forgetting to list a variable under shared inside a parallel block. The branches will appear to run, but any writes vanish when the block ends, and nothing in the logs explains why your aggregated counter is zero. Always declare every variable mutated inside a parallel branch in the shared list. Reference: https://cloud.google.com/workflows/docs/reference/syntax/parallel-steps
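The earlier snippets leave out where the shared variables come from. A minimal sketch, with illustrative branch and variable names, initializes them before the parallel block and lists each one under shared:

```yaml
- init_results:
    assign:
      - results: {}          # must exist before the parallel block starts
      - processed: 0
- fan_out:
    parallel:
      shared: [results, processed]
      branches:
        - orders_branch:
            steps:
              - record_orders:
                  assign:
                    - results["orders"]: "done"
                    - processed: ${processed + 1}
        - customers_branch:
            steps:
              - record_customers:
                  assign:
                    - results["customers"]: "done"
                    - processed: ${processed + 1}
- report:
    return: ${results}
```

Each branch writes to its own map key, so the branches never compete for the same entry in results.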

Sub-Workflows for Reuse

A sub-workflow is a second top-level block in the same definition file with its own params and steps. The main workflow calls it like any other function with call: subworkflow_name. Sub-workflows are the right place to put reusable patterns: "wait for a Dataflow job", "decode a Pub/Sub push message", "send a Slack alert with a templated body".

main:
  steps:
    - run_pipeline:
        call: launch_and_wait
        args:
          template_path: "gs://templates/orders-etl"
          parameters:
            input_date: "2026-05-12"
        result: pipeline_result

launch_and_wait:
  params: [template_path, parameters]
  steps:
    - launch:
        call: googleapis.dataflow.v1b3.projects.templates.launch
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          gcsPath: ${template_path}
          body:
            jobName: ${"orders-etl-" + parameters.input_date}
            parameters: ${parameters}
        result: launch_response
    - poll:
        call: poll_dataflow_job
        args:
          job_id: ${launch_response.job.id}
        result: final_state
    - return_state:
        return: ${final_state}

Sub-workflows do not have private state visible to the parent; they communicate via parameters in and return out. Recursion is technically possible but is constrained by the per-execution step quota, so do not implement quicksort in YAML.
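The poll_dataflow_job helper called above is not defined in these notes. One possible shape is sketched below, assuming the Dataflow connector's projects.locations.jobs.get method; the location and interval_seconds parameters and their defaults are hypothetical additions, and only three terminal states are handled.

```yaml
poll_dataflow_job:
  # location and interval_seconds are assumed parameters with default values,
  # so the existing call that passes only job_id still works.
  params: [job_id, location: "us-central1", interval_seconds: 30]
  steps:
    - get_job:
        call: googleapis.dataflow.v1b3.projects.locations.jobs.get
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          location: ${location}
          jobId: ${job_id}
        result: job
    - check_state:
        switch:
          - condition: ${job.currentState == "JOB_STATE_DONE"}
            return: ${job.currentState}
          - condition: ${job.currentState in ["JOB_STATE_FAILED", "JOB_STATE_CANCELLED"]}
            steps:
              - fail:
                  raise: ${"Dataflow job ended in state " + job.currentState}
    - wait:
        call: sys.sleep
        args:
          seconds: ${interval_seconds}
        next: get_job
```

The loop has no iteration cap, so a production version would likely count attempts and raise once a budget is exhausted; every pass through the loop also consumes steps from the per-execution quota.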

Error Handling and Retry

Workflows supports a try/retry/except pattern that should feel familiar to any Python or Java developer. A try: block lists the steps to attempt. A retry: block sets the policy: a predicate function that decides whether the error is retryable, plus a max_retries, initial_delay, max_delay, and multiplier for exponential backoff. The standard library ships two ready-made predicates: http.default_retry_predicate (retry on HTTP 429, 502, 503, 504, plus connection errors) and http.default_retry_predicate_non_idempotent (a more conservative version safe for non-idempotent verbs).

- call_with_retry:
    try:
      call: http.post
      args:
        url: https://api.partner.example.com/ingest
        auth:
          type: OIDC
          audience: https://api.partner.example.com
        body: ${payload}
      result: ingest_response
    retry:
      predicate: ${http.default_retry_predicate}
      max_retries: 5
      backoff:
        initial_delay: 2
        max_delay: 60
        multiplier: 2
    except:
      as: e
      steps:
        - log_failure:
            call: sys.log
            args:
              text: '${"Ingest failed: " + json.encode_to_string(e)}'
              severity: ERROR
        - alert:
            call: send_slack_alert
            args:
              message: '${"Partner ingest failed after retries: " + e.message}'
        - rethrow:
            raise: ${e}

The as: e binding gives the exception block access to the structured error: an HTTP status, a body, headers, and the call that failed. You can branch on e.code to route 4xx errors to a "data quality" path and 5xx errors to a "platform incident" path. A raise: step inside except: re-throws so the parent workflow or the operator can see the failure.
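Routing on e.code could be sketched as follows. The step names and log messages are illustrative, and the sketch assumes the caught error is a map; errors that carry no code (for example, connection failures) fall back to 0 and go straight to the re-raise.

```yaml
- call_partner:
    try:
      call: http.post
      args:
        url: https://api.partner.example.com/ingest
        body: ${payload}
      result: ingest_response
    except:
      as: e
      steps:
        - classify:
            assign:
              # Missing code yields null from map.get, which default() turns into 0.
              - status: ${default(map.get(e, "code"), 0)}
        - route:
            switch:
              - condition: ${status >= 400 and status < 500}
                steps:
                  - log_data_quality:
                      call: sys.log
                      args:
                        text: ${"Data quality path, HTTP " + string(status)}
                        severity: WARNING
              - condition: ${status >= 500}
                steps:
                  - log_incident:
                      call: sys.log
                      args:
                        text: ${"Platform incident path, HTTP " + string(status)}
                        severity: ERROR
        - rethrow:
            raise: ${e}
```

The final raise keeps the execution marked as failed after the routing logic has run, so monitoring still sees the error.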

Cloud Workflows does not automatically retry HTTP calls. If you do not wrap a call in a try/retry block, a transient 503 will fail the entire execution. Make the retry block your default for any external HTTP call, even ones you think are reliable. Reference: https://cloud.google.com/workflows/docs/reference/syntax/retry-steps

Scheduling Workflows with Cloud Scheduler

Cloud Workflows has no built-in cron. Scheduling is handed off to Cloud Scheduler, which fires an HTTP target on a cron expression. The standard pattern is a Scheduler job that POSTs to the Workflows Executions API, authenticated with an OAuth2 token tied to a service account that has roles/workflows.invoker on the target workflow.

gcloud scheduler jobs create http daily-orders-etl \
  --location=us-central1 \
  --schedule="0 2 * * *" \
  --time-zone="UTC" \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/$PROJECT/locations/us-central1/workflows/orders-etl/executions" \
  --http-method=POST \
  --oauth-service-account-email="scheduler-invoker@$PROJECT.iam.gserviceaccount.com" \
  --headers="Content-Type=application/json" \
  --message-body='{"argument":"{\"date\":\"yesterday\"}"}'

The argument field is a JSON string, double-escaped because it sits inside another JSON document. Workflows decodes that string before the execution starts, so the params value of the main block arrives as a structured object and a field reads directly as input.date; json.decode is only needed when a field inside the argument is itself a JSON-encoded string.

For event-driven pipelines instead of time-driven ones, swap Cloud Scheduler for Eventarc. Eventarc triggers a workflow when a Pub/Sub message lands, when a Cloud Storage object is created, or when a Cloud Audit Logs entry matches a filter. The wiring is identical from the workflow's perspective; only the source changes.
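On the workflow side, an Eventarc-triggered run receives the event as its argument. The sketch below assumes a trigger for direct Cloud Storage events, where the bucket and object name sit under event.data; the step and variable names are illustrative.

```yaml
main:
  params: [event]
  steps:
    - extract_object:
        assign:
          - bucket: ${event.data.bucket}
          - object_name: ${event.data.name}
          - source_uri: ${"gs://" + bucket + "/" + object_name}
    - log_object:
        call: sys.log
        args:
          text: ${"New object " + source_uri}
          severity: INFO
    - done:
        return: ${source_uri}
```

Other sources (Pub/Sub, Cloud Audit Logs) deliver a differently shaped event.data, so the field paths here would need adjusting for them.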

Workflows vs Composer vs Cloud Run Jobs

Three orchestration options show up on the PDE exam, and they are not interchangeable.

Cloud Composer is managed Apache Airflow. It runs on a GKE cluster you do not see but do pay for, around 300 to 600 USD per month for the smallest production-suitable environment. It shines when you have many teams sharing one orchestrator, complex DAG dependencies measured in hundreds of tasks, a Python-heavy custom operator ecosystem, or hard requirements for a specific Airflow plugin. It is overkill when you have one pipeline that runs once a day.

Cloud Workflows is serverless and step-priced (currently on the order of fractions of a cent per thousand internal steps and a few cents per thousand external steps). There are no clusters and no idle cost. It is excellent when your orchestration is "call API A, call API B, branch on the result, fan out, aggregate". It struggles when you need a dependency graph that branches and rejoins in complex ways, when you need a rich UI for non-engineers to monitor runs, or when you need backfills measured in years of historical dates.

Cloud Run jobs are the right answer when the work itself, not the orchestration, is the hard part. A Cloud Run job is a container that runs to completion. If you have a Python script that pulls a CSV, transforms it, and writes Parquet, a Cloud Run job runs it on a schedule with no orchestrator at all. Pair a Cloud Run job with Cloud Scheduler for a cron, or invoke it from a Workflows step when you need it inside a larger pipeline.
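Invoking a Cloud Run job from a workflow step might look like this sketch. It assumes the Cloud Run Admin API v1 connector and its namespaces.jobs.run method; the job name csv-to-parquet and the region are placeholders.

```yaml
- run_transform_job:
    call: googleapis.run.v1.namespaces.jobs.run
    args:
      # Assumed name format: namespaces/PROJECT_ID/jobs/JOB_NAME
      name: ${"namespaces/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/jobs/csv-to-parquet"}
      location: "us-central1"          # assumed region of the job
    result: job_execution
```

The workflow's service account needs permission to run the job; check the connector reference for the exact method name and arguments before relying on this shape.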

A common production pattern uses all three: Composer manages high-level pipeline DAGs across teams; individual Composer tasks invoke Workflows for short multi-API choreographies; Workflows steps call Cloud Run jobs for the actual transformation logic. Each tier handles what it does best.

If your "DAG" is really a sequence with one branch and one fan-out, do not reach for Composer. A 60-line workflow plus a Cloud Scheduler job replaces the entire Airflow environment, saves the monthly cost, and removes a server you have to patch. Reference: https://cloud.google.com/workflows/docs/comparing-workflows

IAM for Service-to-Service Calls

Every workflow execution runs as a service account, configured at deploy time with --service-account. That account is the principal that the workflow uses for all Google Cloud connector calls and for any HTTP call that asks for an OAuth2 or OIDC token through the auth: block.

A clean setup uses three identities:

  1. A workflow runtime service account that holds only the roles it actually needs: roles/bigquery.jobUser on the project, roles/dataflow.developer if it launches jobs, roles/run.invoker on each Cloud Run service it calls. Never grant roles/owner or roles/editor here; these are the two roles auditors flag first.
  2. A deployer service account for CI/CD that holds roles/workflows.editor to push new versions.
  3. A trigger service account held by Cloud Scheduler or Eventarc, with roles/workflows.invoker on the workflow.

Splitting the trigger from the runtime keeps the blast radius small. If the Scheduler account leaks, an attacker can run the workflow but cannot edit it or impersonate the runtime account directly.

A common mistake is reusing the default Compute Engine service account for the workflow runtime. The default account starts with roles/editor, which gives the workflow blanket access to nearly every API in the project. A bug or compromise in the workflow then has Editor on the project. Always create a dedicated service account per workflow, or at least per pipeline domain. Reference: https://cloud.google.com/workflows/docs/authentication

OIDC Authentication for Private Endpoints

Cloud Run services and Cloud Functions deployed without --allow-unauthenticated require an OIDC ID token in the Authorization: Bearer ... header. Workflows generates that token for you when you set auth.type: OIDC and pass an audience. The audience must be the URL of the receiving service, exactly as deployed. A trailing slash, a wrong region, or the service URL versus the custom domain are common mismatches.

- call_private_function:
    call: http.post
    args:
      url: https://process-events-abc123-uc.a.run.app/run
      auth:
        type: OIDC
        audience: https://process-events-abc123-uc.a.run.app
      body:
        date: ${target_date}
    result: function_result

For calls to Google APIs that require an OAuth2 access token rather than an ID token, use auth.type: OAuth2. By default the token carries the https://www.googleapis.com/auth/cloud-platform scope, so you only need to specify scopes manually for niche APIs that fall outside it.
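For comparison with the OIDC example, a raw HTTP call to a Google API with an OAuth2 token can be sketched like this. The BigQuery datasets endpoint is used purely as an illustration, and scopes is left unset so Workflows applies its default.

```yaml
- list_datasets:
    call: http.get
    args:
      url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/datasets"}
      auth:
        type: OAuth2
    result: datasets_response
- return_listing:
    return: ${datasets_response.body}
```

In practice the googleapis.bigquery.v2 connector covers this call without an explicit auth: block; the raw form is mainly useful for APIs that have no connector.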

The receiving service must trust the workflow's service account. For Cloud Run, this means granting roles/run.invoker to the workflow runtime service account on the target service. For a third-party HTTPS endpoint that validates Google OIDC tokens (some SaaS partners do this), share the workflow runtime service account email so they can put it in their allow list.

Architecture and Design Patterns

A few patterns recur in production data ops workflows.

Fan-out, fan-in. Read a manifest of partitions or table names, run a parallel branch per item, collect the results, then run an aggregation step. Use concurrency_limit to protect downstream APIs from being hammered. This is the right shape for daily multi-table ingestion or backfill of a date range.

Trigger and poll. A long-running job (Dataflow, BigQuery export, Vertex AI training) is started by one step, then a polling sub-workflow loops with a sys.sleep and an HTTP GET until status is DONE. The polling sub-workflow is reusable across pipelines and centralizes timeout logic. Most Google Cloud connectors hide this pattern by polling for you.

Saga-style compensation. When step three fails, the except block calls a compensating action for steps one and two: drop the staging table, delete the partial GCS files, mark the run as rolled back. Workflows is well suited to this because state is structured and exception handling is first class.
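A compensation block along these lines could be sketched as below. The sub-workflows create_staging_table, export_partial_files, swap_into_production, and delete_gcs_prefix are hypothetical helpers, and the dataset, table, and bucket names are placeholders.

```yaml
- load_with_compensation:
    try:
      steps:
        - create_staging:
            call: create_staging_table        # hypothetical sub-workflow
            args: {table: "orders_staging"}
        - write_files:
            call: export_partial_files        # hypothetical sub-workflow
            args: {prefix: "gs://bucket/tmp/orders/"}
        - publish:
            call: swap_into_production        # hypothetical sub-workflow
            args: {table: "orders_staging"}
    except:
      as: e
      steps:
        - drop_staging:
            call: googleapis.bigquery.v2.tables.delete
            args:
              projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
              datasetId: "warehouse"
              tableId: "orders_staging"
        - delete_partial_files:
            call: delete_gcs_prefix           # hypothetical sub-workflow
            args: {prefix: "gs://bucket/tmp/orders/"}
        - mark_rolled_back:
            call: sys.log
            args:
              text: "Run rolled back after failure"
              severity: ERROR
        - rethrow:
            raise: ${e}
```

Compensating steps should tolerate missing resources: if the very first step fails, the staging table does not exist yet, and the delete would itself raise unless it is wrapped in its own try block.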

Approval gating. A workflow step writes a row to a small pending_approvals table or sends a Slack message with an approval link, then enters a polling loop or waits for a callback URL hit. Combined with Cloud Tasks for the long wait, this lets non-engineering reviewers gate a production load.
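The callback variant of approval gating can be sketched with the events callback functions from the standard library. The 24-hour timeout is an assumed review window, and send_slack_alert is the same hypothetical sub-workflow name used in the error-handling example.

```yaml
- create_callback:
    call: events.create_callback_endpoint
    args:
      http_callback_method: "POST"
    result: callback_details
- notify_reviewer:
    call: send_slack_alert                    # hypothetical sub-workflow
    args:
      message: ${"Approve the production load by POSTing to " + callback_details.url}
- wait_for_approval:
    call: events.await_callback
    args:
      callback: ${callback_details}
      timeout: 86400                          # seconds; assumed 24-hour window
    result: approval_request
- return_decision:
    return: ${approval_request.http_request.body}
```

The caller that hits the callback URL must be authenticated and authorized to send callbacks to the workflow, so a reviewer usually clicks a link to a small service that makes the POST on their behalf.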

GCP Service Deep Dive

Cloud Workflows integrates with the rest of the GCP data stack through connectors and HTTP. The most common targets:

  • BigQuery: load jobs, query jobs, export jobs, table copy. The googleapis.bigquery.v2 connectors handle long-running operations cleanly. Use them for orchestration; do not use them as a replacement for scheduled queries when scheduled queries already do the job.
  • Dataflow: Flex Template launches and classic template launches. The connector returns the job ID; pair with a poll loop for completion. Workflows is the standard way to chain "extract with Dataflow, then transform in BigQuery".
  • Cloud Storage: object metadata reads, listings, and copies via the googleapis.storage.v1 connector. Avoid streaming large object bodies through Workflows; the response-size and variable-memory limits are small.
  • Pub/Sub: publish messages from a workflow step to push events to other systems. Subscribe by triggering a workflow from Eventarc with a Pub/Sub source rather than polling a subscription.
  • Cloud Functions and Cloud Run: the most common HTTP targets. Use OIDC and a private endpoint, never --allow-unauthenticated.
  • Vertex AI: trigger pipeline runs, custom training jobs, batch prediction jobs. Workflows is the lightweight glue that ties model training to upstream data preparation steps.

Common Pitfalls and Trade-offs

Step quotas bite at scale. Each execution has a maximum step count, and large parallel fan-outs combined with polling loops can push past it. If you are loading 365 daily partitions in parallel and each branch polls for 20 minutes at 15-second intervals, that is 80 polls per branch and about 29,200 polling iterations in total, each of which costs more than one step. Either increase polling intervals, batch partitions, or split the work across multiple executions.

The 1.5 MB execution argument cap. You cannot pass a giant payload as the workflow argument. Pass references (a GCS URI, a BigQuery table name, a Pub/Sub message ID) and let the steps fetch the data they need.

Variable size limits. A single variable cannot hold an arbitrarily large list. If a step result returns 10 MB of JSON, the workflow execution may fail when it tries to assign it. For paginated APIs that might return huge responses, page through in a loop and process or write each page rather than accumulating everything in memory.

Sleep is not free. A sys.sleep step still counts against execution duration, which is bounded. For waits longer than an hour, prefer triggering a second workflow from Cloud Scheduler at the future time.

Logging is structured but verbose. Every step emits log entries. In Cloud Logging, filter by resource.type="workflows.googleapis.com/Workflow" and execution ID to find one run. Build a saved query early; you will use it often.

Best Practices

  • Keep one workflow file per pipeline. Use sub-workflows for shared helpers like "poll Dataflow", "send alert", "write run metadata". Resist the temptation to build a single mega-workflow that orchestrates the whole company.
  • Parameterize project, region, and dataset names through params and environment variables. The same definition then deploys to dev, staging, and prod with no edits.
  • Wrap every external HTTP call in try/retry with http.default_retry_predicate. The cost of an extra retry block is zero; the cost of a failed pipeline at 4 a.m. is real.
  • Use a dedicated service account per workflow, with the minimum roles to do the job. Audit roles quarterly.
  • Tag executions with a meaningful execution argument that includes the date or job ID. The execution list page is much easier to navigate when each row has a useful label.
  • Treat workflow YAML as source code. Store it in Git, review changes, deploy through CI with gcloud workflows deploy. Avoid the "edit in console" workflow except for emergency hotfixes.
  • Test the YAML locally with workflows-emulator or by deploying to a sandbox project before promoting. Failed deployments still count as a version.
  • Set Cloud Monitoring alerts on workflows.googleapis.com/finished_execution_count filtered by status="FAILED". A failed daily workflow should page someone within minutes, not be discovered the next morning.

Real-World Use Case

A mid-sized e-commerce company runs a daily revenue dashboard. The pipeline used to live in a 200-line shell script on a single VM that ran via cron. When the VM rebooted for OS patches, the pipeline silently missed a day. Migration to Cloud Workflows replaced the script with a 90-line YAML definition that:

  1. Pulls yesterday's order CSV from a partner's SFTP via a Cloud Run job (the Workflow step calls the job).
  2. Parallel-loads three regional partitions into BigQuery using the BigQuery connector with concurrency_limit: 3.
  3. Triggers a Dataflow Flex Template that joins orders with the product catalog and writes a denormalized table.
  4. Runs three BigQuery queries in parallel that refresh dashboard tables.
  5. Calls a Cloud Function over OIDC that posts a Slack message with the daily revenue number.
  6. On any failure, the except block writes a row to a pipeline_failures audit table and pages on-call via PagerDuty.

Cloud Scheduler triggers the workflow at 04:00 UTC. Total monthly cost dropped from about 50 USD for the always-on VM to under 5 USD for Workflows execution time, plus the Dataflow and Cloud Run costs that existed before. Mean time to detect a failure went from "next morning" to "within five minutes". The team also gained a real execution history they can scroll through, instead of grepping syslog on a VM.

Exam Tips

  • When the scenario describes "orchestrate a few API calls with retries and branching, no Python required, no cluster to manage", pick Cloud Workflows.
  • When the scenario emphasizes "complex DAG", "many teams sharing", "Airflow operators", or "backfill", pick Cloud Composer.
  • When the scenario is "run this container on a schedule", pick Cloud Run jobs plus Cloud Scheduler. Workflows is unnecessary overhead.
  • Workflows triggered on a cron always means Cloud Scheduler in front. Workflows itself has no cron.
  • Workflows triggered by a Cloud Storage upload or a Pub/Sub message means Eventarc.
  • Authentication to a private Cloud Run service from Workflows uses OIDC with audience equal to the service URL, plus roles/run.invoker on the workflow's service account.
  • Authentication to a Google API uses OAuth2; the connector usually handles it without an explicit auth: block.
  • A failed HTTP step that returned 503 with no retries configured is a misconfiguration, not a platform bug. Workflows does not auto-retry.
  • Variables written inside a parallel branch must be listed under shared: or the writes are lost.
  • The Compute Engine default service account has roles/editor. Using it for a workflow is an exam-flagged anti-pattern.

The four facts most likely to appear on a Workflows exam question: (1) Workflows has no built-in scheduler — Cloud Scheduler triggers it. (2) Cloud Workflows does not auto-retry HTTP calls — wrap them in try/retry. (3) OIDC tokens require an audience matching the target URL exactly. (4) Variables mutated inside parallel branches must be declared in shared:.

Frequently Asked Questions (FAQ)

When should I pick Cloud Workflows over Cloud Composer for data ops?

Pick Cloud Workflows when the orchestration is mostly a chain of API calls and HTTP requests, when the team does not already operate Airflow, when cost matters at low pipeline counts, or when you want zero idle infrastructure. Pick Composer when the workload is a complex DAG with hundreds of tasks, when you have an existing Airflow operator ecosystem, when many teams need a shared orchestrator with role-based access, or when historical backfill UI is essential. The two are not exclusive: Composer DAGs can call Cloud Workflows for short choreographies that do not deserve a full Airflow task tree.

How do I pass dynamic parameters from Cloud Scheduler to Cloud Workflows?

Set the Scheduler job's HTTP body to a JSON document with an argument field whose value is itself a JSON-encoded string. Workflows decodes that string and hands the result to the main block as its input. Inside the workflow, declare params: [input] and read fields directly, for example input.date. Avoid putting time-sensitive values like "today's date" in the Scheduler payload; instead, compute them inside the workflow with sys.now() and time.format() so retries and manual reruns compute the right date.
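Computing the date inside the workflow could look like the sketch below. It assumes the standard-library calls sys.now() (seconds since the Unix epoch), time.format() (ISO 8601 string in UTC), and text.substring(); the variable names are illustrative.

```yaml
main:
  params: [input]
  steps:
    - compute_dates:
        assign:
          # Subtract one day (86400 seconds) from the current time.
          - yesterday_ts: ${time.format(sys.now() - 86400)}
          # Keep only the YYYY-MM-DD prefix of the ISO 8601 timestamp.
          - target_date: ${text.substring(yesterday_ts, 0, 10)}
    - done:
        return: ${target_date}
```

The result is a UTC date; a pipeline that reports in a local time zone would need to pass a time zone to the formatting call or adjust the offset.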

Can Cloud Workflows replace Cloud Run jobs entirely?

No. Cloud Workflows is an orchestrator, not an execution environment. It cannot run arbitrary Python, install pip packages, or hold a Pandas dataframe in memory. If your workload is "run this container with this command", Cloud Run jobs are the right tool, optionally invoked from a workflow when the container fits inside a larger pipeline. Trying to express significant business logic in Workflows expressions hurts readability and is harder to unit test than equivalent code in a container.

What is the maximum execution duration for a single Cloud Workflow?

A single Cloud Workflow execution can run for up to one year, but each individual HTTP step is capped at approximately 1800 seconds. For long-running jobs, do not block the HTTP step on completion; instead, trigger the job, then poll its status with separate steps. Most Google Cloud connectors that wrap long-running operations already implement this polling pattern internally.

How does OIDC authentication work between Cloud Workflows and Cloud Run?

When a workflow step has auth.type: OIDC and an audience set to the Cloud Run service URL, Workflows asks the IAM Service Account Credentials API to mint an OIDC ID token signed by Google for the workflow's runtime service account, with the audience claim set to the URL. Workflows attaches the token in the Authorization: Bearer ... header on the outgoing request. The Cloud Run service validates the token signature, audience, and expiration, then checks that the token's principal (the workflow's service account email) holds roles/run.invoker on the service. If any of those checks fail, Cloud Run returns 401 or 403.

How do I run multiple BigQuery loads in parallel in Cloud Workflows?

Use a parallel: block with a for: loop over the list of partitions or files, set a concurrency_limit to keep BigQuery slot pressure manageable, and call googleapis.bigquery.v2.jobs.insert from each branch. List the result accumulator variable under shared: so each branch can record its outcome. Connector calls block until the load job finishes, so you do not need a separate polling loop.

What happens to in-progress executions when I deploy a new workflow version?

In-progress executions continue running on the version they started with. New executions started after deployment use the new definition. There is no automatic mid-flight upgrade. This makes deploys safe even during active pipeline runs, but it also means that buggy old versions can keep producing bad output until their executions finish or are cancelled manually.
