What Is Vertex AI?
Unified platform origins
For the Google Cloud Digital Leader (CDL) exam, Vertex AI is the single most important AI/ML platform name you need to remember. Vertex AI is Google Cloud's unified machine-learning platform — one console, one API, one set of tools that covers the entire ML lifecycle from data preparation through experimentation, training, deployment, monitoring, and Generative AI. Before Vertex AI existed, Google Cloud customers had to mix and match two separate products: AI Platform (for custom models written in TensorFlow or scikit-learn) and AutoML (for no-code training on tabular, image, text, and video data). Each had its own console, its own SDK, and its own pricing model. In 2021 Google merged them into Vertex AI, and every subsequent ML capability — Pipelines, Feature Store, Model Garden, Generative AI Studio — has been launched inside this unified umbrella.
Why MLOps is the headline benefit
As a Cloud Digital Leader you will rarely write training code, but you will be asked to explain to business stakeholders why Vertex AI accelerates ML projects versus running raw TensorFlow on Compute Engine or hand-rolling Kubernetes ML clusters. The short answer is MLOps: Vertex AI is opinionated about how models move from notebook to production, and the platform handles the boring-but-critical parts — feature versioning, experiment tracking, model registry, endpoint scaling, drift monitoring, and explainability — so data-science teams can focus on the model itself rather than the surrounding plumbing. On the CDL exam you will see questions that pit Vertex AI against pre-trained APIs (when the problem is generic) and against BigQuery ML (when the data already lives in BigQuery and the team is SQL-fluent). Knowing when each is the right answer is a recurring exam pattern. For broader AI context, review the AI and Machine Learning fundamentals topic.
Plain English Explanation
Vertex AI is a name that sounds technical but actually describes a very familiar everyday concept: a place where all the parts of a complex process have been organized under one roof. Most data-science teams without Vertex AI feel like a chef who has to drive between three different markets to buy ingredients, rent a separate kitchen, ship the food to a restaurant, and then run their own delivery service. Vertex AI is the version where everything happens in one building. The following analogies help business audiences understand what the platform actually does.
Analogy 1 — A One-Stop Kitchen From Ingredients to Plated Dish
Imagine a restaurant where the chef can walk into a single building and find everything needed to produce a meal: a refrigerated pantry of pre-washed ingredients, a row of professional cooktops, a tasting station, a plating station, a delivery counter, and a customer-feedback box. Nothing has to be transported between buildings; ingredients flow from the pantry to the cooktop to the plate without ever leaving the property. This is what Vertex AI feels like to a data-science team. Vertex AI Workbench is the kitchen-prep area — a managed Jupyter notebook environment where data scientists explore data and write experiments. Vertex AI Training is the cooktop — a fully managed service that runs custom training jobs at any scale, on CPUs, GPUs, or TPUs, without the team provisioning servers. Vertex AI Prediction is the plating-and-delivery station — it deploys trained models behind autoscaling HTTPS endpoints (online prediction) or runs large batch inference jobs against files in Cloud Storage or BigQuery.
The kitchen also has a sous-chef who remembers every recipe attempt: Vertex AI Experiments tracks every training run, the parameters used, and the resulting accuracy. The pantry keeps ingredients consistent: Vertex AI Feature Store centralizes features so the same feature definition is used during training and during real-time serving (avoiding the dreaded training-serving skew). Even the delivery service is monitored: Vertex AI Model Monitoring watches predictions in production and alerts when input data drifts away from what the model was trained on. The Cloud Digital Leader takeaway is that Vertex AI converts a fragmented ML pipeline into a single integrated kitchen, which is why projects move from prototype to production in weeks instead of quarters.
Analogy 2 — An Airport That Handles Check-in to Boarding in One Building
A second useful analogy is a modern international airport. When you fly, you do not visit different cities for check-in, security, boarding, baggage, and customs — they all happen inside one building, in the right order, with clear signage and shared infrastructure. Vertex AI works the same way for ML workflows. Vertex AI Pipelines is the equivalent of the airport's flow-control system, orchestrating the sequence of steps that turn raw data into a deployed model. A pipeline run might look like: ingest data from BigQuery, run a data-validation step, train a model with Vertex AI Training, evaluate the model, register the winning model in Vertex AI Model Registry, and finally deploy it to a Prediction endpoint — all defined as code and re-runnable on a schedule.
Just as an airport has separate terminals for different airlines, Vertex AI Pipelines runs pipelines built with either the Kubeflow Pipelines SDK or TensorFlow Extended (TFX), but the user experience is unified. The benefit for the business is repeatability: every model retrain produces the same artifacts in the same order, fully audited, with metadata captured automatically in Vertex ML Metadata. The airport analogy also extends to operations: just as airports publish on-time performance metrics, Vertex AI Pipelines produces lineage graphs showing exactly which data version, which feature version, and which code version produced each model in production, which is invaluable for regulated industries like healthcare and finance.
Analogy 3 — A Department Store Where Every Counter Is a Specialist but Checkout Is Unified
A third helpful image is a large department store. Each counter is staffed by a specialist — cosmetics, electronics, menswear, kitchenware — but the customer pays at a single unified checkout, accumulates one set of loyalty points, and gets one delivery slip. Vertex AI is a department store of ML capabilities. AutoML is the counter for business users who want models without writing code; it handles tabular, image, text, and video data. Custom Training is the counter for data scientists who bring their own TensorFlow, PyTorch, or scikit-learn code. Model Garden is the counter that sells pre-built foundation models — Google's own Gemini family, Anthropic's Claude on Vertex AI, Meta's Llama, and hundreds of open-source models — ready to deploy with one click. Generative AI Studio is the counter for prompt engineering, fine-tuning, and Retrieval-Augmented Generation (RAG) on foundation models.
Each counter has its own specialty, but the unified checkout is the shared IAM, billing, monitoring, and audit logging that all Vertex AI components share with the rest of Google Cloud. This means a single VPC Service Controls perimeter protects every Vertex AI workload, a single Cloud Logging stream captures every prediction, and a single billing report shows ML spend by project. For a Cloud Digital Leader, this unification is the business case: instead of buying a separate startup tool for each ML step, the customer gets one platform with consistent governance — a major reason regulated enterprises like banks and hospitals choose Vertex AI over piecing together open-source components themselves. See the BigQuery topic for how Vertex AI integrates tightly with BigQuery as the data source.
Vertex AI Workbench — Managed Notebooks for Data Scientists
Instances and user-managed notebooks
Vertex AI Workbench is the entry point for most data scientists. It provides managed JupyterLab environments hosted on Google Cloud, in two flavors: Instances (single-user managed notebooks with full IDE features and built-in integration with BigQuery, Cloud Storage, and Dataproc) and User-managed notebooks (more customization, on customer-managed VMs). A data scientist clicks "New Instance", chooses a machine type and an optional GPU, and within about two minutes has a fully configured JupyterLab with TensorFlow, PyTorch, scikit-learn, the BigQuery client, and gcloud pre-installed.
Onboarding speed and BigQuery integration
The key business benefit is that Workbench removes the days of setup time that used to plague data-science onboarding. Instead of installing CUDA drivers, configuring Python environments, and wiring up service accounts, a new hire can start exploring company data within minutes. Workbench instances also integrate directly with BigQuery — a data scientist can browse BigQuery tables in the JupyterLab sidebar, drag a table into a cell, and immediately start a pandas analysis with a single magic command (%%bigquery df). This is the difference between a laptop-bound analyst and a cloud-native one.
Vertex AI Training — Custom Model Training at Any Scale
Job submission and managed infrastructure
Vertex AI Training is the managed service for running custom training jobs. The data scientist packages their training code (TensorFlow, PyTorch, scikit-learn, XGBoost, or a custom container) and submits a training job specifying the machine type, accelerator (CPU / GPU / TPU), region, and dataset. Vertex AI provisions the infrastructure, runs the training, captures logs to Cloud Logging, writes the resulting model artifact to Cloud Storage, and tears down the cluster — all without the team managing VMs or Kubernetes.
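To make the job description above concrete, here is a minimal sketch of the choices a training job carries: machine type, optional accelerator, replica count, and container image. The dictionary keys are assumed to mirror the worker pool layout used by Vertex AI custom jobs, and the image URI and helper name are hypothetical. Treat it as an illustration of the shape of the request, not as the SDK call itself.

```python
from typing import Any, Dict, Optional


def build_worker_pool_spec(
    image_uri: str,
    machine_type: str = "n1-standard-4",
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
    replica_count: int = 1,
) -> Dict[str, Any]:
    """Describe one pool of identical training workers as a plain dict.

    Key names are an assumption modelled on the custom-job worker pool layout.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # An accelerator type without a count (or a count without a type) is ambiguous.
    if (accelerator_type is None) != (accelerator_count == 0):
        raise ValueError("set accelerator_type and accelerator_count together")

    machine_spec: Dict[str, Any] = {"machine_type": machine_type}
    if accelerator_type is not None:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count

    return {
        "machine_spec": machine_spec,
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri},
    }


# Hypothetical training image; replace with your own container.
gpu_spec = build_worker_pool_spec(
    image_uri="us-docker.pkg.dev/example-project/training/trainer:latest",
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
cpu_spec = build_worker_pool_spec(
    image_uri="us-docker.pkg.dev/example-project/training/trainer:latest"
)
print(gpu_spec["machine_spec"])
```

The point for a business audience is that the team declares what hardware it wants and the platform provisions and tears it down; nobody on the team manages the VMs.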
Distributed training and hyperparameter tuning
For large models, Vertex AI Training supports distributed training across multiple machines and TPU pods, and hyperparameter tuning via Vertex AI Vizier — the same Bayesian-optimization service Google uses internally. The CDL-level message is that custom training is the right choice when the business has unique data and in-house ML expertise; teams without expertise should fall back to AutoML, which we cover next.
Vertex AI AutoML — No-Code Model Training
Supported data types
AutoML is Vertex AI's no-code training service. It allows non-developers to train high-quality custom models simply by uploading a dataset and clicking a few buttons. AutoML supports four primary data types: tabular (regression, classification, forecasting on CSV or BigQuery data), image (classification, object detection), text (classification, entity extraction, sentiment), and video (classification, action recognition, object tracking).
Neural architecture search under the hood
Under the hood, AutoML uses neural architecture search to automatically discover the best model architecture and hyperparameters for the given dataset. The result is often comparable to what an experienced data scientist would produce manually, but achieved in hours rather than weeks. On the CDL exam, AutoML is the right answer when a question emphasizes "no in-house ML expertise", "business users", "minimal code", or "fastest time to value with custom data".
Vertex AI vs Pre-trained APIs vs BigQuery ML — the three-way decision is the most common CDL exam pattern. Use pre-trained APIs (Vision, Speech-to-Text, Translation, Natural Language) when the problem is generic and no custom training is needed. Use BigQuery ML when the data already lives in BigQuery and the team prefers SQL (CREATE MODEL ...). Use Vertex AI (Custom Training or AutoML) when the problem is unique to the business and requires bespoke models. See https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform.
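The three-way decision above can be written down as a small rule of thumb. The sketch below is only a study aid: the function name and its boolean inputs are hypothetical, and real scenarios usually involve more factors than these four.

```python
def recommend_service(
    needs_custom_model: bool,
    data_in_bigquery: bool,
    team_prefers_sql: bool,
    has_ml_expertise: bool,
) -> str:
    """Encode the exam-style decision as ordered checks (a simplification)."""
    if not needs_custom_model:
        # Generic problem: vision, speech, translation, and so on.
        return "Pre-trained API"
    if data_in_bigquery and team_prefers_sql:
        return "BigQuery ML"
    # Problem is unique to the business and needs a bespoke model.
    if has_ml_expertise:
        return "Vertex AI Custom Training"
    return "Vertex AI AutoML"


print(recommend_service(False, False, False, False))  # Pre-trained API
print(recommend_service(True, True, True, False))     # BigQuery ML
print(recommend_service(True, False, False, False))   # Vertex AI AutoML
```

The order of the checks matters: the question of whether a custom model is needed at all comes first, because a pre-trained API is the cheapest answer whenever it fits.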
Vertex AI Prediction — Online and Batch Inference
Online versus batch serving
Once a model is trained, it must be served. Vertex AI Prediction provides two serving modes. Online prediction deploys the model behind a low-latency HTTPS endpoint that autoscales based on traffic — ideal for real-time use cases like fraud scoring at the point of sale or product recommendations on a website. Batch prediction runs the model over a large dataset in Cloud Storage or BigQuery — ideal for nightly scoring of every customer in the database.
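For readers curious what calling an online endpoint looks like, the following sketch builds the URL and JSON body for a prediction request without sending it. The URL pattern and the "instances" key are assumed to follow the public REST convention for Vertex AI endpoints. The project, endpoint ID, and feature names are hypothetical, and authentication is left out.

```python
import json
from typing import Any, Dict, List, Tuple


def build_predict_request(
    project: str,
    region: str,
    endpoint_id: str,
    instances: List[Dict[str, Any]],
) -> Tuple[str, str]:
    """Return (url, json_body) for an online prediction call; nothing is sent."""
    if not instances:
        raise ValueError("at least one instance is required")
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical fraud-scoring request for a single card transaction.
url, body = build_predict_request(
    project="example-project",
    region="us-central1",
    endpoint_id="1234567890",
    instances=[{"amount": 42.5, "merchant_category": "grocery"}],
)
print(url)
print(body)
```

Batch prediction differs mainly in that the input is a file or table reference rather than a list of instances in the request, so there is no per-request payload like this one.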
Model Registry, traffic splitting, and monitoring
Both modes share the same Vertex AI Model Registry, which versions every deployed model and links it back to the training run, the dataset version, and the evaluation metrics. Vertex AI Endpoints support traffic splitting for canary releases (e.g., send 10% of requests to model version 2 while 90% still go to version 1) and model monitoring for detecting input drift and prediction skew. For business audiences, this is the difference between "we trained a model" and "we have a model running in production with proper guardrails".
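The 90/10 canary split described above can be simulated in a few lines. This is a toy model of weighted routing using the standard library, not the mechanism the managed endpoint uses internally. The version names and the random seed are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def route_request(traffic_split: Dict[str, int], rng: random.Random) -> str:
    """Pick a model version for one request according to percentage weights."""
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    versions = list(traffic_split)
    weights = [traffic_split[version] for version in versions]
    return rng.choices(versions, weights=weights, k=1)[0]


split = {"model-v1": 90, "model-v2": 10}
rng = random.Random(7)  # fixed seed so the simulation is repeatable
counts = Counter(route_request(split, rng) for _ in range(10_000))
print(counts)
```

Roughly one request in ten reaches the new version. If its error rate or latency looks wrong, the split is set back to 100/0 and most users never notice. That is the guardrail the canary pattern buys.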
Vertex AI Pipelines — ML Workflow Orchestration
Vertex AI Pipelines runs serverless ML workflows defined as Kubeflow Pipelines or TFX. A pipeline is a directed acyclic graph (DAG) of steps — ingest data, validate schema, transform features, train model, evaluate, register, deploy. Each step runs in its own container, with inputs and outputs tracked automatically. Pipelines can be triggered manually, on a schedule via Cloud Scheduler, or on events via Eventarc (for example, when new training data lands in Cloud Storage).
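A DAG simply records which steps must finish before another can start. The sketch below expresses the steps named above as a dependency map and asks the Python standard library (graphlib, Python 3.9+) for a valid execution order. It illustrates the idea only: it is not the Kubeflow Pipelines or TFX SDK, and the step names are shorthand.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Each step maps to the set of steps that must complete before it runs.
PIPELINE: Dict[str, Set[str]] = {
    "ingest": set(),
    "validate": {"ingest"},
    "transform": {"validate"},
    "train": {"transform"},
    "evaluate": {"train"},
    "register": {"evaluate"},
    "deploy": {"register"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every step runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(PIPELINE)
print(order)
# ['ingest', 'validate', 'transform', 'train', 'evaluate', 'register', 'deploy']
```

Because the dependencies are declared as data, an orchestrator can re-run the whole chain on a schedule, skip steps whose inputs have not changed, and record which outputs came from which inputs.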
Vertex AI Pipelines is not the same as Cloud Workflows or Cloud Composer. Vertex AI Pipelines is purpose-built for ML workflows with native model and dataset tracking. Cloud Workflows is a general-purpose serverless orchestrator for API calls and microservices. Cloud Composer is managed Apache Airflow for general data engineering. On the CDL exam, if the scenario explicitly mentions ML model training and deployment, the answer is Vertex AI Pipelines — not Cloud Workflows. See https://cloud.google.com/vertex-ai/docs/pipelines/introduction.
Vertex AI Feature Store — Centralized Feature Management
What is a feature and why centralize it
Vertex AI Feature Store is a centralized repository for ML features. A feature is a measurable property derived from raw data — for example, "average order value over the last 30 days" or "number of failed login attempts in the last hour". Without a feature store, every data-science team computes features independently, leading to inconsistent definitions and the dreaded training-serving skew where a model trained on one feature definition is served with a slightly different one in production.
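Training-serving skew is easier to see with a concrete feature. The sketch below defines "average order value over the last 30 days" once and calls the same function for both training and serving. The function name, the sample orders, and the choice to return 0.0 for an empty window are all hypothetical. A feature store does this at platform scale; the code only shows the principle.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Order = Tuple[datetime, float]  # (order timestamp, order value)


def avg_order_value_30d(orders: List[Order], as_of: datetime) -> float:
    """Average value of orders placed in the 30 days up to and including as_of."""
    window_start = as_of - timedelta(days=30)
    values = [value for ts, value in orders if window_start < ts <= as_of]
    # Assumption: customers with no recent orders get 0.0 rather than a missing value.
    return sum(values) / len(values) if values else 0.0


orders = [
    (datetime(2024, 1, 5), 40.0),
    (datetime(2024, 1, 20), 60.0),
    (datetime(2023, 11, 1), 500.0),  # outside the 30-day window
]
as_of = datetime(2024, 1, 31)

training_value = avg_order_value_30d(orders, as_of)  # used to build the training set
serving_value = avg_order_value_30d(orders, as_of)   # used at prediction time
print(training_value, serving_value)  # 50.0 50.0
```

Skew appears when two teams write this logic separately, for example one using a 30-day window and the other a calendar month. Sharing one definition removes that class of bug.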
Storage variants — classic and BigQuery-based
Vertex AI Feature Store solves this by storing canonical feature definitions and serving them at low latency. There are two storage variants: the original Feature Store (online + offline storage) and the newer Feature Store (BigQuery-based) which uses BigQuery as the offline source of truth and Bigtable for online serving. Both ensure the same feature definition is used for training and real-time prediction, eliminating an entire category of production bugs.
Vertex AI Model Garden — The Catalog of Foundation Models
Vertex AI Model Garden is the catalog of pre-built foundation models available on Google Cloud. It includes Google's own Gemini family (Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 1.0 Pro), Google open-weight models like Gemma, third-party models like Anthropic's Claude (Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku), Meta's Llama family, Mistral, and hundreds of open-source models from Hugging Face. Each model in the Garden can be deployed to a Vertex AI endpoint with one click, fine-tuned with custom data, or used as the foundation of a Retrieval-Augmented Generation (RAG) application.
Vertex AI Studio is the entry point for Generative AI; Model Garden is the catalog of foundation models. Studio is where you write prompts, design templates, and fine-tune models. Model Garden is where you browse and select which foundation model to use — Gemini, Claude, Llama, Gemma, Mistral, and many open-source options. The two work together: pick a model in Model Garden, then prompt or fine-tune it in Vertex AI Studio. See https://cloud.google.com/model-garden.
Vertex AI Studio — The Generative AI Workspace
Four primary capabilities
Vertex AI Studio (sometimes called Generative AI Studio in older docs) is the unified workspace for building Generative AI applications. It provides four primary capabilities: prompt design (interactive prompt editor with side-by-side model comparison), fine-tuning (supervised fine-tuning, RLHF, and adapter tuning of foundation models), embedding generation (text and multimodal embeddings for semantic search), and grounding and RAG (connect foundation models to your own data via Vertex AI Search or custom vector databases).
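Embeddings and semantic search can sound abstract, so here is a minimal sketch of the retrieval step behind RAG: each document and the query are represented as vectors, and the closest document by cosine similarity is selected. The three-dimensional vectors and document names are invented for illustration. Real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_match(query: List[float], documents: Dict[str, List[float]]) -> str:
    """Return the name of the document whose vector is closest to the query."""
    return max(documents, key=lambda name: cosine_similarity(query, documents[name]))


# Hand-made toy vectors standing in for real embeddings.
documents = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # imagine this encodes "how do I get my money back?"

best = top_match(query_vector, documents)
print(best)  # refund-policy
```

In a grounded application, the text of the best-matching document is then passed to the foundation model alongside the user's question, so the answer draws on company data rather than the model's general knowledge.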
Business workflow for building Generative AI apps
For Cloud Digital Leaders, Vertex AI Studio is the answer to "how does our company actually build a chatbot or a content-generation tool on Google Cloud?". The flow is: open Vertex AI Studio, pick a model from Model Garden (e.g., Gemini 1.5 Pro), iterate on prompts in the prompt designer, optionally fine-tune with company-specific examples, then export the prompt as an API call that your application can use. The entire flow is governed by the same IAM and audit-logging stack as the rest of Vertex AI, so enterprise governance is preserved.
MLOps is the core business benefit of Vertex AI over DIY infrastructure. Running TensorFlow on Compute Engine VMs or building your own Kubeflow cluster on GKE will work — but the team must then build experiment tracking, feature versioning, model registry, endpoint autoscaling, drift monitoring, and explainability themselves. Vertex AI provides all of these out of the box, which is why CDL exam scenarios that mention "managed", "fully integrated", "minimal operational overhead", or "consistent governance" point to Vertex AI rather than self-managed alternatives. See https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform.
Vertex AI vs BigQuery ML — A Practical Comparison
BigQuery ML lets analysts train ML models directly inside BigQuery using SQL — CREATE MODEL ... OPTIONS(model_type='logistic_reg'). It is the right choice when the data is already in BigQuery, the team is SQL-fluent, and the model type is one of the supported algorithms (linear regression, logistic regression, k-means, matrix factorization, ARIMA forecasting, boosted trees, deep neural networks, and AutoML Tables). For more advanced needs — custom architectures, distributed training, GPU/TPU acceleration, MLOps lifecycle — Vertex AI is the better fit.
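To show how little code the SQL route involves, the sketch below assembles a complete CREATE MODEL statement as a string. The dataset, table, and column names are hypothetical, and the statement is only built, not executed. It assumes the common pattern of naming the label column through input_label_cols.

```python
def build_create_model_sql(
    model: str,
    source_table: str,
    label_column: str,
    model_type: str = "logistic_reg",
) -> str:
    """Build a BigQuery ML training statement from trusted, hard-coded names.

    Identifiers are interpolated directly, so never pass user-supplied text.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical churn model trained on a table the analysts already own.
sql = build_create_model_sql(
    model="analytics.churn_model",
    source_table="analytics.customer_features",
    label_column="churned",
)
print(sql)
```

The entire training job is one SQL statement over data that never leaves BigQuery, which is why this path suits SQL-fluent analyst teams.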
You do not have to choose between BigQuery ML and Vertex AI. BigQuery ML can register models into the Vertex AI Model Registry, and Vertex AI Pipelines can call BigQuery ML steps as part of a larger workflow. A mature data team often uses BigQuery ML for rapid prototyping by analysts and then promotes successful experiments into Vertex AI for production deployment with full MLOps controls. See https://cloud.google.com/bigquery/docs/bqml-vertex-ai.
Vertex AI vs Pre-trained APIs — When Is Each the Right Answer?
Pre-trained APIs (Cloud Vision, Cloud Natural Language, Speech-to-Text, Cloud Translation, Document AI) are Google-trained models exposed via simple REST endpoints. They are the right answer for generic problems where the data does not need to be customer-specific. Vertex AI (Custom Training or AutoML) is the right answer for problems unique to the business — proprietary product defects, industry-specific document classification, customer-specific behavior prediction. Many production systems combine both: a pre-trained Vision API extracts text from an invoice, and a custom Vertex AI model classifies the invoice into the company's specific vendor categories.
MLOps is the set of practices that apply DevOps principles to machine learning: version control for data and models, automated training pipelines, continuous integration of model code, continuous delivery of model artifacts, monitoring of model performance in production, and rollback procedures when models degrade. Vertex AI provides built-in MLOps tooling — Experiments, Model Registry, Pipelines, Model Monitoring — so teams do not have to assemble these capabilities from open-source components. See https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
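Monitoring in production usually means comparing what the model sees today with what it saw in training. The sketch below does this for one categorical feature using the L-infinity distance, which is the largest change in any category's share. This is one common drift statistic, chosen here for simplicity; the payment-method data and the alert threshold are hypothetical. The text above does not say which statistic the managed monitoring service applies.

```python
from collections import Counter
from typing import Dict, Iterable


def category_shares(values: Iterable[str]) -> Dict[str, float]:
    """Fraction of observations falling in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """Largest absolute change in share across all categories seen in either set."""
    categories = set(baseline) | set(current)
    if not categories:
        return 0.0
    return max(abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories)


# Payment method at training time versus in this week's serving traffic.
training = ["card"] * 70 + ["cash"] * 30
serving = ["card"] * 40 + ["cash"] * 30 + ["wallet"] * 30

distance = l_infinity_distance(category_shares(training), category_shares(serving))
ALERT_THRESHOLD = 0.2  # hypothetical; tuned per feature in practice
drifted = distance > ALERT_THRESHOLD
print(round(distance, 2), drifted)  # 0.3 True
```

Here a payment method that did not exist at training time now accounts for 30% of traffic, so the alert fires. That is the signal to retrain the model or roll back to an earlier version.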
Pricing Model and Cost Considerations
Component-based pricing breakdown
Vertex AI pricing is component-based. Workbench charges for the underlying VM by the second. Training charges per machine-hour with separate rates for CPU, GPU, and TPU. Prediction charges per node-hour for online endpoints (with autoscaling) and per prediction for batch jobs. AutoML has its own training-hour rates that include the architecture-search compute. Generative AI is priced per 1,000 input and output characters or tokens, varying by model (Gemini Pro is cheaper than Gemini Ultra; Gemini Flash is cheaper still). Pipelines charges for the underlying compute used by each step. Feature Store charges for stored feature data and online serving nodes.
Cost-optimization levers for CDL scenarios
For CDL-level recommendations, the key cost levers are: turn off Workbench instances when not in use, use batch prediction instead of online endpoints when sub-second latency is not required, schedule training in lower-cost regions, and start with Gemini Flash before stepping up to Gemini Pro for Generative AI workloads.
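The first lever, switching off idle Workbench instances, is simple arithmetic. The sketch below compares an always-on instance with one that runs only during working hours. The hourly rate and the working pattern are hypothetical placeholders, not published prices.

```python
HOURS_PER_MONTH = 730  # average number of hours in a calendar month


def monthly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of one resource billed by running time."""
    if hourly_rate < 0 or hours_running < 0:
        raise ValueError("rate and hours must be non-negative")
    return hourly_rate * hours_running


HYPOTHETICAL_RATE = 0.50  # dollars per hour; look up the real rate for your machine type

always_on = monthly_cost(HYPOTHETICAL_RATE, HOURS_PER_MONTH)
working_hours_only = monthly_cost(HYPOTHETICAL_RATE, 8 * 22)  # 8 hours x 22 working days
savings = always_on - working_hours_only

print(always_on, working_hours_only, savings)  # 365.0 88.0 277.0
```

Under these assumptions, shutting the instance down outside working hours cuts its bill by about three quarters. The same calculation applies to any resource billed per hour of uptime.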
Security and Governance
IAM, VPC Service Controls, CMEK, and Private Service Connect
Vertex AI inherits Google Cloud's security model. IAM controls access to every Vertex AI resource — datasets, models, endpoints, pipelines — with predefined roles like roles/aiplatform.user and roles/aiplatform.admin. VPC Service Controls can place all Vertex AI APIs inside a security perimeter to prevent data exfiltration. Customer-Managed Encryption Keys (CMEK) encrypt model artifacts, training data, and predictions with keys controlled by the customer via Cloud KMS. Private Service Connect allows Vertex AI Prediction endpoints to be reached only over private IPs, never the public internet.
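Granting access in IAM comes down to attaching members to roles. The sketch below models a policy as a plain dictionary in the familiar bindings layout and adds a data scientist to roles/aiplatform.user. The helper function and the email addresses are hypothetical, and no Google Cloud API is called.

```python
from typing import Any, Dict


def add_binding(policy: Dict[str, Any], role: str, member: str) -> Dict[str, Any]:
    """Add member to role in place, creating the binding if needed; returns policy."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy


policy: Dict[str, Any] = {"bindings": []}
add_binding(policy, "roles/aiplatform.user", "user:data-scientist@example.com")
add_binding(policy, "roles/aiplatform.user", "user:analyst@example.com")
add_binding(policy, "roles/aiplatform.admin", "group:ml-platform@example.com")
# Adding the same member twice leaves the policy unchanged.
add_binding(policy, "roles/aiplatform.user", "user:analyst@example.com")
print(policy)
```

The governance point is least privilege: most users need only the user role, and the admin role is reserved for the small group that manages the platform.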
Compliance certifications and audit logging
For regulated industries, Vertex AI is HIPAA-eligible and supports PCI DSS, ISO 27001, SOC 1/2/3, and FedRAMP High compliance. Audit logs flow into Cloud Logging, capturing every API call and every model deployment, plus every prediction request when data-access logs are enabled. For more on cloud-native security baselines, see the Google Cloud databases topic, which covers related governance principles.
Common Use Cases by Industry
Retail, financial services, and healthcare
Retail uses Vertex AI for demand forecasting (AutoML Tables), personalized recommendations (custom matrix-factorization models), and visual search (Vertex AI Vision + custom embeddings). Financial services use Vertex AI for fraud detection (custom XGBoost models with low-latency online prediction), credit-risk scoring, and document understanding (Document AI plus custom downstream classifiers). Healthcare uses Vertex AI for medical image classification (custom CNN models on de-identified images), clinical-trial matching, and Generative AI for patient-facing chatbots grounded in approved medical guidelines.
Manufacturing, media, and public sector
Manufacturing uses Vertex AI for predictive maintenance (time-series models on sensor data), defect detection on production lines (AutoML Vision with custom factory images), and supply-chain optimization. Media and entertainment use Generative AI on Vertex AI for content generation, automatic summarization, and personalized news feeds. Public sector uses Vertex AI with strict CMEK and VPC Service Controls for citizen-facing services, with full audit trails required by government oversight.
Frequently Asked Questions
Is Vertex AI the same as AI Platform?
No. AI Platform was the previous-generation Google Cloud ML product, focused on custom-model training and serving. In 2021 Google merged AI Platform with AutoML into a single unified platform called Vertex AI. New projects should use Vertex AI; AI Platform is deprecated. Every new ML capability (Pipelines, Feature Store, Model Garden, Vertex AI Studio) has been launched inside Vertex AI rather than the legacy AI Platform.
When should I use Vertex AI AutoML versus Custom Training?
Use AutoML when the team lacks deep ML expertise, the data fits a supported type (tabular, image, text, video), and time-to-value matters more than squeezing the last 1% of accuracy. Use Custom Training when the team has data scientists who can write TensorFlow or PyTorch code, the problem requires a custom architecture (e.g., a graph neural network), or the model must be trained at very large scale across many GPUs or TPU pods.
How is Vertex AI different from BigQuery ML?
BigQuery ML lets analysts train models with SQL inside BigQuery, which is ideal when data already lives there and the team is SQL-fluent. Vertex AI offers the full ML lifecycle: Workbench notebooks, custom training, AutoML, Pipelines, Feature Store, Model Registry, online and batch prediction, and Vertex AI Studio. The two are complementary: BigQuery ML for rapid prototyping by analysts; Vertex AI for production-grade MLOps.
What is the difference between Vertex AI Studio and Model Garden?
Model Garden is the catalog where you browse foundation models — Gemini, Claude, Llama, Gemma, Mistral, and open-source models from Hugging Face. Vertex AI Studio is the workspace where you actually work with those models: writing prompts, fine-tuning with custom data, generating embeddings, and building Retrieval-Augmented Generation (RAG) applications. Pick a model in Model Garden, then use Vertex AI Studio to prompt or tune it.
Does Vertex AI support models from Anthropic, Meta, and Mistral?
Yes. Vertex AI Model Garden is a multi-model catalog. Anthropic's Claude family (3.5 Sonnet, 3 Opus, 3 Haiku), Meta's Llama family, Mistral, and many other third-party and open-source models are available alongside Google's own Gemini family. This multi-model strategy lets customers pick the best model for each use case without leaving Google Cloud's security perimeter, governance stack, and unified billing.
Can Vertex AI run on-premises or only in Google Cloud?
Vertex AI is a fully managed Google Cloud service and runs in Google's regions. For hybrid scenarios, Vertex AI Prediction models can be exported and deployed to Google Distributed Cloud or to edge devices via Vertex AI Edge Manager workflows. The training and orchestration plane remains in Google Cloud, but inference can extend to on-premises or edge locations when latency or data-residency requires it.