Introduction to Governing AI with Model Monitoring
Governing AI with Model Monitoring on Google Cloud means treating every deployed model the way a regulated bank treats its trading desk: every input is logged, every prediction is auditable, every behavior change triggers a review. Vertex AI Model Monitoring, Model Registry, Model Garden controls, Safety filters, and ML Metadata lineage form a single fabric that turns experimental notebooks into accountable production systems. For the PDE exam, you need to know which signal each tool catches, what threshold to set, and which artifact proves compliance after a drift event.
Plain English Explanation
Think of Model Monitoring as a Restaurant Health Inspector
A new restaurant passes inspection on opening day. Six months later the freezer broke twice, the supplier swapped beef vendors, and the chef changed shifts. Nothing alone failed, but the food tastes different and customers complain. A health inspector who visits weekly catches the broken freezer log, the new supplier paperwork, and the shift change before anyone gets sick. Model Monitoring plays the inspector role: training-serving skew is the broken freezer, feature drift is the new supplier, prediction drift is the customer complaint trend, and the alert is the citation pinned to the kitchen door.
Think of Model Registry as an Airline Maintenance Logbook
Every commercial aircraft has a logbook listing each part installed, every inspection date, every pilot who flew it, and every incident report. Mechanics never service a plane without reading the log first. Vertex AI Model Registry is the same logbook for models. Each version records which dataset trained it, which pipeline produced it, which evaluation passed, and which endpoint serves traffic. When a regulator asks why your loan model denied an application in March, you open the logbook and trace the model version, the training data snapshot, and the feature lineage in minutes instead of weeks.
Think of Safety Filters as a Mailroom Screening Machine
A government mailroom runs every package through an X-ray and chemical sniffer before it reaches an executive's desk. Most mail passes through untouched, but the screener stops the suspicious envelope. Vertex AI Safety filters work the same way for generative AI: every prompt and every response passes through harassment, hate-speech, sexually-explicit, and dangerous-content classifiers. A blocked prompt is refused before generation starts, a blocked completion never reaches the user, and the audit log records the block reason with a confidence score.
Think of Lineage Tracking as a Supply-Chain Barcode
A supermarket apple carries a barcode that traces back to the orchard, the picking date, the truck route, and the warehouse temperature log. If a salmonella outbreak hits, the recall covers the exact lots within hours. ML Metadata lineage gives the same barcode to every model artifact: this model came from that fine-tune job, which used this base model from Model Garden, which trained on this dataset version, which was filtered by this Dataflow job, which read from this BigQuery table snapshot. One contaminated feature, one full traceback.
Core Concepts of Governing AI with Model Monitoring
Vertex AI Model Monitoring watches three signal categories that map to three failure modes you must memorize for the PDE exam. Training-serving skew compares the statistical distribution of features sent to a deployed endpoint against the distribution of features in the training dataset baseline. The baseline statistics are computed once, when you configure the job; from then on the job samples live prediction requests and computes a divergence metric per feature. Prediction drift compares the current production feature distribution against an earlier production window, usually the previous day or a rolling 7-day baseline. Drift catches gradual changes that skew misses, because the training data stops being the right comparison once the world has moved on. Feature attribution drift uses Vertex Explainable AI to compare the importance ranking of features over time. Even when raw feature distributions look stable, the model may start relying on different features to make decisions, which is an early warning of concept drift.
The divergence math matters less than the threshold defaults. For numerical features Vertex AI uses Jensen-Shannon divergence with a default alert threshold of 0.3. For categorical features it uses the L-infinity distance, also defaulted at 0.3. For attribution drift the default threshold is also 0.3 on the normalized attribution score. You can override per feature, which is essential when one feature is naturally noisy and another should never move.
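The two distance measures named above can be sketched in a few lines. This is a minimal illustration, not Vertex AI's implementation: it assumes features have already been bucketed into histogram counts, and it assumes a base-2 logarithm so that the Jensen-Shannon value falls between 0 and 1. The function names are hypothetical.

```python
import math

DEFAULT_THRESHOLD = 0.3  # default alert threshold cited in the text


def _normalize(counts):
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence with log base 2 (assumed), range [0, 1]."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    p, q = _normalize(p_counts), _normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Bins where x is zero contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def l_infinity(p_counts, q_counts):
    """Largest absolute gap between two categorical distributions."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same categories")
    p, q = _normalize(p_counts), _normalize(q_counts)
    return max(abs(a - b) for a, b in zip(p, q))


def breaches(distance, threshold=DEFAULT_THRESHOLD):
    """True when a feature's distance exceeds its alert threshold."""
    return distance > threshold
```

For example, a categorical feature whose baseline split is 50/50 and whose serving split is 90/10 has an L-infinity distance of 0.4, which would exceed the 0.3 default.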
Beyond the monitoring runtime, governance pulls in Model Registry for version control, Model Cards for documentation, ML Metadata for lineage graphs, and Vertex AI Experiments for A/B test bookkeeping. None of these tools work in isolation; they form a graph where every node references the next.
Training-serving skew: the statistical difference between the feature distribution used to train a model and the feature distribution observed at serving time. Skew indicates a pipeline mismatch, a feature engineering bug, or a sampling difference between training and production. See Vertex AI skew detection.
Vertex AI Model Monitoring default alert thresholds are all 0.3: Jensen-Shannon divergence for numerical features, L-infinity distance for categorical features, and the normalized attribution score for feature attribution drift. Sampling rates range from 10 percent (high-volume endpoints above 1000 RPS) to 100 percent (low-volume endpoints), and every alert is emitted as a Cloud Monitoring metric, not just an email.
Architecture and Design Patterns
A production-grade governance architecture on Vertex AI usually arranges seven layers. First, a data ingestion layer captures features and labels in BigQuery or Cloud Storage with snapshot timestamps. Second, a feature engineering layer built on Dataflow or BigQuery writes processed features to Vertex AI Feature Store, which becomes the single source of truth for both training reads and online serving reads. Third, a training pipeline built on Vertex AI Pipelines (Kubeflow) reads from Feature Store, trains the model, and registers the artifact in Model Registry with a version tag, a Model Card, and an evaluation slice report.
Fourth, an endpoint layer deploys the registered model to a Vertex AI Endpoint with traffic split rules so a new version receives 5 percent or 10 percent of traffic during canary testing. Fifth, a monitoring layer attaches a Model Monitoring job to the endpoint with three sub-jobs: skew, drift, and attribution drift. Sixth, an alerting layer routes Model Monitoring violations through Cloud Monitoring into PagerDuty, Slack, or an Eventarc trigger that calls a Cloud Function to gate a retraining pipeline. Seventh, an audit layer persists every prediction request and response into BigQuery via the request-response logging feature, where Looker Studio dashboards and BigQuery scheduled queries surface fairness metrics across protected groups.
For generative AI workloads the pattern shifts. Model Garden becomes the source of base models. Fine-tuning happens through a Vertex AI Tuning Job that records the parent model ID, the tuning dataset, and the hyperparameter set in ML Metadata. The fine-tuned model registers in Model Registry as a child node linked to the Garden parent. Safety filters wrap the deployed endpoint, and prompt-response logs flow into BigQuery with the safety attribute scores attached so you can audit blocks.
Model Monitoring on a Vertex AI Endpoint requires that you enable prediction request-response logging to BigQuery before the monitoring job can sample live traffic. Without logging enabled, the monitoring job has nothing to read and silently produces no signals. See request-response logging.
GCP Service Deep Dive
Vertex AI Model Monitoring
Model Monitoring runs as a managed scheduled job tied to one endpoint. You define the objective configuration with a baseline (training dataset URI for skew, a previous time window for drift), a sampling rate (typically 10 percent to 100 percent of prediction requests), and a per-feature threshold map. The job samples logged prediction requests from BigQuery, computes the divergence per feature, writes results to a monitoring statistics bucket, and emits an alert when any feature crosses its threshold. Monitoring v2 adds support for tabular, custom-container, and AutoML models, and lets you monitor against the Feature Store as the baseline rather than a static dataset URI, which keeps the baseline fresh as you retrain.
Vertex AI Model Registry
Model Registry stores model versions in a parent-child structure. A Model resource represents the logical model (for example loan-approval-v2), and each registered artifact becomes a Model Version under it. Versions carry labels, descriptions, evaluation results, deployment history, and a link to the producing Vertex AI Pipeline run. Aliasing lets you point a stable name like production or champion at a specific version, so canary deployments can flip the alias atomically once monitoring confirms the new version is healthy.
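The alias flip described above can be pictured with a toy in-memory model. This is only a sketch of the idea: the alias map, the version strings, and the `promote_alias` helper are hypothetical and do not call the Model Registry API.

```python
def promote_alias(aliases, alias, new_version, monitoring_healthy):
    """Return a new alias map; the alias moves only if monitoring is healthy."""
    updated = dict(aliases)  # copy so the caller's map is never half-updated
    if monitoring_healthy:
        updated[alias] = new_version
    return updated


# Hypothetical state: version "3" serves production, version "4" is the canary.
current = {"production": "3", "canary": "4"}
after_good_canary = promote_alias(current, "production", "4", monitoring_healthy=True)
after_bad_canary = promote_alias(current, "production", "4", monitoring_healthy=False)
```

Because callers resolve the stable name rather than a version number, moving the alias is the only change needed to switch what they get.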
Model Garden
Model Garden is the catalog of first-party Google models (Gemini, Imagen, Chirp, Codey, Veo), partner models (Anthropic Claude, Meta Llama, Mistral), and open-source weights. Governance hooks include per-model usage policies, regional availability filters that enforce data residency, and an organization policy constraint (constraints/aiplatform.allowedModels) that lets a security admin restrict which Garden entries developers may deploy. Every Garden model deployment registers in Model Registry, so even base-model usage shows up in your inventory.
Vertex AI Safety Filters
Safety filters apply to Gemini and other Vertex AI generative endpoints. Four harm categories are monitored by default: harassment, hate speech, sexually explicit content, and dangerous content. Each category returns a probability score (NEGLIGIBLE, LOW, MEDIUM, HIGH) and a severity score on the response. You set a block threshold per category (BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, BLOCK_LOW_AND_ABOVE), and the API short-circuits any response exceeding the threshold with a finishReason: SAFETY. For prompt safety, the same scoring applies on the input side so you can refuse processing before token generation begins.
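The relationship between probability levels and block thresholds can be sketched as a small lookup. This illustrates the semantics described above in plain Python; it is not the Vertex AI SDK, and the lowercase category keys and helper names are assumptions made for the example.

```python
PROBABILITY_ORDER = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]

# Lowest probability level each threshold blocks; None means never block.
BLOCK_FROM = {
    "BLOCK_NONE": None,
    "BLOCK_ONLY_HIGH": "HIGH",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_LOW_AND_ABOVE": "LOW",
}


def is_blocked(probability, threshold):
    """True when the rated probability reaches the threshold's floor."""
    floor = BLOCK_FROM[threshold]
    if floor is None:
        return False
    return PROBABILITY_ORDER.index(probability) >= PROBABILITY_ORDER.index(floor)


def blocked_categories(ratings, settings):
    """Return the categories whose rating trips their configured threshold."""
    return sorted(
        category
        for category, probability in ratings.items()
        if is_blocked(probability, settings[category])
    )


# Hypothetical per-category configuration and one response's ratings.
settings = {
    "harassment": "BLOCK_MEDIUM_AND_ABOVE",
    "hate_speech": "BLOCK_LOW_AND_ABOVE",
    "sexually_explicit": "BLOCK_ONLY_HIGH",
    "dangerous_content": "BLOCK_NONE",
}
ratings = {
    "harassment": "LOW",
    "hate_speech": "LOW",
    "sexually_explicit": "MEDIUM",
    "dangerous_content": "HIGH",
}
```

With these values only `hate_speech` is blocked: LOW reaches the BLOCK_LOW_AND_ABOVE floor, while the other ratings stay below their floors or have blocking disabled.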
Vertex AI Model Cards
A Model Card is a structured Markdown-and-YAML document attached to a Model Registry version. It captures the intended use, out-of-scope use, training dataset, evaluation slices, fairness considerations, ethical risks, and contact owners. The Model Card Toolkit auto-generates many sections from the training pipeline outputs, and you publish the rendered card to a documentation portal for stakeholder review. Auditors love Model Cards because they encode the answers to common compliance questions in a fixed schema.
ML Metadata and Lineage
Vertex ML Metadata records every artifact (datasets, models, metrics) and every execution (training jobs, evaluation jobs, deployments) as nodes in a graph, with edges marking input-output relationships. The lineage graph is queryable through the API and visualizable in the console. For a fine-tuned generative model, the graph traces from the Garden base model, through the tuning dataset, through the tuning job execution, to the resulting Model Registry version, then to the endpoint deployment, and finally to the monitoring job watching it.
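An upstream lineage walk of the kind described above might look like the following. The graph here is a hand-written dictionary with hypothetical node names, standing in for what the ML Metadata API would return; the traversal itself is an ordinary breadth-first search.

```python
from collections import deque

# Hypothetical lineage: each node maps to the nodes it was produced from.
UPSTREAM = {
    "monitoring-job": ["endpoint-deployment"],
    "endpoint-deployment": ["registry-version"],
    "registry-version": ["tuning-job"],
    "tuning-job": ["garden-base-model", "tuning-dataset"],
    "garden-base-model": [],
    "tuning-dataset": [],
}


def trace_upstream(graph, start):
    """Return every ancestor of `start`, nearest first (breadth-first)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


provenance = trace_upstream(UPSTREAM, "registry-version")
```

Starting from the registered version, the walk reaches the tuning job first and then its two inputs, the base model and the tuning dataset.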
Vertex AI Experiments
Experiments track A/B test results across model versions. Each experiment run records hyperparameters, metrics, model artifacts, and execution metadata. When you run a champion-challenger comparison, both versions log to the same experiment, and the comparison table renders side-by-side metrics. The experiment ID is also recorded in ML Metadata, closing the loop between A/B testing and lineage.
Use Vertex AI Feature Store as the monitoring baseline rather than a static GCS dataset whenever your features evolve. The baseline updates with each Feature Store snapshot, so you compare against current ground truth instead of a stale training file from last quarter. See Feature Store baselines.
Common Pitfalls and Trade-offs
Teams new to Model Monitoring almost always trip on the same problems. The first is silent baseline drift in the training dataset itself: someone replaces the training file in GCS without updating the monitoring job's baseline URI, and the monitoring job now compares production against data the deployed model was never trained on. Lock baseline URIs to immutable, versioned object paths and treat baseline updates as a deliberate change management step.
The second is threshold fatigue. Setting the default 0.3 threshold across every feature in a 200-feature model produces noise. Engineers start ignoring alerts within a week. The fix is feature-by-feature triage during onboarding: classify features into "must never drift" (PII, demographic), "expected to drift slowly" (counters, dates), and "noisy by nature" (sensor readings). Set tight thresholds on the first group, medium on the second, and loose on the third.
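The triage step can be captured as a small mapping from tier to threshold. The tier names, the numeric values, and the feature names below are illustrative assumptions, not Vertex AI defaults; only the 0.3 fallback comes from the text.

```python
DEFAULT_THRESHOLD = 0.3  # default cited in the text, used for unclassified features

# Assumed tier values for illustration: tight, medium, loose.
TIER_THRESHOLDS = {
    "must_never_drift": 0.15,
    "slow_drift": 0.25,
    "noisy": 0.40,
}


def build_threshold_map(feature_tiers):
    """Map each feature to its tier threshold, falling back to the default."""
    return {
        feature: TIER_THRESHOLDS.get(tier, DEFAULT_THRESHOLD)
        for feature, tier in feature_tiers.items()
    }


def breaching_features(divergences, thresholds):
    """Return the features whose measured divergence exceeds their threshold."""
    return sorted(
        feature
        for feature, value in divergences.items()
        if value > thresholds.get(feature, DEFAULT_THRESHOLD)
    )


# Hypothetical features and one monitoring run's divergence values.
thresholds = build_threshold_map({
    "customer_age_band": "must_never_drift",
    "days_since_signup": "slow_drift",
    "sensor_temperature": "noisy",
    "not_yet_triaged": None,
})
alerts = breaching_features(
    {
        "customer_age_band": 0.20,
        "days_since_signup": 0.10,
        "sensor_temperature": 0.35,
        "not_yet_triaged": 0.31,
    },
    thresholds,
)
```

Here the noisy sensor reading at 0.35 stays quiet while the demographic feature at 0.20 alerts, which is the behavior the tiering is meant to produce.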
A third pitfall is logging cost shock. Request-response logging into BigQuery scales with prediction volume, and a 10K-request-per-second endpoint can write terabytes per day. Sample at 10 percent for high-volume endpoints, partition the BigQuery logging table by ingestion time, and set a 30-day expiration policy unless your compliance regime demands longer.
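A quick back-of-the-envelope calculation shows where the terabytes come from. The 2,000-byte average record size is an assumption made for this sketch; measure your own payloads before relying on the numbers.

```python
SECONDS_PER_DAY = 86_400


def daily_log_bytes(requests_per_second, bytes_per_record, sampling_rate):
    """Estimate bytes written per day for a given logging sampling rate."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be a fraction between 0 and 1")
    return requests_per_second * SECONDS_PER_DAY * bytes_per_record * sampling_rate


# Assumed: 10,000 requests per second, 2,000 bytes per logged record.
full_logging = daily_log_bytes(10_000, 2_000, 1.0)
sampled_logging = daily_log_bytes(10_000, 2_000, 0.10)
```

Under these assumptions full logging writes about 1.7 TB per day, and 10 percent sampling brings that down to roughly 0.17 TB per day.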
For generative AI, the dominant trap is assuming safety filters cover all your risk. The four default categories miss prompt injection, copyright leakage, and PII exfiltration. Layer Sensitive Data Protection (formerly DLP) on the prompt and response, plus a custom policy classifier for prompt-injection patterns, before relying solely on Vertex Safety filters.
A fourth pitfall is lineage gaps from manual steps. If a data scientist downloads a CSV, edits it locally, and re-uploads it as the training dataset, ML Metadata loses the link between the original BigQuery table and the trained model. Enforce that all training data flows through Dataflow or BigQuery jobs registered as Vertex AI Pipeline components, so every transformation appears in the lineage graph.
Do not deploy a tuned generative model to production without first running a safety eval suite against it. Fine-tuning can degrade the safety alignment of a base model, and the four default Vertex Safety categories may not catch the new failure modes introduced by your tuning data. See Vertex AI Safety overview.
Best Practices
- Lock the Model Monitoring baseline to a versioned GCS path or a Feature Store snapshot ID, never a mutable file location.
- Configure request-response logging to BigQuery before deploying the endpoint, with a partitioned table and a sampling rate appropriate to traffic volume.
- Tier your alert thresholds by feature criticality: tight on PII and demographics, medium on slow-moving features, loose on noisy sensor data.
- Attach a Model Card to every Model Registry version before deployment, and require Model Card sign-off as a release gate in your CI pipeline.
- Enforce the constraints/aiplatform.allowedModels org policy to restrict which Model Garden entries may be deployed in production projects.
- Wire monitoring alerts into Eventarc so threshold violations trigger a Vertex AI Pipeline retraining run automatically, with a human approval gate before the new version takes traffic.
- For generative endpoints, log prompt-response pairs with safety attribute scores into BigQuery, and run weekly fairness and bias scans across protected groups.
- Use Vertex AI Experiments for every A/B test, and link experiment runs to Model Registry versions so the lineage graph stays complete.
Real-World Use Case
A mid-sized European insurance company runs a claims fraud detection model serving 40K predictions per minute across home, auto, and travel insurance products. The model went live in 2023, was retrained quarterly, and missed a coordinated fraud ring in Q2 2024 that cost roughly 8 million euros. The post-mortem revealed three governance gaps: monitoring was configured but no one watched the Slack channel, the training data hadn't been refreshed since Q4 2023, and the feature attribution had silently shifted toward a postal-code feature that the fraud ring exploited.
The remediation rolled out Vertex AI Model Monitoring with skew, drift, and attribution drift jobs on the endpoint, sampling 25 percent of requests. Each feature received a tiered threshold: postal code and policyholder age were locked at 0.15, claim amount at 0.25, and free-text incident descriptions at 0.4. Alerts route through Cloud Monitoring into PagerDuty with a 15-minute SLA for the on-call data engineer. A second Eventarc rule triggers a Vertex AI Pipeline retraining run when attribution drift exceeds 0.3 on any top-five feature, with the new model version landing in Model Registry tagged staging for human review before promotion to production.
The company also adopted Model Cards for every version, with a Responsible AI section listing fairness metrics across age bands, regions, and policy types. Audit reports for the national financial regulator now pull directly from BigQuery prediction logs joined with Model Registry version metadata. Six months in, monitoring caught a postal-code drift event within four hours of its emergence, and the automated retrain pipeline shipped a corrected model the same day. Annual fraud losses dropped 38 percent, and the regulator dropped the company's compliance audit frequency from quarterly to annual based on the lineage evidence.
Exam Tips
The PDE exam tests Governing AI with Model Monitoring through scenario questions that ask you to pick the right tool for a specific symptom. Memorize the symptom-to-tool mapping. "The model worked in dev but predictions look wrong in prod" points to training-serving skew detection. "Predictions are gradually getting worse over months" points to prediction drift. "The model still scores well but is making decisions for different reasons" points to feature attribution drift via Vertex Explainable AI. "We need to prove which dataset trained the version that denied this loan" points to Model Registry plus ML Metadata lineage. "A regulator is asking about fairness across demographic groups" points to Model Cards plus sliced evaluation in Vertex AI Experiments.
For generative AI scenarios, watch for the words "harassment," "hate speech," "sexually explicit," or "dangerous content," which point to Vertex AI Safety filters and the four default harm categories. The block threshold options (BLOCK_NONE through BLOCK_LOW_AND_ABOVE) are exam-relevant. When the question says "restrict which foundation models developers can deploy," the answer is the constraints/aiplatform.allowedModels org policy, not IAM. When the scenario mentions tracing a tuned model back to its base, the answer is ML Metadata lineage, which links the Model Registry version to its Model Garden parent.
Pay attention to baseline choice questions. Static GCS file as baseline is correct for one-off skew detection. Feature Store snapshot as baseline is correct for continuously evolving features. A previous time window as baseline is correct for prediction drift. Mixing them up is a classic distractor pattern.
Vertex AI Model Monitoring writes every alert to Cloud Monitoring as a metric, not just as an email. This means you can build SLO dashboards, define burn-rate alerts, and integrate with any incident management tool that already consumes Cloud Monitoring. The exam often distinguishes "where do the alerts go" with this answer over a generic "email notification." See Model Monitoring alerts.
Frequently Asked Questions
How is training-serving skew different from prediction drift in Vertex AI Model Monitoring?
Skew is a comparison between the live serving distribution and the original training dataset baseline; it tells you the production data no longer resembles the data the model learned from. Drift is a comparison between the current serving window and an earlier serving window; it tells you the production data is changing over time relative to itself, regardless of training data. A new pipeline bug typically shows up as skew within hours. A seasonal customer behavior change typically shows up as drift over weeks. The detection job structure is similar, but the baseline differs.
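The point that only the baseline differs can be shown with one distance function applied twice. This is a hedged sketch: the distributions are made-up values for a hypothetical categorical feature, and L-infinity is used only because it is the categorical measure mentioned earlier.

```python
ALERT_THRESHOLD = 0.3  # default threshold cited in the text


def l_infinity(p, q):
    """Largest absolute gap between two categorical distributions (dicts)."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical share of requests per channel at three points in time.
training = {"web": 0.60, "mobile": 0.40}
last_week = {"web": 0.30, "mobile": 0.70}
today = {"web": 0.28, "mobile": 0.72}

skew = l_infinity(today, training)    # baseline: the training dataset
drift = l_infinity(today, last_week)  # baseline: an earlier serving window

skew_alert = skew > ALERT_THRESHOLD
drift_alert = drift > ALERT_THRESHOLD
```

With these numbers skew fires (about 0.32) while drift stays quiet (about 0.02): production moved away from the training data some time ago and has been stable since, so the two signals disagree even though the same live data feeds both.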
What does feature attribution drift catch that ordinary feature drift misses?
Ordinary feature drift watches the input distribution. Attribution drift watches which features the model relies on to make decisions, computed via Vertex Explainable AI integrated gradients or sampled Shapley values. A model can have stable input distributions yet shift its decision logic because of changing correlations among features. Attribution drift catches this concept-drift pattern early, often weeks before downstream business metrics degrade. It is the recommended monitoring signal for high-stakes models in finance, healthcare, and credit decisions.
How do Vertex AI Safety filters interact with fine-tuned generative models?
Safety filters apply to all Vertex AI generative endpoints regardless of whether the model is a Garden base model or a tuned variant. The four default harm categories (harassment, hate speech, sexually explicit, dangerous content) score every prompt and every response. However, fine-tuning can shift a model's safety behavior in unpredictable ways, so the recommended pattern is to run a safety eval suite against the tuned model before production deployment, then optionally tighten the block thresholds for the tuned endpoint. Safety scores are returned with every API response, so you can log and audit them even when no block is triggered.
What is recorded in Vertex AI Model Registry that is not stored elsewhere?
Model Registry uniquely records the Model Version metadata: version ID, alias mappings, deployment history across endpoints, evaluation summaries, lineage links to the producing pipeline, and the Model Card payload. The model artifact bytes live in GCS, the lineage graph lives in ML Metadata, and the prediction logs live in BigQuery, but Model Registry is the canonical join point that lets a single API call resolve "which version is currently serving the production alias and what is its full provenance."
How do I prove fine-tune lineage to an auditor on Google Cloud?
Open ML Metadata for the deployed model version and traverse the lineage graph upstream. The graph shows the Model Garden base model node, the tuning dataset artifact node, the tuning job execution node with its hyperparameters, and any preprocessing pipeline executions that produced the dataset. Export the graph as JSON via the API or render it visually in the Vertex AI console, then attach it to the audit response along with the Model Card and the evaluation slice report. This combination of artifacts answers virtually every regulator question about how a generative model came to make a specific decision.
Can I run Model Monitoring on a model deployed outside Vertex AI?
The managed Model Monitoring service is designed for Vertex AI Endpoints and Vertex AI Batch Prediction jobs. For models deployed on GKE, Cloud Run, or external infrastructure, you can replicate the pattern by logging prediction requests to BigQuery, running scheduled BigQuery SQL queries that compute Jensen-Shannon divergence against a baseline table, and triggering alerts via Cloud Monitoring custom metrics. The math is the same; you just lose the managed scheduling and the integrated UI. For PDE exam scenarios, assume the answer is Vertex AI Endpoints unless the scenario explicitly rules them out.
What is the recommended sampling rate for Model Monitoring on a high-traffic endpoint?
For endpoints serving more than 1000 requests per second, start at 10 percent sampling and confirm that the resulting BigQuery log volume is manageable and the divergence statistics are stable across runs. For lower-traffic endpoints, sample at 100 percent: the storage cost is negligible, and with fewer requests the divergence statistics need every data point to stay stable. Adjust the sampling rate after observing the variance of the divergence metric across a few weeks of monitoring runs; if the metric is stable at 10 percent, there is no benefit to going higher.
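The starting rule of thumb can be written down as a tiny helper. The function name and the decision to treat exactly 1000 requests per second as low traffic are assumptions for this sketch; the two rates and the cutoff come from the answer above.

```python
HIGH_TRAFFIC_RPS = 1000  # cutoff from the text


def starting_sampling_rate(requests_per_second):
    """Suggest an initial sampling fraction; tune it after observing variance."""
    if requests_per_second < 0:
        raise ValueError("requests_per_second cannot be negative")
    return 0.10 if requests_per_second > HIGH_TRAFFIC_RPS else 1.0
```

This only picks a starting point; the later adjustment based on the observed stability of the divergence metric remains a manual judgment.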
Related Topics
- Vertex AI Pipelines and MLOps covers the orchestration layer that produces the model versions registered in Model Registry and triggers the retraining runs that monitoring alerts initiate.
- Vertex AI Feature Store Design explains how Feature Store snapshots become the moving baseline for skew detection and the single source of truth for online and offline feature reads.
- Data Sovereignty and Compliance Design covers the regional and residency controls that govern where Model Garden models may run and where prediction logs may be stored.
Further Reading
- Vertex AI Model Monitoring overview — official documentation covering skew, drift, and attribution detection mechanics, threshold defaults, and configuration examples.
- Configure safety filters in Vertex AI — the canonical reference for the four harm categories, block threshold options, and per-request safety attribute scoring.
- Vertex AI Model Registry introduction — how to register, version, alias, and deploy models with full lineage and evaluation tracking.
- Vertex AI ML Metadata tracking — the lineage graph API, artifact and execution schemas, and queries for audit reporting.