Chapter: PDE (professional tier)
Professional Data Engineer
Editor's note: A study companion for the Professional Data Engineer exam. Every domain is rebuilt from scratch, with worked practice questions and an exam-grade timed simulation.
50 questions, 120 minutes, threshold 700/1000, 5 domains, official guide
Table of Contents
I. Design Data Processing Systems (22% weight)
  Batch and Streaming Pipeline Design
  Dataflow Architecture and Selection
  Dataproc and Hadoop/Spark Modernization
  Cloud Storage Data Lake Architecture
  Cloud Bigtable Schema and Performance Design
  Cloud Spanner Architecture for Data Engineering
  BigQuery Data Modeling and Performance
  Data Storage Security and IAM
  Data Sovereignty and Compliance
  PII Identification and De-identification
  Disaster Recovery for Data Platforms
  Cost Optimization for Data Processing

II. Ingest and Process Data (25% weight)
  Pub/Sub Messaging and Ingestion
  Datastream for Change Data Capture (CDC)
  Large Scale Data Transfer
  Apache Beam Programming Model
  Advanced Dataflow: Windowing and Watermarks
  Dataflow Prime and Pipeline Optimization
  Processing Data with Dataproc and Spark
  Dataform SQL Workflow Development
  Cloud Dataprep and Data Cleaning
  BigQuery Advanced SQL and UDFs
  Handling Duplicate and Corrupt Data

III. Store Data (20% weight)
  BigQuery Editions and Capacity Management
  BigQuery Storage and Performance Optimization
  BigLake: Unified Storage and Security
  BigQuery Omni Multi-cloud Analytics
  Dataplex: Data Governance and Catalog
  Implementing Data Mesh with Dataplex
  Cloud Bigtable Performance and Management
  Cloud Spanner Query Optimization
  Analytics Hub and Data Sharing
  Cloud SQL and AlloyDB for Analytical Workloads

IV. Prepare and Use Data for Analysis (15% weight)
  BigQuery BI Engine and Acceleration
  Looker and Looker Studio Integration
  BigQuery ML: Model Training
  BigQuery ML: Inference and Evaluation
  Vertex AI Pipelines and MLOps
  Vertex AI Feature Store
  Vector Search and Embeddings for GenAI
  Data Prep for LLMs and RAG
  Exploring Data with BigQuery Studio
  Governing AI with Vertex AI Model Monitoring

V. Maintain and Automate Data Workloads (18% weight)
  Monitoring and Logging for Data Pipelines
  Troubleshooting Data Processing Jobs
  Monitoring BigQuery Performance and Cost
  Orchestration with Cloud Composer
  Dataform CI/CD and Pipeline Automation
  Cloud Workflows for Data Operations
  Testing and Validation in Data Pipelines
  Performance Profiling and Tuning
  Managing Data Lifecycle Policies
  Data Lineage and Impact Analysis