Event-driven ingestion is the architectural style where data flows in response to events — a file lands in S3, a row is written to DynamoDB, a record is appended to a Kinesis stream — and downstream pipelines react automatically. AWS Lambda and Amazon EventBridge are the two services that anchor this style on AWS, and they sit squarely in DEA-C01 Domain 1 Tasks 1.1 (perform data ingestion), 1.3 (orchestrate data pipelines), and 1.4 (apply programming concepts). Community study guides from Tutorials Dojo, Digital Cloud Training, and ExamCert.App all flag the same pain points — candidates pick Lambda for long-running batch ETL (it cannot do that), conflate EventBridge Rules with EventBridge Pipes, misconfigure the Kinesis trigger batch size, and overlook EventBridge Scheduler as a successor to CloudWatch Events scheduled rules.
This guide is built for the data engineer perspective. It covers what event-driven ingestion architecture looks like, the Lambda execution model and its hard limits, every Lambda event source that matters for data engineering, EventBridge Rules vs Pipes vs Scheduler, S3 event notifications, DynamoDB Streams as a CDC source, the Lambda Kinesis trigger configuration including batch size and parallelization factor, the Lambda vs Glue vs Step Functions decision matrix, and the canonical exam traps that catch most data engineers. By the end, the event-driven ingestion surface should feel as natural as wiring up a smart home where each sensor triggers the right appliance.
What Is Event-Driven Ingestion?
Event-driven ingestion is the architectural pattern where producers emit events as they happen, an event bus or stream routes those events to consumers, and consumers process events asynchronously without coordinating with the producer. The DEA-C01 exam contrasts this with scheduled batch ingestion (a Glue job runs at midnight pulling yesterday's data) and request-response ingestion (a client calls an API and waits). Event-driven ingestion is what powers near-real-time data pipelines on AWS — files land in S3, the event triggers a Lambda that catalogs and routes them; Kinesis records arrive, a Lambda consumer aggregates and writes to DynamoDB; a database row changes, DynamoDB Streams fires a Lambda that propagates to the data lake.
The Core Building Blocks
The five canonical pieces of an event-driven ingestion pipeline are: producers that emit events (S3, DynamoDB, application code, partner SaaS), event brokers that route or buffer events (EventBridge, Kinesis, MSK, SNS, SQS), filters that drop irrelevant events early, consumers that process events (Lambda, Glue, Step Functions, downstream services), and destinations that capture results or failures (S3, Glue catalog, DynamoDB, dead-letter queues). The DEA-C01 exam tests selection of the right service at each layer based on volume, latency, ordering, durability, and cost requirements.
Push vs Pull Models
Push-model ingestion has the broker push events to consumers as they arrive — Lambda invocations from S3 events, EventBridge target invocations, SNS to Lambda. Pull-model ingestion has consumers fetch events from the broker — Kinesis Data Streams consumers polling shards, MSK consumers polling Kafka topics. Lambda event source mappings hide the polling behind a managed abstraction so the data engineer writes consumer logic only. The exam plants this as "Lambda is push-only" — wrong; for stream sources Lambda runs a managed poller behind the event source mapping.
Plain-Language Explanation: Event-Driven Ingestion
Event-driven architecture is the kind of system where naming alone does not convey the trade-offs. Three concrete analogies make the structure stick.
Analogy 1 — The Smart Home With Sensors And Appliances
Picture a smart home. The doorbell sensor detects a visitor and emits an event. The smart hub routes the event — turning on porch lights, sending a phone notification, starting a video recording. Each appliance reacts independently without the doorbell knowing what is downstream. The doorbell is an event producer (like S3 or DynamoDB Streams). The smart hub is the event broker (like EventBridge). The rules in the hub ("if doorbell rings between 11pm and 6am, also wake the security system") are EventBridge Rules with pattern matching and target routing. The camera that records on demand is Lambda — it wakes up only when invoked, runs for a few seconds, and goes quiet. The timer that brews coffee at 7am every weekday is EventBridge Scheduler — a recurring schedule decoupled from sensor events.
When you remodel and add a security camera, the doorbell does not need reconfiguration — you just add a new rule in the hub routing doorbell events to the new camera. That decoupling is what event-driven ingestion buys: producers and consumers evolve independently. The dead-letter queue is the error log the hub keeps when an appliance is offline — you can replay events later. The Lambda 15-minute timeout is the same as the camera battery life — it can record short clips reliably, but you cannot use it to film a two-hour security stakeout.
Analogy 2 — The Newsroom With Reporters, Editors, And Wires
Picture a newsroom. Reporters in the field file stories as events happen (breaking news, sports scores, weather alerts) — they are event producers. The wire service routes filed stories to interested editors based on topic and region — that is EventBridge Rules with pattern matching. The editor's desk picks up wire stories, edits them, and publishes — that editor is Lambda, briefly active per story, then idle. The duplication checker that compares the new story to recent ones and drops repeats is EventBridge Pipes filtering. The morning briefing scheduled at 6am every day is EventBridge Scheduler — runs on a clock independent of breaking news.
The newsroom has rules: an editor can spend at most 15 minutes on a single story before the desk reassigns it (Lambda 15-minute timeout), the editor's draft cannot exceed 6 MB (Lambda payload limit), and only ten editors can work simultaneously (concurrency limit). Stories that the editor cannot finish in time go to the rewrite queue (the dead-letter queue) for follow-up. When a story breaks at scale (a sudden surge of events from one wire), the newsroom can reserve dedicated editors who never get diverted to other tasks (reserved concurrency) or have warm-up editors always on standby (provisioned concurrency).
Analogy 3 — The Postal Sorting Center With Conveyor Belts
Picture a postal sorting center. Mail bags arrive from regional offices (events from S3 or DynamoDB). The central conveyor belt routes each parcel by destination region (EventBridge bus). Sorting machines at each branch (Lambda functions, Glue jobs, Step Functions) pick up parcels, do their work, and pass them downstream. The conveyor speed limiter is the batch size — a sorting machine handles up to 100 parcels per cycle. The parallel sorters per branch are the parallelization factor in the Lambda Kinesis trigger — for a single shard, run up to 10 parallel sorting machines, each handling its own set of partition keys so parcels with the same key still arrive in order.
The sorting center has buffering everywhere. Kinesis Data Streams is the buffered conveyor that holds 1-365 days of parcels. SQS is the holding bin that queues parcels when a sorting machine is busy. DynamoDB Streams is the change log of the sorting center's address book — every change emits an event. S3 event notifications are the package-arrival announcements at the loading dock. The dead-letter queue is the lost-and-found for parcels the machines could not sort. Misconfigure the batch size and machines either drown in too many parcels per cycle or starve waiting for the next batch — the canonical Lambda Kinesis trigger tuning exercise.
AWS Lambda — The Event Processor
Lambda is the workhorse compute service for event-driven ingestion.
Lambda Execution Model
Lambda runs your code in response to an invocation, scales to thousands of concurrent executions automatically, bills per millisecond of compute, and shuts down when idle. The unit of execution is a function — a piece of code, a runtime (Python, Node.js, Java, Go, Ruby, custom), an IAM role, environment variables, and configuration like memory and timeout. Each invocation gets its own execution environment that may be reused across invocations for warm starts but is not shared concurrently.
The Hard Limits That Matter For Data Engineering
Lambda's limits are exam material because they directly determine when Lambda is the wrong choice. Maximum execution time is 15 minutes — any task longer than that is wrong for Lambda. Maximum memory is 10 GB with vCPU scaling proportional to memory. Maximum payload is 6 MB synchronous, 256 KB asynchronous, 6 MB for invocations through API Gateway. Ephemeral storage is up to 10 GB in /tmp. Concurrent execution limit is per-account-per-region with a default of 1000, raisable on request. Deployment package is up to 250 MB unzipped, 50 MB zipped, with container images up to 10 GB. The exam plants "use Lambda to process a 50 GB file" — wrong, exceeds memory and timeout. "Use Lambda to ETL a daily 500 GB Parquet aggregation" — wrong, use Glue.
Concurrency Models — Reserved And Provisioned
Reserved concurrency caps the maximum concurrent executions of a function and reserves that capacity from the account pool — guarantees the function gets the capacity but also caps it. Provisioned concurrency pre-warms execution environments so invocations skip cold starts — adds cost (per-GB-second whether invoked or not) but eliminates cold-start latency for latency-sensitive use cases. Standard (unreserved) concurrency draws from the shared account pool with no guarantee. The exam plants "consistent low-latency processing of bursty events" — provisioned concurrency.
Lambda Destinations And DLQs
Destinations route the result of an asynchronous invocation to another service — successful results to one destination (SNS, SQS, EventBridge, Lambda), failed results to another. Dead-letter queues capture events that failed all retries — typically an SQS queue or SNS topic. Destinations are richer (more metadata, finer routing) and are the modern replacement for DLQs, but DLQs are still supported for backward compatibility.
Lambda Event Source Mappings
A Lambda event source mapping is a managed poller that reads events from a stream or queue source (Kinesis, DynamoDB Streams, SQS, MSK, self-managed Kafka, Amazon MQ) and invokes the function with batches of events. The mapping handles iterator management, batching by record count and bytes, partial-batch failure reporting, parallelization within shards, and retry with backoff — all without your code seeing any of it. Configuration parameters include batch size (records per invocation, default 100, max 10000 for Kinesis), batch window (max time to accumulate a batch), starting position (TRIM_HORIZON, LATEST, AT_TIMESTAMP), parallelization factor (1-10 concurrent invocations per shard), bisect on function error (split failed batches in half on retry), and on-failure destination (where to send permanently failed batches). The exam tests these parameters because misconfiguring them is the most common Lambda Kinesis bug in production.
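The mapping parameters above translate directly into the CreateEventSourceMapping API. A minimal sketch with boto3 — the function name, stream ARN, and DLQ ARN are hypothetical placeholders, and the AWS call is deferred so the configuration can be inspected without credentials:

```python
mapping_config = {
    "FunctionName": "ingest-router",  # hypothetical function name
    "EventSourceArn": "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
    "StartingPosition": "LATEST",                          # or TRIM_HORIZON / AT_TIMESTAMP
    "BatchSize": 500,                                      # records per invocation (default 100, max 10000)
    "MaximumBatchingWindowInSeconds": 5,                   # wait up to 5 s to fill a batch
    "ParallelizationFactor": 4,                            # up to 4 concurrent invocations per shard
    "BisectBatchOnFunctionError": True,                    # split failed batches to isolate poison pills
    "MaximumRetryAttempts": 3,
    "DestinationConfig": {                                 # on-failure destination for exhausted batches
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:ingest-dlq"}
    },
    "FunctionResponseTypes": ["ReportBatchItemFailures"],  # enable partial-batch failure reporting
}

def create_mapping(client=None):
    """Create the event source mapping; the boto3 call is deferred to here."""
    import boto3
    client = client or boto3.client("lambda")
    return client.create_event_source_mapping(**mapping_config)
```

With this configuration a failed batch of 500 is bisected on retry, and any records that still fail after three attempts land in the SQS dead-letter queue instead of blocking the shard.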
Lambda Event Sources For Data Engineering
The DEA-C01 exam expects you to know every event source that matters for data ingestion.
S3 Event Notifications
S3 publishes events on object creation (s3:ObjectCreated:*), deletion, restore, replication, and lifecycle transitions. Event notifications can target Lambda, SQS, SNS, or EventBridge (the recommended modern path). For a file landing in S3, an event-driven ingestion pipeline configures S3 to notify Lambda — the function reads the new file, parses it, and writes processed output downstream. EventBridge as the target gives the richest filtering and routing — multiple rules can subscribe to the same S3 event without S3 needing per-target configuration.
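The S3-to-Lambda leg of this pipeline looks like the following minimal handler sketch. Note that S3 URL-encodes object keys in the notification payload, so a key with spaces arrives with `+` characters and must be decoded:

```python
import urllib.parse

def lambda_handler(event, context):
    """Extract (bucket, key) pairs from an S3 event notification batch."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded: spaces become '+', special chars are %-escaped.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
        # A real pipeline would read and process the object here,
        # e.g. boto3 s3.get_object(Bucket=bucket, Key=key).
    return {"processed": objects}
```

Forgetting the `unquote_plus` step is a classic bug: the subsequent GetObject call 404s on any key containing spaces or special characters.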
Amazon Kinesis Data Streams
Lambda integrates with Kinesis Data Streams via an event source mapping that polls each shard. Records are delivered in batches with a configurable batch size (default 100, max 10000). The function processes the batch and the mapping advances the iterator on success. Order is preserved within a shard — records with the same partition key always go to the same shard and are processed in order. The exam loves Kinesis-Lambda configuration questions because mistuning the batch size or parallelization factor is so common in production.
DynamoDB Streams
DynamoDB Streams emit a change record for every insert, update, and delete on a table. Lambda consumes these via an event source mapping just like Kinesis. The use case is change data capture — propagating DynamoDB changes to a data lake (S3 + Glue catalog), to OpenSearch for search indexing, or to another DynamoDB table for cross-region replication. Stream view types control what data is in each record: keys only, new image, old image, or new and old images. The exam plants "stream all changes from a DynamoDB table to a data lake" — DynamoDB Streams + Lambda + S3.
Amazon SQS
Lambda polls SQS queues via event source mapping. Standard queues deliver messages in best-effort order with at-least-once delivery; FIFO queues deliver in strict order with exactly-once processing per message group. The Lambda mapping supports a configurable batch size (1-10000 for standard, 1-10 for FIFO) and batch window, and reports partial batch failures so successfully processed messages are not redelivered. Common pattern: a producer writes to SQS, Lambda processes batches and writes to S3 or DynamoDB.
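Partial batch failure reporting means the handler returns the IDs of only the failed messages; everything else is deleted from the queue. A sketch, where `process` stands in for hypothetical business logic:

```python
def process(body):
    """Hypothetical per-message business logic."""
    if body == "poison":
        raise ValueError("cannot parse message")

def lambda_handler(event, context):
    """Process an SQS batch; report failures so only they are redelivered.

    Requires FunctionResponseTypes=["ReportBatchItemFailures"] on the
    event source mapping -- without it the whole batch is retried on any error.
    """
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Only these messages return to the queue for redelivery.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list signals full success; omitting the response type on the mapping silently disables this behavior.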
Amazon MSK And Self-Managed Kafka
Lambda integrates with MSK and self-managed Kafka via event source mappings. The mapping handles consumer group management, offset commits, and partition assignment. Use cases mirror Kinesis but for Kafka-ecosystem pipelines. The exam plants this as "MSK already in place, want event-driven Lambda processing" — answer is the MSK event source mapping, not a custom Kafka consumer.
Amazon SNS
Lambda subscribes to SNS topics for fan-out patterns — one published message invokes many subscribed functions. SNS to Lambda is asynchronous with no batching (each message is one invocation) — for batched processing prefer SNS to SQS to Lambda, which gives you SQS batching semantics.
Kinesis Trigger Tuning
The Lambda Kinesis event source mapping has three tuning parameters that determine throughput and ordering — batch size, batch window, and parallelization factor — and getting them right is critical for production performance. Batch size (default 100, max 10000) controls records per invocation; larger batches mean fewer invocations and lower per-record overhead but higher per-invocation memory. Batch window (default 0, max 5 minutes) controls the maximum time the mapping waits to accumulate a batch — useful for low-volume shards to avoid wasteful tiny invocations. Parallelization factor (1-10) controls concurrent invocations per shard — when set to N, the mapping splits records by partition key into N parallel sub-streams, preserving order within each partition key but processing different keys in parallel. Use parallelization factor when shard processing is CPU-bound and a single Lambda cannot keep up with the shard's 2 MB/s read budget. The exam plants "Lambda is falling behind on a high-volume Kinesis stream" — answer is increase parallelization factor (and check IteratorAge metric).
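Inside the function, Kinesis records arrive base64-encoded, and within one invocation records for the same partition key are in order. A handler sketch that decodes a batch and groups records by partition key (the per-key downstream write is left hypothetical and replaced with a count):

```python
import base64
import json
from collections import defaultdict

def lambda_handler(event, context):
    """Decode a Kinesis batch and group records by partition key.

    Records sharing a partition key arrive in order within the batch, which
    is the ordering guarantee the parallelization factor preserves.
    """
    by_key = defaultdict(list)
    for record in event["Records"]:
        # Kinesis record payloads are base64-encoded; JSON payload assumed here.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        by_key[record["kinesis"]["partitionKey"]].append(payload)
    # Hypothetical: write each key's ordered records downstream.
    return {key: len(items) for key, items in by_key.items()}
```

Forgetting the base64 decode is the most common first bug in a Kinesis consumer; `json.loads` on the raw field fails immediately.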
Amazon EventBridge — The Event Bus
EventBridge is the AWS-native serverless event bus.
EventBridge Rules — Pattern Matching And Routing
An EventBridge rule matches events against a JSON pattern and routes matching events to one or more targets. Patterns can match on event source, detail-type, account, region, or any field in the event detail object. Targets include Lambda, Step Functions, SQS, SNS, Kinesis, ECS tasks, API Gateway, and dozens more. A single event can match multiple rules and fan out to multiple targets — this is how EventBridge enables loose coupling between event producers and consumers.
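To make pattern matching concrete, here is a rule pattern for failed Glue jobs plus a tiny matcher implementing only the exact-value subset of EventBridge semantics (pattern values are lists meaning "equals any of these"; real EventBridge also supports prefix, suffix, numeric, and exists operators, which this sketch omits):

```python
# Pattern: match Glue job state-change events that ended badly.
glue_job_failed_pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED", "TIMEOUT"]},
}

def matches(pattern, event):
    """True when every pattern field matches the event (exact-value subset only)."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event sub-object.
            if not isinstance(event[field], dict) or not matches(expected, event[field]):
                return False
        elif event[field] not in expected:
            # Leaf pattern: list of allowed values.
            return False
    return True
```

The pattern dict is exactly what you would pass (JSON-serialized) as `EventPattern` to the PutRule API; the matcher exists only to show the matching semantics locally.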
EventBridge Buses — Default, Custom, Partner
Every account has a default event bus that receives AWS service events. Custom event buses receive events from your own application code published via PutEvents. Partner event buses receive events from SaaS partners (Salesforce, Zendesk, Datadog) without intermediate integration code. The DEA-C01 exam plants this as "ingest events from Salesforce" — partner event bus or AppFlow, not a custom integration.
EventBridge Pipes — Point-To-Point With Filter And Enrichment
EventBridge Pipes is a point-to-point integration primitive that connects one source to one target with optional filtering and enrichment in between. Sources include DynamoDB Streams, Kinesis, SQS, MSK, and self-managed Kafka. Filtering happens before invocation cost so you do not pay to process events you do not want. Enrichment calls Lambda or Step Functions to transform the event before it reaches the target. Targets include Lambda, Step Functions, SNS, SQS, Kinesis, EventBridge buses, API destinations, and more.
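A Pipe is defined in one CreatePipe call. The sketch below connects a Kinesis source to a Lambda target with a pre-invocation filter — the pipe name, ARNs, and the `eventType` field in the filter are hypothetical placeholders, and the AWS call is deferred:

```python
import json

pipe_config = {
    "Name": "orders-to-processor",  # hypothetical pipe name
    "RoleArn": "arn:aws:iam::123456789012:role/pipe-role",
    "Source": "arn:aws:kinesis:us-east-1:123456789012:stream/orders",
    "SourceParameters": {
        "KinesisStreamParameters": {"StartingPosition": "LATEST", "BatchSize": 100},
        # The filter runs before the target is invoked, so dropped events
        # never incur target invocation cost.
        "FilterCriteria": {
            "Filters": [{"Pattern": json.dumps({"data": {"eventType": ["ORDER_PLACED"]}})}]
        },
    },
    "Target": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
}

def create_pipe(client=None):
    """Create the pipe; the boto3 call is deferred to here."""
    import boto3
    client = client or boto3.client("pipes")
    return client.create_pipe(**pipe_config)
```

This is the exam's answer shape for "filter Kinesis events before Lambda processing": the filter lives in the pipe, not in an early-return inside the function.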
Pipes vs Rules — When To Use Each
Use Rules when one event needs to fan out to multiple targets, when the source is the EventBridge bus itself, or when you need rich pattern matching across event attributes. Use Pipes when the source is a stream or queue and you want point-to-point integration with filter and enrichment, when you want to avoid running a separate Lambda just for filtering, or when the source is not natively integrated with the EventBridge bus. The exam plants "filter Kinesis events before Lambda processing" — Pipes with filter, not Lambda with an early return.
EventBridge Scheduler — One-Time And Recurring
EventBridge Scheduler is a separate (newer) service for scheduled invocations, replacing CloudWatch Events scheduled rules. It supports one-time schedules at a future timestamp, recurring schedules on cron or rate expressions, and flexible time windows that spread invocations across a range. Targets include 270+ AWS services directly. The advantage over CloudWatch Events scheduled rules is much higher scale (millions of schedules), per-schedule customization, and a cleaner API. The DEA-C01 exam expects you to know Scheduler is the modern way to schedule a Glue job, a Lambda, or a Step Functions execution at fixed times.
Use EventBridge Scheduler instead of CloudWatch Events scheduled rules for any new scheduled invocation — it scales to millions of schedules, supports flexible time windows, and integrates directly with 270+ AWS services without intermediate Lambdas. CloudWatch Events scheduled rules are the legacy scheduling mechanism with limits (300 rules per region per account) and shallow target integration. EventBridge Scheduler removes those limits and adds first-class one-time scheduling, retry policies per schedule, dead-letter queues per schedule, and IAM role per schedule. For data engineering, this is the right answer for scheduling daily Glue jobs, hourly Lambda invocations, weekly Redshift COPY commands, and any other recurring data pipeline trigger. The DEA-C01 exam plants "schedule a daily ETL job" — Scheduler if it is a new design, scheduled CloudWatch Events rule only if the question implies legacy infrastructure.
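A daily ETL schedule in Scheduler looks like the following sketch — the schedule name, ARNs, and role are hypothetical placeholders, and the AWS call is deferred. Note the per-schedule retry policy and dead-letter queue, which scheduled CloudWatch Events rules cannot express:

```python
schedule_config = {
    "Name": "daily-etl",  # hypothetical schedule name
    "ScheduleExpression": "cron(0 2 * * ? *)",  # 02:00 UTC every day
    # FLEXIBLE lets the service jitter the invocation within 15 minutes.
    "FlexibleTimeWindow": {"Mode": "FLEXIBLE", "MaximumWindowInMinutes": 15},
    "Target": {
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-etl",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-invoke-role",
        "RetryPolicy": {"MaximumRetryAttempts": 3},      # per-schedule retries
        "DeadLetterConfig": {"Arn": "arn:aws:sqs:us-east-1:123456789012:schedule-dlq"},
    },
}

def create_schedule(client=None):
    """Create the schedule; the boto3 call is deferred to here."""
    import boto3
    client = client or boto3.client("scheduler")
    return client.create_schedule(**schedule_config)
```

Use `{"Mode": "OFF"}` for the flexible time window when the job must fire at the exact cron time.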
DynamoDB Streams As CDC — The Canonical Data Pipeline
DynamoDB Streams is the canonical CDC pattern on AWS.
What CDC Means
Change Data Capture (CDC) is the practice of streaming every change to a source data store as events to downstream consumers. Insert, update, and delete each emit one event. The downstream pipeline reconstructs the source state, indexes it for search, archives it for analytics, or replicates it to other regions or accounts.
DynamoDB Streams Configuration
Enable Streams on a DynamoDB table at creation or via update. Choose a stream view type: KEYS_ONLY (just the changed keys), NEW_IMAGE (the post-change item), OLD_IMAGE (the pre-change item), or NEW_AND_OLD_IMAGES (both, suitable for diff-based downstream processing). The view type cannot be changed later without disabling and re-enabling the stream.
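Enabling a stream on an existing table is one UpdateTable call. A sketch with a hypothetical table name and the boto3 call deferred:

```python
stream_spec = {
    "TableName": "orders",  # hypothetical table name
    "StreamSpecification": {
        "StreamEnabled": True,
        # Both pre- and post-change images: supports diff-based consumers.
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}

def enable_stream(client=None):
    """Enable the stream; the boto3 call is deferred to here."""
    import boto3
    client = client or boto3.client("dynamodb")
    return client.update_table(**stream_spec)
```

To change the view type later you must first call UpdateTable with `StreamEnabled: False`, wait for the stream to disable, then re-enable with the new type.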
Stream Retention
DynamoDB Streams retain records for 24 hours. Consumers must read within that window or lose records. For longer retention, configure Kinesis Data Streams for DynamoDB which exports change events to a Kinesis Data Stream with up to 365-day retention.
Common CDC Pipeline Patterns
The standard pattern: DynamoDB Streams → Lambda → S3 (Parquet) for data lake ingestion. For real-time search: DynamoDB Streams → Lambda → OpenSearch. For cross-region replication beyond Global Tables: DynamoDB Streams → Lambda → DynamoDB target table. For audit logs: DynamoDB Streams → EventBridge Pipes → S3.
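The Streams-to-data-lake leg of the standard pattern can be sketched as a handler that renders change records as JSON lines for a batch write (the S3 put itself is left hypothetical, and the output is JSON lines rather than Parquet for simplicity):

```python
import json

def lambda_handler(event, context):
    """Render a DynamoDB Streams batch as JSON lines for a data-lake write."""
    lines = []
    for record in event["Records"]:
        change = {
            "event": record["eventName"],                   # INSERT / MODIFY / REMOVE
            "keys": record["dynamodb"]["Keys"],
            "new": record["dynamodb"].get("NewImage"),      # absent for REMOVE
            "old": record["dynamodb"].get("OldImage"),      # requires NEW_AND_OLD_IMAGES
        }
        lines.append(json.dumps(change))
    body = "\n".join(lines)
    # Real pipeline: boto3 s3.put_object(Bucket=..., Key=..., Body=body),
    # with the key partitioned by date, e.g. dt=2024-01-01/.
    return {"records": len(lines), "bytes": len(body)}
```

Note the images arrive in DynamoDB's attribute-value format (`{"S": "..."}`, `{"N": "..."}`); a production writer would deserialize them before landing Parquet.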
S3 Event Notifications — File Landing Triggers
S3 event notifications are the most common entry point for data lake ingestion.
Notification Targets
S3 publishes events to Lambda, SQS, SNS, or EventBridge. EventBridge is the recommended modern target because it supports multiple rules subscribing to the same event without S3 reconfiguration. Lambda is direct and simple; SQS buffers events for slower consumers; SNS fans out to multiple subscribers.
Event Types
The most common event types are s3:ObjectCreated:Put, s3:ObjectCreated:Post, s3:ObjectCreated:CompleteMultipartUpload, s3:ObjectRemoved:Delete, and s3:ObjectRestore:Completed (for Glacier restores). Filter by prefix and suffix to limit notifications to relevant objects — for example, only incoming/ prefix and .parquet suffix.
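The prefix/suffix filter from the example above maps to PutBucketNotificationConfiguration like this — bucket name and function ARN are hypothetical, and the AWS call is deferred:

```python
notification_config = {
    "Bucket": "data-landing",  # hypothetical bucket name
    "NotificationConfiguration": {
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:catalog-file",
            "Events": ["s3:ObjectCreated:*"],
            # Prefix and suffix are the ONLY filters S3 supports natively.
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "incoming/"},
                {"Name": "suffix", "Value": ".parquet"},
            ]}},
        }]
    },
}

def apply_notification(client=None):
    """Apply the notification config; the boto3 call is deferred to here."""
    import boto3
    client = client or boto3.client("s3")
    return client.put_bucket_notification_configuration(**notification_config)
```

This call replaces the bucket's entire notification configuration, so include any existing SQS/SNS/Lambda configurations in the same request or they are dropped.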
S3 Event Filtering Limitations
S3 event notification filtering is limited to prefix and suffix on the object key. For richer filtering (object size, metadata, tags), route through EventBridge and use EventBridge Rule patterns which can match on any field. The exam plants "filter S3 events by object size" — EventBridge target plus rule pattern, not native S3 notification filter.
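As a sketch of the EventBridge route, here is a rule pattern matching S3 Object Created events for Parquet files larger than 1 MiB in a hypothetical bucket — the numeric and suffix operators shown are EventBridge pattern features that S3's native prefix/suffix filter cannot express:

```python
# EventBridge rule pattern (would be JSON-serialized as EventPattern in PutRule).
large_parquet_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["data-landing"]},        # hypothetical bucket
        "object": {
            "key": [{"suffix": ".parquet"}],         # suffix content operator
            "size": [{"numeric": [">", 1048576]}],   # numeric operator: > 1 MiB
        },
    },
}
```

This requires EventBridge to be enabled as the notification target on the bucket; the rule then routes matching events to any target (Lambda, Step Functions, SQS) without further S3 configuration.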
Common Patterns — Trigger Glue, Catalog Updates
A canonical data lake pattern: file lands in S3, S3 event triggers Lambda, Lambda either runs lightweight processing inline or invokes a Glue ETL job for heavy work. For catalog hygiene, S3 events trigger a Lambda that updates Glue partitions when new prefixes appear, avoiding full crawler runs.
Lambda vs Glue vs Step Functions — Service Selection
The DEA-C01 exam regularly tests which compute service to pick.
Lambda — Short, Event-Driven Processing
Use Lambda when the task fits in 15 minutes, 10 GB memory, and 6 MB payload. Examples: route a new S3 file to the right Glue job, transform a small DynamoDB Streams batch, validate an API webhook payload before storing, send a metric on data arrival. Avoid Lambda for batch ETL, large file processing, or long-running aggregations.
Glue — Batch ETL At Scale
Use Glue ETL for batch transformations that exceed Lambda's limits. Glue handles tens to hundreds of GB to TB-scale data, runs Spark distributed compute, integrates with the Glue Data Catalog, and bills per DPU-second. Glue is wrong for very short event-driven tasks (job startup overhead is several minutes).
Step Functions — Orchestration Of Multi-Service Workflows
Use Step Functions when the pipeline coordinates multiple services — a workflow that triggers a Glue job, waits for completion, runs a Lambda for validation, calls Redshift COPY, and notifies on success. Step Functions handle retries, error catching, parallel branches, and human-in-the-loop steps. Avoid Step Functions when a single Lambda or single Glue job suffices.
The Decision Matrix
The exam plants service selection scenarios constantly. Apply the matrix: short and event-driven → Lambda. Long batch ETL → Glue. Multi-step coordination → Step Functions. All three together for a complex pipeline → S3 event → Lambda router → Step Functions → Glue ETL with success/failure notifications via Lambda destinations. Knowing this matrix cold answers most Domain 1 service-selection questions.
Lambda is not suitable for long-running batch ETL — its 15-minute timeout, 10 GB memory cap, and 6 MB payload limit make it the wrong tool for any task involving multi-GB files, multi-hour runs, or distributed compute. The exam plants this as "use Lambda to process a 50 GB daily Parquet file" or "use Lambda to ETL one hour of Kinesis data into Redshift" — both wrong. The right answer for batch ETL is Glue (managed Spark) or EMR (custom Spark). Lambda's role in event-driven ingestion is the lightweight router and validator: it receives an event, decides what to do, and either processes a small payload itself or invokes a heavier service like Glue or Step Functions to do the real work. Mistaking Lambda for a general-purpose compute service is the single most common Domain 1 scoring trap.
Common Exam Traps For Lambda And EventBridge
The DEA-C01 exam plants a consistent set of traps. Memorize all seven.
Trap 1 — Lambda For Batch ETL
The most cited trap. Lambda's 15-minute timeout disqualifies it for any batch over a few minutes. Glue or Step Functions plus Glue is the right answer.
Trap 2 — EventBridge Rules vs Pipes Confusion
Rules fan out events to multiple targets with rich pattern matching. Pipes is point-to-point with filter and enrichment. The exam plants "filter Kinesis records before Lambda" expecting Pipes; "fan out an event to five different processors" expecting Rules.
Trap 3 — S3 Event Notification Filter Limitations
S3 events can filter only on prefix and suffix. Filtering by object size, metadata, or tags requires routing through EventBridge with rule patterns.
Trap 4 — Kinesis Trigger Batch Size Mistuning
Setting batch size too small starves Lambda and ramps invocation costs. Setting too large causes per-invocation OOM or timeouts. The IteratorAge metric in CloudWatch tells you when batch size or parallelization factor needs tuning.
Trap 5 — DynamoDB Stream View Type Picked Wrong
Picking KEYS_ONLY when the consumer needs the new item value forces an extra GetItem call per event, doubling read cost. Pick NEW_IMAGE for most CDC use cases; NEW_AND_OLD_IMAGES when downstream needs to compute deltas.
Trap 6 — DynamoDB Streams Retention Underestimated
Streams retain only 24 hours. If your consumer is paused or backlogged longer, records are permanently lost. Use Kinesis Data Streams for DynamoDB for longer retention up to 365 days.
Trap 7 — Forgetting EventBridge Scheduler For Schedules
A candidate proposes CloudWatch Events scheduled rules for a new scheduled job. The modern answer is EventBridge Scheduler — higher scale, better integration, more features.
Cram Summary
Lambda + EventBridge for event-driven ingestion: Lambda for short event processing under 15 minutes, EventBridge Rules for pattern-matched fan-out, EventBridge Pipes for point-to-point with filter and enrich, EventBridge Scheduler for cron-style schedules, S3 events for object-level triggers, DynamoDB Streams for CDC, Kinesis trigger for stream consumption. This is the one sentence to memorize. If the scenario word is "short event processing" or "lightweight transform," answer Lambda. If "fan out to multiple consumers," answer EventBridge Rules. If "filter stream events before processing," answer EventBridge Pipes. If "schedule daily/hourly job," answer EventBridge Scheduler. If "react to S3 file arrival," answer S3 event notification. If "stream DynamoDB changes downstream," answer DynamoDB Streams + Lambda. If "Lambda is falling behind on Kinesis," answer increase parallelization factor.
Key Numbers And Must-Memorize Lambda + EventBridge Facts
Lambda Hard Limits
- Maximum execution time: 15 minutes
- Maximum memory: 10 GB (vCPU scales with memory)
- Maximum synchronous payload: 6 MB
- Maximum asynchronous payload: 256 KB
- Ephemeral storage: up to 10 GB in /tmp
- Default concurrency: 1000 per account per region
- Deployment package: 250 MB unzipped, 50 MB zipped
- Container image: up to 10 GB
Lambda Kinesis Trigger
- Batch size: default 100, max 10000 records or 6 MB
- Batch window: default 0, max 300 seconds
- Parallelization factor: 1-10 per shard
- Starting position: TRIM_HORIZON, LATEST, AT_TIMESTAMP
DynamoDB Streams
- Retention: 24 hours
- View types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES
- Use Kinesis Data Streams for DynamoDB for longer retention
EventBridge
- Rules: pattern matching with fan-out to multiple targets
- Pipes: point-to-point with filter and enrichment
- Scheduler: one-time and recurring schedules
- Buses: default, custom, partner
S3 Event Notifications
- Targets: Lambda, SQS, SNS, EventBridge
- Filter: prefix and suffix on object key only
- Event types: ObjectCreated, ObjectRemoved, ObjectRestore, etc.
Decision Matrix
- Short event processing → Lambda
- Batch ETL → Glue
- Multi-service orchestration → Step Functions
- Cron-style scheduling → EventBridge Scheduler
DEA-C01 exam priority — Lambda and EventBridge for Event-Driven Ingestion. This topic carries weight on the DEA-C01 exam. Master the trade-offs, decision boundaries, and the cost/performance triggers each AWS service exposes — the exam will test scenarios that hinge on knowing which service is the wrong answer, not just which is right.
FAQ — Lambda And EventBridge Top Questions
Q1 — When should I use Lambda vs Glue for data processing?
Use Lambda when the workload is short (under 15 minutes), small payload (under 6 MB synchronous), event-driven, and stateless. Lambda excels at lightweight transformations, routing, validation, and orchestration glue between services. Use Glue for any batch ETL exceeding Lambda's limits — multi-GB to TB-scale processing, multi-hour runtimes, distributed Spark compute, schema-aware transforms, and integration with the Glue Data Catalog. The clearest rule: if the task is "react to one event, do a small transform, write to a destination," it is Lambda. If the task is "run a daily ETL on yesterday's data," it is Glue. The hybrid pattern is common: S3 event triggers Lambda, Lambda validates and invokes a Glue job, Glue handles the heavy ETL, Step Functions or destinations route success or failure. Mistaking Lambda for batch ETL is the single most common DEA-C01 Domain 1 trap.
Q2 — What is the difference between EventBridge Rules and EventBridge Pipes?
EventBridge Rules match events on the event bus against JSON patterns and route matching events to one or more targets — fan-out is the core capability. Rules are used when the source is the bus itself (AWS service events, custom application events via PutEvents) and you need flexible pattern matching plus multi-target routing. EventBridge Pipes is point-to-point: one source (Kinesis, DynamoDB Streams, SQS, MSK, self-managed Kafka, Amazon MQ) connects to one target with optional filter and enrichment in between. Pipes is used when the source is a stream or queue and you want managed point-to-point integration without writing a Lambda just to filter and forward. The decision is shape-based: fan-out from a bus → Rules; one stream/queue to one target with optional transform → Pipes. They complement each other in real architectures — Pipes can target an EventBridge bus, where Rules then fan out further.
Q3 — How do I tune the Lambda Kinesis trigger for high throughput?
Three parameters drive throughput. Batch size (default 100, max 10000) controls records per invocation — increase to reduce per-invocation overhead, decrease if Lambda OOMs or times out per batch. Batch window (default 0, max 300 seconds) controls how long the mapping waits to accumulate a batch before invoking — increase for low-volume shards to avoid wasteful tiny invocations, leave at 0 for high-volume shards. Parallelization factor (1-10) controls concurrent invocations per shard — when set to N, the mapping splits records by partition key into N parallel sub-streams, processing different keys in parallel while preserving order within each partition key. Use parallelization factor when shard processing is CPU-bound and a single Lambda cannot keep up with the shard's 2 MB/s read budget. Monitor IteratorAge (how far the consumer lags behind the newest record in the stream) and IncomingRecords/GetRecords.Bytes to diagnose lag. Increasing function memory also increases vCPU which increases per-record processing throughput. The exam plants "Lambda lagging on Kinesis with high IteratorAge" with parallelization factor as the canonical fix.
Q4 — When should I use EventBridge Scheduler vs CloudWatch Events scheduled rules?
Use EventBridge Scheduler for any new scheduled invocation. It scales to millions of schedules per account, supports flexible time windows, integrates directly with 270+ AWS services without intermediate Lambdas, supports per-schedule retry and dead-letter queue configuration, and offers cleaner one-time scheduling at future timestamps. CloudWatch Events scheduled rules are the legacy mechanism with a default quota of 300 rules per event bus, a narrower target set that often forces an intermediate Lambda, and no per-rule customization of retry behavior. The DEA-C01 exam expects you to recognize Scheduler as the modern answer for "schedule a daily ETL job" or "schedule a Lambda every 15 minutes." CloudWatch Events scheduled rules remain supported for backward compatibility with existing infrastructure but are not the recommended default for new designs.
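As a sketch, a recurring schedule with per-schedule retry and DLQ looks like this when passed to boto3's `scheduler_client.create_schedule` (all ARNs, the role, and the schedule name are placeholders):

```python
# Parameters for an EventBridge Scheduler schedule. In real code:
#   scheduler_client.create_schedule(**schedule)
# All names and ARNs below are placeholders.
schedule = {
    "Name": "nightly-etl",
    "ScheduleExpression": "cron(0 2 * * ? *)",  # 02:00 UTC daily
    "FlexibleTimeWindow": {"Mode": "OFF"},      # fire at the exact time
    "Target": {
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:nightly-etl",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-invoke-role",
        "RetryPolicy": {"MaximumRetryAttempts": 3},  # per-schedule retries
        "DeadLetterConfig": {
            "Arn": "arn:aws:sqs:us-east-1:123456789012:etl-schedule-dlq"
        },
    },
}
```

The per-target `RetryPolicy` and `DeadLetterConfig` are exactly the capabilities legacy scheduled rules lack.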
Q5 — How do I propagate every change in a DynamoDB table to a data lake?
The canonical pattern is DynamoDB Streams + Lambda + S3. Enable Streams on the table with view type NEW_AND_OLD_IMAGES so the consumer sees both pre-change and post-change values (allows downstream to compute deltas or process deletes correctly). Configure a Lambda event source mapping that reads from the stream and writes records to S3 in Parquet format, partitioned by date. For longer than 24-hour retention or higher throughput, use Kinesis Data Streams for DynamoDB, which exports change events to a Kinesis stream with up to 365-day retention, then Kinesis Data Firehose buffers and writes to S3 in Parquet automatically. For zero-ETL, DynamoDB offers managed integrations with Amazon OpenSearch Service (search) and Amazon Redshift (analytics), which remove the Lambda glue entirely for those targets. Glue Data Catalog automatically picks up the S3 partitions if a crawler is scheduled or partition projection is configured in Athena.
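A minimal sketch of the Lambda consumer, assuming NEW_AND_OLD_IMAGES: the real function would buffer the rows and write Parquet to date-partitioned S3 prefixes (e.g. with pyarrow or awswrangler); that write is stubbed out here so the example runs locally, and all record contents are illustrative.

```python
def handler(event, context=None):
    """Flatten DynamoDB Streams change records. A production version would
    buffer these rows and write them to S3 as Parquet, partitioned by date;
    here we just return the flattened rows."""
    rows = []
    for record in event["Records"]:
        ddb = record["dynamodb"]
        rows.append({
            "event_name": record["eventName"],  # INSERT / MODIFY / REMOVE
            "keys": ddb["Keys"],
            "new_image": ddb.get("NewImage"),   # absent on REMOVE
            "old_image": ddb.get("OldImage"),   # absent on INSERT
            "approx_time": ddb["ApproximateCreationDateTime"],
        })
    return rows

# Illustrative stream event with both images (view type NEW_AND_OLD_IMAGES).
sample_event = {"Records": [{
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"pk": {"S": "user#42"}},
        "NewImage": {"pk": {"S": "user#42"}, "status": {"S": "active"}},
        "OldImage": {"pk": {"S": "user#42"}, "status": {"S": "pending"}},
        "ApproximateCreationDateTime": 1700000000.0,
    },
}]}
print(handler(sample_event))
```

Note how having both images lets downstream code see that `status` moved from `pending` to `active` — the delta a NEW_IMAGE-only stream could not provide.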
Q6 — How do I handle Lambda failures in an event-driven pipeline?
Three layers of failure handling. First, automatic retry — for stream sources (Kinesis, DynamoDB Streams), the mapping retries the batch until success or the configured max retries (default unlimited until records expire). For asynchronous invocations (S3 events, EventBridge), Lambda retries twice automatically. For SQS, the queue's redrive policy controls retries. Second, bisect on function error for Kinesis and DynamoDB Streams — on retry, split the failed batch in half and retry each half, isolating the poison-pill record. Third, on-failure destinations or dead-letter queues — when retries are exhausted, send the failed event to an SQS queue, SNS topic, EventBridge bus, or Lambda for later analysis or replay. The modern destination feature includes more metadata than legacy DLQs and is the recommended choice. Always configure a destination or DLQ for production event-driven pipelines — without one, exhausted retries silently drop events.
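The stream-specific pieces of layers one through three are all event source mapping parameters. A sketch of the failure-handling configuration as passed to boto3's `update_event_source_mapping` (the queue ARN and the retry/age values are placeholders chosen for illustration):

```python
# Failure-handling parameters for a Kinesis/DynamoDB Streams mapping. In real
# code: lambda_client.update_event_source_mapping(UUID=mapping_id, **cfg)
# The SQS ARN below is a placeholder.
cfg = {
    "MaximumRetryAttempts": 5,            # cap retries (default -1 = until records expire)
    "BisectBatchOnFunctionError": True,   # split failed batches to isolate the bad record
    "MaximumRecordAgeInSeconds": 3600,    # give up on records older than 1 hour
    "DestinationConfig": {                # on-failure destination for exhausted batches
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:ingest-failures"
        }
    },
}
```

For stream sources the on-failure destination receives metadata about the failed batch (stream, shard, sequence number range) rather than the records themselves, which is what makes targeted replay possible.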
Q7 — What is the right pattern for filtering S3 events before they reach my Lambda?
S3 event notifications' native filtering is limited to prefix and suffix on the object key — an incoming/ prefix or a .parquet suffix. For richer filtering (object size, content type, tags, metadata), route through EventBridge: configure S3 to publish to EventBridge instead of directly to Lambda, then write an EventBridge Rule with a pattern matching the desired conditions, with Lambda as the rule target. EventBridge Rules can match on any field in the event detail, including object size, ETag, and any other metadata in the S3 event. The benefit is significant: Lambda only invokes for events that match the full filter, saving invocation cost and simplifying the function code. The DEA-C01 exam plants this as "we want Lambda to process only Parquet files larger than 100 MB" — EventBridge with a rule pattern is the canonical answer, not a Lambda that filters and returns early.
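For that "Parquet files larger than 100 MB" scenario, the rule pattern would look like this sketch (the bucket name is a placeholder); `suffix` and `numeric` are EventBridge content-filtering operators that plain S3 notification filters cannot express.

```python
import json

# EventBridge rule pattern for S3 "Object Created" events: only .parquet
# objects over 100 MB. JSON-encoded into the EventPattern of a rule whose
# target is the Lambda. The bucket name is a placeholder.
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["raw-landing-bucket"]},
        "object": {
            "key": [{"suffix": ".parquet"}],                  # content filter on the key
            "size": [{"numeric": [">", 100 * 1024 * 1024]}],  # size in bytes
        },
    },
}
print(json.dumps(pattern))
```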
Further Reading — Official AWS Documentation For Lambda And EventBridge
The authoritative AWS sources are: the Lambda developer guide overview, the event source mappings documentation covering Kinesis, DynamoDB Streams, SQS, MSK, and self-managed Kafka, the Lambda Kinesis trigger configuration page covering batch size, parallelization factor, and IteratorAge, the EventBridge user guide covering rules, buses, partner events, and the API, the EventBridge Pipes documentation covering sources, filters, enrichment, and targets, the EventBridge Scheduler user guide covering one-time and recurring schedules, and the S3 event notifications page covering targets and filtering.
The AWS Compute Blog has multiple deep-dive posts on Lambda performance tuning, cold-start mitigation, provisioned concurrency, and event source mapping internals. The AWS Architecture Blog covers event-driven architecture patterns and reference designs. The AWS Serverless Application Model (SAM) and CDK documentation show how to express Lambda + EventBridge stacks declaratively. Finally, the AWS Lambda Powertools libraries (Python, Java, TypeScript) provide structured logging, tracing, and metrics utilities that are de-facto standards for production Lambda — the documentation walks through the patterns DEA-C01 expects you to know at the architecture level.