Scalability with ELB, Caching, and Serverless - DOP-C02 DevOps Engineer Study Notes

Scalability on DOP-C02 is the composition of load balancing, caching, and serverless capacity across the application stack. The exam tests when ALB beats NLB, when CloudFront beats ElastiCache, when Lambda provisioned concurrency is the right answer over reserved concurrency, when DynamoDB on-demand wins over provisioned, and how API Gateway throttling and Fargate capacity providers interact with the rest of the stack. Domain 3.2 ("scalable to meet business requirements") is the largest single Domain 3 sub-section.

This guide assumes you know what an ALB is and what a Lambda function is. The DOP-C02 focus: ALB target groups, listeners, rules, NLB and Gateway Load Balancer use cases, cross-zone load balancing semantics and pricing, CloudFront caching behaviors and origin failover, ElastiCache Redis vs Memcached, DAX for DynamoDB read acceleration, Lambda reserved vs provisioned concurrency, API Gateway burst and steady-state limits, Fargate vs EC2 capacity providers in ECS, DynamoDB on-demand vs provisioned with auto-scaling, and Global Accelerator for cross-region acceleration.

Why Scalability Matters on DOP-C02

DOP-C02 lists "Identifying and implementing appropriate auto scaling, load balancing, and caching solutions" as the first skill in Domain 3.2. Community pass reports cite scalability scenarios as one of the most-tested categories: "100x traffic spike during a flash sale; pick the architecture" - that is API Gateway + Lambda provisioned concurrency + DynamoDB on-demand + CloudFront. "Multi-tenant SaaS with 50,000 customers" - ALB host-based routing + ECS Fargate + Aurora Serverless. "Real-time game backend with sub-millisecond latency" - NLB + EC2 with placement groups + ElastiCache Redis. The exam tests precise picks of each layer.

The exam also separates scaling primitives (ASG, Application Auto Scaling, Lambda concurrency) from caching primitives (CloudFront, ElastiCache, DAX, API Gateway cache) and from load balancing primitives (ALB, NLB, GWLB). Knowing the right layer for the bottleneck is the first elimination step.

Application Load Balancer (ALB): Layer 7 load balancer; routes by host header, path, query string, headers, source IP; supports HTTPS, HTTP/2, gRPC, WebSocket.
Network Load Balancer (NLB): Layer 4 load balancer; ultra-low latency, static IPs per AZ, supports TCP, UDP, and TLS (termination or pass-through).
Gateway Load Balancer (GWLB): Layer 3/4 transparent forwarder for inserting third-party network appliances.
Target group: a logical group of backends (EC2, IP addresses, Lambda, ALB-as-target) attached to a listener rule.
Cross-zone load balancing: each load balancer node spreads traffic across the registered targets in all enabled AZs rather than only its own AZ; default ON for ALB (free), default OFF for NLB (extra inter-AZ data charges if enabled).
CloudFront: AWS CDN with edge caching, AWS Shield Standard, custom origins, Lambda@Edge, CloudFront Functions.
ElastiCache: managed Redis or Memcached for in-memory caching.
DAX: in-memory cache specifically for DynamoDB; transparent client; sub-millisecond reads.
Lambda reserved concurrency: caps the maximum concurrent executions for a function and reserves that capacity from the account pool.
Lambda provisioned concurrency: pre-warmed Lambda execution environments; eliminates cold start latency.
API Gateway throttling: per-account, per-API, per-stage, per-method, and per-key throttling with burst and rate limits.
DynamoDB on-demand: pay-per-request capacity mode; scales automatically; higher per-request cost than provisioned at high steady throughput.
Capacity provider: ECS construct that maps an ASG (EC2) or Fargate to scaling targets; supports FARGATE, FARGATE_SPOT, custom EC2 ASGs.
Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html

Plain-Language Explanation: Scalability Layers

The mechanics align with familiar capacity-management patterns from non-software domains. Three angles cover load balancing, caching, and serverless capacity.

Analogy 1: The Convention Center Crowd Management

A convention center with 50 halls handles tens of thousands of attendees. ALB (Layer 7) is the information desk staff reading attendees' badges (HTTP headers) and directing them to the right hall by topic ("Room A for AI talks, Room B for security workshops"). NLB (Layer 4) is the traffic light system at entrances that just measures cars and routes them to lots, no inspection of cargo - faster, simpler, no content awareness. Gateway Load Balancer is the security checkpoint that funnels every car through metal detectors transparently before letting them proceed.

Cross-zone load balancing is the multi-entry-routing between halls - if Hall A's parking is full, route to Hall B's even though they are in different wings. ALB does this for free; NLB charges for the inter-wing data transfer.

CloudFront is the information desks at every parking lot entrance (edge locations) - frequently asked questions ("where is registration") are answered locally without trekking to the central desk.

ElastiCache (Redis/Memcached) is the central FAQ binder at the information desk - things asked by many attendees are kept in the binder so staff do not have to call the planning office every time.

Lambda reserved concurrency is the dedicated translator pool for a specific session room - capped at 100 simultaneous translators, so one popular room cannot starve all the others.

Lambda provisioned concurrency is the pre-warmed translator team ready in the room before attendees arrive - no warmup delay when the first attendee asks a question.

Analogy 2: The Restaurant Chain Capacity Network

A restaurant chain manages capacity at multiple levels. ALB is the maitre-d asking your party size, allergies, and seating preference (Layer-7 routing) before assigning a table. NLB is the valet stand that just counts cars, no questions, ultra-fast. GWLB is the kitchen inspector transparently checking every dish on the way out.

CloudFront is the branch locations with regional menu items pre-prepared - reduces trips back to the central kitchen for popular dishes.

ElastiCache Redis is the prep kitchen with cached ingredients - the main kitchen does not have to chop onions for every order; common preps are kept warm.

DAX is the specialty pre-baked items in the DynamoDB-equivalent walk-in - returns specific items in microseconds, transparent to the cook.

Lambda provisioned concurrency is the pre-staffed lunch line - cooks ready before the rush starts. Reserved concurrency is the headcount cap on a specific station so a busy station cannot pull cooks from other stations.

API Gateway throttling is the maitre-d's burst-and-steady cap - "we can take 5 walk-ins per minute steady-state, with bursts up to 50 if all tables are empty".

Analogy 3: The Power Grid Load Distribution

A regional power grid manages demand and capacity. ALB is the substation routing power based on industrial vs residential vs commercial classifications (Layer 7 awareness). NLB is the trunk transmission lines moving high-voltage power without inspection. GWLB is the transparent metering every customer's draw passes through.

CloudFront is the distributed solar farms at the edge - meets local demand without drawing from the central grid. ElastiCache is the battery storage at substations - smooths peaks and serves frequently demanded current locally.

Lambda reserved concurrency is the dedicated capacity reservation for hospital substations - guaranteed power even if the rest of the grid is overloaded. Lambda provisioned concurrency is the spinning reserve - generators kept warm so they can synchronize and start delivering in milliseconds vs hours of cold startup.

DynamoDB on-demand is the pay-as-you-draw industrial customer - no commitment, but per-kWh price is higher. DynamoDB provisioned with auto-scaling is the standard residential contract with auto-adjusting baseline.

For the ALB/NLB/GWLB distinction, the convention center's information desk, traffic lights, and security checkpoint capture it cleanly. For caching tiers (CloudFront vs ElastiCache vs DAX), the restaurant prep network is intuitive. For Lambda concurrency types, the power grid's reserved capacity and spinning reserve map the trade-offs. Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html

Elastic Load Balancing

Three load balancer types cover most use cases.

Application Load Balancer (ALB)

Layer 7. Routes requests by host header, path, HTTP method, query string, headers, source IP. Supports:

HTTPS termination with ACM certificates.
HTTP/2 and gRPC.
WebSocket.
Lambda as a target (synchronous invocation per request).
ALB as a target of an NLB (the NLB supplies static IPs in front of the ALB).
Authentication via Cognito or OIDC.
Stickiness (duration-based load balancer cookie or application-set cookie).

ALB is the default for any web/API workload. Cross-zone load balancing is on by default and free.

Network Load Balancer (NLB)

Layer 4 (TCP/UDP/TLS). Highest throughput and lowest latency. Supports:

Static IP per AZ (one Elastic IP can be assigned per AZ).
Source IP preservation (targets see the original client IP).
TLS termination.
VPC endpoint service (PrivateLink) provider.

NLB shines for: ultra-low-latency workloads (gaming, trading), workloads needing static IPs in firewall allowlists, and PrivateLink endpoints. Cross-zone load balancing is OFF by default; enabling it costs inter-AZ data transfer.

Gateway Load Balancer (GWLB)

Layer 3/4 transparent. Inserts third-party network appliances (firewalls, IDS/IPS) into traffic flow. Uses GENEVE encapsulation. Common in security architectures where all VPC egress must traverse a firewall fleet.

Target Groups, Listeners, Rules

ALB structure: Listener (port + protocol) → Listener Rules (priority-ordered conditions) → Target Group (backends + health checks).

Common ALB patterns:

Host-based routing: app1.example.com → TG-1, app2.example.com → TG-2.
Path-based routing: /api/* → TG-API, /static/* → TG-Static.
Weighted target groups: split traffic 90/10 between blue/green for canary.
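
The listener-rule structure above can be sketched as a small matcher. This is a simplified model under stated assumptions, not the ALB implementation: the priorities, the `TG-Default` name, and the use of shell-style wildcards via `fnmatchcase` are illustrative choices.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (priority, host pattern, path pattern, target group).
RULES = [
    (10, "app1.example.com", "*", "TG-1"),
    (20, "app2.example.com", "*", "TG-2"),
    (30, "*", "/api/*", "TG-API"),
    (40, "*", "/static/*", "TG-Static"),
]


def route(host, path, rules=RULES, default="TG-Default"):
    """Return the target group of the first matching rule in priority order."""
    for _priority, host_pattern, path_pattern, target_group in sorted(rules):
        if fnmatchcase(host, host_pattern) and fnmatchcase(path, path_pattern):
            return target_group
    # No rule matched: fall back to the listener's default action.
    return default
```

Because rules are evaluated from the lowest priority number upward, a request for app1.example.com/api/x lands on TG-1 here, not TG-API. Rule ordering is part of the routing design.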

At the load balancer level, ALB cross-zone load balancing is always on; it can only be turned off for individual target groups through a target group attribute. NLB defaults to off, meaning each AZ's NLB node routes only to targets in its own AZ. Enabling NLB cross-zone incurs inter-AZ data transfer charges. The asymmetry is a frequent exam trap. Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
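
To see why the cross-zone default matters when AZs hold different numbers of targets, here is a minimal arithmetic sketch. It assumes DNS spreads clients evenly across one load balancer node per enabled AZ; the AZ names and target counts are hypothetical.

```python
def per_target_share(targets_per_az, cross_zone):
    """Fraction of total traffic that each target in an AZ receives.

    Assumes clients are spread evenly across one load balancer node per AZ.
    """
    az_count = len(targets_per_az)
    total_targets = sum(targets_per_az.values())
    shares = {}
    for az, target_count in targets_per_az.items():
        if cross_zone:
            # Every node can reach every target, so all targets share equally.
            shares[az] = 1 / total_targets
        else:
            # Each node keeps its 1/az_count slice inside its own AZ.
            shares[az] = (1 / az_count) / target_count
    return shares


print(per_target_share({"az-a": 2, "az-b": 8}, cross_zone=False))
print(per_target_share({"az-a": 2, "az-b": 8}, cross_zone=True))
```

With cross-zone off, each of the two targets in az-a carries 25 percent of all traffic while each of the eight in az-b carries 6.25 percent. With cross-zone on, every target carries 10 percent.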

CloudFront and Caching Tiers

Three caching tiers commonly compose.

CloudFront (Edge)

Global CDN with 600+ edge locations. Caches by (origin, cache key) where cache key includes path, query string components, headers per the cache policy. Features:

Origin failover: primary origin + secondary origin pair; CloudFront switches on configured HTTP error codes.
Lambda@Edge: full Node.js or Python at viewer-request, viewer-response, origin-request, origin-response - heavier and pricier.
CloudFront Functions: lightweight JS for viewer request/response - cheaper, faster, narrower API.
Signed URLs / Signed Cookies: time-bound access for private content.
Field-Level Encryption: end-to-end encryption of specific request fields.
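
The cache-key idea can be illustrated with a simplified model: only the query strings and headers named in a cache policy take part in the key, so requests that differ elsewhere share one cached object. The function and its allowlist parameters are hypothetical and do not reproduce CloudFront's exact normalization.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, headers, query_allowlist=(), header_allowlist=()):
    """Build a simplified cache key from path, allowed query strings, and allowed headers."""
    parts = urlsplit(url)
    # Keep only allowlisted query parameters, in the order they appear.
    kept_query = [(k, v) for k, v in parse_qsl(parts.query) if k in query_allowlist]
    lowered = {name.lower(): value for name, value in headers.items()}
    kept_headers = tuple(
        (name.lower(), lowered[name.lower()])
        for name in header_allowlist
        if name.lower() in lowered
    )
    return (parts.path, urlencode(kept_query), kept_headers)
```

In this model, two URLs that differ only in a tracking parameter such as utm_source map to the same key when the allowlist contains just lang. Keeping the key narrow is what raises the cache hit ratio.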

ElastiCache (Application Cache)

Managed Redis or Memcached:

Memcached: simple, multi-threaded, sharded, no persistence, no replication.
Redis: rich data structures (lists, sets, sorted sets, streams, geo), persistence (AOF, RDB), replication, cluster mode for sharding, pub/sub.

Use Redis cluster mode for high-throughput sharded caches; use Memcached when you need simple key-value and high throughput per node without persistence.

DAX (DynamoDB Accelerator)

In-memory cache transparent to DynamoDB clients. SDK calls hit DAX first; cache miss falls through to DynamoDB. Sub-millisecond reads. Item cache (point reads) and query cache (query/scan results).

Use when DynamoDB read throughput is the bottleneck and items are eventually-consistent-read tolerant - DAX serves stale data within the configured TTL.

DAX caches eventually consistent reads. Strongly consistent reads are passed through to DynamoDB and the results are not cached. Writes go through DAX to DynamoDB (write-through), so DAX adds no write capacity. The exam often tests "DAX for write-heavy workload": it does not help. Likewise, DAX gives no benefit for reads that must be strongly consistent. Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.html
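
The read and write paths described above can be modelled with a toy read-through, write-through cache. This sketch is not the DAX client: a plain dict stands in for the DynamoDB table, the class name is hypothetical, and TTL expiry is left out for brevity.

```python
class ReadThroughCache:
    """Toy model of a DAX-style item cache in front of a dict acting as the table."""

    def __init__(self, table):
        self.table = table
        self.items = {}
        self.table_reads = 0

    def get(self, key, consistent_read=False):
        if consistent_read:
            # Strongly consistent reads go to the table and are not cached.
            return self._read_table(key)
        if key not in self.items:
            # Cache miss: fall through to the table and remember the result.
            self.items[key] = self._read_table(key)
        return self.items[key]

    def put(self, key, value):
        # Write-through: write the table, then update the item cache.
        self.table[key] = value
        self.items[key] = value

    def _read_table(self, key):
        self.table_reads += 1
        return self.table.get(key)
```

If another writer changes the table directly, an eventually consistent `get` keeps returning the cached value while a `consistent_read=True` call sees the new one. That is the staleness trade-off the exam scenarios probe.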

Lambda Concurrency

Lambda has three concurrency knobs.

Account Concurrency Pool

Per-account, per-region default of 1000 concurrent executions (raisable via support). Shared across all functions unless reserved.

Reserved Concurrency

Setting reserved concurrency on a function:

Caps the function's maximum concurrent executions at that value.
Reserves that capacity from the account pool (other functions cannot use it).
Useful for "do not let this function consume more than 100 of my 1000 budget" or "guarantee at least 100 capacity for this function".

Setting reserved concurrency to 0 effectively disables the function - useful for emergency throttling.
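
The effect of reservations on the shared pool is simple arithmetic, sketched below. The function names and reservation values are hypothetical. The 1000 limit is the default from the text, and the 100-unit floor reflects Lambda's requirement that at least 100 units stay unreserved.

```python
ACCOUNT_LIMIT = 1000   # default per-region concurrency pool
MIN_UNRESERVED = 100   # Lambda keeps at least this much unreserved


def unreserved_pool(reserved, account_limit=ACCOUNT_LIMIT):
    """Return the concurrency left for functions without a reservation."""
    remaining = account_limit - sum(reserved.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"reservations leave {remaining} unreserved; at least {MIN_UNRESERVED} required"
        )
    return remaining


# Hypothetical reservations: two functions carve 150 out of the pool.
print(unreserved_pool({"checkout": 100, "reports": 50}))
```

Every reservation shrinks what the remaining functions can share, so a generous reservation on one function is a throttling risk for the rest.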

Provisioned Concurrency

Pre-initialized execution environments. Eliminates cold-start latency for the configured number of concurrent invocations. Costs more (you pay for the environments whether invoked or not).

Provisioned concurrency is configured on an alias or a published version, never on $LATEST, so you can provision for prod while leaving $LATEST un-provisioned. Combined with Application Auto Scaling, provisioned concurrency can scale on a schedule or in response to the ProvisionedConcurrencyUtilization metric.

Provisioned environments are bound to immutable versions, so invocations of $LATEST never use them and remain subject to cold starts. For predictable performance, always invoke via an alias with provisioned concurrency configured. Reference: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
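
As a hedged sketch of the Application Auto Scaling setup, the helper below builds the two parameter sets you would pass to the boto3 `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` calls. It only constructs dictionaries and makes no AWS call. The function name, policy name, capacities, and the 0.7 target are illustrative assumptions.

```python
def provisioned_concurrency_scaling(function_name, alias, min_capacity,
                                    max_capacity, target_utilization=0.7):
    """Build (scalable_target, scaling_policy) parameter dicts for one alias."""
    if alias == "$LATEST":
        raise ValueError("provisioned concurrency needs an alias or published version")
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    scalable_target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    scaling_policy = {
        **common,
        "PolicyName": f"{function_name}-{alias}-pc-utilization",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }
    return scalable_target, scaling_policy
```

The resource ID carries the alias, which mirrors the point that provisioned concurrency belongs to an alias or version and not to the function as a whole.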

API Gateway Throttling

API Gateway enforces throttling at multiple levels:

Account-level: per-region, default 10,000 requests-per-second steady, 5,000 burst.
Stage-level: per stage on a deployment.
Method-level: per HTTP method on a stage.
Usage plan + API key: per-customer throttling for monetized APIs.

Throttling uses the token bucket algorithm: the bucket holds up to the burst limit in tokens and is refilled at the steady-state rate limit, and each request consumes one token.

When throttled, API Gateway returns 429 Too Many Requests. Clients should retry with exponential backoff.
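
A minimal token bucket and backoff schedule illustrate the behaviour. This is a teaching sketch, not API Gateway's implementation: the class and function names are hypothetical, time is passed in explicitly so the example is deterministic, and jitter is omitted from the backoff.

```python
class TokenBucket:
    """Token bucket with a steady refill rate and a burst ceiling."""

    def __init__(self, rate, burst):
        self.rate = rate          # tokens added per second
        self.burst = burst        # maximum tokens the bucket can hold
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst limit.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would receive 429 Too Many Requests


def backoff_delays(retries, base=1, cap=10):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

With rate 5 and burst 10, twelve simultaneous requests yield ten successes and two rejections, and one second later five more requests fit. Retrying clients would wait 1, 2, 4, 8, then 10 seconds under the default schedule.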

For known-spike workloads, raise the account-level steady-state limit via support, and use provisioned concurrency on the Lambda integration to avoid downstream cold starts.

Fargate and ECS Capacity Providers

ECS capacity providers map ECS services to compute backends:

FARGATE: serverless ECS - no instance management, premium per-vCPU.
FARGATE_SPOT: up to 70 percent discount; tasks can be reclaimed with a 2-minute warning.
EC2 capacity providers: backed by ASGs you create; ECS manages the ASG's desired capacity based on tasks.

Capacity provider strategies allow weighting:

Weight 1 on FARGATE and weight 1 on FARGATE_SPOT: a 50/50 split for cost-optimized stateless workloads.
Base 1 on FARGATE for at least one stable task, with additional capacity weighted toward FARGATE_SPOT.
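
A capacity provider strategy combines an optional base (a minimum number of tasks placed on one provider first) with weights (the ratio used for the remaining tasks). The sketch below models that split. The strategy entries use the `capacityProvider`, `base`, and `weight` keys of the ECS API, but the function name and rounding rule are assumptions and do not claim to match ECS's exact placement behaviour.

```python
def split_tasks(desired, strategy):
    """Distribute `desired` tasks across providers using base first, then weights."""
    placement = {entry["capacityProvider"]: 0 for entry in strategy}
    remaining = desired
    # Step 1: satisfy each base before any weighting applies.
    for entry in strategy:
        take = min(entry.get("base", 0), remaining)
        placement[entry["capacityProvider"]] += take
        remaining -= take
    # Step 2: split what is left in proportion to the weights.
    total_weight = sum(entry["weight"] for entry in strategy)
    if total_weight and remaining:
        shares = [remaining * entry["weight"] // total_weight for entry in strategy]
        leftover = remaining - sum(shares)
        # Assumed rounding rule: hand leftover tasks to the heaviest weights.
        by_weight = sorted(range(len(strategy)), key=lambda i: -strategy[i]["weight"])
        for index in by_weight[:leftover]:
            shares[index] += 1
        for entry, share in zip(strategy, shares):
            placement[entry["capacityProvider"]] += share
    return placement
```

For nine tasks with base 1 and weight 1 on FARGATE plus weight 3 on FARGATE_SPOT, the model places three tasks on FARGATE and six on FARGATE_SPOT.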

For EC2-backed ECS, ECS managed scaling adjusts the ASG to keep the CapacityProviderReservation metric near a target.

DynamoDB Capacity Modes

Two capacity modes:

Provisioned: specify RCU/WCU. With Auto Scaling, DynamoDB adjusts capacity up to a max within target utilization (default 70 percent).
On-demand: no specified capacity. Pay per request. Scales to thousands of requests/second instantly.
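
Target utilization in provisioned mode can be read as a simple formula: provisioned capacity is roughly consumed capacity divided by the target, clamped to the configured minimum and maximum. The sketch below shows that arithmetic. The function name and the min/max defaults are hypothetical, and real auto scaling reacts to CloudWatch alarms over time, not instantly.

```python
def provisioned_for(consumed, target_percent=70, min_capacity=5, max_capacity=1000):
    """Capacity units needed so `consumed` sits at `target_percent` utilization."""
    # Integer ceiling division avoids floating point surprises.
    wanted = (consumed * 100 + target_percent - 1) // target_percent
    return max(min_capacity, min(max_capacity, wanted))


print(provisioned_for(350))    # 350 consumed units at 70 percent -> 500 provisioned
print(provisioned_for(10000))  # clamped to the configured maximum
```

The maximum is the important guard: once demand exceeds what the ceiling allows, requests throttle no matter what the target utilization is.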

On-demand wins when:

Traffic is unpredictable (event-driven, traffic spikes).
New workload without baseline data.
Sporadic workload (no steady throughput).

Provisioned with auto-scaling wins when:

Steady throughput where you can baseline capacity.
Cost matters at high consistent throughput (provisioned is cheaper per request).

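Mode switches are rate-limited: moving a table from provisioned to on-demand is allowed only a limited number of times per 24-hour window (historically once), so you cannot flip modes repeatedly around a spike.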

Global Accelerator

Anycast IPs from AWS edge that route over the AWS backbone to the closest healthy endpoint. Sub-second failover, lower latency than internet routing.

Use Global Accelerator for: TCP/UDP non-HTTP workloads (gaming, IoT) needing global acceleration, applications needing static IPs, and ultra-low-latency global apps. CloudFront serves the same role for HTTP/HTTPS.

Common Pitfalls (Frequently Tested Traps)

Picking ALB for non-HTTP workloads: NLB or GWLB for non-HTTP; ALB cannot terminate raw TCP.
Forgetting NLB cross-zone is off by default: causes uneven distribution if AZs have asymmetric target counts.
Treating provisioned concurrency as a function-level setting: it is configured per alias or version; invocations of $LATEST never use provisioned environments.
Using DAX for strongly consistent reads: DAX passes strongly consistent reads through to DynamoDB without caching them.
Treating DynamoDB on-demand as universally cheaper: for steady high throughput, provisioned is much cheaper.
Forgetting the API Gateway burst limit: a sudden wave of requests or client retries can exceed the burst quota and receive 429 responses even when the steady-state rate has headroom.
Layering ElastiCache where CloudFront would suffice: edge caching is cheaper and faster for static or near-static content.

DOP-C02 exam priority — Scalability with ELB, Caching, and Serverless. This topic carries weight on the DOP-C02 exam. Master the trade-offs, decision boundaries, and the cost/performance triggers each AWS service exposes — the exam will test scenarios that hinge on knowing which service is the wrong answer, not just which is right.

FAQ

Q1: When should I prefer ALB over CloudFront?

ALB for in-region request routing; CloudFront for global edge caching of static or cacheable content. They compose: CloudFront in front of ALB is the standard production pattern for global web apps with dynamic backends.

Q2: How do I choose between ElastiCache Redis and Memcached?

Redis for: persistence, replication, complex data types, pub/sub. Memcached for: simple key-value at very high throughput per node, no persistence needed. Modern apps default to Redis.

Q3: When does Lambda provisioned concurrency stop being worth it?

When the workload is dense enough that cold starts are amortized across many requests anyway, or when latency-sensitive paths can be moved to Fargate or EC2. Provisioned concurrency at 100 percent utilization 24/7 typically costs more than running a small EC2 fleet.

Q4: Can I use weighted target groups on ALB for canary?

Yes. ALB listener rules support weighted forwarding to multiple target groups (e.g., 95 percent to TG-stable, 5 percent to TG-canary). CodeDeploy can orchestrate the weight shifts, and the same mechanism underlies blue/green deployments via ALB.
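
For illustration, a weighted forward action has the following shape when passed in the `Actions` list of the boto3 `elbv2` client's `create_rule` or `modify_listener` calls. The helper only builds the dictionary and makes no AWS call; its name and the ARN values in the usage line are placeholders.

```python
def canary_forward_action(stable_tg_arn, canary_tg_arn, canary_percent):
    """Build a forward action that sends `canary_percent` of traffic to the canary."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": stable_tg_arn, "Weight": 100 - canary_percent},
                {"TargetGroupArn": canary_tg_arn, "Weight": canary_percent},
            ]
        },
    }


# Placeholder ARNs for illustration only.
action = canary_forward_action("arn:stable-placeholder", "arn:canary-placeholder", 5)
```

Promoting the canary means re-applying the action with a larger percentage, up to 100 for a full cutover.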

Q5: How does DynamoDB on-demand handle traffic spikes?

On-demand instantly accommodates up to twice the table's previous peak traffic; traffic that exceeds 2x the previous peak within 30 minutes may be throttled. For unprecedented spikes, the recommendation is to pre-warm the table by gradually ramping traffic before the spike, or briefly switch to provisioned with high RCU/WCU.

Q6: What is the relationship between ASG, Application Auto Scaling, and capacity providers?

ASG scales EC2 fleets directly. Application Auto Scaling targets non-EC2 resources (DynamoDB tables, Aurora replicas, ECS services, Lambda provisioned concurrency, AppStream, Comprehend). ECS capacity providers wrap an ASG to scale instances based on task placement needs.

Q7: When should I use Global Accelerator over Route 53 latency-based routing?

Global Accelerator: TCP/UDP, sub-second failover, AWS backbone routing, static anycast IPs. Route 53 latency-based routing: DNS-based, TTL-bound failover, no static IPs. For HTTP, CloudFront usually trumps both. For non-HTTP global low-latency, Global Accelerator wins.

Wrap-Up

Scalability on DOP-C02 is the choreography of load balancing (ALB for HTTP, NLB for TCP/UDP, GWLB for inline appliances), caching (CloudFront at the edge, ElastiCache for application data, DAX for DynamoDB), and serverless capacity (Lambda concurrency knobs, API Gateway throttling, Fargate capacity providers, DynamoDB on-demand vs provisioned). Pick the right layer for the bottleneck, memorise the cross-zone defaults (ALB on, NLB off), the concurrency types (reserved caps, provisioned pre-warms), and the capacity mode trade-offs (on-demand for unpredictable, provisioned for steady high throughput). With those, scalability scenarios resolve to recognition.
