Elastic Load Balancing on ANS-C01 is not the conversation it was on SAA-C03. The architect exam asks "ALB or NLB for HTTP traffic?". The Advanced Networking Specialty exam asks "you have a multi-region application with regional ALBs behind a Global Accelerator, an inline third-party firewall using GENEVE encapsulation in a separate inspection VPC, an EKS cluster using IP-mode target groups via the Load Balancer Controller, gRPC services that need HTTP/2 end-to-end, mTLS authentication that requires client cert pass-through, plus a regulatory requirement that source IPs are preserved into the application — design the ELB layer in 60 seconds". That is a Network Engineer problem that pulls in ALB listener rules, NLB static IPs and TLS passthrough, GWLB GENEVE port 6081, target group types, cross-zone load balancing defaults, SNI for multi-cert TLS, AWS Load Balancer Controller IngressClass and TargetGroupBinding, and zonal shift — and ANS-C01 routinely tests every one of those moving parts inside a single five-line scenario.
This topic is Domain 1 (Network Design, 30 percent of the exam) Task Statement 1.3 in its entirety. The official ANS-C01 exam guide lists the knowledge bullets verbatim: "How load balancing works at layer 3, layer 4, and layer 7", "Different types of load balancers", "Connectivity patterns (internal vs external)", "Scaling factors", "Integration with Global Accelerator, CloudFront, AWS WAF, Route 53, EKS, ACM", "Configuration options (proxy protocol, cross-zone, sticky sessions, routing algorithms)", "Target group types (TCP, GENEVE, IP vs instance)", "AWS Load Balancer Controller for Kubernetes clusters", and "Encryption and authentication considerations (TLS termination, TLS passthrough)". Roughly 7 to 10 of the 65 exam questions touch this territory.
Why ELB Design Is the Heart of ANS-C01 Domain 1
Every layer of an AWS architecture eventually terminates at a load balancer — public traffic at an ALB, internal east-west traffic at an internal ALB or NLB, third-party security inspection at a GWLB, EKS pod traffic at an ALB or NLB created by the Load Balancer Controller. Picking the wrong load balancer type forces costly rework: an NLB cannot do path-based routing, an ALB cannot preserve source IPs the way an NLB can, neither can transparently inspect packets the way a GWLB can. The Specialty exam tests this because Network Engineers are expected to make these choices correctly under requirements pressure.
The mental model the exam rewards is OSI-layer matching: pick the load balancer at the layer where the routing decision lives. ALB lives at layer 7 (HTTP/HTTPS/gRPC) and inspects request headers, paths, hosts, and methods. NLB lives at layer 4 (TCP/UDP/TLS) and routes by 5-tuple without parsing content. GWLB lives at layer 3 (IP) and is transparent — it forwards every packet through an appliance via GENEVE encapsulation. Pick the layer where the routing decision lives, and the LB type follows.
Plain-Language Explanation: ELB Design — ALB, NLB, and GWLB
ELB combines three load-balancer types operating at different OSI layers, each with distinct features for routing, target selection, encryption, and integration. Three analogies anchor the moving parts.
Analogy 1: The Hotel Reception Desk
Think of three hotel reception desks with different specialisations.
The Application Load Balancer (ALB) is the luxury hotel concierge. The concierge reads the guest's request in detail ("I'd like a quiet room facing the courtyard, vegetarian breakfast, late checkout"), inspects multiple attributes (host, path, query string, headers, method, cookies), and routes the guest to a specific service desk: the spa, the restaurant, the gym, or the business centre. The concierge can sticky-assign you to a specific room (sticky sessions), can serve gRPC (multilingual), and can speak HTTP/2 fluently. ALB targets are like specific service desks — the spa, the restaurant — each running a different program.
The Network Load Balancer (NLB) is the hotel valet desk. The valet doesn't ask what you want — they just look at your car (5-tuple: source IP, source port, destination IP, destination port, protocol) and direct you to a parking spot. The valet preserves the original car (source IP) without changing it, gives you a dedicated parking space per zone (static IP per AZ, optional EIP), and handles 1 million cars per minute without breaking a sweat. NLB doesn't read your itinerary; it just parks fast and reliably.
The Gateway Load Balancer (GWLB) is the hotel security checkpoint at the back loading dock. Every cargo (packet) coming in goes through the checkpoint, which wraps it in tamper-evident packaging (GENEVE encapsulation, port 6081), sends it through a security appliance (Palo Alto, Fortinet, Suricata IDS), and hands it back unchanged on the other side. The checkpoint is transparent — the cargo never knows it was inspected. GWLB is the way to insert a third-party security appliance into the data path without modifying the application.
Analogy 2: The Restaurant With Multiple Service Lines
Imagine a restaurant chain.
The ALB is the gourmet sit-down restaurant where waiters take your full order, route specific dishes to the right kitchen station (path-based routing: appetizers to the cold kitchen, mains to the grill, desserts to the patisserie), can split tables (host-based routing: weddings to the private dining room, casual diners to the main floor), and can swap menus mid-service (rule-based redirects, fixed responses). The waiters speak HTTP/2 and gRPC fluently and remember your dietary preferences (sticky sessions via cookie). The ALB cannot start a new restaurant in 5 seconds — pre-warming for surprise traffic spikes is a real concern.
The NLB is the fast-food drive-through. Cars arrive, the cashier reads only the license plate (5-tuple), assigns to a window. No menu inspection, no deep questions, just throughput — millions of orders per second. The drive-through has a fixed entry address (static IP per AZ, the EIP), preserves the customer's original car (source IP preservation), and scales instantly with no warmup needed.
The GWLB is the food safety inspector at the loading dock. Every truck delivering ingredients passes through the inspector, who unloads, inspects, re-packages (GENEVE wrap), and routes through a third-party food safety lab (Palo Alto VM-Series, Fortinet, Aviatrix), then re-loads and forwards. The kitchen never knows the inspection happened — perfectly transparent.
Analogy 3: The Postal Sorting System
The ALB is the specialty mail routing centre that opens every package to read the destination, sender, contents, special instructions, and dispatches accordingly (path, host, header, cookie, query parameter routing). It can also rewrite addresses on the fly (URL rewriting via redirects), generate canned responses (fixed-response actions), and require a signature for delivery (authenticate-cognito, authenticate-oidc). The ALB has many specialised conveyor belts (target groups), each going to a different building (instance, IP-mode, Lambda function).
The NLB is the bulk parcel sorter that reads only the postage label (5-tuple). Parcels go to the right depot in milliseconds, no content inspection. Source-of-origin labels are preserved end-to-end (source IP preservation). Each AZ has its own dedicated PO Box (static IP, optionally an Elastic IP).
The GWLB is the customs inspection at the border. Every parcel crossing the border is wrapped in customs tamper-evident packaging (GENEVE), passed through customs (third-party appliance), and forwarded unchanged on the other side. Transparent customs that the sender and receiver don't know about.
For ANS-C01, the hotel reception desk is the highest-yield mental model when the question asks "ALB vs NLB" — concierge for content-aware routing, valet for fast 5-tuple routing. The food safety inspector sub-analogy makes GWLB and inline appliance inspection intuitive. The postal sorting system is best when the question hinges on target groups and routing rules. Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html
Load Balancer Tiers — Layer 3, 4, and 7
The OSI model framing is what the exam rewards. Each ELB type operates at a specific layer.
Layer 7 — ALB
The ALB inspects HTTP/HTTPS traffic up to the application layer. It sees the full HTTP request: method, URI path, host header, query string, headers, cookies, and (with mTLS) the client certificate. It can route on host, path, header, query string, method, and source IP conditions; terminate TLS; add headers such as X-Forwarded-For; and invoke Lambda targets. ALB does not support arbitrary TCP, UDP, or non-HTTP protocols — only HTTP, HTTPS, and gRPC.
Layer 4 — NLB
The NLB sees only the layer 4 connection: source IP, source port, destination IP, destination port, protocol (TCP or UDP). It does not inspect content. It can optionally terminate TLS at the listener (TLS listener type), but with TCP/UDP listeners it is fully transparent. NLB supports any protocol over TCP, UDP, or TLS — including non-HTTP protocols (SSH, RDP, MQTT, custom binary protocols, gRPC over TCP, etc.).
Layer 3 (transparent) — GWLB
The GWLB operates at layer 3 — IP packets. It does not terminate any connection. Every packet is wrapped in GENEVE encapsulation (UDP port 6081) and forwarded to a third-party appliance via a GENEVE-capable target group. The appliance inspects (and optionally modifies) the packet, then sends it back to the GWLB, which forwards to the original destination. The application is unaware of GWLB's presence.
Application Load Balancer Deep Dive
Listeners and rules
An ALB listener binds to a port (typically 80 or 443) and protocol (HTTP, HTTPS). Each listener has rules evaluated in priority order. A rule has conditions (host header, path pattern, query string, source IP CIDR, HTTP request method, HTTP header) and actions (forward to a target group, redirect, return a fixed response, authenticate-cognito, authenticate-oidc).
The default rule fires when no other rule matches. Up to 100 rules per load balancer by default (raisable via Service Quotas).
Content-based routing
The most-tested ALB capability. Examples:
- Host-based: api.example.com routes to the API target group; static.example.com routes to the static-content target group.
- Path-based: /api/* routes to the API target group; /static/* routes to the S3-fronted target group.
- Header-based: User-Agent: Mobile* routes to the mobile-optimized target group.
- Query-string-based: ?version=v2 routes to the v2 target group for canary testing.
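As a concrete illustration, here is a minimal boto3 sketch of a host- and path-based rule. The listener and target group ARNs are placeholders; the condition and action shapes are the standard elbv2 create_rule inputs.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARNs -- substitute your own listener and target group.
LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/my-alb/..."
API_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/api-tg/..."

# Route requests for api.example.com whose path starts with /api/ to the API target group.
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=10,  # evaluated before higher-numbered rules
    Conditions=[
        {"Field": "host-header", "HostHeaderConfig": {"Values": ["api.example.com"]}},
        {"Field": "path-pattern", "PathPatternConfig": {"Values": ["/api/*"]}},
    ],
    Actions=[{"Type": "forward", "TargetGroupArn": API_TG_ARN}],
)
```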
Target group types
- Instance: target is an EC2 instance ID; ALB routes to the instance's primary IP.
- IP: target is an arbitrary IP in the VPC's CIDR or a peered/connected CIDR; useful for on-prem servers via VPN/DX, or for EKS pods via Load Balancer Controller IP-mode.
- Lambda: target is a Lambda function; ALB invokes the function for each request, with a serialised event payload.
Sticky sessions
ALB supports two sticky-session modes: load balancer-generated cookie (AWSALB) and application cookie (an existing cookie name like JSESSIONID). Stickiness duration is configurable (1 second to 7 days). Useful for stateful applications that store session state locally on each instance.
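Stickiness is configured as target group attributes. A minimal sketch of enabling duration-based (lb_cookie) stickiness, with the application-cookie alternative shown in comments; the ARN is a placeholder.

```python
import boto3

elbv2 = boto3.client("elbv2")
TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/web-tg/..."  # placeholder

# Load-balancer-generated cookie (AWSALB) stickiness for 1 day.
elbv2.modify_target_group_attributes(
    TargetGroupArn=TG_ARN,
    Attributes=[
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": "86400"},
    ],
)

# Alternatively, pin to an existing application cookie such as JSESSIONID:
#   {"Key": "stickiness.type", "Value": "app_cookie"},
#   {"Key": "stickiness.app_cookie.cookie_name", "Value": "JSESSIONID"},
```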
gRPC and HTTP/2 support
ALB supports HTTP/2 end-to-end and gRPC as a first-class protocol. Target groups can be configured with protocol_version: GRPC. Health checks can use gRPC health status. This is the canonical answer for modern microservices that need HTTP/2 multiplexing.
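A sketch of creating a gRPC target group with boto3. The VPC ID, name, and health-check RPC path are illustrative assumptions; gRPC rides on HTTP/2, so the protocol stays HTTP with ProtocolVersion set to GRPC.

```python
import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_target_group(
    Name="orders-grpc",                       # illustrative name
    Protocol="HTTP",
    ProtocolVersion="GRPC",                   # gRPC over HTTP/2
    Port=50051,
    VpcId="vpc-0123456789abcdef0",            # placeholder
    TargetType="ip",
    HealthCheckProtocol="HTTP",
    HealthCheckPath="/grpc.health.v1.Health/Check",  # hypothetical health RPC exposed by the service
    Matcher={"GrpcCode": "0"},                # healthy when the RPC returns gRPC status 0 (OK)
)
```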
Authentication actions
ALB can authenticate incoming requests against Amazon Cognito user pools or any OIDC-compatible identity provider (Auth0, Okta, Azure AD). Configured as a listener rule action; ALB handles the OAuth flow, sets a session cookie, and forwards the user identity to the backend in a JWT header.
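A hedged sketch of an authenticate-then-forward rule, assuming a Cognito user pool already exists; the pool ARN, client ID, domain, and target group ARN are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",  # placeholder
    Priority=5,
    Conditions=[{"Field": "path-pattern", "PathPatternConfig": {"Values": ["/admin/*"]}}],
    Actions=[
        {   # Step 1: send unauthenticated users through the Cognito hosted UI.
            "Type": "authenticate-cognito",
            "Order": 1,
            "AuthenticateCognitoConfig": {
                "UserPoolArn": "arn:aws:cognito-idp:...:userpool/...",  # placeholder
                "UserPoolClientId": "example-client-id",                # placeholder
                "UserPoolDomain": "example-auth-domain",                # placeholder
                "OnUnauthenticatedRequest": "authenticate",
            },
        },
        {   # Step 2: forward; the backend receives identity in the x-amzn-oidc-* headers.
            "Type": "forward",
            "Order": 2,
            "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/admin-tg/...",
        },
    ],
)
```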
ALBs scale automatically but with non-zero ramp-up time. For known traffic spikes (Black Friday, product launches, scheduled events), file an AWS Support pre-warming request 24–48 hours in advance. Without pre-warming, the ALB may shed traffic with HTTP 503 during the initial scale-up. The exam version: "Black Friday traffic spike causes 503 errors at the start". Answer: pre-warm the ALB via AWS Support. Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html
Network Load Balancer Deep Dive
TCP, UDP, and TLS listeners
NLB supports three listener protocols:
- TCP: pure layer 4, source IP preserved.
- UDP: layer 4 UDP (e.g. DNS, gaming, IoT MQTT-SN); source IP preserved.
- TLS: NLB terminates TLS, then forwards plaintext TCP to the target. Source IP NOT preserved (proxy protocol v2 can be enabled to preserve in a header).
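For the TLS-termination case, a minimal boto3 sketch of an NLB TLS listener; the ARNs are placeholders and the security policy is one of the predefined ELB policies.

```python
import boto3

elbv2 = boto3.client("elbv2")

# TLS terminates at the NLB; the forward action sends decrypted TCP to the targets.
elbv2.create_listener(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/net/my-nlb/...",  # placeholder
    Protocol="TLS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:...:certificate/..."}],  # ACM cert placeholder
    SslPolicy="ELBSecurityPolicy-TLS13-1-2-2021-06",
    DefaultActions=[{
        "Type": "forward",
        "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/tcp-tg/...",  # placeholder
    }],
)
```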
Static IPs per AZ
Each AZ enabled on the NLB gets a static IP allocated automatically by AWS. You can optionally allocate an Elastic IP per AZ for branding or whitelisting purposes. NLB IPs do NOT change over the lifetime of the LB — perfect for whitelisting on customer firewalls. ALB IPs do change (they scale up/down with traffic), so DNS-based reachability is required for ALB.
Source IP preservation
NLB with TCP listener preserves the client's source IP all the way to the target. This is the canonical answer for "the application requires the real client IP". With TLS listener, the source IP is replaced by the NLB's IP at the TLS termination point — to recover client IP, enable proxy protocol v2 on the target group, which prepends the original 5-tuple to the connection.
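Enabling proxy protocol v2 is a single target group attribute; a short sketch with a placeholder ARN:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Prepend the original 5-tuple to each connection so the backend can recover the client IP.
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/tls-backend-tg/...",  # placeholder
    Attributes=[{"Key": "proxy_protocol_v2.enabled", "Value": "true"}],
)
```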
Source IP handling on ALB, for comparison
ALB does NOT preserve source IP at the TCP layer — it terminates the HTTP connection and creates a new connection to the backend. Instead, ALB sets the X-Forwarded-For header with the original client IP. Applications must read the header (and trust the LB).
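A small illustrative Python helper for reading the client IP behind an ALB. The header name is standard; the trust decision (only honor the header when the request actually arrived via your load balancer) remains the application's responsibility.

```python
def client_ip_behind_alb(headers: dict, peer_addr: str) -> str:
    """Return the original client IP when the request arrived via a trusted ALB.

    X-Forwarded-For is a comma-separated list; the ALB appends the client address it
    observed, so with a single ALB hop the right-most entry is the one to trust.
    Left-most entries can be spoofed by the client.
    """
    xff = headers.get("X-Forwarded-For", "")
    if not xff:
        return peer_addr  # no load balancer in front; use the TCP peer address
    return xff.split(",")[-1].strip()
```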
NLB scaling and warmup
NLB scales instantly without warmup — it can absorb millions of requests per second from a cold start, in contrast to ALB, which needs pre-warming for known spikes. NLB's flow-state tracking is distributed across its underlying node fleet.
Zonal shift
Zonal shift (part of Route 53 Application Recovery Controller) is an AWS-managed feature that lets you shift traffic away from a load balancer's targets in a specific AZ — useful for AZ-scoped impairments. Trigger it via API or console; the affected AZ is taken out of rotation for a configurable expiry of up to 3 days (extendable or cancellable early). A relatively new capability and exam-relevant.
Historically, NLB had no security group of its own, and target security groups had to allow traffic by IP — client CIDRs (often 0.0.0.0/0 for internet-facing NLBs) when client IP preservation was on, or the NLB's private IPs when it was off. In late 2023, AWS added security group support for NLB as an opt-in feature — you can now attach security groups to the NLB itself, simplifying the security model. ANS-C01 questions written before this feature treat NLB as having no SG; questions after treat NLB as having SG. The current best practice: enable SG support on new NLBs and use SG-as-source rules from the NLB SG to target SGs. Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html
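A sketch of creating an NLB with a security group attached (possible on new NLBs since the 2023 feature); the name, subnets, and SG ID are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Attaching an SG to the NLB lets target SGs reference it as a source,
# instead of allowing broad CIDR ranges.
resp = elbv2.create_load_balancer(
    Name="internal-nlb",                                      # illustrative name
    Type="network",
    Scheme="internal",
    Subnets=["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"],  # placeholders
    SecurityGroups=["sg-0nlb0123456789abc"],                  # SG applied to the NLB itself
)
print(resp["LoadBalancers"][0]["LoadBalancerArn"])
```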
Gateway Load Balancer Deep Dive
What problem GWLB solves
Inserting a third-party security appliance (Palo Alto VM-Series, Fortinet FortiGate, Check Point CloudGuard, Aviatrix gateway, Suricata IDS) inline between two networks. Pre-GWLB, this required complex routing tricks with VPN tunnels and source/destination check disabling on the appliance. GWLB makes this trivial.
GENEVE encapsulation
GWLB wraps every IP packet in GENEVE (Generic Network Virtualisation Encapsulation) on UDP port 6081. The appliance receives the GENEVE-encapsulated packet, inspects the inner IP packet, and returns it (modified or not) back to the GWLB, which strips the GENEVE wrapper and forwards to the original destination. The original packet's source/destination IPs are preserved.
GWLB endpoint (GWLBe)
A consumer VPC consumes the GWLB's service via a GWLB endpoint (GWLBe) — a special VPC endpoint that routes traffic into the GWLB's GENEVE pipeline. Route tables in the consumer VPC point to the GWLBe; from the consumer's perspective, GWLBe is a transparent next-hop.
Bump-in-the-wire topology
The canonical pattern: a route table sends 0.0.0.0/0 (or a specific destination) to the GWLBe; GWLBe forwards to GWLB; GWLB encapsulates and sends to the appliance fleet (target group); appliance returns; GWLB forwards to the original destination. The application is unaware.
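The consumer-side plumbing is just a route whose next hop is the GWLB endpoint. A hedged sketch with placeholder route table and endpoint IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# Send all traffic leaving the protected subnet through the GWLB endpoint (the inspection hop).
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",     # placeholder: the protected subnet's route table
    DestinationCidrBlock="0.0.0.0/0",
    VpcEndpointId="vpce-0abcdef1234567890",   # placeholder: the Gateway Load Balancer endpoint
)

# Return traffic needs a mirrored route on the far side (e.g. an edge/IGW route table pointing
# the subnet CIDR back at the GWLBe) so both directions traverse the appliance.
```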
Cross-zone load balancing on GWLB
Like NLB, GWLB has cross-zone load balancing disabled by default, and it can be enabled per load balancer. AWS recommends enabling it in inspection architectures so that an appliance failure in one AZ does not black-hole that AZ's traffic. This differs from ALB, where cross-zone is enabled by default and free.
Use cases
- Centralized inspection VPC — all spoke VPCs route through TGW to an inspection VPC; GWLB inserts the third-party appliance.
- Egress filtering — outbound internet traffic from spokes is inspected via GWLB before reaching NAT GW and IGW.
- Compliance — regulated industries that mandate specific commercial firewall vendors use GWLB to integrate.
Key Terms Recap
- ALB: layer 7, HTTP/HTTPS/gRPC, content-based routing, dynamic IPs.
- NLB: layer 4, TCP/UDP/TLS, source IP preservation, static IPs per AZ.
- GWLB: layer 3 transparent, GENEVE encapsulation, third-party appliance inline.
- GENEVE: encapsulation protocol, UDP port 6081, used by GWLB.
- Cross-zone load balancing: distribute traffic across all enabled AZs (vs per-AZ affinity).
- Target group: collection of targets (instance, IP, Lambda, GENEVE) that the LB routes to.
- Source IP preservation: original client IP visible to backend; default for NLB TCP, requires proxy protocol for NLB TLS, never for ALB (uses X-Forwarded-For).
- Pre-warming: AWS Support manual scaling action for ALB before known traffic spikes.
- Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
Cross-Zone Load Balancing — Defaults and Cost
Cross-zone load balancing distributes traffic across targets in all enabled AZs. Without it, the LB only sends to targets in the same AZ as the request entry point. The defaults differ by LB type and matter for both reliability and cost.
Defaults
- ALB: cross-zone enabled by default, no charge for cross-zone traffic.
- NLB: cross-zone disabled by default; enabling charges per-GB cross-AZ data transfer.
- GWLB: cross-zone disabled by default; enabling it (recommended for appliance fleets) incurs cross-AZ data charges.
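Cross-zone behaviour is a load balancer attribute; a minimal sketch of toggling it on an NLB (placeholder ARN):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Disable cross-zone on an NLB with balanced per-AZ targets to avoid cross-AZ data charges;
# set the value to "true" when target distribution across AZs is uneven.
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:...:loadbalancer/net/my-nlb/...",  # placeholder
    Attributes=[{"Key": "load_balancing.cross_zone.enabled", "Value": "false"}],
)
```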
Why NLB defaults to disabled
NLB is optimised for ultra-low latency and ultra-high throughput. Cross-AZ traffic adds latency (sub-millisecond, but measurable) and cost. The default-off lets the operator opt in when needed — typically when target distribution is uneven across AZs.
Cost implication
For NLB with cross-zone enabled, every cross-AZ packet is billed twice: once leaving the source AZ, once entering the destination AZ. For high-throughput workloads (e.g. millions of TCP connections per second), this can dwarf the LB's hourly cost.
When to enable on NLB
- Targets are unevenly distributed across AZs (e.g. 5 instances in AZ-a, 1 in AZ-b).
- AZ-level fault tolerance requires that AZ-a targets serve AZ-b traffic if AZ-b targets fail.
- Per-AZ traffic spikes are unpredictable and you want to smooth load.
ANS-C01 frequently asks about cost optimization. A scenario: NLB with cross-zone enabled, 100 GB per day cross-AZ, $0.01/GB cross-AZ data charge — that is $1/day = $30/month per LB just for cross-zone. Multiply by 50 NLBs in a production fleet = $1500/month. The exam answer for cost reduction in this pattern: disable cross-zone on NLBs that have balanced target distribution per AZ. ALB cross-zone is free. GWLB cross-zone follows the NLB model: off by default and chargeable when enabled. Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
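The arithmetic behind that scenario, as a quick sketch; the $0.01/GB figure is the scenario's assumption, not a quoted price list.

```python
# Cross-AZ data charge estimate from the scenario above.
gb_per_day = 100        # cross-AZ GB per NLB per day (scenario assumption)
price_per_gb = 0.01     # $/GB in one direction (scenario assumption)
nlb_count = 50

per_lb_per_month = gb_per_day * price_per_gb * 30   # = $30 per NLB per month
fleet_per_month = per_lb_per_month * nlb_count       # = $1,500 across the fleet
print(per_lb_per_month, fleet_per_month)
```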
TLS Termination vs TLS Passthrough
TLS termination at the LB
The LB terminates the TLS connection — decrypts traffic, applies routing/inspection logic in cleartext, then either re-encrypts to the backend or forwards in plaintext. Pros: LB can inspect HTTP headers (path, host, etc.) for routing; centralised cert management at LB; reduces compute load on backend. Cons: traffic between LB and backend is unencrypted unless explicitly re-encrypted; LB sees the cleartext traffic.
TLS passthrough on NLB
NLB with TCP listener (not TLS listener) passes encrypted traffic through to the backend without termination. The backend handles TLS termination. Pros: end-to-end encryption with no LB-side decryption; mTLS works because client cert reaches the backend; certificates managed at the backend. Cons: LB cannot inspect content for routing (no path-based routing); each backend must have its own cert (or share via wildcard).
TLS termination at NLB
NLB with TLS listener terminates TLS at the LB, then forwards plaintext TCP to the backend. Use when backends are not TLS-capable but you want client-side TLS.
Mutual TLS (mTLS)
For mTLS authentication where the client presents a certificate, NLB with TCP listener (passthrough) is the typical answer because the backend needs the client cert. ALB recently added mTLS support — the ALB can verify client cert and forward identity in a header.
SNI (Server Name Indication) — multiple certs per listener
Both ALB and NLB support SNI, where multiple TLS certificates can be associated with a single listener. The LB selects the cert based on the SNI hostname in the TLS ClientHello. This lets a single LB serve dozens of distinct domains without dedicated listeners or LBs per domain.
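Additional certificates attach to an existing listener with add_listener_certificates; a hedged sketch with placeholder ARNs:

```python
import boto3

elbv2 = boto3.client("elbv2")

# The default certificate is set on the listener itself; extra SNI certificates are added here.
elbv2.add_listener_certificates(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",  # placeholder
    Certificates=[
        {"CertificateArn": "arn:aws:acm:...:certificate/app1-example-com"},  # placeholders
        {"CertificateArn": "arn:aws:acm:...:certificate/app2-example-com"},
    ],
)
```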
AWS Load Balancer Controller for Kubernetes
The AWS Load Balancer Controller is a Kubernetes controller that creates and manages ALBs and NLBs from Kubernetes resources. It is the EKS-native answer for exposing services externally.
IngressClass and Ingress
A Kubernetes Ingress resource specifies host/path routing for HTTP traffic. The Load Balancer Controller watches Ingress resources, creates an ALB, configures listener rules to match the Ingress, and registers backend pods as targets. The IngressClass field selects which controller handles the Ingress (e.g. alb for ALB-managed).
Service of type LoadBalancer
For non-HTTP traffic, a Kubernetes Service of type LoadBalancer with annotations triggers an NLB. The controller registers pods as IP-mode targets and routes traffic to them.
IP mode vs instance mode
- Instance mode: ALB target is the EC2 instance running the kubelet; traffic enters via the kube-proxy NodePort and is forwarded to the pod via iptables.
- IP mode: ALB target is the pod's IP directly (using VPC CNI); traffic skips kube-proxy and goes pod-to-pod over the VPC network.
IP mode is the modern recommendation: lower latency, simpler debugging, no kube-proxy hop. Requires VPC CNI with sufficient pod IP capacity (prefix delegation).
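One way to confirm what the controller actually built is to inspect the target groups it created. A hedged boto3 sketch (the load balancer ARN is a placeholder); in IP mode the TargetType is "ip" and the registered targets are pod addresses from the VPC CNI.

```python
import boto3

elbv2 = boto3.client("elbv2")
LB_ARN = "arn:aws:elasticloadbalancing:...:loadbalancer/app/k8s-ingress-alb/..."  # placeholder

# Print each target group's type and its registered targets (pod IPs in IP mode).
for tg in elbv2.describe_target_groups(LoadBalancerArn=LB_ARN)["TargetGroups"]:
    health = elbv2.describe_target_health(TargetGroupArn=tg["TargetGroupArn"])
    targets = [d["Target"]["Id"] for d in health["TargetHealthDescriptions"]]
    print(tg["TargetGroupName"], tg["TargetType"], targets)
```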
TargetGroupBinding
For more advanced scenarios, the TargetGroupBinding custom resource binds an existing target group to a Kubernetes Service. Useful for sharing a target group across multiple namespaces or pre-existing target groups managed outside Kubernetes.
Exam Facts Recap
- ALB: layer 7, HTTP/HTTPS/gRPC, dynamic IPs, supports cross-zone (free), pre-warm for spikes.
- NLB: layer 4, TCP/UDP/TLS, static IP per AZ, optional EIP, source IP preserved on TCP, instant scaling.
- GWLB: layer 3, GENEVE port 6081, transparent third-party appliance integration.
- Cross-zone defaults: ALB enabled, NLB disabled, GWLB disabled (recommended to enable for appliance fleets).
- ALB rules: 100 per load balancer by default (raisable).
- Listener limit per LB: 50 listeners.
- Target group types: instance, IP, Lambda (ALB only), GENEVE (GWLB only).
- SNI: both ALB and NLB support multiple certs per listener.
- Connection draining: configurable deregistration delay (0–3600 seconds).
- Reference: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/what-is-load-balancing.html
Connection Draining and Deregistration Delay
When a target is deregistered (auto-scale-down, deployment, manual removal), the LB stops sending new connections to it but allows in-flight connections to complete. The deregistration delay (default 300 seconds, range 0–3600) controls how long the LB waits before forcibly closing in-flight connections.
Tuning recommendations
- Long-running connections (WebSocket, video streaming, gRPC streaming): set 1800–3600 seconds.
- Short HTTP: 60–120 seconds is fine.
- Aggressive deployments: 0 seconds (immediate cutoff) — useful when deploying often and brief disruption is acceptable.
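The knob itself is a target group attribute; a minimal sketch setting a long drain for streaming workloads, per the tuning guidance above (placeholder ARN):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Give long-lived WebSocket / streaming gRPC connections up to 30 minutes to finish before cutoff.
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/stream-tg/...",  # placeholder
    Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "1800"}],
)
```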
Integration Patterns
Global Accelerator + NLB
Global Accelerator provides static anycast IPs at AWS edge locations, routes to the closest healthy region's NLB endpoint, and gives sub-100ms global latency for non-HTTP traffic (TCP, UDP). Canonical for gaming, VoIP, financial trading.
CloudFront + ALB origin
CloudFront caches HTTP responses at edge and routes cache misses to an ALB origin. Use the ALB's DNS name as the origin, optionally with Origin Shield as an additional caching layer in front of it. Combine with Origin Access Control if using an S3 origin.
Route 53 alias + ALB/NLB
Route 53 alias record at zone apex points to ALB or NLB DNS. With Evaluate Target Health, Route 53 returns the LB only if it has healthy targets. Failover routing across two regional LBs gives multi-region active-passive.
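A sketch of the primary half of a failover alias pair; the hosted zone IDs and DNS names are placeholders (each ALB's canonical hosted zone ID comes from describe_load_balancers), and a second change with Failover set to SECONDARY completes the pair.

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z_PUBLIC_ZONE_ID",   # placeholder: your public hosted zone
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "example.com.",            # zone apex -- alias records are allowed here
            "Type": "A",
            "SetIdentifier": "primary-region",
            "Failover": "PRIMARY",
            "AliasTarget": {
                "HostedZoneId": "Z_ALB_CANONICAL_ZONE_ID",               # placeholder: the ALB's own zone ID
                "DNSName": "my-alb-123.us-east-1.elb.amazonaws.com.",    # placeholder
                "EvaluateTargetHealth": True,   # return this LB only while it has healthy targets
            },
        },
    }]},
)
```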
ALB + Cognito for authentication
ALB listener rule with authenticate-cognito action redirects unauthenticated users to a Cognito hosted UI, completes the OAuth flow, sets a session cookie, and forwards the authenticated user identity to backends in a JWT header.
Common Traps Recap — ELB on ANS-C01
Trap 1: NLB has dynamic IPs
Wrong. NLB has static IPs per AZ for its lifetime; ALB has dynamic IPs.
Trap 2: ALB cross-zone is disabled by default
Wrong. ALB cross-zone is enabled by default and free. NLB cross-zone is disabled by default and chargeable.
Trap 3: GWLB uses VXLAN
Wrong. GWLB uses GENEVE on UDP port 6081.
Trap 4: ALB preserves source IP at TCP layer
Wrong. ALB terminates HTTP and creates a new connection. Source IP is forwarded in the X-Forwarded-For header.
Trap 5: NLB needs pre-warming for traffic spikes
Wrong. NLB scales instantly; ALB needs pre-warming for known spikes.
Trap 6: GWLB cross-zone load balancing is always enabled
Wrong. Like NLB, GWLB cross-zone is disabled by default and configurable; AWS recommends enabling it for inspection fleets so a single-AZ appliance failure does not drop that AZ's traffic.
Trap 7: ALB supports any TCP protocol
Wrong. ALB supports only HTTP, HTTPS, and gRPC. Use NLB for arbitrary TCP/UDP.
Trap 8: Lambda targets work on NLB
Wrong. Only ALB supports Lambda target groups.
Trap 9: NLB target groups always need source-IP allow on backend SG
Was true historically; now NLB supports security groups as an opt-in feature.
Trap 10: Static IP per AZ means one EIP per AZ
Partial. NLB allocates a static IP automatically; you can additionally allocate an EIP per AZ if you want.
Trap 11: AWS Load Balancer Controller is for Fargate only
Wrong. It works for any EKS cluster, EC2 worker nodes or Fargate.
Trap 12: Sticky sessions persist across LB restarts
Partial. ALB load-balancer-generated cookie sticky sessions persist as long as the cookie is valid (configurable duration, default 1 day). Application cookie sticky persists as long as the application cookie persists.
Decision Matrix — Picking the Right Load Balancer
| Requirement | Choice | Notes |
|---|---|---|
| HTTP / HTTPS path-based routing | ALB | Layer 7, content-aware. |
| gRPC end-to-end | ALB with HTTP/2 | gRPC is HTTP/2 in disguise. |
| WebSocket | ALB or NLB | ALB has native support; NLB passes through TCP. |
| TCP/UDP non-HTTP | NLB | Layer 4, supports any TCP/UDP. |
| Static client IP whitelist | NLB with EIP per AZ | Guaranteed stable IPs. |
| Source IP preservation | NLB TCP listener | Preserved end-to-end. |
| TLS passthrough end-to-end | NLB TCP listener | No termination at LB. |
| TLS termination + path routing | ALB with TLS listener | Cert at LB, plaintext to backend. |
| Inline third-party security appliance | GWLB | GENEVE port 6081. |
| Lambda backend | ALB with Lambda target group | NLB does not support Lambda. |
| Sub-100ms global TCP/UDP | Global Accelerator + NLB | Anycast at edge. |
| EKS pod traffic | AWS Load Balancer Controller (ALB or NLB) | IP mode preferred. |
| mTLS where client cert reaches backend | NLB TCP passthrough | ALB has mTLS support but adds complexity. |
| Cost-conscious cross-AZ | NLB cross-zone disabled | Avoid per-GB cross-AZ charge. |
FAQ — ELB Design on ANS-C01
Q1: When should I use ALB vs NLB for my application?
Use ALB when the routing decision is at HTTP layer — path-based routing, host-based routing, header inspection, sticky sessions via cookie, gRPC with HTTP/2, Lambda backends, mTLS with header-forwarded identity, Cognito or OIDC authentication actions. Use NLB when (a) the protocol is not HTTP (TCP, UDP, custom binary, MQTT, SSH, RDP, gaming), (b) the application requires the original client IP at TCP layer (source IP preservation), (c) you need static IPs for whitelisting, (d) ultra-high throughput at sub-millisecond latency is required, (e) instant scaling without warm-up is mandatory. ANS-C01 questions often combine constraints — "HTTP traffic but the backend reads connection.peer_address for source IP" — and that combination points to an NLB TCP listener (source IP preserved natively), NLB TLS with proxy protocol v2, or ALB with X-Forwarded-For if the application can read a header; the answer depends on the application's flexibility.
Q2: How does GWLB enable third-party security appliance integration?
GWLB sits between two networks (e.g. spoke VPC and central inspection VPC). Traffic from the spoke is routed via a GWLB endpoint (GWLBe) into the GWLB. The GWLB encapsulates each packet in GENEVE (UDP port 6081) with metadata, sends to a target group of third-party appliance instances (Palo Alto, Fortinet, Aviatrix, Check Point). The appliance receives the GENEVE-wrapped packet, inspects the inner payload (decrypt with TLS inspection if enabled, apply IDS rules, etc.), and sends the packet back to the GWLB — modified, dropped, or unchanged. GWLB strips the GENEVE wrapper and forwards to the original destination. The application doesn't know the appliance was in the path. The canonical exam scenario: "we need to insert Palo Alto VM-Series for outbound internet inspection without changing application behavior" — answer is GWLB with bump-in-the-wire route table.
Q3: Why does NLB cross-zone load balancing default to disabled?
NLB optimizes for ultra-low latency. Cross-AZ traffic adds latency (typically around a millisecond) and per-GB cost. Default-off means the operator must opt-in, signaling awareness of the cost-vs-availability tradeoff. Enable cross-zone when (a) target distribution is uneven (e.g. 10 instances in AZ-a, 2 in AZ-b — without cross-zone, AZ-b clients only get 2 instances of capacity), (b) AZ-level resilience requires AZ-a clients to fail over to AZ-b targets, (c) traffic spikes are uneven by AZ. Disable cross-zone when (a) targets are balanced per AZ, (b) cost-sensitivity is high, (c) latency is critical. ANS-C01 will reward "disable cross-zone for cost optimization with balanced target distribution" as the canonical cost-conscious answer.
Q4: How does AWS Load Balancer Controller integrate with Kubernetes Ingress?
You install the controller via Helm or static manifests in the EKS cluster. You create a Kubernetes Ingress resource with annotations specifying ALB configuration (e.g. internet-facing or internal, target type IP or instance, listener port, SSL cert ARN). The controller watches Ingress resources, creates an ALB matching the spec, configures listener rules from the Ingress's host/path rules, and registers backend pod IPs (IP mode) or node NodePorts (instance mode) as targets. Updates to the Ingress (add a new path) automatically update the ALB rules. For Service-of-type-LoadBalancer (non-HTTP), similar workflow with annotations selecting NLB. IP mode is preferred because it skips the kube-proxy hop and gives direct pod-to-LB visibility.
Q5: Can I have multiple TLS certificates on a single ALB or NLB listener?
Yes, via SNI (Server Name Indication). The TLS ClientHello includes the requested hostname; the LB matches it against the certs configured on the listener and presents the matching cert. Up to 25 certs per listener by default (raisable). This lets a single LB serve app1.example.com, app2.example.com, app3.example.com with separate certs — useful when wildcard certs are not acceptable for compliance reasons. SNI is supported by all modern browsers and TLS clients; legacy clients (Windows XP, Java 6) do not support SNI and would receive the default cert (which may not match their request).
Q6: When is connection draining (deregistration delay) too long or too short?
Too short (0 seconds): in-flight HTTP requests are abruptly disconnected when a target deregisters; users see 5xx errors. Typical for development environments.
Too long (3600 seconds): deployments take an hour to complete; rolling updates stall. Auto-scale-in is delayed.
Sweet spot:
- HTTP REST APIs: 60–120 seconds.
- WebSocket / streaming gRPC: 1800–3600 seconds (let connections naturally expire).
- Database connection pools: match the pool's max-idle (often 300 seconds).
ANS-C01 will test the WebSocket / long-lived connection scenario — long deregistration delay is the right answer for "users on a video stream see disconnections during scale-in".
Q7: How does Global Accelerator improve over CloudFront for non-HTTP traffic?
CloudFront is HTTP-only (HTTP/HTTPS). For TCP, UDP, gRPC-over-TCP, gaming, VoIP, MQTT, MQTT-SN, custom binary protocols — CloudFront does not work. Global Accelerator provides static anycast IPs at all AWS edge locations, routes traffic to the closest healthy regional NLB endpoint over the AWS backbone, and supports any TCP/UDP protocol. Result: clients connect to the nearest edge in <50ms, and the backend traffic uses the AWS private backbone instead of public internet. ANS-C01 questions about "global low-latency for game/VoIP/non-HTTP" want Global Accelerator + NLB; CloudFront is the wrong answer.
Q8: What does proxy protocol v2 do for NLB TLS termination?
When NLB terminates TLS, the original client's IP is replaced by the NLB's IP at the TCP layer — backends see only the NLB. Proxy protocol v2 is a standardised binary header that NLB prepends to the TCP stream, containing the original 5-tuple (source IP, source port, destination IP, destination port, protocol). The backend application reads the proxy protocol header (a few bytes at the start of the connection), extracts the original client IP, and proceeds with the TLS-decrypted TCP stream. Without proxy protocol, the backend sees only the NLB IP. Application support is required — most modern proxies (HAProxy, NGINX, Envoy) support proxy protocol natively. ANS-C01 will reward enabling proxy protocol when NLB-TLS-termination is combined with "must see original client IP" requirements.
Q9: How do I architect a multi-region ALB with global failover?
Create one ALB per region. Health check the ALB endpoints from Route 53 (using alias-evaluate-target-health). Use Route 53 failover routing policy with primary region's ALB as primary and secondary region's ALB as secondary. Or use latency-based routing with health-check evaluation across both regions for users to get the lowest-latency healthy region. For gradual cutover, use weighted routing with shifting weights. Alternatively, sit a Global Accelerator in front of regional ALBs as endpoint groups — Global Accelerator's anycast IPs give clients sub-100ms time-to-failover when a region degrades. The Specialty answer for highest-availability multi-region: Global Accelerator + regional ALBs as endpoint groups, with Route 53 alias to the Global Accelerator's static IPs at zone apex.
Q10: When should I use IP-mode target groups instead of instance-mode?
Use IP mode when:
- Targets are EKS pods (with VPC CNI for pod IPs).
- Targets are on-premises servers reachable via VPN or Direct Connect (ALB can route to non-VPC IPs in connected VPCs/CIDRs).
- You need lower latency by skipping kube-proxy or NodePort.
- You want pod-level health checks rather than node-level.
Use instance mode when:
- Targets are bare EC2 instances and you want simple registration.
- The application uses host-port (binding to a NodePort) and you want the LB to route to the EC2 instance's port.
For modern EKS workloads, IP mode is the default best practice. ANS-C01 questions about EKS load balancing will favor IP mode.
Further Reading and Related Operational Patterns
- Elastic Load Balancing User Guide
- What is an Application Load Balancer
- What is a Network Load Balancer
- What is a Gateway Load Balancer
- How Elastic Load Balancing works
- Target groups for ALB
- Network Load Balancers
- AWS Load Balancer Controller for Kubernetes
Once ELB design is in place, the natural next operational layers on ANS-C01 are: Edge Architecture — CloudFront and Global Accelerator which sit in front of ALBs and NLBs for global reach; Route 53 DNS — Public, Private, and Hybrid Architectures for the alias records pointing at ALBs/NLBs; Security Groups and NACLs — Stateful vs Stateless for the LB ingress / target SG layered controls; and Network Encryption — TLS, ACM, IPsec, and MACsec for the cert management at LB listeners.