Introduction to Connectivity Tests and the Network Intelligence Center
Connectivity Tests is the diagnostic engine inside Google Cloud's Network Intelligence Center (NIC) that answers a single deceptively simple question: "Can a packet from source A reach destination B, and if not, exactly which configuration element drops it?" It combines a static analyser that walks the same routing, firewall, NAT, and load-balancer metadata the Andromeda data plane uses, with an optional live data plane probe that injects a small number of test packets and observes their fate. For the Professional Cloud Network Engineer (PCNE) exam, you are expected to know when to choose Connectivity Tests over tcpdump, how to interpret a packet trace, and how the four NIC pillars (Connectivity Tests, Performance Dashboard, Firewall Insights, Network Topology) interlock with VPC Flow Logs and Network Analyzer.
This study note walks through endpoint modelling, every packet-trace drop reason that shows up on the exam, the Reachability Analyzer view of NIC, Network Analyzer findings, VPC Flow Logs analysis in BigQuery, the cross-region latency view in Performance Dashboard, and the troubleshooting playbook patterns that veterans run before they ever SSH to a VM.
Plain English Explanation
Before getting into Connectivity Tests on GCP, here are three concrete pictures that make the trade-offs click.
Think of Connectivity Tests Like a Pre-Flight Inspector at an Airport
A pilot does not just start the engines and hope the runway is clear. A pre-flight inspector walks around the aircraft, checks the manifest, checks the weather routing, checks the fuel records, and pre-clears every gate the flight will pass through before take-off. Connectivity Tests plays that exact role for a TCP/UDP/ICMP packet on GCP. Before you ship a workload, you describe the flight plan ("VM in us-central1 to private Cloud SQL in us-east1 on port 5432"), and the inspector reads the airport blueprints (VPC routes, firewall rules, peering policies, Private Service Connect endpoints, NAT mappings) to certify the route. If a gate is closed, the inspector tells you which gate, by name, before the plane ever leaves the hangar.
Think of Packet Trace Drop Reasons Like Customs Stamps on a Passport
When a parcel crosses borders, customs stamps a reason on every rejection: "MISSING_DECLARATION", "PROHIBITED_GOODS", "INSPECTOR_ABSENT". Each stamp tells you exactly which agency rejected the parcel and what to fix. Connectivity Tests emits the same kind of stamps. DROPPED_DUE_TO_NO_ROUTE means the routing table had no entry for the destination subnet; FIREWALL_BLOCK means the implicit-deny or an explicit deny rule fired; INSTANCE_NOT_RUNNING means the VM is stopped; LOAD_BALANCER_HAS_NO_BACKEND means the backend service is empty. Once you read the stamp, you know which team to call.
Think of Network Intelligence Center Pillars Like the Departments of a Hospital
A hospital does not have one giant "medical room"; it has cardiology, radiology, pharmacy, and the emergency department, each specialised. NIC is structured the same way. Connectivity Tests is the emergency department: you arrive bleeding, it tells you which artery is cut. Network Topology is radiology: it draws a real-time diagram of your traffic so you can see the anatomy. Firewall Insights is pharmacy: it audits which prescriptions (rules) are unused, overly broad, or about to be needed. Performance Dashboard is cardiology: it measures latency and packet loss between regions so you can spot a heart-rhythm issue across the planet. Each pillar shares the same patient record (your network configuration and telemetry) but answers a different clinical question.
Core Concepts of Connectivity Tests
A handful of terms recur across the NIC documentation and the PCNE exam questions. Knowing them precisely matters.
Static Configuration Analysis
The static analyser reads the same metadata Google's Andromeda control plane uses to program the data plane: VPC routes (subnet, static, dynamic routes learned via Cloud Router/BGP, and peering routes), firewall rules and firewall policies (hierarchical, network, regional), Cloud NAT mappings, Cloud Load Balancing forwarding rules and backend services, Private Service Connect consumer and producer configurations, VPN tunnels, and Interconnect attachments. It walks the path step by step from source to destination and returns one of four verdicts: REACHABLE, UNREACHABLE, AMBIGUOUS, or UNDETERMINED. No live packet is sent in this mode.
Live Data Plane Analysis
When you enable the "Test the live data plane" option (also exposed via the --include-data-plane-analysis flag in gcloud network-management connectivity-tests rerun), Connectivity Tests injects a handful of probe packets into the actual Andromeda fabric and observes the trace. This catches divergences between configuration intent and the running data plane (for example, a stale BGP advertisement that the static analyser cannot detect). The probes are limited (typically 3–5 packets per direction) and do not measurably impact production traffic.
Source and Destination Endpoints
An endpoint can be a VM instance reference, an internal/external IP address, a GKE pod or service, a Cloud Run revision, a Cloud SQL instance, a Cloud Load Balancing forwarding rule, a network attachment, or an on-prem gateway (modelled by IP + network). The endpoint may include a port, protocol, and a networkType (gcp-network, non-gcp-network). Specifying the wrong endpoint type is the most common reason a test returns UNDETERMINED.
Trace and Packet Trace Reasons
The trace is the ordered list of steps a packet would traverse: INSTANCE → FORWARDING_RULE → VPC_NETWORK → ROUTE → FIREWALL_RULE → DELIVER (or DROP). Each step records a verdict; the final verdict on a dropped packet is the drop reason, drawn from a fixed enum that the PCNE exam loves to test directly.
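The walk described above can be pictured with a minimal Python sketch. The (step_type, verdict, detail) tuples are a hypothetical simplification for study purposes, not the shape of the real API response, and the step names are the ones used in this note.

```python
from typing import Optional

# Hypothetical, simplified trace: each step is (step_type, verdict, detail).
# Real traces carry many more fields than this.
Step = tuple[str, str, Optional[str]]


def summarise_trace(steps: list[Step]) -> tuple[str, Optional[str]]:
    """Return (final_state, drop_reason) for an ordered list of trace steps."""
    for step_type, verdict, detail in steps:
        if verdict == "DROP":
            # The first dropping step ends the walk; its detail is the drop reason.
            return ("DROP at " + step_type, detail)
    return ("DELIVER", None)


trace = [
    ("INSTANCE", "PASS", None),
    ("VPC_NETWORK", "PASS", None),
    ("ROUTE", "PASS", None),
    ("FIREWALL_RULE", "DROP", "FIREWALL_BLOCK"),
]
print(summarise_trace(trace))
```

The point to retain is that the drop reason belongs to the step that dropped the packet, so the step type tells you where to look and the reason tells you what to fix.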
Reachability Analyzer
In the console, the Connectivity Tests UI is sometimes called the Reachability Analyzer. Same engine, friendlier name. The API resource is connectivityTests; the gcloud surface is gcloud network-management connectivity-tests.
Network Analyzer
Distinct from Connectivity Tests, Network Analyzer is an always-on, opt-out service that proactively scans your VPC configuration for misconfigurations and surfaces them as findings. It catches drift that you did not think to test for.
Packet Trace Drop Reasons You Must Recognise on the Exam
The PCNE exam frequently shows you a trace excerpt and asks what to fix. The drop-reason enum is small enough to memorise.
DROPPED_DUE_TO_NO_ROUTE
The route table examined at the step had no matching prefix for the destination IP. Causes include a missing custom static route after deleting a default Internet route, a peered VPC that did not export its subnet routes, a Cloud Router that has not yet learned a route over BGP, or a hub-and-spoke design using Network Connectivity Center where the spoke is in a different routing region.
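To make "no matching prefix" concrete, here is a minimal longest-prefix-match sketch. The routes dictionary and the lookup_route helper are illustrative names, and the sketch deliberately ignores route priority and next-hop health, which real route selection also considers.

```python
import ipaddress
from typing import Optional


def lookup_route(routes: dict[str, str], dest_ip: str) -> Optional[str]:
    """Return the next hop of the most specific matching prefix, or None."""
    dest = ipaddress.ip_address(dest_ip)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical route table with no default (0.0.0.0/0) route.
routes = {"10.0.1.0/24": "subnet-a", "10.0.0.0/16": "peering-to-shared"}
print(lookup_route(routes, "10.0.1.5"))
print(lookup_route(routes, "8.8.8.8"))
```

A None result corresponds to the situation this drop reason describes: nothing in the table covers the destination, for example after the default Internet route has been deleted.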
FIREWALL_BLOCK and FIREWALL_RULE
The packet matched a deny rule or matched no allow rule under the implicit deny-ingress / implicit allow-egress baselines. The trace cites the exact firewall rule name and priority. If the cited rule is default-deny-ingress, you simply have no allow rule open for that tuple; if it is a named explicit deny, you have an ordering or priority issue.
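The priority logic can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the Rule class and evaluate_ingress function are hypothetical, rules match on a single source range and port only, and the fallback label reuses the default-deny-ingress name from this note.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int      # lower number is evaluated first
    action: str        # "allow" or "deny"
    source_range: str  # CIDR
    port: int


def evaluate_ingress(rules: list[Rule], src_ip: str, port: int) -> tuple[str, str]:
    """Return (action, rule_name) for an ingress packet."""
    src = ipaddress.ip_address(src_ip)
    matches = [
        r for r in rules
        if r.port == port and src in ipaddress.ip_network(r.source_range)
    ]
    if not matches:
        # Nothing matched, so the implicit ingress deny applies.
        return ("deny", "default-deny-ingress")
    # Lowest priority number wins; on a tie, deny takes precedence over allow.
    winner = min(matches, key=lambda r: (r.priority, r.action != "deny"))
    return (winner.action, winner.name)


rules = [
    Rule("allow-web", 1000, "allow", "0.0.0.0/0", 443),
    Rule("deny-bad", 900, "deny", "203.0.113.0/24", 443),
]
print(evaluate_ingress(rules, "203.0.113.7", 443))
```

The two failure shapes from the paragraph above fall out directly: a named deny rule winning on priority, or no match at all and the implicit deny taking over.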
INSTANCE_NOT_RUNNING
The destination VM is STOPPED, TERMINATED, or SUSPENDED. Cheap to fix, embarrassing to discover after an hour of staring at firewalls.
LOAD_BALANCER_HAS_NO_BACKEND
The forwarding rule is configured, but the backend service has zero healthy backends. Could be a failed health check, an empty managed instance group, or a backend bucket without permissions.
PRIVATE_GOOGLE_ACCESS_DISALLOWED
The source subnet is trying to reach a *.googleapis.com VIP in the restricted.googleapis.com (199.36.153.4/30) or private.googleapis.com (199.36.153.8/30) range, but Private Google Access is not enabled on the subnet, or DNS resolution is not pointing at the private path.
Beginners often see DROPPED_DUE_TO_NO_ROUTE and rush to add a static route, only to discover the real issue is that a peered VPC has export-custom-routes disabled. The trace pinpoints the step, but the fix is sometimes one step upstream. Always read the previous trace step before adding routes.
Reference: https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/concepts/overview
CLOUD_NAT_NO_ADDRESSES and CLOUD_NAT_PORT_EXHAUSTION
A Cloud NAT gateway is either out of allocated external IPs or out of source-port tuples. The fix is to increase minPortsPerVm, switch to dynamic port allocation, or add NAT IPs.
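The arithmetic behind port exhaustion is worth working through once. The sketch below assumes static port allocation and the documented figure of 64,512 usable source ports per NAT IP per protocol; the function names are illustrative.

```python
import math

# 65,536 ports minus the 1,024 well-known ports, per NAT IP and per protocol.
PORTS_PER_NAT_IP = 64512


def vms_per_nat_ip(min_ports_per_vm: int) -> int:
    """How many VMs one NAT IP can serve with static port allocation."""
    if not 1 <= min_ports_per_vm <= PORTS_PER_NAT_IP:
        raise ValueError("min_ports_per_vm out of range")
    return PORTS_PER_NAT_IP // min_ports_per_vm


def nat_ips_needed(vm_count: int, min_ports_per_vm: int) -> int:
    """Smallest number of NAT IPs that covers vm_count VMs."""
    return math.ceil(vm_count / vms_per_nat_ip(min_ports_per_vm))


print(vms_per_nat_ip(64), nat_ips_needed(2000, 64))
print(vms_per_nat_ip(1024), nat_ips_needed(2000, 1024))
```

Raising minPortsPerVm from 64 to 1,024 gives each VM more headroom but cuts the number of VMs per NAT IP sharply, which is why the two fixes named above (more ports per VM, more NAT IPs) usually travel together.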
TRAFFIC_TYPE_BLOCKED_BY_FIREWALL_POLICY
A hierarchical firewall policy (folder- or org-level) is denying the traffic before any VPC-level rule even evaluates. The trace shows the policy name and the rule priority.
NO_KNOWN_ROUTE_FROM_PEERED_NETWORK_TO_DESTINATION
VPC Network Peering is established but the destination subnet is not being exported. Toggle exportSubnetRoutesWithPublicIp and exportCustomRoutes on the peering, or check Private Service Connect peering settings.
Memorise the difference between DROPPED_DUE_TO_NO_ROUTE (your side) and NO_KNOWN_ROUTE_FROM_PEERED_NETWORK_TO_DESTINATION (peer side). The exam shows two near-identical scenarios and the answer hinges on which side owns the route.
Reference: https://cloud.google.com/vpc/docs/vpc-peering
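As a memory aid, the drop reasons covered in this section can be collected into a small lookup table. The FIRST_CHECK mapping and first_check helper are illustrative names, and the hints are condensed from this note rather than taken from any official API.

```python
# Drop reason (as named in this note) -> first thing to check.
FIRST_CHECK = {
    "DROPPED_DUE_TO_NO_ROUTE": "Check your own routes, then the previous trace step",
    "FIREWALL_BLOCK": "Read the cited rule name and priority",
    "INSTANCE_NOT_RUNNING": "Start the destination VM",
    "LOAD_BALANCER_HAS_NO_BACKEND": "Check health checks and backend membership",
    "PRIVATE_GOOGLE_ACCESS_DISALLOWED": "Enable Private Google Access and verify DNS",
    "CLOUD_NAT_NO_ADDRESSES": "Add NAT IPs",
    "CLOUD_NAT_PORT_EXHAUSTION": "Raise minPortsPerVm or use dynamic port allocation",
    "TRAFFIC_TYPE_BLOCKED_BY_FIREWALL_POLICY": "Inspect the hierarchical policy in the trace",
    "NO_KNOWN_ROUTE_FROM_PEERED_NETWORK_TO_DESTINATION": "Check route export on the peering",
}


def first_check(reason: str) -> str:
    """Return a one-line hint for a drop reason, with a safe fallback."""
    return FIRST_CHECK.get(reason, "Unrecognised reason; read the full trace")


print(first_check("INSTANCE_NOT_RUNNING"))
```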
The Four Pillars of Network Intelligence Center
NIC bundles four complementary tools under one console heading, and the exam expects you to pick the right pillar for the right question.
Connectivity Tests (Reachability Analyzer)
Reactive and proactive path-verification tool. Use it for a specific source-destination pair you can name. Outputs a trace plus a reachability verdict.
Performance Dashboard
Always-on view of latency and packet loss between every Google Cloud region pair, and between your project's VMs and Google's edge. You see median/p95/p99 for the last 7 days. Use it when latency is the symptom rather than reachability.
Firewall Insights
Analyses firewall rule usage over a configurable window (default 24 hours, up to 6 weeks). Reports unused rules, shadowed rules (a higher-priority rule fully overlaps a lower one, making it dead code), overly permissive rules (allow 0.0.0.0/0 for sensitive ports), and rules that would be hit by recent denied traffic.
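The shadowed-rule definition above translates into a short check. This is a sketch under simplifying assumptions: rules are plain dictionaries with one source range and a list of ports, and is_shadowed is a hypothetical helper, not how Firewall Insights is implemented.

```python
import ipaddress


def is_shadowed(lower: dict, higher: dict) -> bool:
    """True if `higher` is evaluated first and fully covers `lower`."""
    lower_net = ipaddress.ip_network(lower["source_range"])
    higher_net = ipaddress.ip_network(higher["source_range"])
    return (
        higher["priority"] < lower["priority"]            # lower number wins
        and lower_net.subnet_of(higher_net)               # every source is covered
        and set(lower["ports"]) <= set(higher["ports"])   # every port is covered
    )


broad = {"priority": 900, "source_range": "10.0.0.0/8", "ports": [22, 443]}
narrow = {"priority": 1000, "source_range": "10.1.0.0/16", "ports": [443]}
print(is_shadowed(narrow, broad))
```

Here the narrow rule can never fire, because every packet it would match is already decided by the broader, higher-priority rule.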
Network Topology
Real-time visualisation graph of VPC entities and their traffic edges over a 6-week window. You see VMs, GKE clusters, Cloud SQL, load balancers, Cloud NAT gateways, peerings, and the byte/packet volume between them. Use it when you need to understand "what talks to what" before refactoring.
Network Intelligence Center is the unified observability and diagnostics surface for Google Cloud networking, comprising four pillars: Connectivity Tests (Reachability Analyzer), Performance Dashboard, Firewall Insights, and Network Topology. All four read from the same control-plane and telemetry sources but answer different operational questions. Reference: https://cloud.google.com/network-intelligence-center/docs/overview
Endpoint Modelling and How to Configure a Test
A connectivity test is defined by source, destination, protocol, and (optionally) port. Each endpoint can be described in several ways, and getting the model right is half the battle.
Source Endpoint Types
- VM instance: by projects/{p}/zones/{z}/instances/{name} reference; the test pins to the primary or aliased IP on a chosen NIC.
- GKE workload: pod IP or Service ClusterIP; the test resolves through the GKE network plug-in.
- Cloud Run / Cloud Functions: by revision URL; tests use the serverless VPC connector path if configured.
- Internal IP: free-form IP plus VPC network reference; useful for unmanaged workloads.
- External IP / FQDN: simulates inbound traffic from the public internet.
- On-prem: IP + non-GCP network label; the analyser stops at the VPN/Interconnect boundary and labels the rest AMBIGUOUS.
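Because a malformed endpoint reference is an easy mistake to make, it can help to validate the VM reference format before submitting a test. The sketch below checks only the projects/{p}/zones/{z}/instances/{name} shape described above; parse_instance_ref is a hypothetical helper and does not verify that the instance exists.

```python
import re

# Matches projects/{p}/zones/{z}/instances/{name} with non-empty segments.
INSTANCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/zones/(?P<zone>[^/]+)/instances/(?P<name>[^/]+)$"
)


def parse_instance_ref(ref: str) -> dict[str, str]:
    """Split a VM instance reference into its parts, or raise ValueError."""
    match = INSTANCE_RE.match(ref)
    if match is None:
        raise ValueError(f"not a full instance reference: {ref!r}")
    return match.groupdict()


print(parse_instance_ref("projects/p1/zones/us-central1-a/instances/web-vm"))
```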
Destination Endpoint Types
The same list applies, with the addition of Cloud Load Balancing forwarding rules, Cloud SQL instances (private path), Private Service Connect endpoints, and network attachments for Producer-side PSC.
gcloud Example
gcloud network-management connectivity-tests create web-to-sql \
  --source-instance=projects/p1/zones/us-central1-a/instances/web-vm \
  --destination-cloud-sql-instance=projects/p1/instances/orders-db \
  --destination-port=5432 \
  --protocol=TCP \
  --include-data-plane-analysis
Re-running a Test
Configurations change daily. Tests are not auto-rerun. Schedule reruns via gcloud network-management connectivity-tests rerun from Cloud Scheduler + Cloud Functions, or use the API directly from your deployment pipeline. A common pattern is to rerun the test in CI before promoting a Terraform change.
Save your golden-path tests as Terraform resources (google_network_management_connectivity_test). On every infrastructure change, run terraform apply followed by an automated rerun of all tests in the project. If any verdict flips from REACHABLE to anything else, fail the deploy.
Reference: https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/how-to/running-connectivity-tests
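The "fail the deploy if a verdict flips" rule is easy to express once the verdicts have been collected. The sketch below assumes you already have two name-to-verdict dictionaries (how you fetch them is up to your pipeline); regressions is an illustrative helper and the test names are made up.

```python
def regressions(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Names of tests that were REACHABLE before and are not REACHABLE now."""
    return sorted(
        name
        for name, verdict in before.items()
        if verdict == "REACHABLE" and after.get(name) != "REACHABLE"
    )


# Hypothetical verdicts captured before and after a Terraform change.
before = {"web-to-sql": "REACHABLE", "batch-to-gcs": "REACHABLE", "legacy": "UNREACHABLE"}
after = {"web-to-sql": "REACHABLE", "batch-to-gcs": "UNDETERMINED", "legacy": "UNREACHABLE"}

failed = regressions(before, after)
exit_code = 1 if failed else 0  # a CI wrapper would exit with this code
print(failed, exit_code)
```

Treating a missing or UNDETERMINED result as a regression is a deliberately strict choice: a test that can no longer be evaluated is no longer evidence that the path works.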
Network Analyzer Findings and How They Surface
Network Analyzer scans your VPC continuously and posts findings to the Recommender API under category NETWORK_ANALYZER_INSIGHT. Findings are grouped by severity (CRITICAL, HIGH, MEDIUM, LOW) and lifecycle state (ACTIVE, RESOLVED, MUTED).
Common Findings
- IP utilisation high: subnet is over 80 percent allocated; risk of new-VM failure.
- Sub-optimal MTU: VPC MTU does not match the partner side of a VPN/Interconnect, forcing fragmentation.
- Asymmetric routing: BGP advertises a prefix in one direction only.
- VPN tunnel down or flapping: includes the last IKE failure reason.
- Unused external IP: a reserved static IP not attached to any resource, billed continuously.
- Overlapping subnets in peered VPCs: silently breaks reachability.
- GKE node pool IP exhaustion: secondary range projected to fill within N days.
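The IP-utilisation finding comes down to simple arithmetic, sketched below. It assumes the 80 percent threshold quoted in the list and subtracts the four addresses Google Cloud reserves in every primary IPv4 subnet range; whether Network Analyzer computes utilisation in exactly this way is an assumption.

```python
import ipaddress

RESERVED_PER_PRIMARY_RANGE = 4  # network, gateway, second-to-last, broadcast
HIGH_UTILISATION = 0.80         # threshold quoted in this note


def utilisation(cidr: str, allocated: int) -> float:
    """Fraction of usable addresses in a primary range that are allocated."""
    usable = ipaddress.ip_network(cidr).num_addresses - RESERVED_PER_PRIMARY_RANGE
    if usable <= 0:
        raise ValueError(f"{cidr} has no usable addresses")
    return allocated / usable


def is_high(cidr: str, allocated: int) -> bool:
    return utilisation(cidr, allocated) > HIGH_UTILISATION


print(is_high("10.0.1.0/24", 210), is_high("10.0.1.0/24", 200))
```

A /24 has 252 usable addresses after the reservation, so the threshold is crossed somewhere between 201 and 202 allocated addresses.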
Surfacing Findings
Findings appear in the NIC console under "Network Analyzer" and via gcloud recommender insights list --insight-type=google.networkanalyzer.vpcnetwork.connectivityInsight --location=global. Pipe critical findings into a Pub/Sub topic via Eventarc for automated paging.
Network Analyzer findings overlap with but do not replace Connectivity Tests. Findings are configuration drift signals (always on, no source/destination needed); Connectivity Tests verify a specific path you care about. The exam contrasts the two and expects you to name both correctly. Reference: https://cloud.google.com/network-intelligence-center/docs/network-analyzer/concepts/overview
VPC Flow Logs and BigQuery Analysis
VPC Flow Logs are per-5-tuple traffic samples emitted from each VM NIC at a configurable sampling rate (default 0.5 = 50 percent of flows) and aggregation interval (default 5 seconds). They record what real traffic actually did, which complements the verdicts from Connectivity Tests, and they are the input to most production troubleshooting playbooks.
Enabling Flow Logs
Flow logs are enabled per subnet via gcloud compute networks subnets update SUBNET --region=REGION --enable-flow-logs --logging-aggregation-interval=interval-5-sec --logging-flow-sampling=0.5 --logging-metadata=include-all. (The gcloud flags take lowercase, hyphenated values; INTERVAL_5_SEC and INCLUDE_ALL_METADATA are the corresponding API enum names.) The records land in Cloud Logging with logName=projects/.../logs/compute.googleapis.com%2Fvpc_flows.
Routing Flow Logs to BigQuery
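Create a Cloud Logging sink with destination bigquery.googleapis.com/projects/{p}/datasets/network_logs and filter resource.type="gce_subnetwork". By default the sink writes one date-sharded table per day, which is what the _TABLE_SUFFIX filters in the queries below rely on; you can opt into partitioned tables instead. Either way, daily tables make 30-day retention queries cheap.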
Useful BigQuery Queries
Top talkers in the last hour:
SELECT
  jsonPayload.connection.src_ip AS src,
  jsonPayload.connection.dest_ip AS dst,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes
FROM `proj.network_logs.compute_googleapis_com_vpc_flows_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  AND TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), timestamp, MINUTE) < 60
GROUP BY src, dst
ORDER BY bytes DESC
LIMIT 50;
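Denied connection attempts toward a sensitive subnet. VPC Flow Logs records carry no disposition field; allow/deny verdicts come from Firewall Rules Logging, so this query reads the firewall log table and assumes logging is enabled on the relevant rules and exported by the same sink:
SELECT timestamp, jsonPayload.connection.src_ip, jsonPayload.connection.dest_port
FROM `proj.network_logs.compute_googleapis_com_firewall_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  AND jsonPayload.vpc.subnetwork_name = "payments-subnet"
  AND jsonPayload.rule_details.direction = "INGRESS"
  AND jsonPayload.disposition = "DENIED"
ORDER BY timestamp DESC
LIMIT 200;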
Combining Flow Logs With Connectivity Tests
Connectivity Tests answers "should this work?" Flow logs answer "what actually happened?" The veteran workflow is to run a Connectivity Test first to validate the configuration intent, then query Flow Logs for the exact 5-tuple over the suspect window to confirm whether the production data plane behaved the same way.
Flow Logs aggregation intervals are INTERVAL_5_SEC (default), INTERVAL_30_SEC, INTERVAL_1_MIN, INTERVAL_5_MIN, INTERVAL_10_MIN, INTERVAL_15_MIN. Sampling rate ranges 0.0–1.0. Lower intervals + higher sampling = more accurate forensics but more storage cost. Production rule of thumb: 5-sec aggregation, 0.5 sampling for security-sensitive subnets; 30-sec / 0.1 for chatty backplane subnets.
Reference: https://cloud.google.com/vpc/docs/flow-logs
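A small validation helper can catch typos in these settings before they reach a deployment pipeline. The sketch below encodes only the interval names and the 0.0 to 1.0 sampling range listed above; validate_flow_log_config is a hypothetical helper, not part of any Google Cloud SDK.

```python
# Aggregation intervals as listed in this note (API enum names).
ALLOWED_INTERVALS = {
    "INTERVAL_5_SEC", "INTERVAL_30_SEC", "INTERVAL_1_MIN",
    "INTERVAL_5_MIN", "INTERVAL_10_MIN", "INTERVAL_15_MIN",
}


def validate_flow_log_config(interval: str, sampling: float) -> list[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = []
    if interval not in ALLOWED_INTERVALS:
        errors.append(f"unknown aggregation interval: {interval}")
    if not 0.0 <= sampling <= 1.0:
        errors.append(f"sampling must be within 0.0-1.0, got {sampling}")
    return errors


print(validate_flow_log_config("INTERVAL_5_SEC", 0.5))
print(validate_flow_log_config("INTERVAL_2_SEC", 1.5))
```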
Performance Dashboard for Cross-Region Latency
Performance Dashboard provides two views: Google Cloud Performance (round-trip latency and packet drop between every Google Cloud region pair, as measured by Google's internal probes) and Project Performance (the same metrics scoped to your project's VMs).
Region-to-Region View
Pick two regions and see median, 5th-percentile, and 95th-percentile RTT in milliseconds plus packet drop percentage for the last 6 hours, 24 hours, or 7 days. Use it to decide whether to deploy a regional replica or accept the cross-region penalty.
Project View
Shows the same metrics for traffic between your VMs across regions or zones. Useful for detecting a misbehaving hop without instrumenting your application.
Interpreting the Numbers
A typical us-central1 ↔ us-east1 RTT is around 30 ms p50; a us-central1 ↔ asia-southeast1 RTT is roughly 175 ms p50. A sudden 50 ms jump on an intra-region path almost always means a routing or congestion incident, not a code regression. Performance Dashboard is the first place to look before you blame the application.
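The "sudden jump" heuristic can be written down as a comparison of medians. This is a sketch with assumed inputs: the RTT samples are invented, the 50 ms threshold is borrowed from the example above, and in practice you would read these numbers off Performance Dashboard rather than compute them yourself.

```python
from statistics import median


def p50_shift_ms(baseline: list[float], recent: list[float]) -> float:
    """Difference between the recent and baseline median RTT, in milliseconds."""
    return median(recent) - median(baseline)


def looks_like_incident(
    baseline: list[float], recent: list[float], threshold_ms: float = 50.0
) -> bool:
    """Flag a path whose median RTT rose by at least threshold_ms."""
    return p50_shift_ms(baseline, recent) >= threshold_ms


# Hypothetical RTT samples (ms) for one path, before and during a complaint.
baseline = [29.0, 30.0, 31.0]
recent = [79.0, 80.0, 82.0]
print(p50_shift_ms(baseline, recent), looks_like_incident(baseline, recent))
```

Comparing medians rather than single samples keeps one slow probe from triggering the check.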
Cross-Reference With Network Topology
When Performance Dashboard shows degraded latency on a specific path, Network Topology lets you see which workloads sit on that path right now and how much traffic they push, so you can prioritise mitigation.
Troubleshooting Playbook Patterns
A repeatable triage flow saves hours during an outage. The community-tested playbook below is the one Google's customer engineers run in real escalations.
Step 1: Reproduce the Pair
Ask the user for the exact source and destination, including port, protocol, and timestamp. Vague reports ("the API is down") will sink you. Insist on a concrete 5-tuple.
Step 2: Run a Connectivity Test
Use the gcloud one-liner; include --include-data-plane-analysis. Read the verdict and the trace. 70 percent of incidents resolve at this step because the trace names the offending firewall rule, missing route, or stopped instance.
Step 3: Check Network Analyzer
Open the Network Analyzer findings filtered to the relevant VPC. A high-severity finding such as IP_UTILIZATION_HIGH or VPN_TUNNEL_DOWN often correlates with the live incident even when the user report did not mention it.
Step 4: Pull Flow Logs Over the Suspect Window
Run a BigQuery query scoped to the source IP and destination port for the suspect 10-minute window. Confirm whether the production data plane saw the packets at all and, if Firewall Rules Logging is enabled, what disposition was reported.
Step 5: Look at Firewall Insights
If the trace says FIREWALL_BLOCK, check Firewall Insights for the policy/rule cited. The console shows recent hit counts and lets you preview the effect of a proposed change.
Step 6: Inspect Network Topology and Performance Dashboard
If the verdict is REACHABLE but the user still reports issues, the problem is performance, not reachability. Open Performance Dashboard for the region pair and Network Topology for the workload graph. Look for latency spikes, packet drop, or unexpected hairpinning through Cloud NAT.
Step 7: Escalate With Evidence
If steps 1–6 do not resolve, file a support ticket with the Connectivity Test ID, the Flow Logs query results, the Network Analyzer finding IDs, and a Performance Dashboard screenshot. Google support resolves cases with this packet of evidence in a fraction of the time.
Wrap steps 1–4 in a single runbook script invoked from your incident channel. /diagnose src=10.0.1.5 dst=10.0.2.7 port=5432 protocol=tcp should create the connectivity test, query flow logs, dump Network Analyzer findings, and post the results to Slack in under 60 seconds. The team that does this resolves P1s twice as fast as the team that opens four browser tabs.
Reference: https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/how-to/running-connectivity-tests
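The first piece of such a runbook is turning the chat command into a validated request. The sketch below covers only that parsing step; DiagnoseRequest and parse_diagnose are hypothetical names, and the calls that would create the test, query flow logs, and post to chat are deliberately left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnoseRequest:
    src: str
    dst: str
    port: int
    protocol: str


def parse_diagnose(command: str) -> DiagnoseRequest:
    """Parse '/diagnose src=.. dst=.. port=.. protocol=..' or raise ValueError."""
    parts = command.split()
    if not parts or parts[0] != "/diagnose":
        raise ValueError("expected a /diagnose command")
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep or not value:
            raise ValueError(f"malformed argument: {part}")
        fields[key] = value
    missing = {"src", "dst", "port", "protocol"} - fields.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    # ip_address raises ValueError for anything that is not a valid IP literal.
    src = str(ipaddress.ip_address(fields["src"]))
    dst = str(ipaddress.ip_address(fields["dst"]))
    port = int(fields["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return DiagnoseRequest(src, dst, port, fields["protocol"].upper())


print(parse_diagnose("/diagnose src=10.0.1.5 dst=10.0.2.7 port=5432 protocol=tcp"))
```

Rejecting vague input at this point enforces Step 1 of the playbook: no concrete source, destination, port, and protocol, no diagnosis.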
IAM, Quotas, and Operational Limits
Connectivity Tests has its own permission surface and a handful of quotas worth knowing for the exam.
IAM Roles
- roles/networkmanagement.admin: full control over connectivityTests resources.
- roles/networkmanagement.viewer: read-only view of tests and traces.
- Granular permissions: networkmanagement.connectivityTests.create, .get, .list, .update, .delete, .rerun.
To run a test, the principal must also have read access to all the network resources along the path (subnets, firewall rules, peerings). A common mistake is granting the network-management role without the underlying compute.networks.get and compute.firewalls.list permissions, which causes opaque UNDETERMINED verdicts.
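A pre-flight check for this mistake is a set difference. The sketch below uses only the three permissions named in this section, so REQUIRED_FOR_TEST is an illustrative subset rather than the complete list a real path may need, and missing_permissions is a hypothetical helper.

```python
# Illustrative subset: only the permissions named in this note.
REQUIRED_FOR_TEST = frozenset({
    "networkmanagement.connectivityTests.create",
    "compute.networks.get",
    "compute.firewalls.list",
})


def missing_permissions(granted: list[str]) -> list[str]:
    """Permissions from the required subset that the principal does not hold."""
    return sorted(REQUIRED_FOR_TEST - set(granted))


print(missing_permissions(["networkmanagement.connectivityTests.create"]))
```

An empty result does not guarantee a determinate verdict, but a non-empty one explains an UNDETERMINED result before you spend time on the trace.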
Quotas
Up to 1,000 connectivity tests per project (soft limit, raisable). Reruns count against API request quota, not test quota. Live-data-plane probes are rate-limited per VM (a couple per minute) to prevent abuse.
Pricing
The static analyser is free. Live-data-plane probes are billed at a small per-probe rate. VPC Flow Logs storage and Cloud Logging egress are billed separately; Flow Logs to BigQuery uses the standard sink and storage rates. Performance Dashboard, Firewall Insights, and Network Topology are included in the platform.
Common Pitfalls and Trade-offs
Connectivity Tests looks straightforward in a demo and reveals sharp edges in production.
Trap: Confusing Reachability With Application Health
REACHABLE only proves the network path is open at L3/L4. The destination may still 500 every request because the application is broken. After a REACHABLE verdict, the next step is application-layer diagnosis, not declaring victory.
Trap: Trusting a Stale Test
A test result is a snapshot. A test you ran yesterday is invalidated by today's firewall change. Always rerun before drawing conclusions, and prefer scheduled reruns in CI.
Trap: OS-Level Firewalls Hidden From the Inspector
Connectivity Tests sees Google Cloud firewall objects but not iptables, ufw, firewalld, or Windows Defender rules running inside the VM guest OS. A REACHABLE verdict followed by a refused connection often means the guest OS is the gatekeeper.
Trade-off: Sampling Rate vs Storage Cost
Flow Logs at 100 percent sampling on a chatty subnet can cost hundreds of dollars a day in Cloud Logging ingest. Tune sampling per subnet based on sensitivity; payments subnets at 1.0, batch processing at 0.1, dev at 0.05.
Trade-off: Network Analyzer Noise
Network Analyzer can produce dozens of low-severity findings. Filter aggressively to severity HIGH/CRITICAL for paging; review the rest weekly.
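The paging-versus-weekly-review split can be sketched as a simple filter. The finding dictionaries below are a hypothetical shape with only a severity and a state field, using the severity and lifecycle values listed earlier in this note.

```python
PAGING_SEVERITIES = {"CRITICAL", "HIGH"}


def split_findings(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (page_now, review_weekly); non-ACTIVE findings are dropped."""
    active = [f for f in findings if f["state"] == "ACTIVE"]
    page_now = [f for f in active if f["severity"] in PAGING_SEVERITIES]
    review_weekly = [f for f in active if f["severity"] not in PAGING_SEVERITIES]
    return page_now, review_weekly


# Hypothetical findings for illustration.
findings = [
    {"id": "f1", "severity": "CRITICAL", "state": "ACTIVE"},
    {"id": "f2", "severity": "LOW", "state": "ACTIVE"},
    {"id": "f3", "severity": "HIGH", "state": "MUTED"},
    {"id": "f4", "severity": "MEDIUM", "state": "ACTIVE"},
]
page_now, review_weekly = split_findings(findings)
print([f["id"] for f in page_now], [f["id"] for f in review_weekly])
```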
FAQs
Q: Does running a Connectivity Test impact production traffic?
A: Static analysis sends no packets and impacts nothing. Live-data-plane mode sends a handful of probe packets (typically 3–5 per direction) that are rate-limited and indistinguishable from background traffic. There is no measurable impact on application throughput or latency.
Q: Can a Connectivity Test verify traffic through a Cloud NAT gateway?
A: Yes. The trace explicitly models Cloud NAT as a step and will report CLOUD_NAT_NO_ADDRESSES or CLOUD_NAT_PORT_EXHAUSTION when applicable. It also models Cloud Load Balancing, Cloud VPN, Interconnect attachments, Private Service Connect, and VPC Network Peering.
Q: What is the difference between Reachability Analyzer and Network Analyzer?
A: Reachability Analyzer (the console name for Connectivity Tests) verifies a specific source-destination path you define. Network Analyzer is an always-on scanner that surfaces misconfigurations across your VPC without requiring you to name a path. Use the first when you have a specific question; rely on the second for continuous drift detection.
Q: Why does my test return UNDETERMINED instead of REACHABLE or UNREACHABLE?
A: The most common causes are: (1) the principal running the test lacks read permission on a resource along the path; (2) the source or destination crosses into a non-GCP network the analyser cannot see (on-prem firewalls, partner SaaS); (3) the endpoint reference is ambiguous (for example an IP that belongs to multiple subnets in different projects). Fix the permissions or sharpen the endpoint, then rerun.
Q: Can I export VPC Flow Logs to BigQuery in real time?
A: Yes, via a Cloud Logging sink with a BigQuery destination. Logs land in partitioned tables typically within 1–2 minutes of generation. For lower-latency analysis, sink to Pub/Sub instead and stream into BigQuery via Dataflow or directly into a streaming buffer.
Q: Are Performance Dashboard latency numbers measured from my VMs or from Google's probes?
A: Both. The "Google Cloud Performance" view uses Google's internal probe infrastructure (continuous probes between every region pair). The "Project Performance" view computes the same metrics from telemetry tied to your project's VMs. Compare both: if Project Performance is worse than Google Cloud Performance for the same region pair, your workload or VPC configuration is the bottleneck, not the underlying fabric.
Q: How long are Flow Logs retained by default?
A: Cloud Logging default retention is 30 days. Sink them to BigQuery (cheap long-term storage) or Cloud Storage Coldline for compliance retention of a year or more. The Flow Logs themselves never live in NIC; NIC reads them from Cloud Logging on demand.