VPC peering causes ERP app timeouts between regions after network expansion

We recently expanded our ERP deployment to use VPC peering between us-central1 and europe-west1 regions. After establishing the peering connection, our ERP application components started experiencing intermittent timeouts when accessing services across regions.

The VPC peering routes appear to be configured correctly in both directions, but we’re seeing 5-10 second delays followed by connection timeouts. Our firewall rules allow traffic on the required ports (8080, 8443, 5432 for PostgreSQL), but I’m wondering if there’s something specific about cross-region peering that we’re missing.

I’ve enabled VPC Flow Logs but I’m not entirely sure what patterns to look for in the logs that would indicate routing or firewall issues. The timeouts happen most frequently during peak business hours (09:00-17:00 CET), affecting order processing and inventory lookups.

Has anyone dealt with similar cross-region peering issues in ERP deployments? What diagnostic steps should I prioritize?

Jenny, that was a great catch! I found that our ERP application servers in europe-west1 had the tag ‘erp-app-eu’ but our firewall rules were targeting ‘erp-app’ globally. The us-central1 instances had the correct tags.

After reviewing the routes with Raj’s command, I also discovered we had a custom route with priority 900 that was conflicting with the auto-generated peering route (priority 1000). This was causing some traffic to route through our Cloud NAT gateway instead of directly through the peering connection, adding massive latency.

To fully resolve VPC peering timeout issues between regions, you need to address all three focus areas systematically:

VPC Peering Routes Configuration: First, verify your peering routes are properly established and not being overridden. The automatic routes created by VPC peering should have priority 1000. Check for conflicts:


gcloud compute routes list --filter="network:YOUR_VPC"

Look for custom routes with priority <1000 that overlap your peered VPC CIDR ranges. Delete them, or recreate them with a lower priority (a number higher than 1000) so they no longer take precedence over the peering routes; route priorities can't be edited in place. In your case, the priority 900 custom route was forcing traffic through Cloud NAT instead of the direct peering path.
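
Assuming the conflicting route were named something like erp-nat-route (a placeholder), the cleanup would look roughly like this; the destination range and next hop in the recreate step are illustrative only:

# Inspect the conflicting route first (name is a placeholder)
gcloud compute routes describe erp-nat-route

# Priorities aren't mutable, so delete the route...
gcloud compute routes delete erp-nat-route

# ...and recreate it only if it's still needed, with a priority above 1000
# and/or a destination range that no longer overlaps the peered VPC
gcloud compute routes create erp-nat-route \
  --network=YOUR_VPC \
  --destination-range=203.0.113.0/24 \
  --next-hop-gateway=default-internet-gateway \
  --priority=1100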

Firewall Rules for Cross-Region Traffic: Your firewall rules must cover the entire peered VPC CIDR block, not individual subnets. Create explicit ingress and egress rules:


gcloud compute firewall-rules create allow-peered-vpc-ingress \
  --network=YOUR_VPC \
  --allow=tcp:8080,tcp:8443,tcp:5432 \
  --source-ranges=10.132.0.0/16 \
  --target-tags=erp-app
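
If this VPC restricts egress (the default is allow-all egress), the egress counterpart would look roughly like this; the rule name is illustrative and 10.132.0.0/16 is the peered range from your setup:

gcloud compute firewall-rules create allow-peered-vpc-egress \
  --network=YOUR_VPC \
  --direction=EGRESS \
  --allow=tcp:8080,tcp:8443,tcp:5432 \
  --destination-ranges=10.132.0.0/16 \
  --target-tags=erp-app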

Critically, ensure your instance network tags match the target-tags in your firewall rules. With mismatched tags (like ‘erp-app-eu’ vs ‘erp-app’) the allow rule never matches, and traffic falls through to the implied deny-ingress rule even though the VPC-level rules look correct. Use consistent tagging across all regions.
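
If the europe-west1 instances need to pick up the existing tag, something like this would do it (instance and zone names are placeholders); add-tags appends to the tag list without removing the region-specific tag:

gcloud compute instances add-tags erp-app-eu-1 \
  --zone=europe-west1-b \
  --tags=erp-app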

VPC Flow Logs Analysis: Enable VPC Flow Logs with an aggregation interval of 5 seconds for detailed diagnostics:


gcloud compute networks subnets update SUBNET_NAME \
  --region=REGION \
  --enable-flow-logs \
  --logging-aggregation-interval=interval-5-sec

In Cloud Logging, query for timeout patterns:


resource.type="gce_subnetwork"
jsonPayload.connection.dest_ip="TARGET_IP"
jsonPayload.rtt_msec>5000

RTT values well above the ~100ms baseline between us-central1 and europe-west1 indicate routing problems. Compare the src_location and dest_location fields to verify traffic is taking the direct peering path rather than routing through NAT gateways or other intermediate hops.
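
If it helps, here is a rough gcloud logging read version of that check; the RTT threshold is just an example, and which annotation fields are populated depends on the metadata options you enabled for flow logs:

gcloud logging read 'resource.type="gce_subnetwork" AND jsonPayload.rtt_msec>1000' \
  --limit=20 \
  --format="table(jsonPayload.connection.src_ip, jsonPayload.connection.dest_ip, jsonPayload.rtt_msec, jsonPayload.src_instance.zone, jsonPayload.dest_instance.zone)"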

Additional Validation: After fixing routes, firewall rules, and instance tags, test connectivity over the peering path using internal IPs, e.g. gcloud compute ssh INSTANCE --internal-ip from a host in the peered network (tunneling through IAP would bypass the peering path you're trying to validate). Monitor for 24-48 hours during peak traffic to confirm the timeouts are resolved. Expected cross-region latency should stabilize at 80-120ms with no timeouts.
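
A Network Intelligence Center connectivity test is another way to confirm the configured path before and after the fix; this is a sketch with placeholder project and instance names:

gcloud network-management connectivity-tests create erp-peering-test \
  --source-instance=projects/YOUR_PROJECT/zones/us-central1-b/instances/erp-app-us-1 \
  --destination-instance=projects/YOUR_PROJECT/zones/europe-west1-b/instances/erp-db-eu-1 \
  --protocol=TCP \
  --destination-port=5432

# The reachability result (REACHABLE/UNREACHABLE) appears in the describe output
gcloud network-management connectivity-tests describe erp-peering-test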

The combination of conflicting custom routes and mismatched network tags was causing your ERP application to experience both routing delays and firewall blocks, manifesting as the 5-10 second timeouts during peak hours.

We hit this exact issue last year. The problem wasn’t the peering itself but the instance-level firewall tags. Our ERP instances had network tags that weren’t included in the cross-region firewall rules. Even though VPC-level rules existed, the instance tags were blocking the traffic.

Check gcloud compute instances describe [INSTANCE_NAME] --zone=[ZONE] and verify the network tags match what’s in your firewall rule target tags. Also make sure you’re not hitting any organizational policy constraints that restrict cross-region traffic.
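
Something along these lines makes the comparison quick (instance, zone, and rule names are placeholders):

gcloud compute instances describe erp-app-eu-1 --zone=europe-west1-b \
  --format="value(tags.items)"
gcloud compute firewall-rules describe allow-peered-vpc-ingress \
  --format="value(targetTags)"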

That RTT pattern suggests asymmetric routing or MTU issues. VPC peering should have <100ms latency between those regions. Check if you have any custom routes that might be overriding the automatic peering routes. Run gcloud compute routes list --filter="nextHopPeering:*" to see your peering routes.

Also, verify MTU settings. If you’re using VPN or interconnect alongside peering, MTU mismatches can cause fragmentation and retransmits that manifest as timeouts. GCP VPC default MTU is 1460 bytes, but peered networks might need explicit MTU configuration if you’re running encapsulated protocols.
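
Checking the configured MTU on both sides is quick if you want to rule that out (network names are placeholders):

gcloud compute networks describe YOUR_VPC --format="value(mtu)"
gcloud compute networks describe PEER_VPC --format="value(mtu)"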

For anyone else troubleshooting VPC peering issues, here’s a quick diagnostic sequence:

1) Verify the peering status is ACTIVE in both VPCs (see the command below).
2) Check that custom routes don’t have higher priority than the peering routes (lower number = higher priority).
3) Confirm firewall rules use the full peered VPC CIDR blocks, not just individual subnets.
4) Validate that instance network tags match the firewall rule target tags.
5) Test with gcloud compute ssh using internal IPs to isolate application vs. network issues.
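
For step 1, the peering state is visible with (run it against both networks; the network name is a placeholder):

gcloud compute networks peerings list --network=YOUR_VPC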

Also recommend enabling Packet Mirroring temporarily on the problematic instances to capture actual packet flows and see exactly where drops occur.

First thing to check: are your firewall rules using the correct IP ranges for the peered VPC? A common mistake is allowing only subnet ranges instead of the entire VPC CIDR block. Also verify that traffic is allowed in both directions - ingress rules on each side, plus egress rules if you’ve restricted the default allow-all egress - since peering doesn’t create any firewall rules for you.
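
A quick way to eyeball the ranges, directions, and target tags in one place (the network name is a placeholder):

gcloud compute firewall-rules list \
  --filter="network:YOUR_VPC" \
  --format="table(name, direction, sourceRanges.list(), destinationRanges.list(), targetTags.list())"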

For VPC Flow Logs, filter by connection state ‘TIMEOUT’ or ‘REJECTED’. Look at the src_ip and dest_ip fields to identify which direction is failing. The RTT (round-trip time) field will show you latency spikes before timeouts occur.

Thanks Sara. I checked the firewall rules and found we were only allowing the specific subnet ranges, not the full VPC CIDR. I’ve updated the rules to include the entire peered VPC range (10.128.0.0/16 for us-central1 and 10.132.0.0/16 for europe-west1).

In the VPC Flow Logs, I’m seeing entries with connection_state=‘OK’ but RTT values ranging from 150ms to 8000ms. The 8-second RTT entries correlate with our timeout reports. Most concerning is that these high-RTT flows show src_location=‘us-central1’ but the packets seem to be taking unexpected paths.