We’re experiencing intermittent packet loss between two VPC networks connected via VPC peering after updating our firewall rules last week. The issue affects our production microservices communication.
Our setup uses VPC peering between vpc-prod-us-central and vpc-services-us-central. We recently added new firewall rules to restrict traffic, and now we see 5-15% packet loss during peak hours. VPC Flow Logs show some packets being dropped, but the firewall rules appear correctly configured with proper priority values.
Current firewall rule priorities:
allow-internal-services: priority 1000
allow-peered-traffic: priority 1100
deny-all-ingress: priority 2000
The packet loss is inconsistent - sometimes connections work fine for hours, then suddenly degrade. We’ve verified the peering connection status shows active on both sides. Has anyone dealt with firewall rule precedence issues in VPC peering scenarios?
This is a classic firewall precedence issue. Remember that in GCP, lower priority numbers take precedence. Your allow-peered-traffic at 1100 is being evaluated AFTER the allow-internal-services at 1000. If the internal services rule has a narrow IP range that doesn’t include your peered VPC CIDR, traffic will continue down the chain. Make sure your peered traffic rule has a lower priority number than any restrictive rules and explicitly includes the entire CIDR range of both peered VPCs.
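A quick way to eyeball that is to list the rules on the prod network in evaluation order along with their source ranges and targets. Rough sketch; the network name is from your post and the --filter/--format expressions are just one way to slice it:
# Show rules in priority (evaluation) order with ranges and service-account targets
gcloud compute firewall-rules list \
    --filter="network:vpc-prod-us-central" \
    --format="table(name,priority,direction,sourceRanges.list(),targetServiceAccounts.list())" \
    --sort-by=priority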
We had similar issues last month. Beyond firewall rules, check your subnet routes. VPC peering automatically creates routes, but if you have custom static routes they might interfere. Run ‘gcloud compute routes list’ on both projects and look for conflicts. Also enable detailed VPC Flow Logs sampling at 100% temporarily to capture all dropped packets - the default 10% sampling might miss the pattern.
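For the route check, something like this (project ID is a placeholder) separates the peering-generated routes from custom static ones so you can spot conflicts:
# Routes created by VPC peering have a nextHopPeering value; static routes will not
gcloud compute routes list \
    --project=<prod-project-id> \
    --filter="network:vpc-prod-us-central" \
    --format="table(name,destRange,priority,nextHopPeering,nextHopGateway)"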
I see what happened here. You’re mixing target-based rules with IP-based peering, which creates gaps in coverage. Here’s the comprehensive solution:
VPC Peering Configuration Fix:
First, understand that VPC peering works at the network level, not the instance level. Your firewall rules must accommodate this:
- Firewall Rule Precedence - Restructure your rules with proper priority ordering:
# Priority 900 - explicit allow for peered VPC CIDR ranges
# (network name is from your post; replace the CIDRs with your actual ranges)
gcloud compute firewall-rules create allow-vpc-peering \
    --network=vpc-prod-us-central \
    --direction=INGRESS \
    --priority=900 \
    --source-ranges=10.128.0.0/20,10.138.0.0/20 \
    --allow=tcp,udp,icmp
- VPC Flow Logs Analysis - Enable detailed logging to identify exact drop points:
# prod-subnet is a placeholder; region assumed from the VPC names
gcloud compute networks subnets update prod-subnet \
    --region=us-central1 \
    --enable-flow-logs \
    --logging-aggregation-interval=interval-5-sec \
    --logging-flow-sampling=1.0
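One caveat: the disposition field comes from Firewall Rules Logging rather than VPC Flow Logs, and it has to be enabled per rule. A sketch like this (rule name from your post; the log filter assumes the standard firewall log field layout) helps pin down which rule is doing the denying:
# Turn on logging for the deny rule so denied packets are recorded
gcloud compute firewall-rules update deny-all-ingress --enable-logging

# Pull recent DENIED entries for inspection
gcloud logging read \
    'logName:"compute.googleapis.com%2Ffirewall" AND jsonPayload.disposition="DENIED"' \
    --limit=20 --format=json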
Key insights from your situation:
Root Cause: Your allow-peered-traffic rule at priority 1100 uses service account targeting, so it only applies to instances running as that service account. Traffic from the peered VPC to any instance without that service account never matches the rule and falls through to your deny-all at 2000.
The Fix Strategy:
- Remove service account targets from peering-related rules
- Use source IP ranges matching your peered VPC CIDRs instead
- Set priority below 1000 (I recommend 900) to ensure evaluation before other rules
- Create symmetric rules in BOTH VPCs - ingress in one must match egress in the other
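For the symmetric-rule point, here is a rough sketch of the matching rules in vpc-services-us-central. I'm assuming 10.128.0.0/20 is the prod VPC's range and 10.138.0.0/20 the services VPC's; swap them if it's the other way around:
# Ingress allow in the services VPC for traffic arriving from the prod VPC
gcloud compute firewall-rules create peer-allow-from-prod \
    --network=vpc-services-us-central \
    --direction=INGRESS \
    --priority=900 \
    --source-ranges=10.128.0.0/20 \
    --allow=tcp,udp,icmp

# Egress allow in the services VPC so traffic toward the prod VPC isn't caught by a lower-priority deny
gcloud compute firewall-rules create peer-allow-to-prod \
    --network=vpc-services-us-central \
    --direction=EGRESS \
    --priority=900 \
    --destination-ranges=10.128.0.0/20 \
    --allow=tcp,udp,icmp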
Verification Steps:
- After updating rules, test with ping and traceroute from instances in both VPCs
- Monitor VPC Flow Logs for disposition=ALLOWED on previously dropped flows
- Check packet loss with ping -c 100 <peered-instance-ip> and verify 0% loss
- Confirm rule order with gcloud compute firewall-rules list --sort-by=priority
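If you want the platform to do the analysis for you, a Network Intelligence Center connectivity test traces the path across the peering and reports the exact firewall rule that drops a packet. Rough sketch with placeholder instance URIs; double-check the flag names against the gcloud reference:
gcloud network-management connectivity-tests create peering-check \
    --source-instance=projects/<prod-project-id>/zones/us-central1-a/instances/<prod-instance> \
    --destination-instance=projects/<services-project-id>/zones/us-central1-a/instances/<services-instance> \
    --protocol=TCP \
    --destination-port=443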
Additional Recommendations:
- Document your peering firewall rules separately from instance-level rules
- Use consistent naming: prefix peering rules with ‘peer-’ for clarity
- Set up alerting on VPC Flow Log drops with disposition=DENIED for peered traffic
- Consider using Firewall Insights to identify overly permissive or shadowed rules
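For the alerting recommendation above, a log-based metric is one way to start; the metric name and filter are illustrative, and the disposition field assumes Firewall Rules Logging is enabled on the relevant rules:
# Count DENIED firewall log entries; attach a Cloud Monitoring alert policy to this metric
gcloud logging metrics create peer_traffic_denied \
    --description="Denied packets on peering-related firewall rules" \
    --log-filter='logName:"compute.googleapis.com%2Ffirewall" AND jsonPayload.disposition="DENIED"'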
The intermittent nature you described (working fine, then degrading) fits this picture: GCP firewall rules are stateful, so already-established connections and their return traffic kept working, while new connection attempts that didn't match an allow rule were denied. Under peak load more new connections are opened, so the denials surface as packet loss. This is typical when firewall rules don't properly account for all traffic paths between peered networks.
Thanks for the suggestion. I checked both VPCs and found that vpc-services-us-central has an egress deny rule at priority 1050 that I wasn’t aware of. This could be blocking return traffic. The VPC Flow Logs are showing dropped packets with disposition DENIED in the egress direction from the services VPC. I’m going to adjust the rule priorities to ensure peered traffic is allowed before any deny rules take effect. Will update once I test this.
Good catch on the sampling rate. I increased Flow Logs to 100% and now I can see the exact pattern. Packets are being dropped specifically when they match the deny-all-ingress rule at priority 2000, which means they’re not matching our allow rules properly. The issue is that our allow-peered-traffic rule uses a service account target, but some of our instances don’t have that service account attached.