ExpressRoute circuit routing conflict with custom BGP configuration causes intermittent ERP connectivity

We’re experiencing intermittent connectivity loss to our ERP system through ExpressRoute after implementing custom BGP configurations. The circuit was working fine with default settings, but we needed custom route advertisements for our multi-region setup.

Our custom BGP config includes:


route-map CUSTOM_OUT permit 10
 match ip address prefix-list ERP_ROUTES
 set as-path prepend 65001 65001

The issue manifests as random ERP connection drops lasting 30-90 seconds, particularly during peak hours. Route diagnostics show conflicting paths, but I’m not sure how to interpret the BGP neighbor status correctly. Has anyone dealt with similar BGP routing conflicts on ExpressRoute circuits? We need to maintain the custom config for failover purposes but can’t afford these disruptions.

We had nearly identical issues last year. The key is BGP community tags for route control rather than aggressive AS-path prepending. Also, make sure you’re not advertising more specific routes than necessary - that causes unnecessary BGP churn. What’s your current BGP timer configuration? Default 60/180 keepalive/holdtime can be too slow for detecting issues.
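For reference, here's roughly what tighter timers look like on the customer edge - a minimal sketch assuming Cisco IOS-style syntax, with 192.168.15.18 standing in for the Azure-side private peering address and 65001 taken from your prepend config:

! Sketch only - Cisco IOS-style syntax; 192.168.15.18 is a placeholder for the
! Azure-side private peering address, 65001 is the ASN from your prepend config
router bgp 65001
 neighbor 192.168.15.18 timers 10 30       ! keepalive 10s, holdtime 30s
 neighbor 192.168.15.18 send-community     ! required if you move to community tags

The Azure side proposes 60/180, but BGP negotiates down to the lower hold time, so setting it on your edge is enough.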

I’ve seen this before. The AS-path prepending in your config might be causing route flapping. Check your BGP session stability first - watch the circuit’s BGP Availability metric in Azure Monitor for session resets, and run Get-AzExpressRouteCircuitStats to see whether traffic is shifting between the primary and secondary links. Also, verify that your on-premises router isn’t doing additional AS-path manipulation elsewhere, which would create conflicting advertisements.
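If your edge is Cisco IOS-style, a couple of show commands make that check quick (the peering address below is just a placeholder for your primary link):

show ip bgp summary                                     ! session state and uptime - resets show up as a short Up/Down time
show route-map CUSTOM_OUT                               ! confirm the match and set clauses actually applied
show ip bgp neighbors 192.168.15.18 advertised-routes   ! exactly what you are sending toward Azure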

The BGP flapping is definitely your culprit. I’d recommend checking the route-map sequence numbers - a broad permit entry at a lower sequence number will shadow your ERP-specific entry, so the prepend ends up applied to far more routes than you intended. Also verify your prefix-list ERP_ROUTES isn’t too broad and accidentally advertising default routes or overlapping subnets - that can cause Azure’s path selection to thrash between the primary and secondary circuits.
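To make the sequence-number point concrete, here’s a hypothetical Cisco IOS-style example (not your config): the entry at sequence 5 has no match clause, so it matches everything and processing stops there - every advertised route gets the double prepend and the ERP-specific entry at sequence 10 never fires.

route-map BROKEN_OUT permit 5
 set as-path prepend 65001 65001
route-map BROKEN_OUT permit 10
 match ip address prefix-list ERP_ROUTES
 set as-path prepend 65001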

For failover testing without disruption, set up a test VM in a separate VNet peered to your ExpressRoute gateway. Use that to validate routing behavior while monitoring with Network Watcher’s Connection Monitor. You can simulate failures by adjusting BGP weights temporarily on the test path. Just make sure your Connection Monitor is configured to track both primary and backup paths so you can see the switchover timing.

Let me provide a comprehensive solution that addresses all three key areas: custom BGP config, route diagnostics, and failover testing.

Custom BGP Configuration Fix: Your AS-path prepending is too aggressive and creating instability. Refactor your route-map to use BGP communities instead:


route-map CUSTOM_OUT permit 10
 match ip address prefix-list ERP_ROUTES_SPECIFIC
 set community 65001:100

Make your prefix-list more specific to avoid overlaps - only advertise the exact ERP subnets needed (e.g., 10.50.0.0/24 for ERP app servers, 10.50.1.0/24 for database tier) rather than broader ranges.
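A matching prefix-list is sketched below - Cisco IOS-style syntax, reusing the example subnets above; substitute your real ERP ranges:

ip prefix-list ERP_ROUTES_SPECIFIC seq 10 permit 10.50.0.0/24
ip prefix-list ERP_ROUTES_SPECIFIC seq 20 permit 10.50.1.0/24
! the implicit deny at the end keeps default routes and broader ranges out of CUSTOM_OUT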

Route Diagnostics: Implement continuous monitoring using these tools:

  1. Enable ExpressRoute Circuit Metrics in Azure Monitor - set alerts on BGP session state changes and route count fluctuations
  2. Use Get-AzExpressRouteCircuitRouteTable regularly to verify advertised routes match expectations
  3. Configure Network Performance Monitor to track latency and packet loss on ExpressRoute paths
  4. Compare the primary and secondary device paths with Get-AzExpressRouteCircuitRouteTableSummary to spot asymmetric routing - this often causes the 30-90 second drops you’re seeing

Failover Testing Strategy: Set up a proper testing framework:

  1. Create a staging VNet with identical routing config but separate ExpressRoute gateway
  2. Deploy Connection Monitor with test endpoints that simulate ERP traffic patterns
  3. Tune BGP timers to 10/30 (keepalive/holdtime) for faster convergence - default 60/180 is too slow
  4. Test failover by adjusting local-preference values (use 150 for primary, 100 for backup) rather than AS-path prepending - see the sketch after this list
  5. Validate convergence time is under 10 seconds using continuous ping tests
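For item 4, here’s a rough sketch of the local-preference approach - Cisco IOS-style syntax assumed; the peer addresses are placeholders for your primary and secondary ExpressRoute peering IPs:

! Prefer routes learned over the primary link; fall back to the secondary only when it fails
route-map FROM_PRIMARY permit 10
 set local-preference 150
route-map FROM_SECONDARY permit 10
 set local-preference 100
router bgp 65001
 neighbor 192.168.15.18 route-map FROM_PRIMARY in
 neighbor 192.168.15.22 route-map FROM_SECONDARY in

During a test window, temporarily drop FROM_PRIMARY below 100 (or shut the primary neighbor) and use Connection Monitor to time how quickly traffic converges onto the secondary path. Keep in mind local-preference only steers traffic leaving your edge; the Azure-to-on-premises direction still follows whatever you advertise on each link.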

The root cause of your 30-90 second outages is the combination of overlapping prefix advertisements and slow BGP convergence. Once you implement more specific prefix-lists and faster BGP timers, failover should complete in under 15 seconds. Monitor for 48 hours after changes to confirm stability before considering the issue resolved.