VPN Gateway IPsec tunnel flapping between Azure and branch office firewall

We’re experiencing intermittent VPN tunnel drops between our Azure VPN Gateway and branch office Cisco ASA firewall. The tunnel establishes successfully but goes down after 15-30 minutes, then reconnects. This is causing major issues with our ERP system synchronization.

Azure VPN Gateway is VpnGw1 SKU with route-based configuration. The tunnel uses IKEv2 with the following phase 2 settings:


Encryption: AES256
Integrity: SHA256
PFS Group: PFS2048
SA Lifetime: 3600 seconds

Looking at Azure diagnostics, I see ‘IKE_TIMED_OUT’ errors around the time of disconnections. The Cisco ASA logs show ‘Phase 2 SA deleted due to inactivity’. We have NAT-T enabled on both sides. Branch office has 50Mbps internet with stable connectivity to other sites.

Could this be an IPsec policy mismatch or is there something with keepalive/DPD settings that needs adjustment?
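For reference, this is how we dump whatever custom IPsec/IKE policy is applied on the Azure connection (connection and resource group names below are placeholders for ours):

```shell
# List any custom IPsec/IKE policy on the site-to-site connection;
# an empty result means the gateway is negotiating with its default policy set.
az network vpn-connection ipsec-policy list \
  --connection-name <branch-connection> \
  --resource-group rg-network \
  --output table
```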

Good point about the firewall timeout. Our branch office has a Fortinet firewall in front of the ASA that could be timing out the NAT-T session. I’ll check the UDP timeout settings there. Meanwhile, I’ve updated the ASA IPsec policy to match Azure’s 3600-second SA lifetime and set DPD to 45 seconds. Going to monitor for the next 24 hours to see if the flapping stops.

I checked the ASA configuration and found DPD is set to 10 seconds retry with 3 retries (30 seconds total). Azure is 45 seconds. Should I increase the ASA DPD timeout to match Azure, or lower Azure’s timeout? Also, the SA lifetime on ASA side is set to 28800 seconds (8 hours) while Azure shows 3600 seconds (1 hour). This seems like a significant mismatch.
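To make the mismatch concrete, a quick back-of-the-envelope comparison (assuming the detection window is simply retry interval times retry count, which is how the 30-second figure above was derived):

```python
def dpd_window_seconds(retry_interval_s: int, retries: int) -> int:
    """Worst-case time for DPD to declare the peer dead, assuming the
    window is just the retry interval multiplied by the retry count."""
    return retry_interval_s * retries

asa_window = dpd_window_seconds(retry_interval_s=10, retries=3)  # 30 s
azure_timeout = 45  # Azure VPN Gateway default DPD timeout

# The ASA gives up 15 seconds before Azure would, so the ASA tears the
# tunnel down while Azure still considers the peer alive.
print(asa_window, azure_timeout, azure_timeout - asa_window)  # 30 45 15
```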

Excellent work identifying the issues. Let me provide a comprehensive solution for IPsec tunnel stability between Azure VPN Gateway and branch office firewalls:

Primary Issues Identified:

  1. IPsec policy mismatch - SA lifetime differed (Azure: 3600s, ASA: 28800s)
  2. DPD timeout mismatch - ASA too aggressive (30s) vs Azure (45s)
  3. Potential NAT-T UDP session timeout on intermediate firewall

Complete Solution:

1. Align IPsec Phase 2 (ESP) Policies: Both sides must match exactly:


Encryption: AES256-CBC
Integrity: SHA256
PFS Group: DHGroup14 (PFS2048)
SA Lifetime: 3600 seconds (1 hour)
Data Lifetime: 102400000 KB
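If the gateway is still on defaults, the same policy can be pinned explicitly on the Azure side (names are placeholders; the parameter values mirror the table above):

```shell
# Pin an explicit IPsec/IKE policy on the connection so both ends
# negotiate exactly the parameters listed above.
az network vpn-connection ipsec-policy add \
  --connection-name <branch-connection> \
  --resource-group rg-network \
  --ike-encryption AES256 \
  --ike-integrity SHA256 \
  --dh-group DHGroup14 \
  --ipsec-encryption AES256 \
  --ipsec-integrity SHA256 \
  --pfs-group PFS2048 \
  --sa-lifetime 3600 \
  --sa-max-size 102400000
```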

2. Configure Dead Peer Detection (DPD): Set consistent DPD on Cisco ASA:


crypto ikev2 policy 10
  lifetime 86400
crypto ipsec ikev2 ipsec-proposal AZURE-PROPOSAL
  protocol esp encryption aes-256
  protocol esp integrity sha-256
crypto ipsec security-association lifetime seconds 3600
tunnel-group <Azure-VPN-IP> ipsec-attributes
  ikev2 remote-authentication pre-shared-key <PSK>
  ikev2 local-authentication pre-shared-key <PSK>
  isakmp keepalive threshold 30 retry 5

DPD detection window: roughly 45 seconds (30 s idle threshold plus a few 5 s retries), in line with Azure's 45-second DPD timeout. The global `crypto ipsec security-association lifetime` line pins the Phase 2 SA lifetime at 3600 seconds so it matches Azure.
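Alternatively, the Azure-side DPD timeout is a per-connection property. This sketch assumes the generic `--set` path works for the connection's `dpdTimeoutSeconds` property (connection name is a placeholder):

```shell
# Adjust Azure's DPD timeout on the connection (default is 45 s) if you
# would rather move Azure toward the ASA instead of the other way around.
az network vpn-connection update \
  --name <branch-connection> \
  --resource-group rg-network \
  --set dpdTimeoutSeconds=45
```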

3. Address NAT-T Session Timeout: If an intermediate firewall exists, there are two approaches:

A) Increase UDP 4500 timeout on Fortinet:


config system session-ttl
    config port
        edit 1
            set protocol 17
            set start-port 4500
            set end-port 4500
            set timeout 3600
        next
    end
end

Scoping the TTL to UDP/4500 (protocol 17) avoids raising the default timeout for every session on the firewall.

B) Generate keepalive traffic from Azure:

Create a Logic App or Function App to ping the branch subnet every 60 seconds, maintaining tunnel activity.
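A minimal sketch of such a keepalive generator in Python (host and port are placeholders; in practice this would run as a timer-triggered Function or on a small VM in the VNet):

```python
import socket
import time

def send_keepalive(host: str, port: int = 33434, payload: bytes = b"keepalive") -> int:
    """Send one UDP datagram toward the branch subnet so the tunnel (and any
    intermediate NAT/firewall session) sees periodic traffic. Returns bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))

def keepalive_loop(host: str, interval_s: float = 60.0, iterations: int = 0) -> int:
    """Send keepalives every interval_s seconds; iterations=0 means run forever.
    Returns the number of datagrams sent (useful for finite test runs)."""
    sent = 0
    while iterations == 0 or sent < iterations:
        send_keepalive(host)
        sent += 1
        if iterations == 0 or sent < iterations:
            time.sleep(interval_s)
    return sent
```

The destination port does not need a listener; the point is simply to generate traffic that traverses the tunnel and refreshes the NAT-T session state.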

4. Enable VPN Diagnostics on Azure:


az network watcher troubleshooting start \
  --resource vpn-gateway-prod \
  --resource-type vnetGateway \
  --resource-group rg-network \
  --storage-account <storage-account-resource-id> \
  --storage-path https://<storage-account>.blob.core.windows.net/vpn-logs

Download logs to analyze IKE negotiations and identify exact failure points.

5. Verify Firewall Timeout Settings: On Cisco ASA, ensure idle timeout doesn’t conflict:


timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00

Change the UDP timeout if needed: `timeout udp 1:00:00`

6. Implement Connection Monitor: Use Azure Network Watcher to continuously monitor tunnel health:

  • Create Connection Monitor targeting branch subnet
  • Set alert threshold for packet loss > 5%
  • Configure notification for tunnel state changes
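A hedged sketch of the monitor creation (resource IDs, location, and the 10.20.0.10 branch host are placeholders; parameter names follow `az network watcher connection-monitor create`):

```shell
# Continuously probe a host in the branch subnet from an Azure VM, so tunnel
# degradation shows up as packet loss and failed checks.
az network watcher connection-monitor create \
  --name vpn-branch-monitor \
  --location eastus \
  --endpoint-source-name azure-vm \
  --endpoint-source-resource-id <azure-vm-resource-id> \
  --endpoint-dest-name branch-host \
  --endpoint-dest-address 10.20.0.10 \
  --test-config-name tcp-check \
  --protocol Tcp \
  --tcp-port 443
```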

Verification Steps:

  1. Clear existing IPsec SAs on both sides
  2. Re-establish tunnel and verify policy match: `show crypto ipsec sa` on the ASA
  3. Check Azure VPN Gateway metrics for ‘Tunnel Ingress/Egress Bytes’
  4. Monitor for 48 hours - tunnel should remain stable
  5. Simulate idle period (no traffic for 30 minutes) and verify tunnel stays up
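Step 3 above can be scripted; this sketch pulls the tunnel byte counters through Azure Monitor (the gateway resource ID is a placeholder):

```shell
# Zero ingress/egress over an interval where traffic was expected points to
# a tunnel that is up in name only.
az monitor metrics list \
  --resource <vpn-gateway-resource-id> \
  --metric TunnelIngressBytes TunnelEgressBytes \
  --interval PT5M \
  --output table
```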

Common Gotchas:

  • Azure route-based gateways propose any-to-any traffic selectors (0.0.0.0/0) - ensure the ASA crypto ACL is compatible, or enable UsePolicyBasedTrafficSelectors
  • BGP over IPsec requires additional keepalive configuration
  • Multiple branch offices need unique local/remote networks in crypto ACLs
  • Azure VPN Gateway SKU affects throughput and tunnel count

Best Practices:

  • Use IKEv2 over IKEv1 for better stability and faster reconnection
  • Document all IPsec parameters in configuration management
  • Set up automated alerts for tunnel state changes
  • Implement redundant tunnels using active-active VPN Gateway if critical

The key insight: IPsec tunnel flapping is almost always due to policy mismatches (SA lifetime, DPD, encryption/integrity) or intermediate firewall session timeouts. Azure’s VPN diagnostics logs are invaluable for pinpointing the exact negotiation failure. Once policies align and NAT-T sessions are preserved, tunnels remain stable even under low traffic conditions.

One more thing to check - your branch office firewall might have an idle timeout for the NAT-T UDP 4500 session. If there’s no traffic for a certain period, the firewall’s stateful inspection drops the UDP session, which breaks the IPsec tunnel even though both VPN endpoints think it’s up. This is common with low-traffic tunnels. You can solve this by enabling ‘UsePolicyBasedTrafficSelectors’ on Azure side if the Cisco ASA is policy-based, or configure a keepalive/interesting traffic generator.
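If it turns out the peer is policy-based, the selector flag mentioned above is a one-liner on the connection (connection name is a placeholder):

```shell
# Make the route-based Azure gateway negotiate prefix-based traffic
# selectors instead of 0.0.0.0/0, matching a policy-based peer's crypto ACL.
az network vpn-connection update \
  --name <branch-connection> \
  --resource-group rg-network \
  --use-policy-based-traffic-selectors true
```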