Excellent work identifying the issues. Let me provide a comprehensive solution for IPsec tunnel stability between Azure VPN Gateway and branch office firewalls:
Primary Issues Identified:
- IPsec policy mismatch - SA lifetime differed (Azure: 3600s, ASA: 28800s)
- DPD timeout mismatch - ASA too aggressive (30s) vs Azure (45s)
- Potential NAT-T UDP session timeout on intermediate firewall
Complete Solution:
1. Align IPsec Phase 2 (ESP) Policies:
Both sides must match exactly:
Encryption: AES256-CBC
Integrity: SHA256
PFS Group: DHGroup14 (PFS2048)
SA Lifetime: 3600 seconds (1 hour)
Data Lifetime: 102400000 KB
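A quick way to catch this kind of mismatch before the tunnel flaps is to write both sides' Phase 2 parameters down in one normalized form and diff them. The sketch below is only an illustration: the dictionary keys and the `diff_policies` helper are made-up names, and it assumes you have already translated vendor-specific labels (for example ASA `group 14` to `PFS2048`) by hand.

```python
# Hypothetical, hand-normalized Phase 2 parameters taken from the list above.
AZURE_POLICY = {
    "encryption": "AES256",
    "integrity": "SHA256",
    "pfs_group": "PFS2048",
    "sa_lifetime_seconds": 3600,
    "sa_data_size_kb": 102400000,
}


def diff_policies(local: dict, remote: dict) -> dict:
    """Return {parameter: (local_value, remote_value)} for every mismatch."""
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }


# Example: the ASA still carries its 28800 s lifetime from the original issue.
asa_policy = dict(AZURE_POLICY, sa_lifetime_seconds=28800)
for name, (azure_value, asa_value) in diff_policies(AZURE_POLICY, asa_policy).items():
    print(f"{name}: Azure={azure_value} ASA={asa_value}")
```

With the values shown, the only line printed is the SA lifetime mismatch from the issue list; an empty result means the two sides agree on every listed parameter.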
2. Configure Dead Peer Detection (DPD):
Set DPD on the Cisco ASA tunnel group (shown together with the matching IKEv2 policy and IPsec proposal):
crypto ikev2 policy 10
lifetime seconds 86400
crypto ipsec ikev2 ipsec-proposal AZURE-PROPOSAL
protocol esp encryption aes-256
protocol esp integrity sha-256
tunnel-group <Azure-VPN-IP> ipsec-attributes
ikev2 remote-authentication pre-shared-key <PSK>
ikev2 local-authentication pre-shared-key <PSK>
isakmp keepalive threshold 10 retry 2
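Target an effective DPD timeout of about 45 seconds to match Azure's default; adjust threshold and retry until the ASA's detection time lines up with it.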
3. Address NAT-T Session Timeout:
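If an intermediate firewall sits in the path, there are two approaches:
A) Increase the UDP 4500 session timeout on the Fortinet (note that set default raises the TTL for every session; a config port entry for UDP 4500 under session-ttl scopes it more tightly):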
config system session-ttl
set default 3600
end
B) Generate keepalive traffic from Azure:
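Create a Logic App or Function App that sends a probe to the branch subnet every 60 seconds to keep the tunnel active. App Service sandboxes generally block ICMP, so a TCP connect to a known branch host is a safer probe than ping, and the app needs VNet integration to reach the tunnel.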
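Option B can be sketched as a small TCP probe loop. This is a minimal sketch, not a finished Function App: `tcp_probe` and `keep_tunnel_warm` are hypothetical helper names, the target host and port are placeholders you would replace with a reachable branch host, and it assumes the code runs somewhere with a route into the tunnel.

```python
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def keep_tunnel_warm(host: str, port: int, interval: float = 60.0, rounds=None) -> int:
    """Probe every `interval` seconds and return the number of failed probes.

    rounds=None keeps probing until the process is stopped.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not tcp_probe(host, port):
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return failures


# Usage (placeholder address, assumed to be a branch host listening on 443):
# keep_tunnel_warm("10.10.0.10", 443, interval=60.0)
```

Even a refused connection sends a SYN across the tunnel, so the probe generates traffic whether or not the target port is open; the failure count is only there to give you something to alert on.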
4. Enable VPN Diagnostics on Azure:
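az monitor diagnostic-settings create \
--name vpn-gateway-diag \
--resource <vpn-gateway-resource-id> \
--workspace <log-analytics-workspace-id> \
--logs '[{"category":"IKEDiagnosticLog","enabled":true},{"category":"TunnelDiagnosticLog","enabled":true}]'
Here <vpn-gateway-resource-id> is the resource ID of vpn-gateway-prod in rg-network. Query the resulting IKE and tunnel logs in Log Analytics to analyze IKE negotiations and identify exact failure points.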
5. Verify Firewall Timeout Settings:
On Cisco ASA, ensure idle timeout doesn’t conflict:
timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00
Change the UDP timeout if needed: timeout udp 1:00:00
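To sanity-check a keepalive interval against these idle timeouts, the arithmetic can be sketched as below. The helper names are hypothetical, and the rule that at least two keepalives should fit inside one timeout window is an assumed rule of thumb, not a vendor requirement.

```python
def asa_timeout_to_seconds(value: str) -> int:
    """Convert an ASA-style h:mm:ss timeout such as '0:02:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def interval_is_safe(keepalive_s: float, idle_timeout_s: float, margin: float = 2.0) -> bool:
    """True if at least `margin` keepalives fit inside one idle-timeout window."""
    return keepalive_s * margin <= idle_timeout_s


udp_timeout = asa_timeout_to_seconds("0:02:00")  # the udp value shown above
print(udp_timeout)                               # 120
print(interval_is_safe(60, udp_timeout))         # True: a 60 s probe fits twice
print(interval_is_safe(60, asa_timeout_to_seconds("0:01:00")))  # False
```

A 60-second probe only just clears the 2-minute UDP timeout under this rule, which is one reason to either raise the timeout or shorten the probe interval.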
6. Implement Connection Monitor:
Use Azure Network Watcher to continuously monitor tunnel health:
- Create Connection Monitor targeting branch subnet
- Set alert threshold for packet loss > 5%
- Configure notification for tunnel state changes
Verification Steps:
- Clear existing IPsec SAs on both sides
- Re-establish tunnel and verify policy match:
show crypto ipsec sa on ASA
- Check Azure VPN Gateway metrics for ‘Tunnel Ingress/Egress Bytes’
- Monitor for 48 hours - tunnel should remain stable
- Simulate idle period (no traffic for 30 minutes) and verify tunnel stays up
Common Gotchas:
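- Azure route-based gateways propose any-to-any (0.0.0.0/0) traffic selectors by default. For a crypto-map (policy-based) ASA, enable UsePolicyBasedTrafficSelectors on the Azure connection or use a VTI on the ASA, and ensure the ASA crypto ACL matches the prefixes configured on the Azure side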
- BGP over IPsec requires additional keepalive configuration
- Multiple branch offices need unique local/remote networks in crypto ACLs
- Azure VPN Gateway SKU affects throughput and tunnel count
Best Practices:
- Use IKEv2 over IKEv1 for better stability and faster reconnection
- Document all IPsec parameters in configuration management
- Set up automated alerts for tunnel state changes
- Implement redundant tunnels using active-active VPN Gateway if critical
The key insight: IPsec tunnel flapping is almost always due to policy mismatches (SA lifetime, DPD, encryption/integrity) or intermediate firewall session timeouts. Azure’s VPN diagnostics logs are invaluable for pinpointing the exact negotiation failure. Once policies align and NAT-T sessions are preserved, tunnels remain stable even under low traffic conditions.