We’re seeing very slow transfer speeds when our Azure ML pipeline pulls training data from our on-premises ERP system to Azure Blob Storage over ExpressRoute. The circuit is 1 Gbps, but the ML data transfers only reach 80-120 Mbps, which causes significant delays in pipeline execution. Normal application traffic over the same circuit is fine, hitting 600-700 Mbps during business hours.

The slowdown specifically affects our nightly ML data sync job, which moves 200-300 GB of transactional data from the ERP database to Azure storage for model training. The job now takes 6-8 hours instead of the expected 2-3, so it runs into business hours and impacts production ERP performance.

We’ve checked the ExpressRoute circuit metrics in Azure Monitor, and overall bandwidth utilization shows we’re not maxing out the circuit, so I’m wondering whether a QoS or routing configuration issue is throttling the ML transfers specifically. What could cause this degradation for large data transfers while other traffic is unaffected?
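In case it’s useful, the same circuit counters can also be pulled at per-minute granularity with the azure-monitor-query Python package; a minimal sketch of what we’re looking at (the resource ID is a placeholder, and we check maximums alongside averages so short bursts aren’t hidden by the averaging):

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Placeholder resource ID for the ExpressRoute circuit.
CIRCUIT_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.Network/expressRouteCircuits/<circuit-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Per-minute ingress/egress throughput for roughly the last sync window.
response = client.query_resource(
    CIRCUIT_ID,
    metric_names=["BitsInPerSecond", "BitsOutPerSecond"],
    timespan=timedelta(hours=12),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE, MetricAggregationType.MAXIMUM],
)

for metric in response.metrics:
    for point in metric.timeseries[0].data:
        print(metric.name, point.timestamp, point.average, point.maximum)
```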
The symptom you’re describing - normal application traffic performing well while large data transfers are slow - suggests a few possible issues. First, check your ExpressRoute SKU and peering configuration. Are you using Microsoft Peering or Private Peering? For Azure Storage access, you should be using Microsoft Peering with route filters. Second, what’s the MTU size configured on your on-premises network equipment? ExpressRoute supports up to a 1500-byte MTU, but if your equipment is set lower or fragmentation is happening, that can significantly impact large data transfers. Third, how many TCP connections is your ML data sync job using? Single-threaded transfers won’t fully utilize the available bandwidth due to TCP window scaling limitations over high-latency connections.
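To put rough numbers on that last point: a single TCP stream can only move about one window’s worth of data per round trip, so per-stream throughput is capped at roughly window size divided by RTT. A quick back-of-the-envelope sketch (the window and RTT values are illustrative assumptions, not measurements from your circuit):

```python
# Per-stream TCP ceiling: throughput <= effective_window / round_trip_time.
# Illustrative assumptions only -- substitute your measured RTT and the
# socket buffer / window size actually negotiated by the transfer host.
window_bytes = 256 * 1024   # assume a 256 KiB effective window
rtt_seconds = 0.015         # assume ~15 ms round trip over the circuit
throughput_mbps = window_bytes * 8 / rtt_seconds / 1_000_000
print(f"~{throughput_mbps:.0f} Mbps per TCP stream")  # prints "~140 Mbps"
```

With numbers in that ballpark, a single stream lands right around the throughput you’re seeing even though the circuit itself has plenty of headroom.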
Private Peering is correct for accessing Azure services via private IPs. The latency is good. Single-threaded transfers over ExpressRoute often hit exactly the performance wall you’re seeing - around 100-150 Mbps even on a 1 Gbps circuit. This is due to TCP congestion control algorithms and bandwidth-delay product limitations. The Azure SDK by default uses single-threaded uploads for blob storage unless you explicitly enable concurrent connections. You need to configure the SDK to use parallel transfers with multiple TCP streams. For 200-300 GB transfers, aim for 8-16 concurrent connections to better utilize the available bandwidth.
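To make that concrete, here’s roughly what a parallel block upload looks like with the Python SDK (azure-storage-blob v12). The account URL, container, blob, and file names are placeholders, and the block size and concurrency values are starting points to tune rather than recommendations:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder account/container/blob names -- substitute your own.
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
    max_block_size=8 * 1024 * 1024,       # upload in 8 MiB blocks
    max_single_put_size=8 * 1024 * 1024,  # force chunked uploads above 8 MiB
)
blob = service.get_blob_client(container="ml-training-data",
                               blob="erp/nightly/transactions.parquet")

with open("transactions.parquet", "rb") as data:
    blob.upload_blob(
        data,
        overwrite=True,
        max_concurrency=8,  # parallel block uploads, i.e. parallel TCP streams
    )
```

Note that max_concurrency parallelizes the blocks of a single large blob; if the sync job writes many smaller files instead, you’d also want to upload several blobs at once (a thread pool or AzCopy both work) so the aggregate stream count stays in that 8-16 range.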
That makes sense about the TCP limitations. How do I configure the Azure SDK for parallel transfers? And beyond that, are there any QoS settings I should configure on the ExpressRoute connection to prioritize this data transfer traffic during the nightly sync window? We have some flexibility to mark traffic with DSCP values on our on-premises side if that would help.