OSS multipart upload fails for large files with timeout errors and incomplete uploads

We’re experiencing consistent failures when uploading large backup files (over 5GB) to OSS using the Java SDK multipart upload feature. The uploads fail after 30-40% completion with connection reset errors, and we’re losing critical backup data.

Our backup automation uses OSS SDK 3.10.2 with multipart upload configured for 100MB part size. The timeout errors occur randomly during large file transfers:

// OSS Java SDK 3.10.2, high-level multipart upload via uploadFile()
OSSClient client = new OSSClient(endpoint, accessKeyId, accessKeySecret);
UploadFileRequest request = new UploadFileRequest(bucketName, objectKey);
request.setUploadFile(localFile);        // local backup file, >5GB
request.setPartSize(100 * 1024 * 1024);  // 100MB parts
client.uploadFile(request);              // fails at 30-40% with connection resets

The connection resets happen unpredictably, and we can’t resume from where it failed. We’ve tried adjusting SDK configuration and network settings, but large file backup remains unreliable. Has anyone successfully implemented resumable upload for files this size? What SDK configuration works best for stable multipart uploads over slower connections?

The SDK handles resume automatically when you provide a checkpoint directory. Just make sure the path is writable and persistent. Another thing - check your client-side network settings. We had issues where our firewall was dropping idle connections after 60 seconds. You might need to configure TCP keepalive or adjust your infrastructure timeout settings. For production backup systems, we also added retry logic with exponential backoff at the application level, separate from SDK retries.

Here’s a comprehensive solution addressing all aspects of reliable multipart upload for large files:

1. Enable Resumable Upload with Checkpoint

The key to handling connection resets is enabling checkpoint-based resumable upload:

UploadFileRequest request = new UploadFileRequest(bucketName, objectKey);
request.setUploadFile(localFile);
request.setPartSize(50 * 1024 * 1024); // 50MB parts
request.setEnableCheckpoint(true);
// The checkpoint directory must exist, be writable, and persist across restarts
request.setCheckpointFile("/var/backup/checkpoints/" + objectKey + ".checkpoint");
client.uploadFile(request); // declares throws Throwable; re-running resumes from the checkpoint

The SDK automatically saves progress to the checkpoint file. If upload fails, calling uploadFile() again with the same checkpoint path resumes from the last successful part - no manual logic needed.

2. Optimize SDK Configuration for Large Files

Increase timeout values to match your network conditions:

ClientConfiguration config = new ClientConfiguration();
config.setConnectionTimeout(300000); // 5 minutes
config.setSocketTimeout(300000); // 5 minutes
config.setMaxErrorRetry(5);
OSSClient client = new OSSClient(endpoint, accessKeyId, accessKeySecret, config);

3. Adjust Part Size Based on Network Stability

For unstable connections, smaller parts (20-50MB) reduce retry overhead. For stable high-bandwidth links, larger parts (100-500MB) improve throughput. Your 100MB parts are reasonable for most scenarios, but try 50MB if you’re still seeing frequent resets.
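
If it helps, here's a minimal sketch of picking the part size up front; the link-quality flag and the thresholds are assumptions for illustration, not SDK features:

boolean unstableLink = true; // e.g. frequent connection resets observed
long partSize = unstableLink
        ? 50L * 1024 * 1024    // smaller parts are cheaper to retry
        : 200L * 1024 * 1024;  // larger parts improve throughput on stable links
request.setPartSize(partSize);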

4. Implement Application-Level Retry Logic

int maxRetries = 3;
for (int attempt = 0; attempt < maxRetries; attempt++) {
    try {
        client.uploadFile(request); // resumes from the checkpoint file on each retry
        break; // success
    } catch (Throwable t) { // uploadFile() declares throws Throwable
        if (attempt == maxRetries - 1) {
            throw new RuntimeException("Upload failed after " + maxRetries + " attempts", t);
        }
        try {
            Thread.sleep((1L << attempt) * 1000L); // exponential backoff: 1s, 2s, 4s
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Retry loop interrupted", ie);
        }
    }
}

5. Monitor and Verify Uploads

After successful upload, verify the object:

ObjectMetadata metadata = client.getObjectMetadata(bucketName, objectKey);
long uploadedSize = metadata.getContentLength();
if (uploadedSize != new java.io.File(localFile).length()) {
    // Size mismatch: treat the backup as failed and re-run the upload
}

Network Considerations:

  • Check for intermediate proxies or firewalls with connection timeout policies
  • Enable TCP keepalive if your network drops idle connections
  • Consider using OSS transfer acceleration for cross-region uploads (see the sketch after this list)
  • Test with different part sizes to find optimal balance for your bandwidth
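
On the acceleration point, a minimal sketch, assuming transfer acceleration has already been enabled on the bucket and that the global acceleration endpoint applies to your region:

// Point the client at the OSS transfer acceleration endpoint
// (bucket-level acceleration must already be enabled in the console)
String accelerateEndpoint = "https://oss-accelerate.aliyuncs.com";
OSSClient accelClient = new OSSClient(accelerateEndpoint, accessKeyId, accessKeySecret, config);
accelClient.uploadFile(request); // same request/checkpoint setup as above; declares throws Throwable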

Bucket Policy: Ensure your RAM user has PutObject permission and no bucket policies restrict large uploads. Some policies set max object size limits.

With these configurations, we successfully upload 50GB+ backup files with automatic resume on any network interruption. The checkpoint mechanism is robust - even if your application crashes, restarting with the same checkpoint file continues from the last completed part. For production backup systems, this approach has given us 99.9% success rates even over unreliable WAN connections.

One more consideration - monitor your upload bandwidth and set realistic timeout values based on actual transfer speeds. If you’re uploading 100MB parts over a 10Mbps connection, that’s 80+ seconds per part minimum. Default timeouts are often too short for this math. Calculate expected time per part and add 50% buffer for your timeout configuration.
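
To make that concrete, here's the same arithmetic as a sketch; the 10Mbps figure is just the example above, so substitute your measured upload speed:

long partSizeBytes = 100L * 1024 * 1024;                 // 100MB part
double uploadMbps = 10.0;                                // assumed/measured upload bandwidth
double bytesPerSecond = uploadMbps * 1_000_000 / 8;      // ~1.25 MB/s
double secondsPerPart = partSizeBytes / bytesPerSecond;  // ~84 seconds per part
int timeoutMs = (int) (secondsPerPart * 1.5 * 1000);     // +50% buffer, ~126 seconds
config.setSocketTimeout(timeoutMs);
config.setConnectionTimeout(timeoutMs);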

I see you’re using uploadFile() which is good, but your configuration is incomplete for large file scenarios. Let me share what we use for 10GB+ backup files with high success rates.