AWS S3 Multi-Part Uploads

Once you’ve created an S3 bucket, you’ll likely need to transfer large files — sometimes in the order of gigabytes or even terabytes.
AWS S3 allows object sizes of up to 5 TB, but a single PUT upload is limited to 5 GB.
To handle larger uploads, S3 offers a Multi-Part Upload mechanism.
The idea is simple:
- Split the large file into smaller parts
- Upload each part individually (possibly in parallel)
- S3 reassembles these parts into the final object
This method provides two key advantages:
- Fault tolerance: If an upload fails midway, you can resume from the failed part.
- Parallelism: Parts can be uploaded concurrently for faster throughput.
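Before building this by hand, it is worth knowing that boto3's high-level transfer API already performs multi-part uploads automatically once a file crosses a size threshold. Below is a minimal sketch of that convenience path (the file and bucket names are placeholders); the rest of this article recreates the low-level flow to show what happens underneath.

```python
import boto3
from boto3.s3.transfer import TransferConfig

# upload_file switches to multi-part automatically above multipart_threshold
# and uploads parts concurrently with max_concurrency worker threads.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # use multi-part above 8 MB
    multipart_chunksize=8 * 1024 * 1024,  # 8 MB per part
    max_concurrency=10,
)
boto3.client("s3").upload_file("app.msi", "test-3224-random", "app.msi", Config=config)
```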
Rules to Remember
Before jumping into implementation, note the following S3 Multi-Part Upload constraints:
- A file can be split into a maximum of 10,000 parts.
- Each part must be at least 5 MB; only the final part may be smaller. A single part can be at most 5 GB.
- You can use S3 lifecycle policies to automatically abort unfinished multi-part uploads that exceed a time limit.
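As a concrete sketch of that last rule (the bucket name and rule ID here are placeholders), a lifecycle configuration like the following aborts any multi-part upload still incomplete after 7 days:

```python
import boto3

# Abandoned parts accrue storage charges until the upload is aborted,
# so sweep incomplete uploads automatically after 7 days.
boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="test-3224-random",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-mpu",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```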
Steps for a Multi-Part Upload
The process involves three main stages:
1. Initiate the upload: call the CreateMultipartUpload API. This returns an UploadId used to reference the ongoing upload.
2. Upload each part: call the UploadPart API. Each call returns an ETag (a checksum for that part).
3. Complete the upload: call CompleteMultipartUpload with the UploadId, the part numbers, and their ETags. S3 then assembles the file from these parts.
You can even overwrite parts while the upload is in progress: re-uploading a part under the same part number replaces the earlier version, enabling in-flight file modifications.
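A minimal sketch of that behavior, assuming a freshly initiated upload (bucket and key are placeholders): part 1 is uploaded twice, and only the second version ends up in the assembled object.

```python
import boto3

client = boto3.client("s3")
bucket, key = "test-3224-random", "app.msi"

# Initiate an upload, then send part 1 twice under the same UploadId.
upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
client.upload_part(Bucket=bucket, Key=key, PartNumber=1,
                   UploadId=upload_id, Body=b"first draft")
# This call overwrites the part above; only its ETag should be sent
# to CompleteMultipartUpload.
final = client.upload_part(Bucket=bucket, Key=key, PartNumber=1,
                           UploadId=upload_id, Body=b"revised draft")
```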
Hands-On Implementation
We’ll walk through creating a Python script using boto3 to perform a multi-part upload.
Step 1 — Get Upload ID
```python
import boto3

def start_upload(bucket, key):
    """Returns the UploadId for multi-part upload"""
    client = boto3.client("s3")
    response = client.create_multipart_upload(Bucket=bucket, Key=key)
    return response["UploadId"]
```
This initializes the upload and returns the UploadId.
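As an optional sanity check (assuming the bucket variable from above), the in-progress uploads for a bucket can be listed; the one just initiated should appear with its Key and UploadId:

```python
# List multi-part uploads currently in progress for this bucket.
pending = boto3.client("s3").list_multipart_uploads(Bucket=bucket)
for upload in pending.get("Uploads", []):  # "Uploads" is absent when none exist
    print(upload["Key"], upload["UploadId"])
```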
Step 2 — Upload One Part
```python
def upload_part(bucket, key, part_num, upload_id, data):
    """Upload a part to S3"""
    client = boto3.client("s3")
    response = client.upload_part(
        Bucket=bucket,
        Key=key,
        PartNumber=part_num,
        UploadId=upload_id,
        Body=data
    )
    print(f"Uploaded part {part_num} with ETag {response['ETag']}")
    return {'PartNumber': part_num, 'ETag': response['ETag']}
```
Each part requires the same UploadId and its sequence number.
The response contains an ETag that must be passed during final assembly.
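This is also where the fault tolerance mentioned earlier comes from: if the process dies midway, ListParts reports which parts S3 already holds, so a resume only needs to re-upload the missing part numbers. A sketch, assuming bucket, key, and upload_id from the steps above:

```python
# Ask S3 which parts it already has for this UploadId.
received = boto3.client("s3").list_parts(Bucket=bucket, Key=key, UploadId=upload_id)
done = {p["PartNumber"]: p["ETag"] for p in received.get("Parts", [])}
print(f"{len(done)} parts already uploaded; re-upload only the rest")
```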
Step 3 — Putting It Together
You can parallelize uploads using the concurrent.futures module:
```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import boto3

# Start upload
upload_id = start_upload(bucket, key)

# Upload parts in parallel
futures = []
with ProcessPoolExecutor(max_workers=10) as executor:
    with open(file_name, "rb") as f:
        i = 1
        chunk = f.read(chunk_size_bytes)
        while len(chunk) > 0:
            future = executor.submit(
                upload_part,
                bucket=bucket,
                key=key,
                part_num=i,
                upload_id=upload_id,
                data=chunk
            )
            futures.append(future)
            i += 1
            chunk = f.read(chunk_size_bytes)

# Collect results
results = [f.result() for f in as_completed(futures)]

# Complete upload
boto3.client("s3").complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=upload_id,
    MultipartUpload={'Parts': sorted(results, key=lambda e: e["PartNumber"])}
)
```
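One caveat: if a part fails and the script gives up, the parts already uploaded keep accruing storage until the upload is aborted (or swept by the lifecycle rule mentioned earlier). A hedged sketch of cleanup around the completion step:

```python
client = boto3.client("s3")
try:
    results = [f.result() for f in as_completed(futures)]
    client.complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={'Parts': sorted(results, key=lambda e: e["PartNumber"])}
    )
except Exception:
    # Discard all uploaded parts so they stop incurring storage charges.
    client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise
```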
Step 4 — Testing the Program
```bash
# Create a test bucket
aws s3 mb s3://test-3224-random --region us-east-1

# Run the program
python3 upload.py --file app.msi --bucket test-3224-random --key app.msi --chunk_size 6
```
That’s it — the script uploads your file in multiple parallel parts, then assembles them in S3.
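To verify the result, inspect the assembled object. One tell-tale sign of a multi-part upload is the object's ETag, which ends in "-N", where N is the number of parts. A small check, assuming the bucket and key from the test run:

```python
import boto3

head = boto3.client("s3").head_object(Bucket="test-3224-random", Key="app.msi")
# For a multi-part upload the ETag looks like "<digest>-<part count>".
print(head["ContentLength"], head["ETag"])
```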
Full Solution Code
Here’s the complete working script:
```python
import boto3
import argparse
import json
from concurrent.futures import ProcessPoolExecutor, as_completed


def start_upload(bucket, key):
    """Returns the UploadId for multi-part upload"""
    client = boto3.client("s3")
    response = client.create_multipart_upload(Bucket=bucket, Key=key)
    return response["UploadId"]


def upload_part(bucket, key, part_num, upload_id, data):
    """Upload a part to S3"""
    client = boto3.client("s3")
    response = client.upload_part(
        Bucket=bucket,
        Key=key,
        PartNumber=part_num,
        UploadId=upload_id,
        Body=data
    )
    print(f"Uploaded part {part_num} and received ETag {response['ETag']}")
    return {'PartNumber': part_num, 'ETag': response['ETag']}


if __name__ == '__main__':
    MB = 1024 * 1024

    parser = argparse.ArgumentParser()
    parser.add_argument("--file", required=True)
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--key", required=True)
    parser.add_argument("--chunk_size", required=True, help="Size of each part in MB.")
    args = parser.parse_args()

    file_name = args.file
    bucket = args.bucket
    key = args.key
    chunk_size_bytes = int(args.chunk_size) * MB

    # Initiate the upload, then fan the parts out to worker processes.
    upload_id = start_upload(bucket, key)
    futures = []
    with ProcessPoolExecutor(max_workers=10) as executor:
        with open(file_name, "rb") as f:
            i = 1
            chunk = f.read(chunk_size_bytes)
            while len(chunk) > 0:
                future = executor.submit(upload_part, bucket, key, i, upload_id, chunk)
                futures.append(future)
                i += 1
                chunk = f.read(chunk_size_bytes)

    # Gather part numbers and ETags, then ask S3 to assemble the object.
    results = [f.result() for f in as_completed(futures)]
    response = boto3.client("s3").complete_multipart_upload(
        Bucket=bucket,
        Key=key,
        UploadId=upload_id,
        MultipartUpload={'Parts': sorted(results, key=lambda e: e["PartNumber"])}
    )
    # default=str guards against response values json can't serialize.
    print(json.dumps(response, default=str))
```
Summary
- Multi-part upload splits large files for parallel and resumable uploads.
- Each part is individually uploaded and later combined by S3.
- This method enhances reliability and performance for massive datasets.