When uploading a very large file to AWS S3 (> 100 GB), you may want to split the file and upload its parts using the multipart upload feature provided by AWS.
That way, if you lose your connection for any reason, you can resume the upload from the parts that are still missing instead of starting over. Also, by passing the `--content-md5` parameter, you let S3 verify each uploaded part against the MD5 checksum of your local file.
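Splitting the file is not shown in this README, so here is a minimal sketch of one way to do it. The function name `split_for_upload` and the `part-` prefix are made up for illustration. Keep in mind that S3 requires every part except the last to be at least 5 MiB, allows at most 5 GiB per part, and accepts at most 10,000 parts per upload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split FILE into chunks of PART_SIZE inside OUT_DIR.
# Chunks are named part-aaa, part-aab, ... so a sorted listing keeps part order.
split_for_upload() {
  local file="$1" part_size="$2" out_dir="$3"
  mkdir -p "$out_dir"
  # -a 3: three-letter suffixes; -b: size of each chunk
  split -a 3 -b "$part_size" "$file" "$out_dir/part-"
}
```

For example, `split_for_upload /path/to/large-file 1G ./parts` would produce 1 GiB chunks with GNU `split` (the `G` size suffix is a GNU extension).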
- Create a multipart upload using the AWS S3 API.
  - Example: `aws s3api create-multipart-upload --bucket my-bucket --key 'multipart-1'`
  - Take note of the `upload_id` (returned as `UploadId` in the command output) and `key` values; you'll need them.
- Clone this repo.
- Set permissions: `chmod +x multipart-file-upload-s3.sh`
- Edit `multipart-file-upload-s3.sh` to match your requirements. See the variables below for more information.
  - Change `bucket`, `profile`, `upload_id` and `key`.
- Create the `logs` directory: `cd awsS3-multipart-upload-script && mkdir logs`
- Run: `./multipart-file-upload-s3.sh`
- Check the AWS documentation for the next step: you'll have to run the `complete-multipart-upload` command.
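The first and last steps of the list above can be sketched as two small shell functions. This is only an illustration: `start_upload` and `finish_upload` are hypothetical names, and the sketch assumes your default AWS CLI credentials are already configured. `--query` and `--output` are general AWS CLI options, used here to print just the upload ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a multipart upload and print only its UploadId.
start_upload() {
  local bucket="$1" key="$2"
  aws s3api create-multipart-upload \
    --bucket "$bucket" --key "$key" \
    --query UploadId --output text
}

# Hypothetical helper: finish the upload from a JSON file listing the parts.
finish_upload() {
  local bucket="$1" key="$2" upload_id="$3" parts_file="$4"
  aws s3api complete-multipart-upload \
    --bucket "$bucket" --key "$key" --upload-id "$upload_id" \
    --multipart-upload "file://$parts_file"
}
```

A possible usage would be `upload_id="$(start_upload my-bucket multipart-1)"` before uploading the parts, and `finish_upload my-bucket multipart-1 "$upload_id" parts.json` afterwards, where `parts.json` is a placeholder file name.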
The script reads the files in your `/home/lucas/aws-upload-test/files/x` directory, computes the MD5 checksum of each one, passes it to the S3 API as the `--content-md5` parameter, and then uploads each file to the specified `bucket`.
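How the script computes the checksum is not shown here, so the following is a sketch rather than its actual code. It assumes `openssl` and `base64` are installed; the function names `content_md5` and `upload_part` are hypothetical. Note that S3 expects the `--content-md5` value to be the base64-encoded binary MD5 digest, not the hex string printed by `md5sum`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the base64-encoded MD5 digest of a file.
content_md5() {
  openssl dgst -md5 -binary "$1" | base64
}

# Hypothetical helper: upload one part and let S3 verify it against the digest.
upload_part() {
  local bucket="$1" key="$2" upload_id="$3" part_number="$4" file="$5"
  aws s3api upload-part \
    --bucket "$bucket" --key "$key" --upload-id "$upload_id" \
    --part-number "$part_number" --body "$file" \
    --content-md5 "$(content_md5 "$file")"
}
```

If the digest does not match the data S3 receives, the part is rejected, so a corrupted transfer fails immediately rather than at the end of the upload.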
The output is written to a log file.
Make sure to save that log file; you'll need the `ETag` values later on.
An example of the script's output:
{
"ETag": "\"e868e0f4719e394144ef36531ee6824c\""
}
The script also writes the output to a second file, formatted to match what the `complete-multipart-upload` command requires.
Example of that formatted file, ready to be passed to `complete-multipart-upload`:
{
"Parts": [
{
"ETag": "e868e0f4719e394144ef36531ee6824c",
"PartNumber": 1
},
{
"ETag": "6bb2b12753d66fe86da4998aa33fffb0",
"PartNumber": 2
},
{
"ETag": "d0a0112e841abec9c9ec83406f0159c8",
"PartNumber": 3
}
]
}
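If you ever need to rebuild that file by hand from the logged `ETag` values, something like the following could work. It is a sketch, not the script's own formatting code: `build_parts_json` is a hypothetical name, the ETags must be passed in part order, and the surrounding quotes from the log are stripped to match the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Parts JSON document for the given ETags.
# Arguments are ETags in part order; part numbers start at 1.
build_parts_json() {
  local n=0 sep="" etag clean
  printf '{"Parts": ['
  for etag in "$@"; do
    n=$((n + 1))
    # Drop the literal double quotes that surround ETags in the log output.
    clean="$(printf '%s' "$etag" | tr -d '"')"
    printf '%s{"ETag": "%s", "PartNumber": %d}' "$sep" "$clean" "$n"
    sep=", "
  done
  printf ']}\n'
}
```

The result is printed on a single line, which is still valid JSON; redirect it to a file (for example `build_parts_json "$etag1" "$etag2" > parts.json`, with placeholder names) and pass that file to `complete-multipart-upload`.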
For more information about the Linux `split` command, see its manual page (`man split`).
`bucket` = Your S3 bucket name.
`profile` = Your AWS CLI profile (e.g. one created with `aws configure --profile tests3`).
`upload_id` = Your upload ID, returned when you run `create-multipart-upload`.
`/home/lucas/aws-upload-test/files/x/` = The directory on your disk that contains the split files.
`key` = Object key for which the multipart upload has been initiated.
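As an illustration only, the variables above might be filled in like this before running the script. The values are the example values used in this README, `parts_dir` is a made-up name for the directory setting, and `require_set` is a hypothetical guard that is not part of the script.

```shell
#!/usr/bin/env bash
# Example values only; replace them with your own.
bucket="my-bucket"
profile="tests3"
key="multipart-1"
upload_id=""   # paste the UploadId returned by create-multipart-upload
parts_dir="/home/lucas/aws-upload-test/files/x/"   # hypothetical variable name

# Hypothetical guard: fail with a message when a setting was left empty.
# Usage: require_set NAME VALUE
require_set() {
  if [ -z "$2" ]; then
    echo "variable '$1' is empty" >&2
    return 1
  fi
}
```

Calling `require_set upload_id "$upload_id" || exit 1` before the upload loop would stop the run early if the upload ID was never filled in.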