Deleting many files from an S3 bucket

So we found ourselves needing to delete a considerable number of files (around 500000, amounting to 1.6T) from an S3 bucket. With the list of files in hand, my first shot was calling

aws s3 rm s3://BUCKET/FILE

for each file. That wasn’t the best idea, I have to say: it makes 500000 requests, one per file, and it takes a looong time. And this command does not accept a list of files in a single call.

Fortunately there is aws s3api delete-objects, which takes JSON input and can delete multiple files in a single request (up to 1000 keys per call):

aws s3api delete-objects --bucket BUCKET --delete '{"Objects": [ { "Key" : "FILE1" }, { "Key" : "FILE2"} ... ]}'
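For longer payloads, quoting the JSON inline gets awkward. A minimal sketch, assuming you prefer a payload file: the AWS CLI can load a parameter value from a file via the file:// prefix, and the Delete structure also accepts "Quiet": true so that the response lists only failures. The temporary file is created here for illustration, FILE1, FILE2 and BUCKET are the placeholders from above, and the actual call is left commented out.

```shell
# Write the delete payload to a temporary file instead of inlining it.
payload_file=$(mktemp)
cat > "$payload_file" <<'EOF'
{
  "Objects": [
    { "Key": "FILE1" },
    { "Key": "FILE2" }
  ],
  "Quiet": true
}
EOF

# BUCKET is a placeholder; uncomment to actually delete the listed keys.
# aws s3api delete-objects --bucket BUCKET --delete "file://$payload_file"
```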

That did help, and with a bit of magic from bash (mapfile, which can read lines from stdin in batches) and jq, in the end it was a matter of some 20 minutes:

while mapfile -t -n 500 ary && ((${#ary[@]})); do
        # build the JSON payload for this batch of up to 500 keys
        objdef=$(printf '%s\n' "${ary[@]}" | jq -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
        aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done < files-to-be-deleted

This reads 500 file names at a time and uses jq to reformat them into the proper JSON: reduce inputs is a jq filter that iterates over the input lines and folds them into a single value. In this case we start with an empty array and append one Key object per file name as we go. Finally, the whole batch is sent to AWS with the API call above.
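To see what the loop would send before deleting anything, a dry run along these lines may help. It is only a sketch: the five sample keys and the batch size of 2 are made up for illustration, and the payload is printed instead of being passed to aws.

```shell
#!/usr/bin/env bash
# Dry run: batch the keys and print each payload instead of calling AWS.
keys=$(printf 'key-%d\n' 1 2 3 4 5)   # hypothetical sample keys

batches=0
while mapfile -t -n 2 ary && ((${#ary[@]})); do
    # same jq filter as in the real loop, with -c for compact output
    printf '%s\n' "${ary[@]}" |
        jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key": $line}]))}'
    batches=$((batches + 1))
done <<< "$keys"
echo "batches: $batches"
```

With these five keys the loop runs three times, producing two payloads with two keys each and a final one with the single remaining key.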

Puuuh, 500000 files and 1.6T less, in 20min.
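As a rough sanity check on why batching pays off, the number of requests can be worked out with shell arithmetic. The figures are the ones from this post; the second result assumes batches filled up to the limit of 1000 keys per delete-objects call.

```shell
total=500000   # number of files, from the post
batch=500      # keys per delete-objects call, as in the loop above
limit=1000     # maximum keys per delete-objects call

requests=$(( (total + batch - 1) / batch ))       # ceiling division
min_requests=$(( (total + limit - 1) / limit ))
echo "$requests requests with batches of $batch, instead of $total"
echo "$min_requests requests with batches of $limit"
```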
