Wrapping S3cmd in a script for files backup to Amazon S3

After experimenting with S3cmd, I immediately started automating my custom backup process. It was quick to get working, and easy to improve as I went, thanks to Python!

Calling S3cmd from Python

The trick is to use Python's subprocess module to invoke the command as you would do in the Unix/Linux shell, via the call function.

To get a quick idea of how this works, you can play interactively with the Python interpreter, using the ls command for example:

$ python
Python 2.7.11 (default, Dec  5 2015, 14:44:53) 
...
>>> import subprocess
>>> CMD = "ls -la"
>>> subprocess.call(CMD, shell=True)
...
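One useful detail: call returns the command's exit code, so the script can check whether the command actually succeeded before moving on. A small sketch of that idea, using ls again:

```python
import subprocess

# subprocess.call runs the command through the shell and returns
# its exit code: 0 means success, non-zero means failure.
ret = subprocess.call("ls -la", shell=True)
if ret != 0:
    print("Command failed with exit code %d" % ret)
```

The same check can be applied to the s3cmd invocations later in the script, to detect a failed upload.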

First, make it work with a simple case

In the first version of the script, I am targeting the screenshots I create every now and then, as shown in my previous article.

Here is the resulting code:

#!/usr/bin/env python

import sys
import subprocess

BUCKET = "s3://contentgardening.com-backups"

def main(argv):
    """ Main function """

    SRC_DIR = "/MYSCREENSHOTS"
    DEST = BUCKET + "/images/screenshots/"

    CMD = "s3cmd put --force %s/*.* %s" % (SRC_DIR, DEST)
    subprocess.call(CMD, shell=True)

if __name__ == "__main__":
    main(sys.argv[1:])

Note that the s3cmd executable is referenced just by its name here, for readability, but in practice you should provide its full path. Even better, put the path in a variable, as is done for the destination S3 bucket.
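For example, the path can live in a variable next to the bucket name. The path below is just an illustration; adjust it to wherever s3cmd is installed on your system:

```python
# Keep the executable path and the bucket in variables,
# so they are easy to change in one place.
S3CMD = "/usr/local/bin/s3cmd"  # example path; adjust to your system
BUCKET = "s3://contentgardening.com-backups"

SRC_DIR = "/MYSCREENSHOTS"
DEST = BUCKET + "/images/screenshots/"

CMD = "%s put --force %s/*.* %s" % (S3CMD, SRC_DIR, DEST)
print(CMD)
```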

To run it, we simply do:

$ python backup-to-s3--v1.py

The improved and extended version

I later extended the script to take care of different cases, such as backing up software archives which are located under a specific path on my computer.

The first change needed is a "for loop" that goes through all the cases and, for each one, calls the underlying tool with the right arguments. I used a mapping (a Python dictionary) to provide the parameters for each case: mainly the source directory and the "virtual path" to use in the destination S3 bucket.

I ended up with the following code:

#!/usr/bin/env python

import sys
import subprocess

BUCKET = "s3://contentgardening.com-backups"

FILES_MAPPING = {
    'software': ["/MYSOFTWAREArchivesAndImages", "/software/"],
    'screenshots': ["/MYSCREENSHOTS", "/images/screenshots/"],
    # add other cases here...
}


def main(argv):
    """ """

    CMD = ""
    LSCMD = ""
    backup_types = FILES_MAPPING.keys()

    for backup in backup_types:
        SRC_DIR = FILES_MAPPING[backup][0]
        DEST = BUCKET + FILES_MAPPING[backup][1]

        CMD = "s3cmd put --force %s/*.* %s" % (SRC_DIR, DEST)

        subprocess.call(CMD, shell=True)

        # Print feedback to know the current files that are backed up
        LSCMD = "s3cmd ls %s" % DEST
        out = subprocess.check_output(LSCMD, shell=True)
        print(out)

if __name__ == "__main__":
    main(sys.argv[1:])

As you can see, there is a second part where s3cmd ls is called to list the current file objects under the S3 destination, which gives a kind of feedback after the files are uploaded.

This is already a good start, and I could make it run regularly via crontab.
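For example, a crontab entry along these lines (the schedule and paths here are illustrative) would run the backup every night at 2am:

```
0 2 * * * /usr/bin/python /path/to/backup-to-s3.py
```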

Other things to add

There are several features that could be added in the future, such as logging the feedback information or some statistics, building a summary report, and even mailing that report to an inbox or pushing it to an app for later analysis.

As I continue improving this, I will share more things that Python scripting allows us to do.

