
How to Backup WordPress on a Remote Server (and send it to Amazon S3)

Let's talk about backups.

When you are running your own WordPress hosting service, you need to ensure that your sites are backed up nightly. Unfortunately, the more sites you have on your server, the more processing power is required to do these backups, especially if there have been a lot of changes during the day.

The smartest way to proceed with nightly backups, then, is to offload the processing requirements to a separate, dedicated backup server, or just a secondary server that can afford to use a lot of its CPU in zipping, rsyncing, and uploading to the Amazon cloud.

In this tutorial, we're going to build a bash script that will:

  1. Connect to a remote server
  2. Create a list of all your WP clients
  3. Loop through each client and sync changes to a local directory, both files and DB
  4. Zip the entire site
  5. Upload that site to Amazon S3

Ready to get started? Good!

The Setup #

Throughout this tutorial I'll be referring to two servers, Mercury and Venus, which play the two roles in this dedicated backup box scenario.

Venus will be the box that faces the internet. This should have a filesystem that contains all your WordPress installs. For example, you might have something like:

/var/www/client_a.yoursite.com/htdocs/ (WordPress files here)

and

/var/www/lawsonry.yoursite.com/htdocs/ (WordPress files here)

You can obviously keep your files wherever you want; just be sure to change the paths in the code in this tutorial before you plug it into your server.

Mercury will be the server that does all the work. The steps involved in Mercury becoming a dedicated backup box to back up WordPress to S3 are outlined at the top of this article, and are exactly what we're going to go through now.

Please note that I am using s3cmd to communicate with Amazon AWS (S3). You can follow this simple setup tutorial here if you aren't using it yet (you should be!).
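In case you haven't wired up s3cmd on Mercury yet, the one-time setup and a quick sanity check look roughly like this (the bucket name is just an example):

# One-time interactive setup: paste in your AWS access key and secret key when prompted
s3cmd --configure

# Confirm you can reach S3, then create the bucket you'll push backups to if it doesn't exist yet
s3cmd ls
s3cmd mb s3://yoursite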

So remember: Venus has the web folders that are accessed when your sites are hit, and Mercury is the second server that will do all the work.

Connect to remote server and get a list of clients #

This script will be a bash script existing on Mercury. The first part involves Mercury talking to Venus and getting a list of clients. I accomplish this by listing all the folder names in a specific web directory (the /var/www directory, in my case) and then extracting the name of the client from there.

For example, I have clients named lawsonry, thebobs, and thor. Their web folders will be in the following locations (with the following naming scheme):

/var/www/lawsonry.mysite.com
/var/www/thebobs.mysite.com
/var/www/thor.mysite.com

Inside each of these, I have two folders: htdocs, where the web files are stored, and log, where I keep log files. It's also important to note that I keep the wp-config.php file in the main folder (i.e., in lawsonry.mysite.com, not in the same folder as the web files, htdocs).
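To put that in picture form, a single client folder in my scheme looks like this:

/var/www/lawsonry.mysite.com/
    htdocs/          # WordPress core, wp-content, etc.
    log/             # server log files
    wp-config.php    # kept one level above htdocs on purpose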

You can set up your naming scheme however you want, but this is how I do mine, so that's what you'll see addressed below.

#!/bin/bash

# This automated backup script is designed for use with the DashingWP file system.
# Author: Jesse Lawson
# Note: This backup system only backs up Venus server clients.

echo -e "Beginning backup of Venus server..."

Get a list of all your clients #

The first thing we'll do is contact the remote server and get a list of all the directories in /var/www. We do this by calling ls /var/www over ssh, then looping through the results and checking that each directory name ends with ".yoursite.com", which means it's a hosted site and requires backups. (This naming convention also lets us keep other folders in the /var/www directory without having them processed for backups, in case we want that.)

# Create an array to store the client folders

list_of_clients=()

# Get the list of directories from Venus

for item in $(ssh root@venus_ip 'ls /var/www')
do
  # Check if item is a client site
  if [[ $item = *.yoursite.com ]]; then
    echo -e "Found client \"$item\""
    list_of_clients+=("$item")
  fi
done

Note that I wrote venus_ip up there. You'll have to replace that with the IP address of your equivalent of the Venus server, preferably using a private network IP and not a public one (for security reasons). Example: 10.0.0.2

In the code above, you can see that we list the contents of /var/www and then loop through them, checking each one to see if its name matches the *.yoursite.com scheme. If it does, we push it into our array.
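One thing this script quietly assumes is that Mercury can ssh into Venus as root without a password prompt; otherwise every ssh and rsync call will stall waiting for input when cron runs it. If you haven't set that up yet, key-based authentication on Mercury looks roughly like this:

# Generate a key pair on Mercury (leave the passphrase empty for unattended use)
ssh-keygen -t rsa

# Copy the public key over to Venus
ssh-copy-id root@venus_ip

# Test it: this should log you into Venus without asking for a password
ssh root@venus_ip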

Loop through clients and start prepping variables #

Next, we'll loop through the results and pull out the relevant data (i.e., the site name), and also preconfigure some variables we'll be using later.

# We are calling a separate loop because I want to clearly differentiate between ssh calls

echo -e "Found $counter clients."
echo -e "Filtering out clients to new array..."

for client in ${list_of_clients[@]};
do
  # Set internal name for backups. Cut the string delimited by periods (-d .) and only retrieve the first part (-f 1)
  internal_name=$(echo $client | cut -d . -f 1)

  # Create a var that holds the current day
  NOW=$(date +"%d")

  # Create a filename that has the name of the site + the day of the month in it
  FILE="$internal_name.daily_$NOW.zip"

  # Create a placeholder for the directory where we'll store the zip file
  BACKUP_DIR="/var/www-backups/$internal_name"

  # Create a placeholder for the directory where the site actually is on Venus
  WWW_DIR="/var/www/$internal_name.yoursite.com/"

As you can see, we loop through our clients (the list_of_clients array) and, for each one, pull out the name of the site (calling it internal_name) and set up some variables. The date variables are probably the most confusing here, but they only take a second to explain.
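As a concrete example, if the client folder being processed is lawsonry.yoursite.com and today is the 15th, those variables end up holding:

internal_name="lawsonry"
NOW="15"
FILE="lawsonry.daily_15.zip"
BACKUP_DIR="/var/www-backups/lawsonry"
WWW_DIR="/var/www/lawsonry.yoursite.com/"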

Amazon AWS will house as much data as you want, and frankly, that's both a good thing and a bad thing. When I first designed this, I just put a current date timestamp at the end of the file (something like lawsonry_backup_20140115.zip for a January 15 backup of Lawsonry) and then uploaded it to the S3 servers. What ended up happening was that I had as many backups as the number of days my site had been online; at one point I realized that I would soon have hundreds and hundreds of backups!

Instead of writing a system to go back and delete old backups, I figured it would be easier to just name backups based on the day of the month. So if today is the 15th, tomorrow's backup (for the 16th) will be the most current snapshot of your site, while the file for the 17th will still hold the backup from the 17th of last month until it gets overwritten. Get it?

So during every backup cycle, we simply name the backup file according to the current day of the month. Eventually we'll start overwriting files, but those files will already be 30 days old by the time they're overwritten.

As far as your clients are concerned, this method ensures that you're keeping 30 days' worth of daily backups.

At the end of the code block above, you can see that I'm setting up a temporary folder, /var/www-backups, where I store a copy of the zip file. You can keep a local copy of the zip file in this folder (in case Venus goes down and you lose access to S3 for some reason), or you can get rid of it to save space (more on this later in the tutorial).

Rsync from your web server to your backup server #

Let's keep moving along. Next, we'll use rsync to marry up the contents of the web directories on Venus and Mercury.

echo -e "Backing up $client"
echo -e "--> Pulling directory from Venus..."

rsync -avzhe ssh --exclude='*wp-snapshots*' root@venus_ip:/var/www/$client /var/www

A couple of things to note right off the bat: venus_ip is again a placeholder for your Venus server's IP address, and the --exclude flag keeps any wp-snapshots directories from being copied over.

Now that we've rsync'd the hosted site over to Mercury, a "fresh" copy of the site exists there. This means that the bulk of the CPU load on Venus during backups is incurred during the initial rsync (and any future rsyncs where there is a lot of data to exchange). Assuming everyone installs and adds all the files they're going to add on day one, Venus should only experience a high CPU load during rsync while transferring the data to Mercury for the first time. After that, rsync will only transfer the changed files, making future rsyncs for each client site very fast and taking the CPU load off of Venus very quickly.
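If you want to see what rsync is going to move before the script runs for real, you can do a manual dry run of the same command by adding -n (venus_ip and the client folder are placeholders):

# Dry run: lists what would be transferred without copying anything
rsync -avzhne ssh --exclude='*wp-snapshots*' root@venus_ip:/var/www/lawsonry.yoursite.com /var/www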

Do a remote mysqldump #

After this rsync, the only other load Venus will incur is a mysqldump. Let's do that now.

echo -e "Remotely dumping $client database..."

# Navigate to folder and back up database to a file in wp-content

cd /var/www/$client

# Back up database to wp-content folder

# Extract db variables from config file

echo -e "Extracting database credentials..."

# Pull the credentials out of the define() lines in wp-config.php
DB_NAME=$(grep DB_NAME wp-config.php | cut -d \' -f 4)
DB_USER=$(grep DB_USER wp-config.php | cut -d \' -f 4)
DB_PASS=$(grep DB_PASSWORD wp-config.php | cut -d \' -f 4)
DB_FILE="mysql.sql"

echo -e "Creating temporary backup directory..."

# mkdir if it doesn't exist. It's only temporary

mkdir -p $BACKUP_DIR

echo -e "Commencing remote mysqldump... "

# Dump database to mysql.sql in wp-content folder

ssh root@venus_ip "mysqldump -u$DB_USER -p$DB_PASS $DB_NAME" > /var/www/$client/htdocs/wp-content/mysql.sql

The comments and echoed updates should help to explain this. Basically, we're using grep to look in the wp-config.php file in order to extract the database credentials. From there, we ssh into Venus and remote mysqldump the database into a local file on Mercury.
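The first few times you run this, it's worth sanity-checking the dump, since a wrong password or a missing database will leave you with an empty or near-empty file. Something like this is enough:

# The dump should be non-trivial in size and contain CREATE TABLE statements
ls -lh /var/www/$client/htdocs/wp-content/mysql.sql
grep -c "CREATE TABLE" /var/www/$client/htdocs/wp-content/mysql.sql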

Zip our latest snapshot and push it to S3 #

So we have the files and we have the database; now let's just move the wp-config file to a location that's easily accessible for when we have to restore from this backup. Note that this is only necessary because I put my wp-config.php files outside of the htdocs folder on purpose. You can skip this if you keep your wp-config.php file in your WP root folder.

echo -e "Copying wp-config to wp-content directory… "

# Copy over the wp-config.php file so that we can just zip the htdocs folder

cp wp-config.php htdocs/wp-config.php

Now that our local (Mercury) files are synced up to be the most current snapshot of the client sites on Venus, we can zip the contents without putting a load on Venus. The final steps involved are zipping our local clients and then pushing them up to S3.

# Go into htdocs so the zip goes straight to the directory

cd /var/www/$client/htdocs

echo -e "Updating $client's snapshot (daily.zip)..."

# The --filesync arg synchronizes our rsync'd directory with a 'daily.zip' file that exists at
# /var/www-backups/internal_name/daily.zip. filesync is explained on this page: http://www.info-zip.org/mans/zip.html
# Basically, it checks for changes in our local (Mercury) copy of the site, and then only modifies
# the daily.zip to account for changes in that local copy.

zip -9 -r --filesync --quiet $BACKUP_DIR/daily.zip .

# Now that we've updated our daily snapshot, we need to copy it to a daily_# file so we can upload it.

echo -e "Duplicating snapshot to $FILE..."

cp $BACKUP_DIR/daily.zip $BACKUP_DIR/$FILE

# Push to S3

echo -e "Pushing $FILE to Amazon S3..."
s3cmd put $BACKUP_DIR/$FILE s3://yoursite/snapshots/$internal_name/$FILE

# Remove the duplicated daily_# zip now that it's been pushed to S3

echo -e "Removing $FILE..."

rm -rfv $BACKUP_DIR/$FILE

Still with me? Good!

Only the uploaded daily_# zip gets removed; the daily.zip snapshot in /var/www-backups and the rsync'd copy of the site in /var/www stay on Mercury. All that's left is to close out the loop:

echo -e "$internal_name backup is complete."

# Write to backup log file
# TIMESTAMP=$(date +'%Y-%m-%d %H:%M:%S|%N')
# echo "Daily #$NOW backup for $internal_name completed at $TIMESTAMP\n" >> /var/www-backups/log/daily-backups.log

done

echo -e "Finished backup of Venus server."

Plug it into cron #

Edit your crontab with crontab -e and set this script to execute every night.
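As a sketch, assuming the script lives at /usr/local/bin/backup-venus.sh, a line like this runs it at 2:00 a.m. every night and appends its output to a log:

0 2 * * * /usr/local/bin/backup-venus.sh >> /var/log/backup-venus.log 2>&1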

And that's it! #

Congratulations! You just set up a system to remotely back up your web server and push the backups to S3, all while keeping a daily zip of the most current snapshot and a copy of the web directory on your backup server, just in case your web server fails (and you have to turn your backup server into a web server). High availability, anyone?
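And if you ever need to pull one of those snapshots back down, it's just the reverse trip (the bucket, client, and day here are examples):

# Grab a snapshot from S3 and unpack it
s3cmd get s3://yoursite/snapshots/lawsonry/lawsonry.daily_15.zip
unzip lawsonry.daily_15.zip -d lawsonry_restore

# The database dump is in wp-content/mysql.sql, and wp-config.php sits at the top of the archive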

Leave your questions and comments below and I'll do what I can to help get you all set up. Cheers!