Creating an S3 mirror to battle-proof the charity Reprieve's Django-based site, just in time for a very public endorsement by Stephen Fry to his 1.25 million Twitter followers.

Reprieve found themselves in a situation where they needed to scale up their site in a hurry. Stephen Fry had agreed to tweet a link to an article about one of their client cases to his 1.25 million followers, and they'd been warned that the resulting traffic spike was likely to take down their site on a modest shared hosting plan.

We hadn't built the code for the site in question. In fact, we hadn't even seen it when we agreed to help - all we knew was that it was built in Django. On top of this, we had just a couple of evenings to come up with something. With an unfamiliar app we didn't feel there was time to move the site to a new server and confidently test it. We were pretty much stuck with what we had, so we needed a different kind of fix.

Once we got a look at the code we found nothing disastrous about it: a bespoke CMS built with Django. It became apparent that whilst the site was CMS-driven, once the articles and pages had been created they remained static, and very little else was dynamic by necessity (really just the search).

The idea we came up with was to create a temporary static HTML mirror of the generated pages, serve it from Amazon's S3 cloud, and link to this mirror from the tweet. For the couple of places where static content wouldn't suffice, we would link back to the dynamic site as needed. We knew that the vast majority of users would never leave the S3 version, and the original site should stand up fine for the tiny fraction who did. Even in the worst case, if that led to problems for some people, the S3 mirror - and consequently Reprieve's important message - would still function perfectly for everyone else.

These are the steps we took to do this:

  1. We created a Reprieve S3 bucket, and set up a static DNS name to point at it. As the link would be coming from a tweet it would be shortened to a bit.ly link anyway, so strictly speaking this may not even have been necessary, but we still felt it was worthwhile. To point a DNS name at an S3 bucket, create the bucket with the same name as the DNS name you wish to point at it, e.g.:

    Create a bucket called static.yoursite.com

    Create a CNAME called static.yoursite.com to point at this S3 bucket; in my tinydns config it looks like this:

    Cstatic.yoursite.com:static.yoursite.com.s3.amazonaws.com

  2. Once we were happy with the site content, knowing it would remain static until after the tweet, we took a rip of the site using the httrack utility. Install httrack on Ubuntu/Debian using apt, or grab a binary for Windows/OSX/your distro.

    sudo aptitude install httrack

  3. Next, we ran a linkchecker scan on the new, static site to see if there were any broken links.

    sudo aptitude install linkchecker

    We anticipated lots of searching and replacing at this stage, but there were merely a handful of places which needed amends (a couple of hardcoded full paths). I was surprised at how well httrack worked.

  4. Finally, we uploaded this static version of the site to S3, and ran the linkchecker against the new static URL to check that all was working as expected, and we were ready. We had a temporary static mirror of the entire Reprieve site up on S3, ready to withstand any sort of hammering.
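For anyone not running tinydns, the CNAME from step 1 translates directly to other DNS servers. In BIND zone-file syntax, the equivalent record (using the same hypothetical names) would look something like this:

```
static.yoursite.com.  IN  CNAME  static.yoursite.com.s3.amazonaws.com.
```

The key detail is the same in either syntax: the bucket name must exactly match the hostname, as S3 uses the Host header to pick the bucket.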
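The hardcoded full paths found in step 3 are the kind of thing a simple search-and-replace over the rip fixes. A minimal sketch of that, using the same hypothetical hostnames as step 1 (the real domains and the exact amends we made aren't shown here):

```python
from pathlib import Path

# Hypothetical hostnames, matching the example bucket name from step 1.
OLD_HOST = "http://www.yoursite.com"
NEW_HOST = "http://static.yoursite.com"

def rewrite_hardcoded_paths(html: str) -> str:
    """Point hardcoded absolute URLs at the static mirror instead."""
    return html.replace(OLD_HOST, NEW_HOST)

def fix_mirror(root: str) -> int:
    """Rewrite every mirrored HTML file in place; returns the count changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8", errors="ignore")
        fixed = rewrite_hardcoded_paths(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed
```

Re-running linkchecker after a pass like this confirms nothing was missed.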
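The upload in step 4 can be done with any S3 client (s3cmd, boto, or the web console). Whichever tool you use, each file needs an S3 key derived from its path and an explicit Content-Type, since S3 serves back exactly the type you set at upload time. A stdlib-only sketch of that mapping (the function name is ours, not from any particular tool):

```python
import mimetypes
from pathlib import Path

def s3_objects(mirror_root: str):
    """Yield (key, content_type) for each file in the httrack rip,
    as it should be uploaded to S3."""
    root = Path(mirror_root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # S3 keys use forward slashes regardless of local OS.
            key = path.relative_to(root).as_posix()
            ctype, _ = mimetypes.guess_type(path.name)
            yield key, ctype or "application/octet-stream"
```

Getting the Content-Type right matters: an HTML page uploaded without it will be offered as a download rather than rendered by the browser.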

The tweet landed, and nearly 19,000 people hit the site. As expected, the S3 version of the site didn't so much as flinch, and the dynamic version of the site also stood tall.

Obviously this sort of solution is no replacement for building a ready-to-scale site from the start, with respect to both the code and the infrastructure it's hosted on, but as a quick fix it served us very well.

Update: It seems this sort of serve-a-static-rip approach could be handled nicely by the static generator project.
