Traffic to Oxfam’s website surged by 400% within 12 hours of the Haiti earthquake. For the Pakistan floods, the peak was lower but it lasted longer. It’s great that the public wants to support the victims of disaster, but it also creates problems for IT. How do you build an infrastructure to cope with such surges?
Buying lots of servers and keeping three-quarters of them idle isn’t an option for most of us. It certainly isn’t for Oxfam – it wants to put its resources into the field, not into datacentres. So historically it has provisioned its systems to handle typical demand, then degraded the website’s functionality during surges. For Haiti, the site was reduced to a single page for accepting donations. This worked, in that Oxfam could keep taking donations. But it couldn’t begin to build an ongoing relationship with new supporters.
Could the “cloud” help? After 12 months of exploring the options and migrating part of the site into the cloud, here’s what we’ve learned:
• Cloud is about economics. Most of the technical underpinnings for the cloud have been around for years. There are still some rough edges, but the technology broadly works. The biggest challenge we faced was understanding the economics of “elastic” capacity – how does the overall cost of service change under different workload scenarios? This is also the biggest opportunity: with the right model, Oxfam can both reduce the cost of normal operations and readily ramp up capacity in response to surges. It took a lot of scenario planning for us to build the right economic model for Oxfam’s operations.
• Standards are still emerging. The pundits compare cloud to the electricity grid. That’s true, but only if you’re looking at the 19th-century electricity grid. Some vendors run AC and some run DC. They all run different voltages. The price differs depending on whether you’re running kettles or light bulbs. Comparing like-for-like in order to build costing scenarios is hard work.
• The market is immature. Without standards, there’s no clear basis for comparing vendors. Oxfam specified its capacity requirements in technology-neutral terms, yet every vendor had to turn that specification into a detailed technical design before they could provide a price. And to compare their pricing, we had to work from the technical specifications – no-one would directly assure specific levels of capacity without also specifying the technology. When was the last time your electricity supplier discussed turbine design with you? That’s where the cloud market is.
• Manageability matters as much as capacity. The cost of service depends as much on the number of virtual machines as it does on the overall amount of computational capacity. That’s fair enough – each VM needs to be managed. But it means that you can’t cost the infrastructure without having a pretty good idea of the shape of your application portfolio, and of how it might change over time.
• Application licensing constrains deployment. Many web applications are designed to scale horizontally – they’re best deployed as a swarm of small VMs. This works beautifully in the cloud. Yet their vendors still license them on a per-server basis, killing the economics. For now, we’re deploying a small number of larger VMs and hoping the software vendors will eventually engage with the cloud.
• Don’t get hung up on a name. Given the importance of handling demand surges, we expected a public cloud model to make most sense for Oxfam. In practice, all the solutions we looked at were based on a hybrid model, some weighted towards public cloud and some towards private cloud. We agonised over whether this mattered. In the end, it came back to economics: once a solution met the basic requirements for capacity, availability, security, etc., what would it cost to operate? The label it carries doesn’t really matter.
• Cloud opens up other issues. Cloud makes costs much more transparent. You can see what you’re paying for storage, for network transfers, for computational capacity for each application, etc. This brings issues like archiving and retention policies into sharper focus. This is all positive – it will help Oxfam manage its systems more effectively – but it means that moving to the cloud is only the start of the journey.
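To make the economics in the points above concrete, here is a minimal sketch of a scenario costing model in Python. It is not Oxfam’s actual model: every price, the demand profile and all the names (`Pricing`, `hourly_cost`, `scenario_cost`) are assumptions invented for illustration. It compares provisioning for the peak with elastic capacity, and shows how per-VM management and per-server licence charges shift the balance between a swarm of small VMs and a few larger ones.

```python
import math
from dataclasses import dataclass


@dataclass
class Pricing:
    # All figures are hypothetical, chosen only to illustrate the model.
    capacity_unit_hour: float = 0.10   # one unit of compute for one hour
    vm_management_hour: float = 0.02   # management overhead per VM per hour
    licence_per_vm_hour: float = 0.05  # per-server application licence per hour


def hourly_cost(capacity_units: int, units_per_vm: int, p: Pricing) -> float:
    """Cost of one hour of `capacity_units`, split into VMs of a given size."""
    vms = math.ceil(capacity_units / units_per_vm)
    per_vm = p.vm_management_hour + p.licence_per_vm_hour
    return capacity_units * p.capacity_unit_hour + vms * per_vm


def scenario_cost(demand_by_hour, units_per_vm, p, fixed_capacity=None):
    """Total cost over a demand profile.

    fixed_capacity=None models elastic capacity (provision each hour's demand);
    otherwise the same fixed capacity is paid for every hour.
    """
    total = 0.0
    for demand in demand_by_hour:
        provisioned = demand if fixed_capacity is None else fixed_capacity
        total += hourly_cost(provisioned, units_per_vm, p)
    return total


# Hypothetical month: 8 units of baseline demand, with a 48-hour surge to 40.
demand = [8] * 336 + [40] * 48 + [8] * 336
prices = Pricing()

fixed_for_peak = scenario_cost(demand, 1, prices, fixed_capacity=max(demand))
elastic_small_vms = scenario_cost(demand, 1, prices)  # many 1-unit VMs
elastic_large_vms = scenario_cost(demand, 4, prices)  # fewer 4-unit VMs

print(f"fixed, sized for peak: {fixed_for_peak:.2f}")
print(f"elastic, small VMs:    {elastic_small_vms:.2f}")
print(f"elastic, large VMs:    {elastic_large_vms:.2f}")
```

With these made-up numbers, elastic capacity costs far less than sizing for the peak, and the per-VM charges favour fewer, larger VMs. Real vendor pricing and a realistic demand profile would have to be substituted before drawing any conclusion.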
So far, the cloud is working for Oxfam. The first sites are live and more are moving there. We don’t want to test it fully – we don’t want another Haiti. But another disaster is going to happen, somewhere in the world, sometime. Oxfam was able to respond quickly and effectively last time. Next time the cloud may enable it to do even better.