The Cloud doesn’t float on its own.
This past weekend, AWS suffered a huge outage at its East Coast data center in Virginia, which caused a multitude of sites, such as Instagram and Pinterest, to go down or become unresponsive. The underlying problem is known as a single point of failure: because one data center lost power, any site that relied on it alone went down with it. With all of these major sites going down, I think this is one of the best times to explain load balancing to clients, at a very abstract level.
To put it in easy-to-understand terms, think of old-time Christmas lights: as long as all of the bulbs are intact and working, the entire strand works perfectly. If you remove one bulb, or if one breaks, the rest of the strand goes dark and you’re stuck with a pretty sad-looking tree. This is known as a “series” wiring scheme, and it is very susceptible to a single point of failure. Most servers hosted outside of a cloud setup can fall victim to this kind of failure.
Some of you may say to yourselves, “But I have strands that don’t go out when a single light breaks!” This is true. Those strands use a “parallel” wiring scheme, in which the lights are still connected, but a secondary wire keeps the rest of the strand powered if one bulb fails. Many cloud hosting providers offer this secondary “wire” for your servers in the cloud, although most of the time this wire is never set up.
Amazon offers something along these lines, called Elastic Load Balancing, which automatically routes your users to an available, healthy instance. It should be a no-brainer to get a segmented server solution, yet it seems many clients choose not to spend the extra money. Perhaps by explaining to our clients in simple terms exactly what happened and how to prevent it, we can all avoid the headaches that these outages cause.
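For the more technically inclined, here is a rough sketch of the idea in Python. It is a toy model, not Amazon's implementation or the Elastic Load Balancing API: the Server and LoadBalancer classes and the server names are made up for illustration. It only shows the principle that requests rotate between servers and skip any server that is marked as down.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str             # hypothetical label, e.g. a region or instance name
    healthy: bool = True  # in real life a health check would set this


class LoadBalancer:
    """Toy round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the server to try first on the next request

    def route(self):
        # Try each server at most once, starting where the last request left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.healthy:
                return server.name
        raise RuntimeError("no healthy servers available")


east = Server("us-east")
west = Server("us-west")
balancer = LoadBalancer([east, west])

# Both servers are up: requests alternate between them.
print([balancer.route() for _ in range(4)])

# Simulate the east coast outage: every request now lands on the west server.
east.healthy = False
print([balancer.route() for _ in range(3)])
```

With only one server in the list, the same outage would leave nothing to route to, which is the series strand of lights all over again.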