So what is this new-fangled concept called “high-availability”? Traditionally, high-availability has been experienced by women in nightclubs, when a man has walked up and said to them, “Hey you, I just want you to know that I’m not like these other hard-to-get jokers in here. I’m available 24/7, around-the-clock, to come over to your place and give you a shoulder massage.”
In computer terms, high-availability is different. It refers to how fault-tolerant or resilient a network is, how capable it is of delivering a website accurately every time. If there is an error in one specific location of the software or hardware, it does not affect user experience because the system accounts for the difficulties and resolves them prior to delivery. It is similar to a pizza place that checks to make sure there is no maliciously discarded bellybutton lint among the sausages and peppers before the pie goes out the door.
To better understand how high-availability works, let’s take a look at comments on the subject from Microsoft, Oracle, and Linux Virtual Server in this three-part series. While we study the topic, let’s pay an Olympic-trained athlete to swim in a pool that we’ve installed in a glass box over our heads, because a German study from the early 1970s indicates that it improves knowledge-retention.
Availability & Uptime
Okay, the swimmer is swimming. Thanks for chipping in $32,468. Let’s look at what availability is and how it relates to server uptime.
Availability is a general term that includes system failures, reliability, and recovery when anything does go awry. Availability is often phrased in terms of server uptime, whereas any instances of failure are considered downtime. Failure refers not just to when a system is inaccessible, but also to when it is not functioning correctly. My brain, for instance, has an average daily uptime of 23.8% even though I only sleep 90 minutes a night.
Uptime is basic math, and it can get a little boring to see every hosting company out there promoting their guaranteed 99.99% uptime. These figures, though, are significant. Just take a look at Microsoft’s figures for 99% uptime and 99.9% uptime.
With a 99% uptime guarantee, the website could experience as much as 14.4 minutes of downtime each day and 3.7 days of downtime each year. With a 99.9% uptime guarantee, those figures are cut to 86.4 seconds per day and 8.8 hours per year. Um… I don’t want to distract you, but did we forget to put breathing holes in the glass box? He looks like he’s under duress. The problem is, though, the German findings do not allow for any pauses or disruptions during the learning process, so we have to continue.
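For anyone who wants to check the math, here is a minimal Python sketch of the arithmetic behind those figures. The function name `allowed_downtime` and the 365-day year are my own assumptions for illustration, not anything taken from Microsoft.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
HOURS_PER_YEAR = 365 * 24        # assumes a non-leap year


def allowed_downtime(uptime_percent):
    """Return (seconds per day, hours per year) of downtime an uptime figure permits."""
    down_fraction = (100 - uptime_percent) / 100
    return down_fraction * SECONDS_PER_DAY, down_fraction * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99):
    per_day, per_year = allowed_downtime(pct)
    print(f"{pct}% uptime: {per_day:.1f} s/day, {per_year:.2f} h/year")
```

Each extra nine cuts the permitted downtime by a factor of ten, which is why a guarantee that looks like a rounding error on the sales page matters so much in practice.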
A brief note on uptime as it relates to us: It’s funny to think that any amount of “scheduled downtime” (software updates and other server maintenance) is acceptable. That’s why we guarantee 100% uptime in our service level agreement (SLA) with all our customers (reimbursing for errors), which is one reason our customer retention rate is over 90%.
Prediction & Availability
Optimizing a network for availability is complex. Every aspect of the system, from the applications being used to the way it is administered to how it is deployed, affects availability. Microsoft notes that failures will always occur from time to time, and those failures will of course be unexpected. Predicting moments of downtime, then, is virtually impossible. Yeah, let’s… I guess get rid of that glass box. It’s a little depressing.
However, a system will automatically become more reliable as a network develops stronger recovery mechanisms. Microsoft points out, “If your system can recover from failures within 86.4 seconds, then you can have a failure every day and still achieve 99.9 percent availability.” I’ve used this same logic to explain to my wife why it’s acceptable for me to stare at the ceiling and shriek like a wounded and deranged animal for 86 seconds every day when I walk in the door from work.
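The reasoning in that quote can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes every failure takes the same time to recover from and that recovery time is the only thing counted as downtime.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def availability_percent(failures_per_day, recovery_seconds):
    """Availability when each failure costs a fixed recovery time (an assumption)."""
    downtime = failures_per_day * recovery_seconds
    return 100 * (1 - downtime / SECONDS_PER_DAY)


def max_recovery_seconds(target_percent, failures_per_day=1):
    """Longest recovery time per failure that still meets the target availability."""
    daily_budget = (100 - target_percent) / 100 * SECONDS_PER_DAY
    return daily_budget / failures_per_day


print(f"{availability_percent(1, 86.4):.3f}% available with one 86.4 s failure a day")
print(f"{max_recovery_seconds(99.9, failures_per_day=2):.1f} s allowed per failure at two failures a day")
```

The second function shows the trade-off from the other side: if failures become more frequent, the recovery time per failure has to shrink in proportion to keep the same availability.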
Effect on Page Loads & Revenue
Availability can be thought of simply as uptime, but it can also be thought of in terms of transactions, such as those on an e-commerce site. The same math really applies to any situation when thought of in terms of pages failing to load or loading incorrectly.
A website with 99.9% availability or uptime that receives 10,000 data requests from visitors each day will experience 10 failures per day and 70 per week. The following is from a table Microsoft provides defining different availability figures as fulfilling the requirements of certain types of systems:
- Commercial – 99.5%
- Highly available – 99.9%
- Fault resilient – 99.99%
- Fault tolerant – 99.999%
- Continuous – 100%
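To put the request math and the list above side by side, here is a rough Python sketch. The names `expected_failures` and `classify` are made up for this example, and treating each percentage as a minimum threshold for its label is my reading of the list rather than something the table states.

```python
# Availability classes from the list above, read as minimum percentages
# (that reading is an assumption).
AVAILABILITY_CLASSES = [
    ("Commercial", 99.5),
    ("Highly available", 99.9),
    ("Fault resilient", 99.99),
    ("Fault tolerant", 99.999),
    ("Continuous", 100.0),
]


def expected_failures(requests, availability_percent):
    """Average number of requests that fail at a given availability."""
    return requests * (100 - availability_percent) / 100


def classify(availability_percent):
    """Return the highest class whose minimum is met, or None if below all of them."""
    label = None
    for name, minimum in AVAILABILITY_CLASSES:
        if availability_percent >= minimum:
            label = name
    return label


for name, pct in AVAILABILITY_CLASSES:
    per_day = expected_failures(10_000, pct)
    print(f"{name} ({pct}%): about {per_day:.1f} failed requests per day")
```

These are averages, so a real site would see more failures on some days and fewer on others. The point is only that the same percentage can be read as lost minutes or as lost page loads.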
Conclusion, Continuation & Poem
Okay, so that gives us a basic starting point for exploring availability. Again, if you like the idea of 100% uptime, that’s our promise, and we put our money where our mouth is in our SLA (and also I put pennies in my mouth sometimes, because I like the way they taste and can’t think of what else to do with them). Here are our solutions for shared hosting, dedicated servers, and VPSs.
We will move on with this subject in the second part of the series via discussion of the Oracle piece. I’m really sorry about the swimmer. That was a horrible idea on my part. Here is a poem to make you feel better:
Thank you for your time
I think you are very nice
Let’s all go to Tijuana
And eat some beans and rice.
By Kent Roberts