High-availability, as I have discussed in the previous installments of this series, is a concept that has changed and grown over time. In the past, high-availability was the condition exhibited by a man in a dive bar in Duluth, Minnesota, systematically handing out his landscaping business card to all the female patrons with the words, “I have a lot to offer, and I hope you’ll give me a chance with your shrubbery.”
In the age of information technology, however, high-availability has become more reputable. In fact, high-availability is desired by everyone conducting business online. It describes a system with very little downtime.
To review, optimizing an infrastructure for uptime is often wrongly considered to be simply an effort to prevent failures. Per Microsoft, it’s difficult and sometimes impossible to predict when failures will occur. High-availability therefore involves a thorough focus on recovery, shortening any downtime that does occur. For this same reason, I run training drills so that when someone knocks my books out of my hands, I can pick them up before many of the other doctoral students notice.
To look at high-availability from a number of different perspectives, we’ve been reviewing articles from Microsoft, Oracle, and Linux Virtual Server. Today, we continue exploring the Oracle piece and briefly note commentary from the Linux Virtual Server site.
While we review the idea of high-availability, let’s grab the keys to my father’s Cadillac, drive it out into the mountains, and make clucking and whirring noises to attract the Abominable Snowman. Then let’s offer him a fully-loaded bacon double-cheeseburger and tell him he’s the only one who understands us.
Availability: High-Availability Problem Solving, Continued
In the last post, we looked at comments by Oracle on various technologies that can be used to optimize availability. Let’s continue with additional safeguards that make a system less likely to experience downtime. For the same reason (safety), we will wear full body armor on our trip and carry a sack of water balloons to throw at our beloved monster if he becomes enraged.
As a rule of thumb, redundancy is the core component of recovery. When multiple instances run simultaneously (active-active availability technology) and additional system components stand by, ready to be activated as needed (active-passive availability technology), the failure of any single component can, in a sense, become irrelevant. The system stays available throughout, just like the snoring soundtrack that will be playing on our boomboxes at home while we are on our critical mission.
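To make the active-passive idea concrete, here is a minimal sketch in Python. The class names (Node, ActivePassivePair) are hypothetical, chosen only for illustration, and the sketch covers only the active-passive case: one node serves traffic, and the standby is promoted only when the active node fails.

```python
# A minimal, hypothetical sketch of active-passive failover.
# Node and ActivePassivePair are illustrative names, not a real library API.

class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class ActivePassivePair:
    """One active node serves requests; the standby waits until it is needed."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, request):
        try:
            return self.active.handle(request)
        except RuntimeError:
            # Promote the standby to active and retry the request once on it.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


pair = ActivePassivePair(Node("primary"), Node("standby"))
print(pair.handle("req-1"))   # primary handled req-1
pair.active.healthy = False   # simulate a failure of the primary
print(pair.handle("req-2"))   # standby handled req-2
```

The client never sees the primary’s failure; it simply gets its answer from the standby. A real deployment would also need health checks and a way to bring the failed node back as the new standby.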
Additional Local High-Availability Solutions
Let’s look at a few additional problem-solving tools for use on a local system, courtesy of Oracle.
Routing and state replication
Stateful applications should be able to replicate client state to additional instances. This capacity allows the applications to keep running smoothly even if the processes handling client requests fail, as surely as a request to a Snowman to “calm down” would.
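Here is a minimal sketch of that idea, assuming a simple primary/backup pair that replicates synchronously. SessionStore and ReplicatedSessions are hypothetical names, and plain in-memory dictionaries stand in for what would really be replication over the network.

```python
import copy

# Hypothetical sketch of client-state replication for stateful failover.
# In-memory dicts stand in for stores that would live on separate servers.


class SessionStore:
    def __init__(self, name):
        self.name = name
        self.sessions = {}
        self.alive = True


class ReplicatedSessions:
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def put(self, session_id, state):
        # Synchronous replication: every live store receives its own copy
        # of the state before the write is considered complete.
        for store in (self.primary, self.backup):
            if store.alive:
                store.sessions[session_id] = copy.deepcopy(state)

    def get(self, session_id):
        # Read from the primary while it is alive; otherwise fail over.
        store = self.primary if self.primary.alive else self.backup
        return store.sessions[session_id]


sessions = ReplicatedSessions(SessionStore("a"), SessionStore("b"))
sessions.put("client-42", {"cart": ["shrubbery"]})
sessions.primary.alive = False        # the process serving client-42 fails
print(sessions.get("client-42"))      # {'cart': ['shrubbery']}
```

Because the backup already holds a copy of the client’s state, the request can continue on the surviving instance instead of starting over.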
Load balancing spreads requests across redundant instances. That way, when one instance fails, any requests that would otherwise have been sent to it are forwarded to the remaining, still-functional instances.
If you have more than one component intended for the same purpose, load balancing becomes possible, allowing work to be divided evenly among them. For that same reason, we will evenly distribute the water balloons.
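Both ideas, even distribution and skipping failed instances, fit in a small round-robin balancer. This is a hypothetical sketch (RoundRobinBalancer is not a real library class); production load balancers use health checks and often weighted or least-connections strategies instead of plain round robin.

```python
import itertools

# Hypothetical round-robin load balancer that skips failed instances.


class RoundRobinBalancer:
    def __init__(self, instances):
        self.instances = list(instances)
        self.down = set()
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance):
        self.down.add(instance)

    def pick(self):
        # Visit each instance at most once per call, skipping failed ones,
        # so requests for a dead instance go to the next healthy one.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.pick() for _ in range(3)])   # ['app-1', 'app-2', 'app-3']
lb.mark_down("app-2")
print([lb.pick() for _ in range(4)])   # ['app-1', 'app-3', 'app-1', 'app-3']
```

While all instances are healthy, work is divided evenly; once app-2 fails, its share is absorbed by app-1 and app-3.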
Migration helps with services that allow only one instance to run at a time. If that instance fails, the service switches over to a different part of the cluster. If necessary, the entire process can switch over to another cluster location.
Part of what makes redundancy difficult is the integrated nature of a system: one part relies on another. Availability must be integrated as well, so that a failure in one component does not cause downtime in the components that depend on it. That’s why, when we get to the mountains, it’s every man for himself.
Patches & Rolling
Applying patches on a rolling basis within a cluster, one node at a time while the other nodes continue to serve requests, allows patches to be installed and uninstalled without downtime.
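A rolling patch can be sketched as a loop that takes one node out of service, patches it, and returns it before moving on. The function name rolling_patch and the apply_patch callback are hypothetical placeholders; a real procedure would also drain connections at the load balancer and verify each node’s health before continuing.

```python
# Hypothetical sketch of a rolling patch across cluster nodes.


def rolling_patch(nodes, apply_patch, min_in_service=1):
    """Patch nodes one at a time while the rest keep serving requests."""
    in_service = set(nodes)
    for node in nodes:
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError("patching this node would leave too few in service")
        in_service.discard(node)                  # stop routing requests to it
        apply_patch(node, sorted(in_service))     # others keep serving meanwhile
        in_service.add(node)                      # return it to the rotation
    return sorted(in_service)


def fake_patch(node, still_serving):
    print(f"patching {node}; still serving: {still_serving}")


rolling_patch(["n1", "n2", "n3"], fake_patch)
# patching n1; still serving: ['n2', 'n3']
# patching n2; still serving: ['n1', 'n3']
# patching n3; still serving: ['n1', 'n2']
```

The min_in_service guard is one design choice: it refuses to start a step that would leave the cluster with no capacity, which is why a single-node "cluster" cannot be patched this way without downtime.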
In a cluster, configuration needs to be consistent. When configuration is administered properly, requests are handled in the same way regardless of which component is conducting the work. Configurations should also be synchronized across nodes, as should our water-balloon defensive maneuvers, and the administration itself should be conducted in a way that optimizes availability.
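One simple way to check that configuration is consistent is to fingerprint each node’s settings and flag any node that disagrees with the majority. This is a hypothetical sketch that assumes each node’s configuration can be represented as a JSON-serializable dictionary; the function names are illustrative only.

```python
import hashlib
import json
from collections import Counter

# Hypothetical sketch: detect configuration drift across cluster nodes.


def config_fingerprint(config):
    # Serialize with sorted keys so that key order does not affect the hash.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(node_configs):
    """Return the nodes whose config differs from the most common version."""
    fingerprints = {node: config_fingerprint(cfg) for node, cfg in node_configs.items()}
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(node for node, fp in fingerprints.items() if fp != majority)


configs = {
    "node-1": {"timeout_s": 30, "max_conn": 100},
    "node-2": {"max_conn": 100, "timeout_s": 30},   # same settings, other order
    "node-3": {"timeout_s": 60, "max_conn": 100},   # drifted
}
print(find_drift(configs))   # ['node-3']
```

A drift report like this only detects inconsistency; synchronizing the configuration (for example, pushing the majority version to node-3) is a separate step.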
Clustering & Nodes
As a final note on maintaining high-availability, let’s take a brief look at the piece from Linux Virtual Server. It underscores the importance of clustering, which the Oracle article also advocates.
Clustering, says the LVS site, allows for redundancy at all levels of the system, both hardware and software. The nodes within a cluster can all run the same operating system and applications. If seamless reconfiguration is in place, the remaining nodes pick up the slack when daemons or nodes fail. We should remember this principle in the mountains, because Terry is coming along, and we all know he’s not great at throwing balloons.
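The “pick up the slack” behavior depends on noticing failures quickly. Here is a minimal, hypothetical heartbeat monitor (not LVS’s own tooling) that drops nodes from the active pool once they stop reporting in. Timestamps are passed in explicitly so the example is deterministic.

```python
# Hypothetical heartbeat monitor: nodes that stop reporting are removed
# from the active pool, and the remaining nodes absorb their work.


class HeartbeatMonitor:
    def __init__(self, nodes, timeout_s=3.0):
        self.timeout_s = timeout_s
        self.last_seen = {node: 0.0 for node in nodes}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def active_nodes(self, now):
        # A node counts as alive if it reported within the last timeout_s seconds.
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout_s)


mon = HeartbeatMonitor(["real-1", "real-2", "real-3"], timeout_s=3.0)
for node in ["real-1", "real-2", "real-3"]:
    mon.heartbeat(node, now=10.0)
mon.heartbeat("real-1", now=12.0)       # real-2 has gone quiet
mon.heartbeat("real-3", now=12.0)
print(mon.active_nodes(now=14.0))       # ['real-1', 'real-3']
```

The timeout is a trade-off: a short one reconfigures quickly but may drop a node that was merely slow, while a long one tolerates hiccups but leaves requests going to a dead node for longer.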
Conclusion & Poem
You can see how extensively the notion of redundancy has been studied and how many technologies have been developed to allow the maximum possible uptime. High-availability, after all, is crucial to allowing businesses to keep operating regardless of whether something goes wrong at the server level.
One final poem in parting… This one, as you can imagine, goes out to the Abominable Snowman, and I personally hope he reads and enjoys it:
Hey you, please don’t eat us
We really think you are good-looking
Your political philosophy is sophisticated and respectable
And I heard you’re a whiz at squirrel cooking.
By Kent Roberts