
What is High-Availability? Part 3 – Additional Problem-Solving

 


High-availability, as I have discussed in the previous installments of this series, is a concept that has changed and grown over time. In the past, high-availability was the condition exhibited by a man in a dive bar in Duluth, Minnesota, systematically handing out his landscaping business card to all the female patrons with the words, “I have a lot to offer, and I hope you’ll give me a chance with your shrubbery.”

In the age of information technology, however, high-availability has become more reputable. In fact, high-availability is desired by all those conducting business online. It’s the nature of a system with very little downtime.

To review, optimizing an infrastructure for uptime is often wrongly considered to be, simply, an effort at preventing failures from occurring. Per Microsoft, it’s difficult and sometimes impossible to predict when failures will occur. High-availability involves a thorough focus on recovery, decreasing the length of any downtime instances. For this same reason, I run training drills so that when someone knocks my books out of my hands, I can pick them up before many of the other doctoral students notice.

To look at high-availability from a number of different perspectives, we’re looking at articles from Microsoft, Oracle, and Linux Virtual Server. Today, we are continuing to explore the Oracle piece, also briefly noting commentary from the Linux Virtual Server site.

While we review the idea of high-availability, let’s grab the keys to my father’s Cadillac, drive it out into the mountains, and make clucking and whirring noises to attract the Abominable Snowman. Then let’s offer him a fully-loaded bacon double-cheeseburger and tell him he’s the only one who understands us.

Availability: High-Availability Problem Solving, Continued

In the last post, we looked at comments by Oracle on various technologies that can be used to optimize availability. Let’s continue to look at additional safeguards that can be implemented so that a system is less likely to experience downtime. For the same reason, safety, we will wear full body armor on our trip and carry a sack of water balloons to throw at our beloved monster if he becomes enraged.

As a general rule of thumb, redundancy is the core component of recovery. When there are multiple instances operating simultaneously (active-active availability technology) and when additional systemic components are on standby to be activated as needed (active-passive availability technology), failure can, in a sense, become irrelevant. The system remains consistent throughout, just like the snoring soundtrack that will be playing on our boomboxes at home while we are on our critical mission.

Additional Local High-Availability Solutions

Let’s look at a few additional problem-solving tools for use on a local system, courtesy of Oracle.

Routing and state replication

Stateful applications should be able to replicate client state across additional instances. This capacity allows the applications to continue to run smoothly if the processes handling client requests fail – similarly to a request to a Snowman to “calm down.”
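As a rough illustration of state replication, here is a minimal Python sketch. The class name `ReplicatedSessionStore`, the replica names, and the in-memory dictionaries are all assumptions made for the example; they are not taken from the Oracle piece, and a real system would replicate across separate processes or machines.

```python
class ReplicatedSessionStore:
    """Keeps a copy of every client session on each replica (hypothetical design)."""

    def __init__(self, replica_names):
        # One in-memory dict per replica stands in for a separate process.
        self.replicas = {name: {} for name in replica_names}

    def write(self, session_id, state):
        # Copy the state to every live replica so any of them can serve the client.
        for store in self.replicas.values():
            store[session_id] = dict(state)

    def fail(self, name):
        # Simulate the loss of one replica.
        del self.replicas[name]

    def read(self, session_id):
        # Any surviving replica that holds the session can answer.
        for store in self.replicas.values():
            if session_id in store:
                return store[session_id]
        raise KeyError(session_id)


store = ReplicatedSessionStore(["a", "b"])
store.write("client-1", {"cart": ["shrubbery"]})
store.fail("a")
recovered = store.read("client-1")  # served by replica "b"
```

Because the write goes to every replica, losing one of them does not lose the client's session.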

Failover

With load balancing in place, instances are redundant. That way, when an instance fails, any requests that would otherwise be sent to it are instead forwarded to the other, still-functional instances.
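A small sketch of that forwarding behavior, assuming a simple round-robin rotation and a health map that some monitor keeps up to date. The function `route` and the instance names are hypothetical.

```python
def route(request_number, instances, healthy):
    """Pick an instance round-robin, skipping any that are marked unhealthy."""
    for offset in range(len(instances)):
        candidate = instances[(request_number + offset) % len(instances)]
        if healthy[candidate]:
            return candidate
    raise RuntimeError("no healthy instances left")


instances = ["app-1", "app-2", "app-3"]
healthy = {"app-1": True, "app-2": False, "app-3": True}  # app-2 has failed

# The request that would have gone to app-2 is forwarded to app-3 instead.
targets = [route(n, instances, healthy) for n in range(4)]
```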

Load balancing

If you have more than one instance of a server component intended for the same purpose, load balancing becomes possible, allowing work to be evenly divided among them. For that same reason, we will evenly distribute the water balloons.
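Even division of work can be sketched with a plain round-robin rotation. This is only one possible balancing policy, chosen here for simplicity; the function `distribute` and the `web-*` names are made up for the example.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, instances):
    """Assign each request to the next instance in a repeating rotation."""
    rotation = cycle(instances)
    return {request: next(rotation) for request in requests}


assignments = distribute(range(9), ["web-1", "web-2", "web-3"])
load = Counter(assignments.values())  # how many requests each instance received
```

With nine requests and three instances, each instance ends up with three.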

Migration

Migration helps when a service only allows one running instance. If that instance fails, the service is started on a different member of the cluster. If necessary, the entire server process can switch over to a different machine in the cluster.
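Here is a minimal sketch of migrating a single-instance service, assuming the cluster keeps a placement table and a set of failed members. The names `migrate`, `placement`, and `scheduler` are illustrative only.

```python
def migrate(service, placement, members, failed):
    """Move a single-instance service off a failed member to the first healthy one."""
    if placement[service] not in failed:
        return placement[service]  # still running where it was, nothing to do
    for member in members:
        if member not in failed:
            placement[service] = member
            return member
    raise RuntimeError("no healthy cluster member available")


members = ["node-1", "node-2", "node-3"]
placement = {"scheduler": "node-1"}  # exactly one instance of the service
new_home = migrate("scheduler", placement, members, failed={"node-1"})
```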

High-Availability Integration

Part of what makes redundancy difficult is the integrated nature of a system: one part is reliant on another part. Availability must be integrated as well, so that this reliance or dependency does not itself result in downtime. That’s why, when we get to the mountains, it’s every man for himself.

Patches & Rolling

Rolling within a cluster allows patches to be installed and uninstalled without the need for downtime.
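The rolling idea can be sketched as a loop that takes one node out of service at a time. This assumes the cluster can run on one node fewer than usual; `rolling_patch` and the node names are hypothetical, and the patch itself is just a callback.

```python
def rolling_patch(nodes, apply_patch):
    """Patch one node at a time; return the fewest nodes in service at any step."""
    in_service = set(nodes)
    min_in_service = len(in_service)
    for node in nodes:
        in_service.remove(node)  # drain this node only
        min_in_service = min(min_in_service, len(in_service))
        apply_patch(node)
        in_service.add(node)  # return it to service before touching the next
    return min_in_service


patched = []
lowest = rolling_patch(["n1", "n2", "n3"], patched.append)
```

At no point are fewer than two of the three nodes serving requests, which is why the cluster as a whole sees no downtime.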

Configuration

In a cluster, configuration needs to be consistent. When configuration is administered properly, requests are handled in the same way regardless of which component is conducting the work. Configurations should also be synchronized, as should our water-balloon defensive maneuvers, and the administration itself should be conducted in a way that optimizes availability.
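One way to check that configurations stay in sync is to fingerprint each node's settings and flag the odd ones out. This is a sketch under the assumption that configuration can be represented as a JSON-serializable dictionary; `fingerprint`, `find_drift`, and the sample settings are invented for illustration.

```python
import hashlib
import json
from collections import Counter


def fingerprint(config):
    """Hash a config dict in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_drift(configs):
    """Return the nodes whose config differs from the most common one."""
    prints = {node: fingerprint(cfg) for node, cfg in configs.items()}
    majority, _ = Counter(prints.values()).most_common(1)[0]
    return sorted(node for node, fp in prints.items() if fp != majority)


configs = {
    "n1": {"port": 8080, "timeout": 30},
    "n2": {"timeout": 30, "port": 8080},  # same settings, different key order
    "n3": {"port": 8080, "timeout": 5},   # drifted
}
drifted = find_drift(configs)
```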

Clustering & Nodes

As a final note on maintenance of high-availability, let’s take a brief look at the piece from Linux Virtual Server. It underscores the importance of clustering that is similarly advocated in the Oracle article.

Clustering, says the LVS site, allows for redundancy at all levels of the system – both hardware and software. The nodes within a cluster can all be running the same operating system and applications. When daemons or nodes fail, if seamless reconfiguration is in place, the remaining nodes pick up the slack. We should remember this principle in the mountains, because Terry is coming along, and we all know he’s not great at throwing balloons.

Conclusion & Poem

You can see how extensively the notion of redundancy has been studied and how many technologies have been developed to allow the maximum possible uptime. High-availability, after all, is crucial to allowing businesses to continue to operate, regardless of whether something goes wrong at the level of the server.

Again, bear in mind our 100% uptime guarantee. This guarantee is available to all our shared hosting, dedicated server, and VPS clients.

One final poem in parting… This one, as you can imagine, goes out to the Abominable Snowman, and I personally hope he reads and enjoys it:

Hey you, please don’t eat us

We really think you are good-looking

Your political philosophy is sophisticated and respectable

And I heard you’re a whiz at squirrel cooking.

By Kent Roberts

What is High-Availability? Part 2 – Problem-Solving

 

2 node High Availability Cluster network diagram (Photo credit: Wikipedia)

High-availability, as we learned in the last installment, has changed conceptually since the days of yesteryear and, for that matter, even near-year. It no longer just refers to the full-access, all-hours, 24/7/365 immediate-response policies of a man looking for love in all the wrong places and some of the right ones. It’s no longer about a man with a well-groomed mustache offering shoulder massages at closing time.

No, in the world of computers, high-availability is a completely different matter. Instead, it deals specifically with the uptime of a network. To properly understand uptime, we must consider that it is not merely about eliminating instances of failure within a network (because, per Microsoft, failures are by their nature unpredictable). Rather, it is also about high rates of recovery so that the system is not affected for an extended period. With sound recovery methods, data delivery remains consistent. That’s why I carry a slide-rule with me to re-straighten my hair part if someone gives me a noogie.

To bolster our understanding of high-availability in this tripartite miniseries, we are assessing the perspectives of Microsoft, Oracle, and Linux Virtual Server. Today, looking specifically at the Oracle article, we will discuss several problem-solving methods.

While we consider high-availability, let’s put on our Easter bonnets and throw eggs at passing cars, focusing especially on the ones with their windows down. We’ll only be 13 once.

Availability: Quick Review

Oracle defines high-availability as “the ability of users to access a system without loss of service.” Really, that seems like a definition of availability. High-availability means that scenario is occurring almost all of the time. Even in a highly redundant system, there will always be occasional errors and glitches. Regardless, a system in which availability is optimized is highly reliable and does not experience very much downtime. A good example of this, according to the women of Austin, Texas, is my reproductive system.
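To put “almost all of the time” into numbers, here is a quick calculation of how much downtime a given availability percentage leaves per year. The 365-day year and the sample percentages are assumptions for the arithmetic, not figures from Oracle.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year allowed by a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


three_nines = round(downtime_hours_per_year(99.9), 2)  # 99.9% available
two_nines = round(downtime_hours_per_year(99.0), 2)    # 99% available
```

Under these assumptions, 99.9% availability leaves about 8.76 hours of downtime a year, while 99% leaves 87.6 hours.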

Downtime can be thought of as scheduled and unscheduled. When it is unscheduled, the downtime is due to some type of systemic failure. When it is scheduled, users can be notified that upgrades or other system administration is being conducted (as with a hosting company and its clients, or with a website posting a notice to visitors). “Scheduled downtime typically occurs late at night, when traffic is light, all right, baby, all right,” crooned Barry Manilow.

High-Availability Problem Solving

Various types of problems can of course occur in a system. Common failures include those occurring within processors, nodes, and various forms of media. Human error can also cause failures, as can monkey and camel error. Availability can be kept high by focusing both on localized problem-solving and on methods of recovery in the event of a natural disaster, such as flooding or datacenter technician stampede.

Different sorts of best practices and technological solutions can help to make high-availability a reality. Redundancy, says Oracle, is the most important parameter to enhance availability: “High availability comes from redundant systems and components.” The same parameter applies to the man with the well-groomed mustache mentioned above, as he repeats the same psychosexual sales pitch over and over again, optimizing his systemic redundancy. Looking at solutions for localized high-availability in terms of redundancy splits potential fixes into active-active and active-passive groups.

  1. Active-active availability mechanisms: Two or more instances run at once and handle requests concurrently, which allows better scalability along with increased availability.
  2. Active-passive availability mechanisms: In this scenario, sometimes called cold failover clusters, one system instance is handling requests and the other one is sitting and pondering, running its finger through its hair, waiting patiently to be called into action. It chews gum and looks sullen. Clustering is used to integrate the two instances, with the clustering agent monitoring the active instance and switching over to the passive one as necessary.
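The active-passive arrangement can be sketched as a tiny clustering agent that watches the active instance and promotes the standby when needed. The class `ColdFailoverCluster`, the instance names, and the `is_alive` callback are hypothetical stand-ins for whatever a real clustering product provides.

```python
class ColdFailoverCluster:
    """One active instance serves requests; one passive instance waits on standby."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def check(self, is_alive):
        """Clustering agent: promote the passive instance if the active one is down."""
        if not is_alive(self.active):
            # Swap roles; the failed instance becomes the standby once repaired.
            self.active, self.passive = self.passive, self.active
        return self.active


cluster = ColdFailoverCluster("primary", "standby")
down = {"primary"}  # pretend the active instance has died
serving = cluster.check(lambda name: name not in down)
```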

Other Local High-Availability Solutions

Other safeguards should be in place to make sure your availability is as reliable as possible. Here are a few examples; we will proceed with more in the final part of this series:

Automatic restart & process death detection

You don’t want the system to continually restart multiple times in a relatively short window. Restarting can lead to additional failure. Technology should be in place to disallow repetitive, automated restarts. The same principle applies to excessively restarting one’s day. You should never get in and out of bed more than two dozen times before proceeding to breakfast.

Processes can die due to systemic errors. If processes are problematic, you do want a restart to be in place to give the process another chance. Don’t give it 10,000 chances though. Processes are greedy about grabbing all the chances.
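One way to give a process another chance without giving it 10,000 is a sliding-window restart limit. The sketch below assumes a limit of three restarts per 60 seconds; those numbers and the class name `RestartLimiter` are illustrative, not from Oracle.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.history = deque()  # timestamps of recent restarts

    def allow(self, now):
        # Forget restarts that have aged out of the window.
        while self.history and now - self.history[0] >= self.window_seconds:
            self.history.popleft()
        if len(self.history) >= self.max_restarts:
            return False  # too many recent restarts; leave the process down
        self.history.append(now)
        return True


limiter = RestartLimiter(max_restarts=3, window_seconds=60)
decisions = [limiter.allow(t) for t in (0, 10, 20, 30, 70)]
```

The fourth restart attempt, at 30 seconds, is refused; by 70 seconds enough of the earlier restarts have aged out that the process gets another chance.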

Clustering

Clustering means that the client computer (PC or other device accessing your system) will consider that part of your system to be one unit. This practice makes processing and administering the system easier. You can have processes clustered together and working on one server or on various servers, with the work divided evenly. It enhances redundancy by spreading out the process. Granola, similarly, is a highly redundant food. It should be eaten at all times when managing a server, even if you aren’t hungry.

Conclusion, Continuation & Poem

Availability and uptime are complex, but there are plenty of solutions out there to make sure that systems are as failsafe as possible. As stated above, I will continue to go over more of the safeguards that can maximize your availability in the final part of this series.

Here’s an eye-opening factoid that you may remember from the last post: we guarantee 100% uptime in our service level agreement (SLA), reimbursing our customers for any exceptions. You like the juice? We’ve got shared hosting, dedicated servers, and VPSs.

Now, finally, on a somber note, I’d like to close with a love poem to a dead process I once knew dearly … Well, maybe it’s not a love poem but a statement of redundancy-related anxiety. Anyway, it’s beautiful:

Process I can’t remember what you were doing

You’ve been dead now for years

Sometimes at night I can’t sleep

Because of my failure fears.

Come back to me, so we can share a club sandwich

While riding a tandem bicycle.

By Kent Roberts