- Cloud Providers Treating Customers Like Barflies
- Scout’s Explanation of Steal Time at AWS
- How Much is Steal Time Affecting You?
- Movie Tickets Analogy
- Importance for Web Apps
- 4-in-1 Cloud VM
Cloud Providers Treating Customers Like Barflies
Almost all cloud service providers are overselling their resources to a ridiculous degree. They don’t provide real resource guarantees because they don’t want to: it’s easier to pack in as many customers as they can, especially since terrible service quality is so standard in the industry.
The quality of what you are getting in a cloud service at most providers is like bathrooms at dive bars (enter at your own risk). If everyone in the dive bar market is agreeing not to raise their cleanliness level into the Not Completely Disgusting range, customers are less irritated when they walk into another cesspool. Dive bars are designed for barflies: you can get the bar atmosphere for cheap, but don’t expect to be treated with respect.
The basic philosophy shared by many cloud providers and dive bar owners is: “Everyone else is providing something awful, so why don’t we?”
Just as that philosophy leads to dirty bathrooms at the dive bar, it leads to steal time at a cloud provider. Resources aren’t guaranteed, so what you purchase represents the maximum you can experience during a burst. Your resource availability is inconsistent because sometimes the guy next to you is using the CPU that you need.
The alternative to that scenario is to provide guaranteed resources so that there is no ambiguity, everything is consistent, and you can achieve the equivalent of four machines in one. After all, we’ve all read plenty of articles about how great cloud is. How about your provider doesn’t water it down so that you can actually leverage its full potential? Like walking into a bathroom where you aren’t afraid of catching malaria, cloud without the threat of CPU theft is highly preferable.
Scout’s Explanation of Steal Time at AWS
Before we get into our 4-in-1 VM model, let’s look at one tech company’s discussion of steal time – so rampant that they speak of it as if it’s a given.
“If you deploy to a virtualized environment (for example, Amazon EC2), steal time is a metric you’ll want to watch,” says Derek of application monitoring company Scout. “If this number is high, performance can suffer significantly.”
Steal time is expressed as a percentage: the portion of time your VM’s virtual CPU spends waiting for a physical CPU because the hypervisor is running another VM’s virtual CPU on it. In other words, it’s standing in line, which is personally one of my favorite activities.
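To make that percentage concrete, here is a minimal sketch of how it can be derived on Linux from two readings of the aggregate `cpu` line in `/proc/stat`, whose counters appear in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names are our own for illustration, not something Scout or any provider ships.

```python
import time

# Position of "steal" among the counters:
# user nice system idle iowait irq softirq steal
STEAL_INDEX = 7


def parse_cpu_line(line):
    """Return the first eight counters (user through steal) from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    counters = [int(value) for value in fields[1:9]]
    if len(counters) < 8:
        raise ValueError("this kernel does not report a steal counter")
    return counters


def steal_percent(before, after):
    """Percentage of the CPU time elapsed between two samples that was stolen."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[STEAL_INDEX] / total


def sample_steal(interval=1.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart (Linux only) and return the steal percentage."""
    def read():
        with open(path) as stat_file:
            return parse_cpu_line(stat_file.readline())

    first = read()
    time.sleep(interval)
    return steal_percent(first, read())
```

On a Linux VM, calling `sample_steal(5)` would report the share of the last five seconds your virtual CPU spent waiting in line. The counters are cumulative since boot, which is why the sketch works on the difference between two samples rather than a single reading.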
The basic way that cloud works is that you and other VM customers share resources. That’s not a problem if there are guarantees. In the case of other providers, though, your neighbor can burst into your CPU, and all you can do is stand there, humming a pretty tune.
How Much is Steal Time Affecting You?
When someone burglarizes your house, you file a police report and report it to your homeowner’s insurance. In order for the police and your insurance company to know what’s been taken, you need a list of stolen property. Similarly, you want to assess the damage with steal time if your cloud provider allows it. How much is being taken?
On Linux, run the command top.
In the CPU summary line near the top of the output, you will see values for percent idle (%id), percent I/O wait (%wa), and percent steal time (%st); recent versions of top label them simply id, wa, and st. “If %id is low, the CPU is working hard and doesn’t have much excess capacity,” Derek explains. “If %wa is high, the CPU is ready to run, but is waiting on I/O access to complete.” The final metric, %st, gives you your steal time.
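If you would rather not eyeball that line, the sketch below pulls the three values out of a CPU summary line copied from top and applies Derek's reading to them. The sample line format, the function names, and especially the thresholds are assumptions for illustration; top's layout varies between versions and locales, and nobody quoted here specifies what counts as "low" or "high".

```python
import re

# Matches pairs such as "91.2 id" (newer top) or "91.2%id" (older top).
_PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*%?\s*([a-z]{2})\b")


def parse_top_cpu(line):
    """Map each two-letter label in a top CPU summary line to its percentage."""
    return {label: float(value) for value, label in _PAIR.findall(line)}


def summarize(stats, low_idle=10.0, high_wait=20.0, high_steal=10.0):
    """Describe the line in plain words. The thresholds are illustrative guesses."""
    notes = []
    if stats.get("id", 100.0) < low_idle:
        notes.append("CPU is working hard with little excess capacity")
    if stats.get("wa", 0.0) > high_wait:
        notes.append("CPU is ready to run but waiting on I/O")
    if stats.get("st", 0.0) > high_steal:
        notes.append("CPU time is being handed to other VMs")
    return notes


sample = "%Cpu(s): 40.0 us,  5.0 sy,  0.0 ni,  5.0 id, 25.0 wa,  0.0 hi,  0.0 si, 25.0 st"
stats = parse_top_cpu(sample)
print(stats["id"], stats["wa"], stats["st"])
for note in summarize(stats):
    print(note)
```

The sample line is made up to trip all three checks at once; paste in your own line from top to see how your VM is doing.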
Movie Tickets Analogy
Derek gives the example of attending a big Hollywood movie. If you go to a blockbuster that is likely to be sold out, your experience will be similar to the one you have at a cloud provider where steal time is rampant.
There is one person selling tickets, and two lines. This is how the steal time metric would apply to your situation when you want to get into the movie:
- 0% steal time – You are going to the movies on a weekday afternoon. Although the ticket clerk alternates between the two lines, you don’t have to wait at all.
- 50% steal time – You are going to the movies on a weekend night. Half of the time, you will have to wait for the ticket clerk while someone else receives service instead.
- 100% steal time – You go on a weekend night, and the cash register is not working. Everything is stopped.
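The three cases above can be turned into rough arithmetic. Under the simplifying assumption (ours, not Derek's) that a CPU-bound job gets whatever fraction of the physical CPU is not stolen, a job that needs T seconds of CPU takes T / (1 - steal) seconds of wall-clock time. The helper name below is hypothetical.

```python
import math


def slowdown_factor(steal_percent):
    """How many times longer a CPU-bound job takes at a given steal percentage.

    Simplified model: the VM gets exactly (100 - steal)% of a physical CPU.
    """
    if not 0.0 <= steal_percent <= 100.0:
        raise ValueError("steal_percent must be between 0 and 100")
    available = 1.0 - steal_percent / 100.0
    if available == 0.0:
        return math.inf  # the cash register is broken: nothing ever finishes
    return 1.0 / available


for steal in (0, 50, 75, 100):
    print(f"{steal}% steal -> {slowdown_factor(steal)}x as long")
```

In this model the weekday afternoon costs you nothing, the weekend night doubles your wait, and 75% steal already makes everything take four times as long.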
Importance for Web Apps
Derek notes that getting rid of steal time is particularly critical for web applications. “For tasks that need to be performed in real-time, like rapidly serving many web requests,” Derek says, “a 4x decrease in performance can cause major backups in request queues, which can lead to outages.”
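To see why a 4x slowdown backs up a queue rather than just making each request a little slower, consider a back-of-the-envelope sketch. All of the numbers are hypothetical: a server that normally handles 200 requests per second, steady traffic of 100 requests per second, and a very simple model in which any request that cannot be served immediately just waits.

```python
def backlog_after(seconds, arrival_rate, service_rate):
    """Requests left waiting after `seconds` of steady traffic.

    Simplified model: constant arrival and service rates, in requests per second.
    """
    return max(0.0, (arrival_rate - service_rate) * seconds)


ARRIVALS = 100.0         # hypothetical incoming requests per second
NORMAL_CAPACITY = 200.0  # hypothetical requests served per second without steal

healthy = backlog_after(60, ARRIVALS, NORMAL_CAPACITY)
slowed = backlog_after(60, ARRIVALS, NORMAL_CAPACITY / 4)  # the 4x decrease
print(healthy, slowed)
```

With plenty of headroom the queue stays empty, but once capacity drops below the arrival rate the backlog grows every second, which is how a slowdown turns into an outage.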
4-in-1 Cloud VM
Consider this scenario for contrast: when your cloud provider does not oversell resources, a single cloud VM is equivalent to TWO dedicated servers set up in a redundant, high-availability configuration with real-time duplication (itself expensive and time-consuming to set up and maintain), sitting behind TWO redundant load balancers.
In other words, by buying just one cloud VM from that true 100% HA provider (us), you get the equivalent of four dedicated devices: two redundant load balancers (LBs) and two redundant, real-time duplicated dedicated servers. If you get our Dedicated Cloud, then that is equivalent to multiple such sets of 2 servers and 2 LBs, one for each VM that you create.
How do you know you’re protected? We never, ever oversell, as explained on our cloud page: “Quantities are limited so once the servers are full, you can request to be added to a wait list until more servers are brought online.”
By Kent Roberts
Image via Flickr user *sax