Six Hours of Downtime: Is the Tag #AzureFAIL or #AzureTITSUP?


Microsoft Stock - Azure Failure

Microsoft suffered a major failure starting at 00:52 GMT on November 19. The incident sparked at least one funny headline in the popular press, with London’s The Register calling the worldwide outage of Azure a “TITSUP cloud FAIL” – a turn of phrase built on military slang and spelled out as a Total Inability to Support Usual Performance. The opinionated Register report described global users of Azure storage, cloud servers, SQL databases, and the Active Directory service as “sucker-punched” by the incident.

Azure Meltdown

News reporters obviously do their best to be objective, but this outage should be considered unacceptable from an organization that charges a premium for its reputation and reliability.

Faults that occurred within Azure effectively shut down thousands of independent websites and made parts of Microsoft’s own site inaccessible.

An official statement from Azure notes that the errors were experienced worldwide. The corporation’s systems in Europe experienced longer downtime than anywhere else.

Two of the computing company’s top services, Office 365 (business applications) and Xbox Live (interactive game platform), were disrupted.

Undoubtedly, this massive failure will not help with Azure sales.

As the BBC reports, Microsoft – like IBM and Google – is trying to oust AWS from the top position in cloud services.

“Their pitch,” says the BBC, “is that it is more efficient for companies to rent computing power from a large tech firm than owning and managing their own computer servers or going to a smaller provider.”

Furthermore, Azure has a 99.9% uptime guarantee. What about the other 0.1% of the time? What exactly are businesses supposed to do then? 99.9% sounds great until your system goes down for six hours. (Note that ours is a 100% guarantee.)
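To put that 0.1% in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes the guarantee is measured over a 30-day month or a 365-day year, which is our simplification and not necessarily how Microsoft's SLA defines its measurement window; the function names are made up for illustration.

```python
# Rough sketch: how much downtime a given uptime percentage allows.
# Assumption: the SLA window is a plain 30-day month or 365-day year.

HOURS_PER_MONTH = 30 * 24   # 720 hours (assumed window)
HOURS_PER_YEAR = 365 * 24   # 8,760 hours (assumed window)


def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted in a period at the given uptime percentage."""
    return period_hours * 60 * (1 - sla_percent / 100)


def achieved_uptime_percent(outage_hours: float, period_hours: float) -> float:
    """Uptime percentage actually delivered if the period contains one outage."""
    return 100 * (1 - outage_hours / period_hours)


monthly_budget = allowed_downtime_minutes(99.9, HOURS_PER_MONTH)
yearly_budget = allowed_downtime_minutes(99.9, HOURS_PER_YEAR)
uptime_after_outage = achieved_uptime_percent(6, HOURS_PER_MONTH)

print(f"99.9% allows about {monthly_budget:.1f} minutes of downtime per month")
print(f"99.9% allows about {yearly_budget / 60:.2f} hours of downtime per year")
print(f"A six-hour outage leaves about {uptime_after_outage:.2f}% uptime for the month")
```

Under these assumptions, a single six-hour outage uses up most of a full year's 99.9% allowance and far exceeds what one month permits.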

When the BBC published its coverage, the Microsoft website stated that access to some services was still spotty, including storage, site data software, and general administration.

Does Six Hours Really Matter?

Azure serves as a backbone for numerous large companies, such as Toyota, Apple, and eBay.

It would seem that those corporations are getting VIP versions of the system. A cursory BBC analysis found that the companies affected by this outage were SMBs, such as social media optimization startup SocialSafe.

Julian Ranger, the company’s chief executive, said that the impact of this failure was serious for companies that rely heavily on their websites, especially those that are 100% virtual. Ranger described the event as “hugely disruptive” because no interested parties could get to the company’s SMO tools, amounting to substantial lost revenue.

Ranger was particularly disturbed by the failure because his organization switched to Azure from another unreliable provider.

One of the primary ideas behind Azure is that the reliability is supposed to be incredibly strong, said Ranger, because your software can run in multiple locations – the basic idea behind the distributed virtualization structure commonly referred to as cloud computing.

On November 19, though, the SocialSafe site was “completely out.”

The consistency and reliability of many businesses that use Microsoft for hosting were surely called into question by customers who were unable to load their sites and perhaps unaware of the Azure issue.

Some companies did not have problems with their public-facing site but were unable to access applications.

Viva Zorggroep, a healthcare company in the Netherlands with a workforce of 4,000, noted that it was unable to get to the Microsoft software-as-a-service solution used by its finance, HR, and tech-support departments. A representative of the company, Dave Thijssen, said that employees were unable to get into Office 365.

Thijssen noted that no one was able to send or receive email or check their calendars. More disruptively, they were unable to retrieve or edit their documents.

Related to the general issue of customer perception described above, Thijssen said that it was challenging to figure out how to notify their users of the issue since the majority of messaging services, including email, were down.

Office 365 is a way to use the entire Microsoft Office suite via cloud from anywhere. Clearly it has its disadvantages.

Thijssen said that although Microsoft was surely concerned with this outage, it represented “yet another blow to our SLA [service-level agreement]” and a public relations disaster.

Furthermore, since Microsoft is such a big-name company, its failures make the rest of the cloud hosting industry appear guilty by association. Thijssen said that with this failure, his colleagues were becoming concerned that the technology might not be sophisticated enough for enterprises.

The truth is that Microsoft may just not be investing enough in risk analysis and third-party verifications.

Stock Market Mega-Fail & Superior Solutions

As the BBC notes, this Azure outage nightmare came at a particularly poor time for the computing giant. Chris Green of consultancy Davies Murphy Group said that the timing was terrible because right now, Microsoft “is pushing and marketing Azure so obviously.”

Regardless of how Microsoft is trying to position itself, there is a reason Microsoft’s stock plummeted on November 19: an outage like this does not exactly inspire confidence.

Plus, why would a big-name organization, one we could expect to be incredibly consistent, offer only a 99.9% uptime SLA? Why choose Microsoft when you could more affordably choose a 100% uptime SLA with us?

By Kent Roberts

Screenshot via Google Finance