If you're plugged into the tech industry at all – and we'll just go ahead and assume that you are if you're one of the wonderful readers of our data-centric blog here – then you probably followed Google I/O 2014 the other week. In case you don't have any idea what we're talking about, though, know that Google I/O is the yearly developer conference that the search giant holds in San Francisco. There was a lot of talk about Android, Google Play and wearables. There were also two entirely separate and seemingly unconnected protests – one against Google's alleged building of killer artificial intelligence and the other targeting a Mountain View executive evicting tenants from some property he owns.
If you want to hear more about any of that stuff, well, you can go to the source and Google it. If, however, you want to know what the big takeaways were about cloud hosting, then read on. It's true that the vast majority of the I/O conference was dominated by consumer-targeted technology. We heard about ways Android will be better, ways Android will interact with your car better, ways Android will work better with your Chromebook and ways Android will work better with your TV. So there was a whole lot of Droid.
Squeezed in between the protests and the Android love fests, though, were a couple of enterprise-scale data handling and cloud-computing nuggets of information. So the entire thing wasn't about end users, even if the majority of it was.
Getting the Cloud Flowing
The Zen Masters of Search rolled out something they're calling "Cloud Dataflow." One of Google's earliest attempts at handling large amounts of unstructured data was MapReduce, a programming model and associated implementation for processing enormous data sets in parallel across large numbers of nodes (computers) grouped into clusters or grids. MapReduce was a hit. It ended up becoming widely adopted by those who needed to store, sort and analyze data.
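For readers who haven't worked with MapReduce directly, the model boils down to two phases: a map step that emits key-value pairs from raw input, and a reduce step that aggregates the values for each key. Here's a minimal single-machine sketch of the classic word-count example in Python (in a real deployment, the map and reduce work would be spread across many nodes):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit (word, 1) pairs from each document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["the cloud", "the data cloud"]
print(reduce_phase(map_phase(docs)))  # {'the': 2, 'cloud': 2, 'data': 1}
```

The important point for what follows is that both phases operate on a fixed batch of input; the job has a beginning and an end, which is exactly the limitation Google called out at I/O.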
It has not been without its critics, though. Some computer scientists claimed that its interface was too low-level and rejected the idea that it truly represented the paradigm shift that many bandwagon jumpers claimed it did. Proponents countered that these criticisms miss the entire point of MapReduce: it was never meant to be a database.
In either case, Google execs had a message for both sides of the argument at I/O 2014: MapReduce was so 2004. Its batch-oriented nature has no place in a modern IT world that needs a system able to handle both large amounts of data designated for scheduled batch processing and an ad hoc stream of random data sets. Enter Dataflow, Google's attempt at moving in on the public cloud service sector with a two-in-one data-sorting and analysis system.
Another Google innovation, FlumeJava, runs underneath Dataflow. FlumeJava can apply "a modest number of operations" to parallel data streams, and it constructs an optimized execution plan up front instead of simply attempting to scale up a plan that's shown itself to be unequal to an increase in the data stream.
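FlumeJava itself is a Java library, but the core idea of deferred execution is easy to illustrate. In this toy Python sketch (the class and method names are ours, not FlumeJava's), each operation on a collection merely extends a plan; nothing runs until the plan is explicitly executed, which is what gives the runtime room to optimize and scale the whole pipeline:

```python
class DeferredCollection:
    """Toy illustration of a FlumeJava-style deferred collection: each
    operation extends an execution plan rather than running immediately,
    so the whole plan can be inspected or optimized before it runs."""

    def __init__(self, source, plan=None):
        self.source = source
        self.plan = plan or []  # ordered list of (op_name, function) steps

    def map(self, fn):
        return DeferredCollection(self.source, self.plan + [("map", fn)])

    def filter(self, fn):
        return DeferredCollection(self.source, self.plan + [("filter", fn)])

    def run(self):
        # Only here is the accumulated plan actually executed.
        data = list(self.source)
        for op, fn in self.plan:
            data = [fn(x) for x in data] if op == "map" else [x for x in data if fn(x)]
        return data

pipeline = DeferredCollection(range(6)).map(lambda x: x * x).filter(lambda x: x > 5)
print([op for op, _ in pipeline.plan])  # ['map', 'filter'] -- plan built, not yet run
print(pipeline.run())                   # [9, 16, 25]
```

The real library applies optimizations such as fusing adjacent operations before execution; the sketch only captures the build-then-run structure.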
InformationWeek offered a convincing argument for why Dataflow is a big deal. Currently, users might rely on Amazon Web Services Elastic MapReduce for batch processing while using Kinesis for real-time data streaming. Over at Google, you'll be able to use Cloud Dataflow to accomplish both. If it works as well as Google says it does, it's easy to see why customers would choose Dataflow instead of pairing two other services to do the same thing.
What Is Cloud Dataflow Capable of?
Google Senior VP of Engineering Urs Holzle was the one to show the world Dataflow for the first time. Holzle said Dataflow can construct parallel pipelines that route data through a transformation and analysis system no matter how big or small the data stream is. It will work as both a software development kit and a managed service, allowing users to create whatever data transformation and capture process they need.
So what the heck does that mean for the layman? Google was more than happy to demonstrate. It grabbed a stream of tweets related to the FIFA World Cup games in Brazil, converted the data into JSON objects, transformed it using a Twitter API for the core data extraction, and finally analyzed it to decipher fans' feelings with the help of the third-party service Alchemy.
More than five million tweets were analyzed prior to the conference, but that was only the beginning. It then added 402 Twitter records every second to figure out that sentiment for Brazil was down following the host nation's scoring a goal in its opening match against underdog Croatia.
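The demo's shape – parse raw tweets into JSON objects, score each one for sentiment, then aggregate – is worth seeing in miniature. Google's pipeline used the Twitter API and the Alchemy service for the heavy lifting; the sketch below substitutes hard-coded sample tweets and a deliberately crude keyword scorer just to show the parse-score-aggregate flow:

```python
import json

# Hypothetical sample tweets; the real demo streamed these from a Twitter API.
raw_tweets = [
    '{"text": "What a great goal for Brazil!"}',
    '{"text": "Terrible defending, awful match"}',
    '{"text": "Great atmosphere in Sao Paulo"}',
]

# Toy sentiment lexicon standing in for a real service like Alchemy.
POSITIVE = {"great", "goal", "win"}
NEGATIVE = {"terrible", "awful", "loss"}

def score(text):
    """Crude keyword sentiment: +1 per positive word, -1 per negative word."""
    words = text.lower().replace(",", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [json.loads(t) for t in raw_tweets]  # parse each tweet into a JSON object
scores = [score(t["text"]) for t in tweets]   # per-tweet sentiment
print(scores)                  # [2, -2, 1]
print(sum(scores) / len(scores))  # aggregate sentiment for this window of tweets
```

In the real pipeline each of these stages runs continuously over the incoming stream – which is precisely the always-on, per-second aggregation that a batch-oriented MapReduce job is awkward at.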
We've blogged multiple times in the past about how big data is everywhere these days, but the sheer amount of it often becomes too overwhelming for anyone to process and use in any meaningful way. Holzle believes that finding meaning in data streams through Dataflow will help users get a better handle on what their customers want or what the public at large finds appealing. Google's new toy "handles the scaling, does the scheduling, deploys the virtual machines, and does the monitoring for you."
MapReduce "would be too cumbersome for the task," Holzle stated. "Information is being generated at an incredible rate. We want you to be able to analyze that information without worrying about scalability."
Although Amazon may not agree, Holzle says that the Google Cloud Platform is the leader in both price and performance among infrastructure-as-a-service providers. If nothing else, this could signal the continuation of the aggressive cloud price war among the big boys, with Microsoft having recently shown a willingness to cut prices every bit as much as Amazon and Google. Of course, cloud pricing warfare could be one of several signals that the cloud is in for big changes.
Image Source: Tableau
Find out more about Nick Santangelo on Google Plus