Data Creation vs. Data Consumption: Will the Real Big Data Please Stand Up?
The word around the IT water-cooler is that a sure way to get your IT budget approved is simply to throw in the words “Big Data” and “Cloud Computing”. The people who control the chequebook don’t really understand these terms, but they know they are important.
Big Data and Cloud Computing are phrases often thrown around by IT geeks and the media, but what do they mean in real terms? The recently held Huawei Cloud Congress presented the perfect opportunity to demystify them.
The Third Platform
To understand “Big Data” we have to understand what is creating the data and where it is coming from, which is referred to as the Third IT Platform.
It started with the First IT Platform back in 1981, when IBM entered the PC stage. Around the 1990s, the Second IT Platform was born, with the client (the desktop computer) and the server connected by a slow internet connection. This was the first time that data was stored not on the hard drive of the computer but on a remote server.
The Third Platform arrived in the mid-2000s with services like Gmail from Google and EC2 from Amazon, while Facebook and Twitter began their massive worldwide growth. This is the era in which companies no longer had to build their own data centres to cope with the massive information on their networks, but could instead delegate those computing functions to multiple remote data centres to be handled “in the cloud”. Data centres are becoming more agile and intelligent, as data analysis needs to happen in real time rather than retrospectively, “post the event”.
The Third Platform is where terms like Big Data, Cloud, Mobility and Social Networking originated.
Data Broken Down
Big Data refers to the massive amounts of data flowing through companies’ networks and the internet. But what is BIG?
I was keen to find out just how much data was generated in 2013. The best person to answer this question was David Reinsel, IDC’s Group Vice President of storage, semiconductor, GRC infrastructure, and pricing. David shared stats from IDC’s 2013 research.
In 2013, 1.8 Exabytes of storage capacity was purchased worldwide, and the “industry” spent $3.8 billion on 28 Exabytes of capacity. To put that into perspective, it is said that all the words ever spoken by human beings could be stored in approximately 5 Exabytes.
This data was created largely thanks to the explosion and proliferation of mobile. In 2013, 227 million tablets and 1.9 billion phones were shipped worldwide. Approximately $50 billion was spent on mobile chipsets to enable this mobile infrastructure.
Social networking took off into the stratosphere the moment people could update their Facebook status from their mobile phones and were no longer tethered to their computers. There are now over 7 billion expressions per day on social networking sites: posts of people sharing information and expressing their likes and dislikes.
David recounts that in 2013, the four major social networking sites generated these staggering amounts of data:
- Facebook: 1.3 billion active users generate 60 million status updates, send 10 billion messages and upload 350 million photos each day. Facebook’s storage capacity exceeds 300 petabytes and grows by 600 terabytes per day, or 219 petabytes per year (7 petabytes of storage per month are for the photos users upload alone)
- YouTube: 100 hours of video are uploaded every minute. After compression, the average storage works out to around 12.5 megabytes per minute of video, which equates to 39 petabytes per year.
- Twitter: There are 500 million tweets per day. At 200 bytes per tweet, this equates to 0.37 petabytes per year.
- Instagram: There are 60 million images uploaded per day, which is roughly 4.4 petabytes per year
These four social networks alone, and they are by no means the only ones, created around 263 petabytes of data in 2013.
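As a back-of-envelope check, the per-year figures above can be totalled in a few lines. This is a minimal sketch: the per-year numbers are taken straight from the text, and the ~12.5 MB-per-minute YouTube compression rate is an assumption inferred from the quoted totals.

```python
# Back-of-envelope check of the data-creation figures quoted above.
# All per-year values in petabytes (PB), taken from the text.
creation_pb_per_year = {
    "Facebook": 219,
    "YouTube": 39,
    "Twitter": 0.37,
    "Instagram": 4.4,
}

# Facebook: 600 TB/day over a year.
facebook_pb = 600 / 1000 * 365              # ≈ 219 PB

# YouTube: 100 hours of video uploaded per minute of real time,
# at an assumed ~12.5 MB per minute of compressed footage.
minutes_per_year = 365 * 24 * 60            # 525,600 real-time minutes
video_minutes = 100 * 60 * minutes_per_year # minutes of footage per year
youtube_pb = video_minutes * 12.5e6 / 1e15  # bytes -> PB, ≈ 39 PB

total_pb = sum(creation_pb_per_year.values())
print(f"Total created: {total_pb:.0f} PB")  # ≈ 263 PB
```

The sum lands within rounding distance of the 263 petabytes quoted in the article.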
But that is not the entire story. Data is not only being created; it is also being consumed, and consumption is where BIG Data really happens.
According to IDC, data consumption in 2013 across the same four networks broke down as follows:
- Facebook: 496 billion page views per month, which equates to 2,380 petabytes per year
- YouTube: 6 billion hours of video viewed per month, which equates to 54,000 petabytes per year
- Twitter: 11.2 billion page views per month, which equates to 41 petabytes per year
- Instagram: 5 billion views per month, which equates to 24 petabytes per year
Totalling these four networks gives 56,445 petabytes of data consumed per year, a figure clearly MUCH bigger than the 263 petabytes being generated.
Put another way, for every petabyte of data created, around 215 petabytes are being consumed. However, David adds that the ratio changes dramatically for a service that streams movies, such as Netflix. Netflix stores 3.1 petabytes of movies but serves 1 billion hours of video per month, which equates to 27,000 petabytes per year. On this basis, for every petabyte of data Netflix stores, roughly 8,700 petabytes are consumed.
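The two ratios above follow directly from the quoted figures. A minimal sketch, using only the numbers stated in the text:

```python
# Creation vs. consumption ratios, using the figures quoted above.
created_pb = 263        # PB created per year across the four networks
consumed_pb = 56_445    # PB consumed per year across the four networks
overall_ratio = consumed_pb / created_pb
print(f"Overall: {overall_ratio:.0f} PB consumed per PB created")

# Netflix: 3.1 PB stored vs. 27,000 PB streamed per year.
netflix_ratio = 27_000 / 3.1
print(f"Netflix: {netflix_ratio:.0f} PB consumed per PB stored")
```

The division gives roughly 215:1 for the four social networks and roughly 8,700:1 for Netflix, which is why a streaming service skews the picture so dramatically.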
David calls this “the era of hyper-consumption”, which he says will drive storage to be fast, converged and efficient.
Big Data is going mainstream as we store more unstructured data and add metadata to everything that is being created. This includes the Internet of Things (IoT), as we gather data from “things” such as buildings and infrastructure. According to IDC, the IoT is a “network of networks with uniquely identifiable endpoints that connect without human intervention via IP connectivity, whether it’s wireless or wired, local or global”. In other words, it is autonomous devices connected to the internet that request and send data.
It is estimated that there will be 30 billion connected devices by the year 2020. Machine-generated data was estimated to make up a third of the digital universe in 2013; by 2020 it is expected to exceed 50%.
So where is the Big Data market going?
The real money makers in the next three years will be companies that focus on integrated systems. The forecast is that the hardware vendor market will grow from $5.4 billion in 2013 to $14.3 billion in 2017; networking vendors from $0.2 billion to $0.8 billion; server vendors from $2.3 billion to $6.7 billion; and storage vendors from $2.9 billion to $6.8 billion.
Big Data is growing at 26% compounded annually, six times faster than the overall IT market.
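To illustrate what “compounded annually” means, the implied annual growth rate of each vendor segment can be derived from the 2013 and 2017 forecast figures above. A minimal sketch: the CAGR formula is standard, and the dollar figures are those quoted in the text.

```python
# Implied compound annual growth rate (CAGR) from the 2013 -> 2017 forecasts.
forecasts_busd = {            # (2013 value, 2017 value) in billions of dollars
    "Hardware":   (5.4, 14.3),
    "Networking": (0.2, 0.8),
    "Server":     (2.3, 6.7),
    "Storage":    (2.9, 6.8),
}

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end/start)^(1/years) - 1."""
    return (end / start) ** (1 / years) - 1

for vendor, (v2013, v2017) in forecasts_busd.items():
    print(f"{vendor}: {cagr(v2013, v2017, 4):.1%} per year")
```

The derived rates range from roughly 24% to 41% per year, consistent with a market growing much faster than IT overall.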
Big Data comprises infrastructure, software and services. In 2014, infrastructure accounted for 48% of the Big Data market, making it both the largest and the fastest-growing segment, worth $8 billion.
In 2014 the software market made up around 44% of the market and was valued at $3.5 billion.
Huawei has set its sights on serving this market. This data needs to be analysed as fast as possible to deliver critical services on the go. Huawei’s innovation in its Big Data storage system, known as OceanStor, allowed it to run 3 times faster than the industry performance benchmark, with the ability to handle 200 GB per second.
When it comes to Big Data software and the management of large volumes of data, Huawei’s converged infrastructure, FusionCube, is the answer. Tested internally on Huawei’s own workloads, it produced the company’s Report & Analysis run in 5 hours with 1,500 concurrent users on a single rack, compared to 8 hours on the previous system, which handled only 600 concurrent users across 2 racks. These results were independently verified.
So it seems the old adage “Time is Money” still holds true. As long as we continue to generate and consume data at these rates, we will need systems that can handle these volumes in real time.