What is Big Data - Is 1 Petabyte Considered Big Data? Maybe 10 Petabytes?
According to Wikipedia: "big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications". OK, but is Hadoop a traditional data processing application or not? You know, it has been around for almost 5 years…
According to Gartner: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." But then again: what counts as high-volume and high-velocity? Is a 1-petabyte data set considered a high-volume information asset or not?
Some will say that the size of Big Data is a moving target; others will claim that Big Data is any data set that doesn't fit into a single computer's memory. Although these last two descriptions capture the fluid nature of Big Data, they are not sufficiently formal.
From the very beginning of the computer industry, data sets have always grown bigger and bigger, and IT departments have always been concerned about insufficient resources to support this growth. So what has changed now to create the “Big Data” buzz?
It seems that what makes Big Data "Big Data" is neither a size threshold nor a velocity threshold, but rather a ratio. What ratio? The ratio between data volume and velocity on one side and the hardware available on the other. If we could expect CPUs 1000 times faster next year, and memory 1000 times bigger and faster, would we care about a 40-60% annual growth rate in data set size? Probably not. So the ratio that matters is the one between the data growth rate and the hardware growth rate: specifically, CPU performance, storage and memory capacity, and storage and memory speed. Given this definition, and considering that hardware growth over the last 10 years has been slower than in the 90s while data growth over the same period has been very high, it becomes much clearer why we didn't hear about Big Data 15-20 years ago and do now. So maybe instead of saying Big Data we should say Small Hardware? Or maybe both?
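To make the ratio argument concrete, here is a minimal sketch of how a gap between two compounding growth rates widens over time. The 50% data growth rate is simply the midpoint of the 40-60% range mentioned above; the 20% hardware growth rate is an assumed number chosen only for illustration, not a measured one, and the function name `growth_gap` is made up for this example.

```python
def growth_gap(data_rate, hardware_rate, years):
    """Return how many times data has outgrown hardware after `years`.

    Both rates are annual fractions (0.50 means 50% per year) and are
    compounded yearly. A result above 1 means data is pulling ahead.
    """
    return ((1 + data_rate) / (1 + hardware_rate)) ** years


# 0.50 is the midpoint of the 40-60% range from the text.
# 0.20 is an ASSUMED hardware growth rate, for illustration only.
for years in (1, 5, 10):
    gap = growth_gap(0.50, 0.20, years)
    print(f"after {years:2d} years, data has outgrown hardware {gap:.2f}x")
```

Under these assumed rates the gap is modest after one year but grows to roughly 9x after ten, which is the "Small Hardware" effect. If the hardware rate matches or exceeds the data rate, the result stays at or below 1 and there is nothing to call Big Data.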