Every day, we create 2.5 quintillion bytes of data. This data comes from everywhere: sensors, posts to social media sites, digital pictures and videos, purchase transaction records, cell phone GPS signals, and so on. As data generation keeps increasing day by day, it becomes more difficult to store and process these datasets with traditional approaches. The graph shows the growth of digital data compared to :
Defining Big Data:
Big Data means a volume of data so large that a traditional data management system cannot store, manage, and analyze it within a specified time span. Big Data comes from many sources, including digital media, online transaction records, and cell phone signals.
For example, suppose you have 10 TB (terabytes) of image files that need some processing, such as resizing and enhancement, within a given time frame. If you use a traditional system for this task, you would not be able to finish within that time frame, because the computing resources of the traditional system would not be sufficient. So, in this situation, the 10 TB of data is referred to as Big Data.
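To get a feel for why a single machine struggles with the 10 TB example, here is a rough back-of-the-envelope sketch in Python. The read speed of 100 MB/s per machine and the cluster size of 100 machines are assumed values chosen only for illustration, and the helper name hours_to_scan is hypothetical. The estimate covers only reading the data once and assumes the work splits perfectly across machines, which real systems do not fully achieve.

```python
# Rough estimate only: throughput and machine count are assumed, not measured.
TB = 10**12  # bytes in one terabyte (decimal)


def hours_to_scan(total_bytes, bytes_per_second_per_machine, machines=1):
    """Hours needed to read total_bytes once, assuming perfect parallelism."""
    seconds = total_bytes / (bytes_per_second_per_machine * machines)
    return seconds / 3600


data_size = 10 * TB
assumed_throughput = 100 * 10**6  # assumed 100 MB/s sequential read per machine

single = hours_to_scan(data_size, assumed_throughput)
cluster = hours_to_scan(data_size, assumed_throughput, machines=100)

print(f"1 machine:    {single:.1f} hours")   # 27.8 hours
print(f"100 machines: {cluster:.2f} hours")  # 0.28 hours
```

Under these assumptions, one machine needs more than a day just to read the files, before any resizing or enhancement is done, while spreading the same work over many machines brings it down to minutes. Whether a dataset counts as Big Data therefore depends on the deadline and the available resources, not only on its size.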
The main challenge of Big Data is storing and processing the data within a specified time span. Traditional methods are not practical for this task. Hadoop was developed to overcome the limitations of the traditional approach to storing and analyzing Big Data.