BIT301 Advance Database – Big Data Assignment Help
BIT301 – Advance Database
Definition
Big data is a data set so large and complex that processing it becomes a challenge. It can be structured or unstructured. Such a data set is very difficult to process with traditional processing tools, applications or even “database management tools”. The challenges involved include storage, sharing and transfer of data (White, 2012). The term is also used to refer to the tools required to organize and handle large and complex data sets, whose size is beyond the ability of commonly used software to capture, manage and process (Snijders, Matzat and Reips, 2012). Furthermore, data challenges are defined as three dimensional, based on volume, velocity and variety (Laney, 2001). Building on this three-dimensional view, Gartner (2012) defined big data as a “high volume, high velocity and high variety” data set that is difficult to process in traditional ways. In short, this data is beyond the processing capabilities of a conventional database.
Architecture of Big Data
Various models describe the architecture of big data and explain how the problem can be solved. “MIKE2.0” is widely used to manage huge data sets; it explains how big data can be organized, arranged and managed. It handles big data through variation and combination of data into sets, in order to reduce complexity and manage records, which in turn reduces errors in the records (MIKE2.0, 2013).
Similarly, “MapReduce” is widely used as a big data architecture. It processes huge data sets by splitting a query into sub-parts and distributing them across parallel nodes; the partial results are then gathered, arranged and delivered as the final result (Webster, 2004).
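The split, distribute and gather steps described above can be illustrated with a minimal single-machine sketch in Python. It is not the API of any particular MapReduce framework: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration, and the loop over chunks stands in for work that a real cluster would hand to separate nodes.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    """Run the three steps in sequence over a list of text chunks."""
    mapped = []
    for chunk in chunks:  # hypothetical stand-in for distributing chunks to nodes
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle_phase(mapped))

chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Each chunk is mapped independently of the others, which is what allows the work to be spread over many machines; only the shuffle and reduce steps need to see results from more than one chunk.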
Technologies
Because big data is so large, exceptional technologies are required to process it and make sense of it. McKinsey (2011) explains that a range of technologies is suitable, some of which are shown in the figure below.
In addition to the technologies mentioned above, there are many more, such as genetic algorithms, machine learning and natural language processing. Tensor-based computation is also widely used to arrange and interpret big data (Future Directions in Tensor-Based Computation and Modeling, 2009).
How big data affects the market
Big data affects the market and how it functions in several ways. For example, it has significantly increased demand for information management specialists. Companies are also spending huge amounts of money on software to organize big data: Dell, HP, Microsoft and IBM have all spent heavily on data management in order to improve analytics and decision making (Data, data everywhere, 2010). Moreover, developed countries are also employing “data intensive technologies”. The world’s effective capacity to exchange information through telecommunication networks has grown many times over since 1986, and traffic was predicted to reach 667 exabytes by 2013. Big data has therefore changed the whole landscape of information processing and the way businesses work (Data, data everywhere, 2010).
Examples
The following are two examples of big data. Assume the data has been gathered from various sources, including social media, web sites, sales, customer contact, mobile devices and many others.
- Data measured in petabytes (1 petabyte = 1,024 terabytes)
- Data measured in exabytes (1 exabyte = 1,024 petabytes)
Both examples involve billions or even trillions of records about people, which makes the data difficult to manage, manipulate, analyze and interpret. The data is also unstructured and makes little sense in its raw form, so it remains very difficult to understand even while it is being arranged. This is what makes it big data. Other examples include medical records and internet search indexing.
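To give a sense of the scale of the two units in the examples above, the conversions can be worked through in a short Python sketch. It assumes the binary convention used in the list (each unit is 1,024 times the previous one); the UNITS list and the convert function are hypothetical names introduced only for this illustration.

```python
# Units in ascending order; each is assumed to be 1,024 times the one before it.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def convert(value, from_unit, to_unit):
    """Convert a quantity between two units using factors of 1,024."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1024 ** steps

print(convert(1, "PB", "TB"))  # 1024
print(convert(1, "EB", "PB"))  # 1024
print(convert(1, "EB", "TB"))  # 1048576
print(convert(1, "PB", "B"))   # 1125899906842624
```

On this convention a single exabyte is more than a million terabytes, which is why data of this size cannot be handled by the tools used for ordinary databases.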
Importance of big data
The importance of big data is clear from the fact that it is now a reality: data has become so complex that managing it properly is critical for decision making and for achieving an organization's ultimate goals. The 3Vs mentioned above, i.e. volume, variety and velocity, have increased significantly, and the big data coming into organizations requires effective management. Today data accumulates across multiple data stores in many formats; an organization may find it has accumulated billions of rows of data with hundreds of millions of data groupings. The solution to the big data challenge then becomes obvious: big data requires high-performance analytics to process it and work out what is important and what is not (Data, data everywhere, 2010).
Disadvantages of big data
The following are the disadvantages of big data:
- Increased level of ambiguity
- You have data but you are unsure about how to use it
- Delayed decision making
- Increased level of uncertainty
- Difficulty in accessing data
- Difficulty in managing the data
- Difficulty in using data effectively
Conclusion
Big data is a fact of today’s world and, as discussed above, it affects markets and businesses in various ways. Being three dimensional, this data is much more complex to understand and interpret. Various technologies can be used to arrange and manage it. Given its importance and the disadvantages listed above, it is important to understand what big data is and how it can be managed effectively.
References
Data, data everywhere (2010) The Economist. Retrieved 9 December 2012.
Future Directions in Tensor-Based Computation and Modeling (2009) “A Survey of Multilinear Subspace Learning for Tensor Data”.
Gartner (2012) “The Importance of ‘Big Data’: A Definition”. Retrieved 21 June 2012.
Laney, D. (2001) “3D Data Management: Controlling Data Volume, Velocity and Variety”. Gartner. Retrieved 6 February 2001.
MIKE2.0 (2013) “Big data definition”. The open source standard for management. Retrieved August 2013.
Snijders, C., Matzat, U. and Reips, U.-D. (2012) “‘Big Data’: Big gaps of knowledge in the field of Internet”. International Journal of Internet Science, 7, 1-5.
Webster, J. (2004) “MapReduce: Simplified Data Processing on Large Clusters”. Search Storage.
White, T. (2012) Hadoop: The Definitive Guide. O’Reilly Media. p. 3.