INCONSISTENCIES IN BIG DATA. Prepared by Minu Joseph. Guided by Mr. Thomas Varghese.
Contents: Introduction. Problem Statement. 3V’s of Big Data. Defining Big Data. Dimensions of Big Data. Sources and Applications of Big Data. Inconsistencies in Big Data. Inconsistency-Induced Learning. Conclusion. References.
Introduction: A torrent of data is generated and captured in digital form due to advances in science and technology. Everything we do increasingly leaves a digital trace. Big data refers to data sets so large and complex that traditional data processing applications are inadequate.
Problem Statement: Big data is the next big thing in the IT industry. Classification of big data inconsistencies. Big data and big data analysis in terms of issues and challenges. Inconsistency-induced learning: a tool to turn big data inconsistencies into helpful formulas for better analysis of results.
Big Data: Big data can be described by volume, velocity, variety, variability, veracity, and complexity.
What is BIG DATA?
Dimensions In Big Data
Levels of Knowledge
INCONSISTENCIES IN BIG DATA: Temporal, Spatial, Text, Functional Dependency.
Temporal Inconsistencies: Conflicting information. Data items with conflicting circumstances may coincide or overlap in time. SRS often contain inconsistent information. Inconsistent information affects the correctness and performance of the system. Concurrent programming errors in the Therac-25 (1985-1987) led to 6 accidents.
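One kind of temporal inconsistency named on this slide, data items with conflicting values whose validity periods overlap, can be sketched as follows. This is a minimal illustration and not a method taken from the presentation; the `Fact` record, its field names, and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fact:
    """Hypothetical record: `entity` holds `value` during [start, end)."""
    entity: str
    value: str
    start: int  # inclusive, e.g. a year
    end: int    # exclusive


def overlaps(a: Fact, b: Fact) -> bool:
    # Two half-open intervals overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def temporal_conflicts(facts):
    """Return pairs about the same entity with different values and overlapping times."""
    return [
        (a, b)
        for a, b in combinations(facts, 2)
        if a.entity == b.entity and a.value != b.value and overlaps(a, b)
    ]


facts = [
    Fact("sensor-1", "online", 2010, 2015),
    Fact("sensor-1", "decommissioned", 2013, 2020),
    Fact("sensor-2", "online", 2010, 2012),
    Fact("sensor-2", "offline", 2012, 2014),
]
conflicts = temporal_conflicts(facts)
```

Here only the two `sensor-1` facts are reported, because their intervals overlap between 2013 and 2015. The `sensor-2` facts meet at 2012 without overlapping, so they are treated as a consistent state change.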
List of temporal inconsistencies
Spatial Inconsistencies: Occur in datasets that include geometric or spatial dimensions. Traditional DB systems are enhanced to include spatially referenced data. Spatial inconsistencies can arise from the geometric representation of objects, the spatial relationships between objects, and the aggregation of composite objects.
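As a rough sketch of the second source listed above, an inconsistency in the spatial relationship between objects, the check below compares declared "inside" relations against the stored geometry. The `Box` type (an axis-aligned bounding box), the object names, and the coordinates are all hypothetical; real spatial databases use richer geometries and topological predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box for a spatial object."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(outer: Box, inner: Box) -> bool:
    # True when `inner` lies entirely within `outer` (touching edges allowed).
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and outer.xmax >= inner.xmax and outer.ymax >= inner.ymax)


def inconsistent_relations(boxes, declared_inside):
    """Return declared (inner, outer) name pairs that the geometry contradicts."""
    by_name = {b.name: b for b in boxes}
    return [
        (inner, outer)
        for inner, outer in declared_inside
        if not contains(by_name[outer], by_name[inner])
    ]


boxes = [
    Box("city", 0, 0, 10, 10),
    Box("park", 2, 2, 4, 4),
    Box("lake", 8, 8, 12, 12),  # extends beyond the city box
]
declared_inside = [("park", "city"), ("lake", "city")]
violations = inconsistent_relations(boxes, declared_inside)
```

With this sample data, the declared relation "lake inside city" is flagged because the lake's box extends past the city's boundary, while "park inside city" agrees with the geometry.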
Spatial Inconsistencies (contd.)
Text Inconsistencies: Inconsistencies found in unstructured natural language text, such as data generated from social media, blogs, and emails. If two texts refer to the same event or entity, they are said to be co-referent. Contradiction detection identifies text inconsistencies and has many applications.
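To give a feel for contradiction detection, the toy sketch below flags two sentences as contradictory when they contain the same words but differ in negation. This is a deliberately naive stand-in and not the technique the presentation refers to: the `NEGATIONS` word list and both function names are assumptions, word order is ignored, and real systems also need co-reference resolution and semantic analysis.

```python
NEGATIONS = {"not", "no", "never"}


def content_and_polarity(sentence: str):
    """Split a sentence into its non-negation words and a negated/affirmed flag."""
    words = sentence.lower().replace(".", "").split()
    # An odd number of negation words is treated as a negated statement.
    negated = sum(w in NEGATIONS for w in words) % 2 == 1
    content = frozenset(w for w in words if w not in NEGATIONS)
    return content, negated


def naive_contradiction(s1: str, s2: str) -> bool:
    # Flag a pair only when the remaining words match exactly but polarity differs.
    c1, n1 = content_and_polarity(s1)
    c2, n2 = content_and_polarity(s2)
    return c1 == c2 and n1 != n2


same_event = naive_contradiction("The bridge is open.", "The bridge is not open.")
different_entity = naive_contradiction("The bridge is open.", "The road is not open.")
```

The first pair is flagged because both sentences describe the same entity with opposite polarity. The second pair is not flagged, since the sentences mention different entities and are therefore not treated as co-referent.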
Text Inconsistencies (contd.)
Functional Dependency Inconsistency: A functional dependency requires that when certain attribute values are equal, certain other attribute values must also be equal. Many big databases are stored, aggregated, and cleaned with the help of an RDBMS. Here functional dependencies play an important role in enforcing the integrity constraints for the database.
Functional Dependency Inconsistency (contd.): Variation of functional dependencies will result in inconsistencies in data and information.
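A functional dependency X → Y can be checked directly from its definition: rows that agree on X must agree on Y. The sketch below is a minimal in-memory version under assumed data; the `fd_violations` helper, the `zip` → `city` dependency, and the sample rows are hypothetical and not taken from the presentation.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Group rows by the `lhs` attributes; report groups with more than one `rhs` value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    return {key: values for key, values in seen.items() if len(values) > 1}


rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "Newark", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Cy"},
]
violations = fd_violations(rows, lhs=["zip"], rhs=["city"])
```

In this sample, the two rows with zip `10001` disagree on `city`, so that key is reported as violating the dependency, while zip `94105` is consistent.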
Inconsistency-Induced Learning: Improves data quality. Helps to enhance big data applications. Accommodates lifelong learning by allowing successive learning episodes to be triggered by the inconsistencies an agent encounters during its problem-solving episodes. The basic idea is to identify the cause of an inconsistency and then apply cause-specific heuristics to resolve it.
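The basic idea on this slide, identify the cause and then apply a cause-specific heuristic, can be pictured as a small dispatch table. The sketch below is only an illustration under stated assumptions: the two causes (`stale_data`, `noisy_source`), the cause analysis in `identify_cause`, and both resolution heuristics are invented for the example, since the presentation does not specify them.

```python
def resolve_stale(values):
    # Hypothetical heuristic: when the cause is outdated data, keep the newest value.
    return max(values, key=lambda v: v["timestamp"])["value"]


def resolve_noisy(values):
    # Hypothetical heuristic: when the cause is noise, keep the most frequent value.
    counts = {}
    for v in values:
        counts[v["value"]] = counts.get(v["value"], 0) + 1
    return max(counts, key=counts.get)


HEURISTICS = {"stale_data": resolve_stale, "noisy_source": resolve_noisy}


def identify_cause(values):
    # Toy cause analysis (an assumption): distinct timestamps suggest stale data,
    # identical timestamps suggest disagreeing noisy sources.
    timestamps = {v["timestamp"] for v in values}
    return "stale_data" if len(timestamps) > 1 else "noisy_source"


def learn_from_inconsistency(values, knowledge):
    """One learning episode: find the cause, apply its heuristic, record the result."""
    cause = identify_cause(values)
    resolved = HEURISTICS[cause](values)
    knowledge.append({"cause": cause, "resolved": resolved})
    return resolved


knowledge = []
first = learn_from_inconsistency(
    [{"value": "A", "timestamp": 1}, {"value": "B", "timestamp": 2}], knowledge)
second = learn_from_inconsistency(
    [{"value": "X", "timestamp": 5}, {"value": "Y", "timestamp": 5},
     {"value": "X", "timestamp": 5}], knowledge)
```

Each encountered inconsistency triggers one episode, and the `knowledge` list accumulates what was resolved and why, which loosely mirrors the successive learning episodes described above.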
Conclusion: Multidimensional issues and challenges in big data and big data analysis. Types of inconsistencies. How to improve the quality of big data analysis.
References: www.slideshare.com; dl.acm.org; www.ieeexplore.ieee.org. D. Zhang, "On Temporal Properties of Knowledge Base Inconsistency," Springer Transactions on Computational Science. M. Schroeck, R. Shockley, J. Smart, D. Romero-Morales, and P. Tufano, "Analytics: The real-world use of big data: How innovative enterprises extract value from uncertain data," Executive Report, IBM Institute for Business Value and Said Business School at the University of Oxford. Nasrin Irshad Hussain, "Big Data," www.slideshare.com.