Chapter 1 big data

PragatikKhade 2,731 views 21 slides Jul 14, 2021

About This Presentation

Introduction to Big Data


Slide Content

What is Data? The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

Ch.1 Introduction to Big Data

What is Big Data? Big Data is also data, but of a huge size. Big Data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In other words, the data is so large and complex that traditional data management tools cannot store or process it efficiently.

Where Big Data Comes From

Characteristics Of Big Data

(i) Volume – The name Big Data itself refers to an enormous size. The size of data plays a crucial role in determining the value that can be drawn from it, and whether a particular data set can be considered Big Data at all depends on its volume. Hence, 'Volume' is one characteristic that needs to be considered while dealing with Big Data.

(ii) Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and to the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining, and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential in the data. Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency that data can show at times, which hampers the ability to handle and manage the data effectively.

(v) Value – After taking the four V's above into account, there is one more V, which stands for Value. A bulk of data with no value is of no good to a company unless it is turned into something useful. Data in itself is of no use or importance; it needs to be converted into something valuable by extracting information from it. Hence, Value can be regarded as the most important of the five V's.

Tools Of Big Data
NoSQL databases: MongoDB, CouchDB, Cassandra, Redis, Bigtable, HBase, Hypertable, Voldemort, Riak, ZooKeeper.
MapReduce: Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie, Greenplum.
Storage: S3, Hadoop Distributed File System.
Servers: EC2, Google App Engine, Elastic Beanstalk, Heroku.
Processing: R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop.

Types of Big Data (Digital Data) Digital data can be classified into three forms, as shown in the following figure.

Structured - Structured data is one of the types of big data. It is data that can be processed, stored, and retrieved in a fixed format. It refers to highly organized information that can be readily stored in and accessed from a database using simple search algorithms. For instance, the employee table in a company database is structured: the employee details, their job positions, their salaries, etc. are present in an organized manner.

Unstructured - Unstructured data is one of the types of big data. It is data that is not in a fixed format. It refers to unorganized information that cannot be readily stored in and accessed from a conventional database, which makes it very difficult and time-consuming to process and analyze. For example, emails and Facebook content are present in an unorganized manner.

Semi-structured - Semi-structured data is the third type of big data. It contains elements of both forms mentioned above, that is, structured and unstructured data. XML and JSON documents are common examples.
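The three forms described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the employee records, field names, and the message text are made up for the example and do not come from the slides.

```python
import csv
import io
import json

# Structured: a fixed set of columns; every row has the same fields.
# (Hypothetical employee table, invented for illustration.)
structured_csv = (
    "emp_id,name,position,salary\n"
    "1,Asha,Engineer,50000\n"
    "2,Ravi,Analyst,42000\n"
)
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: records describe themselves with keys,
# but two records need not share the same shape.
semi_structured = [
    json.loads('{"emp_id": 1, "name": "Asha", "skills": ["SQL", "Python"]}'),
    json.loads('{"emp_id": 2, "name": "Ravi", "contact": {"email": "ravi@example.com"}}'),
]

# Unstructured: free text with no schema; any fields must be inferred.
unstructured = "Hi team, Ravi from analytics here. Please send last month's report."


def field_names(record):
    """Return the sorted field names of one record."""
    return sorted(record.keys())


# Every structured row exposes the same fields, so only one shape appears.
structured_shapes = {tuple(field_names(r)) for r in rows}
# Semi-structured records can differ, so more than one shape may appear.
semi_shapes = {tuple(field_names(r)) for r in semi_structured}

print(len(structured_shapes), len(semi_shapes))
```

The structured rows can be queried by column name directly, the semi-structured records have to be checked for the presence of each key, and the unstructured text would need parsing or text mining before any field could be extracted.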

Difference between Types Of Big Data

Big Data Analytics Big Data analytics is a process used to extract useful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences. It provides various advantages: it can be used for better decision making and for preventing fraudulent activities, among other things.
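As a toy-scale sketch of the two uses named above (customer preferences and fraud prevention), the Python snippet below counts the most popular purchase category and flags unusually large transactions. The transaction records are invented, and the z-score threshold rule is an assumption chosen for illustration, not a method taken from the slides; real analytics pipelines run over far larger data with dedicated tools.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical transaction log, invented for illustration.
transactions = [
    {"customer": "c1", "category": "books", "amount": 20.0},
    {"customer": "c2", "category": "books", "amount": 25.0},
    {"customer": "c1", "category": "music", "amount": 15.0},
    {"customer": "c3", "category": "books", "amount": 22.0},
    {"customer": "c2", "category": "games", "amount": 18.0},
    {"customer": "c3", "category": "games", "amount": 500.0},
]


def top_category(records):
    """Customer preference: the most frequently purchased category."""
    counts = Counter(r["category"] for r in records)
    return counts.most_common(1)[0][0]


def flag_outliers(records, z_threshold=2.0):
    """Flag records whose amount lies more than z_threshold
    population standard deviations from the mean (assumed rule)."""
    amounts = [r["amount"] for r in records]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [r for r in records if abs(r["amount"] - mu) / sigma > z_threshold]


favourite = top_category(transactions)
suspicious = flag_outliers(transactions)
print(favourite, [r["amount"] for r in suspicious])
```

A flagged transaction is only a candidate for review; a threshold rule like this one says nothing about whether the transaction is actually fraudulent.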

Big Data Analytics

Big Data Applications

Thank You