Map reduce in BIG DATA

3,570 views 13 slides Apr 19, 2019

About This Presentation

Big Data Analytics Map reduce

Slide Content

MapReduce

Submitted By
GAURAV BISWAS

Contents
Traditional Big Data Processing Approach
MapReduce
Word count Problem
Reduce Operation
Data Flow
Scope of Map Reduce
Summary

Traditional Big Data Processing Approach

MapReduce
MapReduce is a programming framework that lets us process large data sets
in parallel across a distributed environment.


Word Counter Problem
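The word count problem can be sketched in plain Python as follows. This is a minimal single-machine illustration, not Hadoop code: the function names (map_words, shuffle, reduce_counts, word_count) and the sample input are assumptions made for this example and do not come from the slides.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: collapse all counts for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


# Hypothetical sample input, not taken from the slides.
lines = ["Dear Bear River", "Car Car River", "Deer Car Bear"]
counts = word_count(lines)
print(counts)
```

In a real cluster the map calls would run on different nodes and the shuffle would move data over the network; here all three phases run in one process to keep the idea visible.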

Reduce Operation
MAP: Input data → <key, value> pair
REDUCE: <key, value> pair → <result>
Split the data to supply multiple processors.
[Diagram: the data collection is divided into split 1, split 2, …, split n; each split feeds a Map task, and the Map outputs feed Reduce tasks.]
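The split, map, and reduce stages above can be sketched roughly as below. This is an illustration under stated assumptions: a thread pool stands in for the multiple processors, the parity-sum task is invented for the example, and the names make_splits, map_split, reduce_pairs, and run are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def make_splits(records, n_splits):
    """Divide the data collection into roughly equal splits."""
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_split(split):
    """MAP: input data -> <key, value> pairs (key is parity, value is the number)."""
    return [("even" if x % 2 == 0 else "odd", x) for x in split]


def reduce_pairs(pairs):
    """REDUCE: <key, value> pairs -> <result> (one sum per key)."""
    def fold(acc, pair):
        key, value = pair
        acc[key] = acc.get(key, 0) + value
        return acc
    return reduce(fold, pairs, {})


def run(records, n_splits=3):
    """Map every split in a worker pool, then reduce the combined pairs."""
    splits = make_splits(records, n_splits)
    # Threads are an assumption standing in for separate processors or nodes.
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = list(pool.map(map_split, splits))
    all_pairs = [pair for part in mapped for pair in part]
    return reduce_pairs(all_pairs)


result = run(list(range(1, 11)))
print(result)
```

Each map call sees only its own split, which is what allows the splits to be processed independently before the reduce step combines them.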

Data Flow
A MapReduce job is a unit of work that the client wants performed.
It consists of the input data, the MapReduce program, and the
configuration information.
The tasks are scheduled by YARN and run on nodes in the cluster.
If a task fails, it is automatically rescheduled to run on a
different node.
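The job structure and the reschedule-on-failure behaviour can be illustrated with a toy simulation. This is not YARN's scheduling algorithm or API: the Job class, the round-robin node choice, the max_attempts setting, and the node names are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    """A job bundles the input data, the program, and configuration."""
    input_splits: list
    map_fn: Callable
    config: dict = field(default_factory=dict)


class TaskFailed(Exception):
    """Raised when a simulated task fails on a node."""


def run_on_node(node, map_fn, split, bad_nodes):
    """Pretend to run one task on a node; nodes in bad_nodes always fail."""
    if node in bad_nodes:
        raise TaskFailed(f"task failed on {node}")
    return map_fn(split)


def schedule(job, nodes, bad_nodes=frozenset()):
    """Run one task per split, retrying on a different node after a failure."""
    max_attempts = job.config.get("max_attempts", len(nodes))
    results, placement = [], []
    for i, split in enumerate(job.input_splits):
        for attempt in range(max_attempts):
            # Each retry moves on to the next node in the list.
            node = nodes[(i + attempt) % len(nodes)]
            try:
                results.append(run_on_node(node, job.map_fn, split, bad_nodes))
                placement.append(node)
                break
            except TaskFailed:
                continue
        else:
            raise RuntimeError(f"split {i} failed on every attempt")
    return results, placement


job = Job(input_splits=[[1, 2], [3, 4], [5, 6]], map_fn=sum,
          config={"max_attempts": 3})
results, placement = schedule(job, nodes=["node-a", "node-b", "node-c"],
                              bad_nodes={"node-b"})
print(results, placement)
```

Here the task for the second split is first placed on the failing node and then succeeds on the next one, so the job still completes with every split processed once.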

Contd...
A good split size is the size of an HDFS block, which is 128 MB by default.
If the splits are too small, there are many more of them, and the overhead of
managing the splits and creating map tasks begins to dominate the total job
execution time.
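The effect of split size on the number of map tasks can be worked through with simple arithmetic. In the sketch below, the 128 MB block size comes from the slide; the one-second per-task overhead and the 100 MB/s processing rate are assumed figures chosen only to make the trend visible, and the function names are hypothetical.

```python
MB = 1024 * 1024
HDFS_BLOCK_SIZE = 128 * MB  # default block size stated in the slide


def num_splits(file_size, split_size=HDFS_BLOCK_SIZE):
    """Number of splits (and so map tasks) needed to cover the file."""
    return -(-file_size // split_size)  # integer ceiling division


def overhead_fraction(file_size, split_size, task_overhead_s=1.0, mb_per_s=100.0):
    """Share of total task time spent on per-task overhead (assumed figures)."""
    overhead = num_splits(file_size, split_size) * task_overhead_s
    processing = (file_size / MB) / mb_per_s
    return overhead / (overhead + processing)


file_size = 1024 * MB  # a hypothetical 1 GB input file
for split_mb in (128, 16, 1):
    split_size = split_mb * MB
    print(split_mb, "MB splits:",
          num_splits(file_size, split_size), "map tasks,",
          f"{overhead_fraction(file_size, split_size):.0%} overhead")
```

With block-sized splits the 1 GB file needs 8 map tasks; with 1 MB splits it needs 1024, and under these assumed figures nearly all of the time goes to task setup rather than processing.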

Scope of MapReduce
Pipelined: Instruction level
Concurrent: Thread level
Service: Object level
Indexed: File level
Mega: Block level
Virtual: System level
(Listed in order of data size, from small to large.)

Summary
We introduced the MapReduce programming model for
processing large-scale data.
We discussed the supporting Hadoop Distributed
File System.
The concepts were illustrated using a simple
example.
We reviewed some important parts of the source
code for the example.
Relationship to Cloud Computing

