Contents
Traditional Big Data Processing Approach
MapReduce
Word Count Problem
Reduce Operation
Data Flow
Scope of MapReduce
Summary
Traditional Big Data Processing Approach
MapReduce
MapReduce is a programming framework for
distributed, parallel processing of large
data sets across a cluster of machines.
Word Count Problem
Reduce Operation
MAP: Input data → <key, value> pair
REDUCE: <key, value> pair → <result>
[Figure: the data collection is divided into split 1, split 2, …, split n to supply multiple processors; each split is processed by a Map task, followed by Reduce tasks.]
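The MAP and REDUCE signatures above can be illustrated with the word count problem. The following is a minimal single-process sketch in Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample splits are assumptions made for illustration, and the map calls run one after another rather than on separate processors.

```python
from collections import defaultdict

def map_phase(split):
    """MAP: turn one input split (a string) into <word, 1> pairs."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Group values by key so that each key reaches a single reduce call."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """REDUCE: collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(splits):
    # On a cluster each split would be mapped on a different node;
    # here the splits are simply processed in sequence.
    pairs = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input: three small splits of one data collection.
splits = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(splits)
print(counts)
```

The grouping step between map and reduce is what lets each reduce call see every value emitted for its key, regardless of which split produced it.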
Data flow
A MapReduce job is a unit of work that the
client wants to be performed.
It consists of the input data, the MapReduce
program, and the configuration information.
The tasks are scheduled by YARN and
run on nodes in the cluster.
If a task fails, it is automatically
rescheduled to run on a different node.
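The rescheduling behaviour can be sketched as a toy scheduler. This is only an illustration of the idea of retrying a failed task on a different node; it is not YARN's actual scheduling algorithm or API. The names run_job, flaky_run, max_attempts and the node and task labels are all hypothetical.

```python
def run_job(tasks, nodes, run_task, max_attempts=3):
    """Run each task, retrying a failed task on a node it has not tried yet."""
    placement = {}
    for task in tasks:
        tried = []
        while len(tried) < max_attempts and task not in placement:
            untried = [n for n in nodes if n not in tried]
            if not untried:
                break  # no fresh node left to try
            node = untried[0]
            tried.append(node)
            if run_task(task, node):
                placement[task] = node  # task succeeded on this node
        if task not in placement:
            raise RuntimeError(f"task {task!r} failed on nodes {tried}")
    return placement

def flaky_run(task, node):
    # Hypothetical failure model: node-a always fails the task "map-1".
    return not (node == "node-a" and task == "map-1")

placement = run_job(
    ["map-0", "map-1", "reduce-0"],
    ["node-a", "node-b", "node-c"],
    flaky_run,
)
print(placement)
```

The sketch always tries nodes in list order and ignores load balancing and data locality, both of which a real scheduler would take into account.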
Data flow (contd.)
A good split size is the size of an HDFS block, i.e. 128 MB by default.
If the splits are too small, there are too many of them, and the overhead
of managing the splits and creating map tasks begins to dominate the total
job execution time.
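A rough calculation shows why split size matters. The sketch below assumes one map task per split, a fixed start-up cost per task, and a constant processing rate; the per-task overhead (1 second) and throughput (100 MB/s) are made-up figures chosen only for illustration, and the split count is a simplification of how Hadoop actually computes splits.

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size, used as the default split size

def num_splits(file_size_mb, split_size_mb=BLOCK_SIZE_MB):
    """Approximate number of splits (and map tasks) for a file."""
    return math.ceil(file_size_mb / split_size_mb)

def overhead_fraction(file_size_mb, split_size_mb,
                      per_task_overhead_s=1.0, throughput_mb_s=100.0):
    """Share of total time spent on task overhead rather than real work.

    per_task_overhead_s and throughput_mb_s are assumed values.
    """
    overhead = num_splits(file_size_mb, split_size_mb) * per_task_overhead_s
    work = file_size_mb / throughput_mb_s
    return overhead / (overhead + work)

file_size_mb = 10 * 1024  # a 10 GB input file
for split_size_mb in (128, 16, 1):
    n = num_splits(file_size_mb, split_size_mb)
    frac = overhead_fraction(file_size_mb, split_size_mb)
    print(f"split={split_size_mb:>3} MB  splits={n:>5}  overhead={frac:.0%}")
```

Under these assumptions, shrinking the split size multiplies the number of map tasks while the useful work stays constant, so the overhead share grows steadily.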
Scope of MapReduce
Pipelined: Instruction level
Concurrent: Thread level
Service: Object level
Indexed: File level
Mega: Block level
Virtual: System level
Data size: small → large
Summary
We introduced the MapReduce programming model for
processing large-scale data
We discussed the supporting Hadoop Distributed
File System
The concepts were illustrated using a simple
example
We reviewed some important parts of the source
code for the example.
Relationship to Cloud Computing