MapReduce in Big Data

3,570 views 13 slides Apr 19, 2019

About This Presentation

Big Data Analytics: MapReduce


Slide Content

MapReduce

Submitted By
GAURAV BISWAS

Contents
Traditional Big Data Processing Approach
MapReduce
Word count Problem
Reduce Operation
Data Flow
Scope of MapReduce
Summary

Traditional Big Data Processing Approach

MapReduce
MapReduce is a programming framework for distributed, parallel
processing of large data sets across a cluster of machines.
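
The idea of splitting a data set, processing the pieces in parallel, and combining the partial results can be sketched on a single machine. This is only an illustration under stated assumptions: threads stand in for cluster nodes, and the names `split_into_chunks`, `process_chunk`, and `parallel_sum` are hypothetical, not part of Hadoop or any MapReduce API.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Divide data into at most num_chunks contiguous, roughly equal pieces."""
    chunk_size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Work done independently on one piece of the data (here: a partial sum)."""
    return sum(chunk)


def parallel_sum(data, num_workers=4):
    """Sum data by processing chunks in parallel and combining the results."""
    chunks = split_into_chunks(data, num_workers)
    # Assumption: threads stand in for cluster nodes; each handles one chunk.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(process_chunk, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
print(total)
```

In a real cluster the chunks would live on different machines and the combining step would itself be distributed; the sketch only shows the shape of the computation.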

MapReduce

Word Counter Problem

Reduce Operation
MAP: Input data → <key, value> pair
REDUCE: <key, value> pair → <result>
[Diagram: the data collection is divided into split 1, split 2, ..., split n to supply multiple processors; each split feeds a Map task, and the Map outputs feed the Reduce tasks.]
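
The MAP and REDUCE signatures above can be illustrated with the word count problem. The following is a minimal in-memory sketch, not Hadoop code: the function names, the sample text, and the explicit `shuffle` step (grouping values by key between the two phases) are assumptions added for illustration.

```python
from collections import defaultdict


def map_phase(split):
    """MAP: input data -> <key, value> pairs; emits (word, 1) for every word."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Group values by key so each reduce call sees all values for one key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """REDUCE: <key, values> -> <result>; here the total count for one word."""
    return key, sum(values)


# Hypothetical input, already divided into three splits.
splits = [
    ["Deer Bear River"],
    ["Car Car River"],
    ["Deer Car Bear"],
]

mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

Each call to `map_phase` depends only on its own split, which is what allows the map tasks to run on different processors.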

Data flow
A MapReduce job is a unit of work that the client wants performed.
It consists of the input data, the MapReduce program, and the configuration information.
Its tasks are scheduled by YARN and run on nodes in the cluster.
If a task fails, it is automatically rescheduled to run on a different node.
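
The job structure and the retry-on-failure behaviour can be simulated in a few lines. This sketch is not YARN's API or scheduling policy: the `Job` class, the `run_map_tasks` function, the node names, the `max_attempts` key, and the set of failing nodes are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    """A job bundles the input data, the program, and the configuration."""
    input_splits: list
    map_function: Callable
    config: dict = field(default_factory=dict)


def run_map_tasks(job, nodes, failing_nodes):
    """Run one map task per split; retry a failed task on a different node.

    Returns (results, attempts); attempts[i] lists the nodes tried for split i.
    """
    results, attempts = [], []
    max_attempts = job.config.get("max_attempts", len(nodes))
    for index, split in enumerate(job.input_splits):
        start = index % len(nodes)
        candidates = nodes[start:] + nodes[:start]  # rotate to spread the work
        tried = []
        for node in candidates[:max_attempts]:
            tried.append(node)
            if node in failing_nodes:
                continue  # simulated task failure: reschedule on another node
            results.append(job.map_function(split))
            break
        else:
            raise RuntimeError(f"task {index} failed on all attempted nodes")
        attempts.append(tried)
    return results, attempts


job = Job(
    input_splits=[[1, 2], [3, 4], [5, 6]],
    map_function=sum,
    config={"max_attempts": 3},
)
results, attempts = run_map_tasks(
    job, ["node-a", "node-b", "node-c"], failing_nodes={"node-b"}
)
print(results)
print(attempts)
```

Here the second task is first placed on the failing node and then succeeds on the next one, so the job result is unaffected by the failure.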

Data flow (contd.)
A good split size is the size of an HDFS block, which is 128 MB by default.
If the splits are too small, and therefore too numerous, the overhead of
managing the splits and creating map tasks begins to dominate the total job
execution time.
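
A rough calculation shows why many small splits hurt. The sketch below is simplified arithmetic, not Hadoop's actual split computation: `num_splits` and `overhead_fraction` are hypothetical helpers, and the per-task overhead and processing rate in the usage lines are assumed values, not measurements.

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size stated above


def num_splits(file_size_mb, split_size_mb=BLOCK_SIZE_MB):
    """Number of input splits (one map task each) for a single file."""
    return math.ceil(file_size_mb / split_size_mb)


def overhead_fraction(file_size_mb, split_size_mb, per_task_overhead_s, mb_per_second):
    """Share of total task time spent on per-task setup rather than processing."""
    overhead = num_splits(file_size_mb, split_size_mb) * per_task_overhead_s
    processing = file_size_mb / mb_per_second
    return overhead / (overhead + processing)


# A 1 GB file: block-sized splits versus 1 MB splits.
print(num_splits(1024))     # 8 map tasks
print(num_splits(1024, 1))  # 1024 map tasks

# Assumed figures: 1 s of setup per task, 100 MB/s processing rate.
block_sized = overhead_fraction(1024, 128, 1.0, 100.0)
tiny_splits = overhead_fraction(1024, 1, 1.0, 100.0)
print(block_sized < tiny_splits)
```

Under these assumed figures the setup cost grows with the number of tasks while the processing time stays fixed, which is the effect the slide describes.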

Scope of MapReduce
Pipelined    Instruction level
Concurrent   Thread level
Service      Object level
Indexed      File level
Mega         Block level
Virtual      System level
[Diagram: these levels are arranged along a data-size axis running from small to large.]

Summary
We introduced the MapReduce programming model for processing large-scale data.
We discussed the supporting Hadoop Distributed File System.
The concepts were illustrated using a simple example.
We reviewed some important parts of the source code for the example.
Relationship to Cloud Computing


Thank you