This presentation explains, step by step, how to build a simple MapReduce algorithm.
Added: Feb 22, 2015
Slides: 16 pages
Slide Content
A Hands-on Introduction to MapReduce in Python David Massart, PhD
Who Am I?
Outline: Set-up and requirements; Counting words; Limitations; Map / Reduce (Mapping, Shuffling, Reducing); Hadoop
Environment Set-up. Required: a Unix-like shell (Linux, Mac OS X, or Windows + Cygwin) and Python (e.g., Anaconda). Good to have: Java 8, Hadoop 2.6
Moby Dick by Herman Melville. Download Moby Dick: wget https://www.gutenberg.org/cache/epub/2701/pg2701.txt Rename it input.txt: mv pg2701.txt input.txt
cat input.txt
Counting Words
./counter.py < input.txt
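The slide's counter.py is not reproduced in this transcript. A minimal sketch of what such a script could look like is given here; the function name count_words, the word pattern, and the tab-separated output format are assumptions made for illustration, not the presenter's code.

```python
#!/usr/bin/env python3
"""counter.py (hypothetical sketch): count word occurrences read from stdin."""
import re
import sys
from collections import Counter

# Assumption: a "word" is a run of lowercase letters or apostrophes.
WORD = re.compile(r"[a-z']+")


def count_words(lines):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    counts = Counter()
    for line in lines:
        counts.update(WORD.findall(line.lower()))
    return counts


if __name__ == "__main__":
    # Print one "word<TAB>count" line per word, most frequent first.
    for word, count in count_words(sys.stdin).most_common():
        print(f"{word}\t{count}")
```

The whole dictionary of counts lives in one process's memory, which is the design choice the limitations on the next slide are about.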
Limitations: Processing time is, at best, proportional to the size of the text. Actually, performance decreases with the size of the dictionary. Very large texts can require more than one disk.
MapReduce, Part 1: Mapping
./mapper.py < input.txt
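The mapper.py source is likewise missing from the transcript. For word counting, a mapper typically emits one (word, 1) pair per word occurrence and does no aggregation. The sketch below assumes that behavior; the function name map_words, the word pattern, and the tab-separated output are illustrative choices.

```python
#!/usr/bin/env python3
"""mapper.py (hypothetical sketch): emit a (word, 1) pair for every word on stdin."""
import re
import sys

# Assumption: a "word" is a run of lowercase letters or apostrophes.
WORD = re.compile(r"[a-z']+")


def map_words(lines):
    """Yield a (word, 1) pair for each word occurrence, in input order."""
    for line in lines:
        for word in WORD.findall(line.lower()):
            yield word, 1


if __name__ == "__main__":
    # One "word<TAB>1" line per occurrence; duplicates are deliberately kept.
    for word, one in map_words(sys.stdin):
        print(f"{word}\t{one}")
```

Because each line is processed independently and no state is kept, the input could be split across several mapper processes without changing the result.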
MapReduce, Part 2: Shuffling. Redistribute data based on the output keys produced by the "mapper", so that all data belonging to one key is grouped together.
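The grouping described on this slide can be sketched in a few lines of Python. This is an in-memory illustration only, assuming the mapper output is available as (key, value) pairs; the function name shuffle is hypothetical, and a real framework would perform this step across machines.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(pairs):
    """Group (key, value) pairs by key.

    Returns a list of (key, [values]) tuples sorted by key.
    """
    # groupby only merges adjacent items, so sort by key first.
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    sample = [("me", 1), ("call", 1), ("me", 1)]
    for key, values in shuffle(sample):
        print(key, values)
```

Sorting by key is what brings all values for one key next to each other, which is why a plain sort is often enough to stand in for the shuffle step in a single-machine experiment.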