Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Skillspeed
25 slides
Apr 22, 2015
About This Presentation
This Hadoop MapReduce tutorial covers MapReduce Programming, MapReduce Commands, MapReduce Fundamentals, the Driver Class, Mapper Class and Reducer Class, and the Job Tracker & Task Tracker.
By the end, you'll have a solid grasp of Hadoop MapReduce basics.
PPT Agenda:
✓ Introduction to BIG Data & Hadoop
✓ What is MapReduce?
✓ MapReduce Data Flows
✓ MapReduce Programming
----------
What is MapReduce?
MapReduce is a programming framework for distributed processing of large data sets on clusters of commodity hardware. It is based on the principle of parallel data processing: data is broken into smaller blocks that are processed concurrently rather than as a single block. This makes processing faster and more scalable. Hadoop MapReduce programs are typically written in Java.
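As a rough illustration of the block-wise idea described above, the following Python sketch splits a list into blocks, processes each block independently and merges the partial results. It is only a single-machine analogy: the thread pool stands in for cluster nodes, and the names `split_into_blocks`, `process_block` and `parallel_sum` are made up for this example and are not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data, block_size):
    """Break a list into consecutive blocks of at most block_size items."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def process_block(block):
    """Work done on one block; here simply a partial sum."""
    return sum(block)


def parallel_sum(data, block_size=4, workers=3):
    """Process every block independently, then merge the partial results."""
    blocks = split_into_blocks(data, block_size)
    # Assumption: threads stand in for the machines of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_block, blocks))
    return sum(partials)


numbers = list(range(1, 11))
total = parallel_sum(numbers)
print(total)
```

Because each block is processed without looking at the others, adding workers does not change the result, only how the work is spread out.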
----------
What are MapReduce Components?
It has the following components:
1. Combiner: The combiner pre-aggregates each mapper's output locally, grouped by key (for example by day, week, month or year), so that less data has to be sent across the network to the reducers.
2. Job Tracker: The master service that accepts a job, divides it into tasks and schedules those tasks on servers across the cluster, preferably close to the data they read.
3. Task Tracker: Runs on each worker server and executes the map and reduce tasks assigned to it by the Job Tracker.
4. Reducer: Receives all the values for a given key from across the servers and combines them into the desired output.
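To make the roles of the combiner and the reducer more concrete, here is a small Python sketch, not Hadoop code. It assumes the common Hadoop usage in which a combiner is an optional local aggregation of one mapper's output; the month records and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_records(records):
    """Map step: emit one (month, 1) pair per input record."""
    return [(month, 1) for month in records]


def combine(pairs):
    """Combiner: sum the counts per key locally, before anything is shuffled."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def reduce_all(pair_lists):
    """Reducer: merge the pairs from every mapper into final totals per key."""
    totals = defaultdict(int)
    for pairs in pair_lists:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)


# Hypothetical input splits, one per mapper.
split_a = ["jan", "jan", "feb", "jan"]
split_b = ["feb", "feb", "mar"]

mapped = [map_records(split_a), map_records(split_b)]
combined = [combine(pairs) for pairs in mapped]

records_without_combiner = sum(len(pairs) for pairs in mapped)
records_with_combiner = sum(len(pairs) for pairs in combined)
result = reduce_all(combined)
print(records_without_combiner, records_with_combiner, result)
```

The final totals are the same with or without the combiner; what changes is the number of records that would have to travel to the reducer.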
----------
Applications of MapReduce
1. Data Mining
2. Document Indexing
3. Business Intelligence
4. Predictive Modelling
5. Hypothesis Testing
----------
Skillspeed is a live e-learning company focusing on high-technology courses. We provide live, instructor-led training in BIG Data & Hadoop featuring real-time projects, 24/7 lifetime support & 100% placement assistance.
----------
Session Objectives
✓ Introduction to Big Data and Hadoop
✓ Understanding HDFS
✓ Introduction to MapReduce – MapReduce Fundamentals
✓ MapReduce Programming Tutorial
✓ BIG Data Analytics via MapReduce
✓ BIG Data & Hadoop Course Details
Webinar by Skillspeed
Get Started with BIG Data & Hadoop
----------
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes up to petabytes of information.
Data at this scale is very difficult to manage.
----------
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing such BIG DATA is becoming a problem for all of us.
----------
Hadoop can be used to process such huge data sets easily. How? Before answering that, let's understand what Hadoop is.
----------
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics:
1. Flexible
2. Reliable
3. Economical
4. Scalable
----------
Why Hadoop?
How does Hadoop solve the Big Data challenges? The Hadoop platform is designed to address big data problems:
1. Size of Data
2. Variety of Data
----------
Hadoop Ecosystem
[Diagram of the Hadoop ecosystem. Recoverable labels: Flume (unstructured or semi-structured data); Sqoop (import or export of structured data); Apache Oozie (workflow); HDFS (Hadoop Distributed File System); Pig Latin (data analysis); Hive (DW system); MapReduce framework; HBase; other YARN frameworks (MPI, GIRAPH); YARN (cluster resource management).]
----------
Map Reduce
----------
Map Reduce – Scenario
Let us consider a real-life scenario to understand the importance of “Map Reduce” in Hadoop.
Suppose you are handling a project which has x tasks and takes one resource 100 hours to complete:
1 resource x 100 hours = 100 hours
With 10 resources working in parallel: 100 / 10 = 10 hours
----------
Map Reduce – Scenario
Similarly, [image not recovered] = 100 hours
100 / 10 = 10 hours
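The arithmetic in this scenario can be written as a one-line function. This is only a sketch of the ideal case: it assumes the work divides evenly and ignores any coordination overhead, and the name `ideal_hours` is invented for the example.

```python
def ideal_hours(total_hours, resources):
    """Elapsed time if the work is split evenly across the resources."""
    return total_hours / resources


one_resource = ideal_hours(100, 1)
ten_resources = ideal_hours(100, 10)
print(one_resource, ten_resources)
```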
----------
More Scenarios on Map-Reduce
Problem Statement: Find maximum stock market levels recorded in a span of 5 years
Problem Statement: De-identify personal identifier information
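The first problem statement can be sketched in plain Python as a map step that keys each record by year and a reduce step that keeps the maximum per key. The records below are invented purely for illustration, and the function names are hypothetical rather than Hadoop APIs.

```python
from collections import defaultdict

# Hypothetical (date, market level) records spanning five years.
records = [
    ("2010-03-01", 100.0),
    ("2010-09-15", 120.5),
    ("2011-01-10", 110.0),
    ("2012-06-30", 135.25),
    ("2012-11-02", 128.0),
    ("2013-04-18", 150.0),
    ("2014-08-08", 149.5),
    ("2014-12-31", 160.75),
]


def map_record(record):
    """Map step: emit (year, level) for one input record."""
    date, level = record
    return date[:4], level


def shuffle(pairs):
    """Group all values that share a key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_max(values):
    """Reduce step: keep only the highest level seen for a key."""
    return max(values)


grouped = shuffle(map_record(r) for r in records)
yearly_max = {year: reduce_max(levels) for year, levels in grouped.items()}
overall_max = max(yearly_max.values())
print(yearly_max, overall_max)
```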
----------
Traditional Solution
[Diagram: Very Big Data is divided into several Split Data chunks; grep runs on each chunk and produces matches; cat combines these into All matches.]
----------
MapReduce Solution
[Diagram: Very Big Input is divided into several Split Data chunks; the MapReduce Framework applies MAP to the splits and REDUCE to their results, producing All matches.]
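The split, grep and cat pattern from these two slides can be imitated in a few lines of Python, with the per-chunk grep playing the role of the map step and the concatenation playing the role of the reduce step. The log lines and the helper names `split_lines`, `map_grep` and `reduce_cat` are assumptions made for this sketch.

```python
import re


def split_lines(lines, n_splits):
    """Divide the input into at most n_splits chunks of roughly equal size."""
    size = max(1, -(-len(lines) // n_splits))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_grep(chunk, pattern):
    """Map step: like grep, keep only the lines of one chunk that match."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def reduce_cat(match_lists):
    """Reduce step: like cat, concatenate the matches from every chunk."""
    return [line for matches in match_lists for line in matches]


# Hypothetical input standing in for "very big data".
log = ["INFO start", "ERROR disk full", "INFO ok", "ERROR timeout", "INFO done"]

chunks = split_lines(log, 2)
all_matches = reduce_cat(map_grep(chunk, r"ERROR") for chunk in chunks)
print(all_matches)
```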
----------
MapReduce Advantages
Two biggest advantages:
1. Takes processing to the data
2. Allows processing data in parallel
[Diagram labels: Data Center, Rack, Node, HDFS Block (a, b, c), Map Task]
----------
MapReduce Flow
1. Input data is present in data nodes
2. Number of map tasks = number of input splits
3. Mappers produce intermediate data
4. Data is exchanged among nodes in “shuffling”
5. All data with the same key goes to the same reducer
6. Reducer output is stored at the output location
[Diagram: INPUT DATA flows into Map tasks on the nodes, whose output flows into Reduce tasks.]
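The flow above can be traced end to end with a small word-count sketch in Python. It is a single-process imitation, not Hadoop code: the input splits are invented, the function names are hypothetical, and `zlib.crc32` is used only as a simple deterministic way to send each key to one reducer.

```python
import zlib
from collections import defaultdict


def map_words(split):
    """Map task: emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def partition(key, n_reducers):
    """Choose the reducer for a key; the same key always gets the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def shuffle(mapped_outputs, n_reducers):
    """Shuffle: route intermediate pairs to reducers and group them by key."""
    reducer_inputs = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            reducer_inputs[partition(key, n_reducers)][key].append(value)
    return reducer_inputs


def reduce_counts(grouped):
    """Reduce task: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input splits; one map task is run per split.
input_splits = [["big data big"], ["data hadoop"], ["big hadoop hadoop"]]

mapped = [map_words(split) for split in input_splits]
reducer_inputs = shuffle(mapped, n_reducers=2)
reducer_outputs = [reduce_counts(grouped) for grouped in reducer_inputs]

# Stand-in for the output location: merge what each reducer wrote.
word_counts = {k: v for output in reducer_outputs for k, v in output.items()}
print(word_counts)
```

Since every occurrence of a word lands on the same reducer, no reducer needs to see another reducer's data to produce a correct count.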
----------
What is Expected?
In this section, we will discuss the questions on HDFS and MapReduce that are asked during interviews.
This will help you gauge the importance of the topics under study.
----------
Job Trends – Hadoop
----------
Why SkillSpeed?
1. Course Curriculum from Industry Experts
2. Instructor Led Live Virtual Sessions
3. Lifetime access to Course Content via LMS
4. 100% Placement Assistance
5. 24x7 Support
----------
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
----------
Corporate Partners
----------
Contact Us
Lines open 24/7
To know more about the course, please contact:
IND +91-90660-20904
USA 1866-607-6547 (Toll Free)
Or reach us at [email protected]