Talend Hadoop data integration

huguk 2,495 views 6 slides Oct 23, 2011


14/10/2011
Using Hadoop with Talend

Mark Chapman

Imad Rahman

© Talend 2011
Agenda
Talend Introduction
MapReduce and Hadoop
Talend Integration Suite MPx
Hadoop Features and TIS Components
How to use Talend to simplify Hadoop
Demo!
Questions & Answers


Venture-backed

Global operations


Corporate Headquarters
San Francisco (Los Altos)
Paris (Suresnes)

Operations
Orange County (Irvine)
Boston (Burlington)
New York (Tarrytown)
London (Maidenhead)
Utrecht
Nuremberg
Bonn
Munich
Milan (Bergamo)
Tokyo
Beijing
Talend across the world…
Global leader in open source integration

Customers By Industry
Systems Integrators
Public Sector & Education
Retail and Manufacturing
Media & Telco
Finance & Insurance
Software
Services & Others

Market Positioning
Data Quality: data profiling, data cleansing
Data Integration: analytics (ETL), operational data integration
Master Data Management: model and master any data or domain
Application Integration: connect applications & services

Talend Unified Platform
• Complete unified environment supports all integration approaches – data & application
• Uses consistent technology & leverages open standards
Studio: comprehensive Eclipse-based user interface
Repository: consolidated metadata & project information
Deployment: web-based deployment & scheduling
Execution: same containers for batch processing, message routing & services
Monitoring: single web-based monitoring console

Background: MapReduce and Hadoop
MapReduce: Parallel Programming Model
“Divide and Conquer”
Many possible implementations



Hadoop: Open Source Java MapReduce
Simplified framework



Cloud: flexible infrastructure
e.g. Amazon Elastic MapReduce
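
The "divide and conquer" model on this slide can be illustrated with a minimal, single-process sketch. It is not Hadoop code and not taken from the presentation: the function names (map_phase, shuffle, reduce_phase, word_count) and the word-count task are assumptions chosen for illustration. A real Hadoop job would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in-process over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; each line could be processed independently.
    sample = ["hadoop stores data", "talend moves data", "Hadoop processes data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across many machines, which is the part a framework such as Hadoop takes care of.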


Talend Integration Suite MPx for Big Data
Right-Time
Batch ETL
High Volume (ELT)
Big Data: Hadoop, Filescale

• One platform
• All sources
• All modes
• All scales

Talend’s Big Data Partnerships

Partnering with Enterprise Big Data Leaders



Cloudera: Enterprise Hadoop
Talend: Open Source Cloudera Connect Partner for Data Integration

Greenplum: Hadoop-Powered Analytics
Big Data-scale Relational DB
Talend supports Greenplum for Hadoop and ELT

Talend Integration Suite MPx

Filescale Features
• Use case: process structured flat files (e.g. logs)
• Uses MapReduce techniques
• Performance optimized for this use case
• Native code, no Java

Hadoop Features
• Hadoop components for easy job design
• HDFS: store, retrieve data
• Cloudera Sqoop: Bulk ETL
• Hive: Relational DB layer
• Pig: In-Hadoop transformations

Talend Components for Hadoop Features
HDFS (Hadoop Distributed File System) utilities – for loading/unloading files
Sqoop – utility for RDBMS extract to HDFS (Cloudera only)

Data Warehousing on Hadoop using Hive – SQL-like language to query and transform data

Transforming Data in Hadoop using Pig – transform, normalize, clean HDFS data – very flexible

Talend Integration Suite MPx Hadoop Support
Components for HDFS and Sqoop loading/unloading
Components for defining Pig and Hive jobs
Integrate with any of Talend’s supported sources!

Applying Talend Big Data in Enterprise
Landing data from operational systems
Transforming it before loading DW
Performing additional analytics directly in Hadoop
Keeping historical data online for queries

[Diagram labels: Hadoop, HDFS, Hive, Pig, Sqoop, DW, BI]

Today’s Demo Scenario
View sample log data from an online game source
Load log data into Hive
Aggregate the data into 2 aggregate tables
Load aggregated data into RDBMS
Additional processing using Pig
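
The shape of this scenario (parse logs, build two aggregates, load them into a relational database) can be sketched in plain Python. This is only a stand-in: it uses in-memory counters in place of Hive and SQLite in place of the target RDBMS, and the log format, table names, and function names are hypothetical, not taken from the demo.

```python
import sqlite3
from collections import Counter

# Hypothetical log format (an assumption, not from the slides): "date,player,event"
SAMPLE_LOG = [
    "2011-10-01,alice,login",
    "2011-10-01,bob,login",
    "2011-10-01,alice,score",
    "2011-10-02,alice,login",
]


def parse(line):
    """Split one log line into (day, player, event)."""
    day, player, event = line.strip().split(",")
    return day, player, event


def aggregate(lines):
    """Build the two aggregates: events per day and events per player."""
    per_day, per_player = Counter(), Counter()
    for line in lines:
        day, player, _event = parse(line)
        per_day[day] += 1
        per_player[player] += 1
    return per_day, per_player


def load(conn, per_day, per_player):
    """Load both aggregates into the relational database (SQLite stand-in)."""
    conn.execute("CREATE TABLE events_per_day (day TEXT PRIMARY KEY, events INTEGER)")
    conn.execute("CREATE TABLE events_per_player (player TEXT PRIMARY KEY, events INTEGER)")
    conn.executemany("INSERT INTO events_per_day VALUES (?, ?)", per_day.items())
    conn.executemany("INSERT INTO events_per_player VALUES (?, ?)", per_player.items())
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, *aggregate(SAMPLE_LOG))
    print(conn.execute("SELECT * FROM events_per_day ORDER BY day").fetchall())
```

In the demo itself the aggregation would be expressed as Hive queries and the load handled by Talend components; the sketch only mirrors the order of the steps.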
Show Time!

Wrap-up
Talend Integration Suite MPx…

…delivers MapReduce technologies as part of a comprehensive data management solution

…makes using Hadoop like other data integration activities

…is available for you to try

Free 2-month license to Talend Integration Suite MPx

Visit http://info.talend.com/hugoffer.html




Questions and Answers
Mark Chapman
Technical Manager
[email protected]
Skype: mchapman68

Imad Rahman
Technical Presales Consultant
[email protected]
Skype: imadrahman.talend

Thank You!