SQOOP PPT


About This Presentation

WHAT IS SQOOP? WHY DO WE USE SQOOP?
SQOOP TYPES AND ARCHITECTURES
WHAT IS A REST API (WITH EXAMPLE)
FEATURES AND LIMITATIONS
FIVE STAGES OF IMPORTING DATA


Slide Content

BIGDATA ANALYSIS
YEAR 2018-19
SUBMITTED TO: Er. SHRIYA MAM, Asst. Prof.
SUBMITTED BY: DUSHHYANT KUMAR, ROLL NUMBER - 06

INDEX
1. What is Sqoop?
2. Why do we use Sqoop?
3. Sqoop Architecture
4. What is a REST API?
5. Difference Between Sqoop 1 & Sqoop 2
6. Features of Sqoop
7. Five Stages of Sqoop Import Overview
8. Importing & Exporting Data using Sqoop
9. Sqoop Limitations

What is Sqoop?
▪ Apache Sqoop is a tool in the Hadoop ecosystem designed to transfer data between HDFS (Hadoop storage) and relational database servers such as MySQL, Oracle, SQLite, Teradata, Netezza, Postgres, etc.
▪ It efficiently transfers bulk data between Hadoop and external data stores such as enterprise data warehouses and relational databases.
▪ This is how Sqoop got its name: "SQL to Hadoop & Hadoop to SQL".
▪ Sqoop transfers data between Hadoop and relational DB servers.
▪ Sqoop is used to import data from relational DBs such as MySQL and Oracle.
▪ Sqoop is used to export data from HDFS to a relational DB.

Why do we use Sqoop?
✓ A big data developer's work starts once the data is in a Hadoop system such as HDFS, Hive or HBase; from there they dig out the valuable information hidden in such a huge amount of data.
✓ Before Sqoop, developers had to write custom code to import and export data between Hadoop and an RDBMS, so a tool was needed to do the same.
✓ Sqoop uses the MapReduce mechanism for its import and export operations, which gives it parallelism as well as fault tolerance.
✓ In Sqoop, developers just need to specify the source and the destination; the rest of the work is done by the Sqoop tool.
✓ Sqoop thus filled the gap in data transfer between relational databases and the Hadoop system.

Sqoop Architecture
One side is a source, an RDBMS such as MySQL; the other side is a destination such as HDFS or HBase. Sqoop sits between the two and performs the import and export operations.
[Diagram: RDBMS (MySQL, Oracle) <-- IMPORT / EXPORT via the SQOOP TOOL --> Hadoop file system (HDFS, HBase, Hive)]

Difference Between Sqoop 1 & Sqoop 2

S.N. | Sqoop 1                         | Sqoop 2
1    | Client-only architecture        | Client/server architecture
2    | CLI based                       | CLI + web based
3    | Client access to Hive, HBase    | Server access to Hive, HBase
4    | Oozie and Sqoop tightly coupled | Oozie uses the REST API
[Diagrams: the Sqoop 1 and Sqoop 2 architectures compared side by side; both are shown in detail on the next two slides]

Sqoop1 Architecture
[Diagram: the user issues a COMMAND to SQOOP, which launches map-only tasks that move data between the sources (data warehouse, relational database, document-based system) and HDFS/HBase]
When Sqoop runs, only a mapper job is launched; no reducer is required. Here is a more detailed view of the Sqoop architecture with mappers:
1. Sqoop provides a command line interface to the end users and can also be accessed using a Java API.
2. Only the Map phase runs; the Reduce phase is not needed because the import and export process doesn't require any aggregation, so there is no need for reducers in Sqoop (see the sketch below).
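As a hedged illustration, the degree of parallelism comes from the number of map tasks. The command below is only a sketch; the JDBC URL, credentials, table and split column are hypothetical placeholders, not values from the original deck.

# A minimal sketch, assuming a reachable MySQL database with an orders table;
# -m controls how many map-only tasks split the import (no reduce phase runs).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username dbuser -P \
  --table orders \
  --split-by order_id \
  -m 4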

Sqoop 2 Architecture
[Diagram: the Sqoop client (CLI or user browser) talks to the SQOOP SERVER over a REST UI; the server's Job Manager, Connector Manager, Connectors and Metadata Repository drive map and reduce tasks that move data between the sources (data warehouse, relational database, document-based system) and HDFS/HBase/Hive]

What Is REST API
REST: Representational State Transfer (application programming interface).
REST is an architectural style (REST services can be written in Scala, Java, PHP, etc.).
It uses the HTTP protocol.
HTTP METHODS:
1. GET
2. POST
3. PUT
4. DELETE
This section explains how to use the Sqoop network API so that external applications can interact with the Sqoop server.
The REST API is a lower-level API than the Sqoop client API, which gives you the freedom to execute commands on the Sqoop server with any tool or programming language.
The REST API is accessed via HTTP requests and uses the JSON format to encode data content.
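As a hedged example, a REST call is just an HTTP request that returns JSON. The sketch below assumes a Sqoop 2 server running locally on its commonly used default port 12000 under the /sqoop path; host, port and path are assumptions, not values taken from this deck.

# A minimal sketch: ask the Sqoop 2 server for its version over plain HTTP.
# The server answers with a JSON document describing the build and API versions.
curl -X GET http://localhost:12000/sqoop/version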

What Is Client-Server REST API
[Diagram: client-server REST API illustration]

REST API Example
[Diagram: REST API request and response example]

Features of Sqoop
1. Full Load: Apache Sqoop can load a whole table with a single command. You can also load all the tables from a database using a single command.
2. Incremental Load: Apache Sqoop also provides the facility of incremental load, where you can load parts of a table whenever it is updated. Sqoop import supports two types of incremental imports: 1. append 2. lastmodified.
3. Parallel import/export: Sqoop uses the YARN framework to import and export the data, which provides fault tolerance on top of parallelism.
4. Import results of a SQL query: You can also import the result returned by a SQL query into HDFS.
5. Compression: You can compress your data using the deflate (gzip) algorithm with the --compress argument, or by specifying the --compression-codec argument. You can also load a compressed table into Apache Hive.
6. Connectors for all major RDBMS databases: Apache Sqoop provides connectors for multiple RDBMS databases, covering almost the entire range of commonly used databases.
7. Load data directly into Hive/HBase: You can load data directly into Apache Hive for analysis and also dump your data into HBase, which is a NoSQL database.
$ sqoop import --connect <jdbc-url> --table <table> --username <user> --password <pass> --incremental append --check-column <column> --last-value <value>
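As a hedged illustration of the incremental load and compression features, the sketch below fills in the placeholders with hypothetical values (the database, table and check column are made up); only rows whose order_id is greater than the last imported value are appended, and the output is compressed.

# A minimal sketch of an incremental append import with compression;
# the JDBC URL, table and check column are hypothetical, -P prompts for the password.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username dbuser -P \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 1000 \
  --compress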

Five Stages of Sqoop Import Overview
[Diagram: RDBMS sources (MySQL, SQL Server, Oracle, DB2) feed Sqoop map tasks under MapReduce, which write to the data sinks (HDFS, Hive, HBase)]
The five stages performed by the Sqoop client:
1. Run import
2. Pull metadata
3. Launch MapReduce job
4. Pull data from database
5. Write to data sink

Importing Data using Sqoop
[Diagram: the user runs a Sqoop import; the Sqoop job (1) gathers metadata from the RDBMS (e.g. an ORDERS table in Oracle) and (2) submits a map-only job to the Hadoop cluster, where each map task writes a file into HDFS]

Sqoop Import Data
▪ SQOOP Import
▪ Imports individual tables from an RDBMS into HDFS.
▪ Each row in a table is treated as a record in HDFS.
▪ All records are stored as text data in text files or in binary files.
▪ Generic Syntax and Importing a Table into HDFS Syntax: see the sketch after this list.
▪ --connect - Takes the JDBC URL and connects to the database.
--table - Source table name to be imported.
--username - Username used to connect to the database.
--password - Password of the connecting user.
--target-dir - Imports data into the specified HDFS directory.
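A minimal sketch of the import syntax referenced above; the JDBC URL, credentials, table and target directory below are hypothetical placeholders, not values from the original deck.

# Generic form (a sketch):
sqoop import --connect <jdbc-url> --username <user> --password <pass> --table <table> --target-dir <hdfs-dir>

# Example with hypothetical values: import the orders table into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username dbuser \
  --password dbpass \
  --table orders \
  --target-dir /user/hadoop/orders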

Exporting Data using Sqoop
[Diagram: the user runs a Sqoop export; the Sqoop job (1) gathers metadata from the target RDBMS (e.g. an ORDERS table in Oracle) and (2) submits a map-only job to the Hadoop cluster, where each map task reads a file from HDFS and writes its rows back to the database]

Sqoop Export Data
▪ SQOOP Export
▪ Exports a set of files from HDFS back to an RDBMS.
▪ The files given as input to Sqoop contain records, which become rows in the target table.
▪ Generic Syntax and Exporting a Table into RDBMS Syntax: see the sketch after this list.
▪ --connect - Takes the JDBC URL and connects to the database.
--table - Target table name to be exported into.
--username - Username used to connect to the database.
--password - Password of the connecting user.
--export-dir - HDFS directory from which the data is exported.
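A minimal sketch of the export syntax referenced above; as before, the JDBC URL, credentials, table and HDFS directory are hypothetical placeholders.

# Generic form (a sketch):
sqoop export --connect <jdbc-url> --username <user> --password <pass> --table <table> --export-dir <hdfs-dir>

# Example with hypothetical values: push the HDFS files back into the orders table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username dbuser \
  --password dbpass \
  --table orders \
  --export-dir /user/hadoop/orders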

Limitations of Sqoop
1. Sqoop cannot be paused and resumed; it is an atomic step. If it fails, we need to clean things up and start again.
2. Sqoop export performance also depends on the hardware configuration (memory, hard disk) of the RDBMS server.
3. Sqoop is slow because it still uses MapReduce for its backend processing.
4. Failures need special handling in the case of a partial import or export.
5. For a few databases Sqoop provides a bulk (direct) connector with faster performance; otherwise it falls back to a plain JDBC connection to the RDBMS, which can be inefficient and less performant (see the sketch below).
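As a hedged illustration of point 5, some connectors offer a direct (bulk) mode. The sketch below assumes a MySQL source with a hypothetical orders table; --direct asks Sqoop to use the database's native bulk tooling instead of plain JDBC.

# A sketch assuming a MySQL source; --direct switches the import to the bulk path
# (mysqldump-based for MySQL) instead of row-by-row JDBC reads.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username dbuser -P \
  --table orders \
  --direct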
