Big data analytics -hive

karthikakarthi9 3,346 views 44 slides Feb 25, 2016
Slide 1
Slide 1 of 44
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44

About This Presentation

Detailed explanation on hive basics, commands and Hbase


Slide Content

WDABT 2016 – BHARATHIAR UNIVERSITY
1
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

2
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT
2016

component of
3
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT
2016

M-slO-lsip27-7 Structure Data–7szip27-7pMi-
Large Data Set87Ssi9lOi
MapreduceD7s7CCiCp
2Tv-sT0l-TyH Parallel
Distribution1lisAp27-7
Query Data
Why HIVE
4
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Features of hive
5
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

HDFS or HBASE STORAGE SYSTEM
Execution Engine
Hive QL Process Engine 1dop65
WEB UI35bdp
q388cE2p
–5Ed
HIVE
COMMAND
LINE32p5HvTzL-
HD Insight
Meta Store
User
Interface
HIVE Architecture
6
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Embedded Metastore
Local
Metastore Remote Metastore
7
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Hive File formats
•Text Files - Delimited by Parameters
•Sequence Files - Less Data
•RC Files - Analytic Processing
•ORC Files – Optimized file format in binary
format
8
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT
2016

Hive query language offers:
 Create Database
Create ,manage and partition tables
Supports various operators like Relational, Arithmetic and
Logical to evaluate functions
Hive supports DDL and DML
HIVE Query Language (HQL)
9
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT
2016

DDL Data Definition Language)
Statements
The DDL commands are listed below
Create, Alter, Drop database
Create Alter, Drop, Truncate table
Create, Alter with Partitioning and Bucketing
Create Views
Show
Describe
10
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Loading files
Inserting data into Hive Tables from queries
DML (Data Manipulation
Language) Statements
11
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Database Operations
Syntax
CREATE DATABASE IF NOT EXISTS db_name
COMMENT ‘db_name Details’
WITH DBPROPERTIES (‘creator’ = ‘name’);
Example
CREATE DATABASE IF NOT EXISTS LIBDETS
COMMENT ’LIBRARY DETAILS’
WITH DBPROPERTIES (‘creator’ = ‘KIRUTHI’);
12
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Database
OperationsSyntax
SHOW DATABASES // displays databases available
Example
SHOW DATABASES;
Syntax
DESCRIBE DATABASE db_name; //display Schema of database
DESCRIBE DATABASE EXTENDED db_name;

Example
DESCRIBE DATABASE LIBDETS;
DESCRIBE DATABASE EXTENDED LIBDETS
13
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

ALTER Database
Syntax
ALTER DATABASE db_name // Alter database properties
SET DBPROPERTIES (‘edited-by’ = ‘name’);
Example
ALTER DATABASE LIBDETS
SET DBPROPERTIES (‘edited-by’ = ‘KANI’);
14
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

USE , DROP Database
Syntax
USE db_name; //Assign database as current working database
Example
USE LIBDETS;
Syntax
DROP DATABASE db_name; // delete database
Example
DROP DATABASE LIBDETS;
15
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

TABLES
Hive supports two types of tables
Managed Table – Table stored in
HiveWarehouse folder
External Table – Retains a schema copy in
specified location even table is deleted
16
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Creating Managed Table
Syntax
CREATE TABLE IF NOT EXISTS tb_name (column_name
data_type, column_name datatype,column_name data type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY
‘\t’ ;
Example
CREATE TABLE IF NOT EXISTS LIBTBL ( Member_Code
INT,Membr_Name STRING, Designation STRING,Dept_code
INT,dept_name STRING,group_name STRING,course_name
STRING,title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY
‘\t’ ;
Managed Table
17
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

External Table.
Creating External Table
Syntax
CREATE EXTERNAL TABLE tb_name IF NOT EXISTS
tb_name (column_name datatype, column_name datatype,
column_name datatype)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’
LOCATION ‘ /home/usr/filename.format’;

Example
CREATE EXTERNAL TABLE IF NOT EXISTS LIBTBL
(Member_Code INT, Member_Name STRING, Designation
STRING, Dept_code INT, course_code INT, dept_name STRING,
group_name STRING, course_name STRING, title STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’
LOCATION ‘/home/livrith/Desktop/Book2.csv’;
18
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Loading Data into Table
Syntax
LOAD DATA LOCAL INPATH
‘hdfs_file_or_directory_path’
OVERWRITE INTO TABLE tb_name;
Example
LOAD DATA LOCAL INPATH
‘/home/kiruthika/Documents/Book2.csv’
OVERWRITE INTO TABLE LIBTBL;
19
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Select clause
Syntax
SELET [ALL | DISTINCT] select_expr, select_expr, . . .
FROM tb_name
[WHERE where_conditon]
[GROUP BY column_name]
[ORDER BY column_name]
[HAVING having_condition]
[DISTRIBUTED column_name]
[LIMIT number];

Example:1
SELECT * FROM LIBTBL;
Example:2
SELECT Member Name, Designation FROM LIBTBL;
20
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Select – where
Example
SELECT * FROM LIBUDET WHERE group_name =
‘TEACHING’
OR group_name = ‘student’
AND Dept_name>= ‘18’;
Select - regular expression
Syntax
SELECT column1,column2,column3 FROM tb_name WHERE
column_name LIKE ‘%alp%’;
Example
SELECT PRODUCT, STATE, CITY FROM SALESDETS
WHERE City LIKE ‘%O%’;
21
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Group by
Example
SELECT PRODUCT, COUNT(PRODUCT)AS C1, STATE,
COUNTRY FROM SALESDETS GROUP BY PRODUCT,
STATE;
Order by // Sorts use only one reducer
Example
SELECT PRODUCT, STATE, PRICE, COUNTRY FROM
SALESDETS
ORDER BY COUNTRY;
22
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Sort by // Sorts the data before given to reducer
Example
SELECT PRODUC,STATE,COUNTRY FROM SALESDETS
SORT BY COUNTRY
LIMIT 10;
Having // Filter data based on Group By
Example
SELECT PRODUCT, COUNT(PRODUCT) AS
C1,STATE,COUNTRY FROM SALESDETS
GROUP BY PRODUCT, STATE, COUNTRY
HAVING C1 > 5;
23
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Limit
Example
SELECT PRODUCT,STATE, PRICE, COUNTRY FROM
SALESDETS COUNTRY LIMIT 10;
Distribute by // distributes rows among reducers
Syntax
SELECT column_name1, column_name2,column_name3 FROM
tb_name DISTRIBUTE BY column_name SORT BY column_name
ASC,column_name ASC LIMIT count;
Example
SELECT PRODUCT,PRICE,STATE FROM SALESDETS
DISTRIBUTE BY STATE
SORT BY STATE ASC, PRODUCT ASC
LIMIT 50;
24
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Cluster by // does the job of both distribute by and sort by
Example
SELECT PRODUCT,PRICE,STATE FROM SALESDETS
CLUSTER BY STATE LIMIT 50;
Difference in Execution of Order By , Sort By, Distribute By, Cluster By
25
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Data Aggregation
COUNT
AVG DISTINCT (AVG)
MIN DISTINCT(MIN)
MAX , DISTINCT(MAX)
26
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Partitions
Hive reads the entire dataset from warehouse even when filter
condition is specified to fetch a particular column. This results as
bottleneck in MapReduce jobs and involves huge degree of I/O.
Partition command is used to break larger dataset into small
chunks on columns.
Hive supports two types of partition
Static partition
Dynamic partition
27
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Creating partition table
Syntax
CREATE TABLE tb_name (column1 datatype, column2
datatype,column3 datatype)
COMMENT ‘Details of the dataset’
PARTITIONED BY (column_name STRING) ROW FORMAT
DELIMITED FIELDS TERMINATED BY ‘,’;
Example
CREATE TABLE MY_TABLE1 (Member_Name STRING,dept_name
STRING,group_name STRING,course_name STRING,title STRING)
COMMENT ‘User information’ PARTITIONED BY (Designation
STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY
‘,’;
28
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Load data into static partition table
Syntax
LOAD DATA LOCAL INPATH ‘file_path’ OVERWRITE
INTO TABLE tb_name;
Example
LOAD DATA LOCAL INPATH
‘/home/livrith/Desktop/mytab.csv’ OVERWRITE INTO
TABLE MY_TABLE2;
29
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Set dynamic partition
The following setting has to be modified to execute
dynamic partitions.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Example
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
30
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Insert data - Dynamic partition table
Syntax
INSERT OVERWRITE TABLE 1
st
_tb_name
PARTITION(column_name) SELECT
column_name1,column_name2,column_name3 FROM
2
nd
_tb_name;
//partition field should be the last attribute when inserting data
Example
INSERT OVERWRITE TABLE MY_TABLE1
PARTITION(Designation)
SELECT Member_Name,dept_name,group_name,
course_name,title,Designation FROM MY_TABLE2;
31
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Bucketing
32
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Bucketing
Bucketing is similar to partitioning.
Bucket is a file.
Bucket are used to create partition on specified column values
where as partitioning is used to divided data into small blocks on
columns.
33
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Table creation
Syntax
CREATE TABLE IF NOT EXISTS tb_name (column1
datatype,column2 datatype,column3 datatype) CLUSTER
BY(column_name) into 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY
‘/t’;
Example
CREATE TABLE SALES_BUC1 (Transacyion_date
TIMESTAMP,Product STRING,Price INT,Payment_Type
STRING,Name STRING,City STRING,State STRING,Country
STRING,Account_Created TIMESTAMP) CLUSTERED BY
(Price) into 3 BUCKETS ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘,’;
34
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Load data into table
Syntax
FROM 1
st
_tb_name INSERT OVERWRITE TABLE
2
nd
_tb_name
SELECT column_name1, column_name2,column_name3;
Example
FROM SALESDETS INSERT OVERWRITE TABLE
SALES_BUC1 SELECT
Transaction_date,Product,Price,Payment_Type,Name,City,Sta
te,Country,Account_Created;
35Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Select from bucket table
Syntax:1
SELECT DISTINCT column_name FROM 2
nd
_tb_name
tb_name (BUCKET 1 OUT OF 3 ON column_name);
Example
SELECT DISTINCT Price FROM SALES_BUC1
TABLESAMPLE (BUCKET 1 OUT OF 3 ON PRICE);
Syntax:2
SELECT DISTINCT column_name FROM tb_name2
Tb_name(BUCKET 1 OUT OF 2 ON column_name);
Example
SELECT DISTINCT PRICE FROM SALES_BUC1
TABLESAMPLE(BUCKET 1 OUT OF 2 ON Price);
36
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Sampling
•SAMPLING is used in hive to populate small dataset from
the existing large datasets. Sampling employs selects records
randomly to create small datasets.
Syntax
SELECT COUNT(*) FROM tb_name TABLESAMPLE
(BUCKET 1 OUT OF 3 ON column_name);
Example
In the example given below sample are created from the table
sales_buc from the available 3 buckets.
SELECT COUNT(*) FROM SALES_BUC TABLESAMPLE
(BUCKET 1 OUT OF 3 ON Price);
37Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

•Apache HBase is an open-source, distributed, versioned,
non-relational database modeled after Google's Bigtable
•Apache HBase provides Bigtable-like capabilities on top
of Hadoop and HDFS.
38
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

NoSQL Databases
•NoSQL – Not only SQL, Non Relational/Non
SQL Databases
•SCHEMA LESS
•Ideology
•BASE – Basically available Eventual
Consistency - Only can support two
availabilty, replication
39
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

NoSQL Types
•Key Value Store - Amazon S3, Riak
•Document based store – CouchDB,MongoDB
•Column based store - Hbase, Cassandra
•Graph based stores - Neoj4, Orientdb
40
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

HBASE is Not
•Table with one primary key (row key)
•No Join Operations
•Limited Atomicty and transaction support
•Manipulated by SQL
41
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Hbase components
•Master - Manages load balancing and scripting
•Regionserver – Range of tables assigned by master
Zookeper –
•Client communicate via Zookeeper for read write
operations in region servers for storing node details
•Region server uses Memstore similar to cache
memory
•Provides services for synchronization, maintenance
42
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Refrences
•http://hadooptutorials.co.in/tutorials
•https://www.youtube.com/watch?v=W_oUrDBLBaE
•https://flume.apache.org/FlumeUserGuide.html
•https://archive.cloudera.com/cdh/3/sqoop/SqoopUser
Guide.html#_basic_usage
•http://hortonworks.com/hadoop/oozie/
•http://www.01.ibm.com/software/data/infosphere/ha
doop/zookeeper/
• https://www.youtube.com/watch?v=Dv2V7lbIRmI
•http://kafka.apache.org/documentation.html
•https://www.youtube.com/watch?v=ArUHr3Czx-8
43
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

44
Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016
Tags