Hive and querying data

KarthigaGunasekaran1 93 views 10 slides Jul 04, 2019

About This Presentation

In Hive, tables and databases are created first and then data is loaded into these tables.
Hive is a data warehouse designed for managing and querying only structured data that is stored in tables.
When dealing with structured data, MapReduce lacks optimization and usability features like...


Slide Content

HIVE AND QUERYING DATA
By G. Karthiga, M.Sc. IT
Nadar Saraswathi College of Arts and Science, Theni

INTRODUCTION
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy. Hive was initially developed by Facebook; the Apache Software Foundation later took it up and developed it further as open source under the name Apache Hive.

FEATURES OF HIVE
It stores the schema in a database and the processed data in HDFS. It is designed for OLAP. It provides a SQL-like query language called HiveQL (HQL). It is familiar, fast, scalable, and extensible.
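The split between a schema kept in a database and data kept as files can be illustrated with a small sketch. This is not Hive code: the `METASTORE` dictionary, the `page_views` table, its columns, the path, and the tab delimiter are all hypothetical, chosen only to show a schema being applied to raw text at read time.

```python
# Hypothetical stand-in for the metastore: it holds the schema and the data
# location, while the data itself stays in plain files (HDFS in Hive's case).
METASTORE = {
    "page_views": {
        "columns": [("user_id", int), ("url", str), ("seconds", float)],
        "location": "/warehouse/page_views",  # assumed path, for illustration
        "delimiter": "\t",                    # assumed field delimiter
    }
}

# Raw lines as they might sit in a data file; the last row has an empty field.
raw_lines = ["42\t/home\t3.5", "7\t/about\t1.25", "9\t/contact\t"]


def deserialize(table_name, lines):
    """Apply the stored schema to raw text lines when they are read."""
    table = METASTORE[table_name]
    rows = []
    for line in lines:
        fields = line.split(table["delimiter"])
        row = {}
        for (name, cast), text in zip(table["columns"], fields):
            # An empty field is treated as a null value in this sketch.
            row[name] = cast(text) if text != "" else None
        rows.append(row)
    return rows


rows = deserialize("page_views", raw_lines)
for row in rows:
    print(row)
```

The data file carries no type information of its own; changing the entry in the hypothetical `METASTORE` changes how the same bytes are interpreted.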

HIVE VS RELATIONAL DATABASES
• An RDBMS typically handles data sizes in the tens of terabytes, whereas Hive can handle hundreds of petabytes.
• Hive scales easily at low cost; an RDBMS does not scale at low cost.
• In an RDBMS, record-level updates, inserts, and deletes are possible, whereas Hive traditionally does not allow them (newer Hive releases support them only on transactional tables).

ARCHITECTURE OF HIVE
Three major components:
• Hive clients
• Hive services
• Metastore
Hive clients include:
• Thrift client
• ODBC driver
• JDBC driver

HIVE APPLICATIONS
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence (e.g., Google Analytics)
• Predictive modeling, hypothesis testing

HIVE COMPONENTS
• Shell: allows interactive queries, like a MySQL shell connected to a database; also supports web and JDBC clients
• Driver: session handles, fetch, execute
• Compiler: parse, plan, optimize
• Execution engine: DAG of stages (M/R, HDFS, or metadata)
• Metastore: schema, location in HDFS, SerDe
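The idea of an execution engine running a DAG of stages can be sketched with Python's standard `graphlib` module. The stage names and the shape of the plan below are invented for illustration; a real Hive plan is produced by the compiler and is not represented this way internally.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each stage maps to the set of stages it depends on.
# The three kinds of stage mirror the slide: M/R jobs, HDFS operations,
# and metadata operations.
plan = {
    "mr_scan_orders": set(),
    "mr_scan_customers": set(),
    "mr_join": {"mr_scan_orders", "mr_scan_customers"},
    "hdfs_move_result": {"mr_join"},
    "metadata_update": {"hdfs_move_result"},
}


def run_plan(stages):
    """Visit stages so that every stage runs after all of its dependencies."""
    executed = []
    for stage in TopologicalSorter(stages).static_order():
        # A real engine would submit a job here; the sketch only records order.
        executed.append(stage)
    return executed


order = run_plan(plan)
print(order)
```

The two scan stages have no dependency on each other, so their relative order is not fixed; only the constraint that both finish before the join is guaranteed.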

DATA MODEL
• Tables: typed columns (int, float, string, date, boolean), plus list and map types (for JSON-like data)
• Partitions: e.g., to range-partition tables by date
• Buckets: hash partitions within ranges (useful for sampling and join optimization)
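How partitions and buckets divide a table can be sketched as follows. The column names (`event_date`, `user_id`), the bucket count, and the use of CRC32 as the hash are assumptions made for the example; Hive uses its own hash function and stores each partition and bucket as directories and files rather than in-memory lists.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_id(user_id, num_buckets=NUM_BUCKETS):
    """Assign a bucket with a stable hash (CRC32 here, not Hive's own hash)."""
    return zlib.crc32(str(user_id).encode("utf-8")) % num_buckets


def lay_out(rows):
    """Group rows by date partition first, then by hash bucket within it."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row["event_date"]][bucket_id(row["user_id"])].append(row)
    return table


rows = [
    {"event_date": "2019-07-03", "user_id": 101, "url": "/home"},
    {"event_date": "2019-07-03", "user_id": 202, "url": "/about"},
    {"event_date": "2019-07-04", "user_id": 101, "url": "/cart"},
    {"event_date": "2019-07-04", "user_id": 303, "url": "/home"},
]
table = lay_out(rows)

# A query restricted to one date only needs to read that partition.
one_day = [r for bucket in table["2019-07-04"].values() for r in bucket]

# Sampling can read a single bucket from every partition.
sample = [r for part in table.values() for r in part.get(0, [])]
```

Because the bucket depends only on the hashed column, the same `user_id` lands in the same bucket number in every partition, which is what makes bucket-wise sampling and joins possible.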

HIVE DATA TYPES
All data types in Hive fall into four categories:
1) Column types
2) Literals
3) Null values
4) Complex types

CONCLUSIONS
• Supports rapid iteration of ad-hoc queries.
• Can perform complex joins with minimal code.
• Scales to handle much more data than many similar systems.