HBase

AmitkumarPal21 · Oct 17, 2018

About This Presentation

This presentation covers the HBase architecture, the HBase data model, and related topics.


Slide Content

HBase: Overview, Data Models, Row-Oriented vs. Column-Oriented. Amit Pal, Moiz Patvi

HBase HBase is a distributed, column-oriented data store built on top of HDFS. HBase is an Apache open-source project whose goal is to provide storage for Hadoop distributed computing. HBase does not support a structured query language like SQL; in fact, HBase isn't a relational data store at all. HBase applications are written in Java, much like a typical Apache MapReduce application.
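As a minimal, hypothetical sketch (not from the slides) of what such a Java application looks like, the snippet below opens a client connection with the standard HBase Java client. It assumes an 'emp' table already exists and that hbase-site.xml (with the ZooKeeper quorum) is on the classpath.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseHello {
    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, etc.).
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            // The Table handle is the entry point for Get/Put/Scan operations.
            System.out.println("Connected to table: " + table.getName());
        }
    }
}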

HBase Part Of Hadoop Ecosystem

HBase Architecture

HBase Components Master Server (HMaster): Assigns regions to the region servers, taking the help of Apache ZooKeeper for this task. Handles load balancing of the regions across region servers: it unloads busy servers and shifts their regions to less occupied servers. The HMaster performs DDL operations (create and delete tables) and assigns regions to the region servers, as shown in the architecture diagram.

It assigns regions to the region servers on startup and re-assigns them during recovery and load balancing. It also provides an interface for creating, deleting, and updating tables, as sketched below.
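As a rough illustration of the DDL path served by the HMaster, here is a hypothetical sketch using the HBase 2.x Admin API; the 'emp' table and 'personal' column family are illustrative names, not part of the slides.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName name = TableName.valueOf("emp");
            // Each create/delete request is a DDL operation handled by the HMaster.
            TableDescriptor desc = TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                    .build();
            if (!admin.tableExists(name)) {
                admin.createTable(desc);
            }
            // Dropping a table requires disabling it first:
            // admin.disableTable(name);
            // admin.deleteTable(name);
        }
    }
}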

Region Server: Many regions are assigned to a region server, which is responsible for handling, managing, and executing read and write operations on that set of regions. It communicates with clients and handles data-related operations. Region: A region contains all the rows between the start key and the end key assigned to it. HBase tables can be divided into a number of regions in such a way that all the columns of a column family are stored in one region. Each region holds its rows in sorted order.

Each store contains a MemStore and HFiles. The MemStore acts like a cache: anything written to HBase is stored there first. Later, the data is written out to HFiles as blocks and the MemStore is flushed.

ZooKeeper: ZooKeeper is an open-source project that provides services such as maintaining configuration information, naming, and distributed synchronization. Clients locate region servers via ZooKeeper. Every region server, along with the HMaster, sends a continuous heartbeat at regular intervals to ZooKeeper, which tracks which servers are alive and available, as shown in the architecture diagram. It also provides server-failure notifications so that recovery measures can be executed.

There is also an inactive HMaster, which acts as a backup for the active one. If the active server fails, the backup takes over. The active HMaster sends heartbeats to ZooKeeper, while the inactive HMaster listens for the notifications sent by the active HMaster. If the active HMaster fails to send a heartbeat, its session is deleted and the inactive HMaster becomes active.

Limitations of Hadoop Hadoop can perform only batch processing, and data is accessed only in a sequential manner. That means one has to scan the entire dataset even for the simplest of jobs. A huge dataset, when processed, results in another huge dataset, which must also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access).

Hadoop Random Access Database Applications such as HBase, Cassandra, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner.

What Is HBase? HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. HBase has a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS). It is part of the Hadoop ecosystem and provides random, real-time read/write access to data in HDFS. One can store data in HDFS either directly or through HBase. A data consumer reads and accesses the data in HDFS randomly using HBase, which sits on top of HDFS and provides read and write access.

HBase Data Model Data stored in HBase is located by its rowkey, which is like a primary key in a relational database. Records in HBase are stored in sorted order, according to rowkey. Data in a row are grouped together as column families. Each column family has one or more columns. The columns in a family are stored together in a low-level storage file known as an HFile.
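To make the model concrete, here is a hypothetical Java sketch (not part of the slides) that writes and reads one value addressed by row key, column family, and column qualifier; the 'emp' table and 'personal:name' column are illustrative names.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            // Row key "101", column family "personal", column qualifier "name".
            Put put = new Put(Bytes.toBytes("101"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("ABC"));
            table.put(put);

            // Read the same cell back by row key + family + qualifier.
            Get get = new Get(Bytes.toBytes("101"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value)); // prints ABC
        }
    }
}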

Data Model (Region) Tables are divided into sequences of rows, by key range, called regions. These regions are assigned to nodes in the cluster called region servers.
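A hypothetical sketch (not in the slides) of inspecting a table's regions and the region server each is assigned to, using the standard RegionLocator API; the 'emp' table name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator = connection.getRegionLocator(TableName.valueOf("emp"))) {
            for (HRegionLocation location : locator.getAllRegionLocations()) {
                // Each region covers a contiguous row-key range [startKey, endKey).
                System.out.printf("region [%s, %s) -> %s%n",
                        Bytes.toStringBinary(location.getRegion().getStartKey()),
                        Bytes.toStringBinary(location.getRegion().getEndKey()),
                        location.getServerName());
            }
        }
    }
}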

Data Model (Column Family) A column is identified by its column family name concatenated with the column qualifier using a colon, e.g. personaldata:name. Column families are mapped to storage files and are stored in separate files, which can also be accessed separately.

Data Model (Cell) Data in HBase tables is stored in cells. A cell is the combination of row key, column family, and column qualifier, and it contains a value and a timestamp. The key consists of the row key, column family name, column qualifier, and timestamp. The entire cell, with this added structural information, is called a KeyValue.
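The sketch below (hypothetical, not from the slides) prints the full key of each cell in one row, showing the row key, column family, column qualifier, and timestamp that together form the KeyValue.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            Result result = table.get(new Get(Bytes.toBytes("101")));
            for (Cell cell : result.rawCells()) {
                // Each Cell carries the complete key plus the value.
                System.out.printf("row=%s family=%s qualifier=%s ts=%d value=%s%n",
                        Bytes.toString(CellUtil.cloneRow(cell)),
                        Bytes.toString(CellUtil.cloneFamily(cell)),
                        Bytes.toString(CellUtil.cloneQualifier(cell)),
                        cell.getTimestamp(),
                        Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}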

Column-Oriented vs. Row-Oriented Column-oriented databases store data tables as sections of columns of data rather than as rows of data; in short, they have column families.

Row-Oriented Database: suitable for Online Transaction Processing (OLTP); designed for a small number of rows and columns.
Column-Oriented Database: suitable for Online Analytical Processing (OLAP); designed for huge tables.

The representation below shows column families in a column-oriented database:

Normal row representation:
ID: 101, NAME: ABC, SALARY: 100
ID: 102, NAME: PQR, SALARY: 200

Normal column representation:
ID: 101, 102
NAME: ABC, PQR
SALARY: 100, 200

HBase representation (ROW, COLUMN+CELL):
101  column=cf:name,   timestamp=t1, value=ABC
101  column=cf:salary, timestamp=t2, value=100
102  column=cf:name,   timestamp=t3, value=PQR
102  column=cf:salary, timestamp=t4, value=200
(cf = name of the column family)

HBase Shell Commands
To start the shell at the command line: $ hbase shell
Getting the status of the system: hbase(main):002:0> status
Creating a table: hbase(main):005:0> create 'emp', 'personal details', 'professional details'
Describing the table: hbase(main):017:0> describe 'emp'
Listing the tables present: hbase(main):001:0> list
Inserting data into the table: hbase(main):018:0> put 'emp', '1', 'personal details:name', 'Ram'

Viewing the records inserted in the table: hbase(main):023:0> scan 'emp'
Getting a record from the HBase table: hbase(main):026:0> get 'emp', '1'
Getting a specific column from the record: hbase(main):002:0> get 'emp', '1', {COLUMN => 'personal details:name'}
Dropping a table (it must first be disabled, then dropped):
hbase(main):016:0> disable 'emp'
hbase(main):017:0> drop 'emp'
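For comparison, here is a hypothetical Java equivalent of the scan and get commands above, using the standard HBase client and the 'emp' table created earlier in the deck.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"));
             // Equivalent of: scan 'emp'
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result row : scanner) {
                System.out.println(row);
            }
            // Equivalent of: get 'emp', '1', {COLUMN => 'personal details:name'}
            Get get = new Get(Bytes.toBytes("1"));
            get.addColumn(Bytes.toBytes("personal details"), Bytes.toBytes("name"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("personal details"), Bytes.toBytes("name"))));
        }
    }
}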

Applications of HBase HBase is used for write-heavy applications and whenever fast, random access to data is needed. Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

Thank You