HBase

AmitkumarPal21 · Oct 17, 2018

About This Presentation

This presentation covers the HBase architecture, the HBase data model, and related topics.


Slide Content

HBase: Overview, Data Models, Row-Oriented vs. Column-Oriented. Amit Pal, Moiz Patvi

HBase HBase is a distributed, column-oriented data store built on top of HDFS. HBase is an Apache open-source project whose goal is to provide storage for Hadoop distributed computing. HBase does not support a structured query language like SQL; in fact, HBase isn't a relational data store at all. HBase applications are written in Java, much like a typical Apache MapReduce application.
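As a minimal, hypothetical sketch (not from the slides) of what such a Java application looks like, the snippet below opens a client connection with the standard HBase Java client. It assumes an 'emp' table already exists and that hbase-site.xml (with the ZooKeeper quorum) is on the classpath.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseHello {
    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, etc.).
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            // The Table handle is the entry point for Get/Put/Scan operations.
            System.out.println("Connected to table: " + table.getName());
        }
    }
}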

HBase Part Of Hadoop Ecosystem

HBase Architecture

HBase Components Master Server (HMaster): Assigns regions to the region servers, taking the help of Apache ZooKeeper for this task. Handles load balancing of the regions across region servers: it unloads busy servers and shifts their regions to less occupied servers. The HMaster performs DDL operations (create and delete tables) and assigns regions to the region servers, as shown in the architecture diagram.

It assigns regions to the region servers on startup and re-assigns them during recovery and load balancing. It also provides an interface for creating, deleting, and updating tables, as sketched below.
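As a rough illustration of the DDL path served by the HMaster, here is a hypothetical sketch using the HBase 2.x Admin API; the 'emp' table and 'personal' column family are illustrative names, not part of the slides.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName name = TableName.valueOf("emp");
            // Each create/delete request is a DDL operation handled by the HMaster.
            TableDescriptor desc = TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                    .build();
            if (!admin.tableExists(name)) {
                admin.createTable(desc);
            }
            // Dropping a table requires disabling it first:
            // admin.disableTable(name);
            // admin.deleteTable(name);
        }
    }
}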

Region Server: Many regions are assigned to a region server, which is responsible for handling, managing, and executing read and write operations on that set of regions. It communicates with clients and handles data-related operations. Region: A region contains all the rows between the start key and the end key assigned to it. HBase tables can be divided into a number of regions in such a way that all the columns of a column family are stored in one region. Each region holds its rows in sorted order.

Each store contains a MemStore and HFiles. The MemStore acts like a cache: anything written to HBase is stored there first. Later, the data is written out to HFiles as blocks and the MemStore is flushed.

ZooKeeper: ZooKeeper is an open-source project that provides services such as maintaining configuration information, naming, and distributed synchronization. Clients locate region servers via ZooKeeper. Every region server, along with the HMaster, sends a continuous heartbeat at regular intervals to ZooKeeper, which tracks which servers are alive and available, as shown in the architecture diagram. It also provides server-failure notifications so that recovery measures can be executed.

There is also an inactive HMaster, which acts as a backup for the active one. If the active server fails, the backup takes over. The active HMaster sends heartbeats to ZooKeeper, while the inactive HMaster listens for the notifications sent by the active HMaster. If the active HMaster fails to send a heartbeat, its session is deleted and the inactive HMaster becomes active.

Limitations of Hadoop Hadoop can perform only batch processing, and data is accessed only in a sequential manner. That means one has to scan the entire dataset even for the simplest of jobs. A huge dataset, when processed, results in another huge dataset, which must also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access).

Hadoop Random Access Database Applications such as HBase, Cassandra, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner.

What Is HBase? HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. HBase has a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS). It is part of the Hadoop ecosystem and provides random, real-time read/write access to data in HDFS. One can store data in HDFS either directly or through HBase. A data consumer reads and accesses the data in HDFS randomly using HBase, which sits on top of HDFS and provides read and write access.

HBase Data Model Data stored in HBase is located by its rowkey, which is like a primary key in a relational database. Records in HBase are stored in sorted order, according to rowkey. Data in a row are grouped together as column families. Each column family has one or more columns. The columns in a family are stored together in a low-level storage file known as an HFile.
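To make the model concrete, here is a hypothetical Java sketch (not part of the slides) that writes and reads one value addressed by row key, column family, and column qualifier; the 'emp' table and 'personal:name' column are illustrative names.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            // Row key "101", column family "personal", column qualifier "name".
            Put put = new Put(Bytes.toBytes("101"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("ABC"));
            table.put(put);

            // Read the same cell back by row key + family + qualifier.
            Get get = new Get(Bytes.toBytes("101"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value)); // prints ABC
        }
    }
}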

Data Model (Region) Tables are divided into sequences of rows, by key range, called regions. These regions are assigned to nodes in the cluster called region servers.
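A hypothetical sketch (not in the slides) of inspecting a table's regions and the region server each is assigned to, using the standard RegionLocator API; the 'emp' table name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator = connection.getRegionLocator(TableName.valueOf("emp"))) {
            for (HRegionLocation location : locator.getAllRegionLocations()) {
                // Each region covers a contiguous row-key range [startKey, endKey).
                System.out.printf("region [%s, %s) -> %s%n",
                        Bytes.toStringBinary(location.getRegion().getStartKey()),
                        Bytes.toStringBinary(location.getRegion().getEndKey()),
                        location.getServerName());
            }
        }
    }
}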

Data Model (Column Family) A column is identified by its column family name concatenated with the column qualifier using a colon, e.g. personaldata:name. Column families are mapped to storage files and are stored in separate files, which can also be accessed separately.

Data Model (Cell) Data in HBase tables is stored in cells. A cell is the combination of row key, column family, and column qualifier, and it contains a value and a timestamp. The key consists of the row key, column family name, column qualifier, and timestamp. The entire cell, with this added structural information, is called a KeyValue.
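The sketch below (hypothetical, not from the slides) prints the full key of each cell in one row, showing the row key, column family, column qualifier, and timestamp that together form the KeyValue.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            Result result = table.get(new Get(Bytes.toBytes("101")));
            for (Cell cell : result.rawCells()) {
                // Each Cell carries the complete key plus the value.
                System.out.printf("row=%s family=%s qualifier=%s ts=%d value=%s%n",
                        Bytes.toString(CellUtil.cloneRow(cell)),
                        Bytes.toString(CellUtil.cloneFamily(cell)),
                        Bytes.toString(CellUtil.cloneQualifier(cell)),
                        cell.getTimestamp(),
                        Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}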

Column-Oriented vs. Row-Oriented Column-oriented databases store data tables as sections of columns of data rather than as rows of data; in short, they have column families.

Row-Oriented Database: suitable for Online Transaction Processing (OLTP); designed for a small number of rows and columns.
Column-Oriented Database: suitable for Online Analytical Processing (OLAP); designed for huge tables.

The representation below shows column families in a column-oriented database:

Normal row representation:
ID: 101, NAME: ABC, SALARY: 100
ID: 102, NAME: PQR, SALARY: 200

Normal column representation:
ID: 101, 102
NAME: ABC, PQR
SALARY: 100, 200

HBase representation (ROW, COLUMN+CELL):
101  column=cf:name,   timestamp=t1, value=ABC
101  column=cf:salary, timestamp=t2, value=100
102  column=cf:name,   timestamp=t3, value=PQR
102  column=cf:salary, timestamp=t4, value=200
(cf = name of the column family)

HBase Shell Commands
To start the shell at the command line: $ hbase shell
Getting the status of the system: hbase(main):002:0> status
Creating a table: hbase(main):005:0> create 'emp', 'personal details', 'professional details'
Describing the table: hbase(main):017:0> describe 'emp'
Listing the tables present: hbase(main):001:0> list
Inserting data into the table: hbase(main):018:0> put 'emp', '1', 'personal details:name', 'Ram'

Viewing the records inserted in the table: hbase(main):023:0> scan 'emp'
Getting a record from the HBase table: hbase(main):026:0> get 'emp', '1'
Getting a specific column from the record: hbase(main):002:0> get 'emp', '1', {COLUMN => 'personal details:name'}
Dropping a table (it must first be disabled, then dropped):
hbase(main):016:0> disable 'emp'
hbase(main):017:0> drop 'emp'
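For comparison, here is a hypothetical Java equivalent of the scan and get commands above, using the standard HBase client and the 'emp' table created earlier in the deck.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"));
             // Equivalent of: scan 'emp'
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result row : scanner) {
                System.out.println(row);
            }
            // Equivalent of: get 'emp', '1', {COLUMN => 'personal details:name'}
            Get get = new Get(Bytes.toBytes("1"));
            get.addColumn(Bytes.toBytes("personal details"), Bytes.toBytes("name"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("personal details"), Bytes.toBytes("name"))));
        }
    }
}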

Applications of HBase HBase is used for write-heavy applications and whenever fast, random access to data is needed. Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

Thank You