In this session you will learn:
HBase Introduction
Row & Column storage
Characteristics of a huge DB
What is HBase?
HBase Data-Model
HBase vs RDBMS
HBase architecture
HBase in operation
Loading Data into HBase
HBase shell commands
HBase operations through Java
HBase operations through MR
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
Size: 672.02 KB
Language: en
Added: Aug 01, 2018
Slides: 24 pages
Slide Content
Big Data and Hadoop Training: HBase
Agenda
HBase Introduction
Row & Column storage
Characteristics of a huge DB
What is HBase?
HBase Data-Model
HBase vs RDBMS
HBase architecture
HBase in operation
Loading Data into HBase
HBase shell commands
HBase operations through Java
HBase operations through MR
What is HBase?
Open source project built on top of Apache Hadoop
NoSQL database
Distributed, scalable store
Column-family datastore
How do you pick SQL or NoSQL?
What does your data look like?
Is your data model likely to change?
Is your data growing exponentially?
Will you be doing real-time analytics on operational data?
Inspiration for HBase
Google's BigTable is the inspiration for HBase; it is designed to run on a cluster of computers.
Characteristics of BigTable:
Data is 'sparse'
Data is stored as a 'sorted map'
'Distributed'
'Multi-dimensional'
'Consistent'
HBase vs RDBMS
HBase | RDBMS
Data that is accessed together is stored together | Data is normalized
Column-oriented | Row-oriented (mostly)
Flexible schema, can add columns on the fly | Fixed schema
Good with sparse tables | Not optimized for sparse tables
No joins | Optimized for joins
Horizontal scalability | Hard to shard and scale
Good for structured, semi-structured data | Good for structured data
Row-based transactions | Distributed transactions
Row & Column Storage
Column-oriented store: for specific queries, not all values of a table are needed (analytical databases)
Advantages of column-oriented storage:
Reduced I/O
Values of a column across logical rows are similar, so they are better suited for compression
HBase Data Model
Component | Description
Table | Data organized into tables; a table comprises rows
Row key | Data stored in rows; rows are identified by row keys, which act as the primary key; rows are sorted by this value
Column family | Columns are grouped into families
Column qualifier | Identifies the column within its family
Cell | Combination of the row key, column family, column qualifier, and timestamp; contains the value
Version | Values within a cell are versioned by a version number (timestamp)
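To make these terms concrete, here is a minimal sketch using the HBase Java client; the table name 'employee', column family 'personal', and qualifier 'name' are made-up examples. A Put addresses a cell by row key, column family, column qualifier, and (optionally) an explicit timestamp that becomes the cell's version.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelExample {
    // 'connection' is assumed to be an open HBase Connection (see the client-connection sketch further down).
    static void writeCell(Connection connection) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("employee"))) {  // hypothetical table
            Put put = new Put(Bytes.toBytes("row-0001"));                         // row key
            // Column family "personal", qualifier "name", explicit timestamp used as the cell version.
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"),
                          System.currentTimeMillis(), Bytes.toBytes("Alice"));
            table.put(put);
        }
    }
}
```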
HBase Data Model
HBase Data Model
Regions: horizontal partitions of an HBase table. A region is denoted by the table it belongs to, its first row (inclusive), and its last row (exclusive).
Regions are the units that get distributed over the entire cluster.
Initially, a table comprises a single region, but as the region grows it eventually crosses a configurable size threshold, at which point it splits at a row boundary into two new regions of approximately equal size.
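To illustrate regions, the following sketch (assuming the HBase 2.x Java client and the same hypothetical 'employee' table) creates a table that is pre-split into four regions at chosen row-key boundaries; each split key becomes the inclusive first row of a new region.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
    static void createPreSplitTable(Connection connection) throws Exception {
        try (Admin admin = connection.getAdmin()) {
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("employee"))             // hypothetical table
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                    .build();
            // Row-key boundaries; each becomes the start row of a region.
            byte[][] splitKeys = {
                Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")
            };
            admin.createTable(desc, splitKeys);                            // table starts with 4 regions
        }
    }
}
```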
HBase Architecture
HBase Architecture
HBase Master: the master node; RegionServers: the slave nodes
The HBase Master bootstraps a virgin install, assigns regions to registered RegionServers, and recovers from RegionServer failures
RegionServers carry zero or more regions, take client read/write requests, and manage region splits, informing the master about the new daughter regions
HBase Architecture
ZooKeeper: the authority on cluster state in HBase; it holds the location of the catalog table and of the cluster master
Assignment of regions is mediated via ZooKeeper in case servers crash mid-assignment
An HBase client must know the location of the ZooKeeper ensemble; thereafter, the client navigates the ZooKeeper hierarchy to learn cluster attributes such as server locations (see the configuration sketch below)
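A minimal sketch of how a client is pointed at the ZooKeeper ensemble; the hostnames and port below are placeholders, and in practice these values usually come from hbase-site.xml on the client classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientConfigExample {
    static Configuration clientConfig() {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder ZooKeeper ensemble; the client uses it to discover hbase:meta and the master.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        return conf;
    }
}
```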
HBase in Operation
hbase:meta holds the list, state, and locations of all regions on the cluster; entries in hbase:meta are keyed by region name
Region name: the table name of the region, the region's start row, the time of creation, and an MD5 hash of all of these, e.g. TestTable,xyz,1279729913622.1b6e176fb8d8aa88fd4ab6bc80247ece.
As row keys are sorted, finding the region that hosts a particular key is easy
Whenever regions are split, enabled, disabled, deleted, etc., the catalog table is updated
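As a quick way to look at the catalog table, here is a sketch (HBase 2.x Java client assumed) that scans hbase:meta and prints the region names that key its rows.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaScanExample {
    static void listRegions(Connection connection) throws Exception {
        try (Table meta = connection.getTable(TableName.META_TABLE_NAME);
             ResultScanner scanner = meta.getScanner(new Scan())) {
            for (Result row : scanner) {
                // The row key of hbase:meta is the region name: table,startRow,timestamp.md5.
                System.out.println(Bytes.toString(row.getRow()));
            }
        }
    }
}
```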
HBase in Operation
Fresh clients connect to the ZooKeeper cluster to get the location of hbase:meta, which they use to figure out which regions host their user-space data and where those regions live; thereafter, clients interact directly with the RegionServers
Clients cache these lookups, which works fine until there is a fault; on a fault, clients consult hbase:meta again, and if it too has moved, they go back to ZooKeeper
Writes arriving at a RegionServer are first appended to a commit log (the WAL) and then added to an in-memory memstore; when a memstore fills, its content is flushed to the filesystem (see the sketch below)
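A client-side sketch of the write path (HBase 2.x Java client assumed; table and family names are placeholders): each Put is appended to the RegionServer's commit log and memstore, and the Admin.flush call here forces the memstore contents out to files, which otherwise happens automatically once the memstore fills.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathExample {
    static void writeAndFlush(Connection connection) throws Exception {
        TableName name = TableName.valueOf("employee");                   // hypothetical table
        try (Table table = connection.getTable(name);
             Admin admin = connection.getAdmin()) {
            Put put = new Put(Bytes.toBytes("row-0002"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("city"), Bytes.toBytes("Pune"));
            put.setDurability(Durability.SYNC_WAL);   // sync the commit-log append before acknowledging
            table.put(put);                           // goes to the WAL, then the memstore
            admin.flush(name);                        // force the memstore to be flushed to the filesystem
        }
    }
}
```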
HBase in Operation
When reading, the region's memstore is consulted first; if sufficient versions are found in the memstore alone, the query completes there
Otherwise, flush files are consulted in order, from newest to oldest, either until enough versions are found to satisfy the query or until we run out of flush files
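A read-side sketch (HBase 2.x Java client assumed, same placeholder names): asking for several versions of a cell, which the RegionServer satisfies from the memstore and, if needed, from flush files, newest first.

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadPathExample {
    static void readVersions(Connection connection) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("employee"))) {
            Get get = new Get(Bytes.toBytes("row-0001"));
            get.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            get.readVersions(3);                                           // ask for up to 3 versions
            Result result = table.get(get);
            for (Cell cell : result.getColumnCells(Bytes.toBytes("personal"), Bytes.toBytes("name"))) {
                // Print each version's timestamp and value, newest first.
                System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}
```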
Loading Data into HBase
Using the HBase shell
Using client APIs
Using Pig
Using Sqoop
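As one concrete way to load data (via the client API rather than the shell, Pig, or Sqoop), here is a hedged sketch that batches several Puts into a single call; the table, family, and input layout are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkPutExample {
    static void loadRows(Connection connection, List<String[]> rows) throws Exception {
        // Each element of 'rows' is assumed to be {rowKey, value} for a single column.
        try (Table table = connection.getTable(TableName.valueOf("employee"))) {
            List<Put> puts = new ArrayList<>();
            for (String[] row : rows) {
                Put put = new Put(Bytes.toBytes(row[0]));
                put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes(row[1]));
                puts.add(put);
            }
            table.put(puts);   // one batched call instead of one RPC per row
        }
    }
}
```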
HBase Shell Commands
HBase Shell Commands
HBase Shell Commands
Connect to HBase from Clients
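A minimal client-connection sketch (HBase Java client; the ZooKeeper hosts and table name are placeholders): create a Configuration pointing at the ensemble, open a Connection, and obtain Table handles from it. The Connection is heavyweight and is normally shared and closed once at the end.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder ZooKeeper quorum; usually provided by hbase-site.xml instead.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("employee"))) {  // hypothetical table
            Result result = table.get(new Get(Bytes.toBytes("row-0001")));
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println(value == null ? "not found" : Bytes.toString(value));
        }
    }
}
```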
HBase Use Cases
Capturing incremental data (time-series data): high-volume, high-velocity writes, e.g. sensor readings, system metrics, events, stock prices, server logs, rainfall data
Information exchange: high-volume, high-velocity writes and reads, e.g. email, chat
Content serving, web application backend: high-volume, high-velocity reads, e.g. eBay, Groupon