Getting to Know Apache HBase

faferreira 96 views 32 slides Sep 30, 2020

About This Presentation

Do you need to handle massive data, where hundreds of gigabytes growing into terabytes or even petabytes are part of your daily routine? Do you need to perform thousands of operations per second on multiple terabytes of data? Come and meet Apache HBase, a NoSQL database that runs...


Slide Content

Felipe Ferreira

Getting to Know Apache HBase

Natural Partner for Innovation


•NoSQL datastore built on top of HDFS (Hadoop)
•An Apache Top Level Project
•The goal is to host very large tables (billions of rows × millions of columns)
•Based on Google’s BigTable paper
What Is HBase?

•Storing large amounts of data (TB/PB)
•High throughput for a large number of requests
•Storing unstructured or variable column data
•Big Data with random reads and writes
Why Use HBase?

•The problem is not a Big Data problem (use HBase only for Big Data)
•You read straight through files
•You write everything at once or append new files
–That is, no random reads or writes
•The access patterns of the data are ill-defined
When to Consider Not Using HBase?

•More complete list at http://wiki.apache.org/hadoop/Hbase/PoweredBy
HBase in production

HBase Architecture – How It Works

•HBase Master
•RegionServer
•ZooKeeper
•HDFS
–NameNode/Standby NameNode
–DataNode
Meet the Daemons

Daemon Locations

Tables and Column Families

Rows and Columns

Regions

Regions

Write Path

Read Path

HBase API – How to access the data

•Data is not accessed over SQL
•You must:
–Create your own connections
–Keep track of the type of data in a column
–Give each row a key
–Access a row by its key
No SQL Means No SQL
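
As a rough illustration of "keep track of the type of data in a column": HBase stores cell values as uninterpreted byte arrays, so the application has to remember how each column was encoded. The sketch below uses only Python's standard library, not an HBase client; the column names and the `SCHEMA` mapping are hypothetical.

```python
import struct

# Hypothetical client-side record of how each column is encoded.
# The datastore only sees bytes; this mapping lives in the application.
SCHEMA = {
    b"info:name": "utf8",
    b"info:age": "int64",
}

def encode(column: bytes, value) -> bytes:
    """Turn a typed value into the bytes that would be stored in the cell."""
    kind = SCHEMA[column]
    if kind == "utf8":
        return value.encode("utf-8")
    if kind == "int64":
        return struct.pack(">q", value)  # 8-byte big-endian signed integer
    raise ValueError(f"unknown encoding {kind!r}")

def decode(column: bytes, raw: bytes):
    """Turn stored bytes back into a typed value using the same mapping."""
    kind = SCHEMA[column]
    if kind == "utf8":
        return raw.decode("utf-8")
    if kind == "int64":
        return struct.unpack(">q", raw)[0]
    raise ValueError(f"unknown encoding {kind!r}")
```

If the reader and the writer disagree about the encoding of a column, the bytes still decode to something, just not the intended value, which is why the type bookkeeping falls to the client.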

•Gets
–Gets a row’s data based on the row key
•Puts
–Updates or inserts a row with data based on the row key
•Scans
–Finds all matching rows based on the row key
–Scan logic can be extended by using filters
Types of Access
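
The three access types can be sketched with a toy in-memory table. This is not the HBase client API: `ToyTable`, its method signatures, and the `row_filter` callback are hypothetical stand-ins, assuming only that rows are kept sorted by row key so that a scan is a walk over a key range.

```python
import bisect

class ToyTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys (bytes)
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key: bytes, column: bytes, value: bytes) -> None:
        """Insert the row if it is new, otherwise update the given column."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: bytes) -> dict:
        """Return one row's columns, looked up directly by its key."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start: bytes, stop: bytes, row_filter=None):
        """Yield (row_key, columns) for start <= row_key < stop.

        row_filter, if given, is called as row_filter(row_key, columns)
        and rows for which it returns False are skipped.
        """
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            columns = self._rows[key]
            if row_filter is None or row_filter(key, columns):
                yield key, dict(columns)


table = ToyTable()
table.put(b"user#001", b"info:name", b"Ana")
table.put(b"user#002", b"info:name", b"Bruno")
table.put(b"visit#001", b"info:page", b"/home")
table.put(b"user#001", b"info:name", b"Ana Maria")  # same key: an update

user_keys = [key for key, _ in table.scan(b"user#", b"user$")]
```

Here the scan range `b"user#"` to `b"user$"` covers every key with the `user#` prefix, because `$` is the byte immediately after `#`.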

Gets

Puts

Puts

HBase Schema Design – How to design

•Designing schemas for HBase requires in-depth knowledge
•Schema Design is ‘data-centric’ not ‘relationship-centric’
•You design around how data is accessed
•Row keys are engineered
No SQL Means No SQL

•A row key is more than the glue between two tables
•Engineering time is spent just on constructing a row key
–Contents of a row key vary by access pattern
–Often made up of several pieces of data
Row Keys
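
A minimal sketch of a row key "made up of several pieces of data", assuming a hypothetical access pattern: fetch a user's most recent events first. Because HBase orders rows by the bytes of the row key, the key below uses fixed-width big-endian fields and a reversed timestamp so that byte order matches the desired read order. The field layout and function names are illustrative, not taken from the slides.

```python
import struct

MAX_INT64 = 2**63 - 1

def make_row_key(user_id: int, timestamp_ms: int) -> bytes:
    """Build a 16-byte key: user id, then reversed timestamp.

    Reversing the timestamp makes newer events sort before older ones
    within the same user.
    """
    reversed_ts = MAX_INT64 - timestamp_ms
    return struct.pack(">QQ", user_id, reversed_ts)

def parse_row_key(row_key: bytes):
    """Recover (user_id, timestamp_ms) from a key built by make_row_key."""
    user_id, reversed_ts = struct.unpack(">QQ", row_key)
    return user_id, MAX_INT64 - reversed_ts

keys = sorted([
    make_row_key(7, 1000),
    make_row_key(7, 3000),
    make_row_key(3, 2000),
])
ordered = [parse_row_key(k) for k in keys]
```

After sorting, the keys group by user, and within user 7 the event at 3000 ms comes before the one at 1000 ms. A different access pattern would call for a different key layout, which is the point of the slide.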

•Schema design does not start in an ERD
•Access pattern must be known and ascertained
•Denormalize to improve performance
–Fewer, bigger tables
Schema Design
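
To make "denormalize to improve performance" concrete, the sketch below folds two relational-style collections into one wide row per customer, so that the hypothetical access pattern "show a customer with all of their orders" becomes a single lookup by row key. The data, the `customer#` key format, and the `info:` and `orders:` column prefixes are all invented for illustration.

```python
# Relational-style starting point: two separate collections joined by customer_id.
customers = {1: {"name": "Ana"}, 2: {"name": "Bruno"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": "25.00"},
    {"order_id": 101, "customer_id": 2, "total": "10.50"},
    {"order_id": 102, "customer_id": 1, "total": "7.25"},
]

def denormalize(customers, orders):
    """Build one wide row per customer, with one column per order."""
    rows = {}
    for customer_id, customer in customers.items():
        row_key = f"customer#{customer_id:08d}"  # zero-padded so keys sort numerically
        rows[row_key] = {"info:name": customer["name"]}
    for order in orders:
        row_key = f"customer#{order['customer_id']:08d}"
        rows[row_key][f"orders:{order['order_id']}"] = order["total"]
    return rows

wide_rows = denormalize(customers, orders)
```

The result is one bigger table instead of two smaller ones: reading `wide_rows["customer#00000001"]` returns the customer's name and both orders without a join.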

HBase in production - examples

•Uses HBase to integrate SMS, chat, email and Facebook Messages into one inbox

•HydraBase – the evolution of HBase@Facebook

•HBase provides a distributed, read/write backup of all MySQL tables in Twitter's production
•A number of applications, including people search, rely on HBase internally for data generation
•Additionally, the operations team uses HBase as a time-series database for cluster-wide monitoring/performance data

•Uses HBase as a foundation for cloud-scale storage for a variety of applications
•Uses HBase to build a graph service for evaluating global web threat entities and their reputation

Internal Use Only
Non-profit R&D center founded by Nokia in 2001 in Brazil

Focused on projects delivering solutions and products in the mobile technology area

Technical team of 200+
Located in Brazil: Manaus | Brasilia | Recife | São Paulo
50+ invention reports accepted by Nokia/Microsoft to file patent applications
500+ items of scientific production
300+ completed projects

OUR CERTIFICATIONS

OUR AWARDS
Eco System Saving Tips (app) | Mobile World Congress 2012
Facelock | 1st prize | London Hackathon | Nokia World 2010
Audio Aid | 1st prize | Forum Nokia Calling All Innovators 2009
Microsoft Data Gathering | Tele.Síntese 2012 & 2013 award
•About training in Big Data (Developer, Analyst, Admin):
http://www.indt.org/servicos/treinamentos/hadoop-developer
http://www.indt.org/servicos/treinamentos/hadoop-analyst
http://www.indt.org/servicos/treinamentos/hadoop-admin

•About HBase:
http://hbase.apache.org/
•About INDT:
http://www.indt.org
•About Hortonworks:
http://www.hortonworks.com




INFO + CONTACT