Do you need to handle massive data, where hundreds of gigabytes growing to terabytes or even petabytes are part of your daily work? Do you need to perform thousands of operations per second on multiple terabytes of data? Come and meet Apache HBase, a NoSQL database that runs on top of HDFS and is highly available, fault-tolerant, and scalable. HBase has been widely used at companies such as Facebook and Twitter. This talk gives an introduction, showing what HBase is and when to use it, its architecture, and examples of real-world solutions from large companies such as Facebook, Twitter, and Trend Micro.
•NoSQL datastore built on top of HDFS (Hadoop)
•An Apache Top Level Project
•The goal is the hosting of very large tables (billions of rows X millions of columns)
•Based on Google’s BigTable paper
What Is HBase?
•Storing large amounts of data (TB/PB)
•High throughput for a large number of requests
•Storing unstructured or variable column data
•Big Data with random read and writes
Why Use HBase?
•Only use HBase with Big Data problems
•Workloads that read straight through files
•Workloads that write all at once or append new files
–No random reads or writes
•Access patterns of the data are ill-defined
When to Consider Not Using HBase?
•More complete list at http://wiki.apache.org/hadoop/Hbase/PoweredBy
HBase in production
•Data is not accessed over SQL
•You must:
–Create your own connections
–Keep track of the type of data in a column
–Give each row a key
–Access a row by its key
No SQL Means No SQL
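To make "keep track of the type of data in a column" concrete, here is a minimal sketch in Python. It is not the HBase client API: `table`, `put`, and `get` are hypothetical in-memory stand-ins, and the column names and encodings are assumptions chosen for illustration. The point it shows is that values are stored as raw bytes, so the application has to pick an encoding per column and remember it when reading.

```python
import struct

# Toy stand-in for a table: row key (bytes) -> {column (bytes): value (bytes)}.
# Hypothetical helper structure, NOT the HBase client API.
table = {}

def put(row_key: bytes, column: bytes, value: bytes) -> None:
    """Insert or overwrite one cell; the store only ever sees bytes."""
    table.setdefault(row_key, {})[column] = value

def get(row_key: bytes, column: bytes) -> bytes:
    """Return the raw bytes of one cell, looked up by row key."""
    return table[row_key][column]

# The application decides how each column is encoded...
put(b"user#42", b"info:name", "Ana".encode("utf-8"))
put(b"user#42", b"info:age", struct.pack(">q", 31))  # 8-byte big-endian integer

# ...and must apply the same decoding when reading the cell back.
name = get(b"user#42", b"info:name").decode("utf-8")
(age,) = struct.unpack(">q", get(b"user#42", b"info:age"))
```

Nothing in the stored bytes says that `info:age` is an integer; decoding it as text would succeed silently and return garbage, which is why the type bookkeeping falls on the application.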
•Gets
–Gets a row’s data based on the row key
•Puts
–Updates or inserts a row with data based on the row key
•Scans
–Finds all matching rows based on the row key
–Scan logic can be extended by using filters
Types of Access
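The three access types can be sketched with a small in-memory model in Python. This is an illustration only and does not use the HBase client library: the `ToyTable` class, its method names, and the example row keys are all hypothetical. It assumes rows are kept sorted by row key, which is what lets a scan read one contiguous key range.

```python
import bisect

class ToyTable:
    """In-memory model of a table sorted by row key (illustrative only)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, columns):
        # Insert a new row or update columns of an existing one.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        # Look up exactly one row by its key (None if absent).
        return self._rows.get(row_key)

    def scan(self, start, stop, row_filter=None):
        # Yield rows with start <= key < stop in key order, optionally filtered.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(key, row):
                yield key, row

t = ToyTable()
t.put("user1#msg002", {"body": "hello"})
t.put("user1#msg001", {"body": "hi"})
t.put("user2#msg001", {"body": "hey"})
t.put("user1#msg001", {"read": "yes"})   # second put updates the same row

one_row = t.get("user1#msg001")
# "$" sorts right after "#", so this range covers every "user1#..." key.
user1_rows = list(t.scan("user1#", "user1$"))
read_rows = list(t.scan("user1#", "user1$", lambda key, row: "read" in row))
```

The filter runs on rows already selected by the key range, so it narrows a scan but does not replace a well-chosen row key.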
Gets
Puts
Puts
HBase Schema Design – How to design
•Designing schemas for HBase requires in-depth knowledge
•Schema Design is ‘data-centric’ not ‘relationship-centric’
•You design around how data is accessed
•Row keys are engineered
No SQL Means No SQL
•A row key is more than the glue between two tables
•Engineering time is spent just on constructing a row key
–Contents of a row key vary by access pattern
–Often made up of several pieces of data
Row Keys
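As one possible illustration of a row key "made up of several pieces of data", the Python sketch below builds a composite key from a user id and a reversed timestamp. The key layout, the `message_row_key` helper, and the access pattern ("newest messages of a user first") are assumptions for this example, not something prescribed by the slides.

```python
import struct

MAX_LONG = 2**63 - 1  # largest signed 64-bit integer

def message_row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Build a key laid out as <user_id> 0x00 <reversed timestamp>.

    Subtracting the timestamp from MAX_LONG makes newer entries sort
    first when keys are compared byte by byte.
    """
    reversed_ts = MAX_LONG - timestamp_ms
    return user_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)

older = message_row_key("ana", 1_000)
newer = message_row_key("ana", 2_000)
other = message_row_key("bob", 1_500)

# Byte-wise sorting groups keys by user, newest message first within a user.
keys = sorted([older, newer, other])
```

A different access pattern (for example, "all messages in a time window across users") would call for a different key layout, which is the sense in which the contents of a row key vary by access pattern.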
•Schema design does not start in an ERD
•Access pattern must be known and ascertained
•Denormalize to improve performance
–Fewer, bigger tables
Schema Design
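The "denormalize, fewer bigger tables" advice can be sketched in Python as follows. The `users` and `orders` data, the `denormalize` helper, and the `info:` / `order:` column prefixes are hypothetical and only illustrate the idea of folding related records into one wide row per key.

```python
# Relational-style layout: two tables that must be joined at read time.
users = {"u1": {"name": "Ana"}}
orders = {
    "o1": {"user_id": "u1", "total": "30.00"},
    "o2": {"user_id": "u1", "total": "12.50"},
}

def denormalize(users, orders):
    """Fold each order into its user's row as an 'order:<id>' column."""
    wide = {}
    for user_id, user in users.items():
        wide[user_id] = {f"info:{k}": v for k, v in user.items()}
    for order_id, order in orders.items():
        row = wide.setdefault(order["user_id"], {})
        row[f"order:{order_id}"] = order["total"]
    return wide

user_rows = denormalize(users, orders)

# One lookup by row key now returns the user and all of their orders.
row = user_rows["u1"]
```

The trade-off is that rows grow wider and some data may be duplicated, in exchange for answering the known access pattern with a single read by row key.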
HBase in production - examples
•Use of HBase to integrate SMS, chat, email and Facebook Messages into one inbox
•HydraBase – The evolution of HBase@Facebook
•HBase provides a distributed, read/write backup of all MySQL tables in Twitter's production
•A number of applications including people search rely on HBase internally for data generation
•Additionally, the operations team uses HBase as a timeseries database for cluster-wide monitoring/performance data
•Uses HBase as a foundation for cloud scale storage for a variety of applications
•Uses HBase to build a graph service for global web threat entities evaluation and reputation
Internal Use Only
Non-profit R&D Center founded by Nokia in 2001 in Brazil
Focused on projects delivering solutions and products in the mobile technology area
Technical team of 200+
Located in Brazil: Manaus | Brasilia | Recife | São Paulo
50+ invention reports accepted by Nokia/Microsoft to file patent application
500+ items of scientific production
300+ completed projects
OUR CERTIFICATIONS
OUR AWARDS
Eco System Saving Tips (app) | Mobile World Congress 2012
Facelock | 1st prize | London Hackathon | Nokia World 2010
Audio Aid | 1st prize | Forum Nokia Calling All Innovators 2009
Microsoft Data Gathering | Tele.Síntese award 2012 & 2013
•About training in Big Data (Developer, Analyst, Admin):
http://www.indt.org/servicos/treinamentos/hadoop-developer
http://www.indt.org/servicos/treinamentos/hadoop-analyst
http://www.indt.org/servicos/treinamentos/hadoop-admin