Apache Phoenix: Put the SQL back in NoSQL
Osama Hussein, March 2021
Agenda: History, Overview, Architecture, Code, Capabilities, Scenarios
1. History: From open-source repo to Top-Level Apache Project
Overview (Apache Phoenix)
Began as an internal project at Salesforce.com.
JAN 2014: originally open-sourced on GitHub.
MAY 2014: became a Top-Level Apache Project.
2. Overview: UDFs, Transactions and Schema
Overview (Apache Phoenix)
Support for late-bound, schema-on-read.
SQL and JDBC API support.
Access to data stored and produced in other components such as Apache Spark and Apache Hive.
HBase was developed as part of Apache Hadoop and runs on top of the Hadoop Distributed File System (HDFS); it scales linearly and shards automatically.
Overview (Apache Phoenix)
Apache Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface.
It implements best-practice optimizations to enable software engineers to develop next-generation data-driven applications based on HBase.
Users create and interact with tables in the form of typical DDL/DML statements using the standard JDBC API.
Overview (Apache Phoenix)
Written in Java and SQL.
Provides Atomicity, Consistency, Isolation and Durability (ACID) guarantees.
Fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce.
Overview (Apache Phoenix)
Included in Cloudera Data Platform 7.0 and above and in the Hortonworks HDP distribution 2.1 and above.
Also available as part of Cloudera Labs.
Part of the Hadoop ecosystem.
Overview (SQL Support)
Compiles SQL into one or more HBase scans and orchestrates their execution, producing standard JDBC result sets.
All standard SQL query constructs are supported.
Overview (SQL Support)
Makes direct use of the HBase API, along with coprocessors and custom filters.
Performance: milliseconds for small queries, seconds for tens of millions of rows.
Overview (Bulk Loading)
MapReduce-based: CSV and JSON, via the Phoenix MapReduce library.
Single-threaded: CSV, via the PostgreSQL-style psql command-line tool, with HBase on the local machine.
Overview (User-Defined Functions)
Temporary UDFs exist for the current session only.
Permanent UDFs are stored in the SYSTEM.FUNCTION table.
UDFs can be used in SQL statements and indexes.
Tenant-specific UDF usage is supported.
Updating a UDF jar requires a cluster bounce.
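As a sketch, a permanent UDF might be registered like this (the function name, class name and jar path are illustrative assumptions, not from the deck):

```sql
-- Hypothetical UDF: implementing class and jar location are placeholders.
CREATE FUNCTION my_reverse(VARCHAR) RETURNS VARCHAR
    AS 'com.example.udf.MyReverseFunction'
    USING JAR 'hdfs://namenode:8020/phoenix/udfs/my-udfs.jar';

-- Once registered, the UDF can be used like any built-in function:
SELECT my_reverse(host) FROM server_metrics;
```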
Overview (Transactions)
Uses Apache Tephra for cross-row/cross-table ACID support.
Create tables with the flag TRANSACTIONAL=true.
Enable transactions and the snapshot directory, and set the timeout value, in hbase-site.xml.
A transaction starts with the first statement against a transactional table and ends with a commit or rollback.
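A minimal sketch of a transactional table (table and column names are illustrative, and this assumes transactions have already been enabled in hbase-site.xml):

```sql
-- Declaring the table transactional; requires Tephra support
-- to be configured on the cluster.
CREATE TABLE accounts (
    id      BIGINT NOT NULL PRIMARY KEY,
    balance BIGINT
) TRANSACTIONAL=true;

-- The first statement against the table starts a transaction;
-- it ends when the connection issues a commit or rollback.
UPSERT INTO accounts VALUES (1, 100);
```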
Overview (Transactions)
By default, applications let HBase manage timestamps.
If the application needs to control the timestamp, the CurrentSCN property must be specified at connection time.
CurrentSCN then controls the timestamp for any DDL, DML, or query on that connection.
Overview (Schema)
Table metadata is stored in a versioned HBase table (up to 1,000 versions).
UPDATE_CACHE_FREQUENCY lets the user declare how often the server is checked for metadata updates.
Values: ALWAYS, NEVER, or a millisecond value.
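For example, a table could cache metadata and re-check the server at most once every 15 minutes (the table and column names here are illustrative):

```sql
-- 900000 ms = 15 minutes between server-side metadata checks.
CREATE TABLE metadata_demo (
    id      BIGINT NOT NULL PRIMARY KEY,
    payload VARCHAR
) UPDATE_CACHE_FREQUENCY=900000;
```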
Overview (Schema)
A Phoenix table can be:
Built from scratch.
Mapped to an existing HBase table, either as a read-write table or as a read-only view.
Overview (Schema) 18 Read-Write Table: column families will be created automatically if they don’t already exist. An empty key value will be added to the first column family of each existing row to minimize the size of the projection for queries.
Overview (Schema)
Read-Only View: all column families must already exist.
The only change made to the HBase table is the addition of the Phoenix coprocessors used for query processing.
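A sketch of mapping a read-only view onto an existing HBase table (the table name "t1", column family "cf1" and qualifier "val" are assumptions; quoted identifiers preserve the case of the HBase names):

```sql
-- The row key maps to the PRIMARY KEY column; other columns
-- reference existing column family / qualifier pairs.
CREATE VIEW "t1" (
    pk VARCHAR PRIMARY KEY,
    "cf1"."val" VARCHAR
);
```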
3. Architecture: Architecture, Phoenix Data Model, Query Execution and Environment
Architecture
Architecture (Phoenix Data Model)
Architecture (Server Metrics Example)
Architecture (Query Execution)
1. Identify row key ranges from the query.
2. Overlay row key ranges with region boundaries.
3. Execute parallel scans.
4. Filter using skip scan.
5. Intercept the scan in a coprocessor.
6. Perform a final merge sort.
Architecture (Environment)
Data Warehouse; Extract, Transform, Load (ETL); BI and Visualization
4. Code: Commands and Sample Code
Code (Commands)
DML commands: UPSERT VALUES, UPSERT SELECT, DELETE.
DDL commands: CREATE TABLE, CREATE VIEW, DROP TABLE, DROP VIEW.
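The DML commands above can be sketched against the deck's server-metrics example (table and column names follow the later scenario slides; the archive table is an illustrative assumption):

```sql
-- Insert or update a single row (Phoenix has no separate INSERT/UPDATE).
UPSERT INTO server_metrics (host, date, response_time)
    VALUES ('sf1-node1', CURRENT_DATE(), 120);

-- Copy older rows into another table with UPSERT SELECT.
UPSERT INTO metrics_archive
    SELECT * FROM server_metrics WHERE date < CURRENT_DATE() - 30;

-- Remove the rows that were archived.
DELETE FROM server_metrics WHERE date < CURRENT_DATE() - 30;
```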
Transactions: Row Count

@Override
public void getRowCount(ResultSet resultSet) throws SQLException {
    Tuple row = resultSet.unwrap(PhoenixResultSet.class).getCurrentRow();
    Cell kv = row.getValue(0);
    ImmutableBytesWritable tmpPtr = new ImmutableBytesWritable(
        kv.getValueArray(), kv.getValueOffset(), kv.getValueLength());
    // A single Cell will be returned with the count(*) - we decode that here
    rowCount = PLong.INSTANCE.getCodec().decodeLong(tmpPtr, SortOrder.getDefault());
}
Transactions: Internal State

private void changeInternalStateForTesting(PhoenixResultSet rs) {
    // Get and set the internal state for testing purposes.
    ReadMetricQueue testMetricsQueue = new TestReadMetricsQueue(LogLevel.OFF, true);
    StatementContext ctx = (StatementContext) Whitebox.getInternalState(rs, "context");
    Whitebox.setInternalState(ctx, "readMetricsQueue", testMetricsQueue);
    Whitebox.setInternalState(rs, "readMetricsQueue", testMetricsQueue);
}
5. Capabilities: Features and Capabilities
Capabilities
Overlays on top of the HBase data model.
Keeps a versioned schema repository.
Query processor.
Capabilities
Cost-based query optimizer.
Enhances existing statistics collection.
Generates histograms to drive query-optimization decisions and join ordering.
Capabilities
Secondary indexes:
Boost the speed of queries without relying on specific row-key designs.
Enable users to use star schemas.
Leverage SQL tools and Online Analytical Processing (OLAP) tools.
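A secondary index on the server-metrics example might look like this (index and column names are illustrative):

```sql
-- Queries filtering on response_time can use the index
-- instead of depending on the table's row-key design;
-- INCLUDE adds host as a covered column.
CREATE INDEX response_time_idx
    ON server_metrics (response_time)
    INCLUDE (host);
```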
Capabilities
Row timestamp column.
Sets minimum and maximum time ranges for scans.
Improves performance, especially when querying the tail end of the data.
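A sketch of declaring a row timestamp column (table and column names are assumptions): mapping a date column to the HBase cell timestamp lets time-range scans be pruned on the server side.

```sql
-- created_date maps to the HBase cell timestamp via ROW_TIMESTAMP;
-- queries with a time range on created_date scan only matching cells.
CREATE TABLE event_log (
    created_date DATE NOT NULL,
    event_id     BIGINT NOT NULL,
    payload      VARCHAR,
    CONSTRAINT pk PRIMARY KEY (created_date ROW_TIMESTAMP, event_id)
);
```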
6. Scenarios: Use Cases
Scenarios (Server Metrics Example)
Scenarios (Chart Response Time Per Cluster)

SELECT substr(host, 1, 3), trunc(date, 'DAY'), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
  AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date, 'DAY')
Scenarios (Find 5 Longest GC Times)

SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
  AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
Thanks! Any questions?
You can find me at:
GitHub: @sxaxmz
LinkedIn: linkedin.com/in/husseinosama