Cassandra 3.0 Data Modeling

planetcassandra 5,003 views 29 slides Jul 20, 2016
Slide 1
Slide 1 of 29
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29

About This Presentation

.


Slide Content

@PatrickMcFadin
Patrick McFadin

Chief Evangelist, DataStax
Cassandra 3.0 Data Modeling
1

A brief history of CQL
You

CQL 3.0 - Cassandra 1.2
•Goodbye CQL 2.0!
•Custom secondary indexes
•Empty IN

CQL 3.1 - Cassandra 2.0
•Aliases
•CREATE <table> IF NOT EXISTS
•INSERT IF NOT EXISTS
•UPDATE IF
•DELETE IF EXISTS
•IN supports cluster columns
LWT

CQL 3.2 - Cassandra 2.1
•User Defined Types
•Collection Indexing
•Indexes can use contains
•Tuples?

User Defined Types
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);

User Defined Types
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);

CQL 3.3 - Cassandra 2.2
•Date and Time are now types
•TinyInt and SmallInt
•User Defined Functions
•Aggregates
•User Defined Aggregates

User Defined Functions
CREATE TABLE video_rating (
videoid uuid,
rating_counter counter ,
rating_total counter,
PRIMARY KEY (videoid)
);
CREATE OR REPLACE FUNCTION
avg_rating (rating_counter counter, rating_total counter )
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java AS
'return Double.valueOf(rating_total.doubleValue()/
rating_counter.doubleValue()); ';

User Defined Functions
SELECT avg_rating(rating_counter, rating_total) AS avg_rating
FROM video_rating
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

Aggregates
CREATE TABLE video_ratings_by_user (
videoid uuid,
userid uuid,
rating int,
PRIMARY KEY (videoid, userid)
);
SELECT count(userid)
FROM video_ratings_by_user
WHERE videoed = 49f64d40-7d89-4890-b910-dbf923563a33

CQL 3.4 - Cassandra 3.x
•CAST operator
•Per Partition Limit
•Materialized Views
•SASI

Materialized View
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Lookup by this?

Materialized View
CREATE TABLE videos_by_location (
videoid uuid,
userid uuid,
location text,
added_date timestamp,
PRIMARY KEY (location, videoid)
);
Roll your own

Materialized View
CREATE MATERIALIZED VIEW videos_by_location
AS SELECT userid, added_date, videoid, location
FROM videos
WHERE videoId IS NOT NULL AND location IS NOT NULL
PRIMARY KEY(location, videoid);
Cassandra rolls for you

Materialized View Perf

Materialized View Perf
5 Materialized Views vs 5 tables writes async

Materialized View
SELECT location, videoid
FROM videos_by_location ;
location | videoid
-------------------------------------------------+--------------------------------------
http://www.youtube.com/watch?v=px6U2n74q3g | 06049cbb-dfed-421f-b889-5f649a0de1ed
http://www.youtube.com/watch?v=qphhxujn5Es | 873ff430-9c23-4e60-be5f-278ea2bb21bd
/us/vid/0c/0c3f7e87-f6b6-41d2-9668-2b64d117102c | 0c3f7e87-f6b6-41d2-9668-2b64d117102c
/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | 99051fe9-6a9c-46c2-b949-38ef78858dd0
/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | b3a76c6b-7c7f-4af6-964f-803a9283c401
http://www.youtube.com/watch?v=HdJlsOZVGwM | 49f64d40-7d89-4890-b910-dbf923563a33
/us/vid/41/416a5ddc-00a5-49ed-adde-d99da9a27c0c | 416a5ddc-00a5-49ed-adde-d99da9a27c0c

SASI
CREATE TABLE users (
userid uuid,
firstname varchar,
lastname varchar,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
Lookup by this?

Storage Attached Secondary Index

SASI

SASI
CREATE CUSTOM INDEX ON users (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};

SASI
CREATE CUSTOM INDEX ON users (lastname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

SASI
CREATE CUSTOM INDEX ON users (created_date)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'SPARSE'};

SASI Indexes
Client
INSERT INTO users(userid,firstname,lastname,email,created_date)

VALUES (9761d3d7-7fbd-4269-9988-6cfd4e188678 ,’Patrick’,’McFadin’,
[email protected] ’,’2015-06-01’);
userid 1
userid 2
Memtable
SSTable
SSTable
SSTable
SASI Index
Node
Data
lastname
lastname
firstname
firstname
email
email
created_date
created_date
SASI Index
SASI Index
Indexer

SASI Queries
SELECT * FROM users WHERE firstname LIKE 'pat%';
SELECT * FROM users WHERE lastname LIKE ‘ %Fad%';
SELECT * FROM users WHERE email LIKE '%data%';
SELECT * FROM users
WHERE created_date > '2011-6-15'
AND created_date < '2011-06-30';
userid | created_date | email | firstname | lastname
--------------------------------------+---------------------------------+----------------------+-----------+----------
9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | [email protected] | Patrick | McFadin

SASI Guidelines
•Multiple fields to search
•No more than 1000 rows returned
•You know the partition key
•Indexing static columns
Use SASI when…

SASI Guidelines
•Searching large partitions
•Tight SLA on reads
•Search for analytics
•Ordering search is important
Don’t Use SASI when…