Riot Games Scalable Data Warehouse Lecture at UCSB / UCLA

sean_seannery 7,723 views 74 slides Nov 18, 2015
Slide 1
Slide 1 of 74
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74

About This Presentation

This is a talk that was given for the Scalable Internet Services Masters-level Computer Science class at UCLA and UCSB. It briefly discusses the server architecture for the game League of Legends before going into depth about how the data warehouse can hold petabytes of player data. Discussion abo...


Slide Content

MOVING
MOUNTAINS OF
PLAYER DATA
SEAN MALONEY
RIOT GAMES
@SEAN_SEANNERY
SCALABLE INTERNET SERVICES
UCLA/UCSB - NOV 2015

SEAN MALONEY
BIG DATA ENGINEER
WHO IS THIS GUY?
Lead developer on Riot’s ETL tools


FUN FACT:
Was a student in this class 4 years
ago
Intern at Appfolio

MOVING MOUNTAINS OF DATA
INTRODUCTION1.
THE GAME PLATFORM: OUR MAIN DATA SOURCE2.
HOW WE INGEST AND QUERY DATA3.
HOW WE SCALE IN AWS4.
CONCLUSION - SEAN’S PRO TIPS5.

INTRODUCTION

WHAT IS LEAGUE OF LEGENDS?
2009
LAUNCH
ONLINE
MULTIPLAYER
WINDOWS
/ OSX
40-50 MIN
GAMES

THE
TEAM
YOUR CHAMP
THE
BATTLE
GROUND

THE GAME PLATFORM

THE CLIENT.

CHAT
STOREAUDIT
Load Balancers and Firewalls

CHAT
ORACLE COHERENCE (IN MEMORY DB)
STORE AUDIT GAME ETC.
CHAT
CHAT
STORE AUDIT GAME ETC.
STORE AUDIT GAME ETC.
PRIMARY DB
HOT BACKUP DB
2nd BACKUP DB
/ ETL

OTHER DATA SOURCES
<REST>

DATA INGESTION

PUSH-BASED
PULL-BASED / ETL
BATCH QUERIES
INGESTION STORAGE QUERY / VIEWSVIZ. TOOLS
SINGLE-ROW QUERIES
AGGREGATE QUERIES
FuETL
- OLTP game data
- External Data Sources
MASTER WAREHOUSE
HONU
- Anything pushed to it
- Server logs
DATA AUDITING

PUSH-BASED
PULL-BASED / ETL
BATCH QUERIES
INGESTION STORAGE
QUERY /
VIEWS
VIZ. TOOLS
SINGLE-ROW QUERIES
AGGREGATE QUERIES
FuETL
- OLTP game data
- External Data Sources
MASTER WAREHOUSE
HONU
- Anything pushed to it
- Server logs
DATA AUDITING

Distributed ETL Software written in
Ruby.
Scales Horizontally
Same ETL applied to multiple regions
/ datacenters
Self-Service UI with SQL query
templating.

NA Korea Russia

Create an ETL

Create an ETL

Amazon S3
SQS
(S)FTP
Hive
Microsoft SQL Server
MySQL
DynamoDB
Vertica
Redshift
REST websites


FUETL
CAN
CONNECT
TO

Create an ETL

Webapp
Core Libraries
Task Service
Tasks
Helper Service
Helpers
Environment
Service
Scheduler Process Worker Process Task / Helper / ControllersCommand Line Tool
View
- backbone.js
- Bootstrap CSS
Task DAO Helper DAOEnvironment
DAO
Env. Task DAO Env. Helper DAO

Webapp
Core Libraries
Task Service
Tasks
Helper Service
Helpers
Environment
Service
Scheduler Process Worker Process Task / Helper / ControllersCommand Line Tool
View
- backbone.js
- Bootstrap CSS
Task DAO Helper DAOEnvironment
DAO
Env. Task DAO Env. Helper DAO

Webapp
Core Libraries
Task Service
Tasks
Task DAO
Helper Service
Helpers
Helper DAO
Environment
Service
Environment
DAO
Scheduler Process Worker Process Task / Helper / Controllers
Env. Task DAO Env. Helper DAO
Command Line Tool
View
- backbone.js
- Bootstrap CSS

Webapp
Core Libraries
Task Service
Tasks
Helper Service
Helpers
Environment
Service
Scheduler Process Worker Process Task / Helper / ControllersCommand Line Tool
View
- backbone.js
- Bootstrap CSS
Task DAO Helper DAOEnvironment
DAO
Env. Task DAO Env. Helper DAO

FuETL STATISTICS
14 TB
DATA MOVED DAILY
5213
ACTIVE REGIONAL
ETLS
23125
DAILY ETL RUNS

FuETL SCALING

FuETL SCALING

Idempotency
Idempotent - an operation that will produce the
same results if executed once or multiple times

EXAMPLE:
Non-Idempotent: - x = x * 5;
- Submitting a purchase

Idempotent: - abs( abs(x) ) = abs(X)
- Cancelling a purchase

Idempotent?
In the transactional OLTP world….

INSERT INTO games_played
(SELECT * FROM games_played_na
WHERE date >= ‘2015-10-25’)

Idempotent?
In the big data / OLAP world….

INSERT INTO games_played
(SELECT * FROM games_played_na
WHERE date >= ‘2015-10-25’)

KEEPING INTEGRITY
X

Message Queues
ETL2ETL3ETL4ETL5. . .ETLN
ETL1
X
X
SCHEDULER
aka
PRODUCER
WORKER
aka
CONSUMER

Message Queues
●REDUNDANCY
●DELIVERY GUARANTEE
●SCALABILITY
●ASYCH. COMMUNICATION
●ABSTRACTION / DECOUPLING

Message Queues
●AMAZON SIMPLE QUEUE SERVICE
●APACHE ACTIVEMQ
●RABBITMQ
●HORNETQ
●MICROSOFT MQ (MSMQ)

PUSH-BASED
PULL-BASED / ETL
BATCH QUERIES
INGESTION STORAGE
QUERY /
VIEWS
VIZ. TOOLS
SINGLE-ROW QUERIES
AGGREGATE QUERIES
FuETL
- OLTP game data
- External Data Sources
MASTER WAREHOUSE
HONU
- Anything pushed to it
- Server logs
DATA AUDITING

Self Service, Custom HTTP Edge
Service (Java)
0
Fronted by ELB in front of ~40
autoscaled m1.xlarge instances

Forwards JSON data indirectly to S3
Honu
The batches need to then be unpacked
and converted into Hive tables
0

Custom Collector Infrastructure
(Java) - Derived from Netflix Suro
0
Deployed in every data center
worldwide and also AWS

Self Service, Custom HTTP Edge
Service (Java API)
Honu

Honu =

Custom HTTP Edge Service (Java)
0
DRADIS
Fronted by ELB in front of ~40 m1.
xlarge instances

Forwards data indirectly to S3 via
Honu Collectors

Honu
JSONJSONJSONJSONJSONJSON
COLLECTORS
R
E
S
T

E
N
D
P
O
I
N
T
JSONJSONJSONJSONJSONJSON
JSONJSONJSONJSONJSONJSON
JSONJSONJSONJSONJSONJSON

Honu
JSONJSONJSONJSONJSONJSON
COLLECTORS
R
E
S
T

E
N
D
P
O
I
N
T
JSONJSONJSONJSONJSONJSON
JSONJSONJSONJSONJSONJSON
JSONJSONJSONJSONJSONJSON
batchid = 20150512

Honu
JSONJSONJSONJSONJSONJSON
COLLECTORS
R
E
S
T

E
N
D
P
O
I
N
T
JSONJSONJSONJSONJSONJSON
GAM1GAM1GAM1
GAM
X
GAM1GAM1
JSONJSONJSONJSONJSONJSON

Idempotency
Use application logic to make idempotent

msg = queue.pop;
if (processed_games.contains( msg.game_id )
{
return; //do nothing
else {
process_game(msg);
}

What’s in there?
Data team doesn’t know everything that is submitted


Compliance
Are we violating international data laws?

Inconsistent data structure
Its formatted however developer submits it

THE
DOWN
SIDE

User Documentation
No one likes doing it, but it helps a lot.

Onboard training
Get new coworkers in-the-know

Familiar Protocols
Use REST or RPC so developers are on the same page
Focus on UX
Your tools need to be easy for non-technical people to use.

SELF
SERVICE
HOW?

PUSH-BASED
PULL-BASED / ETL
BATCH QUERIES
INGESTION STORAGE
QUERY /
VIEWS
VIZ. TOOLS
SINGLE-ROW QUERIES
AGGREGATE QUERIES
FuETL
- OLTP game data
- External Data Sources
MASTER WAREHOUSE
HONU
- Anything pushed to it
- Server logs
DATA AUDITING

AMAZON S3
s3n://datawarehouse/
schema1/
table1/
env/
dt/
time/
table2/
table3/
schema2/

s3n://telemetrydata/
application1/
table1/
env/
dt/
table2/
application2/
AMAZON S3 STRUCTURE
HIVE
‣schema1
table1
env
dt
time
table2
table3
‣schema2
table1
...
‣schema3
‣schema4

PUSH-BASED
PULL-BASED / ETL
BATCH QUERIES
INGESTION STORAGE
QUERY /
VIEWS
VIZ. TOOLS
SINGLE-ROW QUERIES
AGGREGATE QUERIES
FuETL
- OLTP game data
- External Data Sources
MASTER WAREHOUSE
HONU
- Anything pushed to it
- Server logs
DATA AUDITING

REST micro-service built with Java
and docker.
Reports and visualizations we can
use to find problems.
Source and target comparison.
Warehouse
Auditing
Service
Platform

HOW TO AUDIT

VISUALIZING

VISUALIZING

HOW TO AUDIT

PUSH-BASED
PULL-BASED / ETL
BATCH QUERIES
INGESTION STORAGE
QUERY /
VIEWS
VIZ. TOOLS
SINGLE-ROW QUERIES
AGGREGATE QUERIES
FuETL
- OLTP game data
- External Data Sources
MASTER WAREHOUSE
HONU
- Anything pushed to it
- Server logs
DATA AUDITING

BATCHOLAPPOINT

SCALING IN AWS

RESOURCE CONTENTION

SCALING

RDS
AWS Infrastructure Today
EMR EC2 Storage
Data
Science
Analytics /
Hue
ETL Telemetry
PlatforaDynamoDB
Loading
Auditing ETL
Telemetry
collectors
Data
dictionary
Rocana
(real time
dashboard)
Solr (real
time)
Point Data
Service
Metastore
Data
Science
Fraud
DYNAMODB
ETL App DB







Point Data
Store

S3
Source of “Truth”

Networking
VPC
AWS Direct
Connect
AWS Direct
Connect
AWS Direct
Connect
AWS Direct
Connect

CONCLUSION

DON’T
SEAN’S PRO TIPS OF THE DAY
DO
➔Don’t wait. Create S3
permissions and naming
standards early
➔Get an auditing solution
for DW accuracy
➔Allocate time for tuning
AWS infrastructure
➔Don’t forget to track cost.
AWS bills can surprise you
➔Don’t underestimate simple
problems in big data.

➔Prepare for multiple data
access patterns
➔Keep idempotency in mind
and use MQ architecture
➔Don’t stop. Believing

Custom rewards for mastering
different champions
Intensive query that spans every
game that every player has played
Improves player engagement

CHAMPION
MASTERY

Full copy of our data warehouse in
DynamoDB
Hive->DynamoDB Dynamic Partition


Support can answer questions faster
than ever.
PLAYER
SUPPORT

Data science team queries all chat
messages in game
Sentiment analysis and
classification



Identifies negative, offensive players
and mutes them automatically.

OFFENSIVE
CHAT
DETECTION

QUESTIONS?
SMALONEY
@RIOTGAMES.COM

@SEAN_SEANNERY engineering.riotgames.com
ENGINEERING
BLOG