Serverless Big Data Architecture on Google Cloud Platform at Credit OK
spicydog
31 slides
Dec 02, 2018
About This Presentation
This is a talk given at Barcamp Bangkhen 2018,
presented by Kriangkrai Chaonithi.
I shared my experience at Credit OK building a data pipeline that ingests huge amounts of customer data into our big data analytics warehouse using serverless services on Google Cloud Platform.
As a result, we can handle our data without setting up any servers, at minimal cost.
Slide Content
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Presented by Kriangkrai Chaonithi @spicydog
On 25/11/2018, at Barcamp Bangkhen 9
Hello! My name is Gap
Education
●BS Applied Computer Science (KMUTT)
●MS Applied Computer Engineering (KMUTT)
Work Experience
●Former Android, iOS & PHP Developer at Longdo.COM
●Former R&D Manager at Insightera
●CTO & co-founder at Credit OK
Fields of Interest
●Software Engineering
●Computer Security
●Servers & Cloud & Distributed Computing
●Machine Learning & NLP
https://spicydog.me
Agenda
●Server & application deployment history
●Introduction to Google Cloud Platform products
○Computing
○Storage & databases
○Data analytics
●Big data architecture at Credit OK
○About Credit OK
○Why we use serverless
○Our requirements
○Our solutions
○The summary
Server & Application Deployment History
Bare Metal Server
●Pre-cloud era (probably...)
●Install OS and dependencies on a machine
●One machine - one server
●Expose the network to the internet
●Colocation/on-premise
●SSH/FTP/Git to the server
Virtualization
●One machine - many servers
●One machine - multiple customers
●VPS / Cloud
●SSH/FTP/Git to the server
IaaS
Containers & Micro Services
●Docker / Kubernetes
●Auto deployment
●Auto scaling (automatically spawns new nodes)
●Pay based on the number of nodes
●Infrastructure as code! (new concept!)
GCP Data Analytics
Pipeline Analytics Visualization
Credit Scoring Platform on Big Data Analytics
creditok.co
Why use serverless on big data?
●Scalable & super high performance
●No more server maintenance :)
●Easier to optimize
●Only pay per use
Requirements
●Have a HUGE data warehouse for batch processing
●Our customers have on-premise data at >400 sites
●A data ingestor app must be installed at every site
●Data ingestor app must be able to run on
●Data ingestor app must be super robust and easy to install
●Must run automatically every day via a task scheduler
When >400 sites upload large files to your server at the same time...
This is an unintentional DDoS!
So we mainly use Cloud Functions
●Auto scaling
●But only accepts <10 MB body size
and also use Compute/App Engine for >10 MB files
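The size-based split above can be sketched as a small routing helper inside the ingestor app. This is only an illustration under stated assumptions: the endpoint URLs, the `choose_upload_endpoint` name, and the exact byte threshold are hypothetical and do not come from the talk.

```python
# Hypothetical sketch: pick an upload target by payload size.
# The URLs are placeholders, not Credit OK's real endpoints.

# Roughly the 10 MB request-body limit mentioned on the slide.
CLOUD_FUNCTION_LIMIT_BYTES = 10 * 1024 * 1024

CLOUD_FUNCTION_URL = "https://example.invalid/ingest-small"  # assumed name
LARGE_FILE_URL = "https://example.invalid/ingest-large"      # assumed name (Compute/App Engine)


def choose_upload_endpoint(size_bytes: int) -> str:
    """Return the endpoint that should receive a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Small payloads go to the auto-scaling Cloud Function.
    if size_bytes < CLOUD_FUNCTION_LIMIT_BYTES:
        return CLOUD_FUNCTION_URL
    # Anything at or above the limit goes to the Compute/App Engine service.
    return LARGE_FILE_URL
```

The ingestor would call this with the file size (for example from `os.path.getsize`) before starting each upload.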
Raw Data Source
Data Flow Architecture
Serverless Big Data Architecture
In Summary
●Focus on design & coding
●A few people can achieve a huge task
●No cost for idle servers; pay as you go
(GCS storage ~$0.02 per GB per month)
●Processing cost is surprisingly low when optimized
(Beware of BigQuery cost!)
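To put the storage figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the ~$0.02 per GB price is a monthly rate and ignores network, operation, and BigQuery charges; the function name and the 5,000 GB volume are made up for illustration.

```python
# Rough estimate only: the price is the approximate figure from the slide,
# treated here as a monthly rate (an assumption).
GCS_PRICE_PER_GB_MONTH = 0.02


def monthly_storage_cost(total_gb: float,
                         price_per_gb: float = GCS_PRICE_PER_GB_MONTH) -> float:
    """Return the estimated monthly storage cost in USD."""
    if total_gb < 0:
        raise ValueError("total_gb must be non-negative")
    return total_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical volume: 5,000 GB of raw data.
    print(f"${monthly_storage_cost(5000):.2f} per month")
```

At that hypothetical volume, storage comes to roughly $100 per month, which is why query processing rather than storage is usually the cost to watch.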
Beware of ZONE_RESOURCE_POOL_EXHAUSTED
●Serverless doesn’t mean there are no servers; you just do not need to spawn servers/workers yourself
●Worker pools have limits, so do not run your app at peak time (but when is that?!)
●Hopefully Google will solve the problem soon :)
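One common way to cope with transient capacity errors, which the talk does not describe, is to retry the job with exponential backoff and jitter. The sketch below is a generic pattern, not the author's method: `ResourcePoolExhausted` is a hypothetical stand-in exception for a ZONE_RESOURCE_POOL_EXHAUSTED failure, and `run_with_backoff` is an illustrative name.

```python
import random
import time


class ResourcePoolExhausted(Exception):
    """Hypothetical stand-in for a ZONE_RESOURCE_POOL_EXHAUSTED failure."""


def run_with_backoff(job, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call job(), retrying with exponential backoff plus jitter on exhaustion."""
    for attempt in range(max_attempts):
        try:
            return job()
        except ResourcePoolExhausted:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised without real waiting.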