Serverless Big Data Architecture on Google Cloud Platform at Credit OK

spicydog 1,448 views 31 slides Dec 02, 2018

About This Presentation

This is a talk given at Barcamp Bangkhen 2018,
presented by Kriangkrai Chaonithi.
I shared my experience at Credit OK building a data pipeline that ingests a huge amount of customer data into our big data analytics warehouse using serverless services on Google Cloud Platform.
As a result, we can make it withou...


Slide Content

Serverless Big Data Architecture
on
Google Cloud Platform
at Credit OK

Presented by Kriangkrai Chaonithi @spicydog
On 25/11/2018, at Barcamp Bangkhen 9

Hello! My name is Gap

Education
●BS Applied Computer Science (KMUTT)
●MS Applied Computer Engineering (KMUTT)
Work Experience
●Former Android, iOS & PHP Developer at Longdo.COM
●Former R&D Manager at Insightera
●CTO & co-founder at Credit OK
Fields of Interest
●Software Engineering
●Computer Security
●Servers & Cloud & Distributed Computing
●Machine Learning & NLP
https://spicydog.me

Agenda

●Server & application deployment history
●Introduction to Google Cloud Platform products
○Computing
○Storage & databases
○Data analytics
●Big data architecture at Credit OK
○About Credit OK
○Why we use serverless
○Our requirements
○Our solutions
○The summary

Server & Application
Deployment History

Bare Metal Server

●Pre-cloud era (probably..)
●Install OS and dependencies on a machine
●One machine - one server
●Expose the network to the internet
●Colocation/on-premise
●SSH/FTP/Git to the server

Virtualization

●One machine - many servers
●One machine - multiple customers
●VPS / Cloud
●SSH/FTP/Git to the server

IaaS

Containers & Micro Services

●Docker / Kubernetes
●Auto deployment
●Auto scale (automatically spawns new nodes)
●Pay based on the number of nodes
●Infrastructure as code! (new concept!)

PaaS

Why Containers?

Why Container Orchestration?
https://blog.risingstack.com/what-is-kubernetes-how-to-get-started/

Serverless

●Write code and deploy!
●Auto deploy
●Auto scale
●Pay per request
●No infrastructure!!

SaaS

It’s time to talk about..

Some Famous Features on GCP

GCP Computing
Virtual Machine


Containers


Serverless

Let’s Review Types of Databases
SQL NoSQL

CAP Theorem

GCP Storage & Databases
Non-serverless
Serverless

GCP Data Analytics
Pipeline Analytics Visualization

Credit Scoring Platform on Big Data Analytics

creditok.co

Why use serverless on big data?
●Scalable & super high performance
●No more server maintenance :)
●Easier to optimize
●Only pay per use

Requirements
●Have a HUGE data warehouse for batch processing
●Our customers have on-premise data at >400 sites
●A data ingestor app must be installed at every site
●Data ingestor app must be able to run on
●Data ingestor app must be super robust and easy to install
●Must run automatically every day, via a task scheduler

When >400 sites upload large files
to your server at the same time..

This is unintentional DDoS!

So we mainly use Cloud Functions

●Auto scale
●But only accepts request bodies <10 MB

and also use
Compute/App Engine
for >10 MB files
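
The size-based split above could be sketched on the ingestor side roughly as follows. This is only an illustration, not the actual Credit OK ingestor: the endpoint URLs and the function name are hypothetical, and treating "10 MB" as 10 * 1024 * 1024 bytes is an assumption (a real client would also leave headroom for request overhead).

```python
# Threshold taken from the slide: Cloud Functions only accept bodies under 10 MB.
# Interpreting 10 MB as 10 * 1024 * 1024 bytes is an assumption.
CLOUD_FUNCTION_LIMIT_BYTES = 10 * 1024 * 1024

# Hypothetical endpoints, for illustration only.
CLOUD_FUNCTION_URL = "https://example.com/ingest-small"
APP_ENGINE_URL = "https://example.com/ingest-large"


def choose_upload_endpoint(file_size_bytes: int) -> str:
    """Return the endpoint that a file of the given size should be sent to."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if file_size_bytes < CLOUD_FUNCTION_LIMIT_BYTES:
        # Small files go to the auto-scaling Cloud Function.
        return CLOUD_FUNCTION_URL
    # Files at or above the limit go to the Compute/App Engine service.
    return APP_ENGINE_URL
```

With this shape, most daily uploads would hit the auto-scaling path and only the large files would reach the Compute/App Engine service.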

Data Flow Architecture
[Diagram not preserved; surviving label: Raw Data Source]

Serverless
Big Data Architecture
In Summary
●Focus on design & coding
●A few people can accomplish a huge task
●No cost for idle servers, pay as you go
(GCS storage ~$0.02 per GB)
●Processing cost is surprisingly low when optimized
(Beware of BigQuery cost!)
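
To get a feel for the storage figure above, here is a tiny back-of-the-envelope helper. It assumes the ~$0.02 per GB quoted on the slide is a per-month price in USD; the function name and the example volume are made up for illustration, and actual pricing depends on region and storage class.

```python
# Approximate price from the slide; assumed here to be USD per GB per month.
GCS_PRICE_PER_GB_MONTH = 0.02


def estimate_monthly_storage_cost(stored_gb: float,
                                  price_per_gb: float = GCS_PRICE_PER_GB_MONTH) -> float:
    """Rough monthly storage cost in USD for the given data volume."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * price_per_gb


# Example: 5,000 GB of raw files at ~$0.02 per GB is about $100 per month.
example_cost = estimate_monthly_storage_cost(5000)
```

This covers storage only. Query and processing charges (the BigQuery warning above) are billed separately and are not modelled here.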

Beware of ZONE_RESOURCE_POOL_EXHAUSTED
●Serverless doesn’t mean no server, you just do not need to spawn servers/workers
●Worker pools have limits, so do not run your app at peak time (but when is that!!)
●Hopefully Google will solve the problem soon :)
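
The talk does not say how this error was handled. One common mitigation, shown here purely as an assumption, is to retry the job later with exponential backoff. `ResourcePoolExhausted` and `run_with_backoff` are hypothetical names, not part of any Google client library. A real job would need to translate the ZONE_RESOURCE_POOL_EXHAUSTED failure into such an exception itself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ResourcePoolExhausted(Exception):
    """Hypothetical stand-in for a ZONE_RESOURCE_POOL_EXHAUSTED failure."""


def run_with_backoff(job: Callable[[], T],
                     max_attempts: int = 5,
                     base_delay_s: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run job(), retrying with doubling delays while the pool is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return job()
        except ResourcePoolExhausted:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Wait 60 s, 120 s, 240 s, ... before trying again.
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.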

We Are Hiring!

●PHP Laravel/Lumen Developer
●Data Engineer
●Credit Risk Analyst

[email protected]
https://jobs.blognone.com/company/creditok

Question & Answer

Time is short, let’s utilize the networks.
Feel free to connect with me via spicydog.me