Building machine learning applications locally with Spark — Joel Pinho Lucas (Tailtarget) @PAPIs Connect — São Paulo 2017
About This Presentation
In an era of huge amounts of heterogeneous data, processing it and extracting knowledge from it require ever more effort in building complex software architectures. In this context, Apache Spark provides a powerful and efficient approach to large-scale data processing. This talk briefly introduces a powerful machine learning library (MLlib) along with a general overview of the Spark framework, describing how to launch applications within a cluster. A demo then shows how to simulate a Spark cluster on a local machine using images available in a public Docker Hub repository. Finally, another demo shows how to save time by using unit tests to validate jobs before running them in a cluster.
Size: 1.05 MB
Language: en
Added: Jun 26, 2017
Slides: 15 pages
Slide Content
Building Machine Learning applications locally with Spark
21/06/2017
Joel Pinho Lucas
Agenda
•Problems and Motivation
•Spark and MLlib overview
•Launching applications in a Spark cluster
•Simulating a Spark cluster using Docker
•Demo: deploying a Spark cluster in a local machine
•Unit tests for Spark jobs
•How to set up a Spark cluster (infra + configuration)?
•How to test and/or debug a Spark job?
•The whole team should have the same environment
Run Spark Locally with Docker
•Lightweight cluster
•One machine
•Same environment for the whole team
•Deployed easily on any platform
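As a sketch of what this looks like in practice, using any of the public images listed later in the deck (the image name, tag, and worker arguments below are placeholders, not a specific image's actual interface; check the chosen image's README for the exact invocation):

```shell
# Pull a Spark image (placeholder name/tag)
docker pull some-org/spark:2.1.0

# Start a standalone master container, exposing the master port (7077)
# and its web UI (8080)
docker run -d --name spark-master -h spark-master \
  -p 7077:7077 -p 8080:8080 \
  some-org/spark:2.1.0 master

# Start a worker container on the same machine, pointed at the master
docker run -d --name spark-worker-1 --link spark-master:spark-master \
  some-org/spark:2.1.0 worker spark://spark-master:7077
```

Because everything runs in containers, every team member gets an identical cluster from the same image, on any platform that runs Docker.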
Spark and MLlib Overview
•Easy to develop (APIs in Java, Scala, Python, R)
•High-quality algorithms
•Fast to run
•Lazy evaluation
•In-memory storage
http://spark.apache.org/mllib/
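Lazy evaluation means Spark builds up a plan of transformations and only executes it when an action requests a result. The same idea can be illustrated with plain Python generators, no Spark required (this is an analogy, not Spark's API):

```python
# Lazy evaluation illustrated with plain Python generators: like Spark
# transformations, a generator pipeline is only a plan -- work happens
# when a result (an "action") is requested.
evaluated = []

def double(x):
    evaluated.append(x)  # record that work actually happened
    return x * 2

# "Transformation": builds the pipeline, runs nothing yet
pipeline = (double(x) for x in range(1, 11))
print(evaluated)  # [] -- no work done so far

# "Action": forces evaluation, and only of the elements needed
result = [next(pipeline) for _ in range(3)]
print(result)     # [2, 4, 6]
print(evaluated)  # [1, 2, 3] -- the remaining elements were never computed
```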
Spark Execution Model
http://spark.apache.org/docs/2.1.0/cluster-overview.html
Starting a Cluster Manually
Manually Submitting an Application
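A sketch of what these two steps look like with the standalone scripts shipped with Spark 2.1 (the host name and the application class/jar are placeholders):

```shell
# On the master node: start the standalone master
# (it logs a spark://<host>:7077 URL for workers and spark-submit)
$SPARK_HOME/sbin/start-master.sh

# On each worker node: start a worker and register it with the master
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077

# From a client machine: submit an application to the cluster
$SPARK_HOME/bin/spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  path/to/my-app.jar
```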
Choose your Docker Image (or build your own and share)
Some available Spark Docker Images
•https://github.com/big-data-europe/docker-spark
•https://hub.docker.com/r/internavenue/centos-spark/
•https://github.com/sequenceiq/docker-spark
•https://github.com/epahomov/docker-spark
•https://www.anchormen.nl/spark-docker/
•https://github.com/gettyimages/docker-spark
•https://hub.docker.com/r/bigdatauniversity/spark/
http://github.com/joelplucas/docker-spark
Example to Run
•MLlib's FP-Growth algorithm
•Data from the digital publishing domain
•Problem: find frequent patterns in navigation profiles
•Write results to MongoDB
http://github.com/joelplucas/fpgrowth-spark-example
The Dataset
Unit Testing using Spark Testing Base
•Launched at Strata NYC 2015 by Holden Karau (and maintained by the community)
•Supports unit tests in Java, Scala and Python