Databricks Community Cloud

clairvoyantllc 458 views 32 slides Nov 14, 2016

About This Presentation

An overview of the Databricks Community Cloud platform, offered by Databricks at https://community.cloud.databricks.com/

Provides step-by-step instructions on how to create a standalone Spark cluster and how to use notebooks.


Slide Content

Databricks Community Cloud By: Robert Sanders

Databricks Community Cloud
- Free/Paid
- Standalone Spark Cluster
- Online Notebook (Python, R, Scala, SQL)
- Tutorials and Guides
- Shareable Notebooks

Why is it useful?
- Learning about Spark
- Testing different versions of Spark
- Rapid Prototyping
- Data Analysis
- Saved Code
- Others …

Forums: https://forums.databricks.com/

Login/Sign Up: https://community.cloud.databricks.com/login.html

Home Page

Active Clusters

Create a Cluster - Steps
1. From the Active Clusters page, click the “+ Create Cluster” button
2. Fill in the cluster name
3. Select the version of Apache Spark
4. Click “Create Cluster”
5. Wait for the cluster to start up and reach the “Running” state

Create a Cluster

Active Clusters

Active Clusters – Spark Cluster UI - Master

Workspaces

Create a Notebook - Steps
1. Right-click within a Workspace and click Create -> Notebook
2. Fill in the Name
3. Select the programming language
4. Select the running cluster you’ve created that you want to attach to the Notebook
5. Click the “Create” button

Create a Notebook

Notebook

Using the Notebook

Using the Notebook – Code Snippets

> sc
> sc.parallelize(1 to 5).collect()
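
As a quick extension of the snippet above, here is a minimal sketch of a couple of common RDD operations, assuming the same Scala notebook attached to a running cluster; the variable names are illustrative, not from the deck:

> val nums = sc.parallelize(1 to 5)    // distribute a local range as an RDD
> val squares = nums.map(n => n * n)   // lazy transformation, nothing runs yet
> squares.collect()                    // action: returns Array(1, 4, 9, 16, 25)
> squares.reduce(_ + _)                // action: sums the squares to 55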

Using the Notebook - Shortcuts

Shortcut          Action
Shift + Enter     Run selected cell and move to next cell
Ctrl + Enter      Run selected cell
Option + Enter    Run selected cell and insert cell below
Ctrl + Alt + P    Create cell above current cell
Ctrl + Alt + N    Create cell below selected cell

Tables

Create a Table - Steps
1. From the Tables section, click “+ Create Table”
2. Select the Data Source (the steps below assume you’re using File as the Data Source)
3. Upload a file from your local file system (supported file types: CSV, JSON, Avro, Parquet)
4. Click Preview Table
5. Fill in the Table Name
6. Select the File Type and other options depending on the File Type
7. Change column names and types as desired
8. Click “Create Table”
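
The UI flow above is the path the deck covers; for reference, files uploaded this way land under /FileStore/tables/ in Databricks. A minimal sketch of reading such a file directly with the era-appropriate spark-csv package instead (the file path and temp table name here are hypothetical, not from the deck):

> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/got.csv")  // hypothetical upload path
> df.registerTempTable("got_tmp")  // expose the DataFrame to SQL queries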

Create a Table – Upload File

Create a Table – Configure Table

Create a Table – Review Table

Notebook – Access Table

Notebook – Access Table – Code Snippets

> sqlContext
> sqlContext.sql("show tables").collect()
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
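
Equivalently, once the table exists you can work with it through the DataFrame API rather than raw SQL; a small sketch (the "Name" column is hypothetical and depends on the schema of your uploaded file):

> val got = sqlContext.table("got")        // same result as select * from got
> got.printSchema()                        // inspect the inferred column types
> got.select("Name").limit(10).collect()   // hypothetical column name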

Notebook – Display

Notebook – Data Cleaning for Charting

Notebook – Plot Options

Notebook – Charting

Notebook – Display and Charting – Code Snippets

> display(got)
> val got = sqlContext.sql("select * from got")
> got.limit(10).collect()
> import org.apache.spark.sql.functions._
> val allegiancesCleanupUDF = udf[String, String](_.toLowerCase().replace("house ", ""))
> val isDeathUDF = udf { deathYear: Integer => if (deathYear != null) 1 else 0 }
> val gotCleaned = got.filter("Allegiances != \"None\"").withColumn("Allegiances", allegiancesCleanupUDF($"Allegiances")).withColumn("isDeath", isDeathUDF($"Death Year"))
> display(gotCleaned)
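
To feed a chart, you typically aggregate first and then call display on the result; a minimal sketch building on gotCleaned (this particular aggregation is an illustration, not from the deck):

> val deathsByHouse = gotCleaned.groupBy("Allegiances").agg(sum("isDeath").alias("deaths"))
> display(deathsByHouse)   // choose a bar chart from the plot options under the result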

Publish Notebook - Steps
1. While in a Notebook, click “Publish” on the top right
2. Click “Publish” on the pop-up
3. Copy the link and send it out

Publish Notebook