KNIME For Data Analytics Course Overview

BakhtiarAmaludin 48 views 176 slides Sep 30, 2024

[L1-DS] KNIME Analytics Platform
for Data Scientists: Basics
KNIME AG

Overview
KNIME Analytics Platform

© 2021 KNIME AG. All rights reserved.
What is KNIME Analytics Platform?
§A tool for data analysis, manipulation, visualization, and reporting
§Based on the graphical programming paradigm
§Provides a diverse array of extensions:
§Text Mining
§Network Mining
§Cheminformatics
§Many integrations, such as Java, R, Python, Weka, Keras, Plotly, H2O, etc.

Visual KNIME Workflows
NODES perform tasks on data
Nodes are combined to create WORKFLOWS
Node diagram labels: Inputs, Outputs, Status
Status values: Not Configured, Configured, Executed, Error

Data Access
§Databases
§MySQL, PostgreSQL, Oracle
§Theobald
§any JDBC (DB2, MS SQL Server)
§Amazon DynamoDB
§Files
§CSV, txt, Excel, Word, PDF
§SAS, SPSS
§XML, JSON, PMML
§Images, texts, networks
§Other
§Twitter, Google
§Amazon S3, Azure Blob Store
§Sharepoint, Salesforce
§Kafka
§REST, Web services

Big Data
§Spark & Databricks
§HDFS support
§Hive
§Impala
§In-database processing

Transformation
§Preprocessing
§Row, column, matrix based
§Data blending
§Join, concatenate, append
§Aggregation
§Grouping, pivoting, binning
§Feature Creation and Selection

Analysis & Data Mining
§Regression
§Linear, logistic
§Classification
§Decision tree, ensembles, SVM, MLP,
Naïve Bayes
§Clustering
§k-means, DBSCAN, hierarchical
§Validation
§Cross-validation, scoring, ROC
§Deep Learning
§Keras, DL4J
§External
§R, Python, Weka, H2O, Keras

Visualization
§Interactive Visualizations
§JavaScript-based nodes
§Scatter Plot, Box Plot, Line Plot
§Networks, ROC Curve, Decision Tree
§Plotly Integration
§Adding more with each release!
§Misc
§Tag cloud, open street map, molecules
§Script-based visualizations
§R, Python

Deployment
§Database
§Files
§Excel, CSV, txt
§XML
§PMML
§to: local, KNIME Server, Amazon S3, Azure
Blob Store
§BIRT Reporting

Over 2000 Native and Embedded Nodes Included:
§Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd
§Transformation: Row; Column; Matrix; Text, Image; Time Series; Java; Python; Community / 3rd
§Analysis & Mining: Statistics; Data Mining; Machine Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python; Community / 3rd
§Visualization: R; JFreeChart; JavaScript; Plotly; Community / 3rd
§Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd

Install KNIME Analytics Platform
§Select the KNIME version for your computer:
§Mac
§Windows – 32 or 64 bit
§Linux
§Download the archive and extract the file, or download the installer package and run it

Start KNIME Analytics Platform
§Use the shortcut created by the installer
§Or go to the installation directory and launch KNIME via knime.exe

The KNIME Workspace
§The workspace is the folder/directory in which workflows (and potentially data files) are stored for the current KNIME session.
§Workspaces are portable (just like KNIME)

The KNIME Analytics Platform Workbench
Workbench panels: KNIME Explorer, Workflow Coach, Node Repository, Workflow Editor, Outline, Console & Node Monitor, Node Description, KNIME Hub

KNIME Explorer
§In LOCAL you can access your own workflow projects.
§Other mountpoints allow you to connect to
§EXAMPLE Server
§KNIME Hub
§KNIME Server
§The Explorer toolbar on the top has a search box and buttons to
§select the workflow displayed in the active editor
§refresh the view
§The KNIME Explorer can contain 4 types of content:
§Workflows
§Workflow groups
§Data files
§Shared Components

Creating New Workflows, Importing and Exporting
§Right-click inside the KNIME Explorer to create a new workflow or a workflow
group, or to import a workflow
§Right-click the workflow or workflow group to export

Node Repository
§The Node Repository lists all KNIME nodes
§The search box has 2 modes
§Standard Search – exact match of node name
§Fuzzy Search – finds the most similar node name

Description
§The Description window gives
information about:
§Node Functionality
§Input & Output
§Node Settings
§Ports
§References to literature

Workflow Description
§When selecting the workflow, the
Description window gives
information about the workflow’s:
§Title
§Description
§Associated Tags and Links
§Creation Date
§Author

Workflow Coach
§Node recommendation engine
§Gives hints about which node to use next in the workflow
§Based on the KNIME community's usage statistics
§Based on your own KNIME workflows

Node Monitor
§By default the Node Monitor shows you the output table of the node selected in the workflow editor
§Click on the three dots on the upper right to show the flow variables, configuration, etc.

Console and Other Views
§Console view prints out error and
warning messages about what is
going on under the hood
§Click on View and select Other… to
add different views
§Node Monitor, Licenses, etc.

Inserting and Connecting Nodes
§Insert nodes into the workspace by dragging them from the Node Repository or by double-clicking in the Node Repository
§Connect nodes by left-clicking the output port of Node A and dragging the cursor to the (matching) input port of Node B
§Common port types: Data, Image, DB Connection, DB Data, Model, Flow Variable

More on Nodes…
§A node can have 4 states:
Not Configured: The node is waiting for configuration or incoming data.
Configured: The node has been configured correctly and can be executed.
Executed: The node has been successfully executed. Results may be viewed and used in downstream nodes.
Error: The node has encountered an error during execution.

Node Configuration
§Most nodes require configuration
§To access a node configuration window:
§Double-click the node
§Right-click -> Configure

Node Execution
§Right-click node
§Select Execute in the context menu
§If execution is successful, status shows green light
§If execution encounters errors, status shows red light

Tool Bar
The buttons in the toolbar can be used for the active workflow. The most
important buttons are:
§Execute selected and executable nodes (F7)
§Execute all executable nodes
§Execute selected nodes and open first view
§Cancel all selected, running nodes (F9)
§Cancel all running nodes

Node Views
§Right-click node to inspect the execution results by
§selecting output ports (last option in the context menu) to inspect tables, images, etc.
§selecting Interactive View to open visualization results in a browser
Figure labels: Plot View, Data View

KNIME File Extensions
Dedicated file extensions for workflows and workflow groups associated with KNIME Analytics Platform
§*.knwf for KNIME Workflow Files
§*.knar for KNIME Archive Files

Getting Started: KNIME Hub
§Place to search and share
§Workflows
§Nodes
§Components
§Extensions
https://hub.knime.com

Getting Started: KNIME Example Server
§Connect via KNIME Explorer to a public repository with a large selection of example workflows for many applications

Hot Keys (for Future Reference)
Task | Hot key | Description
Node Configuration | F6 | opens the configuration window of the selected node
Node Execution | F7 | executes selected configured nodes
Node Execution | Shift + F7 | executes all configured nodes
Node Execution | Shift + F10 | executes all configured nodes and opens all views
Node Execution | F9 | cancels selected running nodes
Node Execution | Shift + F9 | cancels all running nodes
Node Connections | Ctrl + L | connects selected nodes
Node Connections | Ctrl + Shift + L | disconnects selected nodes
Move Nodes and Annotations | Ctrl + Shift + Arrow | moves the selected node in the arrow direction
Move Nodes and Annotations | Ctrl + Shift + PgUp/PgDown | moves the selected annotation in front of or behind all overlapping annotations
Workflow Operations | F8 | resets selected nodes
Workflow Operations | Ctrl + S | saves the workflow
Workflow Operations | Ctrl + Shift + S | saves all open workflows
Workflow Operations | Ctrl + Shift + W | closes all open workflows
Metanode | Shift + F12 | opens metanode wizard

Today’s Example

Today’s Example: Churn Prediction
§Build a data science application step by step
§Each section of the course has an associated workflow with exercises
§The exercises complete the steps in the CRISP-DM cycle

The Data
§The data files used in the exercises are available in the “data”
folder: data files in different file formats, web-based data, data
on a database, etc.
§For churn prediction, customer data are blended
from different sources
§The Data Explorer node is helpful in inspecting data

Today’s Example: Churn Prediction

Importing Data
Accessing Files and Databases

Data Source Nodes
Typically characterized by:
§Orange color
§By default no input ports, 1-2 output ports
§New file handling with KNIME 4.3.
§Consistent user experience across all nodes and file systems
§Managing of various file systems within the same workflow
§Performance improvements
Node diagram labels: Status, Node label, Output port

CSV Reader
§Reads either one or multiple .csv and .txt files
§Further tabs to
§limit the rows
§select encoding
Dialog labels: File system, File path, Basic settings, Advanced settings, Preview, Help button

Common Settings: File Path
§A path consists of three parts:
§Type: Specifies the file system type, e.g. local, relative, mountpoint, custom_url or connected.
§Specifier: Optional string with additional file system specific information, e.g. relative to which location (knime.workflow)
§Path: Specifies the location within the file system
§Examples:
§(LOCAL, , C:\Users\username\Desktop)
§(RELATIVE, knime.workflow, file1.csv)
§(MOUNTPOINT, MOUNTPOINT_NAME, /path/to/file1.csv)
§(CONNECTED, amazon-s3:eu-west-1, /mybucket/file1.csv)
Dialog labels: Type, Specifier, Path

Common Settings: Four Default File Systems
§Local File System
§Relative to …
§Mountpoint
§Custom URL

Common Settings: Connecting to other File Systems
§Add file system connection port to connect to another file system
§Click on the three dots on the lower left to add or remove a dynamic port.
§Supported file systems
§Microsoft Azure
§Google
§Amazon
§Databricks
§Big Data file systems (hdfs, httpFS, …)
§On-premise (e.g. ssh, ftp, …)

Common Settings: Read Single or Multiple Files
§Single file
§Files in a folder
§Option to include subfolders
§Option to define filter criteria

Common Settings: Transformation Tab
§Supported operations
§Column filtering
§Column sorting
§Column renaming
§Column type mapping
§Select between union or intersection of columns (in case of reading many files)

Alternative Faster Way …
Drag & Drop OR Copy & Paste

File Path Options (Old File Handling)
§Local path
§Absolute URL
§Mountpoint-relative URL
Figure label: New file handling

Workflow-Relative File Paths (Old File Handling)
§Best choice if workflows are to be shared
§Requires matching folder structure within workflow group
§Independent of environment outside of workflow group
§Example: Path to “Sentiment Analysis.table”
§Local path:
§C:\Users\rb\knime-workspace\KNIMEUserTraining\data\Sentiment Analysis.table
§Workflow relative:
YouTube KNIME TV Channel: https://youtu.be/U9sP4g4yGwY

Excel Reader (XLS)
§Reads .xls and .xlsx files from Microsoft Excel
§Supports reading from multiple sheets

Excel Reader
Dialog labels: File system, File path, Sheet specific settings, Preview

Table Reader
§Reads tables from the native KNIME format
§Maximum performance, minimum configuration
Dialog labels: File system, File path

Database Connectivity
§Read data from any JDBC-enabled database
§Write your own SQL or model it using dedicated nodes

Database Connectors
§Native: PostgreSQL, MySQL, MS SQL Server, SQLite
§DB Connector (e.g. DB2, HANA)
§Big Data: Hive and Impala

Exercise Session 1
§Download the course material from the KNIME Hub
https://hub.knime.com/knime/spaces/Education/latest/Courses/

Exercise Session 1
§Import the course material to KNIME Analytics Platform
1. Right click on LOCAL and select Import KNIME Workflow….
2. Click on Browse and select downloaded .knar file
3. Click on Finish

Importing Data Exercise
Start with exercise: Importing Data
Read the following files
§Sentiment Analysis.table
§Sentiment Rating.csv
§Product Data2.xls
Optional: Read the web_activity table from the database WebActivity.sqlite
(hint: drag and drop the files from the KNIME Explorer panel to get started)
You can download the training workflows from the KNIME Hub:
https://hub.knime.com/knime/spaces/Education/latest/Courses/

RESTful Web Services
§Use KNIME nodes to interact with RESTful web services
§Send requests using standard HTTP methods
Figure labels: JSON Response, XML Response

RESTful Web Services
§Enter URL, or use from column
§Provide authentication if necessary
§Add delay between individual requests
https://www.knime.com/blog/a-restful-way-to-find-and-retrieve-data
https://www.knime.com/blog/OSM-meets-CSV-file-and-Google-API

JSON Reader and JSON Path nodes
§Use the JSON Reader (or GET Request) node to get a JSON cell
§Use the JSON Path node to query the JSON file and extract parameters
§Editor window simplifies construction of JSON queries by auto-generating them (click on properties)
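To make the idea of querying a JSON cell concrete, here is a minimal Python sketch of what a path query does conceptually. It is not the JSON Path node itself and does not use a JSONPath library; the payload and the helper name `extract` are made up for illustration.

```python
import json

# Hypothetical response body, standing in for the JSON cell a GET Request might return.
raw = '{"results": [{"name": "Berlin", "geometry": {"lat": 52.52, "lng": 13.405}}]}'
document = json.loads(raw)

def extract(node, path):
    """Follow a sequence of keys and list indexes into a parsed JSON document.

    The list ["results", 0, "geometry", "lat"] plays the role of a path
    expression such as $.results[0].geometry.lat.
    """
    for step in path:
        node = node[step]
    return node

lat = extract(document, ["results", 0, "geometry", "lat"])
name = extract(document, ["results", 0, "name"])
print(name, lat)
```

Each extracted value would correspond to one new column in the output table.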

Google Sheets
§Access your data stored in Google Services
§Read data from Google Sheets
§Write data to new sheets
§Modify existing sheets
§Makes collaboration and sharing of data easy
§(especially vs. sending Excel sheets via email...)
Authenticate via pop-up window (OAuth2)

Google Sheets
§Select from available sheets on Google Drive
§Transform data in KNIME, or enrich with new data
§Create new sheet or update existing sheets
§Allows reading from / writing to a specific range of a sheet (e.g. A1:G10)
Dialog labels:
§Authenticate via pop-up window (OAuth2)
§Select from available sheets, open in browser for preview
§Specify target sheet, select which columns to write, etc.

Other Useful Data Sources
§KNIME Analytics Platform provides many more options to access data:
§Azure Data Lake Storage
§Snowflake Connector
§SMB Connector (e.g. Samba and Windows Server)
§Python/R Source nodes
§Tika Parser – extracts textual data from 200+ file types
§Find out more by downloading the free book “Will they blend”
https://www.knime.com/knimepress/download-will-they-blend

Today’s Example

Data Manipulation:
Blend, Aggregate, and Clean Data

Data Manipulation Nodes
§Yellow color with a variety of input and output ports
§Apply a transformation to input data
§Many, many nodes!

Concatenate
Combine rows from 2 or more tables with shared columns
§Handles duplicate row keys gracefully
§Take the union or intersection of columns
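The difference between the union and the intersection of columns can be sketched in plain Python. This is only an illustration of the concept, not the Concatenate node's implementation; the tables, the column names, and the function `concatenate` are assumptions, and row keys are not modelled.

```python
def concatenate(tables, mode="union"):
    """Stack the rows of several tables (lists of dicts) on top of each other."""
    columns = []  # column names in order of first appearance
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    if mode == "intersection":
        # Keep only columns that are present in every row of every table.
        columns = [c for c in columns
                   if all(c in row for table in tables for row in table)]
    # Cells missing in a source table become None (a missing value).
    return [{c: row.get(c) for c in columns} for table in tables for row in table]

# Hypothetical web activity data from an old and a new system.
old_system = [{"CustomerKey": 15, "Visits": 3}]
new_system = [{"CustomerKey": 10, "Visits": 7, "Device": "mobile"}]

union_rows = concatenate([old_system, new_system], mode="union")
common_rows = concatenate([old_system, new_system], mode="intersection")
print(union_rows)
print(common_rows)
```

With the union, the old-system row gets a missing value in the extra column; with the intersection, the extra column is dropped.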

Dynamic Ports
Add and remove node ports based on your needs, e.g. in order to concatenate three or more tables
§Click on the three dots in the bottom left corner

Cell Replacer
Replaces the content of a column based on a lookup
§Top port references the table to be searched
§Bottom port holds the lookup table (search keys and replacement values)

Joining Columns of Data
Join by CustomerKey

Left Table
CustomerKey | OrderDate | OrderID
22 | 2019-09-23 | #23444
24 | 2019-09-30 | #23457
15 | 2019-10-07 | #28985
10 | 2091-10-13 | #29999

Right Table
CustomerKey | DoB | City | Gender
17 | 1974-02-23 | Berlin | F
65 | 2001-05-25 | Stuttgart | F
35 | 1988-08-05 | Cologne | M
15 | 1983-07-20 | Hamburg | M
10 | 1993-01-13 | Berlin | M

Inner Join
CustomerKey | OrderDate | OrderID | DoB | City | Gender
15 | 2019-10-07 | #28985 | 1983-07-20 | Hamburg | M
10 | 2091-10-13 | #29999 | 1993-01-13 | Berlin | M

Left Outer Join
CustomerKey | OrderDate | OrderID | DoB | City | Gender
22 | 2019-09-23 | #23444 | ? | ? | ?
24 | 2019-09-30 | #23457 | ? | ? | ?
15 | 2019-10-07 | #28985 | 1983-07-20 | Hamburg | M
10 | 2091-10-13 | #29999 | 1993-01-13 | Berlin | M

Right Outer Join
CustomerKey | OrderDate | OrderID | DoB | City | Gender
17 | ? | ? | 1974-02-23 | Berlin | F
65 | ? | ? | 2001-05-25 | Stuttgart | F
35 | ? | ? | 1988-08-05 | Cologne | M
15 | 2019-10-07 | #28985 | 1983-07-20 | Hamburg | M
10 | 2091-10-13 | #29999 | 1993-01-13 | Berlin | M

Joining Columns of Data
Join by CustomerKey

Left Table
CustomerKey | OrderDate | OrderID
22 | 2019-09-23 | #23444
24 | 2019-09-30 | #23457
15 | 2019-10-07 | #28985
10 | 2091-10-13 | #29999

Right Table
CustomerKey | DoB | City | Gender
17 | 1974-02-23 | Berlin | F
65 | 2001-05-25 | Stuttgart | F
35 | 1988-08-05 | Cologne | M
15 | 1983-07-20 | Hamburg | M
10 | 1993-01-13 | Berlin | M

Full Outer Join
CustomerKey | OrderDate | OrderID | DoB | City | Gender
17 | ? | ? | 1974-02-23 | Berlin | F
65 | ? | ? | 2001-05-25 | Stuttgart | F
35 | ? | ? | 1988-08-05 | Cologne | M
15 | 2019-10-07 | #28985 | 1983-07-20 | Hamburg | M
10 | 2091-10-13 | #29999 | 1993-01-13 | Berlin | M
22 | 2019-09-23 | #23444 | ? | ? | ?
24 | 2019-09-30 | #23457 | ? | ? | ?
Rows 17, 65 and 35: missing values in the left table
Rows 22 and 24: missing values in the right table
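The four join modes can be reproduced on the slide's data with a short Python sketch. It is an illustration of the join semantics only, not the Joiner node's algorithm; the function `join` is hypothetical and assumes that the key is unique in the right table. Missing values are represented as None.

```python
orders = [  # left table from the slide
    {"CustomerKey": 22, "OrderDate": "2019-09-23", "OrderID": "#23444"},
    {"CustomerKey": 24, "OrderDate": "2019-09-30", "OrderID": "#23457"},
    {"CustomerKey": 15, "OrderDate": "2019-10-07", "OrderID": "#28985"},
    {"CustomerKey": 10, "OrderDate": "2091-10-13", "OrderID": "#29999"},
]
customers = [  # right table from the slide
    {"CustomerKey": 17, "DoB": "1974-02-23", "City": "Berlin", "Gender": "F"},
    {"CustomerKey": 65, "DoB": "2001-05-25", "City": "Stuttgart", "Gender": "F"},
    {"CustomerKey": 35, "DoB": "1988-08-05", "City": "Cologne", "Gender": "M"},
    {"CustomerKey": 15, "DoB": "1983-07-20", "City": "Hamburg", "Gender": "M"},
    {"CustomerKey": 10, "DoB": "1993-01-13", "City": "Berlin", "Gender": "M"},
]

def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is inner, left, right or full."""
    right_by_key = {row[key]: row for row in right}  # assumes unique keys on the right
    left_keys = {row[key] for row in left}
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    result = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            result.append({**row, **match})
        elif how in ("left", "full"):
            # Unmatched left row: right-hand columns are missing.
            result.append({**row, **{c: None for c in right_cols}})
    if how in ("right", "full"):
        for row in right:
            if row[key] not in left_keys:
                # Unmatched right row: left-hand columns are missing.
                result.append({key: row[key],
                               **{c: None for c in left_cols},
                               **{c: row[c] for c in right_cols}})
    return result

for how in ("inner", "left", "right", "full"):
    print(how, len(join(orders, customers, "CustomerKey", how)))
```

The row counts match the slide's tables: 2 rows for the inner join, 4 for the left outer join, 5 for the right outer join and 7 for the full outer join.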

Joiner
§Combines columns from two different tables
§Top input port: “Left” data table
§Bottom input port: “Right” data table
§Outputs:
§Top port: Resulting joined table
§Middle port: Unmatched rows from the left input table (top input port)
§Bottom port: Unmatched rows from the right input table (bottom input port)
§By default the two bottom output ports are deactivated

Joiner Configuration – Linking Rows
§Values to join on. Multiple joining columns are allowed
§Select the rows which should be included in the joined table
§Activate this checkbox to activate the bottom output ports

Joiner Configuration – Column Selection
§Columns from top table for joined table
§Columns from lower table for joined table

Data Manipulation Exercise, Activity I
Start with exercise: Data Manipulation, Activity I
§Concatenate web activity data from the old and new systems
§Replace the written sentiment values with the numeric sentiment scores
§Join all data into one table using a series of Joiner nodes (use "Customer Key" as the joining column)

Data Aggregation

Data Aggregation - GroupBy
Aggregated on Category (group) by Sum (aggregation method)

Input
Product ID | Category | # Ordered Items
P 1 | Clothing | 2
P 2 | Home | 3
P 3 | Clothing | 1
P 4 | Clothing | 5
P 5 | Electronics | 7
P 6 | Electronics | 5

Output
Group | Sum(# Ordered Items)
Clothing | 8
Home | 3
Electronics | 12
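The same aggregation can be written as a few lines of Python, which may help to see what "group" and "aggregation method" mean. This is only a sketch of the idea on the slide's data, not the GroupBy node's implementation.

```python
from collections import defaultdict

# (Product ID, Category, # Ordered Items), as in the input table above.
rows = [
    ("P 1", "Clothing", 2),
    ("P 2", "Home", 3),
    ("P 3", "Clothing", 1),
    ("P 4", "Clothing", 5),
    ("P 5", "Electronics", 7),
    ("P 6", "Electronics", 5),
]

totals = defaultdict(int)
for _product_id, category, ordered_items in rows:
    totals[category] += ordered_items  # group: Category, aggregation: Sum

print(dict(totals))
```

Swapping the `+=` accumulation for a count, minimum, or mean would correspond to choosing a different aggregation method.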

GroupBy
Aggregate rows to summarize data
§First tab provides grouping options
§Second tab provides control over aggregation details
Dialog labels: Aggregation columns, Aggregation methods
YouTube KNIME TV video: https://youtu.be/bDwF-TOMtWw

Data Aggregation - Pivoting

Input
Product ID | Store | Category | # Ordered Items
P 1 | Online | Clothing | 2
P 2 | Onsite | Home | 3
P 3 | Onsite | Clothing | 1
P 4 | Online | Clothing | 5
P 5 | Online | Electronics | 7
P 6 | Online | Electronics | 5

Aggregation: Count
Category | Online | Onsite
Clothing | 2 | 1
Home | 0 | 1
Electronics | 2 | 0

Aggregation: Sum (# Ordered Items)
Category | Online | Onsite
Clothing | 7 | 1
Home | 0 | 3
Electronics | 12 | 0

Solution: Pivoting Node

Data Aggregation - Pivoting
Pivoting Node: Group-Pivot-Aggregate
§Group: Category (one output row per category)
§Pivot: Store (one output column each for Online and Onsite)
§Aggregate: Sum (# Ordered Items)

Category | Online | Onsite
Clothing | 7 | 1
Home | 0 | 3
Electronics | 12 | 0

Pivoting
Performs pivoting on selected columns for grouping and pivoting
§Values of group columns become unique rows
§Values of the pivot columns become unique columns, one for each combination of pivot values and aggregation
§Many aggregation methods are provided (similar to GroupBy)
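The group-pivot-aggregate pattern can be sketched in Python using the order data from the pivoting example. This is an illustration only, not the Pivoting node's implementation; the function `pivot` and its `aggregate` argument are hypothetical.

```python
# (Product ID, Store, Category, # Ordered Items)
rows = [
    ("P 1", "Online", "Clothing", 2),
    ("P 2", "Onsite", "Home", 3),
    ("P 3", "Onsite", "Clothing", 1),
    ("P 4", "Online", "Clothing", 5),
    ("P 5", "Online", "Electronics", 7),
    ("P 6", "Online", "Electronics", 5),
]

def pivot(rows, aggregate):
    """Group by Category (rows), pivot on Store (columns), add up aggregate(items)."""
    stores = sorted({store for _, store, _, _ in rows})
    table = {}
    for _, store, category, items in rows:
        cells = table.setdefault(category, {s: 0 for s in stores})
        cells[store] += aggregate(items)
    return table

sum_table = pivot(rows, lambda items: items)  # Sum (# Ordered Items)
count_table = pivot(rows, lambda items: 1)    # Count

print(sum_table)
print(count_table)
```

Group values end up as row labels, pivot values as column labels, and each cell holds the aggregated value for that combination.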

Pivoting
Groups ~ Rows
Pivots ~ Columns
Aggregation

Data Cleaning

Table Manipulator
Allows for
§Concatenation of multiple files/tables
§Column filtering
§Column sorting
§Column renaming
§Column type mapping

Row Filter and Row Splitter
§Row filtering with include and exclude options according to certain criteria
§Certain value or pattern in a selectable column
§Row number
§Row ID

Duplicate Row Filter
Detect duplicate rows and apply a selected treatment
§First tab provides the option to select columns
§Second tab provides options for treating duplicated values
§Flag or Remove Duplicates
§Select criteria to keep row
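The two treatments, flagging and removing, can be sketched in Python as follows. The sketch assumes that the first occurrence of a row is the one to keep; the data, the column names, and both function names are made up for illustration.

```python
def remove_duplicates(rows, columns):
    """Keep the first row for each distinct combination of values in `columns`."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def flag_duplicates(rows, columns):
    """Keep all rows and add a 'duplicate' column that marks repeated combinations."""
    seen = set()
    flagged = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        flagged.append({**row, "duplicate": key in seen})
        seen.add(key)
    return flagged

# Hypothetical rows; duplicates are detected on the two selected columns.
rows = [
    {"CustomerKey": 15, "Product": "phone"},
    {"CustomerKey": 10, "Product": "tablet"},
    {"CustomerKey": 15, "Product": "phone"},
]
deduplicated = remove_duplicates(rows, ["CustomerKey", "Product"])
flagged = flag_duplicates(rows, ["CustomerKey", "Product"])
print(len(deduplicated), [r["duplicate"] for r in flagged])
```

Selecting fewer columns makes the duplicate check looser, because rows only need to agree on the selected columns.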

Column Expression
§Append or modify an arbitrary number of columns using expressions
§Many different functions are available
§No restriction on the number of lines per expression, which allows writing complex expressions
§Part of the KNIME Labs extension

String Manipulation
Create and edit values in a String Column
§Cleans up capitalization
§Joins string values
§Pads strings, e.g. padLeft
§Replaces string values
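For readers who know Python, the four kinds of string operations listed above have close analogues in Python's built-in string methods. The snippet below uses Python syntax, not the String Manipulation node's expression syntax, and the values are made up.

```python
product = "SmartPhone X"

lowered = product.lower()               # clean up capitalization
joined = "-".join(["DE", "15"])         # join string values
padded = "42".rjust(5, "0")             # pad on the left, comparable to padLeft
replaced = product.replace("X", "10")   # replace string values

print(lowered, joined, padded, replaced)
```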

Workflow Organization and
Documentation

Comments & Annotations
§Double-click to write
§Use the panel to change properties
YouTube KNIME TV Channel: https://youtu.be/AHURYB_O8sA

Workflow Organisation – Good Practices
§Workflow annotations
§Node labels
§Metanodes
§Right click -> Create Metanode...
§Organize workflow by task
§Hide complexity & improve readability

Workflow Organisation – Components
§Component encapsulates a reusable functionality as a KNIME workflow
§Components can be configured like any KNIME node
§Access and share components on the KNIME Hub
§Drag and drop from the KNIME Hub to your workflow

KNIME Workflow Diff
§Automates identification and comparison of nodes in a workflow, metanodes, and two different workflows
§Identifies insertions, deletions, substitutions, and parameter changes

Data Manipulation Exercise, Activity II
Start with exercise: Data Manipulation, Activity II
§Filter out duplicate rows
§Make sure that all product names are written in lower case letters
§Calculate the average age of the customers per product
§Remove the column Sentiment Rating using the Table Manipulator node
§Clean up and document your workflow using annotations, node labels, and metanodes

Today’s Example

Data Visualization
Charts and Tables

Data Visualization
§Large selection of easy to use visualization nodes
§Web-based and interactive
§Dedicated nodes, no scripting required
§Plotly nodes
§Similar but integrated from an external library
§R and Python View nodes for highly customizable graphics
§Require scripting

Visualizations Using One Column

Visualizations Using Two Columns

Visualizations Using Three Columns

Scatter Plot
§Plots different columns on X and Y
§Displays data including color information
§Produces an interactive view and an image
§Select data points and publish selection to other views
Figure labels: Image outport, Interactivity options

Scatter Plot
Four configuration tabs

Color Manager
§Color by nominal or continuous values
§Sync colors between views using the color model port and Color Appender node
§Color range for numerical values
§Discrete colors for nominal values

Bar Chart
§Show numerical values across categories
§Vertical or horizontal bars
§Bars can be grouped or stacked

Line Plot
§Plot sequence of values, e.g. over time
§Useful to identify trends, also between groups

Stacked Area Chart
§Visualizes numerical values from multiple columns as stacked areas
§Great for plotting distributions over time

Selection & Filtering in JavaScript Views
Interactivity allows you to select data points in views
§Selection is propagated to other views
§Highlight selected rows or filter them
§Click “Apply” to add a column to the data that indicates selection (true/false) for use in downstream nodes
Figure label: Apply selection

Components – Combined Views
§Multiple JavaScript View nodes can be combined in Components
§Selections are transmitted to all other views
§Also for use on the KNIME WebPortal
Figure labels: Scatter Plot, Table View

Interactivity across Charts: Selection and Filter Events
§Subscribing to Selection and Filter
§Publishing Selection

Configure Content and Views Layout
§Click layout button when inside Component to assign views to rows and columns
§Add views and rows via drag & drop
§Add columns using + buttons

Script-based View Nodes
§R View nodes for greater customizability
§Use your favorite libraries, e.g. ggplot2
§If you prefer Python: Python View node
§For JS developers: Generic JavaScript View

Legacy View Nodes: JFreeChart & KNIME Views
§KNIME provides three types of visualizations
§JavaScript Views
§JFreeChart Views
§Local Views
§Active development only for JavaScript Views -> use those!
§JFreeChart and Local Views still useful when visualizing locally
Figure labels: JFreeChart Views, Local Views

Visualization Exercise
Start with exercise: Visualization
§Read sales.csv data
§Assign a different color to each product
§Plot BasketValue against BasketSize using the Scatter Plot node
§Show the total BasketValue by time and product in a Line Plot and a Stacked Area Chart (Use the Pivoting node to get the sum of sales by Quarter and Product!)
§Execute the Fully Joined Data metanode
§Show the number of customers in the different web activity categories in a Bar Chart
§Show the age distribution of the customers in a Histogram
§Create a composite view by combining the Bar Chart and Histogram
§Select one web activity class in the Bar Chart. Which age classes are represented in the selected web activity class?

Today’s Example

Data Mining
Partition, Learn, Predict, Score

Data Mining Strategies
Example Applications:
§Anomaly Detection (fraud, predictive maintenance)
§Association Rule Learning (market basket analysis)
§Clustering (customer / market segmentation)
§Classification (next best offer, churn prevention)
§Regression (trend estimation)

Data Mining: Process Overview
§Partition data: Original Data Set is split into Training Set and Test Set
§Train and apply model: Train Model on the Training Set, Apply Model to the Test Set
§Evaluate performance: Score Model

Data Mining in KNIME
§KNIME has many modeling tools!
§Decision tree, random forest, SVM,
regression,
neural networks, clustering, …
§and integrations with other libraries:
R, Python, H2O, WEKA, libSVM, etc.
§And many model evaluation nodes
§ROC, standard, numeric and entropy
scorers
§Feature selection
§Cross validation

Partitioning
§Use to split data into training and evaluation sets
§Partition by count (e.g. 10 rows) or fraction (e.g. 10%)
§Sample by a variety of methods: random, linear, stratified
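A fraction-based split with random or stratified sampling can be sketched in Python. This is a conceptual illustration, not the Partitioning node's implementation; the function `partition`, the seed, and the customer data are assumptions.

```python
import random

def partition(rows, fraction, label=None, seed=42):
    """Split rows into two sets; `fraction` of the rows go to the first set.

    With label=None the rows are drawn randomly. With a label column the draw
    is stratified: each class contributes the same fraction of its rows.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = {}
    for row in rows:
        groups.setdefault(row[label] if label else None, []).append(row)
    first, second = [], []
    for members in groups.values():
        members = members[:]  # copy so the input order is left untouched
        rng.shuffle(members)
        cut = round(len(members) * fraction)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second

# Hypothetical customers: 25 churners and 75 non-churners.
customers = [{"id": i, "Churn": "yes" if i % 4 == 0 else "no"} for i in range(100)]
train, test = partition(customers, 0.8, label="Churn")
print(len(train), len(test), sum(r["Churn"] == "yes" for r in train))
```

Stratified sampling keeps the churn rate the same in both partitions, which matters when one class is much rarer than the other.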

Learner-Predictor Motif
§Most data mining approaches in KNIME use a Learner-Predictor motif.
§The Learner node trains the model with its input data.
§The Predictor node applies the model to a different subset of data.
Figure labels: Trained Model, New Data!
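The split between learning and predicting can be shown with a deliberately trivial model in Python. The "model" here simply remembers the majority class; it stands in for any trained model and is not how a KNIME Learner node works internally. Function names, the output column name, and the data are hypothetical.

```python
from collections import Counter

def learn(training_rows, target):
    """Learner step: derive a model (here only the majority class) from training data."""
    majority = Counter(row[target] for row in training_rows).most_common(1)[0][0]
    return {"target": target, "prediction": majority}

def predict(model, new_rows):
    """Predictor step: apply the trained model to rows it has not seen before."""
    column = "Prediction (" + model["target"] + ")"
    return [{**row, column: model["prediction"]} for row in new_rows]

training = [{"Churn": "no"}, {"Churn": "no"}, {"Churn": "yes"}]
model = learn(training, "Churn")                                    # model output of the learner
scored = predict(model, [{"CustomerKey": 15}, {"CustomerKey": 10}])  # new data
print(scored)
```

The model object passed from `learn` to `predict` plays the role of the model port connecting a Learner node to a Predictor node.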

Classification
Predict nominal outcomes on existing data (supervised)
§Applications
§Churn analysis (yes/no)
§Chemical activity (active/inactive)
§Spam detection (spam/not spam)
§Optical character recognition (A-Z)
§Methods
§Decision Trees
§Neural Networks
§Naïve Bayes
§Logistic Regression

Target Column
§Target column contains values that are predicted by the classification model
§Binomial target values are often encoded to 1 and 0

Application                    Target Column   Target Values
Churn analysis                 Churn           Yes/No or 1/0
Chemical activity              Active          Yes/No or 1/0
Spam detection                 Spam            Yes/No or 1/0
Optical character recognition  Character       A-Z

KNIME’s Decision Tree
§C4.5 builds a tree from a set of training data using the concept of information entropy.
§At each node of the tree, the attribute of the data with the highest normalized information gain (difference in entropy) is chosen to split the data.
§The C4.5 algorithm then recurses on the smaller sublists.
J.R. Quinlan, “C4.5: Programs for Machine Learning”
J. Shafer, R. Agrawal, M. Mehta, “SPRINT: A Scalable Parallel Classifier for Data Mining”
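The split criterion can be made concrete with a small calculation. The sketch below computes entropy and the normalized information gain (gain ratio) for nominal attributes; it illustrates the C4.5 idea only and is not the code of KNIME's Decision Tree Learner. The attribute names `contracts` and `region` and the four rows are invented for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalized by the split's own entropy."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after the split
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

rows = [
    {"contracts": "one",  "region": "north", "churn": "yes"},
    {"contracts": "one",  "region": "south", "churn": "yes"},
    {"contracts": "many", "region": "north", "churn": "no"},
    {"contracts": "many", "region": "south", "churn": "no"},
]
best = max(["contracts", "region"], key=lambda a: gain_ratio(rows, a, "churn"))
```

Here `contracts` separates the classes perfectly while `region` carries no information, so `contracts` is chosen for the split.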

Decision Tree Learner

Decision Tree View
Most of the people who don’t churn have more than one contract

Decision Tree Predictor
§Takes a decision tree model and applies it to new data
§Check the box to append class probabilities

Scorer
Compare predicted results to known truth in order to evaluate model quality

Scorer
§Confusion matrix shows the distribution of model errors
§An accuracy statistics table provides a detailed analysis of model quality
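What the Scorer reports can be reproduced by hand on a small example. The following sketch counts (actual, predicted) pairs into a confusion matrix and derives accuracy, precision and recall; the helper names and the six toy predictions are assumptions for illustration, not the node's output format.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) combination occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Share of rows where the prediction matches the known truth."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def precision_recall(matrix, positive):
    """Precision and recall for one class, read off the confusion matrix."""
    tp = matrix[(positive, positive)]
    fp = sum(n for (a, p), n in matrix.items() if p == positive and a != positive)
    fn = sum(n for (a, p), n in matrix.items() if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

actual    = ["yes", "yes", "no", "no", "no",  "yes"]
predicted = ["yes", "no",  "no", "no", "yes", "yes"]

matrix = confusion_matrix(actual, predicted)
overall_accuracy = accuracy(actual, predicted)
precision, recall = precision_recall(matrix, "yes")
```

The off-diagonal entries of the matrix, `("yes", "no")` and `("no", "yes")`, are the two kinds of model error.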

Confusion Matrix

Receiver Operating Characteristics
§Plot true positive rate (TPR) vs false positive rate (FPR) for different thresholds
§Ideal models achieve 100% TPR with 0% FPR
§Area under the curve indicates model quality
§(1 = ideal model, 0.5 = random outcome)
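A minimal sketch of how the curve and its area arise from class probabilities, assuming a binary target and one probability per row for the positive class. The function names and the six scored rows are hypothetical; the area is computed with the trapezoidal rule.

```python
def roc_points(actual, scores, positive):
    """Return (FPR, TPR) pairs, one per distinct threshold, from strictest to loosest."""
    pos = sum(a == positive for a in actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Rows scored at or above the threshold are predicted positive
        tp = sum(s >= threshold and a == positive for a, s in zip(actual, scores))
        fp = sum(s >= threshold and a != positive for a, s in zip(actual, scores))
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

actual = ["yes", "yes", "no", "yes", "no", "no"]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]      # predicted probability of "yes"

points = roc_points(actual, scores, "yes")
area = auc(points)
```

In this toy data one negative row (0.7) is ranked above one positive row (0.6), so the area is 8/9 rather than 1.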

ROC Curve
§Requires individual class probabilities from a preceding predictor
§User must define:
§Original class column
§Positive class value
§Probability for the selected positive class value for one or multiple models

Data Mining Exercise, Activity I
Start with exercise: Data Mining, Activity I:
§Partition the fully joined data into a training and test set (50%, Stratified Sampling on Target)
§Train a decision tree on the training set to predict Target
§Use the trained model to predict Target in the test set
§What is the overall accuracy of your model?
§Optional: Evaluate the accuracy and robustness of the model with the ROC Curve node

Regression
Predict numeric outcomes on existing data (supervised)
§Applications
§Forecasting
§Quantitative Analysis
§Methods
§Linear
§Polynomial
§Regression Trees
§Partial Least Squares

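Linear Regression
§The target variable (dependent variable) $\hat{y}$ is modeled as a linear combination of the input features (independent variables)
§Two input features: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
§p input features: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
§The coefficients $\beta_0, \dots, \beta_p$ are calculated by minimizing the squared error between the predicted $\hat{y}$ and the true value $y$.
[Figure: plot of $y$ against $x$ with the fitted line $\hat{y} = \beta_0 + \beta_1 x$; the residual $e_i$ is the difference between an observation $y_i$ and the line.]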
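For a single input feature, the least-squares coefficients have a closed form, which the sketch below evaluates in plain Python. The function name and the four data points are made up for illustration; KNIME's Linear Regression Learner handles the general multi-feature case.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y ≈ beta0 + beta1 * x for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx                    # slope
    beta0 = mean_y - beta1 * mean_x      # intercept
    return beta0, beta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

beta0, beta1 = fit_simple_linear_regression(xs, ys)
# Residuals: observed value minus fitted value
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
```

With an intercept in the model, the residuals of a least-squares fit sum to zero, which is a quick sanity check on the result.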

Linear Regression Learner & Regression Predictor
A linear model relating a dependent variable to one or more independent variables
§Model coefficients provided in 2nd output port
§Also available: Polynomial and Tree Ensemble Regression nodes

Numeric Scorer
Similar to the Scorer node, but for nodes with numeric predictions (e.g. linear/polynomial regression)
§Compare dependent variable values to predicted values to evaluate goodness of fit.
§Report R², MAE, MSE, RMSE etc.
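The reported statistics follow standard definitions, sketched here in plain Python. The function name `numeric_scores` and the four value pairs are assumptions for the example; the node itself may report further measures.

```python
from math import sqrt

def numeric_scores(actual, predicted):
    """Goodness-of-fit statistics comparing true values with predictions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    mse = sum(e * e for e in errors) / n               # mean squared error
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                           # coefficient of determination
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": sqrt(mse)}

actual    = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
scores = numeric_scores(actual, predicted)
```

MAE and RMSE are in the units of the target column, while R² is unitless, so R² is the easier number to compare across different targets.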

Data Mining Exercise, Activity II
§Start with exercise: Data Mining, Activity II:
§Read weather.table data
§Split the data into rows up to 2016 (training set) and rows from 2017 on (test set)
§Train a linear regression model that predicts the AIR_TEMP as a function of all other features in the dataset
§Use the model to predict the temperature in 2017 and evaluate the model with the Numeric Scorer node
§Optional:
§Calculate the mean temperature per month in the training data
§Join the mean temperature per month to the test set
§Use the Numeric Scorer to see if the average monthly temperature provides a better prediction than the Linear Regression model

Clustering
Discover hidden structure in unlabeled data (unsupervised)
§Applications
§Market Segmentation
§Diversity picking
§Methods
§K-means/medoids
§Hierarchical
§DBScan
§OPTICS
§Neighbourgrams

k-Means Algorithm
Given k, the k-Means algorithm is implemented in four steps:
1. Partition objects into k non-empty subsets and calculate their centroids (i.e., the mean point of each cluster)
2. Assign each object to the cluster with the nearest centroid using the Euclidean distance
3. Compute the centroids from the current partition
4. Go back to Step 2, repeat until the updated centroids stop moving significantly
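The four steps translate almost line by line into code. The sketch below is a plain-Python illustration, not KNIME's k-Means node; as a simplifying assumption, the initial partition puts point i into cluster i mod k (so k must not exceed the number of points), and a cluster that becomes empty keeps its previous centroid. The sample points are invented.

```python
from math import dist

def compute_centroids(points, assignment, k, fallback=None):
    """Mean point of each cluster; an empty cluster keeps its previous centroid."""
    centroids = []
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        if members:
            centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            centroids.append(fallback[c])
    return centroids

def k_means(points, k, max_iterations=100, tolerance=1e-9):
    # Step 1: initial partition into k non-empty subsets (assumption: point i -> cluster i % k)
    assignment = [i % k for i in range(len(points))]
    centroids = compute_centroids(points, assignment, k)
    for _ in range(max_iterations):
        # Step 2: assign each point to the nearest centroid (Euclidean distance)
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Step 3: recompute the centroids from the current partition
        updated = compute_centroids(points, assignment, k, fallback=centroids)
        # Step 4: stop once the centroids no longer move significantly
        moved = max(dist(old, new) for old, new in zip(centroids, updated))
        centroids = updated
        if moved <= tolerance:
            break
    return centroids, assignment

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = k_means(points, k=2)
```

The result depends on the initial partition, which is why practical implementations usually start from random or specially chosen centroids and may be run several times.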

k-Means Algorithm
[Figure: four scatter plots (both axes from 0 to 10) showing successive k-means iterations: calculation of new centroids, cluster assignment, calculation of new centroids.]

k-Means Clustering
§Looks at n observations to define the means for k clusters.
§Each observation is then assigned to its closest cluster center.
§You must provide k.

Data Mining Exercise, Activity III
Start with exercise: Data Mining, Activity III
§Read location_data.table data
§Filter the data to entries from California (region_code = CA)
§Perform k-means clustering with k=3. Use only latitude and longitude for clustering.
§Optional: plot latitude and longitude in a view (OSM Map or Scatter Plot) and use the view to visually optimize k

Scripting Integrations: R and Python
§Run R or Python code in KNIME Analytics Platform
§Works with existing Python and R installations
§Syntax highlighting support
§Different nodes for many tasks, e.g. training a model using an algorithm available in Python

Java Snippet
§Fastest running scripting node in KNIME
§Syntax highlighting, autocompletion, error checking
§Templates allow you to save scripts for later re-use
§Import custom libraries

Today’s Example

Exporting Data & Deployment

Exporting Data
After an analysis is completed, what next?
§Write results to a file
§Create/update a database
§Generate a rich report using BIRT
§Send your data to Tableau, Spotfire, or Power BI to create a report
§Deploy via KNIME WebPortal
§Deploy your model as a RESTful web service
§Upload results to cloud storage

Data Export Nodes
Typically characterized by:
§Magenta color
§1 input port, no output ports
§Create file on file system or write to database

Table Writer

Excel Writer
§Writes the input table into a spreadsheet of an Excel file
§Select append to add a spreadsheet to an existing Excel file and define the name of the new sheet
[Screenshot callout: Activate to append an Excel sheet]

Write Files to a Remote File System
§The new file handling framework makes it easy to upload data to remote file systems
§Write processed data directly with a writer node
§Upload local files with the Transfer Files node
§Supported file systems
§Microsoft Azure
§Google
§Amazon
§Databricks
§Big Data file systems (hdfs, httpFS, …)
§On-premise (e.g. ssh, ftp, …)

Full Flexibility with the Transfer Files node
[Diagram: file transfers within the same cloud environment, across cloud environments, and on-premise.]

Other Utility Nodes
Can be used locally and with remote file systems
§Create a folder
§Delete files or folders
§List all files in a folder
§Further information about file handling
https://docs.knime.com/latest/analytics_platform_file_handling_guide/index.html

DB Writer
§Writes data from a KNIME data table directly into a database table
[Screenshot callouts: Append to or drop existing table; Increase batch size for better performance]
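Why the batch size matters can be illustrated with any database client: rows sent in larger batches need fewer round trips and commits. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; the table name `predictions`, its columns and the helper `write_in_batches` are hypothetical, and nothing here reflects how the DB Writer node is implemented.

```python
import sqlite3

def write_in_batches(connection, rows, batch_size):
    """Insert rows in chunks of `batch_size`; returns the number of batches sent."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        connection.executemany(
            "INSERT INTO predictions (customer_id, churn) VALUES (?, ?)", batch
        )
        connection.commit()               # one commit per batch
        batches += 1
    return batches

connection = sqlite3.connect(":memory:")
# Equivalent of "drop existing table" before writing
connection.execute("DROP TABLE IF EXISTS predictions")
connection.execute("CREATE TABLE predictions (customer_id INTEGER, churn TEXT)")

rows = [(i, "yes" if i % 5 == 0 else "no") for i in range(1000)]
batches = write_in_batches(connection, rows, batch_size=250)
row_count = connection.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
```

Raising `batch_size` from 250 to 1000 would send the same rows in a single batch; skipping the `DROP TABLE` step corresponds to appending to an existing table.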

Creating a Dashboard on KNIME WebPortal
Step 1: Upload File
Step 2: Select Columns
Step 3: Customize Column Domains
Step 4: Interactive View
Step 5: Download Image

Workflow on KNIME WebPortal
Available in KNIME Server
[Screenshots: WebPortal page (Step 1) Upload File; WebPortal page (Step 4) Interactive View]

Components to Produce Dashboard on Web Page
[Workflow screenshot with labeled components: File Selection, Column Selection, Row Filter, Filter by Range, Stacked Area Chart]

Automation: Call Local Workflow
§Use the Call Local Workflow node to send data and parameters to other workflows and trigger execution
§Send results back to caller workflow
§Include report from called workflow
§Create modular workflows
§E.g. separate workflows for ETL and prediction
§Alternative: Call Remote Workflow
§Trigger execution of workflows on KNIME Server via REST API
[Dialog callouts: Path to workflow; Click to query the expected input(s); Specify source column(s) with input data / parameters; Add report to output]

Automation: Call Local Workflow
[Workflow callouts: Enter default format for incoming data; Send results back to caller workflow; Calls workflow once for each input row; Convert output data to KNIME table. Example workflows: ETL, Prediction]

Use Call Local Workflow to Send Conditional Emails with Report
Sometimes a report should be sent only under specific circumstances
§E.g. if some KPI is below a threshold
[Workflow callouts: Workflow creates report, sends back binary column; Convert binary column to file and save to temp dir; Path to file with report; Provide email credentials, host, etc.; Define rule, only send email if conditions apply]

KNIME Server as a REST Resource
https://www.knime.org/blog/giving-the-knime-server-a-rest

KNIME Server as a REST Resource
§Use Swagger to explore the HTTP requests and test them

Reporting in KNIME

Reporting in KNIME
§Reporting in KNIME is done via a third-party application named BIRT (Business Intelligence and Reporting Tools)
§Data is sent to BIRT from KNIME using special nodes.
§Reports in BIRT are constructed from report items, which may include images, tables, charts and labels.
§Reports may be generated in a variety of formats (html, pdf, pptx, xlsx, docx, …)

Installing Extension
§Install the KNIME Report Designer Extension to use BIRT
§Install extension by going to File -> Install KNIME Extension or via Drag & Drop from the KNIME Hub

Data to Report
Send a data table to BIRT
Hint: The node label will be used to identify the data source in the reporting view -> Make sure to use understandable labels if you have more than one data source
[Screenshot callout: Set the node label!]

Image to Report
Send an image to BIRT
§PNG and SVG are supported formats (see node description for details)
Hint: Customize the image size in the Data to Report node to fit the report

Edit the Report
Open the workflow and click the Report Editor button in the tool bar

Reporting Perspective
[Screenshot callouts: Data from KNIME (names of data sources are taken from the node label); Report layout (only structure, data is filled in when creating the report); View tabs; Add report items via drag and drop; Click button to create report]

Charting in BIRT
§Many chart types
§Fine control of plot appearance
§Familiar ‘Excel-like’ interface
§Supports interactivity

Tips & Tricks
§Use an underlying grid to structure the report
§Names of columns should not change
§Use the grouping function to combine results
§Use the Master Layout Tab (For footers etc.)

Exporting Data Exercise
Start with exercise: Exporting Data
§Write the predictions to a KNIME table
§Write the decision tree model to a PMML model file
§Create a heatmap of the normalized confusion matrix of your model and send it to a BIRT report
§Send your model accuracy to a BIRT report
§Create a simple report showing the overall accuracy and the heatmap of the confusion matrix
§Generate a PDF of your report

Today’s Example: Churn Prediction

Thank You!
[email protected]