ML using minimal to no coding-KNIME.pptx

SrishtiSharma740264 13 views 20 slides Oct 12, 2024


About This Presentation

KNIME

Size: 1.33 MB

Language: en

Added: Oct 12, 2024

Slides: 20 pages

Slide Content

Step 1: Read Data from File
File Reader node
Table Reader node
Excel Reader node
Absolute and relative paths: the knime:// protocol
Accessing REST services

Step 2: ETL and Data Manipulation
2.1 Row and Column Filtering
2.2 Aggregations
2.3 Join and Concatenation
2.4 Transformation: Conversion, Replacement, Standardization, and New Feature Generation
2.5 Data Preparation for Time Series Analysis

2.1 Row and Column Filtering
Basic Row Filter
Advanced Row Filter
Column Filter
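For readers who think in code, the two filtering ideas can be sketched in plain Python. This is only an analogue of what the Row Filter and Column Filter nodes do, not KNIME syntax, and the sample rows are hypothetical values rather than real adult.csv data.

```python
# Hypothetical sample rows; the values are illustrative, not taken from adult.csv.
rows = [
    {"age": 39, "sex": "Male", "workclass": "State-gov", "income": "<=50K"},
    {"age": 52, "sex": "Female", "workclass": "Private", "income": ">50K"},
    {"age": 28, "sex": "Female", "workclass": "Private", "income": "<=50K"},
]

# Row filtering: keep only the rows that satisfy a condition.
females = [r for r in rows if r["sex"] == "Female"]

# Column filtering: keep only the selected columns in every remaining row.
keep = ["age", "income"]
slim = [{c: r[c] for c in keep} for r in females]
```

Row filtering changes how many rows survive; column filtering changes which fields each row carries.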

2.2 Aggregations
Classic aggregations with the GroupBy node. A classic aggregation operation consists of two steps: identifying the data groups, then applying the aggregation method to each selected group.
Basic GroupBy aggregation
Advanced GroupBy aggregation
Pivoting
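The two steps of a classic aggregation can be sketched in plain Python as follows. This is an analogue of the idea behind the GroupBy node, not its implementation; the rows are hypothetical and the output names `count` and `mean_age` are chosen for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows, not real adult.csv data.
rows = [
    {"sex": "Male", "income": "<=50K", "age": 30},
    {"sex": "Male", "income": "<=50K", "age": 40},
    {"sex": "Female", "income": ">50K", "age": 50},
    {"sex": "Female", "income": "<=50K", "age": 20},
]

# Step 1: identify the groups (here: every combination of sex and income).
groups = defaultdict(list)
for r in rows:
    groups[(r["sex"], r["income"])].append(r["age"])

# Step 2: apply the aggregation methods (count and mean) to each group.
summary = {
    key: {"count": len(ages), "mean_age": sum(ages) / len(ages)}
    for key, ages in groups.items()
}
```

Each distinct (sex, income) pair becomes one output row, which is why the adult.csv exercise speaks of 4 groups.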

Read the adult.csv data set. Then:
Calculate the total number of rows and the average age
  for all females with income >50K per year
  for each of the 4 groups defined by the sex and income values
Calculate the average of all numerical columns on the full input table
Count
  rows with missing values in column “occupation”
  all rows in column “occupation”
  rows with no missing value in column “occupation”
  all rows in another column (e.g. marital-status)
Notice that the last number should be the same as the number of all rows in column “occupation”.

Pivoting
The pivoting function requires one or more grouping columns to define the rows, and one or more pivoting columns to define the columns of the pivot table. The rows and columns define unique sub-groups of the data. These sub-groups can then be summarized by aggregated measures. The possible aggregations range from listing and counting values, to calculations on date & time, to statistical measures.
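A count-based pivot table can be sketched in plain Python like this. It is a minimal analogue of the idea, not the Pivoting node itself; the rows are hypothetical, and the `age_bin` column is an assumption (the ages are taken to be binned already).

```python
from collections import Counter

# Hypothetical rows with a pre-binned age column (an assumption for this sketch).
rows = [
    {"age_bin": "20-29", "workclass": "Private"},
    {"age_bin": "20-29", "workclass": "Private"},
    {"age_bin": "30-39", "workclass": "Private"},
    {"age_bin": "30-39", "workclass": "State-gov"},
]

# Count every (grouping value, pivoting value) sub-group.
counts = Counter((r["age_bin"], r["workclass"]) for r in rows)

# Grouping values become the table rows, pivoting values become the columns.
row_keys = sorted({r["age_bin"] for r in rows})
col_keys = sorted({r["workclass"] for r in rows})
pivot = {rk: {ck: counts[(rk, ck)] for ck in col_keys} for rk in row_keys}

# The largest cell is the most common combination.
most_common = counts.most_common(1)[0]
```

Sub-groups that never occur in the data still get a cell, here filled with a count of 0.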

Question 1. Using the “age” column as the grouping column and the “workclass” column as the pivoting column, calculate the number of people in each group defined by work class and age.
1a. What is the most common combination of age bin and work class?
1b. How many people belong to this group?

2.3 Join and Concatenation
Join: inner join, right outer join, left outer join, full outer join
Concatenation
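The four join modes differ only in which keys survive. A minimal plain-Python sketch, assuming each table has unique keys and using `None` for a missing side; the `join` helper and the sample tables are hypothetical and are not KNIME code:

```python
# Hypothetical tables keyed by a row id: left holds age, right holds income.
left = {1: 39, 2: 52}
right = {2: ">50K", 3: "<=50K"}

def join(left, right, how):
    """Join two key -> value tables; None marks a value missing on one side."""
    if how == "inner":
        keys = left.keys() & right.keys()   # keys present in both tables
    elif how == "left":
        keys = left.keys()                  # every key of the left table
    elif how == "right":
        keys = right.keys()                 # every key of the right table
    elif how == "full":
        keys = left.keys() | right.keys()   # keys present in either table
    else:
        raise ValueError(f"unknown join mode: {how}")
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}
```

For example, `join(left, right, "left")` keeps row 1 even though it has no match in the right table, and fills the missing income with `None`.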

Read the adult.csv data set. Then calculate the average age and the number of rows for the 4 groups defined by (sex, income), and join these 2 aggregated values to each row of the corresponding group.

Differentiate joining from concatenation
Read the adult.csv data set. Then extract people aged between 20 and 40 whose workclass starts with "S", and people aged between 40 and 60 who work in the Private sector (workclass starts with "P"). Put both groups in a single data table.
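The contrast with joining can be sketched in plain Python: concatenation stacks rows of tables that share the same columns, while a join adds columns by matching keys. The rows below are hypothetical, and treating the age bounds as inclusive is an assumption, since the exercise does not say.

```python
# Hypothetical sample rows, not real adult.csv data.
rows = [
    {"age": 25, "workclass": "State-gov"},
    {"age": 45, "workclass": "Private"},
    {"age": 30, "workclass": "Private"},
    {"age": 50, "workclass": "Self-emp-inc"},
]

# Group 1: age 20 to 40 (inclusive bounds assumed), workclass starting with "S".
young_s = [r for r in rows if 20 <= r["age"] <= 40 and r["workclass"].startswith("S")]

# Group 2: age 40 to 60 (inclusive bounds assumed), workclass starting with "P".
older_p = [r for r in rows if 40 <= r["age"] <= 60 and r["workclass"].startswith("P")]

# Concatenation: append the rows of one table below the rows of the other.
combined = young_s + older_p
```

The result has the same columns as the inputs and as many rows as both groups together.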

2.4 Transformation: Conversion, Replacement, Standardization, and New Feature Generation
Data are standardized before being stored, analyzed, or reported. This means that string and date & time values are converted to follow the same style and format, numbers are normalized, and new features are created from the existing ones. Possible string manipulation operations include extracting substrings, converting text to lower case or upper case, and adding a prefix or suffix to string values. Numbers can be transformed mathematically, for example by normalization or a logarithmic transformation. In general, data can be transformed to generate new and hopefully more informative input features.
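A few of these transformations, sketched in plain Python. Min-max scaling is used here as one common form of normalization, which is an assumption since the slides do not name a method; the values and the `country_` prefix are made up for illustration.

```python
import math

# Hypothetical numeric column.
values = [10.0, 20.0, 40.0]

# Min-max normalization: rescale the values to the range [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Logarithmic transformation (natural log; requires positive values).
logged = [math.log(v) for v in values]

# String standardization: trim whitespace, convert to upper case, add a prefix.
names = ["germany", "USA "]
clean = [n.strip().upper() for n in names]
prefixed = ["country_" + n for n in clean]
```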

Data Manipulation: Numbers, Strings, and Rules
String Manipulation node
Math Formula node and Rule Engine node

Read the sales.csv dataset.
Using the Rule Engine node, create a new column “currency” with value “USD” for orders from the USA and “EUR” for orders from Germany.
Using the Rule Engine node, create a new column “conversion” with value 1 if the currency is “EUR” and 0.88 if the currency is “USD” (the exchange rate of Nov-04-2018).
Using the Math Formula node, calculate a new column “amount-in-EUR” by multiplying the value in column “amount” by the value in column “conversion”.
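The logic of this exercise, rules first and arithmetic second, can be sketched in plain Python. This is not Rule Engine or Math Formula syntax; the order rows are hypothetical, and the `country` column name is an assumption about sales.csv.

```python
# Hypothetical orders; "country" is an assumed column name.
orders = [
    {"country": "USA", "amount": 100.0},
    {"country": "Germany", "amount": 50.0},
]

# Rule tables mirroring the two Rule Engine steps of the exercise.
CURRENCY_BY_COUNTRY = {"USA": "USD", "Germany": "EUR"}
RATE_TO_EUR = {"EUR": 1.0, "USD": 0.88}  # rate of Nov-04-2018, as given in the exercise

for order in orders:
    # Rule 1: derive the currency from the country (None if no rule matches).
    order["currency"] = CURRENCY_BY_COUNTRY.get(order["country"])
    # Rule 2: derive the conversion factor from the currency.
    order["conversion"] = RATE_TO_EUR.get(order["currency"])
    # Formula: amount-in-EUR = amount * conversion (left missing if no rule matched).
    rate = order["conversion"]
    order["amount-in-EUR"] = None if rate is None else order["amount"] * rate
```

Rows that match no rule keep a missing value instead of raising an error.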

Column Expressions for Data Manipulation
The Column Expressions node is useful because it can perform multiple data manipulation tasks at once. A single Column Expressions node can replace a combination of other data manipulation nodes, such as the String Manipulation, Math Formula, and Rule Engine nodes.

Exercise
Read the sales.csv dataset.
Write an expression that extracts the first three letters of country names and converts them to upper case letters. Append a new column and name it “Country_Code”.
Write an expression that multiplies the sales amount by the conversion rate. Replace the “amount” column, but change its type to double.
Write an expression that assigns the value “N” to the missing values in the “card” column. Replace the “card” column.
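The three expressions of this exercise amount to one pass over each row, which can be sketched in plain Python. This is an analogue only and not the expression syntax of the Column Expressions node; the `transform` helper, the `country` column name, and the presence of a `conversion` column are assumptions.

```python
def transform(row):
    """Apply the three column expressions to one row and return a new row."""
    out = dict(row)
    # Append Country_Code: first three letters of the country, upper-cased.
    out["Country_Code"] = row["country"][:3].upper()
    # Replace amount: multiply by the conversion rate and store it as a float (double).
    out["amount"] = float(row["amount"]) * row["conversion"]
    # Replace card: a missing value (None) becomes "N".
    out["card"] = "N" if row["card"] is None else row["card"]
    return out

# Hypothetical input row.
example = transform({"country": "Germany", "amount": 50, "conversion": 1.0, "card": None})
```

Doing all three steps in one function mirrors the point of the node: several manipulations in a single place instead of a chain of separate steps.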
