ML using minimal to no coding-KNIME.pptx

SrishtiSharma740264 13 views 20 slides Oct 12, 2024

About This Presentation

KNIME


Slide Content

Step 1: Read Data from File
- File Reader node
- Table Reader node
- Excel Reader node
- Absolute and relative paths: the knime:// protocol
- Accessing REST services

Step 2: ETL and Data Manipulation
2.1 Row and Column Filtering
2.2 Aggregations
2.3 Join and Concatenation
2.4 Transformation: Conversion, Replacement, Standardization, and New Feature Generation
2.5 Data Preparation for Time Series Analysis

2.1 Row and Column Filtering
- Basic Row Filter
- Advanced Row Filter
- Column Filter
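As a rough analogy for what the row and column filter nodes do, here is a minimal Python sketch. The sample rows and the helper names `row_filter` and `column_filter` are hypothetical and only illustrate the idea; they are not KNIME APIs, and the real adult.csv has many more rows and columns.

```python
# Hypothetical sample rows standing in for a few records of adult.csv.
rows = [
    {"age": 39, "workclass": "State-gov", "sex": "Male", "income": "<=50K"},
    {"age": 52, "workclass": "Private", "sex": "Female", "income": ">50K"},
    {"age": 28, "workclass": "Private", "sex": "Female", "income": "<=50K"},
    {"age": 45, "workclass": "Self-emp-inc", "sex": "Male", "income": ">50K"},
]


def row_filter(table, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def column_filter(table, keep):
    """Keep only the listed columns in every row."""
    return [{name: row[name] for name in keep} for row in table]


# Basic row filter: a single condition on a single column.
private = row_filter(rows, lambda r: r["workclass"] == "Private")

# Advanced row filter: several conditions combined.
older_high_income = row_filter(
    rows, lambda r: r["age"] > 40 and r["income"] == ">50K"
)

# Column filter: drop everything except the selected columns.
slim = column_filter(rows, ["age", "income"])
```

Row filtering changes how many rows survive; column filtering changes which fields each row carries. The two are independent and can be chained in either order.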

2.2 Aggregations
Classic aggregations with the GroupBy node: a classic aggregation consists of two steps, identifying the data groups and then calculating the aggregation method on the selected groups.
- Basic GroupBy aggregation
- Advanced GroupBy aggregation
- Pivoting
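The two steps of a classic aggregation (identify the groups, then aggregate each group) can be sketched in plain Python as follows. The sample rows are made up for illustration, and the choice of count and mean as aggregation methods is an assumption taken from the exercises in this deck.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; not the real adult.csv data.
rows = [
    {"sex": "Male", "income": "<=50K", "age": 39},
    {"sex": "Female", "income": ">50K", "age": 52},
    {"sex": "Female", "income": "<=50K", "age": 28},
    {"sex": "Male", "income": ">50K", "age": 45},
    {"sex": "Female", "income": ">50K", "age": 48},
]

# Step 1: identify the groups, here one per distinct (sex, income) pair.
groups = defaultdict(list)
for row in rows:
    groups[(row["sex"], row["income"])].append(row)

# Step 2: calculate the aggregation methods on each group.
summary = {
    key: {
        "count": len(members),
        "mean_age": mean(r["age"] for r in members),
    }
    for key, members in groups.items()
}
```

With two values for sex and two for income, `summary` ends up with four entries, one per group.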

Read the adult.csv data set. Then:
- Calculate the total number of rows and the average age
  - for all records with sex "Female" and income >50K per year
  - for each of the 4 groups defined by the sex and income values
- Calculate the average of all numerical columns on the full input table
- Count:
  - rows with missing values in column “occupation”
  - all rows in column “occupation”
  - rows with no missing value in column “occupation”
  - all rows in another column (e.g. marital-status). This number should be the same as the number of all rows in column “occupation”.
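The counting part of this exercise can be checked with a small sketch like the one below. It assumes that missing values are represented as `None`; the rows are hypothetical and are not taken from adult.csv.

```python
# Hypothetical rows; None marks a missing value (an assumption of this sketch).
rows = [
    {"occupation": "Sales", "marital-status": "Divorced"},
    {"occupation": None, "marital-status": "Never-married"},
    {"occupation": "Tech-support", "marital-status": "Married-civ-spouse"},
    {"occupation": None, "marital-status": "Divorced"},
]

# Rows with a missing value in "occupation".
missing_occupation = sum(1 for r in rows if r["occupation"] is None)

# All rows in "occupation", missing or not.
all_occupation = len([r["occupation"] for r in rows])

# Rows with no missing value in "occupation".
present_occupation = sum(1 for r in rows if r["occupation"] is not None)

# All rows in another column; a plain row count does not depend on the column.
all_marital_status = len([r["marital-status"] for r in rows])
```

The missing and non-missing counts always add up to the count of all rows, which is why the last two totals must agree.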

Pivoting
The pivoting function requires one or more grouping columns to define the rows, and one or more pivoting columns to define the columns of the pivot table. The rows and columns define unique sub-groups of the data. These sub-groups can then be summarized by aggregated measures. The possible aggregations range from listing and counting values, to calculations on date & time, to statistical measures.

Question 1. Using the “age” column as the grouping column and the “workclass” column as the pivoting column, calculate the number of people in each group according to their work class and age.
1a. What is the most common combination of age bin and work class?
1b. How many people belong to this group?
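A pivot table with a count aggregation can be sketched as below. The rows are hypothetical, and the 10-year bin width in `age_bin` is an assumption made for this sketch; the question mentions age bins but does not say how they are defined.

```python
from collections import Counter

# Hypothetical sample rows; not the real adult.csv data.
rows = [
    {"age": 23, "workclass": "Private"},
    {"age": 27, "workclass": "Private"},
    {"age": 35, "workclass": "State-gov"},
    {"age": 38, "workclass": "Private"},
    {"age": 41, "workclass": "Self-emp-inc"},
    {"age": 29, "workclass": "Private"},
]


def age_bin(age, width=10):
    """Label a bin such as '20-29'; the bin width is an assumption."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Count the rows in each (grouping value, pivoting value) cell.
cells = Counter((age_bin(r["age"]), r["workclass"]) for r in rows)

# Lay the counts out as a table: one row per age bin, one column per work class.
bins = sorted({b for b, _ in cells})
classes = sorted({c for _, c in cells})
pivot = {b: {c: cells.get((b, c), 0) for c in classes} for b in bins}

# The most common combination and its size (questions 1a and 1b on this toy data).
top_cell, top_count = cells.most_common(1)[0]
```

Cells with no matching rows are filled with 0 here; a pivot table may equally show them as missing values.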

2.3 Join and Concatenation
- Join: inner join, right outer join, left outer join, full outer join
- Concatenation
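The four join modes differ only in which unmatched rows they keep. The following sketch illustrates this with a hypothetical `join` helper over lists of dictionaries; the tables, column names, and the use of `None` for unmatched cells are assumptions of the example, not a description of how the KNIME Joiner node is implemented.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on one key column.

    how is 'inner', 'left', 'right' or 'full'; unmatched cells become None.
    """
    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]

    # Index the right table by key so that matches are easy to look up.
    right_by_key = {}
    for rrow in right:
        right_by_key.setdefault(rrow[key], []).append(rrow)

    result, matched = [], set()
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            matched.add(lrow[key])
            result.extend({**lrow, **rrow} for rrow in matches)
        elif how in ("left", "full"):
            # Keep the unmatched left row, padding the right columns.
            result.append({**lrow, **dict.fromkeys(right_cols)})

    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in matched:
                # Keep the unmatched right row, padding the left columns.
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


# Hypothetical tables sharing the key column "id".
people = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
orders = [{"id": 2, "amount": 10}, {"id": 3, "amount": 20}, {"id": 4, "amount": 30}]

inner = join(people, orders, "id", "inner")
left_outer = join(people, orders, "id", "left")
right_outer = join(people, orders, "id", "right")
full_outer = join(people, orders, "id", "full")
```

The inner join keeps only keys present in both tables, the left and right outer joins additionally keep the unmatched rows of one side, and the full outer join keeps the unmatched rows of both.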

Read the adult.csv data set. Then calculate the average age and the number of rows for the 4 groups defined by (sex, income), and join these 2 aggregated values to each row of the corresponding group.

Differentiate joining from concatenation
Read the adult.csv data set. Then extract the people aged between 20 and 40 whose work class starts with "S", and the people aged between 40 and 60 who work in the Private sector (workclass starts with "P"). Put both groups in a single data table.
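In contrast to a join, concatenation stacks the rows of tables that share the same columns and does no key matching. A minimal sketch of this exercise, using hypothetical rows and assuming that "between" includes both end points:

```python
# Hypothetical sample rows; not the real adult.csv data.
rows = [
    {"age": 25, "workclass": "State-gov"},
    {"age": 33, "workclass": "Self-emp-inc"},
    {"age": 45, "workclass": "Private"},
    {"age": 58, "workclass": "Private"},
    {"age": 30, "workclass": "Private"},
    {"age": 50, "workclass": "Local-gov"},
]

# Group 1: age 20 to 40 (inclusive, by assumption), workclass starting with "S".
group_s = [
    r for r in rows
    if 20 <= r["age"] <= 40 and r["workclass"].startswith("S")
]

# Group 2: age 40 to 60 (inclusive, by assumption), workclass starting with "P".
group_p = [
    r for r in rows
    if 40 <= r["age"] <= 60 and r["workclass"].startswith("P")
]

# Concatenation: append the rows of one table to the other.
combined = group_s + group_p
```

The combined table has as many rows as the two groups together, whereas a join would have matched rows on a key and widened them with extra columns.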

2.4 Transformation: Conversion, Replacement, Standardization, and New Feature Generation
Data are standardized before being stored, analyzed, or reported. This means that string and date & time values are converted to follow the same style and format, numbers are normalized, and new features are created from the existing ones. Possible string manipulation operations include extracting substrings, standardizing text to lower or upper case, and adding a prefix or suffix to string values. Numbers can be transformed mathematically, for example by normalization or a logarithmic transformation. In general, data can be transformed to generate new and hopefully more informative input features.
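The kinds of transformation named above can be sketched in a few lines of Python. The values are made up, and min-max scaling is used here as one common form of normalization; the slides do not say which normalization method is meant.

```python
import math


def min_max(values):
    """Rescale numbers to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric column.
amounts = [10.0, 20.0, 40.0]
normalized = min_max(amounts)
logged = [math.log(v) for v in amounts]  # natural logarithm

# Hypothetical string column with inconsistent case and whitespace.
names = ["  Germany", "usa ", "Usa"]
standardized = [n.strip().upper() for n in names]  # same style for all values
prefixed = ["C_" + n for n in standardized]        # add a prefix
first_three = [n[:3] for n in standardized]        # extract a substring
```

After standardization the two spellings of "usa" collapse into a single value, which is the practical reason for standardizing strings before grouping or joining on them.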

Data Manipulation: Numbers, Strings, and Rules
String Manipulation node, Math Formula node, and Rule Engine node

Read the sales.csv dataset.
- Using the Rule Engine node, create a new column “currency” with value “USD” for the orders from the USA, and “EUR” for the orders from Germany.
- Using the Rule Engine node, create a new column “conversion” with value 1 if currency is “EUR”, and 0.88 if currency is “USD” (we refer to the exchange rate of Nov-04-2018).
- Using the Math Formula node, calculate values in a new column named “amount-in-EUR” by multiplying the value in column “amount” by the value in column “conversion”.
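The logic of this exercise can be written out as a short sketch: two rule-style functions followed by a formula. The order rows and the exact spelling of the country values are assumptions; the helper names are hypothetical and the code only mirrors what the Rule Engine and Math Formula nodes are asked to compute.

```python
# Hypothetical orders; the real sales.csv may use different values and columns.
orders = [
    {"country": "USA", "amount": 100.0},
    {"country": "Germany", "amount": 50.0},
]


def currency_for(country):
    """Rule set 1: map the country to a currency."""
    if country == "USA":
        return "USD"
    if country == "Germany":
        return "EUR"
    return None  # no rule matched


def conversion_for(currency):
    """Rule set 2: conversion factor to EUR (rate quoted in the exercise)."""
    if currency == "EUR":
        return 1.0
    if currency == "USD":
        return 0.88
    return None  # no rule matched


for order in orders:
    order["currency"] = currency_for(order["country"])
    order["conversion"] = conversion_for(order["currency"])
    # Formula step: amount * conversion, left missing if no rate is known.
    rate = order["conversion"]
    order["amount-in-EUR"] = None if rate is None else order["amount"] * rate
```

Rows that match no rule keep a missing value in the new columns instead of failing.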

Column Expressions for Data Manipulation
The Column Expressions node is useful because it can perform multiple data manipulation tasks at once. This single node can replace combinations of other data manipulation nodes, such as the String Manipulation, Math Formula, and Rule Engine nodes.

Exercise
Read the sales.csv dataset.
- Write an expression that extracts the first three letters of country names and converts them to upper case letters. Append a new column and name it “Country_Code”.
- Write an expression that multiplies the sales amount by the conversion rate. Replace the “amount” column, but change its type to double.
- Write an expression that assigns the value “N” to the missing values in the “card” column. Replace the “card” column.
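The three expressions of this exercise can be sketched as a single function applied to each row, which mirrors the idea of doing several manipulations in one node. The rows are hypothetical; the sketch assumes that a “conversion” column already exists (for example from the previous exercise) and that missing values are represented as `None`.

```python
# Hypothetical orders; a "conversion" column is assumed to be present already.
orders = [
    {"country": "Germany", "amount": 50, "conversion": 1.0, "card": "Y"},
    {"country": "USA", "amount": 100, "conversion": 0.88, "card": None},
]


def transform(row):
    """Apply the three expressions to one row and return a new row."""
    out = dict(row)
    # Expression 1: first three letters of the country, upper case, new column.
    out["Country_Code"] = row["country"][:3].upper()
    # Expression 2: amount times conversion rate, stored as a double (float).
    out["amount"] = float(row["amount"] * row["conversion"])
    # Expression 3: replace missing card values with "N".
    out["card"] = "N" if row["card"] is None else row["card"]
    return out


result = [transform(r) for r in orders]
```

Each expression either appends a new column (`Country_Code`) or replaces an existing one (`amount`, `card`), matching the append and replace options named in the exercise.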