DSCI 552 machine learning for data science

Slide Content

DSCI 552 MACHINE LEARNING FOR DATA SCIENCE
Ke-Thia Yao
Lecture 1, 12 January 2023

Textbook
Ethem Alpaydin
Introduction to Machine Learning, Fourth Edition
MIT Press
ISBN 9780262043793

Optional Textbook for Scikit-Learn
Aurélien Géron
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition
Available online through the USC library: https://libraries.usc.edu/databases/safari-books
The Scikit-Learn website provides excellent documentation and user guides: https://scikit-learn.org/stable/index.html

Office Hours
USC ISI Office:
4676 Admiralty Way, Suite 835
Marina del Rey, CA 90292
(310) 448-8297
[email protected]
USC Marina del Rey Shuttle
http://transnet.usc.edu/index.php/bus-map-schedules/
Office Hours:
Tuesdays 2-4PM on Zoom: https://usc.zoom.us/j/95896335860?pwd=MkhtMEsvR1BsUThvU3hMYjNHZE5Gdz09&from=addon
Thursdays 2-4PM on campus, location TBD

Grading
Homework / Programming Assignments: 35%
Class participation: 5%
Midterm: 20%
Final Exam: 20%
Semester Project: 20%

Viterbi Code of Academic Integrity
"A Community of Honor"
We are the USC Viterbi School of Engineering, a community of
academic and professional integrity. As students, faculty, and staff our
fundamental purpose is the pursuit of knowledge and truth. We
recognize that ethics and honesty are essential to this mission and
pledge to uphold the highest standards of these principles. As
responsible men and women of engineering, our lifelong commitment
is to respect others and be fair in all endeavors. Our actions will reflect
and promote a community of honor.

Schedule
Date       Topic
12-Jan-23  Introduction to ML, Supervised learning, Bias, K-nearest neighbors
19-Jan-23  Bayesian decision theory, Naïve Bayes, Jupyter, Scikit-Learn
26-Jan-23  Parametric Methods, Bias/Variance Trade-off
2-Feb-23   Nonparametric methods, Decision Trees
9-Feb-23   Dimension reduction
16-Feb-23  Clustering
23-Feb-23  Linear Discrimination, Multilayer Perceptrons
2-Mar-23   Midterm
9-Mar-23   Deep Learning
16-Mar-23  Spring Recess
23-Mar-23  Local Models, Kernel Machines
30-Mar-23  Graph Models, Boltzmann Machines, Quantum Adiabatic Annealer
6-Apr-23   Hidden Markov Models
13-Apr-23  Combining Multiple Learners
20-Apr-23  Reinforcement Learning
27-Apr-23  Presentation

What is Machine Learning
Machine learning is the science (and art) of programming computers so they
can learn from data
Here is a slightly more general definition:
[Machine learning is the] field of study that gives computers the ability to learn
without being explicitly programmed.
Arthur Samuel, 1959
And a more engineering-oriented one:
A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P,
improves with experience E.
Tom Mitchell, 1997

Why Use Machine Learning
(Figure: traditional approach vs. machine learning approach)

Why “Learn”?
Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll
Learning is used when:
Human expertise does not exist (navigating on Mars),
Humans are unable to explain their expertise (speech recognition)
Solution changes in time (routing on a computer network)
Solution needs to be adapted to particular cases (user biometrics)

Big Data
Widespread use of personal computers and wireless communication leads to “big data”
We are both producers and consumers of data
Data is not random; it has structure, e.g., customer behavior
We need “big theory” to extract that structure from data for
(a) Understanding the process
(b) Making predictions for the future
Cheaper computational power (e.g., GPUs)

Why Mine Data? Scientific Viewpoint
Data collected and stored at
enormous speeds (GB/hour)
remote sensors on a satellite
telescopes scanning the skies
microarrays generating gene expression data
scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
in classifying and segmenting data
in Hypothesis Formation

Why Mine Data? Commercial Viewpoint
Lots of data is being collected
and warehoused
Web data, e-commerce
purchases at department/
grocery stores
Bank/Credit Card
transactions
Computers have become cheaper and more powerful
Competitive Pressure is Strong
Provide better, customized services for an edge (e.g. in Customer
Relationship Management)

Big Data Opportunity
Unlock significant value by making information transparent and usable
Collect and store more accurate and detailed data in digital form
Allows ever-narrower segmentation of customers and precisely tailored products & services
Sophisticated analytics to substantially improve decision making
Improve the next generation of products and services
Source: McKinsey & Company

Big Data Opportunity (cont.)
McKinsey Report
Data have swept into every industry and business function and are
now an important factor of production, alongside labor and capital.
The use of big data will become a key basis of competition and
growth for individual firms.
The use of big data will underpin new waves of productivity growth
and consumer surplus.
There will be a shortage of talent necessary for organizations to take
advantage of big data.

Data Mining
Retail: Market basket analysis, Customer relationship management (CRM)
Finance: Credit scoring, fraud detection
Manufacturing: Control, robotics, troubleshooting
Medicine: Medical diagnosis
Telecommunications: Spam filters, intrusion detection
Bioinformatics: Motifs, alignment
Web mining: Search engines
...

What We Talk About When We Talk About
“Learning”
Learning general models from data of particular examples
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail: Customer transactions to consumer behavior:
People who bought “Blink” also bought “Outliers” (www.amazon.com)
Build a model that is a good and useful approximation to the data.

What is Machine Learning?
Optimize a performance criterion using example data or past
experience
Role of Statistics: Inference from a sample
Role of Computer science: Efficient algorithms for
Solving the optimization problem
Representing and evaluating the model for inference
Role of domain knowledge
Selecting the right attributes, representation and datasets

Machine Learning Tasks
Supervised Learning
Classification
Regression
Unsupervised Learning
Association
Reinforcement Learning

Supervised Learning: Classification
Given training set with labels
Predict the label for a new instance that is not in the training set

Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
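A minimal sketch of this two-threshold discriminant in Python; the threshold values θ1 and θ2 below are made up for illustration.

```python
# Hypothetical thresholds for illustration only
THETA1 = 40_000   # income threshold (theta 1)
THETA2 = 10_000   # savings threshold (theta 2)

def credit_risk(income: float, savings: float) -> str:
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(55_000, 12_000))  # low-risk
print(credit_risk(55_000, 5_000))   # high-risk
```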

Classification: Applications
Aka Pattern recognition
Face recognition: Pose, lighting, occlusion (glasses, beard), make-up,
hair style
Character recognition: Different handwriting styles.
Speech recognition: Temporal dependency.
Medical diagnosis: From symptoms to illnesses
Biometrics: Recognition/authentication using physical and/or
behavioral characteristics: Face, iris, signature, etc
Outlier/novelty detection

Face Recognition
Training examples of a person
Test images
ORL dataset,
AT&T Laboratories, Cambridge UK

Supervised Learning: Regression
Given training set with target numerical values
Predict the target for a new instance that is not in the training set

Regression
Example: Price of a used car
x: car attributes
y: price
y = g(x | θ)
g(·): model, θ: parameters
Linear model: y = wx + w0
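A minimal sketch of fitting the linear model y = wx + w0 with scikit-learn; the car ages and prices below are invented toy data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented toy data: car age in years (x) and price in $1000s (y)
x = np.array([[1], [3], [5], [7], [9]])
y = np.array([28.0, 22.0, 17.0, 13.0, 10.0])

model = LinearRegression().fit(x, y)   # learns w and w0 by least squares
print("w  =", model.coef_[0])          # slope
print("w0 =", model.intercept_)        # intercept
print("predicted price at age 4:", model.predict([[4]])[0])
```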

Regression Applications
Navigating a car: Angle of the steering
Kinematics of a robot arm: given the hand position (x, y), find the joint angles α1 and α2
α1 = g1(x, y)
α2 = g2(x, y)
Response surface design

Supervised Learning: Uses
Prediction of future cases: Use the rule to predict the output for future inputs
Knowledge extraction: The rule is easy to understand
Compression: The rule is simpler than the data it explains
Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Unsupervised Learning
Learning “what normally happens”
Clustering: Grouping similar instances
Example applications
Customer segmentation in CRM
Image compression: Color quantization
Bioinformatics: Learning motifs (nucleotide or amino-acid sequence patterns)

Unsupervised Learning: Clustering
Given training set with no labels
Group similar instances into clusters
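A minimal sketch with scikit-learn's KMeans; the 2-D points and the choice of two clusters are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented toy data: two loose groups of unlabeled 2-D points
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each instance
print(kmeans.cluster_centers_)  # learned cluster centers
```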

Unsupervised Learning: Anomaly Detection
Given training set with no labels
Classify a new instance as either normal or an anomaly
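The slide does not name a method; as one possible sketch, scikit-learn's IsolationForest is fit on unlabeled "normal" data and flags new instances as normal (+1) or anomalous (-1). The data is invented.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Invented toy data: mostly "normal" points near the origin, no labels
X_train = np.random.RandomState(0).normal(loc=0.0, scale=1.0, size=(200, 2))

detector = IsolationForest(random_state=0).fit(X_train)
# predict() returns +1 for normal, -1 for anomaly
print(detector.predict([[0.1, -0.2], [8.0, 8.0]]))  # expected: [ 1 -1]
```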

Unsupervised Learning: Dimension Reduction
Given a training set with a high number of features (say, images)
Output a training set with a lower number of features
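The slide does not name a method; PCA is one standard choice. A minimal sketch with invented data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Invented toy data: 100 instances with 64 features (e.g., 8x8 images)
X = np.random.RandomState(0).rand(100, 64)

# Project down to 10 features
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # (100, 64) -> (100, 10)
```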

Learning Associations
Basket analysis
Given a training dataset containing baskets of products/services
Find P(Y | X), the probability that somebody who buys X also buys Y, where X and Y are products/services
Example: P(chips | beer) = 0.7
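A minimal sketch of estimating P(Y | X) by counting baskets; the baskets below are invented for illustration.

```python
# Invented toy baskets
baskets = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "bread"},
    {"milk", "chips"},
]

def conditional_prob(y: str, x: str) -> float:
    """Estimate P(y | x) = #baskets with both x and y / #baskets with x."""
    with_x = [b for b in baskets if x in b]
    return sum(y in b for b in with_x) / len(with_x)

print(conditional_prob("chips", "beer"))  # 2/3 ~ 0.67
```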

Reinforcement Learning
Learning a policy: A sequence of outputs
No supervised output, but delayed reward
Credit assignment problem
Game playing
Robot in a maze
Multiple agents, partial observability, ...

The data mining process

Data Mining Process
1. Exploring the problem
2. Exploring the solution
3. Implementation specification
   Steps 1-3: about 20% of the time to complete, about 80% of the importance to success
4. Data mining
   a. Data preparation
   b. Data surveying
   c. Data modeling
   Step 4: about 80% of the time to complete, about 20% of the importance to success

Inductive Bias
Important decisions in learning systems:
Structure of the model (language)
Order to search the space of structures
Way that overfitting to the particular training data is avoided
Type of inductive bias:
Language bias
Search bias
Overfitting-avoidance bias

(Figure: decision threshold at 0.5)

Linear Least Square Classification
Predictive Method: Linear regression
Find the best line f(x) that divides the space into positive and negative
regions
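A minimal sketch of this idea: fit ordinary least-squares regression to 0/1 class labels and threshold the fitted value at 0.5. The data is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented toy data: 2-D points labeled 0 (negative) or 1 (positive)
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.0],   # negatives
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.0]])  # positives
y = np.array([0, 0, 0, 1, 1, 1])

f = LinearRegression().fit(X, y)  # least-squares fit to the 0/1 labels
# Classify by thresholding the fitted value at 0.5
print((f.predict([[1.2, 0.8], [4.8, 4.2]]) > 0.5).astype(int))  # [0 1]
```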

Linear Least Square Fit Bias
Language bias
The function f(x) is linear
Search bias
Analytical solution minimizing the sum of squared errors
Overfitting-avoidance bias
Not needed; the language is too simple.

K Nearest Neighbor
Let the k nearest neighbors vote for the classification
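A minimal sketch with scikit-learn's KNeighborsClassifier; the data and the choice k = 3 are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented toy data: 2-D points with two classes
X = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 6], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# The k nearest neighbors vote on the label of a new instance
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [6, 5.5]]))  # [0 1]
```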


(Figure: Low Bias, High Variance)

K Nearest Neighbor Bias
Language bias
Represent a point by its k nearest neighbors
Search bias
Deterministic
Overfitting-avoidance bias
Adjust k using a validation/dev data set

Model Selection Using Holdout Validation

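A minimal sketch of holdout validation for choosing k in k-nearest neighbors; the dataset and the candidate k values are invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Invented toy dataset
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hold out a validation set for model selection
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Pick the k with the best accuracy on the held-out validation set
scores = {k: KNeighborsClassifier(n_neighbors=k)
              .fit(X_train, y_train)
              .score(X_val, y_val)
          for k in (1, 3, 5, 9, 15)}
best_k = max(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```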

Optimal Bayes Decision Boundary

Decision Boundaries

Generalization as search
Inductive learning: find a concept description that fits the data
Example: rule sets as description language
Enormous, but finite, search space
Simple solution:
enumerate the concept space
eliminate descriptions that do not fit examples
surviving descriptions contain target concept

Bias and Learning Example
ID  Pump Type  Pump Size  Max Load  Pump Eff.  Class
1   A          Large      Low       High       Normal
2   B          Small      High      Low        Failure
3   B          Large      High      High       Normal
4   …          …          …         …          …
Attributes
ID is an integer
Pump Type is {A, B}
Pump Size is {Large, Small}
Max Load is {High, Low}
Pump Eff. is {High, Low}
Class is {Normal, Failure}
•Ignoring the ID and Class attributes, how many distinct instances are possible?
The size of the instance space is 2 × 2 × 2 × 2 = 16

Modeling Language: Hypothesis Space
Suppose for this problem the four attributes (the instance language) precisely capture the features of the domain
Let the instance space be I
Instances are tuples <type, size, load, efficiency>:
i1 = <A, Large, High, High>
i2 = <A, Large, High, Low>
i3 = <A, Large, Low, High>
…
i15 = <B, Small, Low, High>
i16 = <B, Small, Low, Low>
I = {i1, i2, i3, …, i16}

Power Set
The power set of a set S is the set of all possible subsets of S.
The power set of {a, b, c} is {{}, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}
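A minimal sketch generating a power set with itertools:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

subsets = power_set({"a", "b", "c"})
print(len(subsets))  # 2**3 = 8
```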

Hypothesis Space
Let the modeling language be the power set of I:
H = 2^I = {{}, {i1}, {i2}, …, {i16}, {i1, i2}, …, {i1, i2, …, i16}}
What is the size of the modeling language (hypothesis space)?
|H| = 2^16 = 65536, so H = {h1, h2, …, h65536}

Learning Algorithm
A hypothesis h is consistent with a training instance i if:
if i is labeled normal, then h contains i
if i is labeled failure, then h does not contain i
Learning the model:
Initially, let the candidate set C = H
Remove all hypotheses from C that are not consistent with the instances in the training set
Classification: given an instance i, each hypothesis h in C votes
+1 if the hypothesis h is consistent with i
-1 otherwise
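A minimal sketch of this consistency-filter-and-vote scheme over the 16-instance pump domain; the two training instances mirror rows 1 and 2 of the earlier table, and the tuple encoding is invented for illustration.

```python
from itertools import chain, combinations

# 4 two-valued attributes -> 16 distinct instances
instances = [(t, s, l, e) for t in ("A", "B") for s in ("Large", "Small")
             for l in ("High", "Low") for e in ("High", "Low")]

# Hypothesis space H = power set of the instance space (2**16 = 65536 sets)
H = [frozenset(c) for c in chain.from_iterable(
    combinations(instances, r) for r in range(len(instances) + 1))]

def consistent(h, i, label):
    # normal instances must be in h; failure instances must not be
    return (i in h) if label == "normal" else (i not in h)

# Training set mirroring table rows 1 and 2
train = [(("A", "Large", "Low", "High"), "normal"),
         (("B", "Small", "High", "Low"), "failure")]

# Keep only hypotheses consistent with every training instance
C = [h for h in H if all(consistent(h, i, lbl) for i, lbl in train)]
print(len(C))  # 2**14 = 16384 hypotheses survive

def vote(i):
    # each surviving hypothesis votes +1 if it contains i, else -1
    return sum(1 if i in h else -1 for h in C)

print(vote(("B", "Large", "High", "High")))  # 0: the surviving hypotheses split evenly
```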

Example Training
Training set:
a = normal, c = failure
A candidate hypothesis consistent with the training set must contain instance a, and not instance c

Is Unbiased Learning Possible?
There are only 16 unique instances
Suppose the training set contains 15 instances (i1, i2, …, i15), and they are all labeled failure
What is the content of the candidate set C?
C = {{}, {i16}}
What is the vote count for i16?
Zero

Summary
Machine learning
Analysis of often large amounts of data to find unsuspected patterns and to
summarize in novel ways
Machine learning process involves
Exploring the problem, exploring the solution, implementation specification, data
preparation, data surveying, data modeling
Machine learning task types
Association
Supervised Learning: Classification, Regression
Unsupervised Learning
Importance of inductive bias in data mining