Hierarchical Classification by Jurgen Van Gael

PyData 2,374 views 27 slides Apr 21, 2014

Slide Content

Hierarchical Classification
Jurgen Van Gael

About
• Computer Scientist w/ background in ML.
• London Machine Learning Meetup.
• Founder of Math.NET numerical library.
• Previously @ Microsoft Research.
• Data science team lead at Rangespan.

Taxonomy Classification
• Input: raw product data
• Output: classification models, classified product data
ROOT
  Electronics
    Audio
      Audio Cables
      Amps
      …
    Computers
    …
  Clothing
    Pants
    T-Shirts
    …
  Toys
    Model Rockets
    …
  …
…0

Pipeline stages: Data Collection, Feature Extraction, Training, Testing, Labelling

Feature Extraction

Name: INK-M50 Black Ink Cartridge (600 pages)
Manufacturer: Samsung
Description: null
Label: toner-inkjet-cartridges
"category": "toner-inkjet-cartridges",
"features": ["cartridge", "samsung", "black", "ink", "ink-m50", "pages"]
Feature Extraction:
• Text cleaning (stopword, lexicalisation)
• Unigram + Bigram Features
• LDA Topic Features
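The text cleaning and unigram/bigram steps listed above can be sketched as follows. This is a minimal illustration, not the code used in the talk: the stopword list, the tokenisation regex, the `a_b` bigram naming and the function names are all assumptions, and LDA topic features are left out. Its output is not claimed to match the feature list on the slide exactly.

```python
import re

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "with"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens, allowing inner hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def extract_features(name, manufacturer=None, description=None):
    """Return unigram and bigram features for one product record."""
    features = []
    for field in (name, manufacturer, description):
        if not field:  # skip null or empty fields
            continue
        tokens = [t for t in tokenize(field) if t not in STOPWORDS]
        features.extend(tokens)  # unigrams
        # Bigrams are built from adjacent tokens within the same field.
        features.extend(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(features))


example = extract_features(
    name="INK-M50 Black Ink Cartridge (600 pages)",
    manufacturer="Samsung",
    description=None,
)
print(example)
```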

http://radimrehurek.com/gensim

Training, Testing & Labelling

Hierarchical Classification
(Diagram: nodes A, B, C, D, E)
4 (5) way multiclass classification

Hierarchical Classification
(Diagram: nodes A, B, C, D, E)
2 + 3 way multiclass classification

Naïve Bayes
Neural Network
Logistic Regression
Support Vector Machines
… ?

Logistic Regression - Model
word         printer ink   printer hardware
cartridge     4.0           0.3
the           0.0           0.0
samsung       0.5           0.5
black         0.5           0.3
printer      -1.0           2.0
ink           5.0          -1.7
…             …             …
Σ =          10.0          -0.6
Pr =          0.99997       0.00003
For each class
  For each feature
    Add the weight
Exponentiate & Normalize
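The scoring recipe on this slide (sum the weights per class, then exponentiate and normalise) can be sketched in a few lines. The weights are copied from the table; the class names and the assumption that the active features are "cartridge", "samsung", "black" and "ink" are inferred from the column sums on the slide, and the function names are illustrative only.

```python
import math

CLASSES = ("printer ink", "printer hardware")

# Per-word weights, one entry per class, taken from the slide's table.
WEIGHTS = {
    "cartridge": (4.0, 0.3),
    "the": (0.0, 0.0),
    "samsung": (0.5, 0.5),
    "black": (0.5, 0.3),
    "printer": (-1.0, 2.0),
    "ink": (5.0, -1.7),
}


def class_scores(features):
    """For each class, add up the weights of the features that are present."""
    scores = [0.0] * len(CLASSES)
    for word in features:
        # Words without a learned weight contribute nothing.
        for k, weight in enumerate(WEIGHTS.get(word, (0.0,) * len(CLASSES))):
            scores[k] += weight
    return scores


def softmax(scores):
    """Exponentiate and normalise; subtracting the max avoids overflow."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = class_scores(["cartridge", "samsung", "black", "ink"])
for name, score, prob in zip(CLASSES, scores, softmax(scores)):
    print(f"{name}: score={score:.1f} probability={prob:.5f}")
```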

Logistic Regression - Inference
• Optimise using Wapiti.
• Hyperparameter optimisation using grid search.
• Using development set to stop training?

http://wapiti.limsi.fr/

(Diagram: ROOT with children Electronics and Clothing)

Cross Validation
• Estimate classifier errors.
• DO NOT
  o Test on training data.
  o Leave data aside.
Calibration
• Are my probability estimates correct?
• Computation:
  o Take x data points with p(.|x) = 0.9,
  o Check that about 90% of labels were correct.
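The calibration check described above can be sketched by grouping predictions into confidence bins and comparing each bin with its observed accuracy. The function name, the number of bins and the toy data are assumptions made for illustration.

```python
def calibration_table(predictions, n_bins=10):
    """Group (confidence, is_correct) pairs into equal-width confidence bins.

    Returns one (bin_low, bin_high, count, accuracy) tuple per non-empty bin.
    For a well calibrated classifier, accuracy should fall close to the bin.
    """
    bins = [[0, 0] for _ in range(n_bins)]  # [count, number correct]
    for confidence, is_correct in predictions:
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += int(is_correct)
    table = []
    for idx, (count, correct) in enumerate(bins):
        if count:
            table.append((idx / n_bins, (idx + 1) / n_bins, count, correct / count))
    return table


# Toy data: ten predictions made with confidence 0.95, nine of them correct.
toy = [(0.95, True)] * 9 + [(0.95, False)]
for low, high, count, accuracy in calibration_table(toy):
    print(f"confidence {low:.1f}-{high:.1f}: n={count}, accuracy={accuracy:.2f}")
```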
Training Data
Fold errors: 1.2%, 1.1%, 1.2%, 1.2%, 1.3%
Average error = 1.2%
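The five per-fold errors on this slide average to the reported 1.2%. A minimal sketch of the fold bookkeeping, assuming a simple round-robin assignment of data points to folds (the talk does not say how folds were formed), might look like this:

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Items are assigned to folds round-robin, which is an assumption of this
    sketch; shuffling first is usually advisable on ordered data.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


def mean(values):
    """Average of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Per-fold error percentages as shown on the slide.
fold_errors = [1.2, 1.1, 1.2, 1.2, 1.3]
print(f"cross-validated error: {mean(fold_errors):.1f}%")

# Every data point is held out exactly once across the folds.
for train, test in k_fold_indices(10, k=5):
    print(f"train on {len(train)} items, test on {test}")
```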

(Diagram: ROOT with children Electronics and Clothing)
Using Bayes rule to chain classifiers:
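The formula from this slide did not survive extraction. One common way to chain per-node classifiers, shown here as a sketch rather than as the talk's exact formulation, is the product rule: p(Audio | x) = p(Audio | Electronics, x) · p(Electronics | x). In the code below only p(Electronics | x) = 0.1 comes from the slides; the tree, the other probabilities and the function name are hypothetical.

```python
def chain_probabilities(tree, local_probs, node="ROOT", prob=1.0):
    """Return {leaf: probability} by multiplying conditionals down the tree.

    tree maps a node to its list of children; local_probs[node][child] is the
    output of the classifier at `node`, i.e. p(child | node, x).
    """
    children = tree.get(node, [])
    if not children:  # reached a leaf category
        return {node: prob}
    leaves = {}
    for child in children:
        leaves.update(
            chain_probabilities(tree, local_probs, child, prob * local_probs[node][child])
        )
    return leaves


# Hypothetical two-level taxonomy and classifier outputs for one product.
tree = {
    "ROOT": ["Electronics", "Clothing"],
    "Electronics": ["Audio", "Computers"],
    "Clothing": ["Pants", "T-Shirts"],
}
local_probs = {
    "ROOT": {"Electronics": 0.1, "Clothing": 0.9},
    "Electronics": {"Audio": 0.7, "Computers": 0.3},
    "Clothing": {"Pants": 0.5, "T-Shirts": 0.5},
}

for leaf, p in chain_probabilities(tree, local_probs).items():
    print(f"p({leaf} | x) = {p:.2f}")
```

Because each node's classifier outputs sum to one, the leaf probabilities also sum to one.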

Active Learning

(Diagram: ROOT with children Electronics and Clothing)
p(electronics|{text}) = 0.1

• High probability datapoints
  o Upload to production
• Low probability datapoints
  o Subsample
  o Acquire more labels
(Diagram: ROOT with children Electronics and Clothing)
p(electronics|{text}) = 0.1
e.g. Mechanical Turk
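The triage described on these slides can be sketched as a split on the predicted probability followed by subsampling of the uncertain items. The threshold of 0.9, the sample size and the function name are assumptions; the talk does not give the values it used.

```python
import random


def triage(predictions, high=0.9, sample_size=100, seed=0):
    """Split (item, label, probability) triples for active learning.

    Returns (confident, to_label): confident predictions can be uploaded to
    production, and a random subsample of the rest is sent out for manual
    labelling (for example via a crowdsourcing service).
    """
    confident = [(item, label) for item, label, p in predictions if p >= high]
    uncertain = [item for item, _label, p in predictions if p < high]
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    to_label = rng.sample(uncertain, min(sample_size, len(uncertain)))
    return confident, to_label


# Hypothetical classifier outputs for four products.
predictions = [
    ("product-1", "Clothing", 0.99),
    ("product-2", "Electronics", 0.55),
    ("product-3", "Clothing", 0.92),
    ("product-4", "Electronics", 0.10),
]
confident, to_label = triage(predictions, high=0.9, sample_size=1)
print("upload:", confident)
print("send for labelling:", to_label)
```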

Implementation
Data stores: MongoDB, S3 Raw, S3 Training Data, S3 Models
Steps: 1. JSON export, 2. Feature Extraction, 3. Training, 4. Classification

Training MapReduce
• Dumbo on Hadoop
• 2000 classifiers
• 5 fold CV (+ full)
• 20 hypers on grid
= 200,000 training runs

Labelling
• 128 chunks
• Full cascade each chunk
(Diagram: Chunk 1, Chunk 2, Chunk 3, …, Chunk N, each with its own copy of the classifier tree over nodes A, B, C, D, E)
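The chunked labelling scheme can be sketched as splitting the products into independent chunks and running the whole classifier cascade on each one. This is a single-process illustration: the round-robin split, the function names and the toy cascade are assumptions, and in a distributed setting each chunk would go to a separate worker.

```python
def split_into_chunks(items, n_chunks=128):
    """Deal items round-robin into n_chunks lists (some may be empty)."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def label_chunk(chunk, cascade):
    """Run the full classifier cascade on every item of one chunk."""
    return [(item, cascade(item)) for item in chunk]


def toy_cascade(item):
    """Hypothetical stand-in for the real cascade of per-node classifiers."""
    return "Electronics" if "cable" in item else "Clothing"


products = [f"audio cable {i}" for i in range(200)] + [f"t-shirt {i}" for i in range(100)]
chunks = split_into_chunks(products, n_chunks=128)

# Each chunk is independent of the others, so this loop could run in parallel.
labelled = [pair for chunk in chunks for pair in label_chunk(chunk, toy_cascade)]
print(len(chunks), "chunks,", len(labelled), "labelled products")
```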

Thoughts
• Extras:
  o Partial labelling: stop when probability becomes low.
  o Data ensemble learning.
• Most time spent feature engineering.
• Tie the parameters of the classifiers?
  o Frustratingly easy domain adaptation, Hal Daumé III
• Partially flattening the hierarchy for training?
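The partial labelling idea mentioned under the extras (stop descending when the probability becomes low) can be sketched as a greedy walk down the taxonomy. The threshold, the taxonomy, the probabilities and the function name are all hypothetical; the talk does not describe its stopping rule in more detail.

```python
def partial_label(tree, local_probs, min_prob=0.5, root="ROOT"):
    """Follow the most probable child at each level and stop early.

    Returns (path, probability): the deepest chain of categories whose
    cumulative probability stays at or above min_prob.
    """
    path, prob, node = [], 1.0, root
    while tree.get(node):
        # Pick the child the classifier at this node considers most likely.
        best = max(tree[node], key=lambda child: local_probs[node][child])
        next_prob = prob * local_probs[node][best]
        if next_prob < min_prob:
            break  # too uncertain: keep the coarser label
        path.append(best)
        prob, node = next_prob, best
    return path, prob


# Hypothetical taxonomy and per-node classifier outputs for one product.
tree = {"ROOT": ["Electronics", "Clothing"], "Clothing": ["Pants", "T-Shirts"]}
local_probs = {
    "ROOT": {"Electronics": 0.1, "Clothing": 0.9},
    "Clothing": {"Pants": 0.5, "T-Shirts": 0.5},
}

print(partial_label(tree, local_probs, min_prob=0.5))
print(partial_label(tree, local_probs, min_prob=0.4))
```

With the higher threshold the product is labelled only as Clothing, because neither subcategory is probable enough on its own.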