About
• Computer scientist with a background in ML.
• London Machine Learning Meetup.
• Founder of the Math.NET numerical library.
• Previously at Microsoft Research.
• Data science team lead at Rangespan.
Logistic Regression - Model

word        ink     printer hardware
cartridge   4.00    0.30
the         0.00    0.00
samsung     0.50    0.50
black       0.50    0.30
printer    -1.00    2.00
ink         5.00   -1.70
…           …       …
For each class:
  o For each feature, add the weight.
  o Exponentiate and normalise.

Σ = 10.00 (ink), Σ = -0.60 (printer hardware)
Pr = 0.99997, Pr = 0.00003
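A minimal sketch of that loop, using the example weights from the table above (the "…" row means the real model has more features, so the six visible rows alone need not reproduce the Σ values exactly):

```python
import math

# Per-class feature weights, copied from the example table above.
WEIGHTS = {
    "ink":              {"cartridge": 4.00, "the": 0.00, "samsung": 0.50,
                         "black": 0.50, "printer": -1.00, "ink": 5.00},
    "printer hardware": {"cartridge": 0.30, "the": 0.00, "samsung": 0.50,
                         "black": 0.30, "printer": 2.00, "ink": -1.70},
}

def classify(tokens):
    # For each class, add the weight of every active feature ...
    scores = {c: sum(w.get(t, 0.0) for t in tokens)
              for c, w in WEIGHTS.items()}
    # ... then exponentiate and normalise (a softmax over the scores).
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}
```

With class scores of 10.00 and -0.60, the softmax gives roughly 0.99997 vs 0.00003, matching the slide up to rounding.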
[Pipeline diagram: Data Collection → Feature Extraction → Training → Testing → Labelling]
Logistic Regression - Inference
• Optimise using Wapiti.
• Hyperparameter optimisation using grid search.
• Use a development set to stop training?
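The slides give no commands, but the grid search might look like a loop around the Wapiti CLI. A sketch with assumed penalty grids and placeholder file names; the flags (--rho1/--rho2 for the L1/L2 penalties, --devel for the set used to stop training) are from memory of Wapiti's interface, so check `wapiti train --help`:

```python
import itertools
import subprocess

# Assumed hyperparameter grids; train.txt and dev.txt are placeholders.
for rho1, rho2 in itertools.product([0.0, 0.1, 1.0, 10.0],
                                    [0.01, 0.1, 1.0, 10.0, 100.0]):
    model = "model_rho1_%g_rho2_%g.wap" % (rho1, rho2)
    subprocess.check_call([
        "wapiti", "train",
        "--rho1", str(rho1),    # L1 penalty
        "--rho2", str(rho2),    # L2 penalty
        "--devel", "dev.txt",   # development set used to stop training
        "train.txt", model,
    ])
```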
Cross-Validation
• Estimate classifier errors.
• DO NOT:
  o Test on training data.
  o Leave data aside.

Calibration
• Are my probability estimates correct?
• Computation:
  o Take the data points x with p(.|x) = 0.9.
  o Check that about 90% of their labels were correct.
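A minimal version of that calibration check, assuming `probs` holds the model's confidence for each prediction and `correct` marks whether the prediction was right (the 0.05 bin width is an arbitrary choice):

```python
import numpy as np

def calibration_check(probs, correct, p=0.9, tol=0.05):
    """Among points predicted with confidence close to p, the
    empirical accuracy should also be close to p."""
    probs = np.asarray(probs, dtype=float)
    correct = np.asarray(correct, dtype=float)
    mask = np.abs(probs - p) < tol       # points with p(.|x) ~ 0.9
    return correct[mask].mean()          # should come out near 0.9
```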
[Figure: training data split into five folds; held-out error per fold: 1.2%, 1.1%, 1.2%, 1.2%, 1.3%; averaged estimate: error = 1.2%]
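As a sketch of the same protocol, 5-fold cross-validation with scikit-learn (the talk used Wapiti; the data here is synthetic, so the numbers will differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)                       # placeholder features
y = (X[:, 0] + 0.1 * rng.randn(1000)) > 0     # placeholder labels

scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("per-fold error:", 1.0 - scores)
print("estimated error:", (1.0 - scores).mean())   # average of the folds
```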
• High-probability datapoints:
  o Upload to production.
• Low-probability datapoints (see the routing sketch below):
  o Subsample.
  o Acquire more labels.
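A sketch of that routing step; the 0.95 threshold and the 1-in-10 subsampling rate are illustrative, not from the talk:

```python
def route(datapoints, classify, threshold=0.95, keep_every=10):
    """Confident predictions go to production; uncertain ones are
    subsampled and queued for human labelling."""
    to_production, uncertain = [], []
    for x in datapoints:
        probs = classify(x)                              # class -> probability
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:
            to_production.append((x, label))
        else:
            uncertain.append(x)
    return to_production, uncertain[::keep_every]        # subsample the queue
```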
[Hierarchy diagram: ROOT splits into Electronics and Clothing; p(electronics | {text}) = 0.1]
Labels acquired via crowdsourcing, e.g. Mechanical Turk.
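One way to read the diagram: each internal node carries its own logistic regression, and probabilities multiply down the tree, p(node) = p(parent) * p(node | parent, text). A sketch under that assumption (the Node structure, the stop_below threshold, and the greedy descent are all mine, not from the slides):

```python
class Node(object):
    def __init__(self, name, classifier=None, children=()):
        self.name = name
        self.classifier = classifier      # text -> {child name: probability}
        self.children = list(children)

def classify_down(node, text, p_so_far=1.0, stop_below=0.5):
    """Greedily descend the category tree, multiplying conditional
    probabilities; stop early once the running probability gets low."""
    if not node.children:
        return node.name, p_so_far
    probs = node.classifier(text)
    child_name, p = max(probs.items(), key=lambda kv: kv[1])
    if p_so_far * p < stop_below:
        return node.name, p_so_far        # too uncertain: partial label
    child = next(c for c in node.children if c.name == child_name)
    return classify_down(child, text, p_so_far * p, stop_below)

# Toy usage mirroring the diagram above.
root = Node("ROOT",
            classifier=lambda text: {"Electronics": 0.1, "Clothing": 0.9},
            children=[Node("Electronics"), Node("Clothing")])
print(classify_down(root, "red cotton shirt"))   # ('Clothing', 0.9)
```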
Training MapReduce
• Dumbo on Hadoop.
• 2000 classifiers.
• 5-fold CV (+ full).
• 20 hyperparameter settings on the grid.
= 200,000 training runs.
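The arithmetic: 2000 classifiers x 5 folds x 20 grid points = 200,000 runs. With Dumbo, each run can be one input record to a mapper, with a reducer averaging the fold errors; a sketch in that style (the grids and the train_and_eval helper are hypothetical, not Rangespan's actual job):

```python
import itertools

def train_and_eval(node, fold, rho1, rho2):
    """Hypothetical helper: train one Wapiti model on this fold of this
    node's data and return the held-out error."""
    raise NotImplementedError

nodes  = ["node_%04d" % i for i in range(2000)]            # one classifier per node
folds  = range(5)                                          # 5-fold CV (full run not shown)
hypers = list(itertools.product([0.0, 0.1, 1.0, 10.0],     # assumed L1 grid
                                [0.01, 0.1, 1.0, 10.0, 100.0]))  # assumed L2 grid

# 2000 * 5 * 20 = 200,000 (node, fold, hypers) training tasks.
tasks = itertools.product(nodes, folds, hypers)

def mapper(key, task):
    node, fold, (rho1, rho2) = task
    yield (node, (rho1, rho2)), train_and_eval(node, fold, rho1, rho2)

def reducer(key, errors):
    errors = list(errors)
    yield key, sum(errors) / len(errors)   # mean CV error per (node, hypers)
```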
Labelling
• 128 chunks.
• The full cascade runs on each chunk.
[Diagram: the full cascade of classifiers (nodes A, B, C, D, E) applied to each of Chunk 1, Chunk 2, Chunk 3, … Chunk N]
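Reusing classify_down from the hierarchy sketch above, labelling becomes an independent pass over each chunk, which is why the 128 chunks parallelise cleanly (the chunk format here is an assumption):

```python
def label_chunk(chunk, root):
    """Run the full cascade, from ROOT down, over one chunk of items;
    chunks share nothing, so they can be processed in parallel."""
    return [(item_id, classify_down(root, text)) for item_id, text in chunk]
```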
Thoughts
• Extras:
  o Partial labelling: stop descending when the probability becomes low (cf. the stop_below threshold in the hierarchy sketch above).
  o Data ensemble learning.
• Most time was spent on feature engineering.
• Tie the parameters of the classifiers?
  o "Frustratingly Easy Domain Adaptation", Hal Daumé III.
• Partially flatten the hierarchy for training?