A Friendly Introduction to Machine Learning

hellohaptik, May 11, 2018

About This Presentation

In this Lunch & Learn session, Chirag Jain gives us a friendly and gentle introduction to machine learning and walks through a high-level learning framework using linear classifiers.


Slide Content

Introduction to Machine Learning
Chirag Jain, ML Engineer

About Haptik
- Chatbot platform for publishers, advertisers and enterprises
- AI-powered conversational interface to drive customer engagement
- Reach of 30 million users, processing 5 million chats per month
- One of the world's largest chatbot platforms
- Started in 2013, global pioneers of chatbots

How this talk is divided
Part 1: AI introduction and applications
- Introduction
- New and old news about AI
Part 2: ML introduction and workflow
Part 3: High-level learning framework
- Code (and some math) walkthrough of a linear classifier

What is AI?
Demonstration of human-like intelligence by machines. A machine performing any task that needs human-level intelligence can be said to be "artificially intelligent".

AI in everyday life today
- Email categorization
- Web search
- Targeted (annoying) ads

AI in everyday life today
- Maps & navigation
- Computer games
- Digital assistants

A few ML success stories from the past 3 years
- Neural style transfer
- Controllable image generation (Xianxu Hou et al.)

Major goals of AI
- Reasoning and problem solving
- Knowledge representation
- Autonomy and planning
- Self-learning via experience ← Machine Learning is a part of this
- Natural language processing
- Sensory perception

Major goals of AI (continued)
- Motion and manipulation
- Social intelligence
- General/super intelligence ← the media tries to sell you this

Sciences involved in AI research
- Computer science
- Mathematics
- Psychology
- Linguistics
- Philosophy
- Many others

Philosophy around AI
- Is general/super intelligence possible?
- Do machines have to be similar to human systems to be as intelligent as us?
- Can intelligent machines be dangerous?
- Should we prefer more accurate systems over transparent systems?

The vagueness and the hype
Real story: the task was to learn negotiation in natural language, not some efficient cryptic language. The researchers only reported a failed experiment trial.

The vagueness and the hype
"Artificial Intelligence - The Revolution Hasn't Happened Yet" - Michael I. Jordan [1]
- Artificial Intelligence: the "high-level" or "cognitive" capability of humans to "reason" and to "think."
- Intelligence Augmentation: services that augment human intelligence and creativity.
- Intelligent Infrastructure: a web of computation, data and physical entities that makes human environments more supportive and safe.
"The Impossibility of Intelligence Explosion" - François Chollet [2]
[1] https://medium.com/@mijordan3/artificial-intelligence-the-revolution-hasnt-happened-yet-5e1d5812e1e7
[2] https://medium.com/@francois.chollet/the-impossibility-of-intelligence-explosion-5be4a9eda6ec

A few words on "deep learning"
Image credits: http://neuralnetworksanddeeplearning.com/chap5.html

AI, ML, NN, DL are not new!
- First programmable computer ≈ 1936
- AI research began ≈ 1956
- Neural networks: base ideas as early as 1943, polished idea ≈ 1958, research active since the 1990s
- Deep learning: first idea proposed in 1965, early implementations ≈ 1965-1971, research active since the 1990s
- Large NNs were computationally infeasible to train back then
- NNs and DL went into "hibernation" for more than a decade

Resurgence of "AI" because of deep learning
- Training complex models has become feasible now
- Large datasets are available for some tasks
- Compute power has increased exponentially; we now have very powerful GPUs/TPUs
- Theoretical ideas in research have been polished over time
- Much better tools to work with! Theano, TensorFlow (Google), Keras (now Google), Torch/PyTorch (Facebook), CNTK (Microsoft), Caffe (UCB), MXNet (Apache, Amazon), sklearn, gensim, nltk

Resurgence of "AI" because of deep learning
A DL model winning the 2012 large-scale image recognition competition by huge margins [1] proved to be an inflection point. DL has continuously produced promising results in other areas since.
[1] ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al.)

Machine Learning
Image credits: https://recast.ai/blog/machine-learning-algorithms/

Machine Learning
Blends ideas from statistics, computer science, operations research, pattern recognition, information theory, control theory and many other disciplines to design algorithms that find low-level patterns in data, make predictions and help make decisions (at scale).

Typical Machine Learning Pipeline

Common taxonomy of ML methods
- Supervised learning (some feedback is available)
  - Completely supervised learning
  - Semi-supervised learning
  - Active learning
  - Reinforcement learning
- Unsupervised learning (no explicit ground truths)
- Meta learning
- ...

Common tasks for ML
- Classification (usually supervised)
- Regression (usually supervised)
- Clustering (unsupervised)
- Dimensionality reduction
- ...
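To make the task list concrete, here is a minimal sketch (my illustration, not part of the deck) of what each task looks like in scikit-learn, one of the libraries mentioned earlier. The key contrast: supervised tasks take labels y, unsupervised ones only take the inputs X. The data and model choices below are arbitrary assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)                    # 100 samples, 5 features
    y_cls = (X[:, 0] > 0.5).astype(int)     # discrete labels
    y_reg = X @ rng.rand(5)                 # continuous targets

    LogisticRegression().fit(X, y_cls)      # classification (supervised)
    LinearRegression().fit(X, y_reg)        # regression (supervised)
    KMeans(n_clusters=3, n_init=10).fit(X)  # clustering (unsupervised, no y)
    PCA(n_components=2).fit_transform(X)    # dimensionality reduction (unsupervised)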

Classification
- The task is to learn to categorize inputs into discrete classes
- E.g. input: an image; output: probabilities of the image containing {dog, cat, horse, zebra}
- A supervised task: we have true labels for each input
- Metrics: to keep things simple, we will use accuracy, i.e. the fraction of inputs the classifier classifies correctly (see the example below). Selecting a metric depends on the data and the problem
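As a quick illustration of the accuracy metric (my example, not from the deck), accuracy is simply the fraction of predictions that match the true labels:

    y_true = [0, 1, 1, 0, 1]  # ground-truth labels
    y_pred = [0, 1, 0, 0, 1]  # classifier predictions
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)  # 0.8, i.e. 4 out of 5 classified correctly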

Logistic Regression: a simple linear classifier
Notebook to follow along: https://gist.github.com/chiragjn/24b548785d99a393fca9dccfe1439d4a
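The gist above is the authoritative walkthrough; as a stand-in, here is a minimal sketch of fitting a logistic regression classifier on toy two-class data with scikit-learn (the dataset and parameters are my assumptions, not the notebook's):

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Toy two-class dataset; the real notebook may use different data
    X, y = make_blobs(n_samples=200, centers=2, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Logistic regression is linear: it learns weights w and bias b,
    # and predicts class probabilities via sigmoid(X @ w + b)
    clf = LogisticRegression().fit(X_train, y_train)
    print("accuracy:", clf.score(X_test, y_test))          # accuracy on held-out data
    print("weights:", clf.coef_, "bias:", clf.intercept_)  # the learned linear boundary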

Gradient Descent Your loss function may look something like this

Gradient Descent But let’s take a simpler example

Gradient Descent Optimum value is at the bottom

Gradient Descent You spawn at some random point

Gradient Descent
The gradient at any point points in the direction of steepest increase, so we step in the opposite direction

Gradient Descent
The learning rate is the scaling factor of the gradient step, i.e. how much to nudge each variable involved

Gradient Descent
Keep the learning rate small to take smaller steps

Gradient Descent
A large learning rate can cause overshoots
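Tying the preceding slides together, here is a minimal sketch of the gradient descent loop (my illustration, assuming the one-dimensional loss L(w) = (w - 3)^2 as the "simpler example"): start at some point, repeatedly compute the gradient, and step against it, scaled by the learning rate.

    def gradient_descent(lr, steps=25, w=10.0):
        for _ in range(steps):
            grad = 2 * (w - 3)  # dL/dw: points toward steepest increase
            w = w - lr * grad   # nudge w in the opposite direction, scaled by lr
        return w

    # A small learning rate takes small steps and settles near the optimum w = 3
    print(gradient_descent(lr=0.1))
    # A too-large learning rate overshoots the bottom on every step and diverges
    print(gradient_descent(lr=1.1))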

Other things that we don't have time for
- Non-linear classifiers
- Learning methods that don't use gradient descent
- Other metrics: precision, recall, F1
- Overfitting and underfitting
- And many more tricks of the trade