Reinforcement Learning - Learning from Experience like a Human
PAWDeutschland
46 slides
Nov 29, 2018
About This Presentation
Alongside supervised and unsupervised learning, reinforcement learning is a major machine learning technology with large potential across a broad field of applications such as robotics, autonomous driving, gaming and general control. This talk describes its major concepts, algorithms and software environments and gives a detailed overview of its capabilities. It is aimed at people with a reasonable background in other AI/ML technologies and therefore requires a good technical background.
Size: 17.99 MB
Language: en
Slide Content
Reinforcement Learning
Learning from experience like a human …
Nokia Bell Labs / Norbert Kraft
Introduction Nokia Bell Labs
Access
End to End Network & Service Automation
Application Platforms & Software Systems
Standardization
Smart Network Fabric
Algorithms, Analytics & Augmented Intelligence
Emerging Materials, Components and Devices
Research Activities
Taxonomy of Machine Learning
From Analysis to Full Autonomous Control
•Analyze: Descriptive Analytics (What happened?)
•Predict: Predictive Analytics (What will happen?)
•Control: Prescriptive Analytics (Make it happen!)
[Figure: Difficulty and Value axes with the stages Sensing, Monitoring, Analyze, Predict, Control]
11/12/18 Reinforcement Learning - learning from experience like a human
Don’t analyze why you failed – just do it right …
Machine Learning
Unsupervised learning
•Find anomalies, similarities, groups
•Reduce complexity of high dimensional features
Supervised learning
•Learn from labelled observations
Reinforcement learning
•Learn from experience
Basic Ideas …
•Find groups with similar attributes, which are not necessarily self-explaining …
•Generate a limited set of new features with virtual meaning …
•Train human knowledge by taking a labelled list (… not always existing)
•Learn system behavior with experiments (learning by doing)
Machine Learning
Basic Ideas …
[Figure: block diagrams of the three learning types. Unsupervised Learning: In. Supervised Learning: In, Out, Target, Error. Reinforcement Learning: In, Out, Reward.]
Learning from experience with reward by trial & error
Machine Learning Concepts & Deep Learning
Machine Learning
Deep Learning
Unsupervised learning
•Find anomalies, similarities, groups
•Reduce complexity of high dimensional features
Supervised learning
•Learn from labelled observations
Reinforcement learning
•Learn from experience
How does this map to a human brain?
•Basal Ganglia: Reinforcement Learning (Reward ≈ Dopamine)
•Cerebral Cortex: Unsupervised Learning
•Cerebellum: Supervised Learning
Machine Learning
One Way of Human Thinking/Learning
Observations, Actions, Reward
Find action to optimize reward …
The Human Way of Thinking/Learning
Learning is Trial and Error … again and again???
Differences between a Human Brain and a Neural Network
Characteristic | Human Brain | Neural Network
Feed forward | Yes | Yes
Feed backward | Yes | Yes (only RNNs …)
Complexity | $10^{11}$ neurons | $10^{9}$ transistors
Switching speed | $10^{-3}$ secs. | $10^{-9}$ secs.
Structure | Hierarchical | Flat & simplistic
Operation | Massively parallel | Still serial & parallel
How about ??
•Intuition
•Instinct
•Gut feeling
•Mind
•Intellect
Most people have an idea of dangerous animals without learning it …
Reinforcement Learning
Application Areas
•Robotics
•Control physical systems
•Games
•Optimization
•General Compute Problems
•Router/Radio Channel Assignment
•Power optimization
•Scheduling algorithms
•Admission Control
•Anomaly detection
Reinforcement Learning
Universal Self Learning with Autonomous Algorithms
Universal
Autonomous
Self-learning
Algorithms
Reinforcement Learning
Components & Interaction
No pre-defined knowledge
Starts with random action
Trial & error learning
Find solution with optimum reward
Agent/environment states are hidden
Controller receives observations and reward and triggers action
System receives action, goes into next state, generates observations and reward
[Figure: the controller (agent) sends actions to the system (environment), which returns observations and reward]
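The interaction between controller and system can be sketched as a simple loop. The toy environment `LineWorld`, its `reset`/`step` methods and the two policies are hypothetical names invented for this illustration; they are not taken from the talk.

```python
import random

class LineWorld:
    """Hypothetical toy environment: walk along positions 0..4, goal at 4."""

    def __init__(self):
        self.position = 0  # hidden environment state

    def reset(self):
        self.position = 0
        return self.position  # first observation

    def step(self, action):
        # action is -1 (left) or +1 (right); the state stays within 0..4
        self.position = min(4, max(0, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else 0.0
        return self.position, reward, done

def run_episode(env, policy, max_steps=100):
    """Controller loop: observe, act, collect reward until the episode ends."""
    observation = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)                   # controller triggers action
        observation, reward, done = env.step(action)   # system answers
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps

def random_policy(observation):
    """Untrained agent: starts with random actions."""
    return random.choice([-1, 1])

def always_right(observation):
    """Hand-coded policy used only to check the loop."""
    return 1
```

Calling `run_episode(LineWorld(), random_policy)` plays one episode with an untrained agent; a learning agent would replace `random_policy` with a policy that improves from the collected rewards.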
How does this translate to real environments?
Examples: Robots, Games, Telecom
•Agent: Human, Neural Net, Decision Tree, Coded Algorithm
•Environment: Robot, Machine, Chess / Go game, Telecommunication network, Problem
•Actions: Go left/right, stop, move pawn to, set parameter value to
•Observations: Car position/speed, (chess) piece positions/value, temperature, performance KPI
•Reward: Power consumption, number/value of (chess) pieces, game score, call success rate
Reinforcement Learning
Examples: Telecom
[Figure: controller (agent) and system (environment) exchanging KPIs and Change Parameter; further labels: High Level KPIs, CEI, Power]
Some theory …
Markov Properties
Fully observable process
•The current state completely characterizes the process
The future is independent of the past given
the present
The state captures all relevant information
from the history
Once the state is known, the history may be
thrown away
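In symbols, the property can be written as follows. The notation $S_t$ for the state at time $t$ is an assumption here, following common textbook usage rather than the slide itself.

```latex
\mathbb{P}\left[ S_{t+1} \mid S_t \right] = \mathbb{P}\left[ S_{t+1} \mid S_1, \dots, S_t \right]
```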
Reinforcement Learning
Formalisms
Observability
•Full: $S^a_t = S^e_t$
•Partial: $S^a_t \neq S^e_t$
Agent functions
•Policy (predicted action based on state)
•$a = \pi(s)$
•Value (pred. of future reward)
•$v_\pi(s,a) = \mathbb{E}_\pi[R_{t+1}, R_{t+2}, \dots]$
•Model
•Build transition model of the system
[Figure: the controller (agent) with state $S^a_t$ holds Value, Policy and Model; the system (environment) has state $S^e_t$; they exchange $O_t$, $R_t$ and $A_t$]
Reinforcement Learning
Agent Types
Agent Functions are Optional (Value, Policy, Model)
•Value Based: no policy (implicit), value function
•Policy Based: policy function, no value function
•Actor Critic: policy function, value function
•Model Free: policy and/or value function, no model
•Model Based: policy and/or value function, model
Reinforcement Learning
Exploration vs. Exploitation
Exploration
•Find more information about the environment …
•Try random action
•Use action not used before in this state
•…
Exploitation
•Exploit already known information to maximize reward …
•Use action promising most direct reward
•Use action promising most future reward
•…
Exploration-Exploitation Dilemma
[Figure labels: Exploration, Exploitation, Optimal solution, No convergence, Sub optimum]
Reinforcement Learning
Exploration/Exploitation Strategies: (dynamic) $\epsilon$-Greedy
Use certain amount of random actions
•$1 - \mathrm{rand}(0,1) > \epsilon \rightarrow Q^*(s,a)$ (greedy action)
•$1 - \mathrm{rand}(0,1) \le \epsilon \rightarrow \mathrm{rand}(a)$ (random action)
Decrease $\epsilon$ over time
•$\epsilon_{t+1} = d \cdot \epsilon_t$ with a decay factor $d$
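A minimal sketch of a (dynamic) ε-greedy strategy, assuming the Q estimates for the current state are available as a plain list. The function names, the decay factor 0.99 and the lower bound 0.01 are illustrative assumptions, not values from the talk.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: list of Q estimates for the current state, one per action.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    # exploit: index of the action with the highest Q estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay(epsilon, factor=0.99, minimum=0.01):
    """Multiplicative decay of epsilon; factor and minimum are assumed values."""
    return max(minimum, epsilon * factor)
```

Calling `decay` once per episode shifts the agent gradually from exploration towards exploitation; the lower bound keeps a small amount of exploration alive.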
Reinforcement Learning & Neural Nets
Limited vs. Unlimited spaces for Observations, Actions
Limited observation space
•$O_{t1} = f(x_{t1}, y_{t1}, z_{t1}, \dots)$ with $x, y, z \in \mathbb{N}$ (discrete values)
•$A_{t1} = \mathrm{Map}(x_{t1}, y_{t1}, z_{t1}, \dots)$
Unlimited observation space
•$O_{t1} = f(x_{t1}, y_{t1}, z_{t1}, \dots)$ with $x, y, z \in \mathbb{R}$
•$A_{t1}$ = model.predict($x_{t1}, y_{t1}, z_{t1}, \dots$)
Algorithms / tables
Policy / Mapping
Supervised Learning
Decision Trees
Lin/log Regression
Supervised Learning
Neural Networks
Deep Learning
Reinforcement Learning
Discounted Future Reward
•Discounted Reward: $\gamma^T$
•Discount factor $\gamma \in [0,1]$
-$\gamma \approx 1$: ‘far-sighted’ evaluation
-$\gamma \approx 0$: ‘myopic’ evaluation
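To make the effect of the discount factor concrete, the following sketch sums a list of rewards, weighting the reward k steps ahead by the discount factor raised to the power k. The function name and the example values are assumptions for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma**k for k steps into the future."""
    total = 0.0
    # Work backwards so every step only needs one multiplication.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For three rewards of 1 each, a discount factor of 0 counts only the immediate reward (myopic), while a discount factor of 1 counts all three equally (far-sighted).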
Reinforcement Learning
Temporal-Difference Learning: Q Learning
Result of Q-Function represents actual & future reward
•Based on current state (s) and action (a) applied
•Corrected by maximal achievable reward in state $s_{t+1}$
•$Q(s_t, a_t) = \max R_{t+1}$
Learning is done by continuous update of Q
•$\alpha$: learning rate (adaptation rate for learned knowledge)
•$\gamma$: discount factor for future reward
•$Q(s_t, a_t) = (1 - \alpha)\, Q(s_t, a_t) + \alpha\, (r_t + \gamma \max Q(s_{t+1}, a_{t+1}))$
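The update rule can be sketched for a table-based Q function as follows. Storing Q in a `defaultdict` keyed by (state, action) and the default values for the learning rate and discount factor are assumptions of this sketch.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: blend the old estimate with the new target.

    q: mapping from (state, action) to the current Q estimate.
    actions: all actions available in next_state.
    """
    # Maximal achievable Q value in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * target
    return q[(state, action)]

# Unknown (state, action) pairs start with an estimate of 0.0.
q_table = defaultdict(float)
```

With a learning rate of 1 the old estimate is overwritten completely; smaller values average the target into the existing knowledge.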
Reinforcement Learning
Deep Q networks
Neural networks for large observation & action spaces
•Can work with pixel based observations
•Large amount of setup values (actions)
Different variants
•One feed forward per (s, a) combination: $Q(s,a)$
•One feed forward per state: $Q(s)$
[Figure: 1. State and Action go into the Neural Net, which outputs one Q Value; or 2. State goes into the Neural Net, which outputs Q Value(a1) … Q Value(an)]
Reinforcement Learning
Deep Q networks Update Rules
Given Transition $\langle s, a, r, s' \rangle$
1. Do feed forward for all actions in state s
2. Get max. Q value for all actions in state s’
3. Set target value for $Q(s,a) = r + \gamma \max Q(s', a')$
4. Update weights using back propagation
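Steps 1 to 3 can be sketched without a concrete network by treating the network outputs as plain lists. The function name and the `terminal` flag (no future reward after the last step of an episode) are assumptions of this sketch and do not appear on the slide.

```python
def dqn_target(q_current, q_next, action, reward, gamma=0.99, terminal=False):
    """Build the training target for one transition <s, a, r, s'>.

    q_current: network outputs for all actions in state s  (step 1)
    q_next:    network outputs for all actions in state s' (step 2)
    """
    # Actions that were not taken keep their prediction, so their error is zero.
    target = list(q_current)
    future = 0.0 if terminal else gamma * max(q_next)
    target[action] = reward + future  # step 3
    return target  # step 4 would fit the network towards this vector
```

In a full implementation the returned vector would be passed to the training call of the chosen deep learning library together with the state s.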
Reinforcement Learning
Q Learning Pre-requisites / Limitations
Small changes in Q-value can cause a totally different action selection.
No convergence guarantee.
Tries to find a deterministic value function; some problems require a stochastic value function.
Some examples …
OpenAI
An Agent Development Environment
Ready to use environments for agent & algorithm development
•Computing problems
•Games
•Robots
•2D Problems
Find the optimal model/policy/value function for a problem
•Model based (unlimited action, observation, reward space)
•Value/Policy based (limited action, observation, reward space)
Functions:
•$O_{t0}$ = reset()
•$O_{t1}, R_{t1}, S^e_t$ = step($A_{t0}$)
[Figure: the controller (agent) with state $S^a_t$ holds Value, Policy and Model; OpenAI provides the system (environment) with state $S^e_t$; they exchange $O_t$, $R_t$ and $A_t$]
OpenAI
Environments for Advanced Algorithm Development
•Acrobot
•CartPole
•Car over mountain
•Pendulum
•Humanoid stand up
•Tennis
•Car Race
•Lunar Lander
•Humanoid Robot
•Computational alg.
Reinforcement Learning
Example: CartPole
Balance inverted pendulum
•Simplified for 1 dimension
State
•Cart position [-2.4, 2.4]
•Cart velocity [-inf, inf]
•Pole angle [-41°, 41°]
•Pole velocity at tip [-inf, inf]
Actions
•Impacts cart direction & velocity
•Push cart to left
•Push cart to right
Termination
•Cart position at boundary (fails)
•Angle outside [-12°, 12°] (fails)
•More than 200 steps (terminates successfully)
Reward
•+1 for every step not terminating
By using random actions, the pole returns to its stable state
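The termination and reward rules listed for CartPole can be written down as one small function. This is a literal reading of the slide, not the environment's actual source code; the function name and return format are assumptions.

```python
def cartpole_step_outcome(cart_position, pole_angle_deg, step_count,
                          position_limit=2.4, angle_limit_deg=12.0, max_steps=200):
    """Return (done, success, reward) for one step, following the listed rules."""
    # Failure: cart at the boundary or pole angle outside [-12, 12] degrees.
    failed = (abs(cart_position) >= position_limit
              or abs(pole_angle_deg) > angle_limit_deg)
    if failed:
        return True, False, 0.0
    # Success: the episode lasted more than max_steps steps.
    if step_count > max_steps:
        return True, True, 0.0
    # +1 for every step that does not terminate the episode.
    return False, False, 1.0
```

An agent that maximizes the summed reward therefore has to keep the pole within the angle limits for as many steps as possible.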
Reinforcement Learning
Example: CartPole
Solving with model based algorithm (RandomForest)
Reinforcement Learning
Example: CartPole
Solving with model based algorithm (Neural Network)
Reinforcement Learning
Copy characters from observation tape to output tape
•Various character sets [A..[
•Different string length increasing during different runs
State
•Character observed at read head
Actions
•Move read head left or right
•Copy character to output tape or not
Termination
•Wrong character written (fails)
•Timeout after some amount of unsuccessful trials (fails)
•All characters written to output tape (terminates successfully)
Reward
•+1 for correct character written
•-0.5 for wrong character written
•0 for plain head movements
Example: Copy (Algorithm environment)
Reinforcement Learning
Example: Copy (Algorithm environment) - Solved with discrete Q-Learning
Successfully learned to copy
strings of random length and
content
Reinforcement Learning
Example: Mountain Car
Underpowered car has to get across a hill
•You have to go backward to get enough swing
State
•Position on x axis
•Speed
Actions
•Push forward
•Push backward
•Do nothing
Termination
•Timeout after 200 steps
•Car reaches the flag on the hill
Reward
•-1 for every step
•+0.5 for right push & speed > 0
•+0.5 for left push & speed < 0
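The reward rules listed for Mountain Car can be sketched as a function of action and speed. The action labels and the mapping of "right push" to forward and "left push" to backward are assumptions made for this illustration.

```python
def mountain_car_reward(action, speed):
    """Reward for one step: -1 per step, +0.5 when pushing along the motion.

    action: "forward", "backward" or "none" (labels assumed for this sketch)
    speed:  signed speed, positive when moving right/forward
    """
    reward = -1.0
    if action == "forward" and speed > 0:
        reward += 0.5   # pushing right while already moving right
    elif action == "backward" and speed < 0:
        reward += 0.5   # pushing left while already moving left
    return reward
```

The bonus rewards pushing in the direction of travel, which encourages the swinging motion the car needs to gain enough momentum.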
Reinforcement Learning
Mountain Car Videos
Random Walk
Training phase
Reinforcement Learning
Example: Lunar Lander
Land space ship on the moon
•Land in landing zone
•Surface and start condition change
State
•8 real values (position, angle, speed…)
Actions
•Fire main engine
•Fire left/right engine
Termination
•Move to landing pad with zero speed
•Can also land outside landing pad
Reward
•Firing main engine: -0.3 (unlimited fuel)
•Ground contact: +10
•Landing in pad: 100-140
Reinforcement Learning
Example: Lunar Lander
Random Walk
Training phase
Reinforcement Learning
Example: Lunar Lander
Some final words …
Reinforcement Learning
Problems & Research Areas
Delayed Rewards
•Direct reward gives insufficient feedback on success strategy
Continuous/Large observation states
•e.g. states using pictures & complex sensors
•Requires deep learning
Exploration/exploitation strategies
•Slows down solution convergence
•Find (sub-)optimal solution
Meta solution strategies
•Only a mix of policy, value, model and q-function solves most problems
•Standard supervised algorithms do not solve the problem
Reinforcement Learning
Takeaways …