Reinforcement Learning – a Rewards Based Approach to Machine Learning - Marko Lohert - DORS CLUC 2024

Marko Lohert, 42 slides, May 16, 2024

About This Presentation

Reinforcement learning is a branch of machine learning that relies on learning through the mechanism of rewards and punishments.

This is the presentation about reinforcement learning that I gave at the DORS/CLUC conference in 2024 in Zagreb, Croatia (https://www.dorscluc.org).


Slide Content

Agenda: What is reinforcement learning? Where is RL used? What are the advantages of RL? What algorithms are used in RL? How to get started?

What is reinforcement learning? Reward Action

What is reinforcement learning? Figure: the action-reward feedback loop of a generic RL model.

RL model: Learning is done in steps. An Agent learns by interacting with its Environment. Based on the state of the Environment and the rewards received, the Agent decides which action to take. S_t is the state at step t, A_t is the action at step t, and R_t is the reward at step t. The Agent receives a reward, which can be positive or negative; the set of possible rewards can be a small set of numbers (for example: -1, 0, 1). Based on the Agent's action, the Environment generates a reward and a new state for the next step.
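The interaction loop described above can be sketched in a few lines of Python. This is only an illustration: the corridor environment, the random agent, and all names (`CorridorEnvironment`, `RandomAgent`, `run_episode`) are hypothetical and do not come from the slides. The agent here does not learn yet; the point is the order of events in each step.

```python
import random


class CorridorEnvironment:
    """Hypothetical toy Environment: positions 0..4, goal at position 4."""

    def __init__(self):
        self.state = 0  # starting state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (reward, new state, done)."""
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 1 if done else -1  # small set of possible rewards: {-1, 1}
        return reward, self.state, done


class RandomAgent:
    """Hypothetical Agent that picks actions at random (no learning yet)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state, last_reward):
        # A learning Agent would use the state and reward here.
        return self.rng.choice([-1, 1])


def run_episode(env, agent, max_steps=1000):
    """Run the Agent-Environment loop; return (total reward, step history)."""
    state, reward, total = env.state, 0, 0
    history = []
    for t in range(max_steps):
        action = agent.act(state, reward)        # Agent chooses an action
        reward, state, done = env.step(action)   # Environment answers
        history.append((t, action, reward, state))
        total += reward
        if done:
            break
    return total, history
```

Calling `run_episode(CorridorEnvironment(), RandomAgent())` returns the total reward of one episode together with a list of `(step, action, reward, state)` tuples.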

What is reinforcement learning? Reinforcement learning is a branch of machine learning that relies on learning through the mechanism of rewards and punishments.

What is reinforcement learning? A branch of machine learning. A way of learning that mimics the fundamental way in which humans and animals learn. Artificial intelligence that learns by trial and error and receives a reward or penalty for the actions it performs; the goal is to maximise the total reward. AI that is trained on real-life scenarios.

Policy: How does the Agent decide which action to take? The policy determines the probability that the Agent will take action A_t when in state S_t. Policy: π(a|s).
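A stochastic policy π(a|s) can be pictured as a lookup table of probabilities. The sketch below assumes a made-up robot with two states and two actions; the state names, action names, and probabilities are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical policy table: POLICY[state][action] = pi(action | state)
POLICY = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "high_battery": {"recharge": 0.2, "explore": 0.8},
}


def action_probability(policy, state, action):
    """Return pi(action | state); actions not listed have probability 0."""
    return policy[state].get(action, 0.0)


def sample_action(policy, state, rng):
    """Draw one action according to the probabilities pi(. | state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

For example, `sample_action(POLICY, "low_battery", random.Random(0))` returns either `"recharge"` or `"explore"`, with `"recharge"` being far more likely.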

Goal == maximize total reward. G_t (the return) is the total reward in the future. γ is the discount factor: it determines how much less important a reward in the distant future is than a reward in the near future. Learning is done in discrete steps; R_k is the reward in step k. The number of steps can be fixed (T) or infinite (∞).
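The formula for the return is not reproduced in the text above. Assuming the common textbook form G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + … (up to step T for a fixed number of steps), the return for a finite list of future rewards can be computed as in this sketch. The function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return G_t for the future rewards [R_{t+1}, R_{t+2}, ..., R_T].

    Assumes the textbook form G_t = sum over k of gamma**k * rewards[k].
    """
    g = 0.0
    # Work backwards: G = R + gamma * (return of everything that follows)
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g
```

With `gamma=1` every reward counts equally, while with `gamma=0` only the immediate reward matters; for instance `discounted_return([1, 1, 1], 0.5)` gives 1 + 0.5 + 0.25 = 1.75.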

Reinforcement learning in the world of AI (diagram): Artificial Intelligence contains Machine Learning (among other fields …), and Machine Learning contains supervised learning, unsupervised learning, and reinforcement learning.

Reinforcement learning in the world of ML. Supervised learning vs reinforcement learning: supervised learning relies on a labeled data set. Unsupervised learning vs reinforcement learning: unsupervised learning == training based on unlabeled data == finding patterns in data. Reinforcement learning == learning through the mechanism of rewards and punishments.

Where is RL used?

Robotics: RL is used for building robust robots, including industrial robots for more complex applications. It supports sophisticated grasping strategies, object manipulation techniques, and enhanced hand-eye coordination. RL can be used to teach a robot to walk on two or four legs.

RL can be used to teach a robot to walk on two or four legs. More info: https://www.freethink.com/hard-tech/robot-legs | https://bostondynamics.com/blog/starting-on-the-right-foot-with-reinforcement-learning | https://youtu.be/goxCjGPQH7U

Gaming: RL can be used for testing games. RL can perform many iterations without human input.

Reinforcement learning and Atari games: Deep Q-Learning was used to teach an AI how to play Atari 2600 games.

Reinforcement learning and Atari games: The AI system was not given any domain knowledge about how to play the games (the rules). The system only sees pixels and was instructed to maximize points. Implemented for many Atari 2600 games: Pong, Breakout … In 2013, DeepMind published „Playing Atari with Deep Reinforcement Learning” (Mnih et al.): https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Reinforcement learning and Atari games. Game: Breakout. After 240 minutes, the RL system learned the best strategy: create a tunnel and send the ball above the blocks, so that the ball bounces between the roof and the blocks.

„The implications go far beyond my beloved chessboard... Not only do these self-taught expert machines perform incredibly well, but we can actually learn from the new knowledge they produce.” Garry Kasparov, former world chess champion

AlphaGo: Presented in 2015 by Google DeepMind (https://deepmind.google). The first program to win a match against a world champion in Go, a Chinese strategy board game that is a bigger challenge than chess.

AlphaZero (2017): AlphaZero == a single AI system that is an expert in Go, chess, and shogi (Japanese chess). https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go

Healthcare: Reinforcement learning is applied to the development of new drugs, diagnostics, dynamic treatment regimes (DTRs), surgery …

Trading and Finance: Reinforcement learning achieves better results than supervised learning when applied to trading and finance. IBM has developed a sophisticated RL-based platform that has the ability to make financial trades.

Autonomous driving: RL can be used for trajectory optimization, collision avoidance, lane changing, automatic parking …

More info: https://wayve.ai | https://youtu.be/eRwTbRtnT1I

And other areas … Cooling of data centers (Google has reduced the energy used for cooling by up to 40%), news recommendation, marketing …

What are the advantages of RL?

Advantages of Reinforcement Learning: ✅ RL can solve complex problems that cannot be solved using other methods. ✅ It functions in dynamic environments. ✅ RL does not need a separate data-preparation step (a difference between RL and supervised learning). ✅ It can be used when the only way to collect data from an environment is for an agent to interact with that environment. …

Disadvantages of Reinforcement Learning: ⚠ Sparse-reward environments: an agent receives a reward only when the goal is reached, so it is harder to know which steps were actually useful. A popular solution is reward shaping: adding hand-crafted rewards to help RL. Hand-crafted rewards require a human expert to design them correctly, and humans can be biased.
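To make the difference between a sparse reward and a shaped reward concrete, here is a minimal sketch on a made-up 1-D track. The goal position, the size of the bonus, and the function names are hypothetical; the bonus for moving closer to the goal is exactly the kind of hand-crafted signal that a human designer has to choose, and a poor choice can bias what the agent learns.

```python
GOAL = 10  # hypothetical goal position on a 1-D track


def sparse_reward(position):
    """Sparse reward: the agent is rewarded only when the goal is reached."""
    return 1.0 if position == GOAL else 0.0


def shaped_reward(old_position, new_position, bonus=0.1):
    """Sparse reward plus a hand-crafted bonus for moving closer to the goal.

    The bonus is a designer's assumption, not something the environment provides.
    """
    progress = abs(GOAL - old_position) - abs(GOAL - new_position)
    return sparse_reward(new_position) + bonus * progress
```

With the sparse reward, a step from position 3 to 4 gives no feedback at all, while the shaped reward gives a small positive signal for that step and a small negative one for stepping back from 4 to 3.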

Disadvantages of Reinforcement Learning: ⚠ RL needs to collect a lot of data from the environment and needs a lot of computation (it is data hungry). This is not a problem when RL is applied to gaming, because it can play the same game many times and collect a lot of data. ⚠ It can be expensive to learn by trying (and failing), for example in robotics, where robots are expensive and can get damaged while being used for learning.

Solution to the disadvantages (general advice): Combine RL with other techniques, for example RL + Deep Learning.

What algorithms are used in RL?

RL Algorithms Source: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html

Q-Learning Algorithm: The most famous RL algorithm. The “Q” in “Q-Learning” stands for quality. Example (Python): https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial

Q-Table Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python

Q-Learning Algorithm Source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
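As a rough illustration of tabular Q-learning, the sketch below trains a Q-table on a made-up corridor with five states and a reward at the right end. The environment, the hyperparameter values, and the epsilon-greedy exploration rule are assumptions made for this example, not taken from the slides or the linked sources. The update itself is the standard Q-learning rule: Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s', ·) − Q(s, a)).

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = [-1, +1]  # move left or move right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(N_STATES - 1, max(0, state + action))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with Q-learning and epsilon-greedy exploration."""
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to 0
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])             # exploit, break ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move Q(s, a) towards r + gamma * max Q(s', .)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """For every state, pick the action with the highest Q-value."""
    return [ACTIONS[row.index(max(row))] for row in q]
```

After training, `greedy_policy(train())` should choose "move right" (+1) in every non-terminal state, because that is the shortest way to the rewarded end of the corridor.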

Deep Q-Learning Algorithm: A deep neural network is used instead of a „simple” Q-Table. Used in the case of large environments. Example (Python): https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python

Deep Q-Learning Algorithm Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
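The core idea of replacing the Q-table with a network can be sketched without any deep learning library. The example below is deliberately tiny and only shows the forward pass and the training target; it has a single hidden layer, random weights, and no training loop, so it is not a full Deep Q-Learning implementation. All names and layer sizes are hypothetical, and a real system would use a deep learning framework.

```python
import math
import random


def init_network(n_inputs, n_hidden, n_actions, seed=0):
    """Create small random weights for a two-layer network (hypothetical sizes)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_actions)]
    return w1, w2


def q_values(network, state):
    """Forward pass: state vector in, one estimated Q-value per action out."""
    w1, w2 = network
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def td_target(network, reward, next_state, done, gamma=0.99):
    """Target the network would be trained towards: r + gamma * max Q(s', .)."""
    if done:
        return reward
    return reward + gamma * max(q_values(network, next_state))
```

Where a Q-table needs one row per state, `q_values(network, state)` accepts any state vector, which is why this approach scales to large environments such as raw game pixels.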

How to get started?

API for reinforcement learning: Gymnasium (https://gymnasium.farama.org) is a Python API for reinforcement learning. One Agent is used. Different environments are available.
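A first experiment with Gymnasium could look like the sketch below. It assumes the `gymnasium` package is installed (`pip install gymnasium`) and uses CartPole-v1, one of its built-in environments. The helper names `run_random_episode` and `demo_cartpole` are hypothetical, and the agent here only takes random actions, so it is a starting point rather than a learning agent.

```python
def run_random_episode(env, seed=None, max_steps=500):
    """Run one episode with random actions on a Gymnasium-style environment."""
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        # Random action; a real Agent would decide based on the observation
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


def demo_cartpole():
    """Run one random episode of CartPole-v1 (requires the gymnasium package)."""
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    try:
        return run_random_episode(env, seed=42)
    finally:
        env.close()
```

Calling `demo_cartpole()` returns the total reward collected in one episode. Because every Gymnasium environment exposes the same `reset` and `step` interface, the same loop works if `"CartPole-v1"` is swapped for another environment id.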

Key points: Reinforcement learning is a branch of machine learning where an agent learns about its environment using the mechanism of rewards and punishments. RL doesn’t rely on a labeled data set. RL learns by trial and error through interacting with its environment, so it can reach conclusions / knowledge that humans didn’t reach.