Reinforcement Learning – a Rewards Based Approach to Machine Learning - Marko Lohert - DORS CLUC 2024

MarkoLohert 60 views 42 slides May 16, 2024


About This Presentation

Reinforcement learning is a branch of machine learning that relies on learning through the mechanism of rewards and punishments.

This is the presentation about reinforcement learning that I gave at the DORS/CLUC conference in 2024 in Zagreb, Croatia (https://www.dorscluc.org).


Added: May 16, 2024

Slides: 42 pages

Slide Content

Agenda: What is reinforcement learning? Where is RL used? What are the advantages of RL? What algorithms are used in RL? How to get started?


What is reinforcement learning? Reward Action

What is reinforcement learning? action-reward feedback loop of a generic RL model

RL model: Learning is done in steps. An Agent learns by interacting with its Environment. Based on the state of the Environment and on rewards, the Agent decides which action to take. S_t == state at step t. A_t == action at step t. The Agent receives a reward (positive or negative). R_t == reward at step t. The possible rewards can be a (small) set of numbers (example: -1, 0, 1). Based on the Agent’s action, the Environment generates a reward R_{t+1} and a new state S_{t+1}.
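
The interaction loop on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the `CoinFlipEnv` environment, its reward set {-1, 1}, and the random agent are all hypothetical.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the next coin flip (0 or 1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = self.rng.randint(0, 1)  # S_0: the last coin shown

    def step(self, action):
        """Apply action A_t; return the reward and the new state."""
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else -1  # rewards come from a small set
        self.state = coin
        return reward, self.state


def random_agent(state, rng):
    """Placeholder agent: ignores the state and picks an action at random."""
    return rng.randint(0, 1)


def run(steps=10, seed=0):
    """Run the agent-environment loop and return the total reward."""
    env = CoinFlipEnv(seed)
    agent_rng = random.Random(seed + 1)
    state, total = env.state, 0
    for _ in range(steps):
        action = random_agent(state, agent_rng)  # the Agent picks an action
        reward, state = env.step(action)         # the Environment responds
        total += reward
    return total
```

A learning agent would replace `random_agent` with something that uses the observed rewards to improve its choices.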

What is reinforcement learning? Reinforcement learning is a branch of machine learning that relies on learning through the mechanism of rewards and punishments.

What is reinforcement learning? A branch of machine learning. A way of learning that mimics the fundamental way in which humans and animals learn. Artificial intelligence that learns by trial and error and receives a reward or penalty for the actions it performs. The goal is to maximise the total reward. AI that is trained on real-life scenarios.

Policy: How does the Agent decide which action to take? The policy determines the probability that the Agent will take action A_t when in state S_t. Policy: π(a|s)
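
One way to picture a stochastic policy π(a|s) is as a table of action probabilities per state. The sketch below assumes such a table; the state names, action names, and probabilities are made up for illustration.

```python
import random

# Hypothetical policy table: policy[state][action] = pi(action | state)
policy = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "high_battery": {"recharge": 0.2, "explore": 0.8},
}


def sample_action(policy, state, rng):
    """Draw an action with probability pi(action | state)."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
action = sample_action(policy, "low_battery", rng)  # "recharge" about 90% of the time
```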

Goal == maximize total reward. G_t (return) == total reward in the future. γ == discount factor; it determines how much less important a reward in the distant future is than a reward in the near future. Learning is done in discrete steps. R_k == reward in step k. The number of steps can be fixed (T) or infinite (∞).
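
The formula on this slide did not survive in the transcript. Assuming it follows the standard textbook definition, the return is G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + ..., which can be computed as in this small sketch (the function name and the example rewards are illustrative):

```python
def discounted_return(rewards, gamma):
    """Return G_t for the future rewards [R_{t+1}, R_{t+2}, ..., R_T]."""
    g = 0.0
    # Work backwards: G = R + gamma * (return of everything after it)
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


print(discounted_return([1, 0, 1], gamma=0.5))  # 1 + 0.5*0 + 0.25*1 = 1.25
```

With γ = 1 every reward counts equally; with γ = 0 only the next reward counts.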

Reinforcement learning in the world of AI: Artificial Intelligence contains Machine Learning (and other fields …); Machine Learning contains supervised learning, unsupervised learning, and reinforcement learning.

Reinforcement learning in the world of ML. Supervised learning vs reinforcement learning: supervised learning relies on a labeled data set. Unsupervised learning vs reinforcement learning: unsupervised learning == training based on unlabeled data == finding patterns in data. Reinforcement learning == learning through the mechanism of rewards and punishments.


Robotics: RL is used for building robust robots. Industrial robots for more complex applications. Sophisticated grasping strategies, object manipulation techniques, and enhanced hand-eye coordination. RL can be used to teach a robot to walk on 2 or 4 legs.

RL can be used to teach a robot to walk on two/four legs https://www.freethink.com/hard-tech/robot-legs https://bostondynamics.com/blog/starting-on-the-right-foot-with-reinforcement-learning https://youtu.be/goxCjGPQH7U

Gaming: RL can be used for testing games. RL can perform many iterations without human input.

Reinforcement learning and Atari games: Deep Q-Learning was used to teach AI how to play Atari 2600 games.

Reinforcement learning and Atari games: The AI system was given no domain knowledge (rules) about how to play the games. The system only sees pixels and was instructed to maximize points. Implemented for many Atari 2600 games: Pong, Breakout … In 2013, DeepMind published “Playing Atari with Deep Reinforcement Learning” (Mnih et al.): https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Reinforcement learning and Atari games. Game: Breakout. After 240 minutes the RL system had learned the best strategy: create a tunnel and send the ball above the blocks -> the ball bounces between the roof and the blocks.

“The implications go far beyond my beloved chessboard... Not only do these self-taught expert machines perform incredibly well, but we can actually learn from the new knowledge they produce.” Garry Kasparov, former world chess champion

AlphaGo: Presented in 2015 by Google DeepMind (https://deepmind.google). The first program that won a match against a world champion in Go. Go is a Chinese strategy board game. A bigger challenge than chess.

AlphaZero (2017): AlphaZero == a single AI system that is an expert in Go, chess, and shogi (Japanese chess). https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go

Healthcare: Reinforcement learning is applied to: development of new drugs, diagnostics, dynamic treatment regimes (DTRs), surgery …

Trading and Finance: Reinforcement learning can achieve better results than supervised learning when applied to trading and finance. IBM has developed a sophisticated RL-based platform that has the ability to make financial trades.

Autonomous driving: RL can be used for: trajectory optimization, collision avoidance, lane changing, automatic parking …

More info: https://wayve.ai | https://youtu.be/eRwTbRtnT1I

And other areas … Cooling of data centers (Google has reduced the energy used for cooling by up to 40%). News recommendation. Marketing …


Advantages of Reinforcement Learning: ✅ RL can solve complex problems that cannot be solved using other methods. ✅ It functions in dynamic environments. ✅ RL does not need a separate data-preparation step (a difference between RL and supervised learning). ✅ It can be used when the only way to collect data from an environment is for an agent to interact with that environment. …

Disadvantages of Reinforcement Learning: ⚠ Sparse-reward environment: an agent receives a reward only when the goal is reached, so it is harder to know which steps were actually useful. Popular solution == reward shaping -> adding hand-crafted rewards to help RL. Hand-crafted rewards require a human expert to design them correctly, and humans can be biased.
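
To make the difference concrete, here is a rough sketch of a sparse reward next to a shaped one. Everything in it is hypothetical: a 1-D track with a goal position, and a hand-picked bonus of 0.1 per step of progress.

```python
GOAL = 5  # hypothetical goal position on a 1-D track


def sparse_reward(position):
    """Reward only when the goal is reached."""
    return 1.0 if position == GOAL else 0.0


def shaped_reward(old_position, new_position):
    """Sparse reward plus a hand-crafted bonus for moving closer to the goal."""
    progress = abs(GOAL - old_position) - abs(GOAL - new_position)
    return sparse_reward(new_position) + 0.1 * progress
```

With the sparse version, a move from position 2 to 3 earns nothing; with the shaped version it earns a small bonus, and a move away from the goal earns a small penalty. The size of the bonus is exactly the kind of human design choice the slide warns about.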

Disadvantages of Reinforcement Learning: ⚠ RL needs to collect a lot of data from the environment, and it needs a lot of computation (it is data hungry). This is not a problem when RL is applied to gaming, because it can play the same game many times and collect a lot of data. ⚠ It can be expensive to learn by trying (and failing). For example, in robotics the robots are expensive and can get damaged while being used for learning.

Solution to the disadvantages (general advice): combine RL with other techniques. For example: RL + Deep Learning.


RL Algorithms Source: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html

Q-Learning Algorithm: The most famous RL algorithm. The “Q” in “Q-Learning” stands for quality. Example (Python): https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial

Q-Table Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python

Q-Learning Algorithm Source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
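
The algorithm figure is not part of this transcript, so here is a minimal sketch of tabular Q-learning for orientation. It is not the code from the linked sources: the five-state corridor environment, the hyperparameters, and the function names are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise exploit
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max Q(s', a')
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After training, the "right" action should have the higher Q-value in every non-goal state, which is the shortest path to the reward.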

Deep Q-Learning Algorithm: A deep neural network is used instead of a “simple” Q-table. Used in the case of large environments. Example (Python): https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python

Deep Q-Learning Algorithm Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python


API for reinforcement learning: Gymnasium (Python). One Agent is used with different environments. https://gymnasium.farama.org
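
As a starting point, the sketch below runs one episode with random actions against any environment that follows the Gymnasium API (`reset`, `step`, `action_space.sample`). The helper names `run_random_episode` and `main` are made up for this example, and `main` assumes Gymnasium is installed (`pip install gymnasium`).

```python
def run_random_episode(env, seed=None, max_steps=1000):
    """Run one episode with random actions and return the total reward.

    `env` is assumed to follow the Gymnasium API.
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # placeholder for a real agent
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward


def main():
    import gymnasium as gym  # imported here so the helper above has no dependency

    # The same agent code runs against different environments.
    for env_id in ("CartPole-v1", "FrozenLake-v1"):
        env = gym.make(env_id)
        print(env_id, run_random_episode(env, seed=42))
        env.close()
```

Calling `main()` prints the total reward of one random episode per environment; replacing the random action with a learned policy is the next step.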

Key points: Reinforcement learning is a branch of machine learning where an agent learns about its environment using the mechanism of rewards and punishments. RL doesn’t rely on a labeled data set. RL learns by trial and error through interacting with its environment, so it can reach conclusions / knowledge that humans didn’t reach.
