OLIVAW: reaching superhuman strength at Othello

MeetupDataScienceRoma 415 views 72 slides Feb 21, 2019

About This Presentation

https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/258491250/


Slide Content

OLIVAW: reaching superhuman strength at Othello using the AlphaGo Zero paradigm with Zero budget
Antonio Norelli, 19-02-19
From my Master thesis: Deep learning for Othello, Computer Science, Sapienza University of Rome
Thesis Advisor: Prof. Alessandro Panconesi

Why games? Shannon 1949 Turing 1950 McCarthy 1956

"A satisfactory solution of [chess] will act as a wedge in attacking other problems of a similar nature and of greater significance."
Shannon 1949, Programming a Computer for Playing Chess, Philosophical Magazine, Vol. 41, No. 314

Games as a micro world: clear objectives, small set of rules, still interesting complexity

Choosing a move: naive approach

Games as trees

Looking into the future: Tic-tac-toe, Connect 4, Checkers, Othello, Chess, Go

Exploring the full tree is impossible

What does the future hold for us?

Oracle to evaluate intermediate positions: -0.42

Oracle using game knowledge
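
The idea of cutting the tree at a fixed depth and asking an oracle for a score can be sketched as below. This is only an illustration under stated assumptions: the board encoding, the function names, and above all the `oracle` heuristic (disc difference plus a corner bonus) are hypothetical choices made for this sketch, not the evaluation used by any program mentioned in the talk.

```python
# Hypothetical sketch: depth-limited search with a hand-written oracle.
# Board encoding (assumed): 8x8 list of lists, 1 = black, -1 = white, 0 = empty.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]

def initial_board():
    """Standard Othello starting position."""
    b = [[0] * 8 for _ in range(8)]
    b[3][3], b[4][4] = -1, -1
    b[3][4], b[4][3] = 1, 1
    return b

def flips(board, player, r, c):
    """Discs that `player` would flip by playing on the empty square (r, c)."""
    if board[r][c] != 0:
        return []
    out = []
    for dr, dc in DIRS:
        line, y, x = [], r + dr, c + dc
        while 0 <= y < 8 and 0 <= x < 8 and board[y][x] == -player:
            line.append((y, x))
            y, x = y + dr, x + dc
        if line and 0 <= y < 8 and 0 <= x < 8 and board[y][x] == player:
            out.extend(line)
    return out

def legal_moves(board, player):
    return [(r, c) for r in range(8) for c in range(8) if flips(board, player, r, c)]

def play(board, player, move):
    """Return a new board with `move` played and the bracketed discs flipped."""
    new = [row[:] for row in board]
    for y, x in flips(board, player, *move) + [move]:
        new[y][x] = player
    return new

def oracle(board, player):
    """Assumed game-knowledge heuristic, clamped to [-1, 1]."""
    discs = sum(map(sum, board)) * player
    corners = sum(board[r][c] for r, c in CORNERS) * player
    return max(-1.0, min(1.0, (discs + 5 * corners) / 64))

def negamax(board, player, depth):
    """Value of `board` for `player`, looking `depth` plies ahead."""
    if depth == 0:
        return oracle(board, player)
    moves = legal_moves(board, player)
    if not moves:
        if not legal_moves(board, -player):  # neither side can move: game over
            diff = sum(map(sum, board)) * player
            return float((diff > 0) - (diff < 0))
        return -negamax(board, -player, depth - 1)  # forced pass
    return max(-negamax(play(board, player, m), -player, depth - 1) for m in moves)

def best_move(board, player, depth=3):
    """Pick the move with the best look-ahead value (needs a legal move)."""
    return max(legal_moves(board, player),
               key=lambda m: -negamax(play(board, player, m), -player, depth - 1))
```

Calling `best_move(initial_board(), 1)` returns one of Black's four opening moves. Raising `depth` trades time for foresight, while the quality of the result still depends on how good the oracle is.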

Deep Blue 1997

AlphaGo 2016

AlphaGo Zero 2017

The AlphaGo Zero paradigm: is it universal? Does it scale DOWN in terms of resources?
This thesis: Can we reach superhuman strength at Othello with the same paradigm and "normal" computing power?

                          AlphaGo Zero    My solution
TPUs                      5000            ≈1
Days of training          40 (3)          30
Estimated hardware cost   $25 million     $500 (budget)

Why Othello? Well known, simpler but still interesting, easy to implement

OLIVAW algorithm

OLIVAW training process

OLIVAW training process +0.24
[[0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.03 0.   0.   0.   0.   0.  ]
 [0.15 0.   0.   0.   0.   0.   0.   0.  ]
 [0.64 0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.18 0.   0.   0.   0.   0.  ]]

OLIVAW training process -0.03
[[0.   0.   0.11 0.11 0.   0.   0.   0.12]
 [0.   0.13 0.9  0.   0.   0.   0.   0.  ]
 [0.10 0.11 0.   0.   0.   0.   0.   0.  ]
 [0.12 0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.11 0.   0.   0.   0.11 0.  ]]
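
One way to read the first pair of numbers above (the scalar 0.24 with its 8x8 grid) is as a position value plus a probability distribution over the 64 squares, as in the AlphaGo Zero paradigm. That reading is an assumption of this sketch, and so are the variable names. The snippet checks that the four non-zero entries sum to 1, then picks a move either greedily or by sampling.

```python
import random

# Assumed interpretation: `value` scores the position for the side to move,
# `policy[r][c]` is the probability of playing on row r, column c.
value = 0.24
policy = [[0.0] * 8 for _ in range(8)]
for (r, c), p in {(1, 2): 0.03, (2, 0): 0.15, (3, 0): 0.64, (7, 2): 0.18}.items():
    policy[r][c] = p

cells = [(r, c) for r in range(8) for c in range(8)]
weights = [policy[r][c] for r, c in cells]

# The non-zero entries form a probability distribution over candidate moves.
total = sum(weights)

# Greedy choice: the single most probable square.
greedy = max(cells, key=lambda rc: policy[rc[0]][rc[1]])

# Stochastic choice: sample a square in proportion to its probability.
sampled = random.Random(0).choices(cells, weights=weights, k=1)[0]
```

Here `greedy` is the square at row 3, column 0, which holds 0.64 of the probability mass. Sampling instead of taking the maximum is a common way to keep self-play games varied.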

OLIVAW training process VS

OLIVAW training process
1. Self-play games generation
2. Neural Net training
3. Evaluation

Pseudocode
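
The three steps of the training process listed above might be arranged as in the following skeleton. It is a hypothetical sketch, not the pseudocode shown on the slide. Every function body is a toy stand-in (the "network" is just a dict with a made-up `strength` number), and the 55% promotion threshold is the value reported for AlphaGo Zero, assumed here only for illustration.

```python
import random

def self_play(net, n_games, rng):
    """Step 1 (stand-in): fake game records produced by the current best net."""
    return [{"moves": [], "outcome": rng.choice([-1, 0, 1])} for _ in range(n_games)]

def train(net, games, rng):
    """Step 2 (stand-in): return a candidate net, here a randomly perturbed copy."""
    return {"strength": net["strength"] + rng.uniform(-0.5, 1.0), "gen": net["gen"]}

def evaluate(candidate, best, n_games, rng):
    """Step 3 (stand-in): fraction of evaluation games won by the candidate."""
    gap = best["strength"] - candidate["strength"]
    p_win = 1 / (1 + 10 ** (gap / 4))
    return sum(rng.random() < p_win for _ in range(n_games)) / n_games

def training_loop(generations, games_per_gen=20, eval_games=40,
                  threshold=0.55, seed=0):
    """Repeat generate / train / evaluate, promoting candidates that win enough."""
    rng = random.Random(seed)
    best = {"strength": 0.0, "gen": 0}
    buffer = []
    for gen in range(1, generations + 1):
        buffer.extend(self_play(best, games_per_gen, rng))   # 1. self-play
        candidate = train(best, buffer, rng)                 # 2. training
        if evaluate(candidate, best, eval_games, rng) >= threshold:  # 3. evaluation
            best = dict(candidate, gen=gen)
    return best, len(buffer)
```

Calling `training_loop(5)` runs five generations and returns the surviving net together with the number of stored games. In a real system each stand-in would be replaced by tree search guided by the network, gradient descent on the stored games, and head-to-head matches.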

Reaching superhuman strength

OLIVAW vs Alessandro Di Mattei, 2016-2017 Italian champion: 2-3 (Draw, Draw, Defeat, Victory, Defeat), 27-11-2018

OLIVAW vs Alessandro Di Mattei: 4-0, 3-12-2018

OLIVAW strength Alessandro Di Mattei

OLIVAW vs Di Mattei 03-12 Game 4

OLIVAW vs Michele Borassi 2008 World Othello champion

OLIVAW is still training (video trailer)

OLIVAW vs Michele Borassi, best of 3: 19 January 2019, 16:30, Dipartimento di Matematica Guido Castelnuovo, Sapienza, piazzale Aldo Moro, 5

OLIVAW vs Michele Borassi, Game 1: OLIVAW wins with black (1-0)

OLIVAW vs Michele Borassi, Game 2: Michele Borassi wins with black (1-1)

OLIVAW vs Michele Borassi, Game 3: Michele Borassi wins with black (1-2)

Thanks! Any questions? You can find me at [email protected]
OLIVAW: reaching superhuman strength at Othello using the AlphaGo Zero paradigm with Zero budget
Antonio Norelli