Master's Thesis Slides: Development of Methods for Deep Neural Network Architectures Optimization based on Tensor Factorization Algorithms


About This Presentation

Current deep networks, having an excessive number of hyperparameters, have several optima. They are trained until the value of the loss function is nearly zero (i.e., close to optimality), and training is considered successful if the optimal (or near-optimal) model thus found also works well on de...


Slide Content

Development of Methods for Deep Neural
Network Architectures Optimization based on
Tensor Factorization Algorithms
Supervisor: Revin Ilya Evgenyevich, research associate,
Laboratory for Composite AI, Research Center “Strong AI in Industry”
Presented by: Zakharov Denis, J4232c

Problem
Pain: Current deep networks have an excessive number of parameters, which results in long training times and expensive weight storage in production.
Hypothesis: Using an optimization method, we can reduce the size of the stored model and increase throughput.
Work: Development of Methods for Deep Neural Network Architectures Optimization based on Tensor Factorization Algorithms

Purpose and objectives of the study
Goal
Development of Methods for Deep Neural Network Architectures Optimization based on Tensor Factorization Algorithms
Objectives
•Review the literature of the related field to provide background
•Conduct research on tensor algorithms (optimization, operations)
•Conduct research on the LoRA approach
•Perform experiments with time series (TS) models
•Based on these findings, propose a product that can be integrated into an AutoML solution
•Develop the optimization method
•Contribute to the Fedot.Industrial framework

Modern Networks
Figure: growth in the number of parameters of modern networks, 2018-2024 (BERT 340M, T5 11B, GPT-3 175B, Megatron-Turing 530B, GPT-4 1.76T, Gemini Pro ≈30T, Gemini Ultra ≈60T).

Tensor Decomposition
CANDECOMP/PARAFAC:
$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r$

Tucker:
$\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C$

t-SVD (Singular Value Decomposition), using the t-product:
$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\top}$

Block Term:
$\mathcal{X} \approx \sum_{r=1}^{R} \mathcal{G}_r \times_1 A_r \times_2 B_r \times_3 C_r$
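To make the first two decompositions concrete, here is a minimal sketch using the tensorly library; the tensor shape and ranks are illustrative assumptions, not values from the thesis experiments:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac, tucker

X = tl.tensor(np.random.rand(8, 8, 8))   # toy 3-way tensor

cp = parafac(X, rank=3)                  # CPTensor: (weights, factor matrices)
tk = tucker(X, rank=[3, 3, 3])           # TuckerTensor: (core, factor matrices)

# Relative reconstruction errors of both approximations
X_cp = tl.cp_to_tensor(cp)
X_tk = tl.tucker_to_tensor(tk)
print(tl.norm(X - X_cp) / tl.norm(X))
print(tl.norm(X - X_tk) / tl.norm(X))
```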

Low-rank decomposition
$W_{n \times m} \approx A_{n \times r} \, B_{r \times m}$
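A minimal NumPy sketch of this factorization via truncated SVD; the sizes and rank are illustrative:

```python
import numpy as np

n, m, r = 256, 512, 16
W = np.random.randn(n, m)                # dense weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]                     # n x r (singular values folded into A)
B = Vt[:r, :]                            # r x m

W_approx = A @ B
# Storage drops from n*m to r*(n+m) parameters
print(W.size, A.size + B.size, np.linalg.norm(W - W_approx))
```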

Statement of Experiment
Check the performance of the TS models NBEATS, Transformer, and ARIMA on the M4 dataset to evaluate how well they forecast time series.

Models
Transformer: a state-of-the-art deep learning model introduced in 2017. It is an encoder-decoder architecture whose core feature is the "multi-head attention" mechanism.
Figure: the Transformer architecture: scaled dot-product attention (MatMul, Scale, Mask, SoftMax), multi-head attention (parallel linear projections and concatenation), and the encoder-decoder stack with Add & Norm, Feed Forward, and Masked Multi-Head Attention blocks.
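As an illustration of the attention mechanism described above, a minimal NumPy sketch of scaled dot-product attention; the toy shapes are assumptions, not the experiment's settings:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # MatMul + Scale
    if mask is not None:                             # optional Mask step
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # SoftMax
    return weights @ V                               # MatMul with values

Q = K = V = np.random.randn(4, 8)   # toy sequence: 4 tokens, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```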

Models
NBEATS: a deep learning-based approach for time series forecasting.
ARIMA: a popular statistical model used to forecast future values of a time series based on its past values (a minimal fitting sketch follows the figure below).
Figure: the N-BEATS architecture: blocks of fully connected (FC) stacks producing backcast and forecast signals, blocks grouped into stacks (Block 1 ... Block K), and stack forecasts (Stack 1 ... Stack M) summed into the global forecast.
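A minimal ARIMA fitting sketch using statsmodels; the synthetic series and the (p, d, q) order are illustrative assumptions:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))   # toy random-walk series

model = ARIMA(series, order=(1, 1, 1))     # AR(1), one difference, MA(1)
result = model.fit()
forecast = result.forecast(steps=12)       # predict the next 12 points
print(forecast)
```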

M4 Dataset
The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting competition.
It consists of time series of the following frequencies (average training length in parentheses):
•Yearly (63)
•Quarterly (125)
•Monthly (302)
•Weekly (2035)
•Daily (475)
•Hourly (682)

Experiment - Monthly
Figure: forecasts of ARIMA, NBEATS, and Transformer on monthly M4 series.

Interpretation of results
On short series, all three models performed well and are suitable for prediction. However, with longer ranges:
•NBEATS performs better than the other models
•Some models show a critical difference in their predictions
•At the longest ranges, none of the models struggle
Training for just 50 epochs took:
•Almost an hour for NBEATS
•20 minutes for Transformer
•1 minute for ARIMA

LoRA + rSVD
LoRA: for a pretrained weight $W \in \mathbb{R}^{n \times m}$, the output $h$ is augmented with a low-rank update, $h = Wx + BAx$, where $B \in \mathbb{R}^{n \times r}$ and $A \in \mathbb{R}^{r \times m}$ are initialized as $A \sim \mathcal{N}(0, \sigma^2)$, $B = 0$.

rSVD: a rank-$r$ SVD, $W_{n \times m} \approx U_{n \times r} \, \Sigma_{r \times r} \, V^{\top}_{r \times m}$, from which the factors are recovered as $B = U \Sigma$ and $A = V^{\top}$.
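A minimal sketch of this recovery step using scikit-learn's randomized_svd; the shapes and rank are illustrative, and this is a sketch of the idea rather than the thesis's exact procedure:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

n, m, r = 512, 512, 8
W = np.random.randn(n, m)                  # dense pretrained weight

U, S, Vt = randomized_svd(W, n_components=r, random_state=0)
B = U * S            # n x r, singular values folded into B
A = Vt               # r x m

# Forward pass h = B (A x) instead of h = W x
x = np.random.randn(m)
h = B @ (A @ x)
print(np.linalg.norm(W @ x - h) / np.linalg.norm(W @ x))
```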

Results NBEATS
Figure: base model architecture vs. the model with LoRA layers inserted.
Model         Latency    Throughput
No LoRA       0.00108    4364808.0
LoRA Layer    0.00166    4159606.0
MS Default    0.00106    4748564.0
MS All        0.00108    4482670.0
MS LoRA       0.00105    5365210.0

Results Transformer
Model         Latency    Throughput
No LoRA       0.00085    9306328.0
LoRA Layer    0.00084    9782205.0
MS Default    0.00085    9572622.0
MS All        0.00085    9463282.0
MS LoRA       0.00084    8586088.0
Figure: training loss over epochs 1-8 with early stopping, Base vs. LoRA.

LoRA Implementation
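A minimal sketch of what a LoRA-wrapped linear layer can look like in PyTorch; this is illustrative and not the thesis's actual Fedot.Industrial code, and the class and parameter names are assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():              # freeze W (and bias)
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # A ~ N(0, sigma^2)
        self.B = nn.Parameter(torch.zeros(out_features, r))        # B = 0
        self.scale = alpha / r

    def forward(self, x):
        # h = W x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(128, 64, r=8)
print(layer(torch.randn(2, 128)).shape)   # torch.Size([2, 64])
```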

Summary
In this work we:
•Conducted an experiment to evaluate the performance of:
•NBEATS
•ARIMA
•Transformer
•Reasoned that layers in these models could be replaced using the LoRA approach
•Implemented this logic as part of the master's thesis
Figure: tensor network (TN) diagrams of some popular decompositions.

THANK YOU
FOR YOUR TIME!
@misterzurg