An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Anonymous (ICLR 2021, under review)
Choi Dongmin
Yonsei University Severance Hospital CCIDS

Abstract
•Transformer
-the standard architecture for NLP
•Convolutional Networks
-in vision, attention is typically applied in conjunction with convolutional networks, keeping their overall structure in place
•Transformer in Computer Vision
-a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches
-achieves S.O.T.A results with small computational cost when pre-trained on a large dataset

Introduction
•Self-attention based architectures: Transformer, BERT
•The dominant approach in NLP: pre-training on a large text corpus and then fine-tuning on a smaller task-specific dataset
Vaswani et al. Attention Is All You Need. NIPS 2017

Introduction
•Self-attention in computer vision, inspired by NLP: DETR, Axial-DeepLab
•However, classic ResNet-like architectures are still S.O.T.A
Carion et al. End-to-End Object Detection with Transformers. ECCV 2020
Wang et al. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. ECCV 2020

Introduction
•Applying a Transformer Directly to Images
-with the fewest possible modifications
-provide the sequence of linear embeddings of the patches as the input
-image patches = tokens (words) in NLP
•Small-Scale Training
-achieves accuracies below ResNets of comparable size
-Transformers lack some inductive biases inherent to CNNs (such as translation equivariance and locality)
•Large-Scale Training
-trumps (surpasses) inductive bias
-excellent results when pre-trained at sufficient scale and transferred

Related Works
Vaswani et al. Attention Is All You Need. NIPS 2017
Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019
Radford et al. Improving Language Understanding with Unsupervised Learning. Technical Report 2018
Transformer
-Standard model for NLP tasks
-Consists only of attention modules, with no recurrence (no RNN)
-Encoder-decoder architecture
-Requires large-scale datasets and high computational cost
-Pre-training and fine-tuning approaches: BERT & GPT

Method

Method
-ViT-Base: D = 768 (= 16 × 16 × 3)
-ViT-Large: D = 1024
-ViT-Huge: D = 1280
(a minimal configuration sketch follows)
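For reference, here is a small Python sketch of the three configurations; the depth, MLP size, and head counts beyond D are taken from the paper's model-variant table, and the name VIT_VARIANTS is purely illustrative:

# ViT model variants; D ("dim") is the constant hidden size used through every
# Transformer layer. For the Base model, D = 768 = 16 * 16 * 3, i.e. the
# dimensionality of one flattened 16x16 RGB patch.
VIT_VARIANTS = {
    #         layers       hidden D   MLP size      heads
    "Base":  dict(depth=12, dim=768,  mlp_dim=3072, heads=12),
    "Large": dict(depth=24, dim=1024, mlp_dim=4096, heads=16),
    "Huge":  dict(depth=32, dim=1280, mlp_dim=5120, heads=16),
}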

Method

Method

Method

Method
Before going into the formulas, it helps to understand why this step (Layer Normalization) is needed:
1. Training stability: it reduces fluctuations in the gradients, allowing the model to train with a higher learning rate and converge faster.
2. Internal covariate shift: it reduces the shift in the data distribution across layers as the model updates its weights.
3. Placement: using "Pre-norm" (normalization before each sub-layer) stabilizes training for deep Transformer models (see the sketch after this list).
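To make the pre-norm layout concrete, here is a minimal PyTorch sketch of one encoder block (LayerNorm applied before each sub-layer, with residual connections around both). This is an illustrative re-implementation, not the authors' code; the vit-pytorch repository linked on later slides realizes the same structure with its own Attention and FeedForward modules:

import torch
import torch.nn as nn

class PreNormEncoderBlock(nn.Module):
    """One ViT-style encoder block with pre-norm: LayerNorm is applied before
    each sub-layer (attention / MLP), and the sub-layer output is added back
    through a residual connection."""

    def __init__(self, dim=768, heads=12, mlp_dim=3072, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(mlp_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, z):                                      # z: (batch, N+1, dim)
        h = self.norm1(z)                                      # pre-norm before attention
        z = z + self.attn(h, h, h, need_weights=False)[0]      # residual connection
        z = z + self.mlp(self.norm2(z))                        # pre-norm before MLP, residual
        return z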

Method

Method
Second Layer Normalization (Norm)
Re-normalizes the output after Multi-Head Attention:
•Normalizes each vector in the intermediate sequence independently
•Stabilizes the input to the MLP

Method
Image x ∈ R^(H×W×C) → a sequence of flattened 2D patches x_p ∈ R^(N×(P²·C)), where N = HW/P²
Trainable linear projection E ∈ R^((P²·C)×D) maps x_p → x_p E ∈ R^(N×D)
(* because the Transformer uses a constant width, the model dimension D, through all of its layers)
Learnable position embedding E_pos ∈ R^((N+1)×D), to retain positional information
z_0 = [x_class; x_p^1 E; x_p^2 E; … ; x_p^N E] + E_pos
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py#L99-L111
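A minimal PyTorch sketch of this embedding step, assuming a square image whose side is divisible by the patch size; the class name and the unfold-based reshaping are illustrative, while the linked vit-pytorch file performs the same reshaping with einops:

import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into P x P patches, flatten them, project to dimension D,
    prepend a learnable [class] token, and add learnable position embeddings
    E_pos of shape (N+1, D)."""

    def __init__(self, image_size=224, patch_size=16, channels=3, dim=768):
        super().__init__()
        assert image_size % patch_size == 0
        self.num_patches = (image_size // patch_size) ** 2        # N = HW / P^2
        patch_dim = channels * patch_size ** 2                     # P^2 * C
        self.patch_size = patch_size
        self.proj = nn.Linear(patch_dim, dim)                      # trainable projection E
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))      # x_class
        self.pos_embed = nn.Parameter(torch.randn(1, self.num_patches + 1, dim) * 0.02)

    def forward(self, img):                                        # img: (B, C, H, W)
        B, C, H, W = img.shape
        P = self.patch_size
        # (B, C, H, W) -> (B, C, H/P, W/P, P, P): carve out P x P patches
        x = img.unfold(2, P, P).unfold(3, P, P)
        # -> (B, N, P^2 * C): one flattened vector per patch
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, self.num_patches, -1)
        x = self.proj(x)                                           # x_p E: (B, N, D)
        cls = self.cls_token.expand(B, -1, -1)
        z0 = torch.cat([cls, x], dim=1) + self.pos_embed           # z_0
        return z0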

Method

Method
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py

Method
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py

Method
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py
z ∈ R^(N×D) : input sequence
Attention weight A_ij : similarity between q_i and k_j
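As a sketch, single-head self-attention over the input sequence z, following the paper's formulation [q, k, v] = z U_qkv, A = softmax(q kᵀ / √D_h), SA(z) = A v; the head dimension and layer names are illustrative, and the multi-head version simply concatenates several such heads:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head self-attention:
    [q, k, v] = z U_qkv,  A = softmax(q k^T / sqrt(D_h)),  SA(z) = A v."""

    def __init__(self, dim=768, head_dim=64):
        super().__init__()
        self.scale = head_dim ** -0.5
        self.to_qkv = nn.Linear(dim, head_dim * 3, bias=False)           # U_qkv
        self.out = nn.Linear(head_dim, dim)

    def forward(self, z):                                                # z: (B, N, D)
        q, k, v = self.to_qkv(z).chunk(3, dim=-1)                        # each (B, N, D_h)
        A = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)      # A_ij ~ similarity(q_i, k_j)
        return self.out(A @ v)                                           # SA(z) = A v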

Method
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py

Method
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py

Method
Hybrid Architecture
Use the flattened intermediate feature maps of a ResNet as the input sequence, as in DETR (see the sketch below)
Carion et al. End-to-End Object Detection with Transformers. ECCV 2020
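A rough sketch of the hybrid input pipeline, assuming a torchvision ResNet-50 truncated before pooling; which intermediate stage to tap and how to project it are design choices left open here, so treat this as illustrative only:

import torch.nn as nn
from torchvision.models import resnet50

class HybridEmbedding(nn.Module):
    """Hybrid variant (sketch): instead of raw image patches, flatten the
    spatial grid of an intermediate ResNet feature map and project each
    spatial position to the Transformer dimension D."""

    def __init__(self, dim=768):
        super().__init__()
        backbone = resnet50(weights=None)
        # keep everything up to the last convolutional stage (drop avgpool and fc)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])   # -> (B, 2048, H/32, W/32)
        self.proj = nn.Linear(2048, dim)

    def forward(self, img):                        # img: (B, 3, H, W)
        fmap = self.backbone(img)                  # (B, 2048, h, w)
        seq = fmap.flatten(2).transpose(1, 2)      # (B, h*w, 2048): one token per spatial position
        return self.proj(seq)                      # (B, h*w, D)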

Method
Fine-tuning and Higher Resolution
Remove the pre-trained prediction head and attach a zero-initialized D × K feedforward layer (K = the number of downstream classes); see the sketch below
Carion et al. End-to-End Object Detection with Transformers. ECCV 2020
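A small sketch of that head replacement, assuming the model exposes its classification head under a hypothetical attribute name (vit.mlp_head):

import torch.nn as nn

def attach_finetune_head(vit, dim=768, num_classes=10):
    """Replace the pre-trained prediction head with a zero-initialized
    D x K linear layer for a downstream task with K classes.
    Assumes the model stores its head as `vit.mlp_head` (hypothetical name)."""
    head = nn.Linear(dim, num_classes)
    nn.init.zeros_(head.weight)       # zero-initialized weights
    nn.init.zeros_(head.bias)
    vit.mlp_head = head
    return vit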

Experiments
•Datasets
< Pre-training >
-ILSVRC-2012 ImageNet dataset : 1k classes / 1.3M images
-ImageNet-21k : 21k classes / 14M images
-JFT : 18k classes / 303M images
< Downstream (Fine-tuning) >
-ImageNet, ImageNet ReaL, CIFAR-10/100, Oxford-IIIT Pets,
Oxford Flowers-102, VTAB
•Model Variants (e.g., ViT-L/16 = the "Large" variant with a 16 × 16 input patch size)

Experiments
•Training & Fine-tuning
< Pre-training >
-Adam with β1 = 0.9, β2 = 0.999
-Batch size 4,096
-Weight decay 0.1 (high weight decay is useful for transfer models)
-Linear learning rate warmup and decay (see the sketch below)
< Fine-tuning >
-SGD with momentum, batch size 512
•Metrics
-Few-shot (for fast on-the-fly evaluation)
-Fine-tuning accuracy
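A sketch of the pre-training optimizer setup implied by these hyperparameters (Adam with β1 = 0.9, β2 = 0.999, weight decay 0.1, linear warmup followed by linear decay); the base learning rate and step counts below are placeholders, not the paper's exact schedule:

import torch

def make_pretraining_optimizer(model, base_lr=1e-3, warmup_steps=10_000, total_steps=100_000):
    """Adam with the slide's betas and weight decay, plus a linear
    warmup / linear decay learning-rate schedule."""
    opt = torch.optim.Adam(model.parameters(), lr=base_lr,
                           betas=(0.9, 0.999), weight_decay=0.1)

    def lr_lambda(step):
        if step < warmup_steps:                      # linear warmup from 0 to base_lr
            return step / max(1, warmup_steps)
        # linear decay from base_lr to 0 over the remaining steps
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched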

Experiments
•Comparison to State of the Art
*BiT-L : Big Transfer, which performs supervised transfer learning with large ResNets
*Noisy Student : a large EfficientNet trained using semi-supervised learning
Kolesnikov et al. Big Transfer (BiT): General Visual Representation Learning. ECCV 2020
Xie et al. Self-Training with Noisy Student Improves ImageNet Classification. CVPR 2020

Experiments
•Comparison to State of the Art

Experiments
•Pre-training Data Requirements
(figure: results as the pre-training dataset grows larger)

Experiments
•Scaling Study

Experiments
•Inspecting Vision Transformer
-the components of the learned embedding filters resemble plausible basis functions for a low-dimensional representation of the fine structure within each patch
-attention distance is analogous to receptive field size in CNNs

Conclusion
•Application of Transformers to Image Recognition
-no image-specific inductive biases in the architecture
-interpret an image as a sequence of patches and process it with a standard Transformer encoder
-this simple, yet scalable, strategy works
-matches or exceeds the S.O.T.A while being relatively cheap to pre-train
•Many Challenges Remain
-other computer vision tasks, such as detection and segmentation
-further scaling of ViT

Q&A
•ViT for Segmentation
•Fine-tuning on Grayscale Datasets

Thank you