An Image is Worth 16x16 Words:
Transformers for Image Recognition at Scale
Anonymous (ICLR 2021 under review)
Choi Dongmin
Yonsei University Severance Hospital CCIDS
Abstract
• Transformer
- the standard architecture for NLP
• Convolutional Networks
- attention is applied while keeping their overall structure
• Transformer in Computer Vision
- a pure Transformer can perform very well on image classification tasks when applied directly to sequences of image patches
- achieves S.O.T.A. results with comparatively small computational cost when pre-trained on large datasets
Introduction
Self-attention based architecture: the Transformer (e.g. BERT)
The dominant approach: pre-training on a large text corpus and then fine-tuning on a smaller task-specific dataset
Vaswani et al. Attention Is All You Need. NIPS 2017
Introduction
Self-attention in computer vision, inspired by NLP: e.g. DETR and Axial-DeepLab
However, classic ResNet-like architectures are still S.O.T.A.
Carion et al. End-to-End Object Detection with Transformers. ECCV 2020
Wang et al. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. ECCV 2020
• Applying a Transformer Directly to Images
- with the fewest possible modifications
- provide the sequence of linear embeddings of the patches as input
- image patches = tokens (words) in NLP
• Small-Scale Training
- achieves accuracies below ResNets of comparable size
- Transformers lack some inductive biases inherent to CNNs (such as translation equivariance and locality)
• Large-Scale Training
- trumps (surpasses) inductive bias
- excellent results when pre-trained at sufficient scale and transferred
Introduction
Related Works
Vaswani et al. Attention Is All You Need. NIPS 2017
Devlin et al. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL 2019
Radford et al. Improving language under- standing with unsupervised learning. Technical Report 2018
Transformer
- Standard model for NLP tasks
- Consists only of attention modules, without RNNs
- Encoder-decoder structure
- Requires large-scale datasets and high computational cost
- Pre-training and fine-tuning approaches: BERT & GPT
Method
ViT-Base: D = 768 (= 16×16×3, the flattened patch dimension)
ViT-Large: D = 1024
ViT-Huge: D = 1280
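As a quick worked check on these widths (assuming 224×224 RGB inputs with 16×16 patches, which this slide does not state explicitly):

```latex
% Flattened patch dimension (coincides with D only for ViT-Base)
P^2 \cdot C = 16 \cdot 16 \cdot 3 = 768
% Number of patch tokens per image
N = \frac{HW}{P^2} = \frac{224 \cdot 224}{16^2} = 196
% Hidden sizes of the three variants
D_{\text{Base}} = 768, \quad D_{\text{Large}} = 1024, \quad D_{\text{Huge}} = 1280
```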
Method
Before going into the formula, it helps to understand why this step is needed:
1. Training stability: reduces fluctuations in the gradients, allowing the model to use a higher learning rate and converge faster.
2. Internal covariate shift: reduces how much the distribution of activations changes across layers as the model updates its weights.
3. Placement: using "Pre-norm" (normalization before each sub-layer) helps stabilize training for deep Transformer models.
Method
Second Layer Normalization (Norm)
Re-normalizes the output after Multi-Head Attention:
• normalizes each vector in the intermediate sequence independently
• helps stabilize the input to the MLP
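A minimal PyTorch sketch of the pre-norm encoder block described above (an illustration, not the authors' code; the class name and defaults are made up):

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-norm Transformer encoder block: LN -> MSA -> residual, then LN -> MLP -> residual."""
    def __init__(self, dim=768, heads=12, mlp_dim=3072, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)   # first Norm, applied before multi-head attention
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)   # second Norm, applied before the MLP
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(mlp_dim, dim), nn.Dropout(dropout),
        )

    def forward(self, z):                # z: (batch, sequence, dim)
        h = self.norm1(z)
        z = z + self.attn(h, h, h, need_weights=False)[0]   # z' = MSA(LN(z)) + z
        z = z + self.mlp(self.norm2(z))                      # z  = MLP(LN(z')) + z'
        return z
```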
Method
Image $x \in \mathbb{R}^{H \times W \times C}$ → a sequence of flattened 2D patches $x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $N = HW/P^2$
A trainable linear projection $E \in \mathbb{R}^{(P^2 \cdot C) \times D}$ maps the patches to $x_p E \in \mathbb{R}^{N \times D}$
* because the Transformer uses a constant width (the model dimension $D$) through all of its layers
Learnable position embedding $E_{pos} \in \mathbb{R}^{(N+1) \times D}$
* to retain positional information
$z_0 = [x_{\text{class}};\ x_p^1 E;\ x_p^2 E;\ \dots;\ x_p^N E] + E_{pos}$, and $z_L$ denotes the output of the $L$-layer encoder
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py#L99-L111
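A minimal PyTorch sketch of the embedding step above, assuming square inputs and non-overlapping patches; the linked lucidrains implementation differs in detail (it uses einops), and the class name here is illustrative:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split the image into P x P patches, project to width D, prepend [class] token, add E_pos."""
    def __init__(self, image_size=224, patch_size=16, channels=3, dim=768):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (image_size // patch_size) ** 2              # N = HW / P^2
        patch_dim = channels * patch_size ** 2                     # P^2 * C
        self.proj = nn.Linear(patch_dim, dim)                      # trainable linear projection E
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))      # learnable [class] embedding
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches + 1, dim) * 0.02)  # E_pos

    def forward(self, img):                                        # img: (B, C, H, W)
        B, C, H, W = img.shape
        P = self.patch_size
        x = img.unfold(2, P, P).unfold(3, P, P)                    # (B, C, H/P, W/P, P, P)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * P * P)  # (B, N, P^2 * C)
        x = self.proj(x)                                           # (B, N, D)
        cls = self.cls_token.expand(B, -1, -1)
        return torch.cat([cls, x], dim=1) + self.pos_embed         # z_0 = [x_class; x_p E] + E_pos
```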
Method
https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py
$z \in \mathbb{R}^{N \times D}$ : input sequence
Attention weights $A_{ij}$ : similarity between $q_i$ and $k_j$
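A minimal single-head sketch of this attention computation, following the standard scaled dot-product formulation (the class name and head size are illustrative):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention: A_ij = softmax(q_i · k_j / sqrt(D_h)), output = A v."""
    def __init__(self, dim=768, head_dim=64):
        super().__init__()
        self.scale = head_dim ** -0.5
        self.to_qkv = nn.Linear(dim, head_dim * 3, bias=False)   # maps z to [q, k, v]
        self.proj = nn.Linear(head_dim, dim)

    def forward(self, z):                                        # z: (B, N, D) input sequence
        q, k, v = self.to_qkv(z).chunk(3, dim=-1)                # each (B, N, D_h)
        A = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # (B, N, N) attention weights
        return self.proj(A @ v)                                  # weighted sum of values, back to D
```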
Method
Hybrid Architecture
Flattened intermediate feature maps of a ResNet are used as the input sequence, as in DETR
Carion et al. End-to-End Object Detection with Transformers. ECCV 2020
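A minimal sketch of the hybrid input path, assuming a torchvision ResNet-50 truncated after an intermediate stage; the stage choice and the linear projection are assumptions for illustration, not the paper's exact configuration:

```python
import torch.nn as nn
from torchvision.models import resnet50

class HybridEmbedding(nn.Module):
    """Flatten intermediate ResNet feature maps into a token sequence for the Transformer."""
    def __init__(self, dim=768):
        super().__init__()
        backbone = resnet50(weights=None)
        # keep the stem through layer3 -> feature map of shape (B, 1024, H/16, W/16)
        self.stem = nn.Sequential(*list(backbone.children())[:-3])
        self.proj = nn.Linear(1024, dim)            # project channels to the Transformer width D

    def forward(self, img):                         # img: (B, 3, H, W)
        feat = self.stem(img)                       # (B, 1024, h, w)
        tokens = feat.flatten(2).transpose(1, 2)    # (B, h*w, 1024): each spatial position becomes a token
        return self.proj(tokens)                    # (B, h*w, D)
```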
Method
Fine-tuning and Higher Resolution
Remove the pre-trained prediction head and attach a zero-initialized D × K feedforward layer (K = the number of downstream classes)
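A minimal sketch of this head swap, assuming the model exposes its classifier as an `nn.Linear` attribute named `head` (the attribute name varies between implementations):

```python
import torch.nn as nn

def replace_head_for_finetuning(model, num_classes):
    """Replace the pre-trained prediction head with a zero-initialized D x K feedforward layer."""
    d = model.head.in_features              # D: Transformer width (assumes model.head is nn.Linear)
    new_head = nn.Linear(d, num_classes)    # K = number of downstream classes
    nn.init.zeros_(new_head.weight)         # zero-initialized, as described above
    nn.init.zeros_(new_head.bias)
    model.head = new_head
    return model
```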
Experiments
• Training & Fine-tuning
< Pre-training >
- Adam with β1 = 0.9, β2 = 0.999
- Batch size 4,096
- Weight decay 0.1 (a high weight decay is useful for transferring the resulting models)
- Linear learning rate warmup and decay
< Fine-tuning >
- SGD with momentum, batch size 512
• Metrics
- Few-shot accuracy (for fast on-the-fly evaluation)
- Fine-tuning accuracy
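A minimal sketch of these optimizer settings in plain PyTorch; the learning rates, step counts, and whether the 0.1 weight decay is an L2 penalty or decoupled (AdamW-style) are assumptions, since the slide does not specify them:

```python
import torch

def make_pretraining_optimizer(model, lr=1e-3, total_steps=100_000, warmup_steps=10_000):
    """Adam with beta1=0.9, beta2=0.999, weight decay 0.1, linear warmup then linear decay."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999), weight_decay=0.1)

    def lr_lambda(step):
        if step < warmup_steps:                      # linear warmup from 0 to the base rate
            return step / max(1, warmup_steps)
        # linear decay back to 0 over the remaining steps
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched

def make_finetuning_optimizer(model, lr=3e-3, momentum=0.9):
    """SGD with momentum for fine-tuning (batch size 512 is set in the data loader, not here)."""
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
```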
Experiments
•Comparison to State of the Art
*BiT-L : Big Transfer, which performs supervised transfer learning with large ResNets
*Noisy Student : a large EfficientNet trained using semi-supervised learning
Kolesnikov et al. Big Transfer (BiT): General Visual Representation Learning. ECCV 2020
Xie et al. Self-training with Noisy Student improves ImageNet classification. CVPR 2020
Experiments
•Pre-training Data Requirements
(figure: performance improves as the pre-training dataset grows)
Experiments
• Scaling Study
Experiments
•Inspecting Vision Transformer
- The components (principal components of the learned embedding filters) resemble plausible basis functions for a low-dimensional representation of the fine structure within each patch
- Attention distance is analogous to receptive field size in CNNs
Conclusion
•Application of Transformers to Image Recognition
- no image-specific inductive biases in the architecture
- interpret an image as a sequence of patches and process it with a standard Transformer encoder
- this simple yet scalable strategy works
- matches or exceeds the S.O.T.A. while being relatively cheap to pre-train
•Many Challenges Remain
-other computer vision tasks, such as detection and segmentation
-further scaling ViT
Q&A
• ViT for Segmentation
• Fine-tuning on a Grayscale Dataset