Perceiver Presentation Template for Notes

adalgos, Mar 01, 2025

About This Presentation

Perceiver


Slide Content

PERCEIVER IO
* Transformer-based neural network architecture.
* Generates meaningful outputs.
* Handles high-dimensional, multimodal input data.
* Built on cross-attention mechanisms.

#1. Input Encoding -> Encoder: High-dimensional sensory inputs are encoded into a lower-dimensional latent space using linear transformations and self-attention layers.

#2. Latent Array Formation: Encoded inputs -> Latent array. Each element of the array -> a learned feature of the encoded inputs.

#3. Cross-Attention: Latent array -> Cross-attention operation. Each element of the latent array attends to the encoded inputs. Latent array -> encoded inputs -> Learned attention weights.

#4. Latent Processing: Latent array -> Self-attention layers and feed-forward neural networks. Array elements communicate with each other and refine their representations -> input data context.

#5. Iterative Processing: Steps 3 and 4 are repeated multiple times. Each iteration -> Latent array captures more relationships and abstractions from the input data. Number of iterations -> hyperparameter tuned to the task. (Shridhar, Manuelli and Fox, 2023; Jaegle et al., 2022)

#6. Output Query Generation: Output queries -> Learned vectors, one per desired output (motor commands, predicted sensory data, or classification labels).

#7. Cross-Attention to Latent Array: Output queries -> Latent array -> Cross-attention operation.

#8. Output Decoding -> Decoder: Attended latent representation for each output query -> Decoded output, using linear transformations and feed-forward neural networks.

#9. Post-processing: Decoded outputs -> Post-processed outputs. E.g. the decoded motor commands are filtered or smoothed to ensure smooth and stable robot motions.

#10. Output Execution: Post-processed outputs -> Actuators. Sensory data is then fed back into the Perceiver, and the process repeats from step 1.

Summary:
Raw data input -> Encoded input -> Latent array -> Cross-attention -> Self-attention & feed-forward
Output queries -> Latent array -> Decoded outputs -> Post-processed outputs -> Actuators
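The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation from the cited papers: the sizes (M, C, N, D, O, E), the helper names (attention, feed_forward, perceiver_forward, smooth), the random untrained weights, the single attention head, the absence of layer normalisation, and the sharing of weights across iterations are all hypothetical choices made to keep the example short. The latent array is modelled here as a learned array that reads from the encoded inputs through cross-attention, and the input encoder is reduced to a single linear projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys_values, w_q, w_k, w_v):
    # Single-head scaled dot-product attention (no masking, no layer norm).
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

def feed_forward(x, w1, w2):
    # Two-layer MLP with ReLU.
    return np.maximum(x @ w1, 0.0) @ w2

def init(shape):
    # Random stand-in for trained parameters.
    return rng.normal(scale=0.1, size=shape)

# Hypothetical sizes, chosen only for illustration.
M, C = 512, 32   # input elements, input channels
N, D = 16, 64    # latent elements, latent width (N << M)
O, E = 4, 8      # output queries, output channels

params = {
    "encode": init((C, D)),                           # step 1
    "latent": init((N, D)),                           # step 2
    "cross": [init((D, D)) for _ in range(3)],        # step 3
    "self": [init((D, D)) for _ in range(3)],         # step 4
    "ff": (init((D, 2 * D)), init((2 * D, D))),       # step 4
    "queries": init((O, D)),                          # step 6
    "decode_attn": [init((D, D)) for _ in range(3)],  # step 7
    "decode": init((D, E)),                           # step 8
}

def perceiver_forward(inputs, p, num_iterations=3):
    encoded = inputs @ p["encode"]       # step 1: linear input encoding
    latent = p["latent"]                 # step 2: latent array
    for _ in range(num_iterations):      # step 5: repeat steps 3 and 4
        update, _ = attention(latent, encoded, *p["cross"])  # step 3
        latent = latent + update
        update, _ = attention(latent, latent, *p["self"])    # step 4
        latent = latent + update
        latent = latent + feed_forward(latent, *p["ff"])
    # Steps 6-7: each output query attends to the latent array.
    attended, weights = attention(p["queries"], latent, *p["decode_attn"])
    return attended @ p["decode"], weights  # step 8: linear decoding

def smooth(previous, current, alpha=0.5):
    # Step 9: exponential smoothing as one possible post-processing filter.
    return alpha * current + (1.0 - alpha) * previous

inputs = rng.normal(size=(M, C))         # stand-in for raw sensory data
outputs, decode_weights = perceiver_forward(inputs, params)
commands = smooth(np.zeros_like(outputs), outputs)
print(outputs.shape)  # (4, 8)
```

The point of the design is visible in the shapes: cross-attention costs scale with M x N rather than M x M, because only the N latent elements attend to the M inputs, and the number of outputs O is set by the number of output queries rather than by the input or latent size.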