GENERATIVE ARTIFICIAL INTELLIGENCE
NAME: SHUBHAM SINGH
U.ROLL: 21EUCEC056
C.ROLL: 21/529
BRANCH: ELECTRONICS AND COMMUNICATION ENGINEERING
SEMESTER: VIth
OVERVIEW
1. INTRODUCTION
2. BASIC CONCEPT
3. GENERATIVE AI MODEL CATEGORIES
4. NEURAL NETWORK ARCHITECTURES FOR GENERATIVE AI
INTRODUCTION
Generative AI refers to a category of artificial intelligence techniques and models that are designed to generate new content, such as images, text, audio, or even video, that is similar to the data they were trained on.
These models learn the underlying patterns and structures of the data they are exposed
to and then use that understanding to create new examples that resemble the original
data.
GENERATIVE AI AND LARGE LANGUAGE MODELS
The rapid pace of AI development and the public release of tools such as ChatGPT, GitHub Copilot, and DALL·E have attracted widespread attention and optimism.
These technologies are all examples of “generative AI,” a class of
machine learning technologies that can generate new content—such
as text, images, music, or video—by analyzing patterns in existing
data.
TEXT-TO-IMAGE MODELS
1. DALL·E 2:
DALL·E 2, created by OpenAI, is able to generate original and realistic images and art from a prompt consisting of a text description. It is possible to use the OpenAI API to get access to this model (a minimal API sketch follows below).
It uses the CLIP neural network. CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs.
For example: an image generated from the prompt "A shiba inu wearing a beret and black turtleneck".
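As a minimal sketch of what such an API call might look like (assuming the openai Python SDK and a valid API key; the model name and parameters are illustrative):

```python
# Minimal sketch: generating an image from a text prompt via the OpenAI API.
# Assumes the `openai` Python SDK (v1-style client) is installed and the
# OPENAI_API_KEY environment variable is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-2",                     # text-to-image model
    prompt="A shiba inu wearing a beret and black turtleneck",
    n=1,                                  # number of images to generate
    size="1024x1024",
)

# The API returns a URL (or base64 data) for each generated image.
print(response.data[0].url)
```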
2. IMAGEN:
Imagen is a text-to-image diffusion model built on large transformer language models.
The model was created by Google, and the API can be found on their web page.
The main finding with this model is that large language models, pre-trained on text-only corpora, are very effective at encoding text for image synthesis.
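As an illustration of that idea, a frozen text-only language model can supply the embeddings that condition an image generator (a sketch using a T5 encoder from Hugging Face Transformers; the checkpoint name is just an example, and the downstream diffusion model is omitted):

```python
# Sketch: encoding a prompt with a frozen, text-only language model, as Imagen
# does with T5. The resulting embeddings would condition a diffusion image
# model (not shown). The checkpoint name is illustrative.
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")
text_encoder = T5EncoderModel.from_pretrained("t5-small")
text_encoder.eval()                       # frozen: used only for encoding

prompt = "A photo of a corgi riding a skateboard"
tokens = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    text_embeddings = text_encoder(**tokens).last_hidden_state

# One embedding per token; a diffusion U-Net would cross-attend to these.
print(text_embeddings.shape)              # (1, seq_len, hidden_dim)
```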
3. MUSE:
Muse is a text-to-image transformer model that achieves state-of-the-art image generation while being more efficient than diffusion or autoregressive models.
It is trained on a masked modelling task in discrete token space (a toy sketch of this follows below).
It is more efficient because it uses discrete tokens and requires fewer sampling iterations.
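A toy sketch of masked modelling over discrete image tokens (sizes are illustrative, and the image tokenizer that would map real images to token ids is assumed rather than shown):

```python
# Toy sketch of masked token modelling in discrete token space: randomly mask
# image-token ids and train a transformer to predict the masked entries.
# Vocabulary and shape values are illustrative; the image tokenizer is assumed.
import torch
import torch.nn as nn

vocab_size, seq_len, d_model = 1024, 256, 512
mask_id = vocab_size                                 # extra id reserved for [MASK]

embed = nn.Embedding(vocab_size + 1, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=4,
)
to_logits = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (2, seq_len))  # stand-in image tokens
mask = torch.rand(2, seq_len) < 0.5                  # mask roughly half the tokens
inputs = tokens.masked_fill(mask, mask_id)

logits = to_logits(encoder(embed(inputs)))
# Loss only on masked positions: predict the original ids from the visible context.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```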
TEXT-TO-3D MODELS
In some industries, like gaming, it is necessary to generate 3D content.
DreamFusion:
DreamFusion is a text-to-3D model developed by Google Research that uses a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
DreamFusion replaces previous CLIP-based techniques with a loss derived from distillation of a 2D diffusion model (a simplified sketch of this idea follows below).
Sampling in parameter space is much harder than in pixel space, as we want to create 3D models that look like good images when rendered from random angles. To solve this issue, the model uses a differentiable generator.
For example: an image created by DreamFusion from one particular angle, along with the variations that can be generated from additional text prompts.
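A highly simplified sketch of that distillation idea: render an image from 3D parameters with a differentiable generator, noise it, and nudge the parameters so a frozen 2D diffusion prior finds the rendering plausible. The "renderer" and "noise predictor" below are toy stand-ins, not DreamFusion's actual NeRF or Imagen components:

```python
# Toy sketch of score-distillation-style optimization. The renderer and the
# diffusion noise predictor are tiny stand-ins, meant only to show how
# gradients flow from a frozen 2D diffusion prior back into 3D parameters.
import torch
import torch.nn as nn

def render(params, angle):
    # Stand-in "differentiable generator": maps 3D parameters plus a camera
    # angle to an image tensor. A real system would render a NeRF here.
    return torch.tanh(params + angle)

noise_predictor = nn.Conv2d(3, 3, 3, padding=1)       # stand-in frozen diffusion model
for p in noise_predictor.parameters():
    p.requires_grad_(False)

params = torch.randn(1, 3, 64, 64, requires_grad=True)  # the 3D "scene" parameters
optimizer = torch.optim.Adam([params], lr=1e-2)

for step in range(100):
    angle = torch.rand(1) * 6.28                      # random camera angle
    image = render(params, angle)                     # differentiable rendering
    noise = torch.randn_like(image)
    noisy_image = image + noise                       # simplified forward noising
    predicted_noise = noise_predictor(noisy_image)

    # Distillation-style update: push params so the frozen model's denoising
    # direction agrees with the noise that was actually added.
    grad = (predicted_noise - noise).detach()
    loss = (grad * image).sum()                       # gradient w.r.t. params is grad * dI/dparams
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```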
Magic3D:
This is a text-to-3D model made by NVIDIA Corporation. While the DreamFusion model achieves remarkable results, the method has two main problems: the long processing time and the low quality of the generated images.
These problems are addressed by Magic3D using a two-stage optimization framework.
First, Magic3D builds a low-resolution diffusion prior and then accelerates it with a sparse 3D hash grid structure.
A textured 3D mesh model is further optimized with an efficient differentiable renderer.
In human evaluation, the model achieves better results, as 61.7% of raters prefer this model to DreamFusion.
IMAGE-TO-TEXT MODELS
Flamingo:
A visual language model created by DeepMind that performs few-shot learning on a wide range of open-ended vision and language tasks, simply by being prompted with a few input/output examples.
Flamingo is a visually conditioned autoregressive text generation model able to ingest a sequence of text tokens interleaved with images and/or videos and produce text as output.
A query is made to the model along with a photo or a video, and the model responds with a text answer.
VisualGPT:
VisualGPT is an image captioning model that leverages knowledge from OpenAI's pretrained language model GPT-2.
The biggest advantage of this model is that it does not need as much data as other image-to-text models.
TEXT-TO-VIDEO MODELS
Phenaki:
This model was made by Google Research, and it is capable of performing realistic video synthesis given a sequence of textual prompts.
Phenaki is the first model that can generate videos from open-domain, time-variable prompts.
To address data issues, it performs joint training on a large dataset of image-text pairs as well as a smaller number of video-text examples, which results in generalization beyond what is available in the video datasets.
Its limitations come from the computational cost of handling videos of variable length. The model has three parts: the C-ViViT encoder, the training transformer, and the video generator.
Soundify:
Soundify is a system developed by Runway that matches sound effects to video. This system uses high-quality sound effects libraries and CLIP (the neural network with zero-shot image classification capabilities cited before).
The system has three parts: classification, synchronization, and mixing. The classification step matches effects to a video by classifying the sound emitters within it.
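As a minimal sketch of that zero-shot classification step (using the openly released CLIP checkpoint via Hugging Face Transformers; the frame, label set, and checkpoint are illustrative, not Soundify's actual pipeline):

```python
# Sketch: CLIP zero-shot classification of a video frame against candidate
# sound-emitter labels. Checkpoint, labels, and frame source are illustrative.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("frame_0001.png")                 # one extracted video frame
labels = ["rain", "car engine", "crowd chatter", "ocean waves"]

inputs = processor(text=labels, images=frame, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability = better image/text match; the top label picks the sound effect.
probs = outputs.logits_per_image.softmax(dim=-1)
print(labels[probs.argmax().item()], probs.tolist())
```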
TEXT-TO-CODE MODELS
Codex:
An AI system created by OpenAI which translates text to code. It is a general-purpose programming model, as it can be applied to basically any programming task.
Programming can be broken down into two parts: breaking a problem down into simpler problems, and mapping those sub-problems onto existing code (libraries, APIs, or functions).
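As an illustration of the text-to-code idea, the prompt (here, a docstring) and the completion below show the kind of mapping such a model performs; this is a hand-written example, not actual Codex output:

```python
# Illustrative text-to-code example: a natural-language prompt (the docstring)
# and the kind of completion a Codex-style model might produce.
from collections import Counter

def top_k_words(text: str, k: int = 3) -> list[tuple[str, int]]:
    """Return the k most frequent words in `text`, with their counts."""
    words = text.lower().split()
    return Counter(words).most_common(k)

print(top_k_words("the cat sat on the mat the end", k=2))  # [('the', 3), ('cat', 1)]
```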
NEURAL NETWORK ARCHITECTURES FOR GENERATIVE AI
Autoencoder: Imagine you have a magic trick where you give someone a picture, they
squish it down into a small piece of paper, and then someone else can stretch that paper
back into the original picture. That's kind of how autoencoders work. They compress
data into a smaller representation (encoding) and then try to reconstruct the original
data from that compressed form. Autoencoders are used for tasks like image denoising
or dimensionality reduction.
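A minimal sketch of an autoencoder in PyTorch (layer widths and the flattened 28x28 input size are illustrative):

```python
# Toy autoencoder: compress a flattened 28x28 image to a small code, then
# reconstruct it. Layer sizes are illustrative.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(          # "squish" the picture down
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        self.decoder = nn.Sequential(          # "stretch" it back out
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, 784)                        # stand-in batch of flattened images
loss = nn.functional.mse_loss(model(x), x)     # reconstruction error
loss.backward()
optimizer.step()
```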
Generative Adversarial Network (GAN): Picture two artists, one trying to forge paintings
and the other trying to spot the fakes. The forger keeps getting better until the spotter
can't tell the difference between the real and fake paintings. That's the idea behind
GANs. They consist of two neural networks: a generator that creates new data samples,
like images, and a discriminator that tries to differentiate between real and fake
samples. Through this back-and-forth process, both networks get better at their
respective tasks, ultimately resulting in the generator creating very realistic-looking
outputs.
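A minimal sketch of one GAN training step in PyTorch (tiny fully connected networks and a random "real" batch stand in for actual images; sizes are illustrative):

```python
# Toy GAN step: a generator ("forger") makes samples from noise, a
# discriminator ("spotter") scores real vs. fake, and each improves in turn.
# Network sizes and the random "real" batch are illustrative stand-ins.
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 784
generator = nn.Sequential(
    nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, data_dim)                   # stand-in for a batch of real images
fake = generator(torch.randn(32, noise_dim))      # generated ("forged") samples

# 1) Discriminator step: label real samples 1 and generated samples 0.
d_loss = bce(discriminator(real), torch.ones(32, 1)) + \
         bce(discriminator(fake.detach()), torch.zeros(32, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# 2) Generator step: try to make the discriminator label fakes as real (1).
g_loss = bce(discriminator(fake), torch.ones(32, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```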