Generative Artificial Intelligence and Large Language Model


About This Presentation

Natural Language Processing (NLP) is a discipline dedicated to enabling computers to comprehend and generate human language.
Word embedding is a technique in NLP that converts words into dense numerical vectors, capturing their semantic meanings and contextual relationships. Analyzing sequential data often requires techniques such as time series analysis and sequence modeling, using machine learning models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs).


Slide Content

Generative AI and LLM
Dr. Shiwani Gupta
Associate Professor, HoD AI&ML
TCET, Mumbai

NLP
Natural Language Processing (NLP) is a discipline dedicated to enabling computers to comprehend and generate human language.
It encompasses tasks such as language translation, sentiment analysis, and text summarization.
By employing algorithms and models to process and analyze text, NLP allows computers to derive meaning and execute language-based functions.
This technology has a wide range of applications across different industries, significantly enhancing communication and information retrieval.
Unstructured (text) data comes in the form of email, blogs, news, and more.
Social media platforms: Twitter, Facebook, Quora
Sentiments (product, app, movie, service)
Social media platforms and chatbot applications are used to reach out to customers.

Difficult to learn: informal spellings such as "right" written as "ryt" or "how are you" as "hru" make NLP a genuinely hard problem to solve.

NLP Applications
• From customer service chatbots to language translation apps
• Healthcare, finance, and education
• By allowing machines to extract meaning, analyze sentiments, and summarize text, NLP has revolutionized communication, making it an essential technology in our increasingly interconnected world.
Sentiment analysis: opinions, feelings, and emotions expressed in feedback, comments, ratings, and likes on products/services (e.g., Amazon, Flipkart)
Intent analysis: classifying a complaint, opinion, comment, statement, feedback, query, or suggestion received over a digital medium, IVR, or customer call center, feeding an automated ticketing system
Information extraction: pulling details from resumes, financial attributes, or events from news for trading

NLP Applications
Automated text generation
Q&A systems
Text to speech and vice versa
Topic modeling
Word-to-word to sentence generation; employee engagement
Obtaining transactional information: from bounded questions with fixed responses to free text in multiple languages
Auto response: it is difficult to retain previous context, and an incorrect sentence can lead to negative publicity or legal complications
News: extract the important parts, or rewrite a whole article while capturing its context

Tokenization
•Tokenization in NLP involves breaking text into smaller units, such as words or characters, for analysis.
•It serves as the foundation for tasks like part-of-speech tagging and sentiment analysis.
•This process entails removing punctuation, splitting words, and addressing special cases to create tokens.
•Preprocessing: stop word removal (prepositions, conjunctions, and other joining words), stemming, lemmatization (reducing inflectional forms to a base form)
•NLTK and spaCy packages
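As a hedged illustration of the pipeline above (tokenization, stop-word removal, stemming, lemmatization), here is a minimal sketch using NLTK; the example sentence and the specific resource downloads are assumptions, not part of the slides.

```python
# Minimal preprocessing sketch with NLTK.
# Note: resource names (punkt, stopwords, wordnet) may vary slightly across NLTK versions.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

text = "The movies were surprisingly engaging, and the actors performed well."

# 1. Tokenization: split the sentence into word tokens.
tokens = word_tokenize(text.lower())

# 2. Stop-word removal: drop prepositions, conjunctions, and other joining words.
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]

# 3. Stemming: crude suffix stripping (e.g. "movies" -> "movi").
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in filtered]

# 4. Lemmatization: reduce inflectional forms to a dictionary base form.
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in filtered]

print(tokens, filtered, stems, lemmas, sep="\n")
```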

Numericalization
•Numericalization in NLP involves transforming text data into numerical formats that machine learning algorithms can interpret and process. This conversion enables NLP models to handle and analyze text through mathematical operations.
Bag-of-words model / one-hot encoding: the weight is 1 irrespective of the frequency of the word.
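A minimal sketch of the bag-of-words / one-hot idea above, where each word gets weight 1 regardless of its frequency; it uses scikit-learn's CountVectorizer with binary=True, which is an implementation choice, not something prescribed by the slides.

```python
# Binary bag-of-words: weight is 1 if the word occurs, irrespective of frequency.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the service was good",
    "the service was bad bad bad",   # repeated word still maps to 1
]

vectorizer = CountVectorizer(binary=True)   # one-hot style weights
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray())   # each row is a document, each column a vocabulary word
```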

Word Embedding
Word embedding is a technique in NLP that converts words into dense numerical vectors, capturing their semantic meanings and contextual relationships. Unlike traditional methods that use sparse representations, word embeddings provide a more compact and informative representation of words. This approach enables NLP models to understand and interpret language more effectively, as it incorporates nuances of word meanings and their usage in different contexts. By leveraging word embeddings, models can perform complex tasks such as measuring word similarity, identifying relationships between words, and enhancing context-aware operations, leading to improved language understanding and application.

Learning word embeddings involves using algorithms like Word2Vec and GloVe to train models that generate dense vector representations of words, capturing their semantic and contextual relationships. These embeddings are created by analyzing large corpora of text data, which allows the model to understand word meanings and their usage in various contexts. The resulting embeddings offer a rich, nuanced representation of words, significantly improving performance on diverse NLP tasks such as word similarity, context understanding, and language generation. By leveraging these embeddings, NLP models can achieve more accurate and meaningful interpretations of language.

Word2Vec and Negative Sampling
Word2Vec is an NLP algorithm that learns word embeddings by training a neural network on extensive text datasets. It employs either the skip-gram or Continuous Bag of Words (CBOW) method to predict words based on their surrounding context. Through this process, Word2Vec generates dense vector representations of words that encapsulate their semantic relationships and contextual meanings. These embeddings are typically represented in a few dozen to a few hundred dimensions, enabling efficient and effective handling of language tasks such as measuring word similarity and performing various language processing applications. This compact representation facilitates improved language understanding and application.

•In the figure, the word embeddings are represented by the weights connecting the hidden and output layers.
•If we have 500 neurons in the hidden layer and 1000 neurons in the output layer (i.e., a vocabulary of 1000), we have to learn around 0.5 million weights, which might not be too huge, but in practice we deal with much bigger vocabularies.
•If we consider even 10,000 words in our vocabulary, we have to learn a whopping 5 million weights.
•Besides that, for the embeddings to capture several contexts we would need a very large corpus.
•So training this many weights on a huge corpus and applying a softmax over 10,000 outputs is computationally very expensive and sometimes infeasible.
•This issue can be addressed using the negative sampling technique.
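The slide notes that training the full softmax over a large vocabulary is expensive and that negative sampling addresses this. As a hedged sketch, gensim's Word2Vec exposes this directly; the toy corpus, vector_size, and negative counts below are illustrative assumptions, and the API shown is gensim 4.x.

```python
# Skip-gram Word2Vec trained with negative sampling (gensim 4.x API).
from gensim.models import Word2Vec

# Toy corpus: in practice embeddings need a much larger corpus to be useful.
sentences = [
    ["nlp", "enables", "computers", "to", "understand", "language"],
    ["word", "embeddings", "capture", "semantic", "relationships"],
    ["word2vec", "learns", "embeddings", "from", "large", "text", "corpora"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of each word vector
    window=2,         # context window around the target word
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # number of negative samples per positive pair
    min_count=1,
    epochs=50,
)

print(model.wv["embeddings"].shape)          # (50,)
print(model.wv.most_similar("embeddings"))   # nearest words by cosine similarity
```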

Properties and Visualization of Word Embeddings
Word embeddings in NLP exhibit several important properties: capturing semantic relationships, enabling compositionality, managing subword information, maintaining compactness, and adapting to context.
These properties allow embeddings to effectively represent word meanings, construct phrase and sentence representations, handle out-of-vocabulary words, and reduce dimensionality.
By incorporating these aspects, word embeddings enhance various language processing tasks, leading to improved understanding and performance in NLP applications.
Word embeddings can be visualized in a reduced-dimensional space to provide insights into word relationships.
This technique enables the observation of clusters of semantically similar words and the exploration of their connections in a visually interpretable format.
By projecting high-dimensional embeddings into a lower-dimensional space, patterns and relationships between words become more apparent, facilitating a clearer understanding of their semantic similarities and differences.
Such visualizations help in analyzing and interpreting complex word associations and the overall structure of the word embeddings.

Word Embedding
GloVe: Global Vectors for word representation
The model is trained on multiple datasets, including Wikipedia, Twitter, and Common Crawl, on billions of tokens, and the embeddings are available in different dimension sizes ranging from 50 to 300.
The "glove.6B.zip" file is available for download; here we consider just the 50-dimension representation.
We then use a dimension-reduction technique such as t-SNE (t-Distributed Stochastic Neighbor Embedding) to reduce the vectors to 2 dimensions and plot around 500 words in that 2-D space.
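As a hedged sketch of the visualization step described above, the snippet below loads the 50-dimensional vectors from the unzipped glove.6B.zip file and projects roughly 500 words to 2-D with t-SNE; the file path, word count, and plotting details are assumptions.

```python
# Project 50-dimensional GloVe vectors to 2-D with t-SNE and plot ~500 words.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words, vectors = [], []
with open("glove.6B.50d.txt", encoding="utf-8") as f:   # extracted from glove.6B.zip
    for i, line in enumerate(f):
        if i >= 500:                 # take the first 500 words in the file
            break
        parts = line.split()
        words.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))

emb_2d = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(np.stack(vectors))

plt.figure(figsize=(12, 12))
plt.scatter(emb_2d[:, 0], emb_2d[:, 1], s=5)
for word, (x, y) in zip(words, emb_2d):
    plt.annotate(word, (x, y), fontsize=7)
plt.title("t-SNE projection of 50-d GloVe embeddings")
plt.show()
```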

Embedding Matrix
In NLP, the embedding matrix is a crucial component that maps words to their respective vector representations.
This matrix enables models to access and leverage the learned word embeddings during both training and inference phases.
The size of the embedding matrix is determined by two factors: the vocabulary size, which represents the number of unique words, and the dimensionality of the embeddings, which indicates the number of features in each vector.
By organizing word vectors in this matrix, models can efficiently use these embeddings to perform various language processing tasks and improve their overall performance.
The Keras Embedding layer is used as the first layer in NLP applications such as text classification (sentiment analysis), machine translation, NER, and text summarization. It maps indices to vectors.
The number of rows equals the vocabulary size and the number of columns equals the embedding dimension we define.
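A minimal sketch of the embedding matrix as a Keras Embedding layer used as the first layer of a sentiment-classification model; the vocabulary size, embedding dimension, and sequence length below are illustrative assumptions.

```python
# Embedding matrix as a Keras layer: rows = vocabulary size, columns = embedding dim.
import tensorflow as tf

vocab_size = 10_000      # number of unique words (rows of the embedding matrix)
embedding_dim = 50       # features per word vector (columns of the matrix)
max_len = 100            # padded sequence length

model = tf.keras.Sequential([
    # Maps integer word indices to dense vectors; the weights form the embedding matrix.
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

dummy = tf.zeros((2, max_len), dtype=tf.int32)   # a batch of 2 padded index sequences
print(model(dummy).shape)                        # (2, 1)
print(model.layers[0].embeddings.shape)          # (10000, 50): the embedding matrix
```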

Sequential/Temporal/Series Data: RNN & LSTM
Sequential data consists of information organized in a specific order, where the sequence is
meaningful. This type of data includes time series, text, audio, DNA, and music. Analyzing
sequential data often requires techniques such as time series analysis and sequence modeling,
using machine learning models like Recurrent Neural Networks (RNNs) and Long Short-Term
Memory networks (LSTMs).
Unstructured sequential data: speech, text, video, music, etc., i.e., sequences of symbols, images, notes, letters, or words.
Examples: the daily average temperature of a city, the monthly revenue of a company.
In an Internet of Things environment, we would have univariate and multivariate time series data for multiple entities such as sensors.

Speech/voice recognition: input is audio, output is a name or person identifier
Sentiment analysis: input is a sequence of characters, output is a category
Music creation: input is a single value, output is a sequence of notes
Image captioning: input is an image, output is a sequence of words
Language translation: input and output are sequences of characters/words of different sizes
Video files: a sequence of images; video activity recognition and object tracking take a sequence of frames as both input and output

RNN and its variants (LSTM, GRU, Bi-RNN, S-RNN)
•Multilayer Perceptrons (MLPs) are designed to process fixed-size inputs, treating each input as an independent data point without considering any sequential or time-based relationships. Due to this limitation, MLPs cannot capture patterns that depend on the order of the data, making them unsuitable for time series analysis. In contrast, Recurrent Neural Networks (RNNs) are specifically designed to handle sequential information through their recurrent connections, making them a more suitable choice for tasks involving time series data.
In an MLP the input data points are treated as independent of each other even when there is a time relationship, and the input and output sizes are fixed.
A Recurrent Neural Network (RNN) is a type of neural network designed for processing sequential data. It features loops that allow information to be retained across time steps, making it effective at capturing temporal patterns. This capability makes RNNs particularly useful for applications such as time series forecasting, speech recognition, and natural language processing. More advanced variants, like Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, have been developed to overcome the limitations of traditional RNNs, such as difficulty in learning long-term dependencies.

Types of RNN Based on Cardinality
1. One-to-One (1:1): a standard feedforward neural network used for non-sequential data.
2. Many-to-One (N:1): processes multiple inputs to produce a single output, such as in sentiment analysis.
3. One-to-Many (1:N): uses a single input to generate multiple outputs, such as in image captioning.
4. Many-to-Many (N:N): handles multiple inputs and produces multiple outputs, which is common in machine translation.
5. Many-to-Many (N:M): a flexible structure that allows for varying sequence lengths in both inputs and outputs, useful in applications like video analysis.
Examples: one-to-one — character/word prediction, sales forecasting; many-to-one — sentiment analysis, predicting machine failure; one-to-many — music generation, image captioning; many-to-many — language translation with variable input/output sequence lengths, handled by a sequence-to-vector-to-sequence network.
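A hedged sketch of how these cardinalities map onto Keras recurrent layers at the shape level; the layer sizes, time steps, and the use of RepeatVector are illustrative assumptions.

```python
# Input/output cardinalities expressed with Keras LSTM layers (shapes are illustrative).
import tensorflow as tf

timesteps, features = 20, 8
x = tf.random.normal((4, timesteps, features))   # batch of 4 sequences

# Many-to-one (N:1), e.g. sentiment analysis: one vector per sequence.
many_to_one = tf.keras.layers.LSTM(16)                      # return_sequences=False
print(many_to_one(x).shape)                                 # (4, 16)

# Many-to-many (N:N), e.g. tagging each time step: one output per input step.
many_to_many = tf.keras.layers.LSTM(16, return_sequences=True)
print(many_to_many(x).shape)                                # (4, 20, 16)

# One-to-many (1:N), e.g. music generation: repeat a single input across steps,
# then decode a sequence from it.
single = tf.random.normal((4, 32))
one_to_many = tf.keras.Sequential([
    tf.keras.layers.RepeatVector(timesteps),
    tf.keras.layers.LSTM(16, return_sequences=True),
])
print(one_to_many(single).shape)                            # (4, 20, 16)
```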

Training RNNs: BPTT
To train an RNN using Backpropagation Through Time (BPTT):
1. Unroll the RNN: treat each time step as a separate layer.
2. Forward pass: generate predictions.
3. Calculate loss: compare predictions with actual values.
4. Backpropagate error: propagate the error through time.
5. Update parameters: adjust using an optimization algorithm.
6. Repeat: continue for multiple epochs.
To keep gradients under control in long sequences, use techniques such as gradient clipping (mainly against exploding gradients) or advanced RNN variants like LSTM and GRU (which mitigate vanishing gradients).
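The steps above happen automatically when fitting a recurrent Keras model; as a hedged sketch, gradient clipping can be added through the optimizer's clipnorm argument (the model, random data, and hyperparameters here are assumptions).

```python
# BPTT runs when fitting a recurrent model; clipping keeps gradients from exploding.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(50, 1)),        # 50 time steps, 1 feature
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)  # gradient clipping
model.compile(optimizer=optimizer, loss="mse")

x = tf.random.normal((128, 50, 1))
y = tf.random.normal((128, 1))
model.fit(x, y, epochs=3, batch_size=32)  # forward pass, loss, backprop through time, update
```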

Training RNNs: Truncated BPTT
Truncated Backpropagation Through Time (Truncated BPTT) is a modified version of the standard BPTT algorithm for training RNNs with long sequences. It involves limiting the number of time steps over which error gradients are backpropagated, instead of propagating them through the entire sequence.
Running a parameter update with full BPTT is computationally expensive, and running it for multiple epochs is not feasible.
Breaking the sequence into subsequences is computationally feasible, but temporal dependency is reduced to the subsequence level.
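A minimal sketch of the "break the sequence into subsequences" idea: the long series is cut into fixed-length windows so gradients are only propagated within each window; the window length and synthetic data are assumptions.

```python
# Truncated BPTT by windowing: gradients flow only within each fixed-length subsequence.
import numpy as np

series = np.sin(np.linspace(0, 100, 5000))   # one long univariate sequence
window = 50                                  # truncation length (time steps per update)

# Build (input window, next value) training pairs.
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

X = X[..., np.newaxis]                       # shape (num_windows, window, 1) for an RNN
print(X.shape, y.shape)                      # (4950, 50, 1) (4950,)
# Each gradient update now backpropagates through at most `window` time steps,
# which is cheaper but limits temporal dependencies to the subsequence level.
```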

Types of RNN
Here's a brief overview of different types of Recurrent Neural Networks (RNNs):
1. Long Short-Term Memory (LSTM): LSTMs are a type of RNN designed to remember information for long periods. They use special units called memory cells that can maintain information in memory for long durations. LSTMs are effective for tasks like time series prediction and natural language processing.
2. Gated Recurrent Unit (GRU): GRUs are similar to LSTMs but with a simpler structure. They use gating mechanisms to control the flow of information, making them faster to train and sometimes more efficient for certain tasks. GRUs are often used in similar applications as LSTMs, such as speech recognition and machine translation.
3. Character Prediction: this refers to RNNs used for predicting the next character in a sequence. These models are trained on text data and can generate text one character at a time, making them useful for tasks like text generation and autocompletion.
4. Stacked RNNs: stacked RNNs consist of multiple layers of RNNs stacked on top of each other. This architecture allows the model to learn more complex patterns by capturing different levels of abstraction. They are commonly used in tasks that require deep understanding, such as language modeling and sequence-to-sequence tasks.
5. Bidirectional RNNs: these RNNs process sequences in both forward and backward directions. By having access to both past and future contexts, bidirectional RNNs can better understand the entire sequence. They are particularly useful in tasks like speech recognition and text classification, where context is important.
These various types of RNNs can be combined or adapted for specific use cases, depending on the requirements of the task at hand.

LSTM
•Sepp Hochreiter and Jürgen Schmidhuber (1997): solves complex problems with long time dependencies and runs faster and more efficiently.
•To address the memory issue, the LSTM maintains a long-term state (c) and a short-term state (h).
•It forgets not-so-important old memories and updates/refreshes old memories while forming important new ones.
Each gate has its own neural network:
The main network produces the output based on the input and the previous state of the cell, and updates the long-term memory.
The forget gate determines how much of the long-term memory needs to be forgotten or retained.
The input gate figures out the important part of the input and adds it to the long-term state.
The output gate decides how much of the updated long-term memory should be exposed as part of the cell's output.
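For reference, the gates described above correspond to the standard textbook LSTM update; the notation below is assumed, not taken from the slide.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate: how much of } c_{t-1} \text{ to keep)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate: how much new information to write)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate: how much of } c_t \text{ to expose)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate memory)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(long-term state)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(short-term state / output)}
\end{aligned}
```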

GRU (2014): three gating/candidate networks, a single state, and one gate that controls both the input and forget roles.
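For comparison, the standard GRU equations are shown below; the notation is assumed. The update gate z plays the combined role of the LSTM's input and forget gates, and there is a single state h with no separate cell state.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) &&\text{(update gate: replaces input + forget gates)}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) &&\text{(reset gate)}\\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) &&\text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t &&\text{(single state, no separate } c_t\text{)}
\end{aligned}
```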

Stacked and Bidirectional RNN
Applications: time series forecasting, language modeling, named entity recognition, machine translation.
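A hedged sketch of stacking and bidirectionality in Keras, framed as a sequence-labeling task such as named entity recognition; the vocabulary size, layer widths, and tag count are assumptions.

```python
# Stacked bidirectional LSTMs for a tagging task such as NER (sizes are illustrative).
import tensorflow as tf

vocab_size, num_tags, max_len = 5000, 9, 60

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),
    # Bidirectional: reads the sequence forward and backward and concatenates both passes.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    # Stacked: a second recurrent layer learns higher-level patterns.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Dense(num_tags, activation="softmax"),   # one tag per time step
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```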

Encoder-Decoder Seq-to-Seq Model
•The Encoder-Decoder architecture is an RNN framework designed for sequence-to-sequence tasks. In this setup, the encoder processes an input sequence and produces a context vector, which encapsulates the information from the input. The decoder then uses this context vector to generate an output sequence. This architecture is commonly applied in areas such as machine translation, text summarization, and speech recognition.
Teacher forcing: the correct word acts as a teacher and forces the model to be corrected immediately when a prediction is wrong, by feeding the ground-truth previous token to the decoder at the next step.
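A hedged sketch of the encoder-decoder idea with teacher forcing, loosely following the classic Keras seq2seq pattern; the vocabulary sizes, latent dimension, and tensor names are assumptions.

```python
# Encoder-decoder (seq2seq) with teacher forcing: during training the decoder
# receives the ground-truth previous token as input at every step.
import tensorflow as tf

src_vocab, tgt_vocab, latent_dim = 6000, 7000, 256

# Encoder: consumes the source sequence and returns its final states (the context).
enc_inputs = tf.keras.Input(shape=(None,), name="source_tokens")
enc_emb = tf.keras.layers.Embedding(src_vocab, latent_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: initialized with the encoder states; during training its input is the
# target sequence shifted right (teacher forcing).
dec_inputs = tf.keras.Input(shape=(None,), name="target_tokens_shifted")
dec_emb = tf.keras.layers.Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_out, _, _ = tf.keras.layers.LSTM(
    latent_dim, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])
outputs = tf.keras.layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit([source_ids, target_ids[:, :-1]], target_ids[:, 1:], ...)  # teacher forcing
```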

Beam Search and BLEU Evaluation Metrics
Beam Search: Beam Search is a search algorithm used in sequence-to-sequence models, particularly in natural language processing tasks. Unlike greedy search, which selects the best option at each step, Beam Search keeps track of multiple hypotheses (beams) at each step, expanding the top N sequences with the highest probabilities. This method balances between searching broadly and searching efficiently, aiming to find the most likely sequence of tokens. It is widely used in tasks like machine translation and speech recognition to improve the quality of generated sequences.
BLEU (Bilingual Evaluation Understudy): BLEU is a popular evaluation metric for assessing the quality of text generated by machine translation systems. It compares the overlap of n-grams (contiguous sequences of words) in the machine-generated text with one or more reference translations. The score ranges from 0 to 1, with higher scores indicating closer matches to the reference translations. BLEU emphasizes precision by measuring how many words in the generated output match the reference, considering factors like brevity and the presence of multiple references. It is widely used due to its simplicity and effectiveness in evaluating machine translation quality.
BLEU is a numerical translation-closeness metric computed against a corpus of good-quality human reference translations, based on modified n-gram precision. The score uses an average of log precisions with uniform weights, i.e., the geometric mean of the modified n-gram precisions, combined with a brevity penalty.
It does not consider semantics, sentence structure, or morphology.
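Two hedged sketches tied to this slide: a toy beam search over a made-up next-token distribution (where, unlike a real decoder, the step distributions do not depend on the prefix), and a BLEU computation with NLTK's sentence_bleu; the toy probabilities and sentences are assumptions.

```python
import math
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# --- Toy beam search: keep only the top-N partial sequences at every step. ---
def beam_search(step_probs, beam_width=2):
    """step_probs: list of dicts mapping token -> probability at each step."""
    beams = [([], 0.0)]                       # (tokens so far, cumulative log-prob)
    for probs in step_probs:
        candidates = [(seq + [tok], score + math.log(p))
                      for seq, score in beams
                      for tok, p in probs.items()]
        # keep only the N highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

toy_steps = [{"the": 0.6, "a": 0.4},
             {"cat": 0.5, "dog": 0.3, "car": 0.2},
             {"sat": 0.7, "ran": 0.3}]
print(beam_search(toy_steps, beam_width=2))

# --- BLEU: modified n-gram precision against reference translations. ---
reference = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]
print(sentence_bleu(reference, hypothesis,
                    smoothing_function=SmoothingFunction().method1))
```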

Attention Mechanism
•The attention mechanism allows models to selectively focus on the most relevant information within large datasets, thereby enhancing efficiency and accuracy in data processing.
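The bullet above is terse; as a hedged illustration, here is one common formulation (scaled dot-product attention) showing how attention weights let a model focus on the most relevant positions. The matrices are random placeholders, and the specific formulation is an assumption rather than something the slide names.

```python
# Scaled dot-product attention: the weight matrix tells the model where to focus.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # similarity of queries to keys
    weights = softmax(scores)                        # each row sums to 1: the "focus"
    return weights @ V, weights

# 4 query positions attending over 6 key/value positions, dimension 8 (placeholders).
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
context, weights = scaled_dot_product_attention(Q, K, V)
print(context.shape, weights.shape)   # (4, 8) (4, 6)
```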