Machine learning for natural language understanding

HaiderBukhari14, 30 slides, May 02, 2024

About This Presentation

ML4NLU


Slide Content

Evaluation Methods for Unsupervised Word Embeddings. Group: ML4NLU-4b. Trends in Machine Learning (Seminar: Machine Learning for Natural Language Processing).

Basis. Finding measures: different evaluation methods result in different orderings, which questions the assumption of a single optimal vector representation. Evaluate how the outcomes of different evaluation techniques are connected. Reduce bias in insights through crowd-sourcing across different models. There is relatively little prior work on direct evaluation of models. Study the connection between direct evaluation with real users and pre-collected offline data. Query inventories are often idiosyncratic, dominated by specific queries, and poorly calibrated to corpus statistics. Construct a new inventory: a model- and data-driven approach to building query inventories, frequency-calibrated and paired with user judgements. Observation: word embeddings encode additional information.

Understanding the Fundamentals: types of learning, word embeddings, and evaluation techniques.

Types of learning

Word Embeddings. A numerical (vector) representation of words.
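As a rough illustration of what "words as vectors" buys you, here is a minimal sketch with toy, hand-made numbers (not a trained model): once words are vectors, relatedness can be measured numerically, for example with cosine similarity.

```python
import numpy as np

# Toy, hand-made 4-dimensional vectors; real embeddings are learned
# from a corpus and typically have 50-300 dimensions.
vectors = {
    "tiger": np.array([0.9, 0.1, 0.4, 0.0]),
    "fauna": np.array([0.7, 0.2, 0.5, 0.1]),
    "curtain": np.array([0.0, 0.8, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: close to 1 for similar directions, near 0 for unrelated ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["tiger"], vectors["fauna"]))    # relatively high
print(cosine(vectors["tiger"], vectors["curtain"]))  # much lower
```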

Classification of Evaluation Schemes. Intrinsic evaluation: tests for syntactic or semantic relationships; tasks involve preselected query terms and semantically related target words; provides a way to specify the goodness of an embedding. Extrinsic evaluation: uses word embeddings as input features to a downstream task and measures the change in the task's performance metrics.
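A minimal sketch of the extrinsic route, using a hypothetical toy embedding and a four-example sentiment set (both made up here): the embedding is scored only indirectly, through the accuracy of the downstream classifier it feeds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy setup: in practice the vectors come from a trained
# embedding model and the labels from a real sentiment dataset.
vectors = {
    "good": np.array([1.0, 0.2]), "great": np.array([0.9, 0.3]),
    "bad": np.array([-0.8, 0.1]), "awful": np.array([-1.0, 0.0]),
}
texts = ["good great", "great", "bad awful", "awful bad"]
labels = [1, 1, 0, 0]

def featurize(text):
    # Average the vectors of in-vocabulary words (bag-of-embeddings features).
    vecs = [vectors[w] for w in text.split() if w in vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

X = np.stack([featurize(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
# Extrinsic signal: how well the downstream task performs with these features
# (here just training accuracy on the toy data).
print(clf.score(X, labels))
```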

Evaluation Schemes: Intrinsic Evaluation (Two Scenarios). Absolute: embeddings are evaluated individually and their final scores are compared on a dataset. Comparative: people are asked directly for their preference among embeddings; there is no need to define a metric that scores word pairs; instead of choosing both query and target words, only query words are chosen, and the embeddings themselves define the comparable target words.

Embedding Model: workflow of word embedding learning. A model is selected, either a probabilistic prediction-based approach (C&W (Collobert and Weston), CBOW) or a reconstruction-based approach (GloVe, Hellinger Principal Component Analysis (H-PCA), Random Projection, TSCCA (Two-Step Canonical Correlation Analysis)), and trained on a corpus with a training algorithm to produce word embeddings. These are then passed to evaluation algorithms: extrinsic evaluation (sentiment classification, noun phrase chunking, POS tagging, named entity recognition (NER)) or intrinsic evaluation, which is either absolute (relatedness, analogy, categorization, selectional preference) or comparative, run against an evaluation data set to yield the evaluation result.

Absolute Evaluation. How similar, on a scale from 0 to 10, are the two words "tiger" and "fauna"? Answer: 5.62 (according to WordSim). Problem: large variance in human judgements. How can we improve this?
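Such absolute judgements are typically used via a rank correlation between human ratings and the model's cosine similarities. A minimal sketch with hypothetical ratings and toy vectors (only the tiger/fauna value of 5.62 comes from the slide; the other numbers are made up):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical word-pair dataset in the style of WordSim:
# (word1, word2, human similarity rating on a 0-10 scale).
pairs = [("tiger", "fauna", 5.62), ("tiger", "cat", 7.0), ("tiger", "curtain", 0.7)]

# Toy stand-in for the embedding under evaluation.
vectors = {
    "tiger": np.array([0.9, 0.1, 0.4]), "cat": np.array([0.8, 0.2, 0.5]),
    "fauna": np.array([0.6, 0.3, 0.4]), "curtain": np.array([0.0, 0.9, 0.1]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

human = [h for _, _, h in pairs]
model = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
# Absolute intrinsic score: rank correlation between model and human judgements.
rho, _ = spearmanr(human, model)
print(rho)
```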

Procedure Design for Intrinsic Evaluation. Which option is most similar to the query word? Query: skillfully. Options: swiftly, cleverly, expertly, pointedly.

Procedure Design for Intrinsic Evaluation: Comparative Evaluation. A query inventory is fed to Embedding 1, Embedding 2, and Embedding 3, and their outputs are compared through human judgements. Advantages: directly shows people's preference and provides relative instead of absolute judgements.
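A minimal sketch of how each embedding can nominate its own target word for a query, which human raters then compare side by side; the two tiny "embeddings" below are hypothetical.

```python
import numpy as np

def nearest_neighbor(query, vectors):
    """Return the word (other than the query) closest to the query by cosine similarity."""
    q = vectors[query]
    best, best_sim = None, -1.0
    for w, v in vectors.items():
        if w == query:
            continue
        sim = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Hypothetical: two competing embeddings over the same tiny vocabulary.
embedding_1 = {"skillfully": np.array([1.0, 0.1]), "expertly": np.array([0.9, 0.2]),
               "swiftly": np.array([0.2, 1.0])}
embedding_2 = {"skillfully": np.array([1.0, 0.1]), "expertly": np.array([0.1, 0.9]),
               "swiftly": np.array([0.95, 0.15])}

# Each embedding nominates its own target word for the query; raters then pick
# the nomination they prefer (a relative, not absolute, judgement).
for name, emb in [("embedding_1", embedding_1), ("embedding_2", embedding_2)]:
    print(name, nearest_neighbor("skillfully", emb))
```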

Normalized scores

Comparative Result

Comparative Result (continued)

Definition of Coherence. Coherence is the connection brought about by the reader's or listener's knowledge that helps them understand a given discourse (e.g. through knowledge of the context in which the discourse unfolds); it is what makes a text semantically meaningful. Example of a sentence that lacks coherence: "An octopus is an air-filled curtain with seven heads and three spike-filled fingers, which poke in frills and furls at ribbon-strewed buttons."

Coherence II. Good embeddings should have coherent neighborhoods for each word; it is therefore necessary to assess whether groups of words in a small neighborhood of the embedding space are mutually related.

Coherence III. Outlier detection precision of different embedding algorithms: the order of the last three embeddings remains unchanged, showing a strong correlation between the direct comparison task and the intrusion task.
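A minimal sketch of the intrusion (outlier detection) idea, with toy vectors: take a word's nearest neighbors, add a randomly drawn intruder, and check whether the least-related word in the group is indeed the intruder; precision is then the fraction of groups where it is.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def detected_outlier(words, vectors):
    """Flag the word with the lowest average similarity to the rest of the group."""
    avg = {w: np.mean([cosine(vectors[w], vectors[o]) for o in words if o != w])
           for w in words}
    return min(avg, key=avg.get)

# Hypothetical neighborhood: two nearest neighbors of "tiger" plus one
# randomly drawn "intruder"; a coherent embedding should expose the intruder.
vectors = {"tiger": np.array([0.9, 0.1, 0.3]), "lion": np.array([0.85, 0.15, 0.35]),
           "fauna": np.array([0.7, 0.2, 0.4]), "curtain": np.array([0.05, 0.9, 0.1])}
group = ["tiger", "lion", "curtain"]              # curtain is the intruder
hit = detected_outlier(group, vectors) == "curtain"
print("intruder detected:", hit)                  # precision = hits / number of groups
```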

Embedding Space. An embedding is a low-dimensional space into which you can translate high-dimensional vectors.
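A minimal sketch of that translation, with random numbers standing in for a learned embedding matrix: a word that is one-hot in a |V|-dimensional space becomes a dense low-dimensional vector via a simple matrix lookup.

```python
import numpy as np

# A vocabulary of 5 words lives in a 5-dimensional one-hot space;
# the embedding matrix maps each word into a dense 3-dimensional space.
vocab = ["octopus", "curtain", "ribbon", "button", "frill"]
one_hot = np.eye(len(vocab))                         # 5-dim sparse representation
embedding_matrix = np.random.default_rng(0).normal(size=(len(vocab), 3))

# "Embedding lookup" is just selecting a row: one-hot vector @ matrix.
dense = one_hot[vocab.index("curtain")] @ embedding_matrix
print(dense)   # the 3-dimensional vector that now represents "curtain"
```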

Applications of word embeddings: analyzing survey responses, analyzing verbatim comments, music/video recommendation systems, text classification, document search and information retrieval, and language translation systems.

CBOW – Demo

Word Embeddings using CBOW
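A minimal CBOW sketch in PyTorch, loosely in the spirit of the pytorch-continuous-bag-of-words repository cited in the references but simplified and not taken from the slides themselves: the center word is predicted from the average of its context-word embeddings, and the learned embedding table is what we keep afterwards.

```python
import torch
import torch.nn as nn

text = "word embeddings are numerical representations of words learned from text".split()
vocab = sorted(set(text))
w2i = {w: i for i, w in enumerate(vocab)}
CONTEXT, DIM = 2, 16

class CBOW(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)   # the word vectors we want
        self.out = nn.Linear(dim, vocab_size)      # predicts the center word

    def forward(self, context):                    # context: tensor of 2*CONTEXT word ids
        return self.out(self.emb(context).mean(dim=0))

# Build (context, target) pairs and train with cross-entropy.
data = [([w2i[text[j]] for j in range(i - CONTEXT, i + CONTEXT + 1) if j != i],
         w2i[text[i]]) for i in range(CONTEXT, len(text) - CONTEXT)]
model = CBOW(len(vocab), DIM)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    for ctx, tgt in data:
        opt.zero_grad()
        logits = model(torch.tensor(ctx))
        loss = loss_fn(logits.unsqueeze(0), torch.tensor([tgt]))
        loss.backward()
        opt.step()

print(model.emb.weight[w2i["embeddings"]])   # learned vector for one word
```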

Challenges of word embeddings

Homographs & Inflections. Solutions: word embedding models can be trained on text preprocessed with part-of-speech tagging (to keep homographs apart) and with lemmatization (to collapse inflected forms).
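A possible preprocessing sketch using spaCy, one option among many, assuming the en_core_web_sm model is installed: tokens are replaced by lemma_POS strings, so homographs with different parts of speech stay distinct while inflected forms collapse onto one lemma.

```python
import spacy

# Assumption: "python -m spacy download en_core_web_sm" has been run.
nlp = spacy.load("en_core_web_sm")

def preprocess(text):
    doc = nlp(text)
    # "saw" (NOUN) and "saw"/"sawing" (VERB, lemma "saw") become distinct
    # lemma_POS tokens, while inflections share a single lemma.
    return [f"{tok.lemma_}_{tok.pos_}" for tok in doc if not tok.is_punct]

print(preprocess("He saw a saw and was sawing logs."))
# The resulting token list would then be fed to the embedding trainer.
```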

Bias & Antonyms. Solution for bias: debiasing using the neutralize and equalize operations. Solution for antonyms: using lexical knowledge that identifies antonyms and synonyms during the preprocessing stage.
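A minimal sketch of the neutralize and equalize operations in the style of hard debiasing, simplified to a single bias direction and toy 2-D vectors; a real bias direction would be estimated from definitional word pairs, and the antonym/synonym lexical preprocessing is not shown here.

```python
import numpy as np

def neutralize(w, g):
    """Remove the component of w that lies along the bias direction g."""
    g = g / np.linalg.norm(g)
    return w - (w @ g) * g

def equalize(a, b, g):
    """Make a word pair (e.g. a gendered pair) symmetric about the bias direction."""
    g = g / np.linalg.norm(g)
    mu = (a + b) / 2
    mu_orth = mu - (mu @ g) * g                      # shared, bias-free component
    scale = np.sqrt(max(1 - np.linalg.norm(mu_orth) ** 2, 0.0))
    a_eq = mu_orth + scale * np.sign((a - mu) @ g) * g
    b_eq = mu_orth + scale * np.sign((b - mu) @ g) * g
    return a_eq, b_eq

# Toy vectors; g stands in for a bias direction such as "he" - "she".
g = np.array([1.0, 0.0])
doctor = np.array([0.3, 0.8])
print(neutralize(doctor, g))                         # [0.0, 0.8]: bias component removed
print(equalize(np.array([0.6, 0.3]), np.array([-0.6, 0.35]), g))
```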

Conclusion. Word embeddings are not new, but they have pushed the state of the art in NLP. Embeddings should be compared in the context of a specified task (e.g. linguistic insight). Direct comparison between embeddings provides more fine-grained analysis and supports simple, crowdsourced relevance judgements.

Word embeddings are fun to play with, not so difficult to understand, and very useful in most NLP tasks, so I hope you enjoyed learning about them!

References
[01] T. Schnabel, I. Labutov, D. Mimno, and T. Joachims, "Evaluation Methods for Unsupervised Word Embeddings," Association for Computational Linguistics, 2015. Accessed: Nov. 2022. [Online]. Available: https://aclanthology.org/D15-1036.pdf
[02] Questions.png. Accessed: Nov. 2022. [Online]. Available: https://www.501commons.org/blog/questions.png/view
[03] R. Raj, "Supervised, Unsupervised, And Semi-Supervised Learning With Real-Life Usecase," www.enjoyalgorithms.com. https://www.enjoyalgorithms.com/blogs/supervised-unsupervised-and-semisupervised-learning (accessed Nov. 2022).
[04] Shane, "Intro to Word Embeddings and Vectors for Text Analysis," www.shanelynn.ie. https://www.shanelynn.ie/get-busy-with-word-embeddings-introduction/ (accessed Nov. 2022).
[05] A. CR, Word Vectors. Accessed: Nov. 2022. [Online]. Available: https://miro.medium.com/max/640/1*LdviucnshWgIIcQvhTTF-g.webp
[06] J. Liu, S. Zheng, G. Xu, and M. Lin, Visualization of Word Embedding Space. 2021. Accessed: Nov. 2022. [Online]. Available: https://www.researchgate.net/figure/Visualization-of-the-word-embedding-space_fig4_343595281
[07] Y. Shi, Y. Zheng, K. Guo, L. Zhu, and Y. Qu, "Intrinsic or Extrinsic Evaluation: An Overview of Word Embedding Evaluation," IEEE Xplore, Nov. 01, 2018. https://ieeexplore.ieee.org/document/8637416 (accessed Nov. 2022).
[08] Tobias Schnabel, "Evaluation Methods for Unsupervised Word Embeddings," Vimeo, Feb. 22, 2016. https://vimeo.com/156340833 (accessed Nov. 2022).
[09] University of Cambridge, "Cohesion and Coherence," www.english.cam.ac.uk. https://www.english.cam.ac.uk/elor/lo/cohesion/index.html (accessed Nov. 2022).
[10] "Coherence and cohesion," BLOG|ON|LINGUISTICS, Nov. 04, 2013. https://blogonlinguistics.wordpress.com/2013/11/04/coherence-and-cohesion/ (accessed Nov. 2022).

References (contd.)
[11] "Coherence (linguistics)," Wikipedia, Dec. 29, 2021. https://en.wikipedia.org/wiki/Coherence_(linguistics) (accessed Nov. 2022).
[12] S. Anala, "A Guide to Word Embeddings," Medium, Oct. 28, 2020. https://towardsdatascience.com/a-guide-to-word-embeddings-8a23817ab60f (accessed Nov. 2022).
[13] "Embeddings | Machine Learning Crash Course | Google Developers," Google Developers, 2019. https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture (accessed Nov. 2022).
[14] M. Eff, "Audiovisual Aesthetics | Interpretive and Experiential Positions," Medium, Jan. 23, 2022. https://soundand.design/audiovisual-aesthetics-4-33590d82d193 (accessed Nov. 2022).
[15] "Word embeddings: how to transform text into numbers," MonkeyLearn Blog, Dec. 07, 2017. https://monkeylearn.com/blog/word-embeddings-transform-text-numbers/ (accessed Nov. 2022).
[16] F. Lotito, "continuous-bag-of-words," GitHub, Nov. 20, 2022. https://github.com/FraLotito/pytorch-continuous-bag-of-words (accessed Nov. 2022).
[17] "Homonyms," CAD Community Classroom. https://www.cadavies.com/homonyms.html (accessed Nov. 2022).
[18] "WebQuest: English Irregular verb Inflection," zunal.com. http://zunal.com/introduction.php?w=310481 (accessed Nov. 2022).
[19] Bitext, "Main Challenges for Word Embeddings: Part I," blog.bitext.com. https://blog.bitext.com/main-challenges-for-word-embeddings-part-i (accessed Nov. 2022).
[20] "Fairness in Machine Learning," Science in the News, Jan. 28, 2020. https://sitn.hms.harvard.edu/uncategorized/2020/fairness-machine-learning/ (accessed Nov. 2022).
[21] M. Etcheverry and D. Wonsever, "Unraveling Antonym's Word Vectors through a Siamese-like Network," Association for Computational Linguistics, 2019. Accessed: Nov. 2022. [Online]. Available: https://aclanthology.org/P19-1319.pdf