Mattingly "AI & Prompt Design: Named Entity Recognition"
BaltimoreNISO
29 slides
May 14, 2024
About This Presentation
This presentation was provided by William Mattingly of the Smithsonian Institution during the fifth segment of the NISO training series "AI & Prompt Design." Session Five, "Named Entity Recognition with LLMs," was held on May 2, 2024.
Slide Content
Prompt Design 05: Named Entity Recognition
Goals
- Named Entity Recognition (NER) as a Concept
- Rules-Based Approaches to NER
- Supervised Learning NER
- Unsupervised Learning NER
- Transformer-Based NER: GliNER
- Large Language Model NER
What is NER?
John went to Paris on 1 August 2023.
Named Entity Recognition
John => PERSON
Paris => LOCATION
1 August 2023 => DATE
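NER output is usually stored as labeled character spans rather than bare strings, so that each label points back to an exact place in the text. A minimal sketch of that representation for the example sentence; the `Entity` class and `locate` helper are hypothetical names, and finding each entity by searching left to right is an assumption that works here but not for every text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # offset of the first character
    end: int    # offset one past the last character (exclusive)


def locate(sentence, annotations):
    """Anchor (surface text, label) pairs to character offsets, left to right."""
    entities, cursor = [], 0
    for surface, label in annotations:
        # str.index raises ValueError if the surface text is not in the sentence
        start = sentence.index(surface, cursor)
        end = start + len(surface)
        entities.append(Entity(surface, label, start, end))
        cursor = end
    return entities


sentence = "John went to Paris on 1 August 2023."
for ent in locate(sentence, [("John", "PERSON"), ("Paris", "LOCATION"), ("1 August 2023", "DATE")]):
    print(f"{ent.text} => {ent.label} [{ent.start}:{ent.end}]")
```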
Non-LLM Approaches to NER
Traditional Approaches
- Rules-Based
- Task-Specific Machine Learning Model
- Unsupervised Learning
- GliNER (Brand new!)
Rules-Based NER
Traditional NER: Rules-Based
- Gazetteer
- Linguistic Rules
- Nested Conditions
- RegEx
Rules-Based: Gazetteer
A list of entities, for example:
Concentration Camps: Auschwitz, Bergen-Belsen, Buchenwald, …
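A gazetteer lookup can be sketched as a single regular expression built from the list. This assumes a dictionary mapping a label to its names; the `CONC_CAMP` label and the helper names are illustrative, and only the three names shown on the slide are included.

```python
import re

# Hypothetical gazetteer: only the three names shown on the slide
GAZETTEER = {
    "CONC_CAMP": ["Auschwitz", "Bergen-Belsen", "Buchenwald"],
}


def compile_gazetteer(gazetteer):
    """Build one regex over every listed name plus a name -> label lookup."""
    lookup = {name: label for label, names in gazetteer.items() for name in names}
    # Longest names first, so a longer entry wins over a shorter one that starts it
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern, lookup


def gazetteer_ner(text, gazetteer=GAZETTEER):
    """Return (name, label, start, end) for every gazetteer hit in the text."""
    pattern, lookup = compile_gazetteer(gazetteer)
    return [(m.group(), lookup[m.group()], m.start(), m.end()) for m in pattern.finditer(text)]


print(gazetteer_ner("He survived Auschwitz and was liberated at Bergen-Belsen."))
```

Matching is exact and case-sensitive, which is the main strength and weakness of a gazetteer: it never invents an entity, but it misses spelling variants.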
Rules-Based: Linguistic Rules
Leverages the linguistic data of a text to assign an entity. Use an NLP framework, like spaCy or NLTK.
Example: Nearly two hundred of them were taken to Berlin.
Rule: a verb of movement followed by one or more prepositions [to, towards, away to] and a location.
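The movement-verb rule can be sketched over tokens that already carry a lemma and a part-of-speech tag. In practice those tags would come from spaCy or NLTK; here they are entered by hand so the sketch runs on its own, and the `Token` type, verb list, and preposition sequences are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    lemma: str
    pos: str  # Universal Dependencies POS tag


# Hypothetical rule resources; a real project would curate these lists
MOVEMENT_VERBS = {"take", "go", "move", "send", "bring", "transport", "deport"}
PREPOSITIONS = [("away", "to"), ("towards",), ("to",)]  # longest sequence first


def find_destinations(tokens):
    """Label proper nouns as LOCATION when they follow a movement verb plus preposition(s)."""
    found = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB" or tok.lemma not in MOVEMENT_VERBS:
            continue
        for seq in PREPOSITIONS:
            j = i + 1 + len(seq)
            following = tuple(t.text.lower() for t in tokens[i + 1:j])
            if following != seq:
                continue
            k = j
            while k < len(tokens) and tokens[k].pos == "PROPN":
                k += 1  # absorb multi-token names such as "New York"
            if k > j:
                found.append((" ".join(t.text for t in tokens[j:k]), "LOCATION"))
            break
    return found


# Tags as a spaCy or NLTK pipeline might assign them (entered by hand here)
sentence = [
    Token("Nearly", "nearly", "ADV"), Token("two", "two", "NUM"),
    Token("hundred", "hundred", "NUM"), Token("of", "of", "ADP"),
    Token("them", "they", "PRON"), Token("were", "be", "AUX"),
    Token("taken", "take", "VERB"), Token("to", "to", "ADP"),
    Token("Berlin", "Berlin", "PROPN"), Token(".", ".", "PUNCT"),
]
print(find_destinations(sentence))  # [('Berlin', 'LOCATION')]
```

Matching on the lemma rather than the surface form lets one entry ("take") cover "took", "taken", and "takes".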
Rules-Based: Nested Conditions
Find the conditions in which things occur, then assign a label.
Example: We were taken to the Warsaw Ghetto.
Rule: If an entity is a LOCATION and the word “ghetto” appears within a context of 5 tokens, change the entity's label to GHETTO.
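The nested condition reads naturally as an outer check on the label and an inner check on the surrounding tokens. A sketch, assuming entities are stored as (start_token, end_token, label) triples with an exclusive end, and that the 5-token window extends on both sides of the entity and includes the entity's own tokens; the function name is hypothetical.

```python
def apply_ghetto_rule(tokens, entities, window=5):
    """Relabel LOCATION spans as GHETTO when "ghetto" occurs within `window` tokens."""
    relabeled = []
    for start, end, label in entities:
        if label == "LOCATION":  # outer condition: only locations are candidates
            lo, hi = max(0, start - window), min(len(tokens), end + window)
            if any(t.lower() == "ghetto" for t in tokens[lo:hi]):  # inner condition
                label = "GHETTO"
        relabeled.append((start, end, label))
    return relabeled


tokens = ["We", "were", "taken", "to", "the", "Warsaw", "Ghetto", "."]
print(apply_ghetto_rule(tokens, [(5, 6, "LOCATION")]))  # [(5, 6, 'GHETTO')]
```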
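Rules-Based: RegEx
Regular expressions are a complex way of doing fuzzy string matching.
Text: Hic pagus unus, cum domo exisset, patrum nostrorum memoria L. Cassium consulem interfecerat et eius exercitum sub iugum miserat.
Target entity: Lucius Cassius
Pattern: (?:[A-Z]\.\s)?Cassi(?:orum|us|um|is|o|i)\b
(Longer endings come first and the pattern ends with \b; otherwise "Cassiorum" would stop at "Cassio" and "Cassiis" at "Cassii".)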
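A sketch of running the Cassius pattern with Python's `re` module and mapping every inflected form to one canonical name. The endings are listed longest first and closed with `\b` so that a form such as "Cassiorum" is matched whole; the `tag_cassius` helper is hypothetical.

```python
import re

# Longest endings first, then a word boundary, so "Cassiorum" is not cut short at "Cassio"
CASSIUS = re.compile(r"(?:[A-Z]\.\s)?Cassi(?:orum|us|um|is|o|i)\b")

TEXT = (
    "Hic pagus unus, cum domo exisset, patrum nostrorum memoria "
    "L. Cassium consulem interfecerat et eius exercitum sub iugum miserat."
)


def tag_cassius(text):
    """Return (matched form, canonical name) pairs for every mention."""
    return [(m.group(), "Lucius Cassius") for m in CASSIUS.finditer(text)]


print(tag_cassius(TEXT))  # [('L. Cassium', 'Lucius Cassius')]
```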
Machine Learning NER
Machine Learning: Supervised Learning
{
  "text": "John Doe was a prisoner at Auschwitz during World War II.",
  "entities": [
    { "type": "PERSON", "value": "John Doe", "start_pos": 0, "end_pos": 8 },
    { "type": "CONC_CAMP", "value": "Auschwitz", "start_pos": 27, "end_pos": 36 }
  ]
}
(Offsets are character positions; end_pos is exclusive.)
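Character offsets in training annotations are easy to get wrong, so it helps to check each span against the text before training. The sketch below does that and then converts the record to token-level BIO tags, a common input format for supervised NER models; the BIO conversion, the simple word/punctuation tokenizer, and the helper names are assumptions rather than anything the slide prescribes.

```python
import json
import re

RECORD = json.loads("""
{
  "text": "John Doe was a prisoner at Auschwitz during World War II.",
  "entities": [
    {"type": "PERSON", "value": "John Doe", "start_pos": 0, "end_pos": 8},
    {"type": "CONC_CAMP", "value": "Auschwitz", "start_pos": 27, "end_pos": 36}
  ]
}
""")


def check_spans(record):
    """Raise if any entity's offsets do not slice out its stated value."""
    text = record["text"]
    for ent in record["entities"]:
        found = text[ent["start_pos"]:ent["end_pos"]]
        if found != ent["value"]:
            raise ValueError(f"{ent['value']!r} has offsets that select {found!r}")


def to_bio(record):
    """Split into word/punctuation tokens and tag each one B-, I- or O."""
    tagged = []
    for m in re.finditer(r"\w+|[^\w\s]", record["text"]):
        tag = "O"
        for ent in record["entities"]:
            if ent["start_pos"] <= m.start() and m.end() <= ent["end_pos"]:
                prefix = "B" if m.start() == ent["start_pos"] else "I"
                tag = f"{prefix}-{ent['type']}"
                break
        tagged.append((m.group(), tag))
    return tagged


check_spans(RECORD)
print(to_bio(RECORD))
```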
Machine Learning: Unsupervised Learning
- Vectorize all multi-word tokens
- Plot them to identify patterns
Exercise: https://wjbmattingly.com/unsupervised-ner/
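The vectorize-then-look-for-patterns idea can be sketched without any model. This stand-in swaps the embeddings and plotting of the linked exercise for character-trigram count vectors and a cosine-similarity threshold, so it groups phrases by surface form (for example, names ending in "Ghetto"). The capitalized-run heuristic, the 0.3 threshold, the helper names, and the sample sentence are all hypothetical.

```python
import math
import re
from collections import Counter


def multiword_candidates(text):
    """Hypothetical heuristic: runs of two or more capitalized words."""
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+", text)


def trigram_vector(phrase):
    """Represent a phrase as counts of its character trigrams."""
    padded = f"  {phrase.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group(phrases, threshold=0.3):
    """Greedy single-link grouping: join the first cluster holding a similar phrase."""
    clusters = []
    for phrase in phrases:
        vec = trigram_vector(phrase)
        for cluster in clusters:
            if any(cosine(vec, trigram_vector(other)) >= threshold for other in cluster):
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


text = "John Doe was held in the Warsaw Ghetto before Jane Doe arrived from the Lodz Ghetto."
print(group(multiword_candidates(text)))
```

With real embeddings the groups would reflect meaning rather than spelling; the trigram version only keeps the sketch dependency-free.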
Machine Learning: Zero-Shot NER
GliNER: a transformer architecture that lets you pass a text and your own labels to a model without any training.
Example: https://huggingface.co/spaces/tomaarsen/gliner_medium-v2.1
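A sketch of calling GliNER from Python, assuming the `gliner` package (`pip install gliner`) and the `urchade/gliner_medium-v2.1` checkpoint, which appears to be the model behind the linked Space. `GLiNER.from_pretrained` and `predict_entities` follow that package's README; `load_model`, `zero_shot_ner`, and `group_by_label` are hypothetical helpers, and the labels are free text you choose at call time.

```python
from collections import defaultdict
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(name="urchade/gliner_medium-v2.1"):
    # Imported lazily so the rest of this sketch runs without the package installed
    from gliner import GLiNER

    return GLiNER.from_pretrained(name)  # downloads the checkpoint on first use


def zero_shot_ner(text, labels, threshold=0.5):
    """Predict spans for caller-chosen labels; no fine-tuning involved."""
    return load_model().predict_entities(text, labels, threshold=threshold)


def group_by_label(predictions):
    """Collect predicted span texts under each label."""
    grouped = defaultdict(list)
    for pred in predictions:
        grouped[pred["label"]].append(pred["text"])
    return dict(grouped)


# Usage (requires the gliner package and network access):
# preds = zero_shot_ner("John Doe was a prisoner at Auschwitz.", ["person", "concentration camp"])
# print(group_by_label(preds))
```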
LLMs: Limitations
- Resource Intensity (and Cost)
- Data Privacy Concerns
- Black Box Models
- Training Data Bias
- Generalization Challenges
- Latency Issues
- Hallucinations
- Consistency
LLMs: How to Use LLMs
- Thinking through your methodology for NER
- Assisting in certain steps of NER (RegEx)
- Zero-Shot NER
- Few-Shot NER
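Zero-shot and few-shot NER with an LLM differ only in whether the prompt contains worked examples. A sketch of building such a prompt and checking the reply; no particular LLM API is assumed, so the model call is left out, and the reply string, example sentence, and helper names are hypothetical. Keeping only entities that literally occur in the input is one simple guard against the hallucination and consistency problems listed above.

```python
import json


def build_prompt(text, labels, examples=()):
    """Zero-shot prompt when `examples` is empty, few-shot otherwise."""
    parts = [
        "Extract the named entities from the text.",
        "Allowed labels: " + ", ".join(labels) + ".",
        'Reply with a JSON list only, e.g. [{"text": "...", "label": "..."}].',
    ]
    for example_text, example_entities in examples:
        parts.append(f"Text: {example_text}\nEntities: {json.dumps(example_entities)}")
    parts.append(f"Text: {text}\nEntities:")
    return "\n\n".join(parts)


def parse_entities(raw, text, labels):
    """Keep only entities that use an allowed label and literally occur in the text."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        span, label = item.get("text"), item.get("label")
        if label in labels and isinstance(span, str) and span and span in text:
            start = text.index(span)  # first occurrence only
            kept.append({"text": span, "label": label, "start": start, "end": start + len(span)})
    return kept


labels = ["PERSON", "LOCATION", "DATE"]
shots = [("Mary flew to Rome in 1999.", [{"text": "Mary", "label": "PERSON"},
                                         {"text": "Rome", "label": "LOCATION"},
                                         {"text": "1999", "label": "DATE"}])]
sentence = "John went to Paris on 1 August 2023."
prompt = build_prompt(sentence, labels, shots)
# Send `prompt` to an LLM; a hypothetical reply that mentions a place not in the text:
reply = '[{"text": "John", "label": "PERSON"}, {"text": "Rome", "label": "LOCATION"}]'
print(parse_entities(reply, sentence, labels))
```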
Exercise 1: Use an LLM to help develop one or more solutions for identifying gender-specific references to people in a text. Discuss the options as a group and judge their merits. Consider the ethical implications of the proposed solutions.
Mrs. Jessica Monica Kapitan works at the office. Mrs. Kapitan is a lawyer. She is also friends with Mrs. Thompson and Miss. Smith. Sometimes Miss. Smith will miss her train.
Exercise 2: Use an LLM to generate a RegEx that captures every instance of Miss. and Mrs. in the text along with the corresponding name. https://regex101.com/r/TLfbGE/1
Exercise 2: One Solution
\b(Mrs\.|Miss\.)\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)
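A quick way to check the Miss./Mrs. pattern outside regex101 is Python's `re.findall`, which returns one (honorific, name) tuple per match because the pattern has two capture groups. The loop is only an illustration.

```python
import re

HONORIFIC_NAME = re.compile(r"\b(Mrs\.|Miss\.)\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)")

TEXT = (
    "Mrs. Jessica Monica Kapitan works at the office. Mrs. Kapitan is a lawyer. "
    "She is also friends with Mrs. Thompson and Miss. Smith. "
    "Sometimes Miss. Smith will miss her train."
)

for honorific, name in HONORIFIC_NAME.findall(TEXT):
    print(honorific, "->", name)
```

The required period and case-sensitive matching keep the verb in "miss her train" out of the results, and the name group stops at the first lowercase word or punctuation mark.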
Mr. Thomas and Dr. Jessica Davis went to the store. They met Mrs. Stevens who works at a nearby office. They are all friends with Colonel Jackson. Col. Jackson is known to her friends by her first name, Terry. They all know Mr. and Mrs. Kapitan.
Exercise 3: Use an LLM to generate a RegEx that captures every [Honorific Entity] example in the text along with the corresponding name. https://regex101.com/r/FYcO8C/1
Exercise 3: One Solution
\b(Mr\.|Mrs\.|Miss\.|Dr\.|Colonel|Col\.)\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)
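As the list of honorifics grows, it can be easier to generate the alternation than to type it. A sketch that builds the same pattern from a plain list with `re.escape`, which adds the backslash before each period; `HONORIFICS` and `honorific_pattern` are hypothetical names.

```python
import re

HONORIFICS = ["Mr.", "Mrs.", "Miss.", "Dr.", "Colonel", "Col."]


def honorific_pattern(honorifics):
    """Compile an (honorific, name) pattern from a list of literal honorifics."""
    alternation = "|".join(re.escape(h) for h in honorifics)
    return re.compile(r"\b(" + alternation + r")\s+([A-Z][a-z]*(?:\s+[A-Z][a-z]*)*)")


TEXT = (
    "Mr. Thomas and Dr. Jessica Davis went to the store. They met Mrs. Stevens who works "
    "at a nearby office. They are all friends with Colonel Jackson. Col. Jackson is known "
    "to her friends by her first name, Terry. They all know Mr. and Mrs. Kapitan."
)

for honorific, name in honorific_pattern(HONORIFICS).findall(TEXT):
    print(honorific, "->", name)
```

Note that "Mr. and Mrs. Kapitan" yields only ("Mrs.", "Kapitan"): the name group must start with a capital letter, so the shared surname is not attached to "Mr.". "Terry" is also missed because no honorific precedes it.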
Exercise 4: Use an LLM to identify the people in the following text. Think through an ethical way to use an LLM to assign potential gender in these contexts.
Dr. Tracey Jordan works at the Smithsonian where he develops methods to identify named entities. Mrs. Alex Jackson leads the team. She was trained in machine learning at Stanford. While Tracey functions as the domain expert, Alex Jackson designs the experiments. They have another colleague, Leslie Peters.