Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf

accessinnovations 452 views 48 slides Jun 04, 2024

About This Presentation

Margie Hlava gave this presentation at the Industry Breakout Session of the SSP Conference in Boston, MA, on May 29, 2024.


Slide Content

Supercharge
Your AI
Marjorie M.K. Hlava
Chief Scientist
Access Innovations, Inc.
[email protected]
www.accessinn.com

Marjorie M.K. Hlava
•Expert in taxonomies, metadata, their application, and data science.
•Her groundbreaking work has earned her numerous awards and 2 patents
with 21 claims granted
•Margie’s standards work includes
•Dublin Core Z39.85,
•DOI Syntax Z39.84,
•CRediT Z39.104,
•Thesaurus ANSI/NISO Z39.19, thesauri and other controlled vocabularies,
•many others.
•Currently Convener of ISO 25964, the International Standard on
Controlled Vocabularies
•Founder, Chairman, Chief Scientist of Access Innovations, Inc.

“Large language models will not only mirror but magnify any problems with the
data sets, problems that many organizations may not realize they have.”
Amplifying hidden biases and gaps seems like a real danger.

We have content we
want to “slice and
dice” to create new
derivative products
We need to sort 1,000s
of journal
article/conference
session submissions
We need web
site navigation
We have content
that people can’t
find
We need to find
peer reviewers
We need to
personalize
conference sessions
Departments have
different vocabularies,
don’t talk to each other,
data is siloed and work
is duplicated
And Now… AI…
Large Language
Models
ChatGPT?

What’s different now?

Size is here
Server Farms
Power is a Concern

•Data, well enriched, is
the key to
•Ontologies
•Search excellence
•Knowledge maps
•Knowledge graphs

•Data is the LLM core asset
•Without the data the rest of the initiative is nothing
•It is the essential component of the strategy
•Do metadata enrichment
•SUBJECT metadata
•Use taxonomies, ontologies, and other models.
•The large language models will not only mirror but magnify any
problems with the data sets, problems that many organizations may
not realize they have. (Gary Carlson, Factors)

Technology
•It is a tool, not the focus
•Might need shiny new piece of technology,
•the technology is generally in the chorus
•not a main character
•Too many companies lead with technology and
•do not spend the time understanding their users or aligning
their strategy.
•Any company that has 1000s of SharePoint or Teams sites where
people still can’t find the information they need knows this.
•Most large corps have 5 search software systems
•On the shelf
•“does not work”
•Because the data was not enriched

Governance and modeling
•Taxonomy and data modeling
•essential component of this investment.
•Data must be
•well sourced,
•managed
•Considered for ethical and performance reasons
•Ignore data quality at your peril
•It is hard work
•Does not fit a two-week sprint
•Get executives to agree on strategy and structure model
•Without a coherent model, governance, data pipeline, and resourcing
there is no strategic value to an AI initiative

Gartner says: “By 2024, companies that use graphs
and semantic approaches for natural language
technology projects will have 75% less artificial
intelligence technical debt than those that do not.” [1]
How?
•Use existing standards, schemas and ontologies as starting points.
•Extract a list of key terms that need to be modeled using data
mining/entity extraction/data profiling tools.
•Add handcrafted rules, entity attributes and relationships from
business glossaries and data dictionaries.
[1] Gartner report, “How to Build Knowledge Graphs That Enable AI-Driven Enterprise Applications”, 27 September 2022, ID G00768041, Afraz Jaffri.
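
The "extract a list of key terms" step above can be sketched roughly as follows. This is a minimal illustration, not a real entity-extraction tool: it only counts one- and two-word phrases, and the stopword list, the `candidate_terms` function name, and the sample sentences are all assumptions made for the example. The output is a list of candidates for a taxonomist to review, not a finished vocabulary.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "are", "on", "with"}

def candidate_terms(texts, min_count=2):
    """Count one- and two-word phrases (skipping stopwords) and
    return, sorted, those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            if word in STOPWORDS:
                continue
            counts[word] += 1
            # Count the two-word phrase only if the next word is not a stopword.
            if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
                counts[word + " " + words[i + 1]] += 1
    return sorted(term for term, count in counts.items() if count >= min_count)

texts = [
    "Knowledge graphs support search.",
    "We built knowledge graphs for search.",
]
print(candidate_terms(texts))
```

Handcrafted rules, attributes, and relationships from glossaries and data dictionaries would then be added to the surviving terms by hand.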

•Does the data need to be
structured for AI?
•No
•Takes in text, images, sounds in
all formats
•Do I need a new platform to offer
my content in AI?
•No
•How can I ensure searchers will not
get hallucinations and wrong
answers?
•Guide it: tag it
•Do I need to protect my content?
•Yes
•When does this happen in the
workflow?
•Early as you can

What’s the process?
•Need a controlled vocabulary
•Keywords, taxonomy terms, entity
identification
•Apply it to your data
•Automatically if possible
•Use the power of the LLM
•Keep your data separate
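
As a rough sketch of "apply it to your data, automatically if possible", the snippet below tags a piece of text with preferred terms from a small controlled vocabulary. The vocabulary, its synonyms, and the `tag_text` function are hypothetical examples; a production tagger would use a full rule base rather than plain string matching.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
VOCABULARY = {
    "Knowledge graphs": ["knowledge graph", "knowledge graphs"],
    "Peer review": ["peer review", "peer reviewer", "peer reviewers", "refereeing"],
    "Taxonomies": ["taxonomy", "taxonomies"],
}

def tag_text(text, vocabulary=VOCABULARY):
    """Return the sorted preferred terms whose synonyms occur in the text."""
    lowered = text.lower()
    tags = set()
    for preferred, synonyms in vocabulary.items():
        for synonym in synonyms:
            # Whole-word match, so a synonym never fires inside a longer word.
            if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
                tags.add(preferred)
                break
    return sorted(tags)

sample = "We need to find peer reviewers and build a taxonomy for the journal."
print(tag_text(sample))
```

Whatever wording the author used, the stored tag is always the preferred term, which is what makes later search and filtering consistent.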

How it trains
•Learns from itself: the written
resources
•The longer it is used, the more accurate it
becomes
•Learns from interactions
•Studies the grammar
•Analyze the sentence
•Order of words
•Possible meanings
•How they fit together
•THEN make a prediction
•Continuations one word at a time
– dependent clauses
•Looks human in response

Dump in the data to the AI vortex
No work needed
Everything will be fine….

Bludgeon your data

Taxonomy Priority (Semantic) Enrichment

LLMs Need a Little Help
•To be accurate
•To avoid hallucinations
•Enhance the data
•Tag it with controlled terminology
•Add synonyms
•Suggest structure

“AI” + GenAI
•Start with enriched content
(tagged)
•Tell (feed to) GenAI
•GenAI puts new rules in the
inference engine
•Search results get better
•Repeat, repeat, repeat

Sounds too easy? Okay, here’s more detail
•Taxonomies
•Available Knowledge domains
•Links to
•Knowledge graphs
•Ontologies
•Why the tagging?
•Is it expensive / time consuming?
•What about protecting my content?

What Are The Steps To Implement
Knowledge Domains In Generative AI?
•Define the Taxonomy Structure
•Collect and Preprocess Data
•Tag your Data with Taxonomy concepts
•Train the Generative AI Model
•Incorporate Taxonomy into Model Inference
•Evaluate and Iterate
•Deploy and Monitor
•Add the SMEs
•Collaboration between domain experts,
data scientists, and AI engineers is crucial
for success

Why a taxonomy?
•Matches your content
•Scales as the content increases
•Extensive synonymy: use any of the word or term options
•The concept is the unit of thought
•Disambiguation
•Mercury
•Lead
•Built-in feedback loops to keep current with content
•Prevents hallucinations
•Misunderstandings of multiple word meanings (Nonsensical output)
•Happens when the model is not trained on your content (Factual contradiction)
•Query goes against the rules of the system (Prompt contradiction)
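
To make the disambiguation point concrete (Mercury, Lead), here is a minimal sketch of a context rule base. The rules, the sense labels, and the `disambiguate` function are assumptions for illustration only; they are not the rule format of any particular product.

```python
import re

# Hypothetical rule base: each sense of an ambiguous word lists context clues.
RULES = {
    "mercury": {
        "Mercury (planet)": {"orbit", "planet", "solar", "nasa"},
        "Mercury (element)": {"toxic", "thermometer", "metal", "poisoning"},
    },
    "lead": {
        "Lead (metal)": {"paint", "pipes", "toxic", "metal"},
        "Leadership": {"team", "project", "manager"},
    },
}

def disambiguate(word, text, rules=RULES):
    """Pick the sense whose context clues overlap most with the text.
    Returns None when no clue is present, so nothing is guessed."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    best, best_score = None, 0
    for sense, clues in rules.get(word.lower(), {}).items():
        score = len(clues & tokens)
        if score > best_score:
            best, best_score = sense, score
    return best

print(disambiguate("Mercury", "Mercury has the shortest orbit of any planet"))
print(disambiguate("lead", "Old pipes and paint may contain lead"))
```

Returning None instead of a guess is a deliberate choice in this sketch: an untagged word can be routed to a human editor, while a wrong tag propagates silently.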

How Can Knowledge
Domains Help LLM?
•Understanding Input
•Content Organization
•Knowledge Representation
•Query Expansion
•Quality Control
•Content Guides

Knowledge Domain
•Refers to a specific area or field of
knowledge
•subject matter, concepts, theories,
methodologies, and practices.
•Cohesive and organized body of
knowledge with a scope and boundaries.
•Vary widely in size and complexity,
•Established disciplines or sub-disciplines,
•theories, methods, and research traditions.
•Frameworks within specific areas
•Scholars, researchers, practitioners, SMEs
•Thesaurus with a rule base

Knowledge Domains
•Taxonomies, thesauri, or authority files
•Pre-Built
•Knowledge Domains
•Full term records
•hierarchical, equivalence, and
associative relationships, as well as
scope notes where appropriate.
•Hierarchy alone
•NISO Z39.19 and ISO 25964 compliant
•Formats,
•22 options, Excel/CSV, 6 flavors of
SKOS, HTML, XML, SSL, etc.
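
A "full term record" with hierarchical, equivalence, and associative relationships plus a scope note might be modeled along these lines. The field names follow common thesaurus abbreviations (BT, NT, RT, UF), but the `TermRecord` class, the `add_narrower` helper, and the sample terms are assumptions for the example, not an actual export format.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    preferred: str
    broader: list = field(default_factory=list)   # hierarchical (BT)
    narrower: list = field(default_factory=list)  # hierarchical (NT)
    related: list = field(default_factory=list)   # associative (RT)
    use_for: list = field(default_factory=list)   # equivalence (UF, synonyms)
    scope_note: str = ""

def add_narrower(thesaurus, parent, child):
    """Link two records reciprocally so BT and NT stay in sync."""
    thesaurus[parent].narrower.append(child)
    thesaurus[child].broader.append(parent)

thesaurus = {
    "Metals": TermRecord("Metals"),
    "Mercury (element)": TermRecord(
        "Mercury (element)",
        related=["Thermometers"],
        use_for=["Quicksilver", "Hg"],
        scope_note="The chemical element, not the planet.",
    ),
}
add_narrower(thesaurus, "Metals", "Mercury (element)")
print(thesaurus["Mercury (element)"])
```

A "hierarchy alone" delivery would keep only the `broader` and `narrower` fields of each record.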

Applied Science
Art
Behavioral Science
Biological Science
Business
Chemical – MAI Chem™
Communications
Computer Science
COVID
Economics
Educational Curriculum
Geography
Health and Safety
Health Science
History
Information Science
Language Arts
Law
Linguistics
Literature and Drama
Mathematics
NewsThes
Nursing
Philosophy
Physical Education and Recreation
Physical Sciences
Political Science
Psychology
Religion
Science
Social Sciences
General Purpose Taxonomies

These products can be SKOS downloads
Astronomy
Clinical Drugs
DTIC – Defense Technical Information Center
Environment – GEMET
ERIC – Education Resource Information Center
JSTOR
NASA
National Agricultural Library
Occupational Safety and Health
PLOS

CPT – Current Procedural Terminology
HCPCS – Healthcare Common Procedure
Coding System
ICD-11 – International Classification of
Diseases
Kew Medicinal Plant Names (MPNS)
MeSH – Medical Subject Headings
Suspect Cell Lines
Taxogene – the Human Genome
These products are available
as SaaS

Available =
already built
•Government resources
•Most agencies
•May need formatting
•NASA, DTIC, DOE, NAL, EPA, NLM,
etc.
•Sign up for updates
•License-able
•Taxobank
•Access Innovations
•others

Why tag / index
at all?
•Disambiguation
•Search and retrieval is accurate
•Promote taxonomy-term-first
searching
•In the inverted index
search controlled terms
first
•Then go to full text if
needed
•Use in search response
consistency and integrity
•Recommendation engines
using tag sets not vectors
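
The "search controlled terms first, then go to full text if needed" idea can be sketched with two small inverted indexes. The documents, tags, and function names below are hypothetical; a real search engine would add ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical documents: id -> (assigned taxonomy tags, full text).
DOCS = {
    "d1": ({"Mercury (planet)"}, "Mission data on the orbit of Mercury."),
    "d2": ({"Mercury (element)"}, "Mercury exposure and thermometer safety."),
    "d3": (set(), "Untagged note mentioning thermometer calibration."),
}

def build_indexes(docs):
    """Build one inverted index over tags and one over full-text words."""
    tag_index, text_index = defaultdict(set), defaultdict(set)
    for doc_id, (tags, text) in docs.items():
        for tag in tags:
            tag_index[tag.lower()].add(doc_id)
        for word in text.lower().replace(".", " ").split():
            text_index[word].add(doc_id)
    return tag_index, text_index

def search(query, tag_index, text_index):
    """Look in the controlled-term index first; fall back to full text."""
    q = query.lower()
    if q in tag_index:
        return "tags", sorted(tag_index[q])
    return "fulltext", sorted(text_index.get(q, set()))

tag_index, text_index = build_indexes(DOCS)
print(search("Mercury (element)", tag_index, text_index))
print(search("thermometer", tag_index, text_index))
```

A query that matches a controlled term returns only documents tagged with that concept, so a search for the element never returns the planet.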

Why Auto Tagging?
•Fast
•Sub-second versus 70 seconds per tag
•Able to add more tags quickly in the same sub-second time
•More depth
•Always goes to the most specific level of tagging
•No misspellings
•Consistency
•No editorial drift: people tend to use the same tags over and over
•Do not need as many subject experts
•Replicable results: no black box

Adding Knowledge bases
•Using your own data
•But not depositing into the big LLMs
•Send the same query to your own content
•Use the same terms
•Answer will be consistent since it is on tagged actual text
•Keeps your data out of the LLM and secure
•Use the LLM to get a general answer
•Use your content to get the specific and reliable answer
•Combine the two to get a quick summary of the material
You do not need XML structure, but you do need to tag
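
One way to picture "use the LLM to get a general answer, use your content to get the specific answer, combine the two" is the sketch below. Everything here is an assumption for illustration: `LOCAL_CONTENT` stands in for your tagged repository, and `fake_llm` stands in for a real model call. Only the question is passed to the model, so the local passages stay out of the LLM.

```python
# Hypothetical local store: passages already tagged with taxonomy terms.
LOCAL_CONTENT = [
    {"tags": {"Peer review"}, "text": "Our journals use double-blind peer review."},
    {"tags": {"Open access"}, "text": "All 2024 articles are open access."},
]

def retrieve_local(query_tags, content=LOCAL_CONTENT):
    """Return passages that share at least one tag with the query."""
    wanted = set(query_tags)
    return [passage["text"] for passage in content if passage["tags"] & wanted]

def answer(question, query_tags, general_llm):
    """Combine a general LLM answer with passages from local tagged content.
    `general_llm` is any callable that takes the question string."""
    return {
        "general": general_llm(question),
        "specific": retrieve_local(query_tags),
    }

def fake_llm(question):
    # Stand-in for a real model call.
    return "General background on: " + question

result = answer("How are submissions reviewed?", ["Peer review"], fake_llm)
print(result)
```

Because the specific part is retrieved by tag from actual text, the same query returns the same passages every time.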

Problems with Chat systems using LLM
•Flooding of the system
•Irrelevant responses
•Lack of answer precision
•Answer
•Fine tuning the system
•Continuous updates
•Identifying the key points of problems
•Handling multiple target points simultaneously
•More focused approach to handling queries
•How?
•Keywords from the taxonomy
•Applied as an incoming filter
•Added to content responses
•Constant additions based on logs
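
The "incoming filter" and "constant additions based on logs" points might look something like this in practice. The keyword map, stopword list, and `filter_query` function are hypothetical; the idea is only that queries are mapped to taxonomy terms on the way in, and words the taxonomy does not know are logged for later review.

```python
import re
from collections import Counter

# Hypothetical keyword list drawn from the taxonomy: keyword -> preferred term.
KEYWORDS = {
    "reviewer": "Peer review",
    "reviewers": "Peer review",
    "taxonomy": "Taxonomies",
    "metadata": "Metadata",
}
STOPWORDS = {"how", "do", "i", "a", "the", "for", "my", "find", "to", "of", "and"}

# Words users typed that the taxonomy did not recognize.
unmatched_log = Counter()

def filter_query(query):
    """Map query words to taxonomy terms; log unknown words for review."""
    terms = []
    for word in re.findall(r"[a-z]+", query.lower()):
        if word in KEYWORDS:
            if KEYWORDS[word] not in terms:
                terms.append(KEYWORDS[word])
        elif word not in STOPWORDS:
            unmatched_log[word] += 1
    return terms

terms = filter_query("How do I find reviewers for my ontology taxonomy?")
print(terms)
print(unmatched_log.most_common())
```

Reviewing the most frequent entries in the log is one simple way to decide which terms or synonyms to add to the taxonomy next.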

Query Parser
Grammar translation
ChatBox
User Query
Client data
LLM System
Algorithms
Training sets
Enriched Custom Data Set

Can Taxonomies
Supercharge
your AI?
•Guiding Decision-Making
•Enhancing Understanding
•Improving Consistency
•Facilitating Interpretability
•Supporting Compliance
YES!!

Product Descriptions
Knowledge
Domains
Semantic Fingerprinting
Meta-Titles
Content repository
XML Intranet System
Managed Services
Author Disambiguation

Ready to get
started?
•Marjorie M.K. Hlava
•Chief Scientist
•Access Innovations, Inc.
[email protected]
Booth 215

END

Features Include:
✓ Taxonomy and Thesaurus Editor
✓ Term Suggestions
✓ Subject Classification
✓ Entity Extraction
✓ Concept Extraction
✓ Metadata Enrichment
✓ Sentiment Analysis
✓ Abstracting and Indexing
✓ Text Analyzer and Summarization
✓ Inline Tagging for Enhanced Search
✓ Semantic Fingerprinting
✓ Linked Data Management
Data Harmony is our patented, award-winning Artificial Intelligence Suite that
leverages explainable AI for efficient, innovative and precise semantic discovery of
your new and emerging concepts to help you find the information you need when you
need it.
IMPROVING SEARCH RESULTS BY
OVER 90% AND INCREASING
CUSTOMER PRODUCTIVITY BY 7X

Thesaurus with term records

A Knowledge Graph

Does a knowledge graph need a controlled
vocabulary?
Yes
•Consistency
•Interoperability
•Facilitates Search and Discovery
•Semantic Enrichment
•Domain Understanding

Radial graph
and
Hierarchical
display
Both are
taxonomy
displays
https://www.hedden-information.com/taxonomies-vs-ontologies/

Does an ontology need a controlled vocabulary?
Yes
•Build the term / concept records (Objects, Subjects)
•Define the relationships (some of the Predicates)
•Tag the content
•Flow the taxonomy to the ontology
•Add Axioms or Constraints
•Add more Predicates
•Launch the maps and graphs
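
The steps above (concept records, predicates, constraints) can be sketched as a tiny triple store. This is an assumption-laden illustration, not an ontology language: the concepts, the single `discovered` predicate, and the `add_triple` and `objects_of` helpers are invented for the example.

```python
# Concepts flowing in from a (hypothetical) taxonomy, each with a class.
CONCEPTS = {
    "Marie Curie": "Person",
    "Radium": "Element",
    "Polonium": "Element",
}

# Constraints: predicate -> (allowed subject class, allowed object class).
CONSTRAINTS = {"discovered": ("Person", "Element")}

def add_triple(graph, subject, predicate, obj):
    """Add a triple only if both ends are known concepts and the
    predicate's class constraint is satisfied."""
    if subject not in CONCEPTS or obj not in CONCEPTS:
        raise ValueError("unknown concept")
    subject_class, object_class = CONSTRAINTS[predicate]
    if CONCEPTS[subject] != subject_class or CONCEPTS[obj] != object_class:
        raise ValueError("constraint violated")
    graph.add((subject, predicate, obj))

def objects_of(graph, subject, predicate):
    """Return, sorted, every object linked to subject by predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

graph = set()
add_triple(graph, "Marie Curie", "discovered", "Radium")
add_triple(graph, "Marie Curie", "discovered", "Polonium")
print(objects_of(graph, "Marie Curie", "discovered"))
```

Because subjects and objects must already exist as controlled concepts, the graph cannot accumulate variant spellings of the same thing, which is the reason the slide answers "Yes".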