Dimensions of Semantic Change: Applying the SIBling Framework to Mental Health Concepts
NaomiBaes1
About This Presentation
Dimensions of Semantic Change: Applying the SIBling Framework to Mental Health Concepts
Invited by: Dr. Pablo Mosteiro Romero: https://www.uu.nl/staff/PJMosteiroRomero
Location: Natural Language and Text Processing Lab, University of Utrecht: https://nlp.sites.uu.nl/
Promotion: https://nlp.sites.uu.nl/2025/09/11/nltp-content-meetings-september-16-dimensions-of-semantic-change-applying-the-sibling-framework-to-mental-health-concepts-by-naomi-baes/
Title: Dimensions of Semantic Change: Applying the SIBling Framework to Mental Health Concepts
Abstract: Lexical semantic change takes many forms, yet existing approaches often examine them in isolation. I present SIBling, a three-dimensional framework for modelling semantic change through shifts in (1) Sentiment (the valence of a word’s contexts), (2) Intensity (emotional arousal or the use of intensifiers), and (3) Breadth (the diversity of contexts in which a word appears). Together, these dimensions provide an integrated and computationally efficient way of mapping how the meanings of concepts evolve over time. I illustrate the framework with case studies of mental health related concepts, showing how they have broadened, intensified, or shifted in sentiment across decades of text. These results illuminate cultural trends such as pathologization, stigma, and concept creep, demonstrating how the SIBling toolkit captures socially significant conceptual change.
Main publications the presentation is based on:
1. Naomi Baes, Nick Haslam, and Ekaterina Vylomova. 2024. A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1390–1415, Bangkok, Thailand. Association for Computational Linguistics.
2. Naomi Baes, Raphael Merx, Nick Haslam, Ekaterina Vylomova, and Haim Dubossarsky. 2025. LSC-Eval: A General Framework to Evaluate Methods for Assessing Dimensions of Lexical Semantic Change Using LLM-Generated Synthetic Data. In Findings of the Association for Computational Linguistics: ACL 2025, pages 10905–10939, Vienna, Austria. Association for Computational Linguistics.
Size: 3.47 MB
Language: en
Added: Sep 16, 2025
Slides: 47 pages
Slide Content
Dimensions of Semantic Change:
Applying the SIBling Framework to
Mental Health Concepts
Naomi Baes
Melbourne School of Psychological Sciences
Natural Language and Text Processing Lab (University of Utrecht) 16 September 2025
Structure
Background
•Theoretical Linguistic Work – Lexical Semantic Change (LSC)
•Semantic Change Detection
•Concept Creep Theory
Contributions
•SIBling: A Multidimensional Framework for Evaluating LSC
•LSC-Eval: A General Framework to Evaluate Methods for
Assessing Dimensions of LSC Using LLM-Generated Data
•Detailed Case Studies of Mental Health Concepts
Key Takeaways
Lexical Semantic Change
“Semantic change deals with change in meaning, understood to be a
change in the concepts associated with a word…” (Campbell, 1999)
Campbell, L. (1999). Historical linguistics: An introduction (1st MIT Press ed). MIT Press.
Bloomfield, L. (1933). Language. Compton Printing Works Ltd.
1. Narrowing: superordinate (general) → subordinate (specific); e.g., meat ‘all food’ → meat ‘edible flesh’
2. Widening: subordinate (specific) → superordinate (general); e.g., dogge ‘of specific breed’ → dog ‘all breeds’
3. Metaphor: transfer of a name based on associations of similarity; e.g., bitraz ‘biting’ → bitter ‘harsh of taste’
4. Metonymy: change based on meaning’s proximity in space or time; e.g., ceace ‘jaw’ → cheek ‘fleshy side face’
5. Synecdoche: the meanings are related as whole and part; e.g., stobo ‘heated room’ → stove ‘cooking device’
6. Hyperbole: stronger → weaker meaning by overstatement; e.g., extonare ‘to strike with thunder’ → astonish ‘…surprise’
7. Meiosis: weaker → stronger meaning by understatement; e.g., kwalljan ‘to torment’ → cwellan ‘to kill’
8. Degeneration: positive → negative connotation; e.g., cnafa ‘boy servant’ → knave ‘a tricky deceitful fellow’
9. Elevation: negative → positive connotation; e.g., cniht ‘boy, servant’ → knight ‘man honored by sovereign for merit’
Forms of Lexical Semantic Change
Geeraerts, D. (2010). Theories of lexical semantics. Oxford University press.
Denotational (referential) Meaning
1.Specialization (or semantic ‘restriction’ and ‘narrowing’)
2.Generalization (or ‘expansion’, ‘extension’, ‘schematization’, ‘broadening’)
1.Metonymy (including ‘Synecdoche’)
2.Metaphor
Connotational (emotive) Meaning
1.Amelioration
2.Pejoration
Forms of Lexical Semantic Change
Semantic Change Detection
•Computational linguists have developed methods to detect and model semantic change in
historical text corpora (Kutuzov et al., 2018; Tahmasebi et al., 2018; Tang, 2018; Tahmasebi and
Dubossarsky, 2023; Cassotti et al., 2024b; Periti and Montanelli, 2024a; Kiyama et al., 2025).
•These approaches rely on the distributional hypothesis (Harris, 1954): a word’s meaning is
reflected by the contexts in which it occurs.
•Neural language models learn distributed (vector) representations that operationalize these
distributional semantics (Tahmasebi et al., 2021).
•The current state of the art uses large language models (LLMs): high-capacity, mostly
Transformer-based neural language models that produce contextual embeddings for words and sentences.
These advances have been applied to a range of semantic change processes including:
•Broadening (Vylomova et al., 2019)
•Metaphor (Tong et al., 2021)
•Hyperbole (Badathala et al., 2023; Kong et al., 2020; Schneidermann et al., 2023; Tian et al., 2021)
•Pejoration (Dinu et al., 2021)
Semantic Change Detection
Vylomova et al. (2019) modelled Broadening
as an increase in the cosine distance
between a word’s aligned embeddings from
different time periods (semantic displacement).
Vylomova, E., Murphy, S., & Haslam, N. (2019). Evaluation of Semantic Change of Harm-Related Concepts in Psychology. Proceedings of the
1st International Workshop on Computational Approaches to Historical Language Change, 29–34. https://doi.org/10.18653/v1/W19-4704
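To make semantic displacement concrete, here is a minimal sketch of the cosine-distance computation. It assumes the two vectors are embeddings of the same target word that have already been aligned across periods; the vectors and the names `emb_1980s` and `emb_2010s` are hypothetical and are not taken from Vylomova et al. (2019).

```python
import math

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical, already-aligned embeddings of one target word in two periods.
emb_1980s = [1.0, 0.0, 0.0]
emb_2010s = [0.0, 1.0, 0.0]

# Larger values mean the word has moved further in the embedding space.
displacement = cosine_distance(emb_1980s, emb_2010s)
print(displacement)
```

A displacement near 0 would suggest stable usage, while larger values suggest the word's typical contexts have shifted between the two periods.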
Semantic Change Detection in Psychology Domain
Concept Creep Theory: Nick Haslam (2016)
Haslam, N. (2016). Concept creep: Psychology's expanding concepts of harm and pathology. Psychological inquiry, 27(1), 1-17.
Concept creep: the tendency for harm
concepts to gradually expand their
semantic boundaries to refer to a wider
range of phenomena
CONCEPT CREEP
Vertical: concept’s meaning
becomes less stringent,
extending to quantitatively
milder variants of the
phenomenon to which it
originally referred
Horizontal: concept extends
to a qualitatively new class of
phenomena or is applied in a
new context
Haslam, N., Vylomova, E., Zyphur, M., & Kashima, Y. (2021). The cultural dynamics of concept creep. American Psychologist, 76(6), 1013.
Horizontal Creep
Outward expansion to include new phenomena
•Meaning extends to add psychological injuries
Vertical Creep
Downward expansion to include milder phenomena
•Traumatic event = distressing + outside the range of
normal human experience
“rape”, “assault”, “military combat”, “natural disasters”
•Reduced severity DSM-III (1980)
•Indirect experiences DSM-III-R (1987)
•Subjective component DSM-IV (1994)
Traumatic event = inclusion of milder items:
“business loss”, “marital conflict”
The Concept Creep of Trauma
‘Physical’ + ‘Psychological’ wound
‘Physical wound’ (late 19th century definition)
Trauma’s meaning expanded vertically, becoming increasingly associated with less emotionally intense
language in a corpus of psychology abstracts (1974-2017).
Beta = -.06, CI[-0.12, -0.01], p = .032
Vertical Concept Creep of Trauma
Baes, N., Vylomova, E., Zyphur, M., & Haslam, N. (2023). The semantic inflation of ‘trauma’ in psychology. Psychology of Language and
Communication, 27, 23-45. https://doi.org/10.58734/plc-2023-0002
Severity (S) index: annual count (x_i)-weighted average
of each lemma’s (i = 1, ..., n) emotional intensity rating (w_i) in
the target’s 5-word context window
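Read literally, the description above corresponds to S = (sum over i of x_i * w_i) / (sum over i of x_i) for one year. The sketch below computes that weighted average under stated assumptions: the counts and ratings are hypothetical, the rating scale is invented for illustration, and skipping lemmas that have no rating is my assumption rather than a detail given on the slide.

```python
def severity_index(counts, ratings):
    """Count-weighted mean intensity rating of context lemmas for one year.

    counts:  lemma -> number of times it occurs in the target's context window
    ratings: lemma -> emotional intensity rating (hypothetical scale)
    Lemmas without a rating are skipped (an assumption of this sketch).
    """
    weighted_sum = 0.0
    total_count = 0
    for lemma, x in counts.items():
        if lemma in ratings:
            weighted_sum += x * ratings[lemma]
            total_count += x
    return weighted_sum / total_count if total_count else None

# Hypothetical context counts for one year and hypothetical ratings.
counts_one_year = {"war": 2, "stress": 6, "the": 10}
intensity_ratings = {"war": 0.9, "stress": 0.5}

s_value = severity_index(counts_one_year, intensity_ratings)
print(s_value)
```

Computing this value per year and regressing it on time would give a trend of the kind reported above.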
Gaps in the Literature
1.Psychology lacks sophisticated methods to study semantic change
2.Computational Linguistics lacks a unifying framework to integrate multiple
dimensions of lexical semantic change
Forms of Semantic Change Mapped onto Dimensions
Naomi Baes, Nick Haslam, and Ekaterina Vylomova. 2024. A Multidimensional Framework for Evaluating Lexical Semantic Change with Social
Science Applications. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers),
pages 1390–1415, Bangkok, Thailand. Association for Computational Linguistics.
Sentiment relates to the degree to which a word acquires a more positive
(‘elevation’, ‘amelioration’) or negative (‘degeneration’,
‘pejoration’) connotation.
Theoretical Linguistic Framework: SIBling
Intensity relates to the degree to which a word’s meaning
changes to acquire more (‘meiosis’) or less
(‘hyperbole’) emotionally charged (i.e., strong,
potent, high-arousal) connotations.
Breadth relates to whether a word
expands (‘widening’, ‘generalization’)
or contracts (‘narrowing’,
‘specialization’) its semantic range.
Intensity resembles primary dimensions of:
1) human emotion: Arousal (Russell, 2003)
2) connotational meaning: Potency (strong/weak) (Osgood et al., 1975)
Sentiment resembles primary dimensions of:
1) human emotion: Valence (Russell, 2003)
2) connotational meaning: Evaluation (good/bad) (Osgood et al., 1975)
Osgood, C. E., May, W. H., & Miron, M. S. (1975). Cross-Cultural Universals of Affective Meaning. University of Illinois Press.
Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172.
SIBling: Theoretical Linguistic Framework
Example: the word sick
↑ Sentiment: from ill → cool (in slang)
↓ Intensity: serious illness → casual/enthusiastic use
↑ Breadth: from medical term → general praise
SIBling: Theoretical linguistic model of conceptual
change
SIB Toolkit
S: Sentiment (Valence)
I: Intensity (Arousal)
B: Breadth
Case Study: SIB Shifts in mental health and illness
Decreasing Sentiment
points to increasing
stigmatization of mental health (MH)
and mental illness (MI)
Rising Intensity indicates
the rising
problematization of MH
and MI in recent decades
Rising Breadth
reveals the semantic
inflation (concept
creep) of MH and MI
Case Study: The semantic bleaching of mental illness
Note: 11 adjectival modifiers: “great”, “intense”,
“severe”, “harsh”, “major”, “extreme”, “powerful”,
“serious”, “devastating”, “destructive”, “debilitating”
Complementary indices: Salience and Thematic Content
Further analyses connect
these SIB trends to the rising
cultural salience and
pathologization of mental
health and mental illness.
Gaps in the Literature
•The validity of these methods had not been established due to the absence of
historical benchmarks, which are costly and time-intensive.
•There was a need for tailored historical benchmark datasets to evaluate the
suitability of methods to model LSC in specific dimensions and domains.
To address these challenges, we introduced a 3-stage evaluation framework, LSC-Eval, which:
1)Develops a general-purpose methodology for generating theory-driven synthetic datasets that leverage ‘scholar-in-the-loop’ In-Context Learning and a lexical database to simulate changes in kinds of LSC.
2)Uses these synthetic datasets to evaluate the sensitivity of known methods to detecting levels of synthetic change.
3)Identifies the most suitable method for specific LSC dimensions and domains on a synthetic change detection task.
LSC-Eval
Naomi Baes, Raphael Merx, Nick Haslam, Ekaterina Vylomova, and Haim Dubossarsky. 2025. LSC-Eval: A General Framework to Evaluate
Methods for Assessing Dimensions of Lexical Semantic Change Using LLM-Generated Synthetic Data. In Findings of the Association for
Computational Linguistics: ACL 2025, pages 10905–10939, Vienna, Austria. Association for Computational Linguistics.
•We applied LSC-Eval to assess the sensitivity to synthetic change in the three dimensions of
LSC proposed in the “SIBling” framework – Sentiment, Intensity, and Breadth (Baes et al.,
2024) – using six examples drawn from psychology.
LSC-Eval
Stage 1: Synthetic
Dataset Generation
•Synthetic datasets are created
to benchmark changes in LSC
dimensions using few-shot ICL
and WordNet
•For each semantic dimension,
datasets contain up to 15k total
randomly sampled unique
natural sentences as input
(1,500 for each 5-year interval in
the corpus period: 1970-2019)
Stage 1: Generate Synthetic Datasets
Approach for affective dimensions: Employed five-shot ICL (‘scholar-in-the-loop’), using GPT-4o to rewrite neutral
input sentences so that they increase or decrease in Sentiment (36,151) and Intensity (39,986).
Stage 1: Generate Synthetic Datasets
Approach for synthetic Breadth: Adapt Dubossarsky et al.’s (2019) replacement strategy, using
WordNet to expand a target word’s usage (recipient term) by incorporating contexts from donor
terms (co-hyponyms), broadening its semantic range without altering its core meaning.
Procedure: filter for the most relevant co-hyponyms for each target term
1.Identify relevant synsets of donor terms by filtering for psychological relevance:
•Keyword matching in synset glosses: 15 psychology key terms
•Semantic similarity thresholds: Lin similarity (0.5) and cosine similarity (0.7)
2.Sibling Replacement process:
1.Get sentences containing the co-hyponym: Round-robin strategy to sample representatively from sibling list
(1,500 sentences per 5-year interval)
2.Replacement: Replace the co-hyponym in the donor sentence with the target (recipient).
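The sibling replacement step above can be sketched roughly as follows. This is an illustration only: the donor terms, sentences, and function names are hypothetical, the WordNet filtering from step 1 is assumed to have been done already, and the real pipeline may sample and match tokens differently.

```python
import re
from itertools import zip_longest

def round_robin(sentences_by_sibling, limit):
    """Take one sentence per sibling in turn until `limit` sentences are picked."""
    queues = [[(sib, s) for s in sents] for sib, sents in sentences_by_sibling.items()]
    picked = []
    for row in zip_longest(*queues):
        for pair in row:
            if pair is not None and len(picked) < limit:
                picked.append(pair)
    return picked

def replace_sibling(sentence, sibling, target):
    """Swap whole-word occurrences of the donor term for the recipient term."""
    pattern = r"\b" + re.escape(sibling) + r"\b"
    return re.sub(pattern, target, sentence, flags=re.IGNORECASE)

# Hypothetical donor terms and sentences; "trauma" is the recipient (target).
donors = {
    "anxiety": ["Her anxiety grew before the exam.", "Anxiety is common."],
    "fear": ["He felt fear in the dark."],
}
synthetic = [replace_sibling(s, sib, "trauma") for sib, s in round_robin(donors, limit=3)]
print(synthetic)
```

Sampling in turn from each sibling keeps any single donor term from dominating the synthetic contexts, which is the apparent purpose of the round-robin strategy.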
Stage 1: Generate Synthetic Datasets
Stage 2: Evaluate Methods and Synthetic Data
Synthetic Sentence Injection: Inject synthetic sentences into samples of up to 50 sentences at increasing
injection levels: 20%, 40%, 60%, 80%, 100%.
Figure. Distribution of Sentences in Each Sample of up to 50 Sentences.
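As an illustration of the injection step, the sketch below builds one sample of up to 50 sentences at a given injection level. The function name `build_sample`, the placeholder sentences, and the choice to draw without replacement are assumptions made for this example, not details confirmed by the slides.

```python
import random

def build_sample(natural, synthetic, level, size=50, seed=0):
    """Mix synthetic and natural sentences so that `level` of the sample is synthetic.

    Draws without replacement (an assumption) and returns fewer than `size`
    sentences if either pool is too small.
    """
    rng = random.Random(seed)
    n_syn = min(round(size * level), len(synthetic))
    n_nat = min(size - n_syn, len(natural))
    sample = rng.sample(synthetic, n_syn) + rng.sample(natural, n_nat)
    rng.shuffle(sample)
    return sample

# Hypothetical sentence pools.
natural_pool = [f"natural sentence {i}" for i in range(100)]
synthetic_pool = [f"synthetic sentence {i}" for i in range(100)]

for level in (0.2, 0.4, 0.6, 0.8, 1.0):
    sample = build_sample(natural_pool, synthetic_pool, level)
    n_synthetic = sum(s.startswith("synthetic") for s in sample)
    print(level, len(sample), n_synthetic)
```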
Sampling Strategies for different experimental scenarios:
1.Bootstrap sampling
•Procedure: 50 sentences x 100 samples; bin = synthetic injection level
•Function: Controlled randomness (reflecting underlying statistical properties of data)
2.Five-Year Random Sampling
•Procedure: 50 sentences x 10; bin = synthetic injection level and 5-year intervals
•Function: Ecologically valid (reflects natural language)
3.Control conditions (Dubossarsky et al., 2017, 2019)
•Procedure: Shuffle sentences to balance natural and synthetic sentences per sample
•Function: Verify the absence of a synthetic effect in shuffled samples
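The bootstrap and control procedures can be sketched as below. This is one possible reading of the slide: resampling with replacement for the bootstrap, and pooling then shuffling sentences for the control so that every bin holds a similar mix. The function names and the toy data are hypothetical.

```python
import random

def bootstrap_means(scores, n_samples=100, size=50, seed=0):
    """Resample sentence-level scores with replacement; return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        draw = [rng.choice(scores) for _ in range(size)]
        means.append(sum(draw) / size)
    return means

def shuffled_control(natural, synthetic, n_bins=5, seed=0):
    """Pool and shuffle all sentences, then deal them into bins.

    After shuffling, bin membership no longer tracks injection level, so a
    method's score should not differ systematically across bins.
    """
    rng = random.Random(seed)
    pool = [(s, "natural") for s in natural] + [(s, "synthetic") for s in synthetic]
    rng.shuffle(pool)
    return [pool[i::n_bins] for i in range(n_bins)]

# Hypothetical sentence-level scores and sentence pools.
means = bootstrap_means([0.0, 1.0] * 25)
bins = shuffled_control([f"n{i}" for i in range(50)], [f"s{i}" for i in range(50)])
print(len(means), [len(b) for b in bins])
```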
Stage 2: Evaluate Methods and Synthetic Data
Stage 2: Evaluate Methods
Sentiment
•Valence index (Baes et al., 2024)
Intensity
•Arousal index (Baes et al., 2024)
Breadth
•Breadth Score (MPNET; Baes et al., 2024)
Stage 3: Select the Best-Performing Method
Sentiment
•Valence index (Baes et al., 2024)
•Sentiment score (Deberta-v3-base-absa-v1.1 Aspect-Based Sentiment Analysis classification model)
Intensity
•Arousal index (Baes et al., 2024)
Breadth
•Breadth Score (MPNET; Baes et al., 2024)
•Breadth Score (WiC sentence embeddings from XL-LEXEME; Cassotti et al., 2023)
General LSC
•SOTA LSC score (XL-LEXEME; Cassotti et al., 2023)
Average Pairwise Cosine Distances between sentence embeddings from 2 time periods
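For readers unfamiliar with average pairwise cosine distance, a minimal sketch follows. It assumes each period is represented by a list of sentence embeddings containing the target word; the toy two-dimensional vectors are hypothetical, and real embeddings would come from a model such as XL-LEXEME or MPNET.

```python
import math
from itertools import product

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_pairwise_distance(period_a, period_b):
    """Mean cosine distance over every cross-period pair of embeddings."""
    pairs = list(product(period_a, period_b))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

# Hypothetical sentence embeddings for one target word in two time periods.
period_a = [[1.0, 0.0], [1.0, 0.0]]
period_b = [[1.0, 0.0], [0.0, 1.0]]

apd = average_pairwise_distance(period_a, period_b)
print(apd)
```

Because the score averages over all cross-period pairs, it grows when usages in the later period spread away from those in the earlier one.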
Recently proposed SIB measures
were validated: they are sensitive
to injected levels of
synthetic change.
β+ = .61*, CI[.57, .65]
β- = -.31*, CI[-.36, -.25]
β+ = .64*, CI[.59, .69]
β- = -.64*, CI[-.70, -.59]
β+ = .43*, CI[.32, .54]
[Figure panels: S, I, B]
Figure. SIB Scores (±SE) by Injection Levels for Bootstrapped Setting.
[Figure panels: S, I, B, annotated at the 20% and 100% injection levels]
The SIB toolkit was also sensitive
to injected levels of
synthetic change across five-year
intervals.
Figure. SIB scores by five-year intervals across
injection levels and conditions.
Controlling for synthetic
injection level by randomly
shuffling sentences revealed
no synthetic change effect.
Results validate the LLM-
generated sentences in
evaluation sets & SIB tools.
Figure. SIB Scores (±SE) by 50% Injection Levels and
Conditions: Control Setting for Bootstrapped Setting.
Figure. SIB Scores (±SE) by 50% Synthetic
Injection Levels and Conditions: Control
Setting for Five-Year Samples.
Which out of a set of LSCD methods is most
sensitive to synthetically induced changes in SIB?
Most sensitive method for detecting changes in:
•Sentiment: Sentiment Score (ABSA) – 10/12
•Intensity: Arousal Index – 12/12
•Breadth: Breadth Score (XLL) – 4/6
LSC score (XLL) was completely insensitive to
synthetic changes in either
Sentiment or Intensity.
Figure. Relative Change (∆%) Scores for Models
Across Dimensions and Conditions: Bootstrapped.
Detailed Case Studies: Semantic Shifts in Mental Health Concepts in
the US News (1980-2025)
Schizophrenia
[Figure panels: S, I, B]
ADHD
[A.D.H.D.
Attention Deficit Hyperactivity Disorder
Attention-Deficit Hyperactivity Disorder
Attention-Deficit/Hyperactivity Disorder
Attention Deficit/Hyperactivity Disorder
A.D.D.
Attention Deficit Disorder
Attention-Deficit Disorder]
Key Takeaways:
•SIBling offers a validated computational toolkit for illuminating semantic and
cultural dynamics.
•LSC-Eval successfully generates synthetic historical benchmark datasets to
evaluate the suitability of methods for specific dimensions and language domains.
•Multidimensional modelling of semantic change makes case studies of
conceptual change more revealing.
Acknowledgements
Nick Haslam Ψ (PhD Supervisor), Ekaterina Vylomova λ (PhD Co-Supervisor), Haim Dubossarsky Φ T Σ (Collaborator), Raphaël Merx λ (Collaborator)
Ψ Melbourne School of Psychological Sciences, The University of Melbourne
λ School of Computing and Information Systems, The University of Melbourne
Φ School of Electronic Engineering and Computer Science, Queen Mary University of London
T The Alan Turing Institute, London
Σ Language Technology Lab, University of Cambridge
This research was supported by an Australian Government Research Training Program Scholarship and funded, in part, by Australian
Research Council Discovery Project DP210103984 and by the research program "Change is Key!", supported by Riksbankens
Jubileumsfond (M21-0021).
Thank you.
Get in touch!
https://naomibaes.github.io/
https://www.linkedin.com/in/naomibaes/
Questions?
Psychology Article Abstracts
•~870,000 abstracts (1930-2019)
•875 journals (PubMed, E-Research)
•Developmental & Educational Psychology: 24.7%
•Clinical psychology: 19.5%
•Social psychology: 17.8%
•Psychology (Miscellaneous): 16.5%
•Applied Psychology: 11.3%
•Experimental & cognitive psychology: 5.1%
•Neuropsychology & physiological psychology: 5.1%
•>130 million words
CoHA+CoCA (General US English text)
•~370,000 texts (1810-2019)
•Types of text
•Fiction Books: 10%
•Magazines: 36%
•Newspapers: 31%
•Non-fiction Books: <1%
•Spoken language: 16%
•TV shows: 7%
•~930 million words
Corpora: Academic Psychology and CoHA/CoCA