PLan-V_presentation_ILSP_PLan-V_project.pdf


About This Presentation

Language Technology applications developed by the Institute for Language and Speech Processing (ILSP) / Athena RC for the support of individuals with speech and language impairments.


Slide Content

Language technologies to
support individuals with
communication disorders
1
Pepi Stamouli, PhD
Linguist, Specialized Scientific Personnel in Language Technology
Institute for Language and Speech Processing (ILSP)

2
Language technologies in
speech & language pathology
Language technologies, such as speech and natural language
processing, are valuable tools for the assessment
and rehabilitation of people with communication disorders.
They are used for:
early detection of language deficits,
individualized intervention,
monitoring the progress of the intervention.

3
Applications of language technologies
in speech & language pathology
Automatic assessment of severity and type of aphasia
Automatic classification of types of Primary Progressive Aphasia (PPA)
Early diagnosis of Alzheimer's disease
Advantages:
Patients: Early detection of type and severity of language deficits
Speech therapists: Saving valuable time, as the initial evaluation and re-evaluation
process with standardized protocols is time-consuming and laborious.

4
User needs
Language Technologies for individuals
with aphasia – the PLan-V project
Chronic neurogenic communication disorders, such as aphasia, affect a significant
percentage of the adult population, and their prevalence is rising as the population ages.
They have a severe effect on quality of life, since they limit everyday communication.
The effective support of adults requires individualized, systematic, and regular intervention
by speech and language therapists (SLTs).
However, limited access to free, long-term SLT services hinders patients’ early assessment
as well as their effective rehabilitation.

5
RESEARCH-CREATE-INNOVATE (project code: T2EDK-02159)
Project PLan-V: A Speech and Language
Therapy Platform with Virtual Agent
Aims:
to offer patients with speech and language impairments the opportunity
for remote long-term, high-quality speech and language therapy in their
own environment, without the physical presence of an SLT
to assist SLTs in the design and remote administration of individualized
intervention plans that complement face-to-face clinical sessions
to reduce SLTs' time and effort for:
initial assessment of impairment
monitoring patients’ progress
evaluating intervention outcomes

6
The PLan-V platform

7
ILSP's contribution:
The PLan-V platform
Development of an automatic system for the
assessment of aphasia severity
Development of an automatic evaluation system for
speech production in naming tasks
Development of an audiovisual synthesis system for a
virtual assistant

8
Acquisition of speech data for the automatic
aphasia severity classification system*
01 Patient recruitment: PWA (people with aphasia) at least 3 months
post-onset, with no cognitive impairment
02 Data collection: in person or remotely via a web-based application
03 Patient assessment: for aphasia type and severity
04 Data transcription
*Aphasia is an acquired language disorder resulting from focal damage to the left cerebral hemisphere, most commonly caused by a stroke or a brain injury. It can affect the
production and comprehension of both spoken and written language, at all language levels and to varying degrees of severity.
•Non-fluent aphasia is characterized by slow, fragmentary, and labored speech consisting of short utterances with many disfluencies, giving the impression of
"telegraphic" speech. Comprehension is preserved.
•Fluent aphasia is characterized by an overproduction of words. Language lacks semantic content and hence becomes incomprehensible. Comprehension is affected.

9
Patients' recruitment
and assessment
Speech samples were collected from 28 PWA and 10 neurotypical adults.
Data were collected by trained SLTs from PWA with different types and
degrees of aphasia severity, and from controls.
Type and severity of aphasia were assessed with the standardized Greek
version of the Boston Diagnostic Aphasia Examination (BDAE) protocol.
Some tasks from the Western Aphasia Battery (WAB) were added to the
protocol.
Each patient's sample was assigned a label for aphasia type and severity
(Aphasia Quotient, AQ, score): mild, moderate, severe, very severe.

Data collection
10
Custom web application facilitating data
collection from PWA in both in-person and
remote settings.
Integrates a 7-task protocol for discourse
elicitation (4 of the 7 tasks are adopted
from the AphasiaBank protocol).
Recordings are stored per participant, task,
and session, as sketched below.
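A minimal sketch of such a per-participant/task/session storage scheme, assuming a simple directory layout; the path pattern, file name, and participant/task identifiers are illustrative, not the application's actual conventions.

    from pathlib import Path

    def recording_path(root: str, participant: str, task: str, session: int) -> Path:
        # One directory per participant, task, and session (illustrative layout).
        folder = Path(root) / participant / task / f"session_{session:02d}"
        folder.mkdir(parents=True, exist_ok=True)
        return folder / "recording.wav"

    print(recording_path("data", "PWA_007", "picture_description", 3))
    # data/PWA_007/picture_description/session_03/recording.wav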

Evaluation of the data
collection method
11
Spoken discourse data were collected from 10 neurotypical
participants in two conditions, face-to-face and remote.
Oral narratives were analyzed for 10 linguistic features to
investigate whether the condition of administration affects
language production.
No significant differences were found in 66 of the 70 statistical
tests performed (comparisons of the kind sketched below).
The data collection method and the results of its evaluation
were published in Frontiers in Communication
(https://doi.org/10.3389/fcomm.2023.919617).
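The slide does not name the statistical test used, so here is one plausible per-feature comparison with SciPy, assuming paired measurements (each participant recorded in both conditions) and a Wilcoxon signed-rank test; the feature values are invented.

    import numpy as np
    from scipy.stats import wilcoxon

    # Hypothetical per-participant values of one linguistic feature
    # (e.g. words per minute) in the two administration conditions.
    face_to_face = np.array([112, 98, 105, 120, 95, 101, 110, 99, 107, 115])
    remote = np.array([109, 101, 103, 118, 97, 100, 112, 96, 105, 117])

    stat, p = wilcoxon(face_to_face, remote)
    print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p:.3f}")
    # Repeating this for every feature (and task) yields a battery of
    # tests like the 70 reported above.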

Data transcription
12
A subset of the collected data from PWA
was transcribed and annotated following
the CHAT transcription format
(AphasiaBank).
Transcriptions are time-aligned and
segmented into utterances (see the
sketch below).
The transcriptions and audio files of the
annotated data were contributed to
AphasiaBank (AphasiaBank Protocol,
Greek PLan-V).
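To make the format concrete, here is a single invented CHAT-style utterance and a minimal parse of its time alignment; the sample line is illustrative, not drawn from the PLan-V corpus.

    import re

    # Speaker tier, transcription, and a time-alignment marker giving
    # the utterance's start/end times in milliseconds (CHAT convention).
    line = "*PAR:\tthe boy is climbing the tree . •42300_45120•"

    m = re.match(r"\*(\w+):\t(.*?)\s*•(\d+)_(\d+)•$", line)
    speaker, text = m.group(1), m.group(2)
    start_ms, end_ms = int(m.group(3)), int(m.group(4))
    print(speaker, text, start_ms, end_ms)
    # PAR the boy is climbing the tree . 42300 45120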

13
Automatic Aphasia Severity
Assessment System
A machine learning system trained to predict the
severity class to which a sample of spoken
discourse belongs.
The system is trained on spoken discourse data
that already carry one of four severity labels: mild,
moderate, severe, very severe (the training data).
Through training, the model maps each label
to a set of linguistic features that differentiate each
class from the others.
Based on this "knowledge", the system can predict,
with a certain degree of accuracy, the category to which
a new speech sample belongs (see the sketch below).
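A minimal sketch of this training setup with scikit-learn; the slide does not specify the model or features, so the classifier choice, feature dimensionality, and data here are illustrative stand-ins (synthetic, randomly generated).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # One row per speech sample: hypothetical linguistic features such as
    # words/minute, type-token ratio, mean utterance length, pause rate.
    X = rng.normal(size=(120, 12))
    y = rng.choice(["mild", "moderate", "severe", "very severe"], size=120)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)  # held-out accuracy estimate
    print(f"cross-validated accuracy: {scores.mean():.2f}")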

14
However…
(figure: in practice, only a voice recording is available as input)
15
Cross-lingual aphasia
detection
Aphasic speech data for model training are hard to obtain and are scarce in
low-resource languages such as Greek.
We needed an end-to-end pipeline involving Automatic Speech Recognition (ASR)
to eliminate the need for human annotation.
We used data from high-resource languages such as English to train our
model.
We used our limited amount of collected and annotated Greek data to fine-tune
and test our model (a schematic sketch follows).
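A schematic sketch of this train-on-English, fine-tune-on-Greek recipe; the feature vectors and the warm-started logistic regression are crude stand-ins for the actual end-to-end model, and all data below are synthetic.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for "language-agnostic" feature vectors
    # (e.g. pause and speech-rate statistics derived from ASR output).
    X_en, y_en = rng.normal(size=(300, 8)), rng.integers(0, 2, 300)  # English: aphasia vs. control
    X_el, y_el = rng.normal(size=(38, 8)), rng.integers(0, 2, 38)    # Greek: 28 PWA + 10 controls

    # 1. Train on the high-resource language.
    clf = LogisticRegression(max_iter=1000, warm_start=True)
    clf.fit(X_en, y_en)

    # 2. Continue fitting from the English solution on a small Greek
    #    training split, a rough analogue of fine-tuning, then test.
    X_tr, X_te, y_tr, y_te = train_test_split(X_el, y_el, test_size=0.4, random_state=0)
    clf.fit(X_tr, y_tr)
    print("Greek test accuracy:", clf.score(X_te, y_te))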

Cross-lingual aphasia
detection
16
Our work on the automatic detection of aphasia was presented
at the INTERSPEECH 2022 conference
(https://arxiv.org/abs/2204.00448).
An end-to-end workflow was developed for the detection of
aphasia, using "language-agnostic" features.
The model was trained on English data.
Aphasia detection was achieved in Greek with 95% accuracy
when manual transcriptions were used and with 80%
accuracy without manual transcriptions, based only on
automatic speech recognition.

Application of language
deficit severity assessment
17
It is integrated into the PLan-V platform.
Patients produce oral narratives based on the 7-task
elicitation protocol.
The system automatically assigns a label for aphasia
severity.
The system also provides the SLT with
measurements of basic features of oral discourse, such as
fluency, lexical diversity, information density, and syntactic
complexity (see the sketch below).
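A minimal sketch of such discourse measures computed from a toy transcript; the exact measures and formulas used in PLan-V are not given on the slide, so these are standard stand-ins (speech rate, type-token ratio, mean length of utterance).

    # Toy utterances and timing; all values are invented for illustration.
    utterances = [
        "the boy is climbing the tree",
        "he falls down",
        "the dog barks",
    ]
    duration_min = 0.5  # total speaking time in minutes

    tokens = [w for u in utterances for w in u.split()]
    speech_rate = len(tokens) / duration_min    # fluency: words per minute
    ttr = len(set(tokens)) / len(tokens)        # lexical diversity: type-token ratio
    mlu = len(tokens) / len(utterances)         # syntactic proxy: mean utterance length
    print(f"rate={speech_rate:.1f} wpm, TTR={ttr:.2f}, MLU={mlu:.2f}")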

Application of language
deficit severity assessment
18
Provides automatic transcription
of the patient's speech.
Supports different administrations
at different times.
Visualizes the assessment results
across administrations, so that the
SLT can monitor progress and
evaluate the therapy outcome in
spoken discourse (as in the
plotting sketch below).
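A sketch of the kind of progress visualization described above, assuming one severity-related score per administration; the dates and scores are invented.

    import matplotlib.pyplot as plt

    sessions = ["2023-01", "2023-04", "2023-07", "2023-10"]  # administrations
    scores = [38, 45, 52, 61]  # hypothetical severity-related score per session

    plt.plot(sessions, scores, marker="o")
    plt.xlabel("Administration")
    plt.ylabel("Severity-related score")
    plt.title("Patient progress across administrations")
    plt.show()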

19
System for audiovisual
speech synthesis
Research supports the use of virtual assistants in educational and clinical
applications.
Their usefulness lies in the fact that they add a human dimension to the
computer interface, which:
improves user engagement
encourages participation
improves accessibility
In this context, all SLT exercises of the PLan-V platform are supported by a
virtual assistant, i.e. a digital character (avatar), which makes the digital
environment of virtual sessions friendlier, easier to use, and more accessible.

PLan-V audiovisual synthesis
system
20
A male talking-head virtual assistant, driven
by an audiovisual synthesis system, was
implemented.
It accepts both pre-recorded natural speech
and synthetic male voice as input.
It provides:
Instructions for navigating the platform
Instructions for performing exercises
Verbal stimuli for the exercises (e.g. words to be
matched with a picture)
Feedback on right and wrong answers

Methodology of audiovisual
synthesis system
21
The MakeItTalk approach
was adopted.
Input: an audio file and a
portrait image (photo or
sketch)
Output: a synchronized
talking-head animation

Methodology of audiovisual synthesis
system: a closer look
22
Extracts facial landmarks
(eye, nose, and mouth areas)
Extracts audio content
representations
Predicts facial landmark
displacements from the audio content
Renders the predicted landmarks
back onto the avatar
(a schematic outline follows)
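A schematic, runnable outline of this MakeItTalk-style pipeline; every function below is a stub standing in for the corresponding model component, and none of this is MakeItTalk's actual API.

    import numpy as np

    def detect_landmarks(portrait):
        """Stand-in for a facial landmark detector (eye/nose/mouth points)."""
        return np.zeros((68, 2))  # 68 two-dimensional landmarks, a common convention

    def encode_audio(audio):
        """Stand-in for the audio encoder (one feature vector per frame)."""
        return np.random.rand(25, 80)  # 25 frames of 80-dim features

    def predict_displacements(audio_repr, landmarks):
        """Stand-in for the network mapping audio frames to landmark shifts."""
        return [0.1 * np.random.randn(*landmarks.shape) for _ in audio_repr]

    def render_frame(landmarks, shift):
        """Stand-in for warping the portrait to the displaced landmarks."""
        return landmarks + shift

    base = detect_landmarks("portrait.png")
    audio = encode_audio("speech.wav")
    frames = [render_frame(base, d) for d in predict_displacements(audio, base)]
    print(len(frames), "animation frames")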

API: System for audiovisual speech
synthesis
23
English to Avatar
Greek to Avatar
Pre-recorded audio upload
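As an illustration of how a client might call such an API, here is a sketch using the requests library; the host, routes, and payload fields are invented for this example (the slide only names the three capabilities above).

    import requests

    # Text-to-avatar: send Greek text, receive a talking-head video.
    resp = requests.post(
        "https://example.org/planv/api/greek-to-avatar",  # hypothetical endpoint
        json={"text": "Καλημέρα!"},  # "Good morning!"
    )
    with open("avatar.mp4", "wb") as out:
        out.write(resp.content)

    # Pre-recorded audio upload: drive the avatar with an uploaded file.
    with open("speech.wav", "rb") as f:
        resp = requests.post(
            "https://example.org/planv/api/audio-to-avatar",  # hypothetical endpoint
            files={"audio": f},
        )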

Thank you!
Pepi Stamouli, PhD | [email protected]
Linguist, Specialized Scientific Personnel in Language Technology
Institute for Language and Speech Processing (ILSP)