Accent Conversion Using Deep Neural Networks

Slide Content

Foreign Accent Conversion using Cycle Generative Adversarial Network (Cycle-GAN)
Submitted by Sabyasachi Chandra, Institute Research Scholar, Roll No. 19ET91R03
Advanced Technology Development Centre
Under the supervision of Dr. Shyamal Kumar Das Mondal
INDIAN INSTITUTE OF TECHNOLOGY, KHARAGPUR

Content of the Presentation:
1. Introduction
2. Literature Survey
3. Research Gaps
4. Research Objectives
5. Proposed Model
6. Data Set
7. Future Work
8. Timeline for the Proposed Work

Introduction
What is an accent? For simplicity, in the current work we take "accent" to mean solely variations in pronunciation.
What is a foreign accent? A foreign accent can be defined as deviations from the expected acoustic (e.g. formants) and prosodic (e.g. intonation, duration and rate) norms of a language.
What is foreign accent conversion? Foreign accent conversion aims to create a new voice that has the voice quality of a given non-native speaker and the pronunciation of a native speaker.

Motivation of the Work
[Diagram: a source speaker with L1 Japanese delivering a lecture in L2 English; after accent conversion, both an L1 Japanese student and an L1 American student react with "Very good lecture."]

Literature Survey
Accent conversion approaches fall into two domains: the articulatory domain and the acoustic domain.

Summary of Different AC Techniques

Accent conversion with FD-PSOLA [7]
Advantages: This approach can reduce the perceived foreign accentedness of an utterance (by about two-thirds) while preserving information that is unique to the speaker's voice quality. The results therefore support the choice of a spectral-envelope vocoder to decompose utterances into their voice-quality and linguistic components.
Disadvantages: The identity ratings proved the most interesting: listeners reported a "third" speaker, i.e. the converted audio sounds like neither the source nor the target speaker. The authors conclude that while their system reduces accentedness, it also loses the information needed to retain the speaker's identity.

Accent conversion through cross-speaker articulatory synthesis [11], [12]
Advantages: Proposes an articulatory method for accent conversion that does not require L2 articulatory recordings.
Disadvantages: Collecting articulatory data for each L2 learner is impractical; acoustic quality receives low ratings.

Foreign accent conversion through voice morphing [8], [9], [10]
Advantages: This method belongs to the acoustic domain; higher quality and identity scores are obtained by retaining as much of the learner's spectral information as possible.
Disadvantages: This method also suffers from the learner-identity issue.

Accent conversion using artificial neural network models [5], [6], [18]
Advantages: The method works properly in noisy environments, can model non-linearities and dynamics, and gives results that are highly close to the source signal.
Disadvantages: Several layers must be constructed before the desired output can be achieved.

Global Architecture of an Accent Conversion System
Training phase: L1 and L2 speaker utterances undergo speech analysis and feature extraction; the resulting features are aligned and used to train the accent conversion function, yielding a conversion model y' = f(x) that maps source features x toward the target features y.
Conversion phase: L2 speaker utterances are passed through the trained conversion model and then through speech synthesis to produce the accent-converted speech.
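As a minimal sketch of the feature-extraction and alignment stages in this pipeline, the snippet below extracts MFCCs from a pair of parallel L1/L2 recordings of the same passage and aligns them with dynamic time warping. The file names, sampling rate, and use of MFCCs are illustrative assumptions, not the features fixed by the proposed system.

```python
# Sketch: feature extraction + alignment for parallel L1/L2 utterances.
# File names are placeholders; MFCCs stand in for the acoustic features
# actually used by the conversion model.
import librosa

def extract_mfcc(path, sr=16000, n_mfcc=24):
    """Load an utterance and return an (n_mfcc, n_frames) MFCC matrix."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Hypothetical parallel utterances (same passage, two speakers).
l1_feats = extract_mfcc("l1_speaker_passage.wav")   # native (L1) reference
l2_feats = extract_mfcc("l2_speaker_passage.wav")   # non-native (L2) learner

# Dynamic time warping gives a frame-level alignment between the utterances;
# the warping path pairs L2 frames with L1 frames for training.
cost, wp = librosa.sequence.dtw(X=l1_feats, Y=l2_feats, metric="euclidean")
aligned_pairs = wp[::-1]   # (l1_frame_index, l2_frame_index) in time order
print(f"{len(aligned_pairs)} aligned frame pairs")
```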

RESEARCH GAPS
1. Identity loss is an issue in existing accent conversion techniques.
2. Non-availability of any real-time accent conversion technique (RT accent conversion).
3. Lack of availability of data.

Research Objectives
1. Development of a cross-lingual model for accent conversion using Cycle-GAN.
2. Experimental studies and model validation.
3. Performance comparison between the proposed model and other existing models.

Generative Adversarial Network Model Architecture
A random input vector is fed to the generator model, which produces a generated sample. The discriminator model receives both real samples and generated samples and performs binary classification (real/fake); its feedback is used to update both models.
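The following is a minimal, generic GAN update step matching this diagram, written in PyTorch: the discriminator is trained to separate real from generated samples, then the generator is updated to fool it. Layer sizes, batch size, and learning rates are illustrative assumptions only, not the configuration of the proposed system.

```python
import torch
import torch.nn as nn

noise_dim, feat_dim = 64, 24
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, feat_dim)   # stand-in for a batch of real feature frames
z = torch.randn(32, noise_dim)     # random input vector

# --- Discriminator update: label real samples 1, generated samples 0 ---
fake = G(z).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# --- Generator update: try to make the discriminator label fakes as real ---
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```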

Proposed Model Architecture: Cycle-GAN-AC
The L1 utterance is mapped by the L1-to-L2 generator to a synthesized L2 utterance, which Discriminator L2 classifies as real or fake; the L2 utterance is mapped by the L2-to-L1 generator to a synthesized L1 utterance, which Discriminator L1 classifies as real or fake. Cycle losses on the L1 cycle and the L2 cycle force each round trip (L1 to L2 to L1, and L2 to L1 to L2) to reconstruct the original utterance.
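Below is a sketch of how the Cycle-GAN-AC objective in this diagram could be assembled: two generators map between L1-accented and L2-accented feature frames, two discriminators judge each direction, and L1-norm cycle losses force a round trip to return the input. The network sizes, feature dimension, and cycle-loss weight are assumptions for illustration, not the proposed model's actual configuration.

```python
import torch
import torch.nn as nn

feat_dim = 24
def mlp():
    return nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

G_l1_to_l2, G_l2_to_l1 = mlp(), mlp()
D_l1 = nn.Sequential(nn.Linear(feat_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
D_l2 = nn.Sequential(nn.Linear(feat_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
lambda_cyc = 10.0                               # assumed cycle-loss weight

x_l1 = torch.randn(32, feat_dim)                # batch of L1 (native) features
x_l2 = torch.randn(32, feat_dim)                # batch of L2 (non-native) features

fake_l2 = G_l1_to_l2(x_l1)                      # synthesized L2
fake_l1 = G_l2_to_l1(x_l2)                      # synthesized L1

# Adversarial terms: each generator tries to make its output look "real"
# to the discriminator of the opposite domain.
adv = bce(D_l2(fake_l2), torch.ones(32, 1)) + bce(D_l1(fake_l1), torch.ones(32, 1))

# Cycle-consistency terms: L1 -> L2 -> L1 and L2 -> L1 -> L2 must reconstruct.
cyc = l1(G_l2_to_l1(fake_l2), x_l1) + l1(G_l1_to_l2(fake_l1), x_l2)

gen_loss = adv + lambda_cyc * cyc               # generators' combined objective
# (gen_loss.backward() and optimizer steps follow, as in the GAN sketch above.)
```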

Architecture for Accent Conversion
Source data and target data undergo feature extraction; the extracted features are passed to the Cycle-GAN, which produces the converted file.
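A minimal inference sketch for this pipeline is given below: extract features from a source utterance, push them through a trained generator, and reconstruct a rough waveform. The file name and the crude MFCC inversion are assumptions for illustration; in practice a neural vocoder such as WaveNet or WaveGlow (refs [13], [15]) would be used to synthesize the converted file.

```python
import librosa
import torch

def convert_utterance(wav_path, generator, sr=16000, n_mfcc=24):
    """Convert one accented utterance with a trained generator (sketch only)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)     # (n_mfcc, T)
    with torch.no_grad():
        frames = torch.from_numpy(mfcc.T).float()              # (T, n_mfcc)
        converted = generator(frames).numpy().T                # (n_mfcc, T)
    # Rough waveform reconstruction from MFCCs; low quality, sanity checks only.
    y_hat = librosa.feature.inverse.mfcc_to_audio(converted, sr=sr)
    return y_hat, sr

# Usage (generator being e.g. the trained G_l2_to_l1 from the previous sketch):
# import soundfile as sf
# audio, sr = convert_utterance("l2_test_utterance.wav", G_l2_to_l1)
# sf.write("converted_file.wav", audio, sr)
```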

Data Set
Passage: "The North Wind and the Sun were disputing which was the stronger when a traveler came along wrapped in a warm cloak. They agreed that the one who first succeeded in making the traveler take his cloak off should be considered stronger than the other. Then the North Wind blew as hard as he could, but the more he blew the more closely did the traveler fold his cloak around him; and at last the North Wind gave up the attempt. Then the Sun shone out warmly, and immediately the traveler took off his cloak. And so the North Wind was obliged to confess that the Sun was the stronger of the two."
Demographic information collected: birthplace, native language, sex, other languages known, English learning method (academic or naturalistic), age of English onset, and place where they speak English.

Recording Device
Device specification: Company: RODE Microphones; Model: LAVALIER GO.

Future Work
Step 1: Complete the proposed model.
Step 2: Collect data as per the plan.
Step 3: Validate the proposed model with the collected data.

Timeline of the Proposed Work
Quarter-by-quarter plan across 2022-2024 (Q1-Q4 of each year):
- Development of the cross-lingual model using Cycle-GAN: proposed model design, data collection, and work on the model.
- Experimental studies and model validation.
- Performance comparison between the proposed model and other existing models.
- Thesis writing.
Planned outcomes: one journal paper and one conference paper.

References
[1] L. M. Arslan and J. H. Hansen, "Frequency characteristics of foreign accented speech," in Proc. ICASSP, IEEE, 1997, pp. 1123-1126.
[2] G. Min, X. Zhang, J. Yang, and X. Zou, "Speech reconstruction from mel-frequency cepstral coefficients via 1-norm minimization," in IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), 2015.
[3] B. Milner and X. Shao, "Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model," School of Information Systems, University of East Anglia, Norwich, UK.
[4] D. Chazan, R. Hoory, G. Cohen, and M. Zibulski, "Speech reconstruction from mel-frequency cepstral coefficients and pitch frequency," IBM Research Laboratory, Haifa.
[5] S. A. Mobin and J. Bruna, "Voice Conversion using Convolutional Neural Networks," UC Berkeley.
[6] S. Aryal and R. Gutierrez-Osuna, "Articulatory-based conversion of foreign accents with deep neural networks," in Interspeech, 2015, pp. 3385-3389.
[7] D. Felps, H. Bortfeld, and R. Gutierrez-Osuna, "Foreign accent conversion in computer assisted pronunciation training," Speech Communication, vol. 51, no. 10, pp. 920-932, 2009.
[8] S. Aryal, D. Felps, and R. Gutierrez-Osuna, "Foreign accent conversion through voice morphing," in Interspeech, 2013, pp. 3077-3081.

[9] D. Felps and R. Gutierrez-Osuna, "Developing objective measures of foreign-accent conversion," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 1030-1040, 2010.
[10] M. Huckvale and K. Yanagisawa, "Spoken language conversion with accent morphing," in ISCA Speech Synthesis Workshop, 2007, pp. 64-70.
[11] G. Zhao, S. Sonsaat, J. Levis, E. Chukharev-Hudilainen, and R. Gutierrez-Osuna, "Accent Conversion Using Phonetic Posteriorgrams," in ICASSP, 2018, pp. 5314-5318.
[12] S. Aryal and R. Gutierrez-Osuna, "Accent conversion through cross-speaker articulatory synthesis," in ICASSP, 2014, pp. 7694-7698.
[13] R. Prenger, R. Valle, and B. Catanzaro, "WaveGlow: A Flow-based Generative Network for Speech Synthesis," in ICASSP, 2019.
[14] J. Jügler, F. Zimmerer, J. Trouvain, and B. Möbius, "The perceptual effect of L1 prosody transplantation on L2 speech: The case of French accented German," in Interspeech, 2016, pp. 67-71.
[15] A. van den Oord et al., "WaveNet: A Generative Model for Raw Audio," arXiv preprint arXiv:1609.03499, 2016.
[16] G. Zhao et al., "L2-ARCTIC: A Non-Native English Speech Corpus," in Interspeech, 2018, pp. 2783-2787.
[17] J. Kominek and A. W. Black, "The CMU Arctic speech databases," in ISCA Workshop on Speech Synthesis, 2004, pp. 223-224.

[18] A. L. Berman, K. Josund, and G. Fiore, "Accent Conversion Using Artificial Neural Networks," Corpus ID: 199386464.

THANK YOU