Reclamation-based-Voice(talk)-Conversion

gokulr20005 10 views 8 slides Sep 12, 2024

About This Presentation

Voice Conversion Dr. Elizabeth Godoy Speech Processing Guest Lecture December 11, 2012

E.Godoy Biography • United States (Rhode Island): Native • Hometown: Middletown, RI • Undergrad & Masters at MIT (Boston Area) • France (Lannion): 2007-2011 • Worked on my PhD at Orange Labs • Ira...


Slide Content

Reclamation-based Voice Conversion (RVC)

PRESENTED BY
GURUMURTHY.V (421122102050)
DHIVYAKUMARAN.K (421122102037)
GOKULRAJ.R (421122102044)

Under the Guidance of
MRS. K. KALAISELVI, AP/CSE

PROBLEM STATEMENT

Naturalness: Maintaining the naturalness of the source speaker's voice while converting it to the target speaker is challenging. Artifacts and distortions can occur.

Data Quality: Noise, background interference, and inconsistent recording conditions can negatively impact the performance of the model.

Accurate Retrieval: The quality of the converted speech depends on the accuracy of the retrieved target speaker utterances. Ineffective retrieval can result in mismatched voice characteristics.

PROBLEM STATEMENT DIAGRAM

The quality of the data significantly impacts the model's performance: noise, background interference, and inconsistent recording conditions can degrade it.

Timbre conversion and imitation are still imperfect.

Voice conversion models require a vast amount of data to learn intricate voice characteristics. Collecting and processing such a large dataset can be time-consuming and expensive.

PROPOSED SYSTEM

A powerful speech encoder is trained on a large dataset to extract high-level acoustic features that capture speaker identity and linguistic content.

Given a target speaker's voice, a set of representative speech segments is extracted and encoded into the same feature space.

For a given input speech, the encoder generates its corresponding feature representation.
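The encode-and-retrieve idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not specify the encoder or the matching rule, so `toy_encoder` is a hypothetical stand-in (it merely L2-normalises a frame), and retrieval by cosine similarity is an assumed choice rather than the presenters' confirmed method.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_encoder(frame: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the trained speech encoder.

    A real system would map audio to learned features; here we only
    scale the frame to unit length so cosine similarity is a dot product.
    """
    norm = math.sqrt(sum(x * x for x in frame)) or 1.0
    return [x / norm for x in frame]


def build_target_index(target_frames: Sequence[Sequence[float]]) -> List[Vector]:
    """Encode the target speaker's representative segments into the feature space."""
    return [toy_encoder(frame) for frame in target_frames]


def retrieve(query_frame: Sequence[float],
             index: List[Vector],
             k: int = 1) -> List[Tuple[int, float]]:
    """Return (position, similarity) for the k target features closest to the query."""
    query = toy_encoder(query_frame)
    scored = [
        (i, sum(q * t for q, t in zip(query, target)))  # cosine of unit vectors
        for i, target in enumerate(index)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Assumed toy data: three "target speaker" frames and one input frame.
index = build_target_index([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
matches = retrieve([2.0, 0.1, 0.0], index, k=2)
print(matches)
```

In this toy setup the input frame is matched to the target features that point in the most similar direction. A poor match at this step is what the problem statement calls ineffective retrieval.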

EXISTING SYSTEM

S. No. | Title | Author | Year Published | Drawback
1 | Voice Conversion from a Single Speaker Example | Jun Wu, Yi Li, and Yiheng Liu | 2015 | Requires a large amount of training data for each speaker.
2 | A Waveform Generation Model for High-Quality Speech Synthesis | Tomoki Kaneko, Kazuhiro Takahashi, and Shinji Nakamura | 2018 | Can produce artifacts in the synthesized speech, especially for unseen speakers.
3 | A Generative Adversarial Network Approach to Text-to-Speech Synthesis | Zheng Wang, Jonathan Sotelo, Haoyu Wu, Gregor Kurz | 2017 | The generated speech quality can be sensitive to the quality of the text-to-speech (TTS) model used.
4 | A Generative Model for Raw Audio | Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu | 2016 | Requires a significant amount of computational resources for training and inference.
5 | A Flow-based Generative Model for Text-to-Speech Synthesis | Jongwook Kim, Hyunjun Kim, and Ho-Sang Lee | 2020 | Can struggle with preserving the naturalness and expressiveness of the original voice.

THANK YOU