Voice Intonation Transformation Using Linear Mapping

SakshiPandey29 · 20 slides · Aug 03, 2024

About This Presentation

Voice Intonation


Slide Content

Voice Intonation Transformation Using Segmental Linear Mapping of Pitch Contour Amit Banerjee, Sakshi Pandey, K.M. Khushboo

1. Introduction A sound signal is a continuous air-pressure variation in time. A sound can be considered a one-dimensional continuous signal with time as the free variable. Speech, like any other sound, is a continuous air-pressure variation. The prosodic information in the speech signal aids human speech perception: it creates an impression in the listener's mind of learned characteristics such as dialect, tone, and pitch.

1. Introduction Voice transformation converts the voice of a speaker into that of an intended person, such that the listener perceives the output as the target speaker. Consider y(t) and x(t) as two different sound signals; then a mapping function f can be defined as y(t) = f(x(t)). The idea is to transform the source sound signal into the desired target sound using the mapping function f.
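As a minimal sketch of this abstraction (not the paper's actual mapping, which operates on pitch contours rather than raw samples), the following Python fragment applies a generic mapping function f to a discretized signal x:

```python
import numpy as np

def apply_mapping(x: np.ndarray, f) -> np.ndarray:
    """Return y(t) = f(x(t)) for a discretized signal x."""
    return f(x)

# Stand-in linear map; in the paper, f is a segmental linear map
# derived from source/target pitch contours.
t = np.linspace(0, 1, 16000, endpoint=False)   # 1 s at 16 kHz
x = np.sin(2 * np.pi * 220 * t)                # 220 Hz tone
y = apply_mapping(x, lambda v: 1.5 * v)
```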

2. Problem Definition The paper uses functional mapping for pitch transformation in voice: it maps the pitch contour of a source voice to that of a target voice using a segmental linear mapping function. It extracts invariant features from the source and target vocal signals to perform the mapping for voice transformation. The invariant features correspond to the linguistic parameter set, which is used to perform the segmentation of the pitch contour and, finally, the voice intonation transformation.

3. Methodology Figure 1: Pitch Transformation in Human Voice

3. Methodology The process of mapping the pitch contour for voice transformation is divided into four steps: (1) Pitch Contour Extraction and Smoothing, (2) Extraction of the Linguistic Parameter Set, (3) Segmental Linear Mapping, and (4) Re-synthesis of the Audio Signal.

3.1 Pitch Contour Extraction and Smoothing The audio signals are used to extract the pitch using the “Yet Another Algorithm for Pitch Tracking” (YAAPT). Smoothing is then performed on the extracted pitch contour (see the sketch after Figure 2).

3.1 Pitch Contour Extraction and Smoothing Figure 2. (a) Pitch Contour (b) Smoothed Pitch Contour
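A hedged sketch of this step in Python, assuming the open-source pYAAPT implementation from the amfm_decompy package (the slides do not name an implementation) and simple median smoothing; the input file name is a placeholder:

```python
import numpy as np
from scipy.signal import medfilt
import amfm_decompy.basic_tools as basic
import amfm_decompy.pYAAPT as pYAAPT

signal = basic.SignalObj("source.wav")   # placeholder input file
pitch = pYAAPT.yaapt(signal)             # YAAPT pitch tracking
contour = pitch.samp_values              # F0 per frame; 0 marks unvoiced frames

# Median-smooth the voiced frames only, leaving unvoiced gaps at zero.
smoothed = contour.copy()
voiced = contour > 0
smoothed[voiced] = medfilt(contour[voiced], kernel_size=5)
```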

3.2 Extraction of the Linguistic Parameter Set In the voiced segment, we extract a linguistically motivated parameter set to capture the intonation of a speaker: Sentence-Initial High (S), Non-Initial Accent Peak (H), Post-Accent Valley (L), and Sentence-Final Low (F) (see the sketch after Figure 3).

3.2 Extraction of the Linguistic Parameter Set Figure 3. Linguistically motivated parameter set
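An illustrative sketch of locating the four landmark types on one smoothed voiced segment, using scipy peak picking; the peak-spacing constraint and the "initial quarter" heuristic for S are assumptions for illustration, not values from the paper:

```python
import numpy as np
from scipy.signal import find_peaks

def linguistic_params(f0: np.ndarray) -> dict:
    """f0: smoothed pitch contour of one voiced segment (Hz, all > 0)."""
    peaks, _ = find_peaks(f0, distance=20)     # candidate accent peaks
    valleys, _ = find_peaks(-f0, distance=20)  # candidate valleys
    s = int(np.argmax(f0[: max(1, len(f0) // 4)]))  # sentence-initial high (S)
    return {
        "S": s,
        "H": [int(p) for p in peaks if p != s],   # non-initial accent peaks
        "L": [int(v) for v in valleys if v > s],  # post-accent valleys
        "F": len(f0) - 1,                         # sentence-final low
    }
```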

3.3 Segmental Linear Mapping
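The slide text for this step is not included above; what follows is a generic sketch of one common reading of segmental linear mapping: each source landmark F0 value is mapped to the corresponding target landmark value, and frames between landmarks are mapped piecewise linearly (np.interp does exactly this given ascending anchor values):

```python
import numpy as np

def segmental_linear_map(src_f0, src_anchors, tgt_anchors):
    """Piecewise-linear F0 mapping between corresponding landmark values.

    src_anchors/tgt_anchors: F0 values (Hz) at matched landmarks
    (e.g. S, H, L, F), paired by type.
    """
    src_anchors = np.asarray(src_anchors, dtype=float)
    tgt_anchors = np.asarray(tgt_anchors, dtype=float)
    order = np.argsort(src_anchors)          # np.interp needs ascending xp
    return np.interp(src_f0, src_anchors[order], tgt_anchors[order])

# Example: map a source contour whose S/H/L/F values are 180/220/140/110 Hz
# onto a target whose corresponding values are 220/280/160/120 Hz.
src_f0 = np.array([180.0, 210.0, 220.0, 170.0, 140.0, 120.0, 110.0])
mapped = segmental_linear_map(src_f0, [180, 220, 140, 110], [220, 280, 160, 120])
```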

3.4 Re-synthesis of the Audio Signal Pitch marks are generated on the transformed pitch contour. Finally, the modified pitch contour is re-synthesized using pitch-synchronous overlap-add (PSOLA) to generate the transformed speech signal.
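A hedged sketch of PSOLA-based re-synthesis via Praat's overlap-add manipulation, using the parselmouth package (one readily available implementation; the slides do not name a toolkit). The uniform 1.2x pitch scaling is a stand-in for writing the transformed contour from step 3 into the pitch tier:

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("source.wav")            # placeholder input file
manipulation = call(snd, "To Manipulation", 0.01, 75, 600)
pitch_tier = call(manipulation, "Extract pitch tier")

# Stand-in edit: scale all F0 values by 1.2; the paper's method would
# instead rewrite the tier with the segmentally mapped contour.
call(pitch_tier, "Multiply frequencies", snd.xmin, snd.xmax, 1.2)
call([pitch_tier, manipulation], "Replace pitch tier")

resynth = call(manipulation, "Get resynthesis (overlap-add)")
resynth.save("transformed.wav", "WAV")
```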

4. Results Figure 4. Pitch in target speech and transformed speech

4. Results TABLE I. Linguistic Parameter Set

4. Results Figure 5. Pitch contour of target and transformed speech signal

5. Conclusions This paper investigates voice intonation transformation by segmental mapping of the pitch contour, based on extraction of the linguistic parameter set from the source and target voices. The results on the modified pitch contour show that the approach captures the intonation of the target speaker, but the voice quality remains that of the source speaker. The results are better when the target and source signals are similar. However, further investigation is needed to capture the voice quality of the target speaker.

References
J. W. Shin, J.-H. Chang, and N. S. Kim, “Voice activity detection based on statistical models and machine learning approaches,” Computer Speech & Language, vol. 24, no. 3, pp. 515–530, 2010.
Y. Stylianou, “Voice transformation: a survey,” in Proc. IEEE ICASSP, pp. 3585–3588, 2009.
E. E. Helander and J. Nurminen, “A novel method for prosody prediction in voice conversion,” in Proc. IEEE ICASSP, vol. 4, pp. IV-509, 2007.
L. M. Arslan and D. Talkin, “Voice conversion by codebook mapping of line spectral frequencies and excitation spectrum,” in Proc. EUROSPEECH, pp. 1347–1350, 1997.

References
B. Gillett and S. King, “Transforming F0 contours,” 2003.
J. P. Campbell, “Speaker recognition: a tutorial,” Proceedings of the IEEE, vol. 85, no. 9, pp. 1437–1462, 1997.
Q. Liu, M. Yao, H. Xu, and F. Wang, “Research on different feature parameters in speaker recognition,” Journal of Signal and Information Processing, vol. 4, no. 2, p. 106, 2013.
A. G. Adami, R. Mihaescu, D. A. Reynolds, and J. J. Godfrey, “Modeling prosodic dynamics for speaker recognition,” in Proc. IEEE ICASSP, vol. 4, pp. IV-788, 2003.

References
E. Shriberg, D. R. Ladd, J. Terken, and A. Stolcke, “Modeling pitch range variation within and across speakers: predicting F0 targets when speaking up,” in Proc. 4th International Conference on Spoken Language Processing, pp. 1–4, 1996.
S. A. Zahorian and H. Hu, “A spectral/temporal method for robust fundamental frequency tracking,” The Journal of the Acoustical Society of America, vol. 123, no. 6, pp. 4559–4571, 2008.
X. Zhao, D. O’Shaughnessy, and N. Minh-Quang, “A processing method for pitch smoothing based on autocorrelation and cepstral F0 detection approaches,” in Proc. International Symposium on Signals, Systems and Electronics (ISSSE), pp. 59–62, 2007.
D. J. Patterson, Linguistic approach to pitch range modelling, PhD thesis, University of Edinburgh, 2000.
A. Mousa, “Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling,” Journal of Electrical Engineering, vol. 61, no. 1, pp. 57–61, 2010.

THANK YOU