CHAPTER 9
9.1 REFERENCES
[1] L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, “Voice conversion using deep neural
networks with layer-wise generative training,” IEEE/ACM Trans. Audio, Speech, Lang.
Process., vol. 22, no. 12, pp. 1859–1872, Dec. 2014.
[2] Z.-H. Ling, L. Deng, and D. Yu, “Modeling spectral envelopes using restricted Boltzmann
machines and deep belief networks for statistical parametric speech synthesis,” IEEE Trans.
Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129–2139, Oct. 2013.
[3] B.-Y. Xia and C.-C. Bao, “Speech enhancement with weighted denoising auto-encoder,” in
Proc. Interspeech, 2013, pp. 3444–3448.
[23] X.-G. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising
auto-encoder,” in Proc. Interspeech, 2013, pp. 436–440.
[4] A. L. Maas, Q. V. Le, T. M. O’Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, “Recurrent neural
networks for noise reduction in robust ASR,” in Proc. Interspeech, 2012, pp. 22–25.
[5] M. Wöllmer, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll, “Feature enhancement by
bidirectional LSTM networks for conversational speech recognition in highly non-stationary
noise,” in Proc. ICASSP, 2013, pp. 6822–6826.
[6] H. Christensen, J. Barker, N. Ma, and P. D. Green, “The CHiME corpus: A resource and a
challenge for computational hearing in multisource environments,” in Proc. Interspeech, 2010,
pp. 1918–1921.
[7] Y. X. Wang and D. L. Wang, “Towards scaling up classification-based speech separation,”
IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381–1390, Jul. 2013.