-
Publication (Announcement) No.: US10957337B2
Publication (Announcement) Date: 2021-03-23
Application No.: US15991988
Application Date: 2018-05-29
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
IPC Classification: G10L21/00, G10L21/0272, G06N3/08, G10L17/04, G10L17/18, G10L19/022, G10L21/0208, H04R3/00
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
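The mask-based pipeline in this abstract (multi-microphone features, a neural network producing masks, masks applied to a mixture) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented method: the feature set (log-magnitude of the first microphone plus cosine/sine inter-microphone phase differences), all function names, and the untrained random linear layer standing in for the neural network are hypothetical. The sketch also assumes each microphone's STFT has already been computed, and it applies the masks to the first microphone's signal only.

```python
import numpy as np


def multichannel_features(stfts: np.ndarray) -> np.ndarray:
    """Hypothetical features: log-magnitude of mic 0 plus cos/sin of the
    inter-microphone phase difference (IPD) of every other mic relative to mic 0.

    stfts: complex array of shape (num_mics, frames, freq_bins)
    returns: real array of shape (frames, freq_bins * (2 * num_mics - 1))
    """
    ref = stfts[0]
    feats = [np.log(np.abs(ref) + 1e-8)]
    for m in range(1, stfts.shape[0]):
        ipd = np.angle(stfts[m]) - np.angle(ref)
        feats += [np.cos(ipd), np.sin(ipd)]
    return np.concatenate(feats, axis=1)


def placeholder_mask_network(features, num_speakers, freq_bins, rng):
    """Hypothetical stand-in for the trained network: an untrained random
    linear layer followed by a softmax across speakers, so that the masks of
    all speakers sum to 1 in every time-frequency bin."""
    w = rng.standard_normal((features.shape[1], num_speakers * freq_bins)) * 0.01
    logits = (features @ w).reshape(features.shape[0], num_speakers, freq_bins)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    masks = exp / exp.sum(axis=1, keepdims=True)
    return masks.transpose(1, 0, 2)  # (num_speakers, frames, freq_bins)


def apply_masks(mixture_stft: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Multiply every speaker's mask with one microphone's mixture STFT.
    Broadcasting yields shape (num_speakers, frames, freq_bins)."""
    return masks * mixture_stft[np.newaxis, :, :]


# Usage with random complex "STFTs" in place of real microphone recordings.
rng = np.random.default_rng(0)
num_mics, frames, bins = 2, 50, 129
stfts = (rng.standard_normal((num_mics, frames, bins))
         + 1j * rng.standard_normal((num_mics, frames, bins)))
feats = multichannel_features(stfts)
masks = placeholder_mask_network(feats, num_speakers=2, freq_bins=bins, rng=rng)
separated = apply_masks(stfts[0], masks)
print(separated.shape)  # (2, 50, 129)
```

Because the placeholder masks come from a softmax across speakers, the separated STFTs add back up to the reference mixture. A trained network would be needed for each output to correspond to an actual speaker, and an inverse STFT would turn the outputs into waveforms.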
-
Publication (Announcement) No.: US10127901B2
Publication (Announcement) Date: 2018-11-13
Application No.: US14303969
Application Date: 2014-06-13
Inventors: Pei Zhao, Max Leung, Kaisheng Yao, Bo Yan, Sheng Zhao, Fileno A. Alleva
Abstract: The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted to audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech.
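The data flow described in this abstract, several property-specific RNN modules whose outputs feed a single hyper-structure RNN, can be sketched with NumPy. This is a hypothetical illustration only: the Elman RNN cell, the word-level tokenization, the random embeddings and weights, the module names, and all dimensions are assumptions. None of the modules is trained, and the speech synthesizer and global optimization module are omitted.

```python
import numpy as np


class ElmanRNN:
    """Minimal untrained Elman RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""

    def __init__(self, in_dim: int, hidden_dim: int, rng: np.random.Generator):
        self.W = rng.standard_normal((hidden_dim, in_dim)) * 0.1
        self.U = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
        self.b = np.zeros(hidden_dim)

    def __call__(self, xs: np.ndarray) -> np.ndarray:
        """Run over xs of shape (seq_len, in_dim); return all hidden states,
        shape (seq_len, hidden_dim)."""
        h = np.zeros_like(self.b)
        states = []
        for x in xs:
            h = np.tanh(self.W @ x + self.U @ h + self.b)
            states.append(h)
        return np.stack(states)


rng = np.random.default_rng(0)
embed_dim, hidden_dim, out_dim = 16, 8, 12

# Hypothetical word-level tokenization and random (untrained) embeddings.
tokens = "hello world".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
embedding_table = rng.standard_normal((len(vocab), embed_dim))
embeddings = embedding_table[[vocab[t] for t in tokens]]  # (seq_len, embed_dim)

# One RNN per text property named in the abstract (names are illustrative).
module_names = ["part_of_speech", "letter_to_sound", "prosody_tagger", "context_semantic"]
modules = {name: ElmanRNN(embed_dim, hidden_dim, rng) for name in module_names}

# Concatenate every module's per-token output into one vector per token.
combined = np.concatenate([modules[n](embeddings) for n in module_names], axis=1)

# The hyper-structure RNN consumes the combined properties and emits a
# per-token output sequence (a stand-in for the "generation sequence").
hyper = ElmanRNN(combined.shape[1], out_dim, rng)
generation_sequence = hyper(combined)
print(generation_sequence.shape)  # (2, 12)
```

Concatenating the per-token outputs is one simple way to let the hyper-structure RNN see every module's properties at each step; the abstract does not specify how the module outputs are combined.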
-