-
Publication No.: US10885900B2
Publication date: 2021-01-05
Application No.: US15675249
Application date: 2017-08-11
Applicant: Microsoft Technology Licensing, LLC
Inventor: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
IPC: G10L15/16, G06N3/08, G10L15/06, G10L15/183, G10L15/065, G10L25/30, G06N3/04, G06N3/12, G06N5/00
Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels and are instead parallelized with the data in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model, and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
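The training loop described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tiny linear-softmax model, the KL-divergence objective, the synthetic "distortion" that produces the new-domain inputs, and all names (`train_student`, `mean_kl`, `pairs`) are assumptions chosen for illustration. It shows only the general idea: a frozen teacher sees existing-domain input, a student initialized from the teacher sees the parallel new-domain input, and the student is updated to match the teacher's outputs without transcription labels.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, x):
    # One weight row per output class; returns class posteriors.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kl(teacher, student, pairs):
    # Average mismatch between teacher (existing domain) and student (new domain).
    return sum(kl_divergence(forward(teacher, s), forward(student, t))
               for s, t in pairs) / len(pairs)

def train_student(teacher, pairs, epochs=200, lr=0.5):
    # Assumption: the student starts as a copy of the teacher.
    student = [row[:] for row in teacher]
    for _ in range(epochs):
        for x_src, x_tgt in pairs:
            p = forward(teacher, x_src)  # soft targets; the teacher is never updated
            q = forward(student, x_tgt)  # student output on the parallel input
            # For a softmax output, d KL(p || q) / d logit_k = q_k - p_k.
            for k, row in enumerate(student):
                g = q[k] - p[k]
                for j in range(len(row)):
                    row[j] -= lr * g * x_tgt[j]
    return student

random.seed(0)
# Hypothetical teacher: two classes, two features plus a constant bias feature.
teacher = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]]
sources = [[random.uniform(-1, 1), random.uniform(-1, 1), 1.0] for _ in range(20)]
# Parallel, unlabeled new-domain data: each source input scaled and shifted.
pairs = [(x, [0.5 * x[0] + 0.3, 0.5 * x[1] - 0.2, 1.0]) for x in sources]

before = mean_kl(teacher, teacher, pairs)  # teacher applied directly to new-domain input
student = train_student(teacher, pairs)
after = mean_kl(teacher, student, pairs)   # adapted student on new-domain input
print(f"mean KL before: {before:.4f}, after: {after:.4f}")
```

In this sketch, no labels appear anywhere: the only supervision is the teacher's output on the parallel existing-domain input.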
-
Publication No.: US11869482B2
Publication date: 2024-01-09
Application No.: US17272325
Application date: 2018-09-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yang Cui, Xi Wang, Lei He, Kao-Ping Soong
IPC: G10L13/047
CPC classification number: G10L13/047
Abstract: A method and apparatus for generating a speech waveform. Fundamental frequency information, glottal features and vocal tract features associated with an input may be received, wherein the glottal features include a phase feature, a shape feature, and an energy feature (1310). A glottal waveform is generated based on the fundamental frequency information and the glottal features through a first neural network model (1320). A speech waveform is generated based on the glottal waveform and the vocal tract features through a second neural network model (1330).
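The two-stage data flow in this abstract (steps 1310 to 1330) can be sketched as follows. The abstract specifies two neural network models; this sketch swaps in classical source-filter stand-ins (a periodic pulse generator and an all-pole filter) purely to show what each stage consumes and produces. The function names, the interpretation of the phase, shape, and energy features as scalars, the sample rate, and the filter coefficients are all assumptions, not details from the patent.

```python
import math

def glottal_waveform(f0_hz, energy, shape, phase, n_samples, sample_rate=16000):
    """Stand-in for the first model: F0 + glottal features -> glottal waveform."""
    out = []
    for n in range(n_samples):
        # Position within the current pitch period, in [0, 1).
        t = (n * f0_hz / sample_rate + phase) % 1.0
        # Assumption: 'shape' is the open fraction of the period; the pulse is
        # a raised-sine bump while open and zero while closed.
        if t < shape:
            out.append(energy * math.sin(math.pi * t / shape) ** 2)
        else:
            out.append(0.0)
    return out

def speech_waveform(glottal, vocal_tract_coeffs):
    """Stand-in for the second model: glottal waveform + vocal tract features -> speech.

    Implements the all-pole filter y[n] = x[n] - sum_k a_k * y[n - k].
    """
    y = []
    for n, x in enumerate(glottal):
        acc = x
        for k, a in enumerate(vocal_tract_coeffs, start=1):
            if n - k >= 0:
                acc -= a * y[n - k]
        y.append(acc)
    return y

# Step 1310: receive the features (hypothetical values).
f0_hz, energy, shape, phase = 100.0, 1.0, 0.6, 0.0
vocal_tract_coeffs = [-0.9]  # single stable pole, for illustration only

# Step 1320: glottal waveform from F0 and glottal features.
glottal = glottal_waveform(f0_hz, energy, shape, phase, n_samples=480)
# Step 1330: speech waveform from the glottal waveform and vocal tract features.
speech = speech_waveform(glottal, vocal_tract_coeffs)
```

In the method described by the abstract, each stand-in function would be replaced by a trained neural network with the same inputs and outputs.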
-
Publication No.: US20190051290A1
Publication date: 2019-02-14
Application No.: US15675249
Application date: 2017-08-11
Applicant: Microsoft Technology Licensing, LLC
Inventor: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
IPC: G10L15/16, G06N3/08, G10L15/06, G10L15/183
Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels and are instead parallelized with the data in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model, and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
-