-
Publication No.: US12266342B2
Publication date: 2025-04-01
Application No.: US17293640
Application date: 2018-12-11
Applicant: Microsoft Technology Licensing, LLC
IPC: G10L13/08 , G06N3/045 , G10L13/047
Abstract: A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).
-
Publication No.: US12094447B2
Publication date: 2024-09-17
Application No.: US17293404
Application date: 2018-12-13
Applicant: Microsoft Technology Licensing, LLC
Inventor: Huaiping Ming , Lei He
IPC: G10L13/08 , G06F40/20 , G06F40/205 , G06F40/253 , G06N3/045 , G06N20/20 , G10L13/047 , G10L13/06 , G10L25/30
CPC classification number: G10L13/08 , G06F40/20 , G06F40/205 , G06F40/253 , G06N3/045 , G06N20/20 , G10L13/047 , G10L13/06 , G10L25/30
Abstract: A method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained (1310). Phoneme or character level text information may be generated based on the text input (1320). Context-sensitive text information may be generated based on the text input (1330). A text feature may be generated based on the phoneme or character level text information and the context-sensitive text information (1340). A speech waveform corresponding to the text input may be generated based at least on the text feature (1350).
-
Publication No.: US11922924B2
Publication date: 2024-03-05
Application No.: US17617547
Application date: 2020-05-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Jingzhou Yang , Lei He
IPC: G10L25/78 , G10L13/00 , G10L13/033 , G10L13/047 , G10L13/10
CPC classification number: G10L13/10 , G10L13/033 , G10L13/047
Abstract: Method and apparatus for generating speech through multilingual neural text-to-speech (TTS) synthesis are provided in the present disclosure. A text input in at least a first language may be received. Speaker latent space information of a target speaker may be provided through a speaker encoder. Language latent space information of a second language may be provided through a language encoder. At least one acoustic feature may be generated, through an acoustic feature predictor, based on the text input, the speaker latent space information and the language latent space information of the second language. A speech waveform corresponding to the text input may be generated, through a neural vocoder, based on the at least one acoustic feature.
-
Publication No.: US11600261B2
Publication date: 2023-03-07
Application No.: US17827275
Application date: 2022-05-27
Applicant: Microsoft Technology Licensing, LLC
Inventor: Shifeng Pan , Lei He , Yulin Li , Sheng Zhao , Chunling Ma
IPC: G10L13/10 , G10L15/06 , G10L21/013 , G10L15/18 , G10L15/187 , G10L25/18 , G10L25/30 , G10L25/63
Abstract: Systems are configured for generating spectrogram data characterized by a voice timbre of a target speaker and a prosody style of a source speaker by converting a waveform of source speaker data to phonetic posterior gram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to utilize/train a machine learning model for generating spectrogram data and for training a neural text-to-speech model with the generated spectrogram data.
-
Publication No.: US20210304769A1
Publication date: 2021-09-30
Application No.: US15931788
Application date: 2020-05-14
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Guoli Ye , Yan Huang , Wenning Wei , Lei He , Eva Sharma , Jian Wu , Yao Tian , Edward C. Lin , Yifan Gong , Rui Zhao , Jinyu Li , William Maxwell Gale
Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training data is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.
-
Publication No.: US10652119B2
Publication date: 2020-05-12
Application No.: US15636929
Application date: 2017-06-29
Applicant: Microsoft Technology Licensing, LLC
Inventor: Lei He , Wilson Man-Hong Li , Tiancong Zhou , Swati Singh
Abstract: Various embodiments of the present technology generally relate to systems and methods for self-healing services and automatic recovery of distributed systems. Some embodiments of the present technology leverage all the available synthetic, customer, client, server, and support signals from various sources to intelligently and in real time detect outages, root cause outages to recoverable targets (e.g., for auto recovery actions), identify the right engineering teams (e.g., for faster manual mitigation), and perform the appropriate recovery action (such as recycling a service, rebooting a server, or switching out a faulty rack) or other mitigation actions such as routing, collecting debug information, alerting the right team, or alert suppression. Some embodiments separate signal monitoring and workflow coordination.
-
Publication No.: US10073726B2
Publication date: 2018-09-11
Application No.: US14475543
Application date: 2014-09-02
Applicant: Microsoft Technology Licensing, LLC
Inventor: Olga Ivanova , Venkat Narayanan , Smita Ojha , Lei He , Art Sadovsky , Yi Wang , Ashish Premaraj
CPC classification number: G06F11/0772 , G06F11/0709 , G06F11/076 , G06F21/6254 , H04L41/5012 , H04L41/5032 , H04L41/507 , H04L43/04 , H04L43/16 , H04L67/22
Abstract: Outage detection in a cloud based service is provided using usage data based error signals. Usage data is collected from components of the cloud based service or client devices of the cloud based service based on customer actions on the cloud based service. The usage data is aggregated and normalized to generate an error signal from errors generated from a component of the cloud based service. An outage is detected from the error signal. An alert that includes information associated with the outage and one or more customers impacted by the outage is generated.
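Illustrative sketch (not taken from the patent): one simple way to read the aggregate, normalize, and detect steps in the abstract above. The function names, the baseline window, and the threshold values are all hypothetical choices made for this example; the patent abstract does not specify a detection rule.

```python
from statistics import mean, pstdev


def error_signal(error_counts, request_counts):
    """Normalize per-interval error counts by request volume (error rate)."""
    return [e / r if r else 0.0 for e, r in zip(error_counts, request_counts)]


def detect_outage(signal, baseline_len=5, k=3.0, min_rate=0.05):
    """Return indices of intervals flagged as outages.

    Assumption: the first `baseline_len` intervals are healthy. An interval
    is flagged when its error rate exceeds both the baseline mean plus
    k standard deviations and a fixed floor `min_rate`.
    """
    baseline = signal[:baseline_len]
    threshold = max(mean(baseline) + k * pstdev(baseline), min_rate)
    return [i for i, v in enumerate(signal) if i >= baseline_len and v > threshold]


def build_alert(outage_intervals, customers_by_interval):
    """Collect the customers seen in the flagged intervals into an alert record."""
    impacted = sorted({c for i in outage_intervals
                       for c in customers_by_interval.get(i, [])})
    return {"intervals": outage_intervals, "impacted_customers": impacted}
```

Normalizing by request volume keeps a traffic spike with a steady error rate from being mistaken for an outage; the fixed floor guards against a near-zero baseline variance producing an overly sensitive threshold.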
-
Publication No.: US11869482B2
Publication date: 2024-01-09
Application No.: US17272325
Application date: 2018-09-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yang Cui , Xi Wang , Lei He , Kao-Ping Soong
IPC: G10L13/047
CPC classification number: G10L13/047
Abstract: A method and apparatus for generating a speech waveform. Fundamental frequency information, glottal features and vocal tract features associated with an input may be received, wherein the glottal features include a phase feature, a shape feature, and an energy feature (1310). A glottal waveform is generated based on the fundamental frequency information and the glottal features through a first neural network model (1320). A speech waveform is generated based on the glottal waveform and the vocal tract features through a second neural network model (1330).
-
Publication No.: US11587569B2
Publication date: 2023-02-21
Application No.: US15931788
Application date: 2020-05-14
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Guoli Ye , Yan Huang , Wenning Wei , Lei He , Eva Sharma , Jian Wu , Yao Tian , Edward C. Lin , Yifan Gong , Rui Zhao , Jinyu Li , William Maxwell Gale
Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training data is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.
-
Publication No.: US20220277728A1
Publication date: 2022-09-01
Application No.: US17631695
Application date: 2020-06-17
Applicant: Microsoft Technology Licensing, LLC
Inventor: Shaofei Zhang , Lei He
IPC: G10L13/08 , G10L13/047 , G10L13/06 , G10L25/30
Abstract: The present disclosure provides a method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained. A phone feature of the text input may be generated. Context features of the text input may be generated based on a set of sentences associated with the text input. A speech waveform corresponding to the text input may be generated based on the phone feature and the context features.