Patent search ap:("Microsoft Technology Licensing Page LLC") AND inv:"Lei He"

1.

Granted invention patent
Multi-speaker neural text-to-speech synthesis (in force)

Publication number: US12266342B2

Publication date: 2025-04-01

Application number: US17293640

Application date: 2018-12-11

Applicant: Microsoft Technology Licensing, LLC

Inventor: Yan Deng, Lei He

IPC: G10L13/08, G06N3/045, G10L13/047

Abstract: A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).

2.

Granted invention patent
Neural text-to-speech synthesis with multi-level text information (in force)

Publication number: US12094447B2

Publication date: 2024-09-17

Application number: US17293404

Application date: 2018-12-13

Applicant: Microsoft Technology Licensing, LLC

Inventor: Huaiping Ming, Lei He

IPC: G10L13/08, G06F40/20, G06F40/205, G06F40/253, G06N3/045, G06N20/20, G10L13/047, G10L13/06, G10L25/30

CPC classification number: G10L13/08, G06F40/20, G06F40/205, G06F40/253, G06N3/045, G06N20/20, G10L13/047, G10L13/06, G10L25/30

Abstract: A method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained (1310). Phoneme or character level text information may be generated based on the text input (1320). Context-sensitive text information may be generated based on the text input (1330). A text feature may be generated based on the phoneme or character level text information and the context-sensitive text information (1340). A speech waveform corresponding to the text input may be generated based at least on the text feature (1350).

3.

Granted invention patent
Multilingual neural text-to-speech synthesis (in force)

Publication number: US11922924B2

Publication date: 2024-03-05

Application number: US17617547

Application date: 2020-05-21

Applicant: Microsoft Technology Licensing, LLC

Inventor: Jingzhou Yang, Lei He

IPC: G10L25/78, G10L13/00, G10L13/033, G10L13/047, G10L13/10

CPC classification number: G10L13/10, G10L13/033, G10L13/047

Abstract: Method and apparatus for generating speech through multilingual neural text-to-speech (TTS) synthesis are provided in the present disclosure. A text input in at least a first language may be received. Speaker latent space information of a target speaker may be provided through a speaker encoder. Language latent space information of a second language may be provided through a language encoder. At least one acoustic feature may be generated, through an acoustic feature predictor, based on the text input, the speaker latent space information and the language latent space information of the second language. A speech waveform corresponding to the text input may be generated, through a neural vocoder, based on the at least one acoustic feature.

4.

Granted invention patent
System and method for cross-speaker style transfer in text-to-speech and training data generation (in force)

Publication number: US11600261B2

Publication date: 2023-03-07

Application number: US17827275

Application date: 2022-05-27

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shifeng Pan, Lei He, Yulin Li, Sheng Zhao, Chunling Ma

IPC: G10L13/10, G10L15/06, G10L21/013, G10L15/18, G10L15/187, G10L25/18, G10L25/30, G10L25/63

Abstract: Systems are configured for generating spectrogram data characterized by a voice timbre of a target speaker and a prosody style of a source speaker by converting a waveform of source speaker data to phonetic posterior gram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to utilize/train a machine learning model for generating spectrogram data and for training a neural text-to-speech model with the generated spectrogram data.

5.

Invention patent application
GENERATING AND USING TEXT-TO-SPEECH DATA FOR SPEECH RECOGNITION MODELS (in force)

Publication number: US20210304769A1

Publication date: 2021-09-30

Application number: US15931788

Application date: 2020-05-14

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor: Guoli Ye, Yan Huang, Wenning Wei, Lei He, Eva Sharma, Jian Wu, Yao Tian, Edward C. Lin, Yifan Gong, Rui Zhao, Jinyu Li, William Maxwell Gale

IPC: G10L15/26, G10L15/16, G10L15/06, G10L13/08

Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training data is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.

6.

Granted invention patent
Automatic recovery engine with continuous recovery state machine and remote workflows (in force)

Publication number: US10652119B2

Publication date: 2020-05-12

Application number: US15636929

Application date: 2017-06-29

Applicant: Microsoft Technology Licensing, LLC

Inventor: Lei He, Wilson Man-Hong Li, Tiancong Zhou, Swati Singh

IPC: H04L12/26, H04L12/24

Abstract: Various embodiments of the present technology generally relate to systems and methods for self-healing services and automatic recovery of distributed systems. Some embodiments of the present technology leverage all the available synthetic, customer, client, server, support signals from various sources to intelligently and in real-time detect outages, root cause outages to recoverable targets (e.g., for auto recovery actions), identify the right engineering teams (e.g., for faster manual mitigation), and perform the appropriate recovery action (such as recycle service, reboot server, switch out a faulty rack) or other mitigation actions such as routing, collecting debug information, alerting to the right team, or alert suppression. Some embodiments separate signal monitoring and workflow coordination.

7.

Granted invention patent
Detection of outage in cloud based service using usage data based error signals (in force)

Publication number: US10073726B2

Publication date: 2018-09-11

Application number: US14475543

Application date: 2014-09-02

Applicant: Microsoft Technology Licensing, LLC

Inventor: Olga Ivanova, Venkat Narayanan, Smita Ojha, Lei He, Art Sadovsky, Yi Wang, Ashish Premaraj

IPC: G06F11/07, G06F21/62, H04L12/24, H04L12/26, H04L29/08

CPC classification number: G06F11/0772, G06F11/0709, G06F11/076, G06F21/6254, H04L41/5012, H04L41/5032, H04L41/507, H04L43/04, H04L43/16, H04L67/22

Abstract: Outage detection in a cloud based service is provided using usage data based error signals. Usage data is collected from components of the cloud based service or client devices of the cloud based service based on customer actions on the cloud based service. The usage data is aggregated and normalized to generate an error signal from errors generated from a component of the cloud based service. An outage is detected from the error signal. An alert that includes information associated with the outage and one or more customers impacted by the outage is generated.

8.

Granted invention patent
Speech waveform generation (in force)

Publication number: US11869482B2

Publication date: 2024-01-09

Application number: US17272325

Application date: 2018-09-30

Applicant: Microsoft Technology Licensing, LLC

Inventor: Yang Cui, Xi Wang, Lei He, Kao-Ping Soong

IPC: G10L13/047

CPC classification number: G10L13/047

Abstract: A method and apparatus for generating a speech waveform. Fundamental frequency information, glottal features and vocal tract features associated with an input may be received, wherein the glottal features include a phase feature, a shape feature, and an energy feature (1310). A glottal waveform is generated based on the fundamental frequency information and the glottal features through a first neural network model (1320). A speech waveform is generated based on the glottal waveform and the vocal tract features through a second neural network model (1330).

9.

Granted invention patent
Generating and using text-to-speech data for speech recognition models (in force)

Publication number: US11587569B2

Publication date: 2023-02-21

Application number: US15931788

Application date: 2020-05-14

Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor: Guoli Ye, Yan Huang, Wenning Wei, Lei He, Eva Sharma, Jian Wu, Yao Tian, Edward C. Lin, Yifan Gong, Rui Zhao, Jinyu Li, William Maxwell Gale

IPC: G10L15/26, G10L13/08, G10L15/06, G10L15/16

Abstract: Systems, methods, and devices are provided for generating and using text-to-speech (TTS) data for improved speech recognition models. A main model is trained with keyword independent baseline training data. In some instances, acoustic and language model sub-components of the main model are modified with new TTS training data. In some instances, the new TTS training data is obtained from a multi-speaker neural TTS system for a keyword that is underrepresented in the baseline training data. In some instances, the new TTS training data is used for pronunciation learning and normalization of keyword dependent confidence scores in keyword spotting (KWS) applications. In some instances, the new TTS training data is used for rapid speaker adaptation in speech recognition models.

10.

Invention patent application
Paragraph synthesis with cross utterance features for neural TTS (in force)

Publication number: US20220277728A1

Publication date: 2022-09-01

Application number: US17631695

Application date: 2020-06-17

Applicant: Microsoft Technology Licensing, LLC

Inventor: Shaofei Zhang, Lei He

IPC: G10L13/08, G10L13/047, G10L13/06, G10L25/30

Abstract: The present disclosure provides a method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained. A phone feature of the text input may be generated. Context features of the text input may be generated based on a set of sentences associated with the text input. A speech waveform corresponding to the text input may be generated based on the phone feature and the context features.
