-
Publication No.: US12217745B2
Publication Date: 2025-02-04
Application No.: US18217888
Application Date: 2023-07-03
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yao Qian , Yu Wu , Kenichi Kumatani , Shujie Liu , Furu Wei , Nanshan Zeng , Xuedong David Huang , Chengyi Wang
IPC: G10L15/187 , G06N20/00 , G10L15/02 , G10L15/06 , G10L15/22
Abstract: A system obtains a first training data set comprising labeled speech data, or both labeled and unlabeled data, corresponding to a high-resource data set, as well as latent speech representations based on the first training data set. The system trains a machine learning model on the first training data set to learn phonetically aware speech representations corresponding to the first training data set. The system applies the latent speech representations to a transformer context network to generate contextual representations. The system aligns each of the contextual representations with a phoneme label to generate phonetically-aware contextual representations. The system then causes a refinement engine to further refine the machine learning model.
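The central step here, aligning transformer outputs with phoneme labels, can be illustrated with a minimal sketch. This is not the patented implementation: PyTorch is assumed as the framework, the feature extractor is reduced to a linear layer, and the names and sizes (PhoneticContextModel, phoneme_head, a 42-phoneme inventory) are illustrative assumptions.

import torch
import torch.nn as nn

class PhoneticContextModel(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_layers=4, n_phonemes=42):
        super().__init__()
        # Stand-in feature extractor producing latent speech representations.
        self.latent = nn.Linear(feat_dim, d_model)
        # Transformer context network producing contextual representations.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Frame-level phoneme classifier: aligning contextual representations
        # with phoneme labels supplies the "phonetically-aware" training signal.
        self.phoneme_head = nn.Linear(d_model, n_phonemes)

    def forward(self, features):              # features: (batch, frames, feat_dim)
        z = self.latent(features)             # latent speech representations
        c = self.context(z)                   # contextual representations
        return self.phoneme_head(c)           # per-frame phoneme logits

model = PhoneticContextModel()
feats = torch.randn(2, 100, 80)               # dummy labeled (high-resource) batch
phonemes = torch.randint(0, 42, (2, 100))     # frame-level phoneme labels
logits = model(feats)                         # shape (2, 100, 42)
loss = nn.CrossEntropyLoss()(logits.transpose(1, 2), phonemes)
loss.backward()                               # one simplified training/refinement step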
-
Publication No.: US12020694B2
Publication Date: 2024-06-25
Application No.: US18331742
Application Date: 2023-06-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yu Wu , Jinyu Li , Shujie Liu , Xie Chen , Chengyi Wang
CPC classification number: G10L15/16 , G06N3/044 , G06N3/08 , G10L15/063 , G10L15/22
Abstract: The computing system trains an end-to-end (E2E) automatic speech recognition (ASR) model using a transformer-transducer-based deep neural network that comprises a transformer encoder network and a transducer predictor network. The E2E ASR model is trained to have one or more adjustable hyperparameters that dynamically adjust the efficiency or performance of the E2E ASR model when it is deployed onto or executed by a device. This is done by identifying one or more conditions of the device associated with the computational power of the device and setting at least one of the adjustable hyperparameters based on those conditions.
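As a rough illustration of setting hyperparameters from device conditions, the sketch below picks runtime settings from the CPU core count. The hyperparameter names (chunk_size, attention_context, beam_size), the thresholds, and the choice of core count as the device condition are assumptions made for illustration, not details from the patent.

import os
from dataclasses import dataclass

@dataclass
class RuntimeConfig:
    chunk_size: int         # frames decoded per step (latency vs. throughput)
    attention_context: int  # left-context frames visible to the encoder
    beam_size: int          # transducer beam-search width

def configure_for_device() -> RuntimeConfig:
    cores = os.cpu_count() or 1
    # More capable devices get wider context and larger beams; constrained
    # devices trade some accuracy for efficiency. Thresholds are illustrative.
    if cores >= 8:
        return RuntimeConfig(chunk_size=64, attention_context=256, beam_size=8)
    if cores >= 4:
        return RuntimeConfig(chunk_size=32, attention_context=128, beam_size=4)
    return RuntimeConfig(chunk_size=16, attention_context=64, beam_size=1)

print(configure_for_device())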
-
Publication No.: US11735171B2
Publication Date: 2023-08-22
Application No.: US17320496
Application Date: 2021-05-14
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yao Qian , Yu Wu , Kenichi Kumatani , Shujie Liu , Furu Wei , Nanshan Zeng , Xuedong David Huang , Chengyi Wang
IPC: G10L15/187 , G06N20/00 , G10L15/06 , G10L15/22 , G10L15/02
CPC classification number: G10L15/187 , G06N20/00 , G10L15/02 , G10L15/063 , G10L15/22 , G10L2015/025
Abstract: Systems and methods are provided for training a machine learning model to learn speech representations. A labeled speech data set, or both labeled and unlabeled data sets, is applied to a feature extractor of a machine learning model to generate latent speech representations. The latent speech representations are applied to a quantizer to generate quantized latent speech representations and to a transformer context network to generate contextual representations. Each of the contextual representations is aligned with a phoneme label to generate phonetically-aware contextual representations. Quantized latent representations are aligned with phoneme labels to generate phonetically aware latent speech representations. Systems and methods also include randomly replacing a sub-set of the contextual representations with quantized latent speech representations during their alignment with phoneme labels, and aligning the phonetically aware latent speech representations to the contextual representations using supervised learning.
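The random replacement of contextual representations with quantized latent representations can be sketched as a simple masking-and-swap over frames. This is a minimal sketch, assuming PyTorch and same-shaped tensors; the function name mix_representations, the 15% replacement probability, and the random dummy tensors are illustrative, and the quantizer itself is not modeled.

import torch

def mix_representations(contextual, quantized, replace_prob=0.15):
    # contextual, quantized: (batch, frames, dim); identical shapes assumed.
    # For a random subset of frames, the contextual representation is swapped
    # for the quantized latent representation before phoneme alignment.
    mask = torch.rand(contextual.shape[:2], device=contextual.device) < replace_prob
    return torch.where(mask.unsqueeze(-1), quantized, contextual)

contextual = torch.randn(2, 100, 256)   # transformer context network outputs
quantized = torch.randn(2, 100, 256)    # stand-in for quantizer (codebook) outputs
mixed = mix_representations(contextual, quantized)
print(mixed.shape)                      # torch.Size([2, 100, 256])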
-
Publication No.: US11327971B2
Publication Date: 2022-05-10
Application No.: US16766088
Application Date: 2018-12-06
Applicant: Microsoft Technology Licensing, LLC
Inventor: Duyu Tang , Nan Duan , Ming Zhou , Wendi Wang , Daxin Jiang , Shujie Liu , Linjun Shou , Ming Gong
IPC: G06F16/2455 , G06F16/248
Abstract: In embodiments of the present disclosure, an assertion-based question answering approach is provided. After a question and the related passage are obtained, an assertion answer to the question is determined based on the content of the passage; the assertion answer has a predetermined structure and represents a complete semantic meaning. The assertion answer to the question may then be outputted to the user. In the embodiments of the present disclosure, the question and the relevant passage are used as input, and a semi-structured assertion answer is output. The assertion answer according to embodiments of the present disclosure can provide richer semantic content than a traditional short answer and a more concise expression than a traditional long answer, thereby ensuring the accuracy of the answer while improving the user experience.
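To make the input/output contract concrete, the sketch below shows a question and passage going in and a semi-structured assertion coming out. The subject/predicate/object layout, the AssertionAnswer name, and the hard-coded return value are purely illustrative assumptions; the patent's actual assertion schema and model are not reproduced here.

from dataclasses import dataclass

@dataclass
class AssertionAnswer:
    subject: str     # entity the assertion is about
    predicate: str   # relation expressed by the assertion
    obj: str         # object of the assertion

def answer_question(question: str, passage: str) -> AssertionAnswer:
    # A real system would generate this with a trained model; the return value
    # here only illustrates the semi-structured output format.
    return AssertionAnswer(subject="The Eiffel Tower",
                           predicate="is located in",
                           obj="Paris")

print(answer_question("Where is the Eiffel Tower?",
                      "The Eiffel Tower is a landmark located in Paris."))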
-
Publication No.: US12249336B2
Publication Date: 2025-03-11
Application No.: US18573846
Application Date: 2021-06-29
Applicant: Microsoft Technology Licensing, LLC , Jinyu Li , Long Zhou , Xie Sun , Shujie Liu
Inventor: Jinyu Li , Long Zhou , Xie Sun , Shujie Liu
Abstract: Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model. In response to user input identifying one or more target languages associated with audio content, the configurable multilingual model selectively and dynamically utilizes a sub-set of the plurality of language-specific automatic speech recognition modules, together with the universal automatic speech recognition module, to process the audio content.
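The routing idea can be illustrated with a minimal sketch: a universal module always runs, and only the user-selected language-specific modules are attached. The class name ConfigurableMultilingualASR, the method names, and the lambda stand-ins for real ASR modules are assumptions made for illustration, not the patent's implementation.

class ConfigurableMultilingualASR:
    def __init__(self, universal_module, language_modules):
        self.universal = universal_module          # shared multilingual module
        self.language_modules = language_modules   # e.g. {"en": ..., "de": ...}

    def transcribe(self, audio, target_languages):
        # Select only the language-specific modules the user identified.
        active = [self.language_modules[lang] for lang in target_languages
                  if lang in self.language_modules]
        hypothesis = self.universal(audio)
        for module in active:
            hypothesis = module(audio, hypothesis)  # refine with language expert
        return hypothesis

asr = ConfigurableMultilingualASR(
    universal_module=lambda audio: "universal hypothesis",
    language_modules={"en": lambda audio, hyp: hyp + " + en",
                      "de": lambda audio, hyp: hyp + " + de"})
print(asr.transcribe(audio=None, target_languages=["en"]))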
-
Publication No.: US11715462B2
Publication Date: 2023-08-01
Application No.: US17244891
Application Date: 2021-04-29
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Yu Wu , Jinyu Li , Shujie Liu , Xie Chen , Chengyi Wang
CPC classification number: G10L15/16 , G06N3/044 , G06N3/08 , G10L15/063 , G10L15/22
Abstract: A computing system is configured to generate a transformer-transducer-based deep neural network. The transformer-transducer-based deep neural network comprises a transformer encoder network and a transducer predictor network. The transformer encoder network has a plurality of layers, each of which includes a multi-head attention network sublayer and a feed-forward network sublayer. The computing system trains an end-to-end (E2E) automatic speech recognition (ASR) model using the transformer-transducer-based deep neural network. The E2E ASR model has one or more adjustable hyperparameters that are configured to dynamically adjust an efficiency or a performance of the E2E ASR model when the E2E ASR model is deployed onto a device or executed by the device.
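The layer structure named in the abstract (a multi-head attention sublayer followed by a feed-forward sublayer) can be sketched as below, assuming PyTorch; the residual-plus-LayerNorm arrangement, the dimensions, and the EncoderLayer name are illustrative choices, and the transducer predictor network is omitted.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, frames, d_model)
        attn_out, _ = self.attn(x, x, x)        # multi-head attention sublayer
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))          # feed-forward network sublayer
        return x

x = torch.randn(2, 100, 256)
print(EncoderLayer()(x).shape)                  # torch.Size([2, 100, 256])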