Publication No.: US20230169953A1
Publication Date: 2023-06-01
Application No.: US17919982
Application Date: 2021-03-19
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ran Zhang , Jian LUAN , Yahuan Cong
Abstract: The present disclosure provides methods and apparatuses for phrase-based end-to-end text-to-speech (TTS) synthesis.
A text may be obtained. A target phrase in the text may be identified. A phrase context of the target phrase may be determined. An acoustic feature corresponding to the target phrase may be generated based at least on the target phrase and the phrase context. A speech waveform corresponding to the target phrase may be generated based on the acoustic feature.
-
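The pipeline this abstract outlines (text → target phrase → phrase context → acoustic feature → waveform) can be sketched as a minimal toy in Python. Every function below is an illustrative stand-in, not the patent's implementation: a real system would use a learned phrase detector, an acoustic model emitting mel-spectrogram frames, and a neural vocoder.

```python
# Toy sketch of the phrase-based TTS flow; all helpers are hypothetical.

def identify_target_phrase(text: str) -> str:
    # Stand-in heuristic: pick the longest token as the target phrase.
    return max(text.split(), key=len)

def phrase_context(text: str, phrase: str) -> str:
    # Context here is simply the rest of the text around the phrase.
    return text.replace(phrase, "").strip()

def acoustic_feature(phrase: str, context: str) -> list:
    # Stand-in "feature": lengths, where a real model would emit
    # spectrogram frames conditioned on phrase + context.
    return [float(len(phrase)), float(len(context))]

def waveform(feature: list) -> list:
    # A vocoder would turn features into samples; here we just scale.
    return [f / 10.0 for f in feature]

text = "synthesize this long utterance"
phrase = identify_target_phrase(text)
audio = waveform(acoustic_feature(phrase, phrase_context(text, phrase)))
```

The point of the structure is that the target phrase and its context are separate inputs to the acoustic stage, matching the claim language.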
Publication No.: US20220122580A1
Publication Date: 2022-04-21
Application No.: US17561895
Application Date: 2021-12-24
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Pei ZHAO , Kaisheng YAO , Max LEUNG , Bo YAN , Jian LUAN , Yu SHI , Malone MA , Mei-Yuh HWANG
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
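The key idea in this abstract is that the intent model consumes both the recognized text and acoustic-feature annotations, so prosody can change the interpretation. A minimal rule-based sketch (the rules stand in for the learned intent model; field names like `rising_pitch` are assumptions):

```python
# Hypothetical fusion of ASR text with acoustic annotations for intent.

def recognize_intent(text: str, annotations: dict) -> str:
    # An acoustic cue such as rising pitch can override a textual read:
    # the same words may be a statement or a question depending on prosody.
    if annotations.get("rising_pitch"):
        return "question"
    if "please" in text.lower():
        return "request"
    return "statement"

a = recognize_intent("play some music please", {"rising_pitch": False})
b = recognize_intent("you finished it", {"rising_pitch": True})
```

Here `b` comes out as a question purely from the acoustic annotation, which is exactly the behavior a text-only model would miss.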
-
Publication No.: US20220059122A1
Publication Date: 2022-02-24
Application No.: US17432476
Application Date: 2020-02-03
Applicant: Microsoft Technology Licensing, LLC
Abstract: A method for providing emotion management assistance is provided. Sound streams may be received. A speech conversation between a user and at least one conversation object may be detected from the sound streams. The identity of the conversation object may be identified at least according to speech of the conversation object in the speech conversation. An emotion state of at least one speech segment of the user in the speech conversation may be determined. An emotion record corresponding to the speech conversation may be generated, wherein the emotion record includes at least the identity of the conversation object, at least a portion of the content of the speech conversation, and the emotion state of the at least one speech segment of the user.
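The emotion record the abstract enumerates has three required parts: the conversation object's identity, some conversation content, and the user's per-segment emotion states. A minimal sketch of assembling such a record (the field names and flat-dict layout are assumptions, not the patent's schema):

```python
# Hypothetical emotion-record assembly; the upstream identification and
# emotion-classification steps are represented only by their outputs.

def build_emotion_record(object_identity, content, segment_emotions):
    # One record per detected conversation, as the abstract describes.
    return {
        "object_identity": object_identity,      # who the user spoke with
        "content": content,                      # portion of the conversation
        "segment_emotions": segment_emotions,    # user's state per segment
    }

record = build_emotion_record("Alice", "How was your day?", ["calm", "happy"])
```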
-
Publication No.: US20200058289A1
Publication Date: 2020-02-20
Application No.: US16342416
Application Date: 2016-11-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Henry GABRYJELSKI , Jian LUAN , Dapeng Li
Abstract: An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).
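The four steps (504)–(510) form a simple pipeline: extract speech segments, fit a voice-print model on them, synthesize replacements, and splice the replacements back into the audio. A toy sketch under obvious simplifications (segments are tagged strings, the voice print is a label rather than a speaker embedding):

```python
# Hypothetical dubbing flow; a real system operates on audio samples.

def extract_speeches(audio):
    # Pretend speech segments are the entries tagged "speech:".  (504)
    return [seg for seg in audio if seg.startswith("speech:")]

def voice_print(speeches):
    # A real voice print is a speaker model fit on the segments.  (506)
    return "voiceprint(%d segments)" % len(speeches)

def replace_speeches(audio, speeches, model):
    # Re-render each extracted segment with the modeled voice and
    # splice it back in place of the original.  (508)-(510)
    return [seg if seg not in speeches else "dubbed:" + seg[len("speech:"):]
            for seg in audio]

audio = ["music", "speech:hello", "speech:goodbye"]
sp = extract_speeches(audio)
out = replace_speeches(audio, sp, voice_print(sp))
```

Non-speech content ("music") passes through untouched, which is the property that makes this dubbing rather than full re-synthesis.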
-
Publication No.: US20200035209A1
Publication Date: 2020-01-30
Application No.: US16500995
Application Date: 2018-04-18
Applicant: MICROSOFT TECHNOLOGY LICENSING LLC
Inventor: Jian LUAN , Qinying LIAO , Zhen LIU , Nan YANG , Furu WEI
Abstract: In accordance with implementations of the subject matter described herein, there is provided a solution for supporting a machine to automatically generate a song. In this solution, an input from a user is used to determine a creation intention of the user with respect to a song to be generated. Lyrics of the song are generated based on the creation intention. Then, a template for the song is generated based at least in part on the lyrics. The template indicates a melody matching with the lyrics. In this way, it is feasible to automatically create the melody and lyrics which not only conform to the creation intention of the user but also match with each other.
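The two-stage flow here is: creation intention → lyrics, then lyrics → melody template, with the template constrained to match the lyrics. A minimal sketch (the keyword-to-line rule and the notes-per-line pairing are invented simplifications of "matching"):

```python
# Hypothetical song-generation stages from the abstract.

def generate_lyrics(intention):
    # Stand-in: one short lyric line per keyword in the intention.
    return ["a song about " + w for w in intention.split()]

def generate_template(lyrics):
    # The template pairs each lyric line with a note count equal to its
    # word count, so melody and lyrics stay aligned line by line.
    return {"lines": len(lyrics),
            "notes_per_line": [len(line.split()) for line in lyrics]}

lyrics = generate_lyrics("spring rain")
template = generate_template(lyrics)
```

The ordering matters: the template is derived from the lyrics, not generated independently, which is how the abstract guarantees the melody "matches" the words.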
-
Publication No.: US20230206899A1
Publication Date: 2023-06-29
Application No.: US17926994
Application Date: 2021-04-22
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ran Zhang , Jian LUAN , Yahuan Cong
CPC classification number: G10L13/10 , G10L13/04 , G10L2013/105
Abstract: The present disclosure provides methods and apparatuses for spontaneous text-to-speech (TTS) synthesis. A target text may be obtained. A fluency reference factor may be determined based at least on the target text. An acoustic feature corresponding to the target text may be generated with the fluency reference factor. A speech waveform corresponding to the target text may be generated based on the acoustic feature.
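The distinctive step in this abstract is that acoustic-feature generation is conditioned on a fluency reference factor derived from the text. A toy sketch in which the factor is a made-up pause-insertion rate (both rules below are assumptions for illustration):

```python
# Hypothetical fluency-conditioned feature generation.

def fluency_factor(text):
    # Stand-in rule: longer sentences get a higher disfluency factor.
    return min(1.0, len(text.split()) / 20.0)

def acoustic_frames(text, factor):
    # Each word contributes base frames, plus extra frames for pauses
    # and hesitations scaled by the fluency factor, so the same text
    # can be rendered more or less "spontaneously".
    words = len(text.split())
    return words * 10 + int(words * factor * 5)

text = "well I was thinking maybe we could go out later"
f = fluency_factor(text)
n = acoustic_frames(text, f)
```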
-
Publication No.: US20230076258A1
Publication Date: 2023-03-09
Application No.: US17985016
Application Date: 2022-11-10
Applicant: Microsoft Technology Licensing, LLC
Inventor: Henry GABRYJELSKI , Jian LUAN , Dapeng LI
Abstract: A method and system for automatic dubbing is disclosed. Responsive to receiving a selection of media content for playback on a user device, the method processes extracted speeches of a first voice from the media content to generate replacement speeches using a set of phonemes of a second voice of the user of the user device, and replaces the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content for playback on the user device.
-
Publication No.: US20210225357A1
Publication Date: 2021-07-22
Application No.: US16309399
Application Date: 2017-06-07
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Pei ZHAO , Kaisheng YAO , Max LEUNG , Bo YAN , Jian LUAN , Yu SHI , Malone MA , Mei-Yuh HWANG
Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
-
Publication No.: US20210082396A1
Publication Date: 2021-03-18
Application No.: US17050153
Application Date: 2019-05-13
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Jian LUAN , Shihui LIU
IPC: G10L13/10 , G10L13/047
Abstract: The present disclosure provides a technical solution for highly empathetic TTS processing, which not only takes semantic and linguistic features into consideration but also assigns a sentence ID to each sentence in a training text to distinguish the sentences. These sentence IDs may be introduced as training features when training a machine learning model, enabling the model to learn how the acoustic codes of sentences change with sentence context. Performing TTS with the trained model can output speech whose rhythm and tone change naturally, making the TTS more empathetic. A highly empathetic audio book may be generated using the TTS processing provided herein, and an online system for generating such audio books may be established with this TTS processing as its core technology.
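The core mechanism here is simple to illustrate: every sentence of the training text gets an ID that becomes an extra training feature, so acoustic variation can be tied to where a sentence sits in its context. A minimal sketch (the feature layout, including the normalized `position` field, is an assumption):

```python
# Hypothetical per-sentence training features with sentence IDs.

def sentence_features(text):
    # Split the training text into sentences and attach an ID plus a
    # normalized position, so the model can condition acoustic codes
    # on where each sentence falls in the passage.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    last = max(1, len(sentences) - 1)
    return [{"sentence_id": i, "text": s, "position": i / last}
            for i, s in enumerate(sentences)]

feats = sentence_features("First line. Second line. Third line.")
```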
-