Patent search: ipc:G10L13/033 (page 1)

1.

Granted invention patent
System and method for data augmentation of feature-based voice data [In force]

Publication number: US12073818B2

Publication date: 2024-08-27

Application number: US17197740

Filing date: 2021-03-10

Applicant: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, Patrick A. Naylor, James W. Fosburgh, Do Yeong Kim

IPC classification: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00

CPC classification: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303

Abstract: A method, computer program product, and computing system for receiving feature-based voice data. One or more data augmentation characteristics may be received. One or more augmentations of the feature-based voice data may be generated, via a machine learning model, based upon, at least in part, the feature-based voice data and the one or more data augmentation characteristics.

2.

Published invention application
COMPUTER GRAPHICS PROCESSING AND SELECTIVE VISUAL DISPLAY SYSTEM [Under examination, published]

Publication number: US20240273793A1

Publication date: 2024-08-15

Application number: US18441461

Filing date: 2024-02-14

Applicant: Richard Christopher DeCharms

Inventor: Richard Christopher DeCharms

IPC classification: G06T11/60, G06F3/0484, G06F3/0488, G06F40/109, G06F40/166, G06F40/47, G09B5/06, G10L13/033

CPC classification: G06T11/60, G06F3/0484, G06F3/0488, G06F40/109, G06F40/166, G06F40/47, G09B5/06, G10L13/0335, G06T2200/24

Abstract: According to an aspect of the present invention, there is provided a computer graphics processing and selective visual display system, comprising: a computer graphics processing and selective visual display system with a screen; an eye tracking device; a processor; one or more computer memory devices; wherein the processor is arranged for operations comprising: measuring the user's eye movements to ascertain the specific word on which the user is fixated, by the eye tracking device; modifying the display at the user's current fixation point; applying a delay between the presentation of successive graphic elements based on the user's calculated rate to accommodate the user's required time; and presenting elements to the user at a rate based upon the user's required time.

3.

Published invention application
LATENT SPACE EDITING AND NEURAL ANIMATION TO GENERATE HYPERREAL SYNTHETIC FACES [Under examination, published]

Publication number: US20240212249A1

Publication date: 2024-06-27

Application number: US18089487

Filing date: 2022-12-27

Applicant: Metaphysic.AI

Inventors: Chris Ume, Jo Plaete, Martin Adams, Thomas Graham

IPC classification: G06T13/40, G06N20/00, G06T13/20, G06T19/00, G10L13/033

CPC classification: G06T13/40, G06N20/00, G06T13/205, G06T19/006, G10L13/033

Abstract: Using latent space manipulation and neural animation to generate hyperreal synthetic faces is described. A machine learning model(s) may be trained to generate a synthetic face of a subject featured in unaltered video content based at least in part on video data of an actor making a mouth-generated sound or a three-dimensional (3D) model of a face of the subject that has been animated in accordance with the mouth-generated sound. Latent space manipulation and neural animation may be used with the trained machine learning model(s) to generate instances of the synthetic face, and the instances of the synthetic face can be used to create altered video content featuring the subject with the synthetic face making the mouth-generated sound.

4.

Granted invention patent
System and method for data augmentation of feature-based voice data [In force]

Publication number: US12014722B2

Publication date: 2024-06-18

Application number: US17197587

Filing date: 2021-03-10

Applicant: Microsoft Technology Licensing, LLC

Inventors: Dushyant Sharma, Patrick A. Naylor, James W. Fosburgh

IPC classification: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00

CPC classification: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303

Abstract: A method, computer program product, and computing system for receiving feature-based voice data associated with a first acoustic domain. One or more gain-based augmentations may be performed on at least a portion of the feature-based voice data, thus defining gain-augmented feature-based voice data.
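
Illustrative note (not taken from the patent): one simple way to picture a gain-based augmentation in the feature domain is a constant offset on log-power features. The sketch below assumes the features are natural-log power values (for example log-mel filterbank energies); the function name `apply_gain_db` and the feature layout are hypothetical and are not the claimed method.

```python
import math


def apply_gain_db(log_power_frames, gain_db):
    """Return a copy of the frames with a broadband gain applied.

    Assumption: each frame is a list of natural-log power values.
    Scaling power by 10 ** (gain_db / 10) adds gain_db * ln(10) / 10
    to every log-power value, so no waveform is needed.
    """
    offset = gain_db * math.log(10.0) / 10.0
    return [[value + offset for value in frame] for frame in log_power_frames]


# Hypothetical two-frame, two-band feature matrix.
frames = [[0.0, 1.0], [2.0, 3.0]]
louder = apply_gain_db(frames, 6.0)   # about +6 dB
quieter = apply_gain_db(frames, -6.0)  # about -6 dB
```

Applying the offset to only some frames would correspond to augmenting "at least a portion" of the data; how the patent actually selects portions or gain values is not stated in the abstract.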

5.

Granted invention patent
Multi-source based knowledge data for artificial intelligence characters [In force]

Publication number: US12002470B1

Publication date: 2024-06-04

Application number: US18401544

Filing date: 2023-12-31

Applicant: Theai, Inc.

Inventors: Ilya Gelfenbeyn, Mikhail Ermolenko, Kylan Gibbs, Kirill Ryzhov, Nathan Yu

IPC classification: G10L15/00, G06F16/332, G10L13/033, G10L15/183, G10L15/22, G10L15/30, G06F40/30, G10L15/18

CPC classification: G10L15/22, G06F16/3329, G10L13/033, G10L15/183, G10L15/30, G06F40/30, G10L15/1822

Abstract: Systems and methods for providing multi-source based knowledge data for Artificial Intelligence (AI) characters are provided. An example method includes providing a plurality of data sources; receiving, from a user, at least one word during a conversation between the user and an AI character; ascertaining a speech style of the AI character; analyzing the at least one word to determine a type of information needed to generate a reply to the user; selecting, based on the type of information, at least one data source from the plurality of data sources; generating, based on the at least one word, one or more queries; sending the one or more queries to the at least one data source; receiving one or more responses from the at least one data source; and forming, based on the one or more responses and the speech style of the AI character, the reply for providing to the user.

6.

Granted invention patent
Translating a media asset with vocal characteristics of a speaker [In force]

Publication number: US11997344B2

Publication date: 2024-05-28

Application number: US17509401

Filing date: 2021-10-25

Applicant: Rovi Guides, Inc.

Inventors: Vijay Kumar, Rajendran Pichaimurthy, Madhusudhan Seetharam

IPC classification: G06F40/40, G10L13/027, G10L13/033, G10L15/07, G10L15/19, G10L25/63, H04N21/43, H04N21/81

CPC classification: H04N21/43072, G10L13/027, G10L15/07, G10L15/19, G10L25/63, H04N21/8106

Abstract: Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in a second language.

7.

Published invention application
SYSTEMS AND METHODS FOR GENERATING SYNTHESIZED SPEECH RESPONSES TO VOICE INPUTS [Under examination, published]

Publication number: US20240153483A1

Publication date: 2024-05-09

Application number: US18387211

Filing date: 2023-11-06

Applicant: ROVI GUIDES, INC.

Inventors: Ankur Aher, Jeffry Copps Robert Jose

IPC classification: G10L13/033, G10L25/63

CPC classification: G10L13/0335, G10L25/63

Abstract: The system provides a synthesized speech response to a voice input, based on the prosodic character of the voice input. The system receives the voice input and calculates at least one prosodic metric of the voice input. The at least one prosodic metric can be associated with a word, phrase, grouping thereof, or the entire voice input. The system also determines a response to the voice input, which may include the sequence of words that form the response. The system generates the synthesized speech response by determining prosodic characteristics based on the response and on the prosodic character of the voice input. The system outputs the synthesized speech response, which provides an answer to the voice input that is more natural, more relevant, or both. The prosodic character of the voice input and/or response may include pitch, note, duration, prominence, timbre, rate, and rhythm, for example.

8.

Published invention application
NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM, SOUND PROCESSING METHOD, AND SOUND PROCESSING SYSTEM [Under examination, published]

Publication number: US20240135916A1

Publication date: 2024-04-25

Application number: US18483570

Filing date: 2023-10-10

Applicant: YAMAHA CORPORATION

Inventor: Makoto TACHIBANA

IPC classification: G10L13/033, G10L13/047

CPC classification: G10L13/0335, G10L13/047

Abstract: A non-transitory computer-readable recording medium storing a program that, when executed by a computer system, causes the computer system to perform a method including altering a first portion of first time-series data in accordance with an instruction from a user. The first time-series data indicates a time series of a sound characteristic corresponding to a first pronunciation style of a target sound to be synthesized. The method also includes generating second time-series data when a second pronunciation style different from the first pronunciation style is specified for the target sound. The second time-series data indicates a sound characteristic with the alteration made to the first portion in accordance with the instruction from the user, and indicates a sound characteristic with a second portion other than the first portion corresponding to the second pronunciation style.

9.

Published invention application
SYSTEM FOR REPLY GENERATION [Under examination, published]

Publication number: US20240096236A1

Publication date: 2024-03-21

Application number: US18038520

Filing date: 2021-11-09

Applicant: ROLLS-ROYCE PLC

Inventors: Stuart Brian MOSS, Muhannad Abdul Rahman ALOMARI, James Frederick Sebastian ARNEY

IPC classification: G09B21/00, G06F3/01, G10L13/033, G10L15/06, G10L15/18, G10L15/22

CPC classification: G09B21/00, G06F3/013, G10L13/033, G10L15/063, G10L15/18, G10L15/22

Abstract: A device for generating conversational replies, including a processor with a memory; a speech input module; a user input module; a natural language processing module including one or more encoder-decoder modules; the device being configured to: record portions of a conversation through the speech input module, use a speech recognition module to identify words in the conversation, and when one or more words have been recognised: generate one or more responses based on the one or more words using the natural language processing module, select a group of the context-sensitive responses, prompt the user via the user input module to select a response from the group, and output the selected response.

10.

Granted invention patent
Digital assistant voice input integration [In force]

Publication number: US11915696B2

Publication date: 2024-02-27

Application number: US17379777

Filing date: 2021-07-19

Applicant: Microsoft Technology Licensing, LLC

Inventors: Derek Liddell, Francis Zhou, Cheng-Yi Yen

IPC classification: G10L15/22, G06F3/16, G10L13/033, G10L15/24, G10L15/26

CPC classification: G10L15/22, G06F3/167, G10L13/033, G10L2015/223, G10L2015/227, G10L2015/228, G10L15/24, G10L15/26

Abstract: A digital assistant supported on devices such as smartphones, tablets, personal computers, game consoles, etc. includes an extensibility client that exposes an interface and service that enables third party applications to be integrated with the digital assistant so the application user experiences are rendered using the native voice of the digital assistant. Specific voice inputs associated with a given application may be registered by developers using a manifest that is loaded when the application is launched on the device so that voice inputs from the device user can be mapped by the digital assistant extensibility client to the appropriate application as input events for consumption. In typical implementations, the manifest is arranged as a declarative document that streamlines application development and provides a seamless user experience by enabling customization of third party applications to integrate the digital assistant's voice and behaviors within the user experience of the application's domain.
