-
Publication No.: US12073818B2
Publication date: 2024-08-27
Application No.: US17197740
Filing date: 2021-03-10
IPC classification: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00
CPC classification: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303
Abstract: A method, computer program product, and computing system for receiving feature-based voice data. One or more data augmentation characteristics may be received. One or more augmentations of the feature-based voice data may be generated, via a machine learning model, based upon, at least in part, the feature-based voice data and the one or more data augmentation characteristics.
-
Publication No.: US20240273793A1
Publication date: 2024-08-15
Application No.: US18441461
Filing date: 2024-02-14
IPC classification: G06T11/60, G06F3/0484, G06F3/0488, G06F40/109, G06F40/166, G06F40/47, G09B5/06, G10L13/033
CPC classification: G06T11/60, G06F3/0484, G06F3/0488, G06F40/109, G06F40/166, G06F40/47, G09B5/06, G10L13/0335, G06T2200/24
Abstract: According to an aspect of the present invention, there is provided a computer graphics processing and selective visual display system, comprising: a computer graphics processing and selective visual display system with a screen; an eye tracking device; a processor; one or more computer memory devices; wherein the processor is arranged for operations comprising: measuring the user's eye movements to ascertain the specific word on which the user is fixated, by the eye tracking device; modifying the display at the user's current fixation point; applying a delay between the presentation of successive graphic elements based on the user's calculated rate to accommodate the user's required time; and presenting elements to the user at a rate based upon the user's required time.
-
Publication No.: US20240212249A1
Publication date: 2024-06-27
Application No.: US18089487
Filing date: 2022-12-27
Applicant: Metaphysic.AI
Inventors: Chris Ume, Jo Plaete, Martin Adams, Thomas Graham
IPC classification: G06T13/40, G06N20/00, G06T13/20, G06T19/00, G10L13/033
CPC classification: G06T13/40, G06N20/00, G06T13/205, G06T19/006, G10L13/033
Abstract: Using latent space manipulation and neural animation to generate hyperreal synthetic faces is described. A machine learning model(s) may be trained to generate a synthetic face of a subject featured in unaltered video content based at least in part on video data of an actor making a mouth-generated sound or a three-dimensional (3D) model of a face of the subject that has been animated in accordance with the mouth-generated sound. Latent space manipulation and neural animation may be used with the trained machine learning model(s) to generate instances of the synthetic face, and the instances of the synthetic face can be used to create altered video content featuring the subject with the synthetic face making the mouth-generated sound.
-
Publication No.: US12014722B2
Publication date: 2024-06-18
Application No.: US17197587
Filing date: 2021-03-10
IPC classification: G10L13/02, G06F3/16, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/06, G10L15/065, G10L21/0224, G10L25/03, H04S7/00
CPC classification: G10L13/02, G06F3/165, G06N5/02, G06N20/00, G10K15/08, G10L13/033, G10L15/02, G10L15/063, G10L15/065, G10L21/0224, G10L25/03, H04S7/30, H04S7/302, H04S7/303
Abstract: A method, computer program product, and computing system for receiving feature-based voice data associated with a first acoustic domain. One or more gain-based augmentations may be performed on at least a portion of the feature-based voice data, thus defining gain-augmented feature-based voice data.
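Illustrative note (not part of the patent record): the gain-based augmentation described in this abstract can be sketched as follows. The sketch assumes the feature-based voice data is a list of natural-log power frames (for example, log-mel filterbank energies), which the abstract does not specify. Under that assumption, a gain of G dB on the signal becomes a constant offset of G·ln(10)/10 added to each feature. The function names and the default gain range are hypothetical.

```python
import math
import random


def apply_gain_db(frames, gain_db, start=0, end=None):
    """Apply a gain (in dB) to frames[start:end] of natural-log power features.

    Assumption: each frame is a list of natural-log power values, so scaling
    the signal power by 10**(gain_db / 10) adds gain_db * ln(10) / 10.
    Frames outside [start, end) are copied unchanged.
    """
    offset = gain_db * math.log(10.0) / 10.0
    end = len(frames) if end is None else end
    return [
        [value + offset for value in frame] if start <= i < end else list(frame)
        for i, frame in enumerate(frames)
    ]


def augment_with_random_gain(frames, min_db=-6.0, max_db=6.0, rng=None):
    """Draw a gain uniformly from [min_db, max_db] and apply it to all frames."""
    rng = rng or random.Random()
    gain_db = rng.uniform(min_db, max_db)
    return gain_db, apply_gain_db(frames, gain_db)
```

Passing `start` and `end` restricts the gain to a portion of the frames, which mirrors the abstract's "at least a portion" wording.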
-
Publication No.: US12002470B1
Publication date: 2024-06-04
Application No.: US18401544
Filing date: 2023-12-31
Applicant: Theai, Inc.
Inventors: Ilya Gelfenbeyn, Mikhail Ermolenko, Kylan Gibbs, Kirill Ryzhov, Nathan Yu
IPC classification: G10L15/00, G06F16/332, G10L13/033, G10L15/183, G10L15/22, G10L15/30, G06F40/30, G10L15/18
CPC classification: G10L15/22, G06F16/3329, G10L13/033, G10L15/183, G10L15/30, G06F40/30, G10L15/1822
Abstract: Systems and methods for providing multi-source based knowledge data for Artificial Intelligence (AI) characters are provided. An example method includes providing a plurality of data sources; receiving, from a user, at least one word during a conversation between the user and an AI character; ascertaining a speech style of the AI character; analyzing the at least one word to determine a type of information needed to generate a reply to the user; selecting, based on the type of information, at least one data source from the plurality of data sources; generating, based on the at least one word, one or more queries; sending the one or more queries to the at least one data source; receiving one or more responses from the at least one data source; and forming, based on the one or more responses and the speech style of the AI character, the reply for providing to the user.
-
Publication No.: US11997344B2
Publication date: 2024-05-28
Application No.: US17509401
Filing date: 2021-10-25
Applicant: Rovi Guides, Inc.
IPC classification: G06F40/40, G10L13/027, G10L13/033, G10L15/07, G10L15/19, G10L25/63, H04N21/43, H04N21/81
CPC classification: H04N21/43072, G10L13/027, G10L15/07, G10L15/19, G10L25/63, H04N21/8106
Abstract: Systems and methods are described herein for generating alternate audio for a media stream. The media system receives media that is requested by the user. The media comprises a video and audio. The audio includes words spoken in a first language. The media system stores the received media in a buffer as it is received. The media system separates the audio from the buffered media and determines an emotional state expressed by spoken words of the first language. The media system translates the words spoken in the first language into words spoken in a second language. Using the translated words of the second language, the media system synthesizes speech having the emotional state previously determined. The media system then retrieves the video of the received media from the buffer and synchronizes the synthesized speech with the video to generate the media content in a second language.
-
Publication No.: US20240153483A1
Publication date: 2024-05-09
Application No.: US18387211
Filing date: 2023-11-06
Applicant: ROVI GUIDES, INC.
IPC classification: G10L13/033, G10L25/63
CPC classification: G10L13/0335, G10L25/63
Abstract: The system provides a synthesized speech response to a voice input, based on the prosodic character of the voice input. The system receives the voice input and calculates at least one prosodic metric of the voice input. The at least one prosodic metric can be associated with a word, phrase, grouping thereof, or the entire voice input. The system also determines a response to the voice input, which may include the sequence of words that form the response. The system generates the synthesized speech response by determining prosodic characteristics based on the response and on the prosodic character of the voice input. The system outputs the synthesized speech response, which includes an answer to the call of the voice input that is more natural, more relevant, or both. The prosodic character of the voice input and/or response may include pitch, note, duration, prominence, timbre, rate, and rhythm, for example.
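Illustrative note (not part of the patent record): one way to picture a "prosodic metric" that influences the response is speaking rate. The sketch below assumes word-level timestamps are available and blends the user's measured rate with a default synthesis rate. The function names, the blend weight, and the rate limits are all hypothetical and are not taken from the patent.

```python
def speaking_rate(word_timings):
    """Return words per second for a list of (word, start_s, end_s) tuples.

    The span runs from the first word's start to the last word's end.
    Returns 0.0 when there are no words or the span is not positive.
    """
    if not word_timings:
        return 0.0
    duration = word_timings[-1][2] - word_timings[0][1]
    return len(word_timings) / duration if duration > 0 else 0.0


def response_rate(input_rate, default_rate=2.5, weight=0.5, lo=1.5, hi=4.0):
    """Blend the user's speaking rate with a default rate, then clamp.

    All parameter values are illustrative assumptions. A non-positive
    input rate falls back to the default.
    """
    if input_rate <= 0:
        return default_rate
    blended = (1.0 - weight) * default_rate + weight * input_rate
    return max(lo, min(hi, blended))
```

A fast-talking user would receive a somewhat faster synthesized reply, bounded by `lo` and `hi` so the output stays intelligible.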
-
Publication No.: US20240135916A1
Publication date: 2024-04-25
Application No.: US18483570
Filing date: 2023-10-10
Applicant: YAMAHA CORPORATION
Inventor: Makoto TACHIBANA
IPC classification: G10L13/033, G10L13/047
CPC classification: G10L13/0335, G10L13/047
Abstract: A non-transitory computer-readable recording medium storing a program that, when executed by a computer system, causes the computer system to perform a method including altering a first portion of first time-series data in accordance with an instruction from a user. The first time-series data indicates a time series of a sound characteristic corresponding to a first pronunciation style of a target sound to be synthesized. The method also includes generating second time-series data when a second pronunciation style different from the first pronunciation style is specified for the target sound. The second time-series data indicates a sound characteristic with the alteration made to the first portion in accordance with the instruction from the user, and indicates a sound characteristic with a second portion other than the first portion corresponding to the second pronunciation style.
-
Publication No.: US20240096236A1
Publication date: 2024-03-21
Application No.: US18038520
Filing date: 2021-11-09
Applicant: ROLLS-ROYCE PLC
CPC classification: G09B21/00, G06F3/013, G10L13/033, G10L15/063, G10L15/18, G10L15/22
Abstract: A device for generating conversational replies, including a processor with a memory; a speech input module; a user input module; a natural language processing module including one or more encoder-decoder modules; the device being configured to: record portions of a conversation through the speech input module, use a speech recognition module to identify words in the conversation, and when one or more words have been recognised: generate one or more responses based on the one or more words using the natural language processing module; select a group of the context-sensitive responses; prompt the user via the user input module to select a response from the group; and output the selected response.
-
Publication No.: US11915696B2
Publication date: 2024-02-27
Application No.: US17379777
Filing date: 2021-07-19
Inventors: Derek Liddell, Francis Zhou, Cheng-Yi Yen
IPC classification: G10L15/22, G06F3/16, G10L13/033, G10L15/24, G10L15/26
CPC classification: G10L15/22, G06F3/167, G10L13/033, G10L2015/223, G10L2015/227, G10L2015/228, G10L15/24, G10L15/26
Abstract: A digital assistant supported on devices such as smartphones, tablets, personal computers, game consoles, etc. includes an extensibility client that exposes an interface and service that enables third-party applications to be integrated with the digital assistant so the application user experiences are rendered using the native voice of the digital assistant. Specific voice inputs associated with a given application may be registered by developers using a manifest that is loaded when the application is launched on the device so that voice inputs from the device user can be mapped by the digital assistant extensibility client to the appropriate application as input events for consumption. In typical implementations, the manifest is arranged as a declarative document that streamlines application development and provides a seamless user experience by enabling customization of third-party applications to integrate the digital assistant's voice and behaviors within the user experience of the application's domain.