-
Publication No.: US12073818B2
Publication Date: 2024-08-27
Application No.: US17197740
Filing Date: 2021-03-10
IPC Classification: G10L13/02 , G06F3/16 , G06N5/02 , G06N20/00 , G10K15/08 , G10L13/033 , G10L15/02 , G10L15/06 , G10L15/065 , G10L21/0224 , G10L25/03 , H04S7/00
CPC Classification: G10L13/02 , G06F3/165 , G06N5/02 , G06N20/00 , G10K15/08 , G10L13/033 , G10L15/02 , G10L15/063 , G10L15/065 , G10L21/0224 , G10L25/03 , H04S7/30 , H04S7/302 , H04S7/303
Abstract: A method, computer program product, and computing system for receiving feature-based voice data. One or more data augmentation characteristics may be received. One or more augmentations of the feature-based voice data may be generated, via a machine learning model, based upon, at least in part, the feature-based voice data and the one or more data augmentation characteristics.
-
Publication No.: US20240212249A1
Publication Date: 2024-06-27
Application No.: US18089487
Filing Date: 2022-12-27
Applicant: Metaphysic.AI
Inventors: Chris Ume , Jo Plaete , Martin Adams , Thomas Graham
IPC Classification: G06T13/40 , G06N20/00 , G06T13/20 , G06T19/00 , G10L13/033
CPC Classification: G06T13/40 , G06N20/00 , G06T13/205 , G06T19/006 , G10L13/033
Abstract: Using latent space manipulation and neural animation to generate hyperreal synthetic faces is described. A machine learning model(s) may be trained to generate a synthetic face of a subject featured in unaltered video content based at least in part on video data of an actor making a mouth-generated sound or a three-dimensional (3D) model of a face of the subject that has been animated in accordance with the mouth-generated sound. Latent space manipulation and neural animation may be used with the trained machine learning model(s) to generate instances of the synthetic face, and the instances of the synthetic face can be used to create altered video content featuring the subject with the synthetic face making the mouth-generated sound.
-
Publication No.: US12014722B2
Publication Date: 2024-06-18
Application No.: US17197587
Filing Date: 2021-03-10
IPC Classification: G10L13/02 , G06F3/16 , G06N5/02 , G06N20/00 , G10K15/08 , G10L13/033 , G10L15/02 , G10L15/06 , G10L15/065 , G10L21/0224 , G10L25/03 , H04S7/00
CPC Classification: G10L13/02 , G06F3/165 , G06N5/02 , G06N20/00 , G10K15/08 , G10L13/033 , G10L15/02 , G10L15/063 , G10L15/065 , G10L21/0224 , G10L25/03 , H04S7/30 , H04S7/302 , H04S7/303
Abstract: A method, computer program product, and computing system for receiving feature-based voice data associated with a first acoustic domain. One or more gain-based augmentations may be performed on at least a portion of the feature-based voice data, thus defining gain-augmented feature-based voice data.
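Illustrative note: the abstract does not say how the gain is applied in the feature domain. The sketch below shows one possible reading, assuming the features are natural-log power values arranged as frames of frequency bins, so that a gain in decibels becomes an additive offset. The function name `apply_gain_db` and the sample values are hypothetical and are not taken from the patent.

```python
import math


def apply_gain_db(log_power_features, gain_db):
    """Return a copy of natural-log power features shifted by a gain in dB.

    Assumption: scaling signal power by 10 ** (gain_db / 10) adds
    (gain_db / 10) * ln(10) to every natural-log power value.
    """
    offset = gain_db / 10.0 * math.log(10.0)
    return [[value + offset for value in frame] for frame in log_power_features]


# Two frames with two frequency bins each (made-up values).
frames = [[0.0, 1.0], [2.0, 3.0]]
augmented = apply_gain_db(frames, 10.0)
```

Working on a copy keeps the original features available, so the same recording can be augmented several times with different gains.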
-
Publication No.: US12002470B1
Publication Date: 2024-06-04
Application No.: US18401544
Filing Date: 2023-12-31
Applicant: Theai, Inc.
Inventors: Ilya Gelfenbeyn , Mikhail Ermolenko , Kylan Gibbs , Kirill Ryzhov , Nathan Yu
IPC Classification: G10L15/00 , G06F16/332 , G10L13/033 , G10L15/183 , G10L15/22 , G10L15/30 , G06F40/30 , G10L15/18
CPC Classification: G10L15/22 , G06F16/3329 , G10L13/033 , G10L15/183 , G10L15/30 , G06F40/30 , G10L15/1822
Abstract: Systems and methods for providing multi-source based knowledge data for Artificial Intelligence (AI) characters are provided. An example method includes providing a plurality of data sources; receiving, from a user, at least one word during a conversation between the user and an AI character; ascertaining a speech style of the AI character; analyzing the at least one word to determine a type of information needed to generate a reply to the user; selecting, based on the type of information, at least one data source from the plurality of data sources; generating, based on the at least one word, one or more queries; sending the one or more queries to the at least one data source; receiving one or more responses from the at least one data source; forming, based on the one or more responses and the speech style of the AI character, the reply for providing to the user.
-
Publication No.: US20240096236A1
Publication Date: 2024-03-21
Application No.: US18038520
Filing Date: 2021-11-09
Applicant: ROLLS-ROYCE PLC
CPC Classification: G09B21/00 , G06F3/013 , G10L13/033 , G10L15/063 , G10L15/18 , G10L15/22
Abstract: A device for generating conversational replies, including a processor with a memory; a speech input module, a user input module; a natural language processing module including one or more encoder-decode modules; the device being configured to: record portions of a conversation through the speech input module, use a speech recognition module to identify words in the conversation, and when one or more words have been recognised: generate one or more responses based on the one or more words using the natural language processing module; selecting a group of the context sensitive responses, prompt the user via the user input module to select a response from the group, output the selected response.
-
Publication No.: US11915696B2
Publication Date: 2024-02-27
Application No.: US17379777
Filing Date: 2021-07-19
Inventors: Derek Liddell , Francis Zhou , Cheng-Yi Yen
IPC Classification: G10L15/22 , G06F3/16 , G10L13/033 , G10L15/24 , G10L15/26
CPC Classification: G10L15/22 , G06F3/167 , G10L13/033 , G10L2015/223 , G10L2015/227 , G10L2015/228 , G10L15/24 , G10L15/26
Abstract: A digital assistant supported on devices such as smartphones, tablets, personal computers, game consoles, etc. includes an extensibility client that exposes an interface and service that enables third party applications to be integrated with the digital assistant so the application user experiences are rendered using the native voice of the digital assistant. Specific voice inputs associated with a given application may be registered by developers using a manifest that is loaded when the application is launched on the device so that voice inputs from the device user can be mapped by the digital assistant extensibility client to the appropriate application as input events for consumption. In typical implementations, the manifest is arranged as a declarative document that streamlines application development and provides a seamless user experience by enabling customization of third party applications to integrate the digital assistant's voice and behaviors within the user experience of the application's domain.
-
Publication No.: US11908451B2
Publication Date: 2024-02-20
Application No.: US18024021
Filing Date: 2021-08-09
Inventors: Congyi Wang , Yu Chen , Jinxiang Chai
IPC Classification: G10L13/10 , G06T13/00 , G10L13/033 , G10L13/047 , G10L15/02 , G10L15/26
CPC Classification: G10L13/10 , G06T13/00 , G10L13/033 , G10L13/047 , G10L15/02 , G10L15/26 , G10L2013/105
Abstract: A text-based virtual object animation generation includes acquiring text information, where the text information includes an original text of a virtual object animation to be generated; analyzing an emotional feature of the text information; performing speech synthesis according to the emotional feature, a rhyme boundary, and the text information to obtain audio information, where the audio information includes emotional speech obtained by conversion based on the original text; and generating a corresponding virtual object animation based on the text information and the audio information, where the virtual object animation is synchronized in time with the audio information.
-
Publication No.: US11900938B2
Publication Date: 2024-02-13
Application No.: US17867161
Filing Date: 2022-07-18
Applicant: Google LLC
IPC Classification: G10L15/00 , G10L15/22 , G10L17/22 , H04L67/104 , G10L15/26 , G10L13/00 , G06F16/332 , G10L15/18 , G10L13/033 , G10L15/30 , G10L13/08 , G06F21/62
CPC Classification: G10L15/22 , G06F16/3329 , G06F21/6245 , G10L13/00 , G10L13/033 , G10L13/08 , G10L15/1815 , G10L15/1822 , G10L15/26 , G10L15/30 , G10L17/22 , H04L67/104 , G10L2015/223 , G10L2015/228
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device.
-
Publication No.: US20240029706A1
Publication Date: 2024-01-25
Application No.: US18479785
Filing Date: 2023-10-02
Applicant: Google LLC
IPC Classification: G10L13/033 , G06F3/16 , G10L13/10
CPC Classification: G10L13/033 , G06F3/167 , G10L13/10 , G10L2021/0135
Abstract: A device may identify a plurality of sources for outputs that the device is configured to provide. The plurality of sources may include at least one of a particular application in the device, an operating system of the device, a particular area within a display of the device, or a particular graphical user interface object. The device may also assign a set of distinct voices to respective sources of the plurality of sources. The device may also receive a request for speech output. The device may also select a particular source that is associated with the requested speech output. The device may also generate speech having particular voice characteristics of a particular voice assigned to the particular source.
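Illustrative note: the assignment and lookup steps in this abstract can be pictured as a simple mapping from output source to voice. The sketch below is only an assumption about how such a mapping might look; the source names, voice names, and the functions `assign_voices` and `voice_for_request` are hypothetical and do not come from the patent.

```python
def assign_voices(sources, voices):
    """Pair each output source with its own distinct voice."""
    distinct = list(dict.fromkeys(voices))  # drop duplicate voices, keep order
    if len(distinct) < len(sources):
        raise ValueError("need at least one distinct voice per source")
    return dict(zip(sources, distinct))


def voice_for_request(assignment, source):
    """Look up the voice assigned to the source behind a speech request."""
    return assignment[source]


assignment = assign_voices(
    ["email_app", "operating_system", "notification_area"],
    ["voice_a", "voice_b", "voice_c"],
)
selected = voice_for_request(assignment, "operating_system")
```

A speech synthesizer would then be called with the characteristics of `selected`; that step is left out because the abstract does not describe it.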
-
Publication No.: US11817078B2
Publication Date: 2023-11-14
Application No.: US18328189
Filing Date: 2023-06-02
Applicant: Vocollect, Inc.
Inventors: James Hendrickson , Debra Drylie Stiffey , Duane Littleton , John Pecorari , Arkadiusz Slusarczyk
IPC Classification: G10L13/00 , G10L13/02 , G10L13/033
CPC Classification: G10L13/02 , G10L13/033
Abstract: A method and apparatus that dynamically adjust operational parameters of a text-to-speech engine in a speech-based system are disclosed. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
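Illustrative note: the abstract names neither the environmental conditions nor the parameters that are adjusted. The sketch below assumes, purely for illustration, that ambient noise level is the condition and that speaking rate and volume are the adjustable parameters. The class `TtsParams`, the function `adjust_for_noise`, and every threshold and step size are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TtsParams:
    rate_wpm: int   # speaking rate in words per minute
    volume: float   # normalized volume, 0.0 to 1.0


def adjust_for_noise(params, noise_db, threshold_db=70.0):
    """Slow down and raise volume when ambient noise exceeds a threshold."""
    if noise_db <= threshold_db:
        return params
    return replace(
        params,
        rate_wpm=max(100, params.rate_wpm - 20),  # never drop below 100 wpm
        volume=min(1.0, params.volume + 0.2),     # never exceed full volume
    )


quiet = adjust_for_noise(TtsParams(rate_wpm=180, volume=0.5), noise_db=55.0)
noisy = adjust_for_noise(TtsParams(rate_wpm=180, volume=0.5), noise_db=80.0)
```

Returning a new parameter object instead of mutating the old one makes it easy to restore the earlier settings once the environment becomes quiet again.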