-
Publication No.: US20240339122A1
Publication date: 2024-10-10
Application No.: US18608476
Filing date: 2024-03-18
Applicant: Datum Point Labs Inc.
Inventor: Donghyeon Kim , Bonhwa Ku , Hanseok Ko
IPC: G10L21/007 , G10L15/06 , G10L15/08
CPC classification number: G10L21/007 , G10L15/063 , G10L15/08 , G10L2015/0635
Abstract: Embodiments described herein provide systems and methods for any-to-any voice conversion. A system receives, via a data interface, a source utterance of a first style and a target utterance of a second style. The system generates, via a first encoder, a vector representation of the target utterance. The system generates, via a second encoder, a vector representation of the source utterance. The system generates, via a filter generator, a generated filter based on the vector representation of the target utterance. The system generates, via a decoder, a generated utterance based on the vector representation of the source utterance and the generated filter.
-
Publication No.: US12094459B2
Publication date: 2024-09-17
Application No.: US17568960
Filing date: 2022-01-05
Applicant: International Business Machines Corporation
Inventor: Ashish R Mittal , Samarth Bharadwaj , Shreya Khare , Karthik Sankaranarayanan
IPC: G10L15/06 , G06F40/143 , G06F40/174 , G06N20/00 , G10L15/187 , G10L15/22 , G10L15/30 , G10L19/00 , H04L67/10
CPC classification number: G10L15/187 , G06F40/143 , G06F40/174 , G06N20/00 , G10L15/063 , G10L15/22 , G10L15/30 , G10L19/00 , H04L67/10 , G10L2015/0633 , G10L2015/0635 , G10L2015/223
Abstract: Methods, systems, and computer program products for automated domain-specific constrained decoding from speech inputs to structured resources are provided herein. A computer-implemented method includes converting at least a portion of at least one user-provided speech utterance into text by processing the at least one user-provided speech utterance using an artificial intelligence-based automatic speech recognition model; automatically training an artificial intelligence-based decoding engine, wherein automatically training the artificial intelligence-based decoding engine comprises constraining the artificial intelligence-based decoding engine based at least in part on a domain-specific model and the artificial intelligence-based automatic speech recognition model; and generating at least one of one or more domain-specific text outputs related to one or more structured resources associated with the domain and one or more domain-specific action outputs related to the one or more structured resources associated with the domain by processing at least a portion of the text using the artificial intelligence-based decoding engine.
-
Publication No.: US12057118B2
Publication date: 2024-08-06
Application No.: US17441009
Filing date: 2020-03-09
Applicant: SONY GROUP CORPORATION
Inventor: Tatsuma Sakurai , Ichitaro Kohara
CPC classification number: G10L15/22 , G05D1/0016 , G10L15/063 , G10L15/30 , G10L25/84 , G10L2015/0635 , G10L2015/223
Abstract: Provided is an information processing apparatus including a control section that controls operations of operation bodies in accordance with a result of a voice recognition process. In accordance with a result of a voice recognition process that is based on a voice collected by one of the operation bodies or a voice recognition environment recognized from sensor information collected by one of the operation bodies, the control section controls an operation of another one of the operation bodies.
-
Publication No.: US11979437B2
Publication date: 2024-05-07
Application No.: US18313076
Filing date: 2023-05-05
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Hojung Lee , Hyeonmok Ko , Hyungrai Oh , Inchul Hwang
IPC: H04L65/1073 , G10L15/06 , G10L15/22 , G10L15/30
CPC classification number: H04L65/1073 , G10L15/063 , G10L15/22 , G10L15/30 , G10L2015/0635 , G10L2015/223
Abstract: A system and method for registering a new device for a voice assistant service. The method, performed by a server, of registering a new device for a voice assistant service includes: comparing functions of a pre-registered device with functions of the new device; identifying functions corresponding to the functions of the pre-registered device among the functions of the new device, based on the comparison; obtaining pre-registered utterance data related to at least some of the identified functions; generating action data for the new device based on the identified functions and the pre-registered utterance data.
-
Publication No.: US20240105206A1
Publication date: 2024-03-28
Application No.: US17934833
Filing date: 2022-09-23
Applicant: QUALCOMM Incorporated
Inventor: Hesu HUANG , Leonid SHEYNBLAT , Vinesh SUKUMAR , Ziad ASGHAR , Joel LINSKY , Justin MCGLOIN , Tong TANG
CPC classification number: G10L25/60 , G10L15/063 , G10L15/08 , G10L2015/0635 , G10L2015/088
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. Voice data from a first user is received. In response to determining that the voice data includes an utterance of a defined keyword, a user verification score is generated by processing the voice data using a first user verification machine learning (ML) model, and a quality of the voice data is determined. In response to determining that the user verification score and determined quality satisfy one or more defined criteria, a second user verification ML model is updated based on the voice data.
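Illustrative note: the gating step in the abstract above (update the second verification model only when a keyword was detected and both the verification score and the audio quality meet defined criteria) can be sketched as follows. This is a minimal sketch, not the patented implementation; the names `UpdateCriteria` and `should_update_second_model` and the threshold values are hypothetical, since the abstract does not state what the criteria are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateCriteria:
    # Hypothetical thresholds; the abstract only says "one or more defined criteria".
    min_verification_score: float = 0.8
    min_quality: float = 0.6


def should_update_second_model(
    keyword_detected: bool,
    verification_score: float,
    quality: float,
    criteria: UpdateCriteria = UpdateCriteria(),
) -> bool:
    """Return True when the voice data may be used to update the second model."""
    # No keyword utterance means no score or quality check is triggered at all.
    if not keyword_detected:
        return False
    # Both the score from the first model and the measured quality must pass.
    return (
        verification_score >= criteria.min_verification_score
        and quality >= criteria.min_quality
    )
```

With these assumed thresholds, a confident match on clean audio passes the gate, while a confident match on noisy audio does not.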
-
Publication No.: US11929060B2
Publication date: 2024-03-12
Application No.: US17170836
Filing date: 2021-02-08
Applicant: Google LLC
Inventor: Zhehuai Chen , Andrew Rosenberg , Bhuvana Ramabhadran , Pedro Jose Moreno Mengibar
IPC: G10L15/06 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/088 , G10L13/02 , G10L15/16 , G10L15/197
CPC classification number: G10L15/063 , G06N3/044 , G06N3/045 , G06N3/088 , G10L13/02 , G10L15/16 , G10L15/197 , G10L2015/0635
Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model. The method also includes updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair.
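Illustrative note: the abstract above describes a loss term computed at each output step from two probability distributions, one for the non-synthetic speech and one for the synthetic speech of the same utterance. The abstract does not say which divergence is used; the sketch below assumes a Kullback-Leibler divergence averaged over output steps, and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same hypotheses."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same hypotheses")
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) in q.
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def consistency_loss(
    real_steps: Sequence[Sequence[float]],
    synth_steps: Sequence[Sequence[float]],
) -> float:
    """Mean per-step divergence between real-speech and synthetic-speech outputs."""
    if not real_steps or len(real_steps) != len(synth_steps):
        raise ValueError("both inputs need the same non-zero number of output steps")
    per_step = [kl_divergence(p, q) for p, q in zip(real_steps, synth_steps)]
    return sum(per_step) / len(per_step)
```

The loss is zero when the model produces identical distributions for both representations of an utterance and grows as they diverge, which is the quantity a training loop would then minimize.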
-
Publication No.: US20240069501A1
Publication date: 2024-02-29
Application No.: US17823387
Filing date: 2022-08-30
Inventor: Anoop Cherian Cherian , Sudipta Paul
CPC classification number: G05B13/027 , G10L15/063 , G10L15/16 , G10L15/22 , G10L2015/0635 , G10L2015/223
Abstract: A controller for controlling an entity is provided. The controller comprises a memory to store a hierarchical multimodal reinforcement learning (RL) neural network, and a processor. The hierarchical multimodal RL neural network includes a first level controller and two second level controllers. Each of the second level controllers comprises a first sub level controller relating to a first modality and a second sub level controller relating to a second modality. The first modality is different from the second modality. The processor is configured to select one of the two second level controllers to perform a first sub-task relating to a task, using the first level controller, based on input data and a state of the hierarchical multimodal RL neural network. The selected second level controller is configured to determine a set of control actions to perform the first sub-task, and control the entity based on the set of control actions.
-
Publication No.: US11804211B2
Publication date: 2023-10-31
Application No.: US17112418
Filing date: 2020-12-04
Applicant: Google LLC
Inventor: Asaf Aharoni , Yaniv Leviathan , Eyal Segalis , Gal Elidan , Sasha Goldshtein , Tomer Amiaz , Deborah Cohen
CPC classification number: G10L15/063 , G06N20/00 , G10L15/02 , G10L15/04 , G10L15/22 , H04L67/133 , H04M3/493 , G10L2015/0635
Abstract: Implementations are directed to providing a voice bot development platform that enables a third-party developer to train a voice bot based on training instance(s). The training instance(s) can each include training input and training output. The training input can include a portion of a corresponding conversation and a prior context of the corresponding conversation. The training output can include a corresponding ground truth response to the portion of the corresponding conversation. Subsequent to training, the voice bot can be deployed for conducting conversations on behalf of a third-party. In some implementations, the voice bot is further trained based on a corresponding feature emphasis input that attentions the voice bot to a particular feature of the portion of the corresponding conversation. In some additional or alternative implementations, the voice bot is further trained to interact with third-party system(s) via remote procedure calls (RPCs).
-
Publication No.: US20230274729A1
Publication date: 2023-08-31
Application No.: US18312587
Filing date: 2023-05-04
Applicant: Google LLC
Inventor: Olga Kapralova , Evgeny A. Cherepanov , Dmitry Osmakov , Martin Baeuml , Gleb Skobeltsyn
CPC classification number: G10L15/063 , G10L15/06 , G10L15/22 , G10L15/32 , G10L15/01 , G10L15/10 , G10L2015/0635 , G10L2015/0638
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more of replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
-
Publication No.: US20230259708A1
Publication date: 2023-08-17
Application No.: US17650876
Filing date: 2022-02-14
Applicant: ADOBE INC.
Inventor: Amir Pouran Ben Veyseh , Franck Dernoncourt , Walter W. Chang , Trung Huu Bui , Hanieh Deilamsalehy , Seunghyun Yoon , Rajiv Bhawanji Jain , Quan Hung Tran , Varun Manjunatha
IPC: G06F40/289 , G06F40/30 , G10L15/22 , G10L15/06 , G10L15/16
CPC classification number: G06F40/289 , G06F40/30 , G10L15/22 , G10L15/063 , G10L15/16 , G10L2015/0635
Abstract: Systems and methods for key-phrase extraction are described. The systems and methods include receiving a transcript including a text paragraph and generating key-phrase data for the text paragraph using a key-phrase extraction network. The key-phrase extraction network is trained to identify domain-relevant key-phrase data based on domain data obtained using a domain discriminator network. The systems and methods further include generating meta-data for the transcript based on the key-phrase data.