Patent search cpc:"G10L2015/0635" Page 1

1.

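Invention publication
SYSTEMS AND METHODS FOR ANY TO ANY VOICE CONVERSION [Under examination, published]

Publication (announcement) number: US20240339122A1

Publication (announcement) date: 2024-10-10

Application number: US18608476

Application date: 2024-03-18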

Applicant: Datum Point Labs Inc.

Inventor: Donghyeon Kim, Bonhwa Ku, Hanseok Ko

IPC: G10L21/007, G10L15/06, G10L15/08

CPC classification number: G10L21/007, G10L15/063, G10L15/08, G10L2015/0635

Abstract: Embodiments described herein provide systems and methods for any to any voice conversion. A system receives, via a data interface, a source utterance of a first style and a target utterance of a second style. The system generates, via a first encoder, a vector representation of the target utterance. The system generates, via a second encoder, a vector representation of the source utterance. The system generates, via a filter generator, a generated filter based on the vector representation of the target utterance. The system generates, via a decoder, a generated utterance based on the vector representation of the source utterance and the generated filter.

2.

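Invention grant
Automated domain-specific constrained decoding from speech inputs to structured resources [In force]

Publication (announcement) number: US12094459B2

Publication (announcement) date: 2024-09-17

Application number: US17568960

Application date: 2022-01-05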

Applicant: International Business Machines Corporation

Inventor: Ashish R Mittal, Samarth Bharadwaj, Shreya Khare, Karthik Sankaranarayanan

IPC: G10L15/06, G06F40/143, G06F40/174, G06N20/00, G10L15/187, G10L15/22, G10L15/30, G10L19/00, H04L67/10

CPC classification number: G10L15/187, G06F40/143, G06F40/174, G06N20/00, G10L15/063, G10L15/22, G10L15/30, G10L19/00, H04L67/10, G10L2015/0633, G10L2015/0635, G10L2015/223

Abstract: Methods, systems, and computer program products for automated domain-specific constrained decoding from speech inputs to structured resources are provided herein. A computer-implemented method includes converting at least a portion of at least one user-provided speech utterance into text by processing the at least one user-provided speech utterance using an artificial intelligence-based automatic speech recognition model; automatically training an artificial intelligence-based decoding engine, wherein automatically training the artificial intelligence-based decoding engine comprises constraining the artificial intelligence-based decoding engine based at least in part on a domain-specific model and the artificial intelligence-based automatic speech recognition model; and generating at least one of one or more domain-specific text outputs related to one or more structured resources associated with the domain and one or more domain-specific action outputs related to the one or more structured resources associated with the domain by processing at least a portion of the text using the artificial intelligence-based decoding engine.

3.

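Invention grant
Information processing apparatus and information processing method [In force]

Publication (announcement) number: US12057118B2

Publication (announcement) date: 2024-08-06

Application number: US17441009

Application date: 2020-03-09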

Applicant: SONY GROUP CORPORATION

Inventor: Tatsuma Sakurai, Ichitaro Kohara

IPC: G10L15/20, G05D1/00, G10L15/06, G10L15/22, G10L15/30, G10L25/84

CPC classification number: G10L15/22, G05D1/0016, G10L15/063, G10L15/30, G10L25/84, G10L2015/0635, G10L2015/223

Abstract: Provided is an information processing apparatus including a control section that controls operations of operation bodies in accordance with a result of a voice recognition process. In accordance with a result of a voice recognition process that is based on a voice collected by one of the operation bodies or a voice recognition environment recognized from sensor information collected by one of the operation bodies, the control section controls an operation of another one of the operation bodies.

4.

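Invention grant
System and method for registering device for voice assistant service [In force]

Publication (announcement) number: US11979437B2

Publication (announcement) date: 2024-05-07

Application number: US18313076

Application date: 2023-05-05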

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Hojung Lee, Hyeonmok Ko, Hyungrai Oh, Inchul Hwang

IPC: H04L65/1073, G10L15/06, G10L15/22, G10L15/30

CPC classification number: H04L65/1073, G10L15/063, G10L15/22, G10L15/30, G10L2015/0635, G10L2015/223

Abstract: A system and method for registering a new device for a voice assistant service. The method, performed by a server, of registering a new device for a voice assistant service includes: comparing functions of a pre-registered device with functions of the new device; identifying functions corresponding to the functions of the pre-registered device among the functions of the new device, based on the comparison; obtaining pre-registered utterance data related to at least some of the identified functions; and generating action data for the new device based on the identified functions and the pre-registered utterance data.
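
Illustrative note: the registration flow in this abstract can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration rather than a detail from the patent: the function name `build_action_data`, the dictionary shapes, and the choice to match functions by exact name are all assumptions.

```python
def build_action_data(pre_registered_functions, new_device_functions, utterances_by_function):
    """Reuse utterances registered for shared functions on a new device.

    Hypothetical sketch: names and data shapes are assumptions, and functions
    are matched by exact name only.
    """
    # Compare the two function lists and keep the ones both devices support.
    shared = sorted(set(pre_registered_functions) & set(new_device_functions))

    action_data = {}
    for function in shared:
        # Obtain pre-registered utterance data for this function, if any exists.
        utterances = utterances_by_function.get(function, [])
        if utterances:
            action_data[function] = list(utterances)
    return action_data


example = build_action_data(
    pre_registered_functions=["power_on", "set_temperature"],
    new_device_functions=["power_on", "set_timer"],
    utterances_by_function={
        "power_on": ["turn it on"],
        "set_temperature": ["make it warmer"],
    },
)
```

In this sketch only `power_on` is common to both devices, so it is the only function that receives action data.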

5.

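Invention publication
SEAMLESS CUSTOMIZATION OF MACHINE LEARNING MODELS [Under examination, published]

Publication (announcement) number: US20240105206A1

Publication (announcement) date: 2024-03-28

Application number: US17934833

Application date: 2022-09-23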

Applicant: QUALCOMM Incorporated

Inventor: Hesu HUANG, Leonid SHEYNBLAT, Vinesh SUKUMAR, Ziad ASGHAR, Joel LINSKY, Justin MCGLOIN, Tong TANG

IPC: G10L25/60, G10L15/06, G10L15/08

CPC classification number: G10L25/60, G10L15/063, G10L15/08, G10L2015/0635, G10L2015/088

Abstract: Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. Voice data from a first user is received. In response to determining that the voice data includes an utterance of a defined keyword, a user verification score is generated by processing the voice data using a first user verification machine learning (ML) model, and a quality of the voice data is determined. In response to determining that the user verification score and determined quality satisfy one or more defined criteria, a second user verification ML model is updated based on the voice data.
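
Illustrative note: the gated update described in this abstract might look roughly like the Python sketch below. The function name `maybe_update_model`, the callables passed in, and the two thresholds are hypothetical; the abstract speaks only of "one or more defined criteria" and does not specify them.

```python
def maybe_update_model(voice_data, contains_keyword, verify, estimate_quality, update,
                       min_score=0.8, min_quality=0.5):
    """Update a second verification model only when the gating checks pass.

    Hypothetical sketch: the thresholds and callables are assumptions.
    Returns True if `update` was called.
    """
    # Nothing happens unless the defined keyword was uttered.
    if not contains_keyword(voice_data):
        return False

    # Score the speaker with the first model and assess the audio quality.
    score = verify(voice_data)
    quality = estimate_quality(voice_data)

    # Update the second model only if both values satisfy the criteria.
    if score >= min_score and quality >= min_quality:
        update(voice_data)
        return True
    return False


enrolled = []
updated = maybe_update_model(
    "hey device",
    contains_keyword=lambda v: "hey" in v,
    verify=lambda v: 0.9,
    estimate_quality=lambda v: 0.7,
    update=enrolled.append,
)
```

Passing the models as plain callables keeps the sketch independent of any particular ML framework.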

6.

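Invention grant
Consistency prediction on streaming sequence models [In force]

Publication (announcement) number: US11929060B2

Publication (announcement) date: 2024-03-12

Application number: US17170836

Application date: 2021-02-08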

Applicant: Google LLC

Inventor: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar

IPC: G10L15/06, G06N3/04, G06N3/044, G06N3/045, G06N3/08, G06N3/088, G10L13/02, G10L15/16, G10L15/197

CPC classification number: G10L15/063, G06N3/044, G06N3/045, G06N3/088, G10L13/02, G10L15/16, G10L15/197, G10L2015/0635

Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model. The method also includes updating parameters of the speech recognition model based on the consistent loss term determined at each of the plurality of output steps for each training utterance pair.
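
Illustrative note: one common way to compare two probability distributions at each output step is the Kullback-Leibler divergence. The Python sketch below uses it as a stand-in; the abstract says only that the loss term is based on the two distributions, so the choice of KL divergence, the averaging over steps, and the function names are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same hypotheses."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(real_steps, synthetic_steps):
    """Average per-step divergence between real-speech and synthetic-speech outputs.

    Hypothetical sketch: each argument is a list of per-step distributions
    produced by the same recognizer for the same utterance.
    """
    if len(real_steps) != len(synthetic_steps) or not real_steps:
        raise ValueError("both inputs need the same non-zero number of output steps")
    terms = [kl_divergence(p, q) for p, q in zip(real_steps, synthetic_steps)]
    return sum(terms) / len(terms)


real = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
synthetic = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
loss = consistency_loss(real, synthetic)
```

The loss is zero when the two sets of distributions agree at every step and grows as they diverge, which is the property a consistency term needs.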

7.

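Invention publication
System and Method for Controlling an Entity [Under examination, published]

Publication (announcement) number: US20240069501A1

Publication (announcement) date: 2024-02-29

Application number: US17823387

Application date: 2022-08-30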

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventor: Anoop Cherian Cherian, Sudipta Paul

IPC: G05B13/02, G10L15/06, G10L15/16, G10L15/22

CPC classification number: G05B13/027, G10L15/063, G10L15/16, G10L15/22, G10L2015/0635, G10L2015/223

Abstract: A controller for controlling an entity is provided. The controller comprises a memory to store a hierarchical multimodal reinforcement learning (RL) neural network, and a processor. The hierarchical multimodal RL neural network includes a first level controller and two second level controllers. Each of the second level controllers comprises a first sub level controller relating to a first modality and a second sub level controller relating to a second modality. The first modality is different from the second modality. The processor is configured to select one of the two second level controllers to perform a first sub-task relating to a task, using the first level controller, based on input data and a state of the hierarchical multimodal RL neural network. The selected second level controller is configured to determine a set of control actions to perform the first sub-task, and control the entity based on the set of control actions.

8.

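Invention grant
Example-based voice bot development techniques [In force]

Publication (announcement) number: US11804211B2

Publication (announcement) date: 2023-10-31

Application number: US17112418

Application date: 2020-12-04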

Applicant: Google LLC

Inventor: Asaf Aharoni, Yaniv Leviathan, Eyal Segalis, Gal Elidan, Sasha Goldshtein, Tomer Amiaz, Deborah Cohen

IPC: G06N20/00, G10L15/22, H04M3/493, G10L15/06, H04L67/133, G10L15/02, G10L15/04

CPC classification number: G10L15/063, G06N20/00, G10L15/02, G10L15/04, G10L15/22, H04L67/133, H04M3/493, G10L2015/0635

Abstract: Implementations are directed to providing a voice bot development platform that enables a third-party developer to train a voice bot based on training instance(s). The training instance(s) can each include training input and training output. The training input can include a portion of a corresponding conversation and a prior context of the corresponding conversation. The training output can include a corresponding ground truth response to the portion of the corresponding conversation. Subsequent to training, the voice bot can be deployed for conducting conversations on behalf of a third-party. In some implementations, the voice bot is further trained based on a corresponding feature emphasis input that directs the voice bot's attention to a particular feature of the portion of the corresponding conversation. In some additional or alternative implementations, the voice bot is further trained to interact with third-party system(s) via remote procedure calls (RPCs).

9.

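Invention publication
ACOUSTIC MODEL TRAINING USING CORRECTED TERMS [Under examination, published]

Publication (announcement) number: US20230274729A1

Publication (announcement) date: 2023-08-31

Application number: US18312587

Application date: 2023-05-04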

Applicant: Google LLC

Inventor: Olga Kapralova, Evgeny A. Cherepanov, Dmitry Osmakov, Martin Baeuml, Gleb Skobeltsyn

IPC: G10L15/06, G10L15/22, G10L15/32, G10L15/01, G10L15/10

CPC classification number: G10L15/063, G10L15/06, G10L15/22, G10L15/32, G10L15/01, G10L15/10, G10L2015/0635, G10L2015/0638

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.

10.

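Invention publication
SYSTEM AND METHODS FOR KEY-PHRASE EXTRACTION [Under examination, published]

Publication (announcement) number: US20230259708A1

Publication (announcement) date: 2023-08-17

Application number: US17650876

Application date: 2022-02-14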

Applicant: ADOBE INC.

Inventor: Amir Pouran Ben Veyseh, Franck Dernoncourt, Walter W. Chang, Trung Huu Bui, Hanieh Deilamsalehy, Seunghyun Yoon, Rajiv Bhawanji Jain, Quan Hung Tran, Varun Manjunatha

IPC: G06F40/289, G06F40/30, G10L15/22, G10L15/06, G10L15/16

CPC classification number: G06F40/289, G06F40/30, G10L15/22, G10L15/063, G10L15/16, G10L2015/0635

Abstract: Systems and methods for key-phrase extraction are described. The systems and methods include receiving a transcript including a text paragraph and generating key-phrase data for the text paragraph using a key-phrase extraction network. The key-phrase extraction network is trained to identify domain-relevant key-phrase data based on domain data obtained using a domain discriminator network. The systems and methods further include generating meta-data for the transcript based on the key-phrase data.
