Patent search ipc:"G10L21/007" Page 1

1.

Published application
Systems and Methods for Audio Preparation and Delivery [Pending, published]

Publication number: US20240321286A1

Publication date: 2024-09-26

Application number: US18189764

Application date: 2023-03-24

Applicant: Super Hi-Fi, LLC

Inventor: Brendon Patrick Cassidy, Zack J. Zalon

IPC: G10L21/007, G10L17/02, G10L21/028, G10L25/18, G10L25/30

CPC classification number: G10L21/007, G10L17/02, G10L21/028, G10L25/18, G10L25/30

Abstract: The present application relates to systems and methods for audio preparation and delivery. Such systems and methods may involve a controller configured to carry out operations. The operations include receiving source audio comprising a vocal portion. The operations also include selecting, using a trained machine learning model, a primary voice profile based on an analysis of the vocal portion of the received source audio. The primary voice profile is selected from a plurality of predetermined voice profiles. The operations also include adjusting, based on the selected primary voice profile, at least a portion of the source audio. The operations also include providing output audio based on the adjusted portion of source audio.

2.

Published application
METHOD AND APPARATUS FOR AUTOMATIC SPEECH RECOGNITION [Pending, published]

Publication number: US20240289491A1

Publication date: 2024-08-29

Application number: US18629401

Application date: 2024-04-08

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventor: Jisi ZHANG, Md Asif JALAL, Karthikeyan SARAVANAN, Pablo PESO PARADA, Mete OZAY

IPC: G06F21/62, G10L15/02, G10L15/06, G10L15/16, G10L15/22, G10L15/30, G10L21/007, G10L21/0208

CPC classification number: G06F21/6254, G10L15/02, G10L15/063, G10L15/16, G10L15/22, G10L15/30, G10L21/007, G10L21/0208

Abstract: Broadly speaking, the present disclosure relates to a computer-implemented method for training a machine learning, ML, automatic speech recognition, ASR, model. The method comprises injecting a speaker anonymiser, which is configured to cause the ML ASR model to generate anonymised acoustic embeddings for the ML ASR model, at one or more layers of the ML ASR model, and suitably training the ML ASR model including the speaker anonymiser on audio data comprising an utterance with one or more words to be recognised. Correspondingly, there is also described a computer implemented method for performing automatic speech recognition using the trained ML ASR model and system for training/inference thereof.

3.

Granted patent
Data anonymization for data labeling and development purposes [In force]

Publication number: US12013968B2

Publication date: 2024-06-18

Application number: US17076896

Application date: 2020-10-22

Applicant: Robert Bosch GmbH

Inventor: Sascha Lange

IPC: G06F21/62, G06F16/48, G06T5/70, G06T7/70, G06T11/00, G10L21/007, G10L21/0232, G10L25/57

CPC classification number: G06F21/6254, G06F16/48, G06T5/70, G06T7/70, G06T11/00, G10L21/007, G10L21/0232, G10L25/57, G06T2207/10016, G06T2207/30201

Abstract: A method and system are disclosed for anonymizing data for labeling and development purposes. A data storage backend has a database of non-anonymous data that is received from a data source. An anonymization engine of the data storage backend generates anonymized data by removing personally identifiable information from the non-anonymous data. These anonymized data are made available to human labelers who manually provide labels based on the anonymized data using a data labeling tool. These labels are then stored in association with the corresponding non-anonymous data, which can then be used for training one or more machine learning models. In this way, non-anonymous data having personally identifiable information can be manually labelled for development purposes without exposing the personally identifiable information to any human labelers.

4.

Published application
SIGNAL PROCESSING APPARATUS AND METHOD, TRAINING APPARATUS AND METHOD, AND PROGRAM [Pending, published]

Publication number: US20240144945A1

Publication date: 2024-05-02

Application number: US18408991

Application date: 2024-01-10

Applicant: SONY GROUP CORPORATION

Inventor: Naoya TAKAHASHI

IPC: G10L21/007, G10L21/013, G10L21/028

CPC classification number: G10L21/007, G10L21/013, G10L21/028

Abstract: Provided is a signal processing apparatus that includes a voice quality conversion unit that converts acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on the basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data.

5.

Published application
Machine Learning for Microphone Style Transfer [Pending, published]

Publication number: US20230395087A1

Publication date: 2023-12-07

Application number: US18249126

Application date: 2021-10-15

Applicant: Google LLC

Inventor: Marco Tagliasacchi, Beat Gfeller, Yunpeng Li, Zalán Borsos

IPC: G10L21/007, G10L15/06, G10L15/08, G10L25/18, G10L21/0208, G10L25/21

CPC classification number: G10L21/007, G10L15/063, G10L15/08, G10L25/18, G10L21/0208, G10L25/21, G10L2015/088

Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.

6.

Published application
AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM [Pending, published]

Publication number: US20230386489A1

Publication date: 2023-11-30

Application number: US18032529

Application date: 2020-10-23

Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventor: Takuhiro KANEKO, Hirokazu KAMEOKA, Ko TANAKA, Nobukatsu HOJO

IPC: G10L21/007, G10L25/30, G06N20/00

CPC classification number: G10L21/007, G10L25/30, G06N20/00

Abstract: A voice signal conversion model learning device comprising: a learning data acquisition unit that acquires learning input data which is an input voice signal; and a learning stage conversion unit that executes a conversion learning model which is a model of machine learning including learning stage conversion processing of converting the learning input data into learning stage conversion destination data which is a voice signal of a conversion destination, wherein the learning stage conversion processing includes local feature quantity acquisition processing of acquiring a feature quantity for each learning input-side subset which is a subset of processing target input data having the processing target input data as a population, based on the processing target input data which is data to be processed, the conversion learning model further includes adjustment parameter value acquisition processing of acquiring an adjustment parameter value, which is a value of a parameter for adjusting a statistical value of a distribution of the feature quantity, based on the learning input data, and the learning stage conversion processing converts the learning input data into the learning stage conversion destination data using a result of a predetermined calculation based on the adjustment parameter value.

7.

Granted patent
Improving speech recognition with speech synthesis-based model adapation [In force]

Publication number: US11823697B2

Publication date: 2023-11-21

Application number: US17445537

Application date: 2021-08-20

Applicant: Google LLC

Inventor: Andrew Rosenberg, Bhuvana Ramabhadran

IPC: G10L15/26, G10L21/007, G06N3/08, G10L25/30

CPC classification number: G10L21/007, G06N3/08, G10L15/26, G10L25/30

Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.

8.

Patent application
AUDIO PROCESSING METHOD, AUDIO PROCESSING APPARATUS AND COMPUTER STORAGE MEDIUM [In force]

Publication number: US20230031101A1

Publication date: 2023-02-02

Application number: US17587243

Application date: 2022-01-28

Applicant: Beijing Xiaomi Mobile Software Co., Ltd.

Inventor: Liujun ZHANG, Yuqing HUA, Zhen YANG, Zuojing LI

IPC: G10L21/007, G10L21/057

Abstract: An audio processing method applied to a first terminal is described, and includes: in response to receiving of audio data input by a user at the first terminal, and determination that a voice change function is turned on, determining change parameters; and based on the change parameters, performing change processing on the audio data.

9.

Granted patent
Device and method for controlling a speaker according to priority data [In force]

Publication number: US11380344B2

Publication date: 2022-07-05

Application number: US16724840

Application date: 2019-12-23

Applicant: MOTOROLA SOLUTIONS, INC.

Inventor: Mark A. Boerger, Sean Regan, Jesus F. Corretjer

IPC: G10L21/007, G10L19/00, H04M1/72442

Abstract: A device and method controlling a speaker according to priority data is provided. An audio processor, in communication with a speaker-controlling processor at a device, processes remote audio data, the remote audio data remote to the speaker-controlling processor. The audio processor assigns priority data to the remote audio data. The audio processor provides the remote audio data and the priority data to the speaker-controlling processor. The speaker-controlling processor processes local audio data, the local audio data local to the speaker-controlling processor. The speaker-controlling processor controls a speaker, with respect to the local audio data and the remote audio data, according to the priority data.

10.

Granted patent
Facilitating creation and playback of user-recorded audio [In force]

Publication number: US11238854B2

Publication date: 2022-02-01

Application number: US15378920

Application date: 2016-12-14

Applicant: Google LLC

Inventor: Vikram Aggarwal, Barnaby James

IPC: G10L15/02, G10L15/08, G10L21/007, G10L25/51, G10L15/22

Abstract: Methods, apparatus, and computer readable media are described related to recording, organizing, and making audio files available for consumption by voice-activated products. In various implementations, in response to receiving an input from a first user indicating that the first user intends to record audio content, audio content may be captured and stored. Input may be received from the first user indicating at least one identifier for the audio content. The stored audio content may be associated with the at least one identifier. A voice input may be received from a subsequent user. In response to determining that the voice input has particular characteristics, speech recognition may be biased in respect of the voice input towards recognition of the at least one identifier. In response to recognizing, based on the biased speech recognition, presence of the at least one identifier in the voice input, the stored audio content may be played.
