-
Publication No.: US20240321286A1
Publication Date: 2024-09-26
Application No.: US18189764
Application Date: 2023-03-24
Applicant: Super Hi-Fi, LLC
Inventor: Brendon Patrick Cassidy, Zack J. Zalon
IPC: G10L21/007, G10L17/02, G10L21/028, G10L25/18, G10L25/30
CPC classification number: G10L21/007, G10L17/02, G10L21/028, G10L25/18, G10L25/30
Abstract: The present application relates to systems and methods for audio preparation and delivery. Such systems and methods may involve a controller configured to carry out operations. The operations include receiving source audio comprising a vocal portion. The operations also include selecting, using a trained machine learning model, a primary voice profile based on an analysis of the vocal portion of the received source audio. The primary voice profile is selected from a plurality of predetermined voice profiles. The operations also include adjusting, based on the selected primary voice profile, at least a portion of the source audio. The operations also include providing output audio based on the adjusted portion of source audio.
-
Publication No.: US20240289491A1
Publication Date: 2024-08-29
Application No.: US18629401
Application Date: 2024-04-08
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jisi ZHANG, Md Asif JALAL, Karthikeyan SARAVANAN, Pablo PESO PARADA, Mete OZAY
IPC: G06F21/62, G10L15/02, G10L15/06, G10L15/16, G10L15/22, G10L15/30, G10L21/007, G10L21/0208
CPC classification number: G06F21/6254, G10L15/02, G10L15/063, G10L15/16, G10L15/22, G10L15/30, G10L21/007, G10L21/0208
Abstract: Broadly speaking, the present disclosure relates to a computer-implemented method for training a machine learning, ML, automatic speech recognition, ASR, model. The method comprises injecting a speaker anonymiser, which is configured to cause the ML ASR model to generate anonymised acoustic embeddings for the ML ASR model, at one or more layers of the ML ASR model, and suitably training the ML ASR model including the speaker anonymiser on audio data comprising an utterance with one or more words to be recognised. Correspondingly, there is also described a computer implemented method for performing automatic speech recognition using the trained ML ASR model and system for training/inference thereof.
-
Publication No.: US12013968B2
Publication Date: 2024-06-18
Application No.: US17076896
Application Date: 2020-10-22
Applicant: Robert Bosch GmbH
Inventor: Sascha Lange
IPC: G06F21/62, G06F16/48, G06T5/70, G06T7/70, G06T11/00, G10L21/007, G10L21/0232, G10L25/57
CPC classification number: G06F21/6254, G06F16/48, G06T5/70, G06T7/70, G06T11/00, G10L21/007, G10L21/0232, G10L25/57, G06T2207/10016, G06T2207/30201
Abstract: A method and system are disclosed for anonymizing data for labeling and development purposes. A data storage backend has a database of non-anonymous data that is received from a data source. An anonymization engine of the data storage backend generates anonymized data by removing personally identifiable information from the non-anonymous data. These anonymized data are made available to human labelers who manually provide labels based on the anonymized data using a data labeling tool. These labels are then stored in association with the corresponding non-anonymous data, which can then be used for training one or more machine learning models. In this way, non-anonymous data having personally identifiable information can be manually labelled for development purposes without exposing the personally identifiable information to any human labelers.
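The data flow in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Record`, `Backend`, `anonymized_view`, `store_label`, and `training_pairs` are hypothetical, and the anonymization step is assumed to be a simple redaction of a known speaker name in a transcript, standing in for whatever anonymization engine an actual system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: int
    transcript: str    # non-anonymous data; may contain PII
    speaker_name: str  # the PII assumed for this sketch


@dataclass
class Backend:
    records: dict = field(default_factory=dict)  # record_id -> Record
    labels: dict = field(default_factory=dict)   # record_id -> label

    def anonymized_view(self, record_id):
        """Return a copy of the record with the speaker name redacted."""
        rec = self.records[record_id]
        text = rec.transcript.replace(rec.speaker_name, "[REDACTED]")
        return {"record_id": record_id, "transcript": text}

    def store_label(self, record_id, label):
        """Store a labeler's label against the original record's id."""
        self.labels[record_id] = label

    def training_pairs(self):
        """Pair each label with the corresponding non-anonymous record."""
        return [(self.records[i], lab) for i, lab in self.labels.items()]


backend = Backend()
backend.records[1] = Record(1, "Hi, this is Alice calling about my order", "Alice")

# The human labeler only ever sees the anonymized view.
view = backend.anonymized_view(1)
backend.store_label(view["record_id"], "order_inquiry")

# Training code receives the original data together with the label.
pairs = backend.training_pairs()
```

The design point the sketch tries to show is that the label is keyed by record id, so it can be joined back to the non-anonymous data without the labeler ever seeing that data.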
-
Publication No.: US20240144945A1
Publication Date: 2024-05-02
Application No.: US18408991
Application Date: 2024-01-10
Applicant: SONY GROUP CORPORATION
Inventor: Naoya TAKAHASHI
IPC: G10L21/007, G10L21/013, G10L21/028
CPC classification number: G10L21/007, G10L21/013, G10L21/028
Abstract: Provided is a signal processing apparatus that includes a voice quality conversion unit that converts acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on the basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data.
-
Publication No.: US20230395087A1
Publication Date: 2023-12-07
Application No.: US18249126
Application Date: 2021-10-15
Applicant: Google LLC
Inventor: Marco Tagliasacchi, Beat Gfeller, Yunpeng Li, Zalán Borsos
IPC: G10L21/007, G10L15/06, G10L15/08, G10L25/18, G10L21/0208, G10L25/21
CPC classification number: G10L21/007, G10L15/063, G10L15/08, G10L25/18, G10L21/0208, G10L25/21, G10L2015/088
Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.
-
Publication No.: US20230386489A1
Publication Date: 2023-11-30
Application No.: US18032529
Application Date: 2020-10-23
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Takuhiro KANEKO, Hirokazu KAMEOKA, Ko TANAKA, Nobukatsu HOJO
IPC: G10L21/007, G10L25/30, G06N20/00
CPC classification number: G10L21/007, G10L25/30, G06N20/00
Abstract: A voice signal conversion model learning device comprising: a learning data acquisition unit that acquires learning input data which is an input voice signal; and a learning stage conversion unit that executes a conversion learning model which is a model of machine learning including learning stage conversion processing of converting the learning input data into learning stage conversion destination data which is a voice signal of a conversion destination, wherein the learning stage conversion processing includes local feature quantity acquisition processing of acquiring a feature quantity for each learning input-side subset which is a subset of processing target input data having the processing target input data as a population, based on the processing target input data which is data to be processed, the conversion learning model further includes adjustment parameter value acquisition processing of acquiring an adjustment parameter value, which is a value of a parameter for adjusting a statistical value of a distribution of the feature quantity, based on the learning input data, and the learning stage conversion processing converts the learning input data into the learning stage conversion destination data using a result of a predetermined calculation based on the adjustment parameter value.
-
Publication No.: US11823697B2
Publication Date: 2023-11-21
Application No.: US17445537
Application Date: 2021-08-20
Applicant: Google LLC
Inventor: Andrew Rosenberg, Bhuvana Ramabhadran
IPC: G10L15/26, G10L21/007, G06N3/08, G10L25/30
CPC classification number: G10L21/007, G06N3/08, G10L15/26, G10L25/30
Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
-
Publication No.: US20230031101A1
Publication Date: 2023-02-02
Application No.: US17587243
Application Date: 2022-01-28
Applicant: Beijing Xiaomi Mobile Software Co., Ltd.
Inventor: Liujun ZHANG, Yuqing HUA, Zhen YANG, Zuojing LI
IPC: G10L21/007, G10L21/057
Abstract: An audio processing method applied to a first terminal is described, and includes: in response to receiving of audio data input by a user at the first terminal, and determination that a voice change function is turned on, determining change parameters; and based on the change parameters, performing change processing on the audio data.
-
Publication No.: US11380344B2
Publication Date: 2022-07-05
Application No.: US16724840
Application Date: 2019-12-23
Applicant: MOTOROLA SOLUTIONS, INC.
Inventor: Mark A. Boerger, Sean Regan, Jesus F. Corretjer
IPC: G10L21/007, G10L19/00, H04M1/72442
Abstract: A device and method controlling a speaker according to priority data is provided. An audio processor, in communication with a speaker-controlling processor at a device, processes remote audio data, the remote audio data remote to the speaker-controlling processor. The audio processor assigns priority data to the remote audio data. The audio processor provides the remote audio data and the priority data to the speaker-controlling processor. The speaker-controlling processor processes local audio data, the local audio data local to the speaker-controlling processor. The speaker-controlling processor controls a speaker, with respect to the local audio data and the remote audio data, according to the priority data.
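As a rough illustration of the division of work described in the abstract above, the following Python sketch separates the step that assigns priority data to remote audio from the step that controls the speaker. Everything concrete here is an assumption made for illustration: the abstract does not say how priorities are encoded or compared, so the integer scale, the category names in `PRIORITY_TABLE`, the fixed `LOCAL_PRIORITY`, and the rule that ties favor local audio are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical priority scale: higher numbers win.
PRIORITY_TABLE = {"emergency": 2, "talkgroup": 1}
LOCAL_PRIORITY = 1  # assumed fixed priority of locally generated audio


@dataclass
class AudioChunk:
    source: str    # "local" or "remote"
    priority: int  # priority data attached to this chunk


def assign_priority(remote_kind: str) -> int:
    """Audio-processor side: attach priority data to remote audio."""
    return PRIORITY_TABLE.get(remote_kind, 0)


def choose_for_speaker(local: AudioChunk, remote: AudioChunk) -> AudioChunk:
    """Speaker-controller side: play remote audio only if it outranks local."""
    return remote if remote.priority > local.priority else local


local = AudioChunk("local", LOCAL_PRIORITY)
emergency = AudioChunk("remote", assign_priority("emergency"))
routine = AudioChunk("remote", assign_priority("status_update"))

played_first = choose_for_speaker(local, emergency)   # remote emergency audio
played_second = choose_for_speaker(local, routine)    # local audio
```

A real device might mix or duck the two streams rather than pick one; the sketch selects a single stream only to keep the priority comparison visible.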
-
Publication No.: US11238854B2
Publication Date: 2022-02-01
Application No.: US15378920
Application Date: 2016-12-14
Applicant: Google LLC
Inventor: Vikram Aggarwal, Barnaby James
IPC: G10L15/02, G10L15/08, G10L21/007, G10L25/51, G10L15/22
Abstract: Methods, apparatus, and computer readable media are described related to recording, organizing, and making audio files available for consumption by voice-activated products. In various implementations, in response to receiving an input from a first user indicating that the first user intends to record audio content, audio content may be captured and stored. Input may be received from the first user indicating at least one identifier for the audio content. The stored audio content may be associated with the at least one identifier. A voice input may be received from a subsequent user. In response to determining that the voice input has particular characteristics, speech recognition may be biased in respect of the voice input towards recognition of the at least one identifier. In response to recognizing, based on the biased speech recognition, presence of the at least one identifier in the voice input, the stored audio content may be played.