Systems and Methods for Audio Preparation and Delivery

    Publication number: US20240321286A1

    Publication date: 2024-09-26

    Application number: US18189764

    Filing date: 2023-03-24

    CPC classification number: G10L21/007 G10L17/02 G10L21/028 G10L25/18 G10L25/30

    Abstract: The present application relates to systems and methods for audio preparation and delivery. Such systems and methods may involve a controller configured to carry out operations. The operations include receiving source audio comprising a vocal portion. The operations also include selecting, using a trained machine learning model, a primary voice profile based on an analysis of the vocal portion of the received source audio. The primary voice profile is selected from a plurality of predetermined voice profiles. The operations also include adjusting, based on the selected primary voice profile, at least a portion of the source audio. The operations also include providing output audio based on the adjusted portion of source audio.
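The operations in this abstract (receive audio, select a profile from a predetermined set, adjust, output) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the `VoiceProfile` fields, the three example profiles, the pitch feature, and the gain-based adjustment are hypothetical, and a nearest-centroid rule stands in for the trained machine learning model that the application describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    name: str
    centroid_hz: float  # hypothetical typical pitch for this profile
    gain: float         # hypothetical adjustment applied to the audio


# Hypothetical set of predetermined voice profiles.
PROFILES = [
    VoiceProfile("low", 110.0, 1.5),
    VoiceProfile("mid", 180.0, 1.0),
    VoiceProfile("high", 260.0, 0.5),
]


def select_profile(pitch_hz, profiles=PROFILES):
    """Stand-in for the trained model: pick the profile with the nearest centroid."""
    return min(profiles, key=lambda p: abs(p.centroid_hz - pitch_hz))


def adjust(samples, profile):
    """Adjust the audio according to the selected profile (here, a simple gain)."""
    return [s * profile.gain for s in samples]


def prepare(samples, pitch_hz):
    """Select a primary profile, then return it with the adjusted output audio."""
    profile = select_profile(pitch_hz)
    return profile, adjust(samples, profile)
```

A call such as `prepare([0.5, -0.5], 120.0)` selects the nearest profile and returns the scaled samples. A real system would replace `select_profile` with a trained classifier operating on the vocal portion of the source audio.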

    AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM

    Publication number: US20230386489A1

    Publication date: 2023-11-30

    Application number: US18032529

    Filing date: 2020-10-23

    CPC classification number: G10L21/007 G10L25/30 G06N20/00

    Abstract: A voice signal conversion model learning device comprising: a learning data acquisition unit that acquires learning input data which is an input voice signal; and a learning stage conversion unit that executes a conversion learning model which is a model of machine learning including learning stage conversion processing of converting the learning input data into learning stage conversion destination data which is a voice signal of a conversion destination, wherein the learning stage conversion processing includes local feature quantity acquisition processing of acquiring a feature quantity for each learning input-side subset which is a subset of processing target input data having the processing target input data as a population, based on the processing target input data which is data to be processed, the conversion learning model further includes adjustment parameter value acquisition processing of acquiring an adjustment parameter value, which is a value of a parameter for adjusting a statistical value of a distribution of the feature quantity, based on the learning input data, and the learning stage conversion processing converts the learning input data into the learning stage conversion destination data using a result of a predetermined calculation based on the adjustment parameter value.

    Improving speech recognition with speech synthesis-based model adapation

    Publication number: US11823697B2

    Publication date: 2023-11-21

    Application number: US17445537

    Filing date: 2021-08-20

    Applicant: Google LLC

    CPC classification number: G10L21/007 G06N3/08 G10L15/26 G10L25/30

    Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
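The two-stage schedule in this abstract (pre-train on synthesized speech, then warm-start on transcribed real speech) can be shown with a toy sketch. This is not a speech recognizer: a one-parameter linear model and plain gradient descent stand in for the recognition model and its training procedure, and the data points, learning rate, and function name `sgd_fit` are assumptions chosen only to make the warm-start idea concrete.

```python
def sgd_fit(data, weight=0.0, lr=0.1, epochs=50):
    """Fit y = weight * x by gradient descent on squared error, starting from `weight`."""
    for _ in range(epochs):
        for x, y in data:
            # Gradient of (weight * x - y) ** 2 with respect to weight.
            weight -= lr * 2 * (weight * x - y) * x
    return weight


# Toy stand-ins for the two data sources named in the abstract.
synthetic = [(1.0, 2.0), (2.0, 4.0)]  # "synthesized speech" samples
real = [(1.0, 2.2), (2.0, 4.4)]       # "transcribed non-synthetic speech" samples

# Stage 1: pre-train on synthetic data to reach an initial state.
pretrained = sgd_fit(synthetic)

# Stage 2: warm start, continuing from the pre-trained weight on real data.
adapted = sgd_fit(real, weight=pretrained)
```

The point of the sketch is only the hand-off: the second call begins from `pretrained` rather than from a fresh initialization.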

    Device and method for controlling a speaker according to priority data

    Publication number: US11380344B2

    Publication date: 2022-07-05

    Application number: US16724840

    Filing date: 2019-12-23

    Abstract: A device and method controlling a speaker according to priority data is provided. An audio processor, in communication with a speaker-controlling processor at a device, processes remote audio data, the remote audio data remote to the speaker-controlling processor. The audio processor assigns priority data to the remote audio data. The audio processor provides the remote audio data and the priority data to the speaker-controlling processor. The speaker-controlling processor processes local audio data, the local audio data local to the speaker-controlling processor. The speaker-controlling processor controls a speaker, with respect to the local audio data and the remote audio data, according to the priority data.
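The division of labor in this abstract (one processor tags remote audio with priority data, another controls the speaker using that data) can be sketched as follows. The abstract does not say how the priority is used, so the two-level `Priority` enum and the preemption policy in `control_speaker` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    HIGH = 1


@dataclass
class RemoteAudio:
    samples: list
    priority: Priority


def assign_priority(samples, urgent):
    """Audio-processor side: attach priority data to the remote audio."""
    return RemoteAudio(samples, Priority.HIGH if urgent else Priority.LOW)


def control_speaker(local_samples, remote):
    """Speaker-controller side (assumed policy): high-priority remote audio
    preempts local audio; otherwise local audio is played."""
    if remote.priority is Priority.HIGH:
        return remote.samples
    return local_samples
```

Other policies, such as attenuating one stream and mixing, would fit the same interface; only the body of `control_speaker` would change.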

    Facilitating creation and playback of user-recorded audio

    Publication number: US11238854B2

    Publication date: 2022-02-01

    Application number: US15378920

    Filing date: 2016-12-14

    Applicant: Google LLC

    Abstract: Methods, apparatus, and computer readable media are described related to recording, organizing, and making audio files available for consumption by voice-activated products. In various implementations, in response to receiving an input from a first user indicating that the first user intends to record audio content, audio content may be captured and stored. Input may be received from the first user indicating at least one identifier for the audio content. The stored audio content may be associated with the at least one identifier. A voice input may be received from a subsequent user. In response to determining that the voice input has particular characteristics, speech recognition may be biased in respect of the voice input towards recognition of the at least one identifier. In response to recognizing, based on the biased speech recognition, presence of the at least one identifier in the voice input, the stored audio content may be played.
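The biasing step in this abstract can be illustrated with a small rescoring sketch. It assumes the recognizer produces a list of candidate transcripts with scores; the function name `pick_transcript`, the additive `boost`, and the example phrases are all hypothetical, and real systems may bias recognition in other ways.

```python
def pick_transcript(hypotheses, identifiers, bias_active, boost=0.2):
    """Return the best transcript from (text, score) pairs.

    When bias_active is True (the voice input has the "particular
    characteristics"), candidates containing a stored identifier get a
    score bonus.
    """
    def score(item):
        text, base = item
        matched = any(identifier in text for identifier in identifiers)
        return base + (boost if bias_active and matched else 0.0)

    return max(hypotheses, key=score)[0]


# Hypothetical recognizer output and stored identifier.
hypotheses = [
    ("play the story of the bare", 0.55),
    ("play the story of the bear", 0.50),
]
identifiers = ["story of the bear"]

biased = pick_transcript(hypotheses, identifiers, bias_active=True)
unbiased = pick_transcript(hypotheses, identifiers, bias_active=False)
```

With biasing active, the candidate containing the stored identifier wins even though its base score is lower, which is the condition under which the stored audio content would then be played.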
