Machine learning to generate music from text

    Publication No.: US10380983B2

    Publication date: 2019-08-13

    Application No.: US15394895

    Filing date: 2016-12-30

    Applicant: Google Inc.

    Inventor: Dominik Roblek

    Abstract: The present disclosure provides systems and methods that leverage one or more machine-learned models to generate music from text. In particular, a computing system can include a music generation model that is operable to extract one or more structural features from an input text. The one or more structural features can be indicative of a structure associated with the input text. The music generation model can generate a musical composition from the input text based at least in part on the one or more structural features. For example, the music generation model can generate a musical composition that exhibits a musical structure that mimics or otherwise corresponds to the structure associated with the input text. For example, the music generation model can include a machine-learned audio generation model. In such fashion, the systems and methods of the present disclosure can generate music that exhibits a globally consistent theme and/or structure.

    Incentive-based check-in
    Type: Granted patent

    Publication No.: US10242378B1

    Publication date: 2019-03-26

    Application No.: US14929105

    Filing date: 2015-10-30

    Applicant: Google Inc.

    Abstract: Apparatus, systems and methods provide incentive-based usage of an audio recognition system. In an aspect, a system is provided that includes a query component configured to receive an audio sample from a device and a recognition component configured to determine an identification of the audio sample. The system further includes a reward component configured to identify a reward associated with the identification of the audio sample, wherein the query component is further configured to provide a query result to the device, the query result comprising the identification of the audio sample and the reward associated therewith.
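
    Illustrative sketch (not part of the patent text): the query, recognition, and reward components described in the abstract could be wired together roughly as follows. The fingerprint keys, the `catalog` and `rewards` dictionaries, and the `handle_query` name are all hypothetical, and real audio recognition is reduced here to a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryResult:
    """Result returned to the device: what was heard and any reward for it."""
    identification: Optional[str]
    reward: Optional[str]


def handle_query(sample_fingerprint: str,
                 catalog: Dict[str, str],
                 rewards: Dict[str, str]) -> QueryResult:
    # Recognition step (hypothetical): look the fingerprint up in a catalog.
    identification = catalog.get(sample_fingerprint)
    # Reward step: find a reward tied to the identified item, if any.
    reward = rewards.get(identification) if identification is not None else None
    # Query step: bundle identification and reward into one result.
    return QueryResult(identification, reward)


catalog = {"fp-123": "Track X"}
rewards = {"Track X": "10% off"}
result = handle_query("fp-123", catalog, rewards)
```

    An unrecognized fingerprint yields a result with both fields set to `None`, so the device can distinguish "nothing identified" from "identified, no reward".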

    FREQUENCY BASED AUDIO ANALYSIS USING NEURAL NETWORKS

    Publication No.: US20170330586A1

    Publication date: 2017-11-16

    Application No.: US15151362

    Filing date: 2016-05-10

    Applicant: Google Inc.

    IPC classification: G10L25/30 G06F11/07 G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for frequency based audio analysis using neural networks. One of the methods includes training a neural network that includes a plurality of neural network layers on training data, wherein the neural network is configured to receive frequency domain features of an audio sample and to process the frequency domain features to generate a neural network output for the audio sample, wherein the neural network comprises (i) a convolutional layer that is configured to map frequency domain features to logarithmic scaled frequency domain features, wherein the convolutional layer comprises one or more convolutional layer filters, and (ii) one or more other neural network layers having respective layer parameters that are configured to process the logarithmic scaled frequency domain features to generate the neural network output.
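
    Illustrative sketch (not part of the patent text): the abstract describes a trained convolutional layer that maps frequency domain features onto a logarithmic scale. The snippet below is a fixed, hand-built stand-in for that mapping, averaging linear frequency bins into logarithmically spaced bands. The function names and the band-averaging choice are assumptions, not the claimed method.

```python
def log_band_edges(num_bins, num_bands):
    """Return num_bands + 1 bin indices, spaced logarithmically over [0, num_bins]."""
    if not 1 <= num_bands <= num_bins:
        raise ValueError("need 1 <= num_bands <= num_bins")
    edges = [0]
    for k in range(1, num_bands + 1):
        edge = round(num_bins ** (k / num_bands))
        # Force every band to contain at least one bin.
        edges.append(max(edge, edges[-1] + 1))
    return edges


def to_log_bands(spectrum, num_bands):
    """Average linear-frequency magnitudes into log-spaced bands."""
    edges = log_band_edges(len(spectrum), num_bands)
    return [sum(spectrum[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


# 16 linear bins collapse into 4 bands of width 2, 2, 4, and 8 bins.
spectrum = [float(i) for i in range(16)]
bands = to_log_bands(spectrum, 4)
```

    Low-frequency bands stay narrow while high-frequency bands widen, which is the property a logarithmic frequency scale is meant to provide. In the patent's formulation the filter weights would be learned rather than fixed.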

    Dual model speaker identification

    Publication No.: US09711148B1

    Publication date: 2017-07-18

    Application No.: US13944975

    Filing date: 2013-07-18

    Applicant: Google Inc.

    IPC classification: G10L17/02

    CPC classification: G10L17/02 G10L17/10 G10L17/22

    Abstract: A processing system receives an audio signal encoding an utterance and determines that a first portion of the audio signal corresponds to a predefined phrase. The processing system accesses one or more text-dependent models associated with the predefined phrase and determines a first confidence based on the one or more text-dependent models associated with the predefined phrase, the first confidence corresponding to a first likelihood that a particular speaker spoke the utterance. The processing system determines a second confidence for a second portion of the audio signal using one or more text-independent models, the second confidence corresponding to a second likelihood that the particular speaker spoke the utterance. The processing system then determines that the particular speaker spoke the utterance based at least in part on the first confidence and the second confidence.
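
    Illustrative sketch (not part of the patent text): the final step of the abstract combines a text-dependent confidence with a text-independent one. One simple way to do that, assumed here purely for illustration, is a weighted average compared against a threshold. The weight, the threshold, and the function name are hypothetical; the abstract does not say how the two confidences are combined.

```python
def is_target_speaker(td_confidence, ti_confidence, td_weight=0.6, threshold=0.5):
    """Combine text-dependent and text-independent confidences (both in [0, 1]).

    Returns (accepted, combined_confidence). The weighted average and the
    default values are illustrative assumptions.
    """
    combined = td_weight * td_confidence + (1.0 - td_weight) * ti_confidence
    return combined >= threshold, combined


# Strong match on the predefined phrase, weaker match on the rest of the utterance.
accepted, combined = is_target_speaker(0.9, 0.4)
```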

    Text-dependent speaker identification
    Type: Granted patent
    Status: In force

    Publication No.: US09542948B2

    Publication date: 2017-01-10

    Application No.: US14612830

    Filing date: 2015-02-03

    Applicant: Google Inc.

    IPC classification: G10L15/00 G10L17/18 G10L17/00

    CPC classification: G10L17/18 G10L17/005

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker verification. The methods, systems, and apparatus include actions of inputting speech data that corresponds to a particular utterance to a first neural network and determining an evaluation vector based on output at a hidden layer of the first neural network. Additional actions include obtaining a reference vector that corresponds to a past utterance of a particular speaker. Further actions include inputting the evaluation vector and the reference vector to a second neural network that is trained on a set of labeled pairs of feature vectors to identify whether speakers associated with the labeled pairs of feature vectors are the same speaker. More actions include determining, based on an output of the second neural network, whether the particular utterance was likely spoken by the particular speaker.
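
    Illustrative sketch (not part of the patent text): the abstract compares an evaluation vector with a reference vector using a second, trained neural network. The snippet below swaps that network for plain cosine similarity with a fixed threshold, which is a much simpler stand-in and not the claimed method. The threshold value and function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_speaker(evaluation_vector, reference_vector, threshold=0.8):
    """Stand-in for the second network: accept when the vectors point the same way."""
    return cosine_similarity(evaluation_vector, reference_vector) >= threshold


# Vectors that differ only in scale count as the same speaker here.
match = same_speaker([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], threshold=0.99)
```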

    Audio Data Classification
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20160322066A1

    Publication date: 2016-11-03

    Application No.: US13932198

    Filing date: 2013-07-01

    Applicant: Google Inc.

    IPC classification: G10L25/81

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing an audio sample to determine whether the audio sample includes music audio data. One or more detectors, including a spectral fluctuation detector, a peak repetition detector, and a beat pitch detector, may analyze the audio sample and generate a score that represents whether the audio sample includes music audio data. One or more of the scores may be combined to determine whether the audio sample includes music audio data or non-music audio data.
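
    Illustrative sketch (not part of the patent text): the abstract says detector scores "may be combined" without specifying how. A minimal reading, assumed here, is to average whichever scores are available and compare the mean with a threshold. The score range of 0 to 1, the threshold, and the function name are hypothetical.

```python
def classify_audio(detector_scores, threshold=0.5):
    """Average per-detector music scores and label the sample.

    detector_scores maps a detector name to a score in [0, 1].
    Returns (label, combined_score). Averaging is an illustrative assumption.
    """
    if not detector_scores:
        raise ValueError("at least one detector score is required")
    combined = sum(detector_scores.values()) / len(detector_scores)
    label = "music" if combined >= threshold else "non-music"
    return label, combined


label, score = classify_audio({
    "spectral_fluctuation": 0.8,
    "peak_repetition": 0.6,
    "beat_pitch": 0.7,
})
```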

    Reference signal suppression in speech recognition
    Type: Granted patent
    Status: In force

    Publication No.: US09240183B2

    Publication date: 2016-01-19

    Application No.: US14181374

    Filing date: 2014-02-14

    Applicant: Google Inc.

    Abstract: The technology described herein can be embodied in a method that includes receiving a first signal representing an output of a speaker device, and a second signal comprising the output of the speaker device, and an audio signal corresponding to an utterance of a speaker. The method includes aligning one or more segments of the first signal with one or more segments of the second signal. Acoustic features of the one or more segments of the first and second signals are classified to obtain a first set of vectors and a second set of vectors, respectively, the vectors being associated with speech units. The second set is modified using the first set, such that the modified second set represents a suppression of the output of the speaker device in the second signal. A transcription of the utterance of the speaker can be generated from the modified second set of vectors.

    REFERENCE SIGNAL SUPPRESSION IN SPEECH RECOGNITION
    Type: Patent application
    Status: In force

    Publication No.: US20150235651A1

    Publication date: 2015-08-20

    Application No.: US14181374

    Filing date: 2014-02-14

    Applicant: Google Inc.

    Abstract: The technology described herein can be embodied in a method that includes receiving a first signal representing an output of a speaker device, and a second signal comprising the output of the speaker device, and an audio signal corresponding to an utterance of a speaker. The method includes aligning one or more segments of the first signal with one or more segments of the second signal. Acoustic features of the one or more segments of the first and second signals are classified to obtain a first set of vectors and a second set of vectors, respectively, the vectors being associated with speech units. The second set is modified using the first set, such that the modified second set represents a suppression of the output of the speaker device in the second signal. A transcription of the utterance of the speaker can be generated from the modified second set of vectors.

    Dynamic display of content consumption by geographic location

    Publication No.: US10242029B2

    Publication date: 2019-03-26

    Application No.: US14981733

    Filing date: 2015-12-28

    Applicant: Google Inc.

    IPC classification: G06F17/30 G09B29/00 G06Q10/06

    Abstract: This disclosure relates to dynamic display of content consumption by geographic location. A processor recognizes content being consumed by a set of users, and identifies geographic locations of the consumption and a set of characteristics associated with the consumption. The processor further determines at least one filter for a user of the set of users and filters the set of consumption characteristics based on the at least one filter. The processor further ranks respective consumed content based on a filtered set of consumption characteristics, and displays to the user subsets of the consumed content according to respective rankings and geographic location.
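
    Illustrative sketch (not part of the patent text): the filter, rank, and group-by-location steps in the abstract can be pictured with a small in-memory example. The event fields (`content`, `location`, `genre`), the single genre filter, and ranking by play count are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def rank_by_location(events, genre=None, top_n=3):
    """Filter consumption events, then rank content per location by play count.

    events: iterable of dicts with hypothetical keys 'content', 'location', 'genre'.
    Returns {location: [content, ...]} with the most consumed content first.
    """
    counts = defaultdict(Counter)
    for event in events:
        # Filter step: keep only events matching the user's chosen genre, if any.
        if genre is not None and event["genre"] != genre:
            continue
        counts[event["location"]][event["content"]] += 1
    # Rank step: most consumed content first, limited to top_n per location.
    return {
        location: [content for content, _ in counter.most_common(top_n)]
        for location, counter in counts.items()
    }


events = [
    {"content": "Song A", "location": "Zurich", "genre": "rock"},
    {"content": "Song A", "location": "Zurich", "genre": "rock"},
    {"content": "Song B", "location": "Zurich", "genre": "rock"},
    {"content": "Song C", "location": "Zurich", "genre": "jazz"},
    {"content": "Song B", "location": "Berlin", "genre": "rock"},
]
ranking = rank_by_location(events, genre="rock")
```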