Melody recognition systems
    61.
    Granted patent
    Melody recognition systems (in force)

    Publication No.: US09569532B1

    Publication date: 2017-02-14

    Application No.: US14300600

    Filing date: 2014-06-10

    Applicant: Google Inc.

    IPC classification: H04N5/92 G06F17/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.
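
The subset-selection step in this abstract can be sketched roughly as follows. This is an illustration only, not the patented method: it assumes each melody line is already available as a list of MIDI note numbers, compares melodies by their pitch intervals so that transposed renditions still match, and keeps candidates whose mean similarity to the others reaches a hypothetical threshold. The names `intervals`, `similarity`, `select_consistent`, and the sample data are invented for the example.

```python
from difflib import SequenceMatcher


def intervals(melody):
    """Turn absolute pitches into successive pitch differences (key-invariant)."""
    return [b - a for a, b in zip(melody, melody[1:])]


def similarity(m1, m2):
    """Similarity in [0, 1] between two melodies, compared by interval sequence."""
    matcher = SequenceMatcher(None, intervals(m1), intervals(m2), autojunk=False)
    return matcher.ratio()


def select_consistent(melodies, threshold=0.5):
    """Keep candidates whose mean similarity to all other candidates is high enough.

    The threshold is an assumed tuning parameter, not a value from the patent.
    """
    keep = {}
    for video_id, melody in melodies.items():
        scores = [similarity(melody, other)
                  for other_id, other in melodies.items() if other_id != video_id]
        if scores and sum(scores) / len(scores) >= threshold:
            keep[video_id] = melody
    return keep


# Hypothetical melody lines extracted from four candidate videos.
candidates = {
    "video_a": [60, 62, 64, 65, 67, 65, 64, 62],
    "video_b": [62, 64, 66, 67, 69, 67, 66, 64],  # same tune, sung two semitones higher
    "video_c": [60, 60, 72, 59, 61, 70, 58, 66],  # unrelated melody
    "video_d": [60, 62, 64, 65, 67, 65, 64, 60],  # same tune with one wrong note
}

subset = select_consistent(candidates)
print(sorted(subset))
```

Comparing intervals rather than absolute pitches is a design choice for the sketch: amateur a cappella recordings are rarely in the same key, so an outlier is better detected by melodic shape than by note values.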

    COLLABORATIVE LANGUAGE MODEL BIASING
    62.
    Patent application
    COLLABORATIVE LANGUAGE MODEL BIASING (in force)

    Publication No.: US20170032781A1

    Publication date: 2017-02-02

    Application No.: US14811190

    Filing date: 2015-07-28

    Applicant: Google Inc.

    IPC classification: G10L15/197 G10L15/24

    Abstract: Methods, including computer programs encoded on a computer storage medium, for collaborative language model biasing. In one aspect, a method includes receiving (i) data including a set of terms associated with a target user, and, (ii) from each of multiple other users, data including a set of terms associated with the other user, selecting a particular other user based at least on comparing the set of terms associated with the target user to the sets of terms associated with the other users, selecting one or more terms from the set of terms that is associated with the particular other user, obtaining, based on the selected terms that are associated with the particular other user, a biased language model, and providing the biased language model to an automated speech recognizer.
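
A minimal sketch of the flow described in this abstract, under several assumptions that are not in the source: term sets are compared with Jaccard similarity, the "biased language model" is reduced to a unigram probability table, and biasing simply multiplies the probability of the borrowed terms by a hypothetical `boost` factor before renormalizing. All function names and data are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_similar_user(target_terms, other_users):
    """Return the id of the other user whose term set best overlaps the target's."""
    return max(other_users, key=lambda uid: jaccard(target_terms, other_users[uid]))


def bias_unigram_model(base_probs, boost_terms, boost=3.0):
    """Boost the selected terms in a unigram model, then renormalize to sum to 1."""
    weights = {word: p * (boost if word in boost_terms else 1.0)
               for word, p in base_probs.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


# Hypothetical term sets.
target_terms = {"jazz", "vinyl", "saxophone"}
other_users = {
    "u1": {"jazz", "saxophone", "coltrane"},
    "u2": {"football", "goal"},
}

chosen = pick_similar_user(target_terms, other_users)
# Borrow only the terms the target user does not already have.
borrowed = other_users[chosen] - target_terms

base_model = {"jazz": 0.4, "football": 0.4, "coltrane": 0.2}
biased_model = bias_unigram_model(base_model, borrowed)
print(chosen, borrowed, biased_model)
```

A production recognizer would bias an n-gram or neural language model rather than a unigram table; the table is used here only to keep the example self-contained.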

    Associating audio tracks with video content
    63.
    Granted patent
    Associating audio tracks with video content (in force)

    Publication No.: US09542488B2

    Publication date: 2017-01-10

    Application No.: US13957944

    Filing date: 2013-08-02

    Applicant: Google Inc.

    Inventor: Matthew Sharifi

    Abstract: In one example, a system comprises at least one processor configured to determine an indication of an audio portion of video content, determine, based at least in part on the indication, one or more candidate audio tracks, determine, based at least in part on the one or more candidate audio tracks, one or more search terms, and provide a search query that includes the search terms. The at least one processor may be further configured to, in response to the search query, receive a response that indicates a number of search results, wherein each one of the search results is associated with content that includes the one or more search terms, select, based at least in part on the response, a particular audio track of the one or more candidate audio tracks, and send a message that associates the video content with at least the particular audio track.
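
The selection step can be illustrated with a short sketch. It assumes, beyond what the abstract states, that the search terms are the track title and artist and that the candidate with the largest result count wins. The search backend is a caller-supplied function (`count_results`), faked here with a dictionary; every name and number below is hypothetical.

```python
def build_query(track):
    """Build a search query from a candidate track's title and artist."""
    return f'"{track["title"]}" "{track["artist"]}"'


def select_track(candidates, count_results):
    """Pick the candidate whose query returns the most search results.

    `count_results` is any callable mapping a query string to a result count.
    """
    return max(candidates, key=lambda track: count_results(build_query(track)))


def associate(video_id, candidates, count_results):
    """Return a message associating the video with the selected track."""
    track = select_track(candidates, count_results)
    return {"video_id": video_id, "title": track["title"], "artist": track["artist"]}


# Hypothetical candidates, e.g. produced by an audio-matching stage.
candidate_tracks = [
    {"title": "Blue Sky", "artist": "The Examples"},
    {"title": "Blue Sky (Karaoke Version)", "artist": "Studio Band"},
]

# Stand-in for a real search service: query string -> number of results.
fake_index = {
    '"Blue Sky" "The Examples"': 12000,
    '"Blue Sky (Karaoke Version)" "Studio Band"': 40,
}

message = associate("video_123", candidate_tracks, lambda q: fake_index.get(q, 0))
print(message)
```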

    Detection of inactive broadcasts during live stream ingestion
    64.
    Granted patent
    Detection of inactive broadcasts during live stream ingestion (in force)

    Publication No.: US09536151B1

    Publication date: 2017-01-03

    Application No.: US14581789

    Filing date: 2014-12-23

    Applicant: Google Inc.

    IPC classification: G06K9/00 G10L19/018

    Abstract: Systems and methods are provided herein relating to real-time detection of inactive broadcasts during live stream ingestion. Both audio fingerprints and video fingerprints can be dynamically and continuously generated for a live stream ingestion. Sets of video fingerprints and sets of audio fingerprints can be continuously generated based on common successive overlapping time windows. A set of audio fingerprints and a set of video fingerprints can be associated with each time window. Video similarity scores and audio similarity scores can be generated for each time window to determine whether the stream is inactive or static during the time window. Only fingerprints relating to an active broadcast can be indexed in a fingerprint index.
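
The windowing logic might look like the sketch below. The assumptions are mine, not the patent's: fingerprints are 32-bit integers, similarity is the fraction of equal bits between consecutive fingerprints, and a window counts as inactive only when both its audio and its video fingerprints are near-identical throughout. The window size, step, and threshold are placeholder values.

```python
def overlapping_windows(n_items, size, step):
    """Yield (start, end) index pairs for successive overlapping windows."""
    for start in range(0, max(n_items - size, 0) + 1, step):
        yield start, start + size


def bit_similarity(a, b, bits=32):
    """Fraction of bit positions on which two integer fingerprints agree."""
    differing = bin((a ^ b) & ((1 << bits) - 1)).count("1")
    return 1.0 - differing / bits


def is_static(fingerprints, threshold=0.95):
    """True when every consecutive pair of fingerprints is nearly identical."""
    return all(bit_similarity(x, y) >= threshold
               for x, y in zip(fingerprints, fingerprints[1:]))


def active_windows(video_fps, audio_fps, size=4, step=2):
    """Return the windows in which the stream shows activity and may be indexed."""
    n = min(len(video_fps), len(audio_fps))
    active = []
    for start, end in overlapping_windows(n, size, step):
        inactive = is_static(video_fps[start:end]) and is_static(audio_fps[start:end])
        if not inactive:
            active.append((start, end))
    return active


# Hypothetical stream: six frozen, silent frames followed by real content.
video = [0xAAAA0000] * 6 + [0x12345678, 0x0F0F0F0F, 0xDEADBEEF, 0x00FF00FF]
audio = [0x00000000] * 6 + [0xFFFF0000, 0x00FFFF00, 0xF0F0F0F0, 0x0000FFFF]

indexable = active_windows(video, audio)
print(indexable)
```

Because the windows overlap, the transition from frozen to live content is caught by the window that straddles it rather than being delayed until the next non-overlapping block.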

    Hold back and real time ranking of results in a streaming matching system
    65.
    Granted patent
    Hold back and real time ranking of results in a streaming matching system (in force)

    Publication No.: US09529907B2

    Publication date: 2016-12-27

    Application No.: US13732108

    Filing date: 2012-12-31

    Applicant: Google Inc.

    IPC classification: G06F17/30 G10L25/54

    Abstract: A matching system receives probe audio samples for comparison to references of a data store. Comparisons are generated to determine a sufficient match for a portion or a first amount of the probe sample. Ranking scores are assigned to the resulting match references. The match references are retained, unless meeting a score threshold. Comparisons are continually generated with second amounts of the probe sample and the retained references are updated with further matching references assigned ranking scores. The retained results are merged and determined to satisfy a score threshold for release as outputted results for matching references.
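
One way to read the hold-back behaviour is sketched below: matches from each chunk of the probe are accumulated per reference, and a reference is released as output only once its merged score reaches a threshold. Summing scores as the merge rule, the class name `HoldBackMatcher`, and the threshold value are assumptions made for illustration.

```python
class HoldBackMatcher:
    """Holds back match results until their merged score clears a threshold."""

    def __init__(self, release_threshold):
        self.release_threshold = release_threshold
        self.held = {}         # reference id -> merged ranking score so far
        self.released = set()  # references already output

    def add_chunk_matches(self, matches):
        """Merge one chunk's {reference: score} matches; return newly released results."""
        for ref, score in matches.items():
            if ref in self.released:
                continue
            self.held[ref] = self.held.get(ref, 0.0) + score

        # Rank the references that are now strong enough to release.
        ready = sorted(
            (ref for ref, score in self.held.items() if score >= self.release_threshold),
            key=lambda ref: self.held[ref],
            reverse=True,
        )
        output = [(ref, self.held.pop(ref)) for ref in ready]
        self.released.update(ref for ref, _ in output)
        return output


matcher = HoldBackMatcher(release_threshold=1.0)
first = matcher.add_chunk_matches({"ref_a": 0.5, "ref_b": 0.25})   # nothing strong enough yet
second = matcher.add_chunk_matches({"ref_a": 0.75, "ref_c": 0.25})  # ref_a now clears the bar
print(first, second)
```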

    Method for Siren Detection Based on Audio Samples
    66.
    Patent application
    Method for Siren Detection Based on Audio Samples (in force)

    Publication No.: US20160155452A1

    Publication date: 2016-06-02

    Application No.: US15004232

    Filing date: 2016-01-22

    Applicant: Google Inc.

    Abstract: The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals takes more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.
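
The shortest-first strategy can be sketched as a loop that tries progressively longer signals and stops at the first confident identification. The classifier is a caller-supplied function returning a label and a confidence; the toy classifier below, whose confidence simply grows with signal length, stands in for a real detector. Taking the most recent samples for each signal and the confidence cutoff are assumptions of this sketch.

```python
def make_signals(sample, lengths):
    """Cut signals of several lengths from the captured sample, shortest first.

    Each signal takes the most recent `n` values of the sample (an assumption).
    """
    return [sample[-n:] for n in sorted(lengths) if n <= len(sample)]


def identify(sample, lengths, classify, min_confidence=0.8):
    """Try the shortest signal first; return (label, signal_length) on first confident hit."""
    for signal in make_signals(sample, lengths):
        label, confidence = classify(signal)
        if confidence >= min_confidence:
            return label, len(signal)
    return None, 0


def toy_classify(signal):
    """Placeholder detector: more audio gives proportionally more confidence."""
    return "siren", len(signal) / 1000


captured = list(range(1000))  # stand-in for 1000 buffered audio values
result = identify(captured, [1000, 250, 500], toy_classify, min_confidence=0.5)
print(result)
```

Here the 250-value signal is not conclusive, so the loop falls through to the 500-value signal and never needs to process the full buffer.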

    PROTECTING CONTENT ON A MOBILE DEVICE FROM MINING
    67.
    Patent application
    PROTECTING CONTENT ON A MOBILE DEVICE FROM MINING (in force)

    Publication No.: US20160063660A1

    Publication date: 2016-03-03

    Application No.: US14570496

    Filing date: 2014-12-15

    Applicant: GOOGLE INC.

    IPC classification: G06T1/00

    Abstract: Systems and methods prevent or restrict the mining of content on a mobile device. For example, a method may include determining that content to be displayed on a screen includes content that matches a mining-restriction trigger, inserting a mining-restriction mark in the content that protects at least a portion of the content, and displaying the content with the mining-restriction mark on the screen. As another example, a method may include identifying, by a first application running on a mobile device, a mining-restriction mark in frame buffer data, the mining-restriction mark having been inserted by a second application, and determining whether the mining-restriction mark prevents mining of content. The method may also include preventing mining when the mining-restriction mark prevents mining and, when the mining-restriction mark does not prevent mining, determining a restriction for the data based on the mining-restriction mark and providing the restriction with the data for further processing.

    Method for siren detection based on audio samples
    68.
    Granted patent
    Method for siren detection based on audio samples (in force)

    Publication No.: US09275136B1

    Publication date: 2016-03-01

    Application No.: US14095199

    Filing date: 2013-12-03

    Applicant: Google Inc.

    IPC classification: G08G1/00 G06F17/30 H04R29/00

    Abstract: The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals takes more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.

    TEXT-DEPENDENT SPEAKER IDENTIFICATION
    69.
    Patent application
    TEXT-DEPENDENT SPEAKER IDENTIFICATION (in force)

    Publication No.: US20150294670A1

    Publication date: 2015-10-15

    Application No.: US14612830

    Filing date: 2015-02-03

    Applicant: Google Inc.

    IPC classification: G10L17/18

    CPC classification: G10L17/18 G10L17/005

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speaker verification. The methods, systems, and apparatus include actions of inputting speech data that corresponds to a particular utterance to a first neural network and determining an evaluation vector based on output at a hidden layer of the first neural network. Additional actions include obtaining a reference vector that corresponds to a past utterance of a particular speaker. Further actions include inputting the evaluation vector and the reference vector to a second neural network that is trained on a set of labeled pairs of feature vectors to identify whether speakers associated with the labeled pairs of feature vectors are the same speaker. More actions include determining, based on an output of the second neural network, whether the particular utterance was likely spoken by the particular speaker.
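
The data flow between the two networks can be shown with a toy sketch in plain Python. It only illustrates the plumbing: the first network is reduced to a single hidden layer whose activations serve as the evaluation vector, the second network is a single sigmoid unit over the concatenated pair, and all weights are untrained placeholder numbers, so the resulting probability carries no meaning. Layer sizes, activation functions, and names are assumptions.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def evaluation_vector(features, w_hidden, b_hidden):
    """Hidden-layer activations of the first network, used as the speaker vector."""
    return relu(dense(features, w_hidden, b_hidden))


def same_speaker_probability(eval_vec, ref_vec, w_out, b_out):
    """Second network: score the (evaluation, reference) pair as same-speaker or not."""
    pair = eval_vec + ref_vec  # concatenate the two vectors
    return sigmoid(dense(pair, [w_out], [b_out])[0])


# Placeholder weights; a real system would learn these from labeled data.
W_HIDDEN = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [1.0, 1.0, -1.0, -1.0]
B_OUT = 0.0

utterance_features = [0.2, 0.4, 0.1]  # hypothetical acoustic features of the new utterance
enrolled_features = [0.3, 0.4, 0.1]   # hypothetical features of a past utterance

eval_vec = evaluation_vector(utterance_features, W_HIDDEN, B_HIDDEN)
ref_vec = evaluation_vector(enrolled_features, W_HIDDEN, B_HIDDEN)
probability = same_speaker_probability(eval_vec, ref_vec, W_OUT, B_OUT)
print(eval_vec, ref_vec, probability)
```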

    System and method for adding pitch shift resistance to an audio fingerprint
    70.
    Granted patent
    System and method for adding pitch shift resistance to an audio fingerprint (in force)

    Publication No.: US09159327B1

    Publication date: 2015-10-13

    Application No.: US13723034

    Filing date: 2012-12-20

    Applicant: Google Inc.

    IPC classification: G06F17/30 G10L19/018

    Abstract: Systems and techniques for adding pitch shift resistance to an audio fingerprint are presented. In particular, an audio track for a media file is received. A first audio fingerprint for the audio track with a first pitch shift and an Nth audio fingerprint for the audio track with an Mth pitch shift are generated, where N is an integer greater than or equal to two and M is an integer greater than or equal to two. A combined audio fingerprint is generated from at least the first audio fingerprint and the Nth audio fingerprint.
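
A small sketch of the idea, under assumptions of my own: the track is represented by spectral peak positions on a logarithmic frequency axis, so that a pitch shift becomes a constant offset in bin index; a fingerprint is the set of adjacent peak pairs; and the combined fingerprint is the union of the fingerprints computed at each shift. The abstract does not specify any of these representations.

```python
def fingerprint(peaks, shift=0):
    """Fingerprint a track as the set of adjacent peak-bin pairs after a pitch shift.

    `peaks` are bin indices on a log-frequency axis, so shifting pitch adds an offset.
    """
    shifted = [p + shift for p in peaks]
    return set(zip(shifted, shifted[1:]))


def combined_fingerprint(peaks, shifts=(-1, 0, 1)):
    """Union of the fingerprints generated at each pitch shift."""
    combined = set()
    for shift in shifts:
        combined |= fingerprint(peaks, shift)
    return combined


def overlap(query, reference):
    """Fraction of the query's hashes that are present in the reference."""
    return len(query & reference) / len(query) if query else 0.0


reference_peaks = [12, 19, 24, 31, 36]  # hypothetical peak bins of the original track
plain = fingerprint(reference_peaks)
combined = combined_fingerprint(reference_peaks)

# A copy of the track uploaded one bin higher in pitch.
shifted_query = fingerprint(reference_peaks, shift=1)

print(overlap(shifted_query, plain), overlap(shifted_query, combined))
```

The pitched-up copy shares no hashes with the unshifted fingerprint but matches the combined one completely, which is the resistance the title refers to; the cost is a fingerprint roughly as many times larger as there are shifts.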
