Abstract:
Systems and techniques for adding pitch shift resistance to an audio fingerprint are presented. In particular, an audio track for a media file is received. A first audio fingerprint for the audio track with a first pitch shift and an Nth audio fingerprint for the audio track with an Mth pitch shift are generated, where N is an integer greater than or equal to two and M is an integer greater than or equal to two. A combined audio fingerprint is generated from at least the first audio fingerprint and the Nth audio fingerprint.
Abstract:
Systems and methods are provided for suggesting actions for selected text based on content displayed on a mobile device. An example method can include converting a selection made via a display device into a query, providing the query to an action suggestion model that is trained to predict an action given a query, each action being associated with a mobile application, receiving one or more predicted actions, and initiating display of the one or more predicted actions on the display device. Another example method can include identifying, from search records, queries where a website is highly ranked, the website being one of a plurality of websites in a mapping of websites to mobile applications. The method can also include generating positive training examples for an action suggestion model from the identified queries, and training the action suggestion model using the positive training examples.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Abstract:
A matching system receives probe audio samples for comparison to references of a data store. Comparisons are generated between a first segment of a probe audio sample and corresponding time segments of a plurality of reference audio samples to identify a plurality of sufficiently matching reference audio samples based upon a first set of consistency scores. Matching references are retained, unless they meet a score threshold. Comparisons are continually generated with a second segment of the probe audio sample and corresponding time segments of the sufficiently matching reference audio samples to generate a second set of consistency scores. The retained results are outputted based on the first and second set of consistency scores.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, receiving audio data; determining that an initial portion of the audio data corresponds to an initial portion of a hotword; in response to determining that the initial portion of the audio data corresponds to the initial portion of the hotword, selecting, from among a set of one or more actions that are performed when the entire hotword is detected, a subset of the one or more actions; and causing one or more actions of the subset to be performed.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Abstract:
This disclosure relates to transformation invariant media matching. A fingerprinting component can generate a transformation invariant identifier for media content by adaptively encoding the relative ordering of interest points in media content. The interest points can be grouped into subsets, and stretch invariant descriptors can be generated for the subsets based on ratios of coordinates of interest points included in the subsets. The stretch invariant descriptors can be aggregated into a transformation invariant identifier. An identification component compares the identifier against a set of identifiers for known media content, and the media content can be matched or identified as a function of the comparison.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining audio data corresponding to an utterance; transmitting the audio data corresponding to the utterance; receiving an indication that that utterance likely includes a communication-related voice command; in response to receiving the indication that the utterance likely includes the communication-related voice command, applying at least a language model to a representation of the audio data corresponding to the utterance, to identify data referencing a contact; and transmitting the data referencing the contact.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for designating certain voice commands as hotwords. The methods, systems, and apparatus include actions of receiving a hotword followed by a voice command. Additional actions include determining that the voice command satisfies one or more predetermined criteria associated with designating the voice command as a hotword, where a voice command that is designated as a hotword is treated as a voice input regardless of whether the voice command is preceded by another hotword. Further actions include, in response to determining that the voice command satisfies one or more predetermined criteria associated with designating the voice command as a hotword, designating the voice command as a hotword.