Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
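The following Python sketch illustrates the general idea of training per-word models and serving pre-computed models for a candidate hotword. The class name, the mean-vector "model", and the lookup-by-sub-word fallback are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

class HotwordModelStore:
    """Hypothetical store of pre-computed hotword models keyed by word or sub-word."""

    def __init__(self):
        self.models = {}  # word or sub-word -> model parameters

    def train(self, word, feature_matrices):
        # Trivial stand-in "model": the mean feature vector over all users'
        # utterances of this word or sub-word (a real system might train a
        # neural network or GMM instead).
        stacked = np.vstack(feature_matrices)
        self.models[word] = stacked.mean(axis=0)

    def lookup(self, candidate_hotword):
        # Return pre-computed models matching the candidate hotword, falling
        # back to per-sub-word models when no whole-phrase model exists.
        if candidate_hotword in self.models:
            return {candidate_hotword: self.models[candidate_hotword]}
        parts = candidate_hotword.split()
        return {p: self.models[p] for p in parts if p in self.models}

# Usage: train on audio features from many users, then serve a device request.
store = HotwordModelStore()
store.train("ok", [np.random.rand(5, 13), np.random.rand(7, 13)])
store.train("computer", [np.random.rand(6, 13)])
print(list(store.lookup("ok computer")))
```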
Abstract:
Systems and methods provide insight for entities in mobile onscreen content. For example, a method includes receiving, from a mobile device, an indication of selection of a first entity represented by a visual cue in first annotation data for a screen capture image of a screen of the mobile device and determining entities related to the first entity in a graph-based data store. The method may also include identifying a second entity in the screen capture image that is one of the entities related to the first entity, generating second annotation data, the second annotation data including a visual element linking the first entity and the second entity, and providing the second annotation data for display with the screen on the mobile device.
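A minimal sketch of the entity-linking step, assuming a toy in-memory graph and a simple annotation record; the graph contents and the annotation fields are invented for illustration only.

```python
# Toy stand-in for a graph-based data store of related entities.
KNOWLEDGE_GRAPH = {
    "Eiffel Tower": {"Paris", "Gustave Eiffel"},
    "Paris": {"France", "Eiffel Tower"},
}

def annotate_related(selected_entity, onscreen_entities):
    """Return annotation data linking the selected entity to related on-screen entities."""
    related = KNOWLEDGE_GRAPH.get(selected_entity, set())
    links = []
    for entity in onscreen_entities:
        if entity != selected_entity and entity in related:
            # Second annotation data: a visual element linking both entities.
            links.append({"from": selected_entity, "to": entity, "style": "line"})
    return links

print(annotate_related("Eiffel Tower", {"Paris", "croissant"}))
```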
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data; determining that an initial portion of the audio data corresponds to an initial portion of a hotword; in response to determining that the initial portion of the audio data corresponds to the initial portion of the hotword, selecting, from among a set of one or more actions that are performed when the entire hotword is detected, a subset of the one or more actions; and causing one or more actions of the subset to be performed.
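As a rough illustration, the sketch below performs only a "preparatory" subset of actions when a partial hotword match is seen, and the full set once the entire hotword is detected. The action names and the string-prefix matching are assumptions made for the example.

```python
HOTWORD = "ok computer"
FULL_ACTIONS = ["wake_network", "open_audio_stream", "interrupt_ui", "play_chime"]
# Actions safe to perform speculatively on a partial match (no user-visible effect).
PREPARATORY = {"wake_network", "open_audio_stream"}

def actions_for(transcript_so_far):
    """Select which actions to trigger given the audio recognized so far."""
    if transcript_so_far == HOTWORD:
        return FULL_ACTIONS                       # entire hotword detected
    if HOTWORD.startswith(transcript_so_far):
        # Initial portion of the hotword detected: perform only the subset.
        return [a for a in FULL_ACTIONS if a in PREPARATORY]
    return []

print(actions_for("ok com"))       # preparatory subset only
print(actions_for("ok computer"))  # all actions
```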
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speaker identification. In some implementations, data identifying a media item including speech of a speaker is received. Based on the received data, one or more other media items that include speech of the speaker are identified. One or more search results are generated that each reference a respective media item of the one or more other media items that include speech of the speaker. The one or more search results are provided for display.
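One way to picture this is comparing per-item speaker embeddings and returning other media items whose speaker is close to the query's; the index, embeddings, and similarity threshold below are illustrative assumptions.

```python
import numpy as np

# Hypothetical index: media item -> speaker embedding.
MEDIA_INDEX = {
    "interview.mp4": np.array([0.90, 0.10]),
    "keynote.mp4":   np.array([0.88, 0.12]),
    "cartoon.mp4":   np.array([0.10, 0.95]),
}

def search_same_speaker(query_item, threshold=0.95):
    """Generate search results referencing other media items with the same speaker."""
    query = MEDIA_INDEX[query_item]
    results = []
    for item, emb in MEDIA_INDEX.items():
        if item == query_item:
            continue
        # Cosine similarity between speaker embeddings.
        sim = float(np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb)))
        if sim >= threshold:
            results.append({"title": item, "score": sim})
    return sorted(results, key=lambda r: r["score"], reverse=True)

print(search_same_speaker("interview.mp4"))
```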
Abstract:
Systems and methods for noise based interest point density pruning are disclosed herein. The systems and methods include determining an amount of noise in an audio sample and adjusting the number of interest points within an audio sample fingerprint based on the amount of noise. Samples containing high amounts of noise correspondingly generate fingerprints with more interest points. The disclosed systems and methods allow reference fingerprints to be reduced in size while increasing the size of sample fingerprints. The benefits in scalability do not compromise the accuracy of an audio matching system using noise based interest point density pruning.
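A minimal sketch of the density adjustment, assuming a crude frame-difference noise estimate and a simple strength-ranked cut; the thresholds and scaling are invented for illustration.

```python
import numpy as np

def estimate_noise(samples):
    # Crude noise proxy: average magnitude of sample-to-sample differences.
    return float(np.mean(np.abs(np.diff(samples))))

def prune_interest_points(interest_points, noise_level,
                          base_count=20, max_count=60):
    """Keep more interest points in the fingerprint when the sample is noisier."""
    keep = min(max_count, int(base_count * (1.0 + noise_level)))
    # Keep the strongest points first; points are (time, strength) pairs.
    ranked = sorted(interest_points, key=lambda p: p[1], reverse=True)
    return ranked[:keep]

signal = np.random.randn(1000)
points = [(t, abs(s)) for t, s in enumerate(signal[:100])]
print(len(prune_interest_points(points, estimate_noise(signal))))
```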
Abstract:
Systems and methods prevent or restrict the mining of content on a mobile device. For example, a method may include determining that content to be displayed on a screen includes content that matches a mining-restriction trigger, inserting a mining-restriction mark in the content that protects at least a portion of the content, and displaying the content with the mining-restriction mark on the screen. As another example, a method may include identifying, by a first application running on a mobile device, a mining-restriction mark in frame buffer data, the mining-restriction mark having been inserted by a second application, and determining whether the mining-restriction mark prevents mining of content. The method may also include preventing mining when the mining-restriction mark prevents mining and, when the mining-restriction mark does not prevent mining, determining a restriction for the data based on the mining-restriction mark and providing the restriction with the data for further processing.
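A sketch of how a mining-restriction mark might be consumed, assuming a simple mark record with a "prevent" flag and an optional restriction label; the field names and restriction values are assumptions, not the patent's format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MiningRestrictionMark:
    prevent: bool                        # True: do not mine this content at all
    restriction: Optional[str] = None    # e.g. "no-indexing", "on-device-only"

def mine_frame(content: str, mark: Optional[MiningRestrictionMark]):
    """Mine frame-buffer content unless a mining-restriction mark forbids it."""
    if mark is None:
        return {"content": content}      # no mark: unrestricted mining
    if mark.prevent:
        return None                      # mark prevents mining outright
    # Otherwise pass the restriction along with the data for further processing.
    return {"content": content, "restriction": mark.restriction}

print(mine_frame("account balance: ...", MiningRestrictionMark(prevent=True)))
print(mine_frame("recipe text", MiningRestrictionMark(False, "on-device-only")))
```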
Abstract:
Systems and methods are provided herein relating to audio matching. In addition to interest points, localized patches surrounding interest points can be used as additional discriminative information. The patches can be compressed to increase scalability while retaining discriminative information related to the localized region within the patch. Compressed patches related to interest points of an audio sample can be compared to compressed patches related to interest points of a reference sample to determine whether the two samples are a match.
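For illustration, the sketch below extracts a spectrogram patch around an interest point, compresses it by coarse quantization, and compares two compressed patches; the patch size, bit depth, and match tolerance are assumptions.

```python
import numpy as np

def extract_patch(spectrogram, t, f, half=4):
    """Localized patch surrounding an interest point at time t, frequency f."""
    return spectrogram[max(0, f - half):f + half, max(0, t - half):t + half]

def compress_patch(patch, bits=2):
    # Coarse quantization keeps the local shape while shrinking the patch.
    levels = 2 ** bits
    lo, hi = patch.min(), patch.max()
    return np.round((patch - lo) / (hi - lo + 1e-9) * (levels - 1)).astype(np.uint8)

def patches_match(a, b, tolerance=0.9):
    if a.shape != b.shape:
        return False
    return float(np.mean(a == b)) >= tolerance

spec = np.random.rand(64, 128)                    # toy spectrogram (freq x time)
sample_patch = compress_patch(extract_patch(spec, 50, 20))
reference_patch = compress_patch(extract_patch(spec, 50, 20))
print(patches_match(sample_patch, reference_patch))
```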
Abstract:
Implementations are provided herein relating to audiovisual matching. Audio and video channel data is merged to create a single multi-channel fingerprint used to match media content. Audio channel data is used to generate audio fingerprints. Video channel data is used to generate video fingerprints. Multi-channel fingerprints can then be generated based on the audio channel fingerprints and video channel fingerprints. In this sense, entropy can be increased, although the multi-channel fingerprint can be less resistant to noise.
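One simple way to merge per-channel fingerprints, shown purely as an illustration, is to interleave their bits into a single value; the bit layout and fingerprint width below are assumptions, not the disclosed scheme.

```python
def interleave_bits(audio_fp: int, video_fp: int, width: int = 32) -> int:
    """Merge an audio fingerprint and a video fingerprint into one multi-channel value."""
    combined = 0
    for i in range(width):
        combined |= ((audio_fp >> i) & 1) << (2 * i)       # even bits: audio channel
        combined |= ((video_fp >> i) & 1) << (2 * i + 1)   # odd bits: video channel
    return combined

audio_fp = 0b1011_0010_1111_0001
video_fp = 0b0100_1101_0000_1110
print(bin(interleave_bits(audio_fp, video_fp)))
```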
Abstract:
This disclosure relates to dynamic display of content consumption by geographic location. A processor recognizes content being consumed by a set of users, and identifies geographic locations of the consumption and a set of characteristics associated with the consumption. The processor further determines at least one filter for a user of the set of users and filters the set of consumption characteristics based on the at least one filter. The processor further ranks respective consumed content based on a filtered set of consumption characteristics, and displays to the user subsets of the consumed content according to respective rankings and geographic location.
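The sketch below shows one possible filter-then-rank step per geographic location; the event fields, the genre filter, and the count-based ranking are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Toy consumption events with a location and one example characteristic (genre).
EVENTS = [
    {"content": "song_a", "location": "Berlin", "genre": "pop"},
    {"content": "song_b", "location": "Berlin", "genre": "jazz"},
    {"content": "song_a", "location": "Berlin", "genre": "pop"},
    {"content": "song_c", "location": "Tokyo",  "genre": "pop"},
]

def rank_by_location(events, genre_filter=None):
    """Filter consumption events, then rank consumed content within each location."""
    per_location = defaultdict(Counter)
    for e in events:
        if genre_filter and e["genre"] != genre_filter:
            continue
        per_location[e["location"]][e["content"]] += 1
    # Most-consumed content first within each geographic location.
    return {loc: counts.most_common() for loc, counts in per_location.items()}

print(rank_by_location(EVENTS, genre_filter="pop"))
```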
Abstract:
A media item fingerprint consolidation system is described that merges fingerprints into a consolidated fingerprint. Fingerprints can be generated to compactly represent media items. Fingerprints of common media items can be merged to generate a consolidated fingerprint that compactly represents the common media items. The consolidated fingerprint can replace existing fingerprints.
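A minimal sketch of one possible consolidation rule, a per-bit majority vote over fingerprints of the same media item; the fingerprint width and the majority-vote rule are assumptions made for the example.

```python
def consolidate(fingerprints, width=16):
    """Merge fingerprints of a common media item into one consolidated fingerprint."""
    consolidated = 0
    for bit in range(width):
        ones = sum((fp >> bit) & 1 for fp in fingerprints)
        if ones * 2 > len(fingerprints):      # majority of copies set this bit
            consolidated |= 1 << bit
    return consolidated

copies = [0b1010_1100_0011_0101,
          0b1010_1100_0011_0111,
          0b1010_1000_0011_0101]
# The consolidated fingerprint can replace the individual fingerprints above.
print(bin(consolidate(copies)))
```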