Abstract:
Implementations are provided herein relating to audiovisual matching. Audio and video channel data is merged to create a single multi-channel fingerprint used to match media content. Audio channel data is used to generate audio fingerprints. Video channel data is used to generate video fingerprints. Multi-channel fingerprints can then be generated based on the audio channel fingerprints and video channel fingerprints. In this sense, entropy can be increased while the multi-channel fingerprint can be more resistant to noise.
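As an illustration only, the merging step might be sketched as below. The abstract does not say how the two channels are combined, so packing one audio and one video sub-fingerprint per time window into a single integer is an assumption, as are the function names `combine_fingerprints` and `match_fraction` and the 32-bit width.

```python
def combine_fingerprints(audio_fps, video_fps, bits=32):
    """Pack each window's audio and video sub-fingerprint into one integer.

    Assumption: both lists hold one non-negative integer per time window.
    """
    if len(audio_fps) != len(video_fps):
        raise ValueError("expected one audio and one video fingerprint per window")
    mask = (1 << bits) - 1
    # Audio bits go in the high half, video bits in the low half.
    return [((a & mask) << bits) | (v & mask) for a, v in zip(audio_fps, video_fps)]


def match_fraction(probe, reference):
    """Fraction of probe windows whose combined fingerprint equals the reference's."""
    if not probe:
        return 0.0
    hits = sum(1 for p, r in zip(probe, reference) if p == r)
    return hits / len(probe)
```

Because a combined value matches only when both channels agree, two unrelated items are less likely to collide than with either channel alone, which is one way to read the entropy claim.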
Abstract:
Devices and methods are provided herein relating to video chunking for robust, progressive upload. Video can be parsed to determine byte offsets associated with prospective chunk boundaries. Chunks can be generated based on the prospective chunk boundaries and a preferred chunk size. Sample tables can be generated for each chunk. The chunks can be fully self-contained, in that they can be received and transcoded independently of other chunks. Thus, if one chunk fails, only that chunk needs to be retransmitted rather than the entire video.
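A minimal sketch of the boundary selection, assuming the prospective boundaries are already available as byte offsets (for example, offsets at which a chunk could start cleanly). The greedy rule and the name `plan_chunks` are hypothetical, and per-chunk sample tables are not modeled here.

```python
def plan_chunks(boundaries, total_size, preferred_size):
    """Return (start, end) byte ranges that begin and end on candidate boundaries.

    Greedy rule (an assumption): extend each chunk to the furthest boundary
    that keeps it within preferred_size; if even the next boundary is too
    far away, accept one oversized chunk rather than cut mid-unit.
    """
    cuts = sorted({b for b in boundaries if 0 < b < total_size}) + [total_size]
    chunks, start, i = [], 0, 0
    while start < total_size:
        j = i
        while j + 1 < len(cuts) and cuts[j + 1] - start <= preferred_size:
            j += 1
        end = cuts[j]
        chunks.append((start, end))
        start, i = end, j + 1
    return chunks
```

With such a plan, a failed upload of one range can be retried by resending only that `(start, end)` span.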
Abstract:
Methods, systems, and apparatus include computer programs encoded on a computer-readable storage medium, including a method for providing content. Snapshots associated with use of a computing device by a user are received. Each snapshot is based on content presented to the user. The snapshots are evaluated. For each respective snapshot, a respective set of entities indicated by the respective snapshot is identified. Indications of the respective set of entities and a respective timestamp indicating a respective time that the respective snapshot was captured are associated and stored. Based on a first snapshot of the snapshots, a first time to present one or more information cards to the user is determined. At the first time, entities having a timestamp that corresponds to the first time are located. An information card is generated based on the located entities. The generated information card is provided for presentation to the user.
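The storage and lookup steps could be sketched as follows. Entity extraction and the choice of presentation time are outside this sketch; the class `EntityLog`, the function `make_card`, the tolerance parameter, and the card layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EntityLog:
    """Stores the entities found in each snapshot alongside its capture time."""
    records: list = field(default_factory=list)  # (timestamp, frozenset of entities)

    def record(self, timestamp, entities):
        self.records.append((timestamp, frozenset(entities)))

    def entities_near(self, when, tolerance=0.0):
        """Collect entities whose snapshot time is within tolerance of `when`."""
        found = set()
        for ts, ents in self.records:
            if abs(ts - when) <= tolerance:
                found |= ents
        return found


def make_card(entities):
    """Build a simple information card (assumed layout) from located entities."""
    return {"title": "Related to what you viewed", "entities": sorted(entities)}
```

Here "corresponds to the first time" is interpreted as "within a tolerance of it", which is only one possible reading of the abstract.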
Abstract:
A computer-implemented method includes obtaining first and second binary vectors. For each of a plurality of vector locations in a first of j words in the first binary vector, the method includes shifting the binary values for the second binary vector so that a particular one of the binary values in the second binary vector is located at a vector location in a first of the k words in the second binary vector that matches the vector location in the first of j words in the first binary vector. For each of the j words in the first binary vector, the method includes aligning the second binary vector with the word in the first binary vector and determining a binary correlation score. A similarity of the first binary vector and the second binary vector can be determined based at least on one or more of the determined binary correlation scores.
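For intuition, a simplified sliding binary correlation is sketched below. It does not reproduce the word-wise shifting scheme of the abstract; it only shows the underlying idea of scoring each alignment by counting agreeing bits. The function name `best_alignment` and the use of Python integers as bit vectors are assumptions.

```python
def best_alignment(a, n, b, m):
    """Slide the m-bit vector b across the n-bit vector a (m <= n).

    Returns (offset, score), where offset counts from the most significant
    bit of a and score is the number of agreeing bit positions.
    """
    if m > n:
        raise ValueError("b must not be longer than a")
    mask = (1 << m) - 1
    best = (0, -1)
    for offset in range(n - m + 1):
        window = (a >> (n - m - offset)) & mask
        # XOR marks disagreeing bits; the remainder agree.
        score = m - bin(window ^ b).count("1")
        if score > best[1]:
            best = (offset, score)
    return best
```

For example, `best_alignment(0b0011010, 7, 0b101, 3)` finds the exact occurrence of `101` starting at bit offset 3.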
Abstract:
Identifying near-identical versions of a probe sample from reference files comprises identifying discriminative regions of reference matches by generating a similarity matrix. The discriminative time frames are communicated to a client device and additional data associated with the probe sample can be retrieved having features of the discriminative regions. Based on the additional data, a single match can be generated to identify the probe sample.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying an identity of a user. Additional actions include, in response to receiving the request, identifying subwords to be included in the verification phrase and, in response to identifying the subwords, obtaining a candidate phrase that includes at least some of the identified subwords as the verification phrase. Further actions include providing the verification phrase as a response to the request.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding ambient sounds, identifying media content that matches the audio data, and a timestamp corresponding to a particular portion of the identified media content, identifying a speaker associated with the particular portion of the identified media content corresponding to the timestamp, and providing information identifying the speaker associated with the particular portion of the identified media content for output.
Abstract:
Systems and methods are provided herein relating to speed resistant audio matching. Descriptors can be generated for a received audio signal and matched with reference descriptors. A set of hits for respective reference samples can be generated based on the matching. A histogram can then be generated that correlates probe sample hit time with reference sample hit time. In one implementation, a rolling window can be used in analyzing the histogram allowing for slight variances in the timing between probe sample hits and reference sample hits. In another implementation, the histogram generated can be based on an estimated time stretch of the probe sample. In yet another implementation, a set of histograms can be generated based on a minimum speed change, a maximum speed change, and a speed step. Histograms can be evaluated to determine a most likely matching histogram.
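The histogram and speed-sweep ideas can be sketched roughly as follows, assuming each hit is a `(probe_time, reference_time)` pair in the same time unit. The function names, the one-unit bins, and the scoring by peak height are illustrative assumptions rather than the method of the abstract.

```python
from collections import Counter


def peak_with_window(hits, speed=1.0, window=1):
    """Histogram reference time minus stretched probe time; return best peak.

    The rolling window sums neighbouring bins so that slight timing
    variances between probe and reference hits still count together.
    """
    hist = Counter(round(ref_t - probe_t * speed) for probe_t, ref_t in hits)
    best_offset, best_count = None, 0
    for offset in hist:
        count = sum(hist.get(offset + d, 0) for d in range(-window, window + 1))
        if count > best_count:
            best_offset, best_count = offset, count
    return best_offset, best_count


def best_speed(hits, min_speed, max_speed, step, window=1):
    """Build one histogram per candidate speed and keep the strongest peak."""
    steps = int(round((max_speed - min_speed) / step))
    best = (None, None, 0)
    for i in range(steps + 1):
        speed = min_speed + i * step
        offset, count = peak_with_window(hits, speed, window)
        if count > best[2]:
            best = (speed, offset, count)
    return best
```

A probe played 10% fast against its reference would spread its hits across many bins at speed 1.0 but concentrate them into a single peak near speed 1.1.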
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.