Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving (i) audio data that encodes a spoken natural language query, and (ii) environmental audio data, obtaining a transcription of the spoken natural language query, determining a particular content type associated with one or more keywords in the transcription, providing at least a portion of the environmental audio data to a content recognition engine, and identifying a content item that has been output by the content recognition engine, and that matches the particular content type.
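The keyword-to-content-type step and the final filtering step described above can be sketched as follows. This is a minimal illustration, not the claimed method: the keyword table, the `content_type_for` and `pick_item` names, and the dictionary shape of the recognition results are all assumptions, and the speech transcription and content recognition engine are taken as already having run.

```python
from typing import Optional

# Hypothetical keyword-to-content-type table; the abstract does not list one.
KEYWORD_TO_TYPE = {
    "song": "music",
    "singing": "music",
    "movie": "video",
    "show": "video",
}


def content_type_for(transcription: str) -> Optional[str]:
    """Return the content type of the first known keyword, or None."""
    for word in transcription.lower().split():
        word = word.strip("?.,!")
        if word in KEYWORD_TO_TYPE:
            return KEYWORD_TO_TYPE[word]
    return None


def pick_item(transcription: str, recognized: list) -> Optional[dict]:
    """Pick the first recognized item whose type matches the query's type.

    `recognized` stands in for the output of a content recognition engine:
    a list of dicts with assumed "type" and "title" keys.
    """
    wanted = content_type_for(transcription)
    if wanted is None:
        return None
    for item in recognized:
        if item["type"] == wanted:
            return item
    return None
```

For example, with the query "What song is this?" and recognition results containing one video item and one music item, `pick_item` returns the music item.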
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding an utterance and environmental data, obtaining a transcription of the utterance, identifying an entity using the environmental data, submitting a query to a natural language query processing engine, wherein the query includes at least a portion of the transcription and data that identifies the entity, and obtaining one or more results of the query.
Abstract:
Aspects relate to determining whether a probe media content matches one or more reference media content. The reference media content is classified into a content class. The probe media content could also be classified into a content class. Similarities between the probe media content and the reference media content are identified. A matching score given to the probe media content is weighted based on statistics regarding matches and false-positive rates for the content class of the reference media content. Further, classifiers can be trained on computed audio features and video features and/or video metadata and audio metadata of various media content.
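One way to picture the class-based weighting is to scale the raw matching score by the observed precision of the reference content class. The sketch below assumes that particular weighting formula, which the abstract does not specify; `ClassStats`, `class_weight`, and `weighted_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClassStats:
    """Per-class match statistics (assumed shape)."""
    true_matches: int
    false_positives: int


def class_weight(stats: ClassStats) -> float:
    """Assumed weighting: fraction of past matches that were correct."""
    total = stats.true_matches + stats.false_positives
    return stats.true_matches / total if total else 0.0


def weighted_score(raw_score: float, content_class: str, stats_by_class: dict) -> float:
    """Scale a raw matching score by the weight of the reference's class."""
    return raw_score * class_weight(stats_by_class[content_class])
```

Under this assumption, a raw score of 0.8 against a class with 90 true matches and 10 false positives is reduced to about 0.72, while the same raw score against a class with a 50% false-positive rate drops to 0.4.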
Abstract:
Systems and methods are provided herein relating to real-time detection of inactive broadcasts during live stream ingestion. Both audio fingerprints and video fingerprints can be dynamically and continuously generated for a live stream ingestion. Sets of video fingerprints and sets of audio fingerprints can be continuously generated based on common successive overlapping time windows. A set of audio fingerprints and a set of video fingerprints can be associated with each time window. Video similarity scores and audio similarity scores can be generated for each time window to determine whether the stream is inactive or static during the time window. Only fingerprints relating to an active broadcast can be indexed in a fingerprint index.
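The windowed inactivity check can be sketched as below. This is an illustration under stated assumptions: fingerprints are modeled as 32-bit integers, similarity is one minus the normalized Hamming distance between consecutive fingerprints, and a window counts as inactive when both the mean audio similarity and the mean video similarity reach a threshold. The window size, step, threshold, and all function names are hypothetical.

```python
def windows(n_items, size, step):
    """Yield (start, end) index ranges of successive overlapping windows."""
    for start in range(0, max(n_items - size, 0) + 1, step):
        yield start, min(start + size, n_items)


def bit_similarity(a, b, bits=32):
    """1.0 for identical fingerprints, 0.0 when every bit differs."""
    mask = (1 << bits) - 1
    return 1.0 - bin((a ^ b) & mask).count("1") / bits


def mean_similarity(fingerprints):
    """Average similarity between consecutive fingerprints in one window."""
    pairs = list(zip(fingerprints, fingerprints[1:]))
    if not pairs:
        return 1.0
    return sum(bit_similarity(a, b) for a, b in pairs) / len(pairs)


def active_windows(audio_fps, video_fps, size=4, step=2, threshold=0.95):
    """Return (start, end, is_active) for each common time window."""
    n = min(len(audio_fps), len(video_fps))
    result = []
    for start, end in windows(n, size, step):
        audio_sim = mean_similarity(audio_fps[start:end])
        video_sim = mean_similarity(video_fps[start:end])
        inactive = audio_sim >= threshold and video_sim >= threshold
        result.append((start, end, not inactive))
    return result
```

Only the fingerprints from windows flagged as active would then be passed on to the fingerprint index.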
Abstract:
Systems and techniques for adding pitch shift resistance to an audio fingerprint are presented. In particular, an audio track for a media file is received. A first audio fingerprint for the audio track with a first pitch shift and an Nth audio fingerprint for the audio track with an Mth pitch shift are generated, where N is an integer greater than or equal to two and M is an integer greater than or equal to two. A combined audio fingerprint is generated from at least the first audio fingerprint and the Nth audio fingerprint.
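A toy sketch of the combination step follows. It assumes a heavily simplified model that the abstract does not describe: the audio track is reduced to a list of spectral peak frequencies in Hz, a fingerprint is the set of semitone indices of those peaks relative to 440 Hz, and the combined fingerprint is the union of the fingerprints computed at each pitch shift. The function names and the choice of shifts are hypothetical.

```python
import math


def pitch_shift(peaks_hz, semitones):
    """Shift every peak frequency by the given number of semitones."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor for f in peaks_hz]


def fingerprint(peaks_hz):
    """Toy fingerprint: semitone index of each peak relative to 440 Hz."""
    return frozenset(round(12 * math.log2(f / 440.0)) for f in peaks_hz)


def combined_fingerprint(peaks_hz, shifts=(0, -1, 1)):
    """Union of the fingerprints generated at each pitch shift."""
    combined = set()
    for semitones in shifts:
        combined |= fingerprint(pitch_shift(peaks_hz, semitones))
    return frozenset(combined)
```

With this model, a probe that has been pitched up by one semitone still produces a fingerprint contained in the reference's combined fingerprint, which is the resistance property the abstract is after.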
Abstract:
Systems and methods are disclosed for providing device-specific instructions in response to a perception of a media content segment. In one implementation, a processing device captures, at a user device, one or more media content segments. The processing device provides the one or more media content segments to a remote device. The processing device receives one or more instructions, each of the one or more instructions being associated with at least one of the one or more media content segments and corresponding to one or more operations. The processing device initiates execution of at least one of the one or more instructions.
Abstract:
Identifying near identical versions of a probe sample from reference files comprises identifying discriminative regions of reference matches by generating a similarity matrix. The discriminative time frames are communicated to a client device and additional data associated with the probe sample can be retrieved having features of the discriminative regions. Based on the additional data, a single match can be generated to identify the probe sample.
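The idea of locating discriminative regions through a similarity matrix can be illustrated with two reference matches. The sketch assumes each reference is a list of per-frame feature values, that similarity is exact equality, and that a time frame is discriminative when the two references disagree at that frame (a zero on the matrix diagonal). None of these choices come from the abstract, and the function names are hypothetical.

```python
def similarity_matrix(ref_a, ref_b):
    """sim[i][j] is 1.0 when frame i of ref_a equals frame j of ref_b."""
    return [[1.0 if x == y else 0.0 for y in ref_b] for x in ref_a]


def discriminative_frames(ref_a, ref_b):
    """Time frames where the two time-aligned references differ."""
    sim = similarity_matrix(ref_a, ref_b)
    n = min(len(ref_a), len(ref_b))
    return [t for t in range(n) if sim[t][t] == 0.0]
```

The resulting frame indices are what would be sent to the client device, so that additional probe data covering those frames can settle on a single match.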
Abstract:
Systems and methods are provided herein relating to speed resistant audio matching. Descriptors can be generated for a received audio signal and matched with reference descriptors. A set of hits for respective reference samples can be generated based on the matching. A histogram can then be generated that correlates probe sample hit time with reference sample hit time. In one implementation, a rolling window can be used in analyzing the histogram allowing for slight variances in the timing between probe sample hits and reference sample hits. In another implementation, the histogram generated can be based on an estimated time stretch of the probe sample. In yet another implementation, a set of histograms can be generated based on a minimum speed change, a maximum speed change, and a speed step. Histograms can be evaluated to determine a most likely matching histogram.
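The histogram-based matching can be sketched as follows. This is a simplified illustration that assumes hits are `(probe_time, reference_time)` pairs and that a speed change follows the model `reference_time = offset + speed * probe_time`. The bin width, window size, default speed range, and function names are hypothetical.

```python
from collections import Counter


def offset_histogram(hits, speed=1.0, bin_width=1.0):
    """Histogram of reference_time - speed * probe_time over all hits."""
    hist = Counter()
    for probe_t, ref_t in hits:
        hist[round((ref_t - speed * probe_t) / bin_width)] += 1
    return hist


def best_window_count(hist, window=3):
    """Largest hit count inside any run of `window` adjacent bins."""
    if not hist:
        return 0
    best = 0
    for start in range(min(hist), max(hist) + 1):
        best = max(best, sum(hist.get(start + k, 0) for k in range(window)))
    return best


def best_speed(hits, min_speed=0.9, max_speed=1.1, step=0.05):
    """Build one histogram per candidate speed and keep the sharpest one."""
    n_steps = int(round((max_speed - min_speed) / step))
    candidates = [min_speed + i * step for i in range(n_steps + 1)]
    return max(
        candidates,
        key=lambda s: best_window_count(offset_histogram(hits, s)),
    )
```

The rolling window tolerates small timing jitter between probe and reference hits, while the sweep from `min_speed` to `max_speed` in increments of `step` corresponds to the set of histograms mentioned in the last implementation.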
Abstract:
Systems and methods are disclosed for providing device-specific instructions in response to a perception of a media content segment. In one implementation, a processing device receives one or more media content segments from a user device. The processing device processes the one or more media content segments to determine one or more operations associated with the one or more media content segments. The processing device selects, based on one or more characteristics associated with the user device, at least one of the one or more operations. The processing device provides one or more instructions to perform the at least one of the one or more operations in relation to the user device.
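The server-side selection step can be sketched as a capability filter. The sketch assumes that each operation declares the device capabilities it requires and that a user device reports its capabilities as a set of strings. The data shapes and the `select_operations` and `instructions_for` names are hypothetical.

```python
def select_operations(operations, device_capabilities):
    """Keep the operations whose requirements the device satisfies."""
    caps = set(device_capabilities)
    return [op["name"] for op in operations if set(op["requires"]) <= caps]


def instructions_for(segment_id, operations_by_segment, device_capabilities):
    """Build one instruction per operation selected for this device."""
    operations = operations_by_segment.get(segment_id, [])
    return [
        {"segment": segment_id, "operation": name}
        for name in select_operations(operations, device_capabilities)
    ]
```

For example, a segment associated with both a "show_coupon" operation that needs a display and a "vibrate" operation that needs a haptic motor would yield only the first instruction for a device that reports a display but no haptics.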