Abstract:
An audio recognition service recognizes an audio sample across multiple content types. At least a partial set of results generated by the service is returned to a client while the audio sample is still being recorded and/or transmitted. The client additionally displays the results to the user in real time or near real time. The audio sample can be sent over a first HTTP connection and the results can be returned over a second HTTP connection. The audio recognition service further processes check-in selections received from the client for content items indicated by the results. Responsive to receiving the check-in selections, the service determines whether a user is eligible for a reward. If the user is eligible, the service provides the reward.
Abstract:
Systems and methods for media aggregation are disclosed herein. The system includes a media system that can transform media items into one aggregated media item. A synchronization component synchronizes media items with respect to time. The synchronized media items can be analyzed and transformed into an aggregated media item for storage and/or display. In one implementation, the aggregated media item is capable of being displayed in multiple ways to create an enhanced and customizable viewing and/or listening experience.
Abstract:
Systems and methods for audio matching are disclosed herein. In one embodiment, a system includes both interest point mixing and fingerprint mixing by using multiple interest point detection methods in parallel. Since multiple interest point detection methods are used in parallel, accuracy of audio matching is improved across a wide variety of audio signals. In addition, the scalability of the disclosed audio matching system is increased by matching the fingerprint of an audio sample with a fingerprint of a reference sample rather than matching an entire spectrogram. Accordingly, a more accurate and more general solution to audio matching can be accomplished.
Abstract:
Systems and methods are provided herein relating to real-time detection of inactive broadcasts during live stream ingestion. Both audio fingerprints and video fingerprints can be dynamically and continuously generated for a live stream ingestion. Sets of video fingerprints and sets of audio fingerprints can be continuously generated based on common successive overlapping time windows. A set of audio fingerprints and a set of video fingerprints can be associated with each time window. Video similarity scores and audio similarity scores can be generated for each time window to determine whether the stream is inactive or static during the time window. Only fingerprints relating to an active broadcast can be indexed in a fingerprint index.
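The windowing and scoring described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not the disclosed method: fingerprints are modeled as plain integers, the similarity score is the fraction of identical consecutive fingerprints, and the function names, window size, hop, and threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def overlapping_windows(n_frames: int, size: int, hop: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for successive overlapping windows."""
    return [(s, s + size) for s in range(0, n_frames - size + 1, hop)]


def similarity(fingerprints: Sequence[int]) -> float:
    """Fraction of consecutive fingerprints in a window that are identical."""
    if len(fingerprints) < 2:
        return 1.0
    same = sum(1 for a, b in zip(fingerprints, fingerprints[1:]) if a == b)
    return same / (len(fingerprints) - 1)


def active_windows(
    video_fps: Sequence[int],
    audio_fps: Sequence[int],
    size: int = 4,
    hop: int = 2,
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return the windows that would be indexed under the assumed rule."""
    # Both streams share the same windows, so use the shorter length.
    n_frames = min(len(video_fps), len(audio_fps))
    keep = []
    for start, end in overlapping_windows(n_frames, size, hop):
        video_score = similarity(video_fps[start:end])
        audio_score = similarity(audio_fps[start:end])
        # Assumed rule: a window is inactive only if BOTH streams look static.
        inactive = video_score >= threshold and audio_score >= threshold
        if not inactive:
            keep.append((start, end))
    return keep
```

In this sketch only the windows returned by `active_windows` would contribute fingerprints to an index; a real system would use its own similarity measure and decision rule.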
Abstract:
Systems and methods for intelligently pruning interest points are disclosed herein. The methods include generating a plurality of distorted audio samples and associated distorted interest points based upon a clean audio sample. Interest points that are common to sets of distorted interest points are retained, and interest points that are not robust to distortion are discarded. The disclosed systems and methods can therefore provide a scalable audio matching solution by eliminating interest points in reference sample fingerprints. The pruned set of interest points is robust to distortion, so the benefits of both scalability and accuracy can be had.
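The retain-what-survives idea in this abstract can be sketched as a set operation. This is only an illustration under stated assumptions: interest points are modeled as `(time_bin, frequency_bin)` tuples, and the `prune_interest_points` name and the `min_fraction` parameter are hypothetical rather than taken from the disclosure.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]  # assumed representation: (time_bin, frequency_bin)


def prune_interest_points(
    clean: Set[Point],
    distorted_sets: List[Set[Point]],
    min_fraction: float = 1.0,
) -> Set[Point]:
    """Keep clean-sample points that reappear in enough distorted versions.

    min_fraction=1.0 keeps only points found in every distorted set;
    lower values tolerate points that some distortions destroy.
    """
    if not distorted_sets:
        return set(clean)
    needed = min_fraction * len(distorted_sets)
    return {
        point
        for point in clean
        if sum(point in distorted for distorted in distorted_sets) >= needed
    }
```

With `min_fraction=1.0` the result is the intersection of the clean set with every distorted set, which is the strictest reading of "common to sets of distorted interest points".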
Abstract:
Systems and methods for music recognition and/or tag history synchronization are described. The system includes, for example, a first device, a second device and a server. The first device is configured to record music from a surrounding environment. The first device wirelessly sends the recorded music to the server for identification. The server is configured to identify the recorded music and to generate a tag corresponding to the identified music. A first tag history is updated to include the tag, which includes information corresponding to the identified music. The first device and the second device are registered with the server as part of a particular user account. The server is configured to synchronize a second tag history stored in the second device with the updated first tag history.
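The synchronization step in this abstract might look like the following sketch. The `Tag` fields and the `synchronize` function are hypothetical, and the merge policy (copy missing tags, order by time) is an assumption, since the abstract does not specify one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tag:
    track_id: str   # hypothetical identifier for the identified music
    title: str
    tagged_at: int  # assumed: seconds since some epoch


def synchronize(first_history: List[Tag], second_history: List[Tag]) -> List[Tag]:
    """Return the second history updated with tags it is missing.

    Tags already present in the second history are not duplicated, and
    the result is ordered by tagging time.
    """
    seen = set(second_history)
    merged = list(second_history) + [t for t in first_history if t not in seen]
    return sorted(merged, key=lambda t: t.tagged_at)
```

A server acting for one user account could call `synchronize` with the updated first history each time a new tag is generated, then push the result to the second device.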
Abstract:
Systems and methods for facilitating higher confidence matches are provided. In one embodiment, a system includes a memory that stores computer executable components, and a microprocessor that executes the computer executable components stored in the memory. The components can include a metadata matching component that determines a metadata match level between metadata of a plurality of files, and a thresholding component. The thresholding component may compare a metadata threshold with the metadata match level and output a signal configured to cause a decrease in a melody matching strength threshold from a first value to a second value based at least on the metadata match level being greater than the metadata threshold.
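The interaction between the two thresholds can be made concrete with a short sketch. The way the metadata match level is computed here (fraction of shared fields with equal values) and all numeric defaults are assumptions for illustration; the abstract only states that the melody threshold decreases when the match level exceeds the metadata threshold.

```python
def metadata_match_level(a: dict, b: dict) -> float:
    """Assumed measure: fraction of shared fields whose values agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    matches = sum(
        1
        for key in shared
        if str(a[key]).strip().lower() == str(b[key]).strip().lower()
    )
    return matches / len(shared)


def melody_threshold(
    match_level: float,
    metadata_threshold: float = 0.5,
    first_value: float = 0.8,
    second_value: float = 0.6,
) -> float:
    """Lower the melody matching threshold when metadata agrees strongly."""
    if match_level > metadata_threshold:
        return second_value  # metadata supports the match: relax the melody test
    return first_value       # no metadata support: keep the stricter value
```

For example, two files whose title and artist fields agree would be compared against the relaxed `second_value`, while files with conflicting metadata must still clear `first_value`.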
Abstract:
Systems and methods are provided herein relating to audio matching. Interest points that are onsets are generally very efficient in audio matching in that they are robust to multiple types of distortion. Prominent onsets can be detected within an audio signal excerpt as interest points and combined as a function of a set of interest points to form a descriptor. Descriptors associated with an audio signal excerpt that contain a set of prominent onsets as interest points can be used in matching the audio signal excerpt to an audio reference. Generating and using prominent onsets within descriptors improves the accuracy of an audio matching system.
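One simple way to picture prominent onsets and a descriptor built from them is sketched below. The detection rule (a local peak in the frame-to-frame energy rise that exceeds a multiple of the mean rise) and the descriptor (inter-onset intervals) are assumptions chosen for brevity, not the disclosed technique, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def onset_strength(energy: Sequence[float]) -> List[float]:
    """Positive frame-to-frame energy increase (decreases are clipped to 0)."""
    return [max(0.0, b - a) for a, b in zip(energy, energy[1:])]


def prominent_onsets(energy: Sequence[float], ratio: float = 2.0) -> List[int]:
    """Return frame indices where energy rises sharply.

    A frame counts as a prominent onset when its rise is a local peak and
    is at least `ratio` times the mean rise over the excerpt.
    """
    strength = onset_strength(energy)
    if not strength:
        return []
    mean = sum(strength) / len(strength)
    onsets = []
    for i, value in enumerate(strength):
        left = strength[i - 1] if i > 0 else 0.0
        right = strength[i + 1] if i + 1 < len(strength) else 0.0
        if value > 0 and value >= left and value > right and value >= ratio * mean:
            onsets.append(i + 1)  # strength[i] is the rise into frame i + 1
    return onsets


def descriptor(onsets: Sequence[int]) -> Tuple[int, ...]:
    """Assumed descriptor: the intervals between successive onsets."""
    return tuple(b - a for a, b in zip(onsets, onsets[1:]))
```

Because the descriptor uses intervals rather than absolute frame indices, it does not change when the excerpt starts at a different offset in the reference.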
Abstract:
Systems and methods are provided herein relating to audio matching. A compact digest can be generated based on sets of triples, where triples are groupings of three interest points that meet threshold criteria. The compact digest can be used in identifying a potential audio match. A full digest can then be used in verifying the potential match. By using a compact digest to perform audio matching, the audio matching system can be scaled to encompass millions or billions of reference audio samples while still using the full digest to maintain accuracy.
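The triple-based compact digest can be sketched as follows. The threshold criteria (maximum time gap and maximum frequency gap between neighboring points), the key layout, and the scoring function are all assumptions for illustration; the abstract does not define them. The brute-force enumeration is cubic in the number of points and is meant only to show the idea.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Point = Tuple[int, int]  # assumed representation: (time, frequency)
Triple = Tuple[Point, Point, Point]


def triples(points: Iterable[Point], max_dt: int = 10, max_df: int = 20) -> List[Triple]:
    """Group three time-ordered points whose neighbors are close enough."""
    ordered = sorted(points)
    found = []
    for a, b, c in combinations(ordered, 3):
        close_in_time = 0 < b[0] - a[0] <= max_dt and 0 < c[0] - b[0] <= max_dt
        close_in_freq = abs(b[1] - a[1]) <= max_df and abs(c[1] - b[1]) <= max_df
        if close_in_time and close_in_freq:
            found.append((a, b, c))
    return found


def compact_digest(points: Iterable[Point]) -> Set[Tuple[int, ...]]:
    """Reduce each triple to a key that ignores absolute time."""
    return {
        (a[1], b[1] - a[1], c[1] - b[1], b[0] - a[0], c[0] - b[0])
        for a, b, c in triples(points)
    }


def candidate_score(sample: Set[Tuple[int, ...]], reference: Set[Tuple[int, ...]]) -> float:
    """Fraction of sample keys that also occur in the reference digest."""
    if not sample:
        return 0.0
    return len(sample & reference) / len(sample)
```

A reference whose `candidate_score` is high would then be passed to a slower verification step against the full digest, which this sketch does not attempt to model.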
Abstract:
Systems and methods for noise based interest point density pruning are disclosed herein. The methods include determining an amount of noise in an audio sample and adjusting the number of interest points within an audio sample fingerprint based on the amount of noise. Samples containing high amounts of noise correspondingly generate fingerprints with more interest points. The disclosed systems and methods allow reference fingerprints to be reduced in size while increasing the size of sample fingerprints. The benefits in scalability do not compromise the accuracy of an audio matching system using noise based interest point density pruning.
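The relationship between noise and fingerprint density can be sketched as a budget that grows with the noise estimate. The linear mapping, the normalized noise level in [0, 1], the `(time, frequency, strength)` point layout, and every name and default below are assumptions for illustration; how noise is actually measured is outside this sketch.

```python
from typing import List, Sequence, Tuple

ScoredPoint = Tuple[int, int, float]  # assumed: (time, frequency, strength)


def points_to_keep(noise_level: float, min_points: int = 20, max_points: int = 100) -> int:
    """Map a noise level in [0, 1] to an interest point budget.

    Noisier samples get a larger budget; out-of-range levels are clamped.
    """
    level = min(max(noise_level, 0.0), 1.0)
    return round(min_points + level * (max_points - min_points))


def select_points(
    points: Sequence[ScoredPoint],
    noise_level: float,
    min_points: int = 20,
    max_points: int = 100,
) -> List[ScoredPoint]:
    """Keep the strongest points, up to the noise-dependent budget."""
    budget = points_to_keep(noise_level, min_points, max_points)
    ranked = sorted(points, key=lambda p: p[2], reverse=True)
    return ranked[:budget]
```

Under these assumptions a clean sample keeps only its strongest few points, while a noisy sample keeps more of them to raise the chance that some survive the noise and match the smaller reference fingerprint.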