Abstract:
Derivation of a fingerprint includes generating feature matrices based on one or more training images, generating projection matrices based on the feature matrices in a training process, and deriving a fingerprint for one or more images by, at least in part, projecting a feature matrix based on the one or more images onto the projection matrices generated in the training process.
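The train-then-project flow can be illustrated with a small pure-Python sketch. The abstract does not say how the projection matrices are obtained, so the training step below is a deliberately simple stand-in (each projection matrix is a mean-centered training feature matrix), and the sign-based binarization and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mean_matrix(mats: List[Matrix]) -> Matrix:
    """Element-wise mean of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / len(mats) for c in range(cols)]
            for r in range(rows)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inner(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product: sum of element-wise products."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def train(features: List[Matrix]) -> Tuple[Matrix, List[Matrix]]:
    # Assumption: stand-in training that uses centered training matrices
    # as projection matrices. A real system would likely learn them
    # (for example with a PCA-like procedure).
    mean = mean_matrix(features)
    return mean, [subtract(f, mean) for f in features]


def fingerprint(feature: Matrix, mean: Matrix,
                projections: List[Matrix]) -> List[int]:
    # One bit per projection matrix: the sign of the projection.
    centered = subtract(feature, mean)
    return [1 if inner(centered, p) > 0 else 0 for p in projections]


training = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]
mean, projections = train(training)
query = [[0.9, 0.1], [0.0, 0.0]]  # slightly perturbed copy of training[0]
bits = fingerprint(query, mean, projections)
```

In this toy setup the perturbed query yields the same bits as the training image it was derived from, which is the robustness property a fingerprint is meant to have.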
Abstract:
Techniques for re-associating dynamic metadata with media data are provided. A media processing system creates, with a first media processing stage, binding information comprising dynamic metadata and a time relationship between the dynamic metadata and media data. The binding information may be derived from the media data. While the first media processing stage delivers the media data to a second media processing stage in a first data path, the first media processing stage passes the binding information to the second media processing stage in a second data path. The media processing system re-associates, with the second media processing stage, the dynamic metadata and the media data using the binding information.
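A minimal sketch of the two-path idea follows. It assumes the binding information keys each metadata item to a digest derived from the media frame plus a timestamp. The `Binding` class, the SHA-256 digest, and the function names are hypothetical, and a real system would probably use a fingerprint that survives processing rather than an exact hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Binding:
    frame_digest: str   # derived from the media data itself
    timestamp: float    # time relationship between metadata and media
    metadata: dict      # the dynamic metadata


def digest(frame: bytes) -> str:
    # Assumption: an exact hash stands in for a robust media fingerprint.
    return hashlib.sha256(frame).hexdigest()


def first_stage(frames: List[bytes], metadata_by_index: Dict[int, dict],
                frame_period: float) -> Tuple[List[bytes], List[Binding]]:
    """Return the media (first data path) and the bindings (second path)."""
    bindings = [Binding(digest(frames[i]), i * frame_period, md)
                for i, md in sorted(metadata_by_index.items())]
    return frames, bindings


def second_stage(frames: List[bytes],
                 bindings: List[Binding]) -> List[Tuple[bytes, Optional[dict]]]:
    """Re-associate metadata with frames using only the binding information."""
    lookup = {b.frame_digest: b for b in bindings}
    result = []
    for frame in frames:
        binding = lookup.get(digest(frame))
        result.append((frame, binding.metadata if binding else None))
    return result


media, side_channel = first_stage([b"f0", b"f1", b"f2"],
                                  {1: {"loudness": -23}}, frame_period=0.04)
reassociated = second_stage(media, side_channel)
```

Because the second stage looks frames up by digest, the two paths do not need to stay in lockstep; the timestamp is carried for stages that prefer time-based alignment.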
Abstract:
Techniques for ranking representative segments in media data are provided. Media features of many different types may be extracted from the media data. A plurality of ranking scores may be assigned to a plurality of candidate representative segments. Each individual candidate representative segment in the plurality of candidate representative segments comprises at least one scene in one or more statistical patterns in media features of the media data based on one or more types of features extractable from the media data. Each individual ranking score in the plurality of ranking scores may be assigned to an individual candidate representative segment in the plurality of candidate representative segments. A representative segment to be played to an end user may be selected from the candidate representative segments, based on the plurality of ranking scores.
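The selection step might look like the sketch below. The abstract does not say how the ranking scores are combined, so the weighted sum, the score names, and the `Candidate` class are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    start: float              # segment start, in seconds
    end: float                # segment end, in seconds
    scores: Dict[str, float]  # one ranking score per feature type


def combined_score(candidate: Candidate, weights: Dict[str, float]) -> float:
    # Assumption: a weighted sum; score types without a weight count as zero.
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.scores.items())


def select_representative(candidates: List[Candidate],
                          weights: Dict[str, float]) -> Candidate:
    """Pick the candidate segment with the highest combined ranking score."""
    return max(candidates, key=lambda c: combined_score(c, weights))


candidates = [
    Candidate(30.0, 55.0, {"repetition": 0.9, "energy": 0.2}),
    Candidate(80.0, 105.0, {"repetition": 0.5, "energy": 0.9}),
]
weights = {"repetition": 0.7, "energy": 0.3}  # hypothetical weights
best = select_representative(candidates, weights)
```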
Abstract:
Signatures that can be used to identify video and audio content are generated from the content by generating measures of dissimilarity between features of corresponding groups of pixels in frames of video content and by generating low-resolution time-frequency representations of audio segments. The signatures are generated by applying a hash function to intermediate values derived from the measures of dissimilarity and to the low-resolution time-frequency representations. The generated signatures may be used in a variety of applications such as restoring synchronization between video and audio content streams and identifying copies of original video and audio content. The generated signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
Abstract:
Metadata comprising a set of gain values for creating a dominance effect is automatically generated. Automatically generating the metadata includes receiving multiple audio streams and a dominance criterion for at least one of the audio streams. A set of gains is computed for one or more audio streams based on the dominance criterion for the at least one audio stream and metadata is generated with the set of gains.
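One way to read the gain computation is sketched below, assuming the dominance criterion is "the dominant stream must exceed every other stream by at least `margin_db` decibels of RMS level". That criterion, the RMS measure, and the metadata layout are assumptions, not details given in the abstract.

```python
import math
from typing import Dict, List


def rms_db(samples: List[float]) -> float:
    """RMS level in dB relative to full scale, floored to avoid log(0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))


def dominance_metadata(streams: Dict[str, List[float]], dominant: str,
                       margin_db: float) -> dict:
    levels = {name: rms_db(samples) for name, samples in streams.items()}
    ceiling = levels[dominant] - margin_db  # loudest allowed level for others
    gains = {}
    for name, level in levels.items():
        if name == dominant:
            gains[name] = 0.0
        else:
            # Attenuate only when the stream violates the criterion.
            gains[name] = min(0.0, ceiling - level)
    return {"dominant": dominant, "margin_db": margin_db, "gains_db": gains}


streams = {
    "dialog": [0.5, -0.5, 0.5, -0.5],
    "music": [0.5, -0.5, 0.5, -0.5],
    "effects": [0.05, -0.05, 0.05, -0.05],
}
metadata = dominance_metadata(streams, dominant="dialog", margin_db=12.0)
```

Here the music, which is as loud as the dialog, is attenuated by 12 dB, while the effects track is already 20 dB quieter and is left untouched.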
Abstract:
Robust media fingerprints are derived from a portion of audio content. A portion of content in an audio signal is categorized. The audio content is characterized based, at least in part, on one or more of its features. The features may include a component that relates to one of several sound categories, e.g., speech and/or noise, which may be mixed with the audio signal. Upon categorizing the audio content as free of the speech or noise related components, the audio signal component is processed. Upon categorizing the audio content as including the speech related component and/or the noise related components, the speech or noise related components are separated from the audio signal. The audio signal is processed independent of the speech related component and/or the noise related component. Processing the audio signal includes computing the audio fingerprint, which reliably corresponds to the audio signal.
Abstract:
A signature that can be used to identify video content in a series of video frames is generated by first calculating the average and variance of picture elements in a low-resolution composite image that represents a temporal and spatial composite of the video content in the series of frames. The signature is generated by applying a hash function to values derived from the average and variance composite representations. The video content of a signal can be represented by a set of signatures that are generated for multiple series of frames within the signal. A set of signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
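The composite-then-hash idea can be sketched as follows. The block size, the use of one mean and one variance per spatial block across all frames in the series, and the median-threshold "hash" are assumptions made for illustration; the abstract does not specify the hash function.

```python
from statistics import mean, median, pvariance
from typing import List, Tuple

Frame = List[List[int]]


def composites(frames: List[Frame], size: int) -> Tuple[list, list]:
    """Per-block average and variance, pooled over space and time."""
    rows, cols = len(frames[0]), len(frames[0][0])
    averages, variances = [], []
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            values = [frame[r][c] for frame in frames
                      for r in range(r0, r0 + size)
                      for c in range(c0, c0 + size)]
            averages.append(mean(values))
            variances.append(pvariance(values))
    return averages, variances


def to_bits(values: list) -> List[int]:
    # Assumption: a simple hash that compares each value with the median.
    threshold = median(values)
    return [1 if v > threshold else 0 for v in values]


def signature(frames: List[Frame], size: int = 2) -> List[int]:
    averages, variances = composites(frames, size)
    return to_bits(averages) + to_bits(variances)


frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [50, 60, 90, 90],
    [50, 60, 90, 90],
]
series = [frame, frame]
brighter = [[[p + 5 for p in row] for row in f] for f in series]
sig = signature(series)
```

Thresholding against the median makes this toy signature unchanged under a uniform brightness offset, a small instance of the robustness to modifications that the abstract describes.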
Abstract:
Techniques for adaptive processing of media data based on separate data specifying a state of the media data are provided. A device in a media processing chain may determine whether a type of media processing has already been performed on an input version of media data. If so, the device may adapt its processing of the media data to disable performing the type of media processing. If not, the device performs the type of media processing. The device may create a state of the media data specifying the type of media processing. The device may communicate the state of the media data and an output version of the media data to a recipient device in the media processing chain, for the purpose of supporting the recipient device's adaptive processing of the media data.
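The check-then-adapt behavior of a device in the chain can be sketched as below. The `MediaState` class, the processing-type label, and the peak normalization used as the example processing are hypothetical; the abstract does not prescribe how the state is represented.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class MediaState:
    """Separate data recording which processing types were already applied."""
    performed: FrozenSet[str] = field(default_factory=frozenset)


def normalize_peak(samples: List[float]) -> List[float]:
    # Hypothetical example of a "type of media processing".
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]


def process(samples: List[float], state: MediaState,
            kind: str = "peak_normalization") -> Tuple[List[float], MediaState]:
    if kind in state.performed:
        # Already done upstream: skip the processing and pass data through.
        return samples, state
    output = normalize_peak(samples)
    # Record the processing so downstream devices can adapt in turn.
    return output, MediaState(state.performed | {kind})


first_out, first_state = process([0.5, -0.25], MediaState())
second_out, second_state = process(first_out, first_state)
```

The second call returns its input unchanged because the state handed over by the first device already lists the processing type.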
Abstract:
In a class of embodiments, an audio encoding system (typically, a perceptual encoding system) is configured to generate a single (“unified”) bitstream that is compatible with (i.e., decodable by) a first decoder configured to decode audio data encoded in accordance with a first encoding protocol (e.g., the multichannel Dolby Digital Plus, or DD+, protocol) and a second decoder configured to decode audio data encoded in accordance with a second encoding protocol (e.g., the stereo AAC, HE AAC v1, or HE AAC v2 protocol). The unified bitstream can include both encoded data (e.g., bursts of data) decodable by the first decoder (and ignored by the second decoder) and encoded data (e.g., other bursts of data) decodable by the second decoder (and ignored by the first decoder). In effect, the second encoding format is hidden within the unified bitstream when the bitstream is decoded by the first decoder, and the first encoding format is hidden within the unified bitstream when the bitstream is decoded by the second decoder. The format of the unified bitstream generated in accordance with the invention may eliminate the need for transcoding elements throughout an entire media chain and/or ecosystem. Other aspects of the invention are an encoding method performed by any embodiment of the inventive encoder, a decoding method performed by any embodiment of the inventive decoder, and a computer readable medium (e.g., disc) which stores code for implementing any embodiment of the inventive method.
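The "each decoder ignores the other's bursts" behavior can be illustrated with a toy container. The tag values and the tag-plus-length framing below are purely hypothetical and do not reflect the real DD+ or AAC bitstream syntax; the sketch only shows how two payload types can share one stream.

```python
import struct
from itertools import zip_longest
from typing import List

TAG_FIRST = 0x01   # hypothetical tag for first-protocol bursts
TAG_SECOND = 0x02  # hypothetical tag for second-protocol bursts
HEADER = ">BH"     # 1-byte tag followed by a 2-byte big-endian length


def pack_unified(first_bursts: List[bytes],
                 second_bursts: List[bytes]) -> bytes:
    """Interleave bursts of both formats into a single byte stream."""
    out = bytearray()
    for a, b in zip_longest(first_bursts, second_bursts):
        for tag, payload in ((TAG_FIRST, a), (TAG_SECOND, b)):
            if payload is not None:
                out += struct.pack(HEADER, tag, len(payload)) + payload
    return bytes(out)


def extract(bitstream: bytes, wanted_tag: int) -> List[bytes]:
    """Return the bursts a given decoder understands, skipping the rest."""
    payloads, pos = [], 0
    header_size = struct.calcsize(HEADER)
    while pos < len(bitstream):
        tag, length = struct.unpack_from(HEADER, bitstream, pos)
        pos += header_size
        if tag == wanted_tag:
            payloads.append(bitstream[pos:pos + length])
        pos += length  # bursts with any other tag are ignored
    return payloads


unified = pack_unified([b"ddp-0", b"ddp-1"], [b"aac-0"])
for_first_decoder = extract(unified, TAG_FIRST)
for_second_decoder = extract(unified, TAG_SECOND)
```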