Abstract:
A portion of media content is accessed. Components are sampled from a first spatial region and from each subsequent spatial region of the media content. Each spatial region has an unsegmented area. Each subsequent spatial region either includes the regions within its area as elements thereof, or the spatial regions partially overlap. The regions may overlap independent of any hierarchical relationship between them. A media fingerprint is derived from the components of each of the spatial regions; the fingerprint reliably corresponds to the media content portion, e.g., under geometric attacks such as rotation.
Abstract:
Techniques for adaptive processing of media data based on separate data specifying a state of the media data are provided. A device in a media processing chain may determine whether a type of media processing has already been performed on an input version of media data. If so, the device may adapt its processing of the media data to disable performing the type of media processing. If not, the device performs the type of media processing. The device may create a state of the media data specifying the type of media processing. The device may communicate the state of the media data and an output version of the media data to a recipient device in the media processing chain, for the purpose of supporting the recipient device's adaptive processing of the media data.
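The adaptive behavior described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `MediaPacket` container, the `peak_normalize` processing type, and the choice to represent the state as a set of processing-type names are not taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class MediaPacket:
    """Hypothetical container: media samples plus separate state data."""
    samples: list
    state: set = field(default_factory=set)  # names of processing already applied


def peak_normalize(samples, target_peak=1.0):
    """Illustrative processing type: scale samples so the peak equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target_peak / peak for s in samples]


def process_stage(packet, kind="peak_normalize"):
    """One device in the chain: skip the processing if the state says it was done."""
    if kind in packet.state:
        # Already performed upstream: pass the media through unchanged.
        return MediaPacket(list(packet.samples), set(packet.state))
    # Not yet performed: do it, then record it in the outgoing state.
    return MediaPacket(peak_normalize(packet.samples), packet.state | {kind})
```

Passing the output of `process_stage` to a second call of `process_stage` leaves the samples unchanged, because the state delivered with the media tells the recipient that the normalization has already been applied.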
Abstract:
A signature that can be used to identify video content in a series of video frames is generated by first calculating the average and variance of picture elements in a low-resolution composite image that represents a temporal and spatial composite of the video content in the series of frames. The signature is generated by applying a hash function to values derived from the average and variance composite representations. The video content of a signal can be represented by a set of signatures that are generated for multiple series of frames within the signal. A set of signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
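A rough sketch of this kind of signature follows. It assumes frames are given as nested lists of integer pixel values, and it substitutes a simple stand-in for the hash function (thresholding each composite value against the median); the abstract does not specify the hash, the block size, or the function names used here.

```python
from statistics import median


def composite(frames, block):
    """Mean and variance of each block, pooled over space and over all frames."""
    height, width = len(frames[0]), len(frames[0][0])
    means, variances = [], []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            vals = [f[y][x] for f in frames
                    for y in range(by, min(by + block, height))
                    for x in range(bx, min(bx + block, width))]
            m = sum(vals) / len(vals)
            means.append(m)
            variances.append(sum((v - m) ** 2 for v in vals) / len(vals))
    return means, variances


def to_bits(values):
    """Stand-in hash (an assumption): one bit per value, set if above the median."""
    t = median(values)
    return [1 if v > t else 0 for v in values]


def signature(frames, block=2):
    """Concatenate the bits derived from the mean and variance composites."""
    means, variances = composite(frames, block)
    return to_bits(means) + to_bits(variances)
```

With this stand-in hash, a uniform brightness offset applied to every pixel shifts all block means and their median by the same amount and leaves the variances untouched, so the signature bits do not change. That is one small example of the robustness to modification that the abstract claims.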
Abstract:
Features are extracted from video and audio content that have a known temporal relationship with one another. The extracted features are used to generate video and audio signatures, which are assembled with an indication of the temporal relationship into a synchronization signature construct. The construct may be used to calculate synchronization errors between video and audio content received at a remote destination. Measures of confidence are generated at the remote destination to optimize processing and to provide an indication of reliability of the calculated synchronization error.
Abstract:
Signatures that can be used to identify video and audio content are generated from the content by generating measures of dissimilarity between features of corresponding groups of pixels in frames of video content and by generating low-resolution time-frequency representations of audio segments. The signatures are generated by applying a hash function to intermediate values derived from the measures of dissimilarity and to the low-resolution time-frequency representations. The generated signatures may be used in a variety of applications such as restoring synchronization between video and audio content streams and identifying copies of original video and audio content. The generated signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
Abstract:
A value is computed for a feature in an instance of query content and compared to a threshold value. Based on the comparison, first and second bits in a hash value, which is derived from the query content feature, are determined. Conditional probability values are computed for the likelihood that quantized values of the first and the second bits equal corresponding quantized bit values of a target or reference feature value. The conditional probabilities are compared and a relative strength determined for the first and second bits, which directly corresponds to the conditional probability. The bit with the lowest bit strength is selected as the weakbit. The value of the weakbit is toggled to generate a variation of the query hash value. The query may be extended using the query hash value variation.
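The weak-bit idea can be sketched as follows. As a loudly labeled simplification, the distance between a feature value and its threshold is used as a proxy for the conditional probability that the bit matches the reference; the abstract computes actual conditional probabilities, and the function names here are hypothetical.

```python
def hash_bits(features, thresholds):
    """One bit per feature: 1 if the feature exceeds its threshold."""
    return [1 if f > t else 0 for f, t in zip(features, thresholds)]


def bit_strengths(features, thresholds):
    """Proxy for bit reliability (an assumption): distance from the threshold."""
    return [abs(f - t) for f, t in zip(features, thresholds)]


def weak_bit_variation(features, thresholds):
    """Return the query hash, a variant with the weakest bit toggled, and its index."""
    bits = hash_bits(features, thresholds)
    strengths = bit_strengths(features, thresholds)
    weak = min(range(len(bits)), key=strengths.__getitem__)
    variant = list(bits)
    variant[weak] ^= 1  # toggle the least reliable bit
    return bits, variant, weak
```

A query can then be extended by looking up both `bits` and `variant` in the reference index, which lets a match survive the bit most likely to have flipped under distortion.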
Abstract:
Techniques for re-associating dynamic metadata with media data are provided. A media processing system creates, with a first media processing stage, binding information comprising dynamic metadata and a time relationship between the dynamic metadata and media data. The binding information may be derived from the media data. While the first media processing stage delivers the media data to a second media processing stage in a first data path, the first media processing stage passes the binding information to the second media processing stage in a second data path. The media processing system re-associates, with the second media processing stage, the dynamic metadata and the media data using the binding information.
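A minimal sketch of the two-path flow is shown below. It assumes the binding information keys each piece of dynamic metadata to a value derived from the media data itself; an exact SHA-256 digest of the frame bytes is used purely for illustration (a practical system would more likely use a fingerprint that tolerates processing), and all function names are hypothetical.

```python
import hashlib


def frame_key(frame: bytes) -> str:
    """Key derived from the media data (illustrative: exact SHA-256 digest)."""
    return hashlib.sha256(frame).hexdigest()


def create_binding(frames, metadata):
    """First stage: bind each metadata item to its frame's key and position."""
    return [{"key": frame_key(f), "index": i, "metadata": m}
            for i, (f, m) in enumerate(zip(frames, metadata))]


def reassociate(received_frames, binding):
    """Second stage: re-attach metadata to frames that arrived on the media path."""
    lookup = {entry["key"]: entry["metadata"] for entry in binding}
    return [(f, lookup.get(frame_key(f))) for f in received_frames]
```

Because the key is computed from the media rather than from its position in the stream, the second stage can still pair metadata with the correct frames when the media path drops or delays some of them.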
Abstract:
Content identification and quality monitoring are provided. The method involves obtaining a first fingerprint derived from a first media content, processing the first media content to generate a second media content, obtaining a second fingerprint derived from the second media content, and comparing the first fingerprint and the second fingerprint to determine one or more of: a similarity between the first fingerprint and the second fingerprint that indicates that the second media content is generated from the first media content or a difference between the first fingerprint and the second fingerprint to identify a quality degradation between the first media content and the second media content.
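The comparison step might look like the sketch below, assuming the fingerprints are equal-length bit sequences and that the fraction of differing bits serves as both the similarity test and the degradation measure. The 0.25 threshold and the function names are illustrative assumptions, not values from the abstract.

```python
def hamming_fraction(fp_a, fp_b):
    """Fraction of bit positions at which two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def compare(fp_first, fp_second, match_threshold=0.25):
    """Report whether the second content appears derived from the first, and
    how far the fingerprints have drifted apart (a degradation indicator)."""
    diff = hamming_fraction(fp_first, fp_second)
    return {"same_source": diff <= match_threshold, "degradation": diff}
```

A small difference suggests the second content was generated from the first with little quality loss, while a difference near 0.5 or above suggests unrelated content or severe degradation.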
Abstract:
Quantized energy values are accessed to initially represent a temporally related group of content elements in a media sequence. The values are accessed over a matrix of regions into which the initial representation is partitioned. The initial representation may be downsampled and/or cropped from the content. A basis vector set is estimated in a dimensional space from the values. The initial representation is transformed into a subsequent representation, which is in another dimensional space. The subsequent representation projects the initial representation, based on the basis vectors. The subsequent representation reliably corresponds to the media content portion over a change in a geometric orientation thereof. The process is repeated for other media content portions of the group, and the subsequent representations of the first and other portions are averaged or transformed over time. The averaged/transformed values reliably correspond to the content portion over speed changes. The initial representation may include spatial or transform related information.
Abstract:
The invention relates to the coding of audio signals that may include both speech-like and non-speech-like signal components. It describes methods and apparatus for code excited linear prediction (CELP) audio encoding and decoding that employ linear predictive coding (LPC) synthesis filters controlled by LPC parameters, a plurality of codebooks each having codevectors, at least one codebook providing an excitation more appropriate for non-speech-like signals and at least one codebook providing an excitation more appropriate for speech-like signals, and a plurality of gain factors, each associated with a codebook. The encoding methods and apparatus select from the codebooks codevectors and/or associated gain factors by minimizing a measure of the difference between the audio signal and a reconstruction of the audio signal derived from the codebook excitations. The decoding methods and apparatus generate a reconstructed output signal from the LPC parameters, codevectors, and gain factors.
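The encoder's selection step can be sketched as a simplified analysis-by-synthesis search. As a stated simplification, this sketch picks a single codevector and gain across all codebooks rather than combining one excitation per codebook, and it omits perceptual weighting. The codebook names and contents are hypothetical.

```python
def lpc_synthesize(excitation, lpc):
    """All-pole LPC synthesis: y[n] = e[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search(target, lpc, codebooks):
    """Return (error, codebook name, index, gain) minimizing squared error."""
    best = None
    for name, vectors in codebooks.items():
        for idx, vec in enumerate(vectors):
            synth = lpc_synthesize(vec, lpc)
            energy = sum(s * s for s in synth)
            if energy == 0:
                continue
            # Least-squares optimal gain for this codevector.
            gain = sum(t * s for t, s in zip(target, synth)) / energy
            err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
            if best is None or err < best[0]:
                best = (err, name, idx, gain)
    return best


def decode(lpc, codebooks, name, idx, gain):
    """Reconstruct the signal from LPC parameters, a codevector, and a gain."""
    return [gain * s for s in lpc_synthesize(codebooks[name][idx], lpc)]
```

The decoder needs only the LPC parameters, the selected codebook and index, and the gain, which mirrors the division of labor between encoder and decoder described in the abstract.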