Abstract:
Techniques for re-associating dynamic metadata with media data are provided. A media processing system creates, with a first media processing stage, binding information comprising dynamic metadata and a time relationship between the dynamic metadata and media data. The binding information may be derived from the media data. While the first media processing stage delivers the media data to a second media processing stage in a first data path, the first media processing stage passes the binding information to the second media processing stage in a second data path. The media processing system re-associates, with the second media processing stage, the dynamic metadata and the media data using the binding information.
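A minimal sketch of the two-path idea, assuming a shared timestamp serves as the time relationship and a per-frame peak value stands in for the dynamic metadata. The names `Binding`, `first_stage`, and `second_stage` are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    timestamp: float  # time relationship to the media data (assumed key)
    metadata: dict    # dynamic metadata derived from the media data


def first_stage(frames):
    """Split (timestamp, samples) frames into a media path and a binding path."""
    media_path = list(frames)
    binding_path = [
        Binding(ts, {"peak": max(abs(s) for s in samples)})
        for ts, samples in frames
    ]
    return media_path, binding_path


def second_stage(media_path, binding_path):
    """Re-associate metadata with media frames by matching timestamps."""
    by_time = {b.timestamp: b.metadata for b in binding_path}
    return [(ts, samples, by_time.get(ts)) for ts, samples in media_path]


frames = [(0.0, [0.1, -0.4]), (0.02, [0.9, 0.2])]
media, binding = first_stage(frames)
# The binding path may arrive in a different order than the media path.
rejoined = second_stage(media, list(reversed(binding)))
```

Because the lookup is keyed on the timestamp, the second stage does not depend on the two paths staying in step with each other.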
Abstract:
Techniques for ranking representative segments in media data are provided. Media features of many different types may be extracted from the media data. A plurality of ranking scores may be assigned to a plurality of candidate representative segments. Each individual candidate representative segment in the plurality of candidate representative segments comprises at least one scene in one or more statistical patterns in media features of the media data based on one or more types of features extractable from the media data. Each individual ranking score in the plurality of ranking scores may be assigned to an individual candidate representative segment in the plurality of candidate representative segments. A representative segment to be played to an end user may be selected from the candidate representative segments, based on the plurality of ranking scores.
Abstract:
Signatures that can be used to identify video and audio content are generated from the content by generating measures of dissimilarity between features of corresponding groups of pixels in frames of video content and by generating low-resolution time-frequency representations of audio segments. The signatures are generated by applying a hash function to intermediate values derived from the measures of dissimilarity and to the low-resolution time-frequency representations. The generated signatures may be used in a variety of applications such as restoring synchronization between video and audio content streams and identifying copies of original video and audio content. The generated signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
Abstract:
Metadata comprising a set of gain values for creating a dominance effect is automatically generated. Automatically generating the metadata includes receiving multiple audio streams and a dominance criterion for at least one of the audio streams. A set of gains is computed for one or more audio streams based on the dominance criterion for the at least one audio stream and metadata is generated with the set of gains.
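One possible reading of the gain computation, sketched under a stated assumption: the dominance criterion is taken to be a minimum RMS margin in dB by which the dominant stream must exceed every other stream. The abstract does not define the criterion, and `dominance_gains` and `margin_db` are hypothetical names.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominance_gains(streams, dominant, margin_db):
    """Return metadata holding one linear gain per stream.

    Streams other than `dominant` are attenuated only as far as needed
    to sit at least `margin_db` below the dominant stream's RMS level.
    """
    target = rms(streams[dominant]) / (10 ** (margin_db / 20))
    gains = {}
    for name, samples in streams.items():
        level = rms(samples)
        if name == dominant or level <= target:
            gains[name] = 1.0  # already satisfies the criterion
        else:
            gains[name] = target / level
    return {"dominant": dominant, "margin_db": margin_db, "gains": gains}


metadata = dominance_gains(
    {"dialog": [0.5, -0.5], "music": [1.0, -1.0]}, "dialog", 6.0
)
```

The gains are returned as metadata rather than applied to the samples, which matches the abstract's framing of generating metadata for later use.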
Abstract:
Robust media fingerprints are derived from a portion of audio content. A portion of content in an audio signal is categorized. The audio content is characterized based, at least in part, on one or more of its features. The features may include a component that relates to one of several sound categories, e.g., speech and/or noise, which may be mixed with the audio signal. Upon categorizing the audio content as free of the speech or noise related components, the audio signal component is processed. Upon categorizing the audio content as including the speech related component and/or the noise related components, the speech or noise related components are separated from the audio signal. The audio signal is processed independent of the speech related component and/or the noise related component. Processing the audio signal includes computing the audio fingerprint, which reliably corresponds to the audio signal.
Abstract:
A signature that can be used to identify video content in a series of video frames is generated by first calculating the average and variance of picture elements in a low-resolution composite image that represents a temporal and spatial composite of the video content in the series of frames. The signature is generated by applying a hash function to values derived from the average and variance composite representations. The video content of a signal can be represented by a set of signatures that are generated for multiple series of frames within the signal. A set of signatures can provide reliable identifications despite intentional and unintentional modifications to the content.
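The following sketch illustrates the general shape of such a signature. It assumes grayscale frames given as nested lists, a coarse `grid` of cells as the low-resolution composite, and a median threshold as a stand-in for the hash function. None of these choices comes from the abstract, and `composite` and `signature` are hypothetical names.

```python
from statistics import median


def composite(frames, grid=2):
    """Per-cell mean and variance over all pixels of all frames."""
    h, w = len(frames[0]), len(frames[0][0])
    means, variances = [], []
    for gy in range(grid):
        for gx in range(grid):
            vals = [
                f[y][x]
                for f in frames
                for y in range(gy * h // grid, (gy + 1) * h // grid)
                for x in range(gx * w // grid, (gx + 1) * w // grid)
            ]
            m = sum(vals) / len(vals)
            means.append(m)
            variances.append(sum((v - m) ** 2 for v in vals) / len(vals))
    return means, variances


def signature(frames, grid=2):
    """Bit string: one bit per cell for the means, then one per cell for the variances."""
    bits = ""
    for values in composite(frames, grid):
        mid = median(values)
        bits += "".join("1" if v > mid else "0" for v in values)
    return bits


f1 = [[10, 10, 50, 50], [10, 10, 50, 50], [90, 90, 200, 200], [90, 90, 200, 200]]
f2 = [[12, 12, 60, 60], [12, 12, 60, 60], [70, 70, 100, 100], [70, 70, 100, 100]]
sig = signature([f1, f2])
```

Thresholding against the median keeps the bits unchanged under a uniform brightness or contrast change, which is one simple way to tolerate the kind of modification the abstract mentions.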
Abstract:
A method detects events in multimedia. Features are extracted from the multimedia. The features are sampled using a sliding window to obtain samples. A context model is constructed for each sample. An affinity matrix is determined from the models and a commutative distance metric between each pair of context models. A second generalized eigenvector is determined for the affinity matrix, and the samples are then clustered into events according to the second generalized eigenvector.
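The clustering step can be sketched as follows, reading the eigenvector in the normalized-cut sense: the second generalized eigenvector of (D - W) y = λ D y, where W is the affinity matrix and D its diagonal of row sums. That reading is an assumption, as are the Gaussian affinity, the scalar stand-ins for context models, and the power-iteration solver. All function names are hypothetical.

```python
import math


def affinity_matrix(models, distance, sigma=1.0):
    """Gaussian affinity from a commutative distance between models."""
    n = len(models)
    return [
        [math.exp(-distance(models[i], models[j]) ** 2 / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]


def second_generalized_eigenvector(W, iters=500):
    """Power iteration on the lazy random walk 0.5 * (I + D^-1 W)."""
    n = len(W)
    d = [sum(row) for row in W]
    total = sum(d)
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        # Remove the trivial constant eigenvector (D-weighted projection).
        mean = sum(di * vi for di, vi in zip(d, v)) / total
        v = [vi - mean for vi in v]
        v = [0.5 * (v[i] + sum(W[i][j] * v[j] for j in range(n)) / d[i])
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return v


def cluster(models, distance, sigma=1.0):
    """Two-way split of the samples by the sign of the eigenvector entries."""
    v = second_generalized_eigenvector(affinity_matrix(models, distance, sigma))
    return [x > 0 for x in v]


labels = cluster([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], lambda a, b: abs(a - b))
```

Splitting on the sign gives only two clusters. Detecting more events would need a further step, such as recursive splitting, which the abstract does not describe.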
Abstract:
A method presents a video according to compositional structures associated with the video. Each compositional structure has a label, and multiple segments that can be organized temporally or hierarchically. A particular compositional structure is selected with a remote controller, and the video is presented by a playback controller on a display device according to the compositional structure.
Abstract:
A method uses probabilistic fusion to detect highlights in videos using both audio and visual information. Specifically, the method uses coupled hidden Markov models (CHMMs). Audio labels are generated using audio classification via Gaussian mixture models (GMMs), and visual labels are generated by quantizing average motion vector magnitudes. Highlights are modeled using discrete-observation CHMMs trained with labeled videos. The CHMMs have better performance than conventional hidden Markov models (HMMs) trained only on audio signals, or only on video frames.
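The visual-label step alone can be sketched as below: average the motion vector magnitudes in each frame, then quantize the average into a small number of levels. The threshold values and the name `visual_labels` are hypothetical, and the GMM audio classification and CHMM training are outside this sketch.

```python
import math


def visual_labels(frames_mvs, thresholds=(1.0, 4.0)):
    """Quantize each frame's average motion vector magnitude.

    frames_mvs: one list of (dx, dy) motion vectors per frame.
    Returns one integer label per frame, from 0 (low motion) upward.
    """
    labels = []
    for mvs in frames_mvs:
        avg = sum(math.hypot(dx, dy) for dx, dy in mvs) / len(mvs)
        labels.append(sum(avg >= t for t in thresholds))
    return labels


labels = visual_labels([
    [(0, 0), (0.5, 0)],   # nearly static frame
    [(3, 4), (0, 1)],     # moderate motion
    [(6, 8), (6, 8)],     # strong motion
])
```

The resulting discrete labels are the kind of observation sequence a discrete-observation model could consume alongside the audio labels.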
Abstract:
Techniques for adaptive processing of media data based on separate data specifying a state of the media data are provided. A device in a media processing chain may determine whether a type of media processing has already been performed on an input version of media data. If so, the device may adapt its processing of the media data to disable performing the type of media processing. If not, the device performs the type of media processing. The device may create a state of the media data specifying the type of media processing. The device may communicate the state of the media data and an output version of the media data to a recipient device in the media processing chain, for the purpose of supporting the recipient device's adaptive processing of the media data.
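A minimal sketch of one stage in such a chain, assuming the state is a small dictionary that lists the processing types already performed. The abstract does not specify how the state is represented, and `adaptive_stage` and `halve` are hypothetical names.

```python
def adaptive_stage(samples, state, kind, transform):
    """Apply `transform` only if `kind` is not already recorded in `state`.

    Returns the output media and the updated state to pass downstream.
    """
    performed = set(state.get("performed", ()))
    if kind in performed:
        # Already done upstream: pass the media through unchanged.
        return list(samples), {"performed": sorted(performed)}
    performed.add(kind)
    return transform(samples), {"performed": sorted(performed)}


def halve(samples):
    """Stand-in for a real processing step such as volume leveling."""
    return [s * 0.5 for s in samples]


out1, state1 = adaptive_stage([1.0, 2.0], {}, "leveling", halve)
out2, state2 = adaptive_stage(out1, state1, "leveling", halve)  # skipped
```

Carrying the state alongside the media lets the second device skip the processing without having to inspect the samples themselves.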