Abstract:
A method extracts high-level features from a video including a sequence of frames. Low-level features are extracted from each frame of the video. Each frame of the video is labeled according to the extracted low-level features to generate sequences of labels. Each sequence of labels is associated with one of the extracted low-level features. The sequences of labels are analyzed using machine learning techniques to extract high-level features of the video.
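The labeling step can be illustrated with a minimal Python sketch. It assumes that each low-level feature is a single number per frame and that labels come from simple thresholding; the feature names, threshold values, and the transition-count statistic are hypothetical illustrations and are not taken from the abstract.

```python
from collections import Counter

def label_frames(values, thresholds):
    """Quantize one low-level feature per frame into integer labels."""
    labels = []
    for v in values:
        # The label is the number of thresholds the value meets or exceeds.
        labels.append(sum(v >= t for t in thresholds))
    return labels

def transition_counts(labels):
    """Count label-to-label transitions, a simple statistic a learner could consume."""
    return Counter(zip(labels, labels[1:]))

# Hypothetical per-frame low-level features (assumed values, one entry per frame).
features = {
    "motion": [0.1, 0.2, 0.9, 0.8, 0.1],
    "color": [3, 3, 3, 7, 7],
}
# Hypothetical thresholds, one list per feature.
thresholds = {"motion": [0.5], "color": [5]}

# One sequence of labels per extracted low-level feature.
label_sequences = {
    name: label_frames(values, thresholds[name])
    for name, values in features.items()
}
```

Here each label sequence stays tied to the feature it came from, so a later learning stage could analyze the sequences separately or jointly.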
Abstract:
A method for transcoding a compressed video partitions the compressed video into hierarchical levels, and extracts features from each of the hierarchical levels. One of a number of conversion modes of a transcoder is selected dependent on the features extracted from the hierarchical levels. The compressed video is then transcoded according to the selected conversion mode.
Abstract:
A method generates a representation of multimedia content by first segmenting the multimedia content spatially and temporally to extract objects. Feature extraction is applied to the objects to produce semantic and syntactic attributes, relations, and a containment set of content entities. The content entities are coded to produce directed acyclic graphs of the content entities, where each directed acyclic graph represents a particular interpretation of the multimedia content. Attributes of each content entity are measured, and the measured attributes are assigned to each corresponding content entity in the directed acyclic graphs to rank order the multimedia content.
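As a rough illustration of the graph structure described above, the following Python sketch stores one interpretation as a containment graph, checks that it is acyclic, and rank orders the children of an entity by a measured attribute. The entity names, the attribute values, and the helper functions are hypothetical; the abstract does not specify how the graphs are coded.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# Hypothetical containment edges: entity -> entities it contains.
containment = {
    "video": ["scene_1", "scene_2"],
    "scene_1": ["player", "ball"],
    "scene_2": ["ball"],
    "player": [],
    "ball": [],
}

# Hypothetical measured attribute per entity (for example, motion intensity).
measured = {"video": 0.4, "scene_1": 0.7, "scene_2": 0.2, "player": 0.9, "ball": 0.5}

def is_acyclic(graph):
    """Return True if the graph admits a topological order, i.e. it is a DAG."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

def rank_children(graph, attributes, parent):
    """Order the entities contained in `parent` by descending measured attribute."""
    return sorted(graph[parent], key=lambda entity: attributes[entity], reverse=True)
```

Note that "ball" is contained in two scenes, which is why a directed acyclic graph is used here rather than a tree.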
Abstract:
A surveillance and control system includes a feature extraction unit to dynamically extract low-level features from a compressed digital video signal, and a description encoder, coupled to the feature extraction unit, to encode the low-level features as content descriptors. An event detector is coupled to the description encoder to detect security events from the content descriptors, and a control signal processor is coupled to the event detector to generate control signals in response to detecting the security events.
Abstract:
A multimedia delivery system for delivering a compressed bitstream through a network to a user device includes a transcoder and a manager. The transcoder is configured to operate on the bitstream in any one of a plurality of conversion modes. The manager is configured to select a particular one of the plurality of conversion modes dependent on semantic content of the bitstream and network characteristics. The system also includes a content classifier to determine the content characteristics, and a model predictor to determine the network characteristics and user device characteristics. An integrator of the manager generates an optimal rate-quality function to be used for selecting the particular conversion mode for a given available bit rate of the network.
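One way to picture mode selection from a rate-quality function is the Python sketch below. It assumes each conversion mode has its own rate-quality curve and that the optimal function is the upper envelope of those curves. The mode names and the linear curves are hypothetical placeholders, not the functions used by the described system.

```python
# Hypothetical rate-quality curves: quality score as a function of bit rate (kbps).
RATE_QUALITY = {
    "reduce_frame_rate": lambda rate: 20 + rate / 20,
    "requantize": lambda rate: 10 + rate / 10,
}

def select_mode(available_rate):
    """Return (mode, quality) with the highest quality at the available bit rate."""
    best_mode = max(RATE_QUALITY, key=lambda mode: RATE_QUALITY[mode](available_rate))
    return best_mode, RATE_QUALITY[best_mode](available_rate)

def optimal_rate_quality(available_rate):
    """Upper envelope of the per-mode curves, evaluated at one bit rate."""
    return select_mode(available_rate)[1]
```

With these assumed curves, the preferred mode switches from frame-rate reduction to requantization once the available rate exceeds 200 kbps, where the two lines cross.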
Abstract:
In an apparatus for transcoding a compressed video, a generator simulates constraints of a network and constraints of a user device. A classifier is coupled to receive an input compressed video and the constraints. The classifier generates content information from features of the input compressed video. A manager produces a plurality of conversion modes dependent on the constraints and content information, and a transcoder produces output compressed videos, one for each of the plurality of conversion modes.
Abstract:
A method generates a representation of multimedia content by first segmenting the multimedia content spatially and temporally to extract objects. Feature extraction is applied to the objects to produce semantic and syntactic attributes, relations, and a containment set of content entities. The content entities are coded to produce directed acyclic graphs of the content entities, where each directed acyclic graph represents a particular interpretation of the multimedia content.
Abstract:
Multiview videos are acquired of a scene with corresponding cameras arranged at poses, such that there is view overlap between any pair of cameras. V-frames are generated from the multiview videos. The V-frames are encoded using only spatial prediction. Then, the V-frames are inserted periodically in an encoded bit stream to provide random temporal access to the multiview videos. Additional view dependency information enables the decoding of a reduced number of frames prior to accessing randomly a target frame for a specified view and time, and decoding the target frame. The method also decodes multiview videos by maintaining a reference picture list for a current frame of a plurality of multiview videos, and predicting each current frame of the plurality of multiview videos according to reference pictures indexed by the associated reference picture list.
Abstract:
A method randomly accesses multiview videos. Multiview videos are acquired of a scene with corresponding cameras arranged at poses, such that there is view overlap between any pair of cameras. V-frames are generated from the multiview videos. The V-frames are encoded using only spatial prediction. Then, the V-frames are inserted periodically in an encoded bitstream to provide random temporal access to the multiview videos.
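The benefit of periodic V-frames for random temporal access can be sketched in Python as follows. This is a simplification that assumes a fixed insertion period and that every frame between V-frames depends only on the frame before it; the function names and the period value are hypothetical, and actual view dependencies would change which frames must be decoded.

```python
def nearest_v_frame(target_index, period):
    """Index of the closest V-frame at or before the target frame."""
    return (target_index // period) * period

def frames_to_decode(target_index, period):
    """Frames decoded to reach the target, starting from the nearest V-frame."""
    start = nearest_v_frame(target_index, period)
    return list(range(start, target_index + 1))

# Hypothetical usage: V-frames at indices 0, 5, 10, ...; access frame 13.
decode_list = frames_to_decode(13, 5)
```

Because a V-frame uses only spatial prediction, decoding can begin there without any earlier frames, so at most `period` frames are decoded under these assumptions.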
Abstract:
A model stored in a memory accessible by a video transcoder includes a first rate-distortion function modeling a requantization of an input video. A second rate-distortion function models a resynchronization marker insertion rate for the transcoded video, and a third rate-distortion function models an intra-block insertion rate for the transcoded video.