摘要:
A method presents a video according to compositional structures associated with the video. Each compositional structure has a label, and multiple segments that can be organized temporally or hierarchically. A particular compositional structure is selected with a remote controller, and the video is presented by a playback controller on a display device according to the compositional structure.
摘要:
A method generates a summary of a video. Faces are detected in a plurality of frames of the video. The frames are classified according to a number of faces detected in each frame and the video is partitioned into segments according to the classifications to produce a summary of the video. For each frame classified as having a single detected face, one or more characteristics of the face is determined. The frames are labeled according to the characteristics to produce labeled clusters and the segments are partitioned into sub-segments according to the labeled clusters.
摘要:
A method plays frames of a video adaptively according to a visual complexity of the video. First a spatial frequency of pixel within frames of the video is measured, as well as a temporal velocity of corresponding pixels between frames of the video. The spatial frequency is multiplied by the temporal velocity to obtain a measure of the visual complexity of the frames of the video. The frames of the video are then played at a speed that corresponds to the visual complexity.
摘要:
A computer implemented method for determining a vehicle type of a vehicle detected in an image is disclosed. An image having a detected vehicle is received. A number of vehicle models having salient feature points is projected on the detected vehicle. A first set of features derived from each of the salient feature locations of the vehicle models is compared to a second set of features derived from corresponding salient feature locations of the detected vehicle to form a set of positive match scores (p-scores) and a set of negative match scores (n-scores). The detected vehicle is classified as one of the vehicle models models based at least in part on the set of p-scores and the set of n-scores.
摘要:
A computer-implemented method for estimating a volume of at least one food item on a food plate is disclosed. A first and second plurality of images are received from different positions above a food plate, wherein angular spacing between the positions of the first plurality of images is greater than angular spacing between the positions of the second plurality of images. A first set of poses of each of the first plurality of images is estimated. A second set of poses of each of the second plurality of images is estimated based on at least the first set of poses. A pair of images taken from each of the first and second plurality of images is rectified based on at least the first and second set of poses. A 3D point cloud is reconstructed based on at least the rectified pair of images. At least one surface of the at least one food item above the food plate is estimated based on at least the reconstructed 3D point cloud. The volume of the at least one food item is estimated based on the at least one surface.
摘要:
A computer-implemented method for for matching objects is disclosed. At least two images where one of the at least two images has a first target object and a second of the at least two images has a second target object are received. At least one first patch from the first target object and at least one second patch from the second target object are extracted. A distance-based part encoding between each of the at least one first patch and the at least one second patch based upon a corresponding codebook of image parts including at least one of part type and pose is constructed. A viewpoint of one of the at least one first patch is warped to a viewpoint of the at least one second patch. A parts level similarity measure based on the view-invariant distance measure for each of the at least one first patch and the at least one second patch is applied to determine whether the first target object and the second target object are the same or different objects.
摘要:
A computer implemented method for automatically detecting and classifying acoustic signatures across a set of recording conditions is disclosed. A first acoustic signature is received. The first acoustic signature is projected into a space of a minimal set of exemplars of acoustic signature types derived from a larger set of exemplars using a wrapper method. At least one vector distance is calculated between the projected acoustic signature and each exemplar of the minimal set of exemplars. An exemplar is selected from the minimal set of exemplars having the smallest vector distance to the projected acoustic signature as a class corresponding to and classifying the first acoustic signature. The first acoustic signature and the plurality of acoustic signatures may correspond to one of gunshots, musical instruments, songs, and speech. The minimal set of exemplars may correspond to a hierarchy of acoustic signature types.
摘要:
A method detects events in multimedia. Features are extracted from the multimedia. The features are sampled using a sliding window to obtain samples. A context model is constructed for each sample. An affinity matrix is determined from the models and a commutative distance metric between each pair of context models. A second generation eigenvector is determined for the affinity matrix, and the samples are then clustered into events according to the second generation eigenvector.
摘要:
A method segments a video. Audio frames of the video are classified with labels. Dominant labels are assigned to successive time intervals of consecutive labels. A semantic description is constructed for sliding time windows of the successive time intervals, in which the sliding time windows overlap in time, and the semantic description for each time window is a transition matrix determined from the dominant labels of the time intervals. A marker is determined from the transition matrices, in which a frequency of occurrence of the marker is between a low frequency threshold and a high frequency threshold. Then, the video is segmented at the locations of the markers.
摘要:
A system and method summarizes multimedia stored in a compressed multimedia file partitioned into a sequence of segments, where the content of the multimedia is, for example, video signals, audio signals, text, and binary data. An associated metadata file includes index information and an importance level for each segment. The importance information is continuous over as closed interval. An importance level threshold is selected in the closed interval, and only segments of the multimedia having a particular importance level greater than the importance level threshold are reproduced.