Abstract:
Technologies for generating a boosted tag ranking for a media instance, the boosted tag ranking based on probabilistic relevance estimation and tag correlation refining. Such boosted tag rankings may be used for search result ranking, tag recommendation, and group recommendation.
Abstract:
A method and system for object-based video retrieval and indexing include a configuration detection processor for deriving quantitative attribute information for video frames in a compressed video stream. The quantitative attribute information includes object data for a video frame, including the number of objects and their orientation within the video frame and the size, shape, texture, and motion of each object. A configuration comparison processor compares object data from first and second frames to determine differences between first frame video objects and second frame video objects. The configuration comparison processor has a shot boundary detection mode in which it cooperates with a shot boundary detector to identify shot boundaries within a video sequence. In a key frame selection mode, the configuration comparison processor cooperates with a key frame selector to select key frames from the video sequence. A key instance selector communicates with the configuration comparison processor during a key instance selection mode to select key instances of video objects based on differences between first and second instances of video objects. The configuration comparison processor cooperates with a camera operation detector to identify camera operations such as zoom, tracking, and panning within the video sequence. A special effects detector cooperates with the configuration comparison processor to detect special effects video edits such as wipe, dissolve, and fade. The configuration comparison processor and a query match detector enable a user to configure object-based queries and to retrieve video sequences or video frames which include a query video object.
Abstract:
A key frame extraction system and method for extracting key frames from a video based on motion analysis of frames within the video. Key frames are highlight frames that are effective in summarizing a video sequence. They allow a user to quickly find a desired spot in a video that is long and contains differing subject matter. The key frame extraction system and method use a triangle model of the motion energy in each frame and extract key frames based on this model. More specifically, motion analysis is performed on the video frames in order to identify motion acceleration and motion deceleration points within the frames. A triangle model of motion is then constructed based on results of the motion analysis. The apex of the triangle represents a turning point between motion acceleration and motion deceleration. Frames corresponding to this apex are selected as key frames.
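The apex-selection step can be illustrated with a minimal sketch. It assumes a per-frame motion energy value is already available as a list of numbers; the abstract does not say how that energy is computed, and the function name `select_key_frames` and the `min_peak` parameter are hypothetical. A frame is picked when a run of rising energy (acceleration) is followed by the first drop (deceleration).

```python
from typing import List


def select_key_frames(motion_energy: List[float], min_peak: float = 0.0) -> List[int]:
    """Return indices of frames at the apex of each rise-then-fall pattern.

    motion_energy: one value per frame (assumed to be precomputed).
    min_peak: hypothetical filter that ignores very small apexes.
    """
    key_frames: List[int] = []
    rising = False
    for i in range(1, len(motion_energy)):
        delta = motion_energy[i] - motion_energy[i - 1]
        if delta > 0:
            # Motion acceleration: energy is still climbing.
            rising = True
        elif delta < 0 and rising:
            # First deceleration after an acceleration: the previous
            # frame is the turning point (the apex of the triangle).
            if motion_energy[i - 1] >= min_peak:
                key_frames.append(i - 1)
            rising = False
    return key_frames


energy = [0, 1, 3, 2, 1, 1, 2, 5, 4, 0]
print(select_key_frames(energy))
```

A plateau at the top of a triangle is resolved to its last frame in this sketch; other tie-breaking choices are equally plausible.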
Abstract:
A system and method for real-time multi-view (i.e., not just frontal-view) face detection. The system and method use a sequence of detectors of increasing complexity, together with face/non-face discriminating thresholds, to discard non-faces at the earliest stage possible, thus saving much computation compared to prior art systems. The detector-pyramid architecture for multi-view face detection uses a coarse-to-fine and simple-to-complex scheme. This architecture addresses the lengthy processing that otherwise precludes real-time face detection by discarding most non-face sub-windows using the simplest possible features at the earliest possible stage. This leads to the first real-time multi-view face detection system, with accuracy almost as good as the state-of-the-art system yet 270 times faster.
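The early-rejection idea can be sketched as a cascade of (scoring function, threshold) stages ordered from cheap to expensive. This is only an illustration under stated assumptions: the toy scoring functions `mean` and `spread`, the thresholds, and the representation of a sub-window as a list of numbers are all hypothetical and are not the detectors described in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

# One stage = (scoring function, threshold); both are illustrative.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def passes_cascade(window: Sequence[float], stages: List[Stage]) -> bool:
    """Run stages in order and stop at the first one the window fails."""
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # rejected early; later stages never run
    return True


def detect(windows: List[Sequence[float]], stages: List[Stage]) -> List[int]:
    """Return indices of windows that survive every stage."""
    return [i for i, w in enumerate(windows) if passes_cascade(w, stages)]


def mean(w: Sequence[float]) -> float:
    """Cheap stand-in feature for the first, simplest stage."""
    return sum(w) / len(w)


def spread(w: Sequence[float]) -> float:
    """Stand-in for a later, more discriminating stage."""
    return max(w) - min(w)


stages = [(mean, 0.5), (spread, 0.2)]
windows = [[0.1, 0.2], [0.6, 0.6], [0.4, 0.9]]
print(detect(windows, stages))
```

Because most sub-windows fail the first stage, the more expensive stages run on only a small fraction of the input, which is where the computational saving comes from.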
Abstract:
An automatic red-eye detection and reduction system is described. The system includes a red-eye detector that detects, without user intervention, whether an image contains a red pupil. If the image is found to contain a red pupil, the red-eye detector also detects the location and size of the red pupil. The system further includes a red-eye reduction system that is coupled to the red-eye detector and changes each red pixel within the red pupil to a predetermined color, so that the color of the red pupil can be detected and changed without user intervention. A method of automatically detecting and reducing the red-eye effect in a digital image is also described.
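The reduction step can be sketched as follows, assuming the detector has already supplied the pupil's center and radius. The image representation (rows of RGB tuples), the redness test in `is_red`, and the replacement color are all assumptions made for illustration; the abstract specifies none of them.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def is_red(r: int, g: int, b: int) -> bool:
    """Hypothetical redness test: strong red that dominates green and blue."""
    return r > 150 and r > 2 * g and r > 2 * b


def reduce_red_eye(
    image: List[List[Pixel]],
    center: Tuple[int, int],
    radius: int,
    replacement: Pixel = (40, 40, 40),
) -> List[List[Pixel]]:
    """Return a copy with red pixels inside the pupil set to `replacement`."""
    cx, cy = center
    out = [list(row) for row in image]
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            if inside and is_red(r, g, b):
                out[y][x] = replacement
    return out


RED, GRAY = (200, 30, 30), (100, 100, 100)
image = [
    [RED, GRAY, GRAY],
    [GRAY, RED, GRAY],
    [GRAY, GRAY, GRAY],
]
fixed = reduce_red_eye(image, center=(1, 1), radius=1)
print(fixed[1][1], fixed[0][0])
```

Only the red pixel inside the detected pupil region is changed; the red pixel in the corner lies outside the radius and is left untouched.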
Abstract:
The present invention includes a key frame extraction system and method for extracting key frames from a video based on motion analysis of frames within the video. Key frames are highlight frames that are effective in summarizing a video sequence. They allow a user to quickly find a desired spot in a video that is long and contains differing subject matter. The key frame extraction system and method use a triangle model of the motion energy in each frame and extract key frames based on this model. More specifically, motion analysis is performed on the video frames in order to identify motion acceleration and motion deceleration points within the frames. A triangle model of motion is then constructed based on results of the motion analysis. The apex of the triangle represents a turning point between motion acceleration and motion deceleration. Frames corresponding to this apex are selected as key frames.
Abstract:
A method and system for indexing and retrieving database objects, such as images, include a database manager which initializes database objects based on vectors for values of quantified features associated with the database objects. Similar database objects are grouped into common clusters that are based on system-perceived relationships among the objects. For each search session, a vector for a search query is calculated and database objects from the closest cluster within feature space are selected for presentation at a user device. The user indicates which of the selected objects are relevant to the search session and which of the objects are irrelevant. If one of the clusters includes both relevant and irrelevant objects, the cluster is split into two clusters, so that one of the resulting clusters includes the relevant objects and the other cluster includes the irrelevant objects. A correlation matrix is updated to indicate that the resulting clusters have a weak correlation. If two of the clusters include database objects which were indicated to be relevant to the search session, the correlation matrix is updated to indicate that the two clusters have a strong correlation. To avoid an excessive proliferation of database clusters, mergers are performed on clusters which are closely located within the feature space and share a strong correlation within the correlation matrix. Following continued use, the groupings of objects into clusters and the cluster-to-cluster correlations will reflect user-perceived relationships.
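The split and correlation-update rules can be sketched under several assumptions: clusters are stored as a dict from an integer id to a set of object ids, the correlation matrix is a dict keyed by unordered cluster-id pairs, and `WEAK` and `STRONG` are illustrative values. The abstract does not say where unlabeled objects go after a split, so this sketch leaves them with the relevant objects. The merge step is omitted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

WEAK, STRONG = 0.1, 0.9  # illustrative correlation values (assumed)


def apply_feedback(
    clusters: Dict[int, Set[int]],
    correlation: Dict[FrozenSet[int], float],
    relevant: Set[int],
    irrelevant: Set[int],
) -> None:
    """Update clusters and correlations in place from one search session."""
    next_id = max(clusters) + 1
    for cid in list(clusters):
        members = clusters[cid]
        bad = members & irrelevant
        if members & relevant and bad:
            # Mixed cluster: move the irrelevant objects to a new cluster
            # and mark the two resulting clusters as weakly correlated.
            clusters[cid] = members - bad
            clusters[next_id] = bad
            correlation[frozenset((cid, next_id))] = WEAK
            next_id += 1
    # Every pair of clusters holding relevant objects is strongly correlated.
    hit = [cid for cid, members in clusters.items() if members & relevant]
    for a, b in combinations(hit, 2):
        correlation[frozenset((a, b))] = STRONG


clusters = {0: {1, 2, 3}, 1: {4, 5}}
correlation: Dict[FrozenSet[int], float] = {}
apply_feedback(clusters, correlation, relevant={1, 4}, irrelevant={3})
print(clusters)
```

Repeating this update over many sessions is what lets the clusters and their correlations drift from system-perceived toward user-perceived relationships.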