Abstract:
For use in a multimedia analysis system capable of analyzing the content of multimedia signals, there is disclosed an apparatus and method for creating a multimedia table of contents of videotaped material. In one advantageous embodiment, the apparatus of the present invention comprises a multimedia table of contents controller that is capable of receiving video signals, audio signals, and text signals of said videotaped material, and capable of combining portions of the video signals, audio signals, and text signals to create a table of contents of the videotaped material. The controller is capable of segmenting video signals with both a coarse and fine segmentation application. The controller is also capable of locating boundaries of elements of the videotaped material with both a coarse and fine boundary detection application. An index module of the controller links elements of the table of contents with combinations of audio, visual, and transcript cues. A retrieval module retrieves and displays a table of contents in response to a user request.
Abstract:
A method and system which enable a user to query a multimedia archive in one media modality and automatically retrieve correlating data in another media modality without the need for manually associating the data items through a data structure. The correlation method finds the maximum correlation between the data items without being affected by the distribution of the data in the respective subspace of each modality. Once the direction of correlation is disclosed, extracted features can be transferred from one subspace to another.
Abstract:
A method and apparatus for editing a source video that has already been taken to stabilize images in the video. To eliminate jerky motion from a video, changes in shots are first detected. Then, any jerkiness within the video of that shot is classified and the video is segmented further into smaller segments based on this classification. The jerkiness within the selected segments is removed. The corrected shot, comprising a plurality of frames, is then added to the preceding shot until all shots of the video have been appropriately corrected for jerkiness. To help the user identify the shots being edited, keyframes or snapshots of the shots are displayed, thereby allowing the user to decide whether processing of the shot is desired and which shots should be incorporated into the final video.
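The abstract does not say how jerkiness is classified or removed, so the following is only a minimal sketch under stated assumptions: each shot is reduced to a hypothetical 1-D camera path (one position per frame), jerkiness is scored by the mean absolute second difference of that path, and correction is a plain moving average. The names `jerkiness`, `smooth_path`, and `stabilize_shots` are illustrative and do not come from the source.

```python
def jerkiness(path):
    """Mean absolute second difference of a 1-D camera path (assumed measure)."""
    if len(path) < 3:
        return 0.0
    second = [path[i + 1] - 2 * path[i] + path[i - 1] for i in range(1, len(path) - 1)]
    return sum(abs(d) for d in second) / len(second)


def smooth_path(path, radius):
    """Moving-average smoothing; the window is clipped at the shot boundaries."""
    smoothed = []
    for i in range(len(path)):
        window = path[max(0, i - radius):min(len(path), i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def stabilize_shots(shots, jerk_threshold, radius=2):
    """Correct each shot that looks jerky, then append it to the preceding shots."""
    corrected = []
    for path in shots:
        if jerkiness(path) > jerk_threshold:
            path = smooth_path(path, radius)
        corrected.extend(path)
    return corrected
```

A steady pan such as `[0, 1, 2, 3]` has a jerkiness score of zero under this measure and passes through unchanged, while an oscillating path such as `[0, 2, 0, 2, 0]` is smoothed before being appended.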
Abstract:
A method, process and system for performing content augmentation of personal profiles includes (a) building a user history of a plurality of augmented content information of relevant TV programs; (b) analyzing user queries and determining a degree to which the user queried for additional content information; (c) inferring values about the user from user queries for additional content information so as to augment the additional content information; (d) updating the augmented content information to at least one of the user history, Internet and specialized databases; (e) linking individual ones of the plurality of augmented content information to each other; and (f) determining inferences about the user's interests and preferences based on the linkage of the plurality of augmented content information. The updating of the augmented content information includes segmenting and indexing of multimedia content. A feedback system is created where user queries for more information and purchases from the Internet and specialized databases will result in additional augmented content information about the particular user.
Abstract:
An interactive imaging system and method thereof are provided. The invention provides a real-time interactive video system. The input signals to the system are given through an array of cameras, sensors and microphones. The input signals are generated by human (or pet) presence and involvement. A set of software modules is invoked based on a "rule" framework created by the user, e.g., artist or designer, of the interactive system. The "rules" define which set of input signals are connected to certain portions, i.e., impressible regions, of an image on a display of the system or connected to certain portions on a mosaiced display. The inventive system allows the user to build a set of "rules" as to how the impressible regions of the displayed images can change based on motion, color, and/or texture of a "visitor" interacting with the system.
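One way to picture the rule framework is as a list of records that each tie an input signal to an impressible region. The sketch below is an assumption made for illustration only: the `Rule` class, its fields, and `apply_rules` are hypothetical names, and the abstract does not describe the actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    signal: str                      # input signal name, e.g. "motion" (hypothetical)
    region: str                      # impressible region the signal is connected to
    effect: Callable[[float], str]   # maps a signal value to a change in that region


def apply_rules(rules: List[Rule], signals: Dict[str, float]) -> Dict[str, List[str]]:
    """Collect, per region, the changes triggered by the signals currently present."""
    changes: Dict[str, List[str]] = {}
    for rule in rules:
        if rule.signal in signals:
            changes.setdefault(rule.region, []).append(rule.effect(signals[rule.signal]))
    return changes


# Example rules an artist or designer might define (values are made up).
rules = [
    Rule("motion", "sky", lambda v: "ripple" if v > 0.5 else "still"),
    Rule("color", "floor", lambda v: f"tint:{v}"),
]
```

Calling `apply_rules(rules, {"motion": 0.8})` changes only the "sky" region, because no "color" signal is present for the second rule.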
Abstract:
Techniques are disclosed for detecting commercials or other particular types of video content in a video signal. In an illustrative embodiment, color histograms are extracted from frames of the video signal. For each of at least a subset of the extracted color histograms, the extracted color histogram is compared to a family histogram. If the extracted color histogram falls within a specified range of the family histogram, the family histogram is updated to include the extracted color histogram as a new member. If the extracted color histogram does not fall within the specified range of the family histogram, the family histogram is considered complete and the extracted color histogram is utilized to generate a new family histogram for use in processing subsequent extracted color histograms. The resulting family histograms are utilized to detect commercials or other particular types of video content in the video signal.
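The family-histogram loop described above can be sketched as follows. The abstract does not specify the distance measure or how a family is updated, so this sketch assumes an L1 distance for the "specified range" test and a running mean for the update. The function names and the `threshold` parameter are hypothetical, and the sketch stops at building families rather than classifying them as commercials.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def build_families(histograms, threshold):
    """Group consecutive frame histograms into (family_histogram, member_count) pairs.

    A frame joins the current family when its L1 distance to the family
    histogram is within `threshold`; otherwise the current family is closed
    and the frame seeds a new one.
    """
    families = []
    current, count = None, 0
    for hist in histograms:
        if current is not None and l1_distance(hist, current) <= threshold:
            # Fold the new member into the running mean of the family.
            current = [(c * count + h) / (count + 1) for c, h in zip(current, hist)]
            count += 1
        else:
            if current is not None:
                families.append((current, count))
            current, count = list(hist), 1
    if current is not None:
        families.append((current, count))
    return families
```

Each returned pair holds the family histogram and the number of frames that contributed to it, which is the information a later detection stage would work from.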
Abstract:
A visualization system captures and analyzes a video signal to extract features in the video signal to render a graphical multi-dimensional visual representation of the program. The visualization system includes a memory and a processor and is programmed to extract features, augment the feature extraction with supplemental information, and render a visual summary to be displayed on a display device. Using the visual summary, a user can more easily determine the nature of a particular video program.
Abstract:
A video indexing system analyzes contents of source video and develops a visual table of contents using selected images. A system for detecting significant scenes detects video cuts from one scene to another, and static scenes based on DCT coefficients and macroblocks. A keyframe filtering process filters out less desired frames including, for example, unicolor frames, or frames that share the same object as a primary focus or as one of the primary focuses. Commercials may also be detected and frames of commercials eliminated. The significant scenes and static scenes are detected based on a threshold which is set based on the category of the video.
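As a rough illustration of cut detection with a category-dependent threshold followed by unicolor filtering, the sketch below treats each frame as a list of per-block DC coefficients. The threshold values, the mean-absolute-difference measure, the `tolerance` parameter, and all function names are assumptions for this example; the abstract states only that DCT coefficients and macroblocks are used and that the threshold depends on the video category.

```python
# Hypothetical per-category thresholds; the source gives no actual values.
CATEGORY_THRESHOLDS = {"news": 40.0, "sports": 80.0}


def dc_difference(frame_a, frame_b):
    """Mean absolute difference between corresponding block DC coefficients."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_cuts(frames, category):
    """Indices i at which frame i differs from frame i-1 by more than the threshold."""
    threshold = CATEGORY_THRESHOLDS[category]
    return [i for i in range(1, len(frames))
            if dc_difference(frames[i - 1], frames[i]) > threshold]


def is_unicolor(frame, tolerance=5.0):
    """True when all block DC values lie within `tolerance` of each other."""
    return max(frame) - min(frame) <= tolerance


def select_keyframes(frames, category):
    """First frame of each detected scene, with unicolor frames filtered out."""
    if not frames:
        return []
    starts = [0] + detect_cuts(frames, category)
    return [i for i in starts if not is_unicolor(frames[i])]
```

In this sketch a black frame between two scenes is detected as a cut but then dropped by the unicolor filter, so it never reaches the visual table of contents.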
Abstract:
The process of compressing video requires the calculation of a variety of data that are used in the process of compression. The invention exploits some or all of these data for purposes of content detection. For example, these data may be leveraged for purposes of commercial detection. The luminance, motion vector field, residual values, quantizer, bit rate, etc. may all be used either directly or in combination, as signatures of content. A process for content detection may employ one or more features as indicators of the start and/or end of a sequence containing a particular type of content and other features as verifiers of the type of content bounded by these start/end indicators. The features may be combined and/or refined to produce higher-level feature data with good computational economy and content-classification utility.
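The indicator-plus-verifier idea can be sketched with two per-frame features. The choice of features here is an assumption for illustration: dark frames (low luminance) serve as start/end indicators, and mean bit rate serves as the verifier. The abstract lists these among the available data but does not assign them these roles, and the thresholds and function names are hypothetical.

```python
def find_candidate_segments(luminance, dark_level):
    """Use dark frames as start/end indicators; return (start, end) spans, end exclusive."""
    boundaries = [i for i, y in enumerate(luminance) if y < dark_level]
    segments = []
    for left, right in zip(boundaries, boundaries[1:]):
        if right - left > 1:  # ignore runs of adjacent dark frames
            segments.append((left + 1, right))
    return segments


def verify_segments(segments, bit_rate, min_mean_rate):
    """Keep only the segments whose mean bit rate passes the verifier threshold."""
    kept = []
    for start, end in segments:
        mean_rate = sum(bit_rate[start:end]) / (end - start)
        if mean_rate >= min_mean_rate:
            kept.append((start, end))
    return kept
```

Separating the two stages keeps the cheap indicator test on every frame and runs the verifier only on the spans the indicators bound.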