Abstract:
A computer-implemented method, system, and computer-usable program code for detecting topic shift boundaries in a multimedia stream. The method includes receiving a multimedia stream and performing multimodal analysis on it to locate a plurality of temporal positions at which topic changes have an increased likelihood of occurring, thereby providing a sequence of multimedia portions. Characteristics of a sliding window are automatically determined for each multimedia portion in the sequence, and topic shift boundaries are detected in each multimedia portion by applying a text-based topic shift detector over the multimedia stream's text transcript using a sliding window, wherein the sliding window used with each multimedia portion has the characteristics determined from that portion.
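A minimal sketch of a sliding-window, text-based topic shift detector may help make the idea concrete. It is not the patented method: the cosine-similarity comparison, the function names `cosine` and `topic_shift_boundaries`, and the `window` and `threshold` parameters are all assumptions made for illustration, and the abstract does not say how the window characteristics are determined, so the window size is simply passed in per portion.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_shift_boundaries(sentences, window=2, threshold=0.1):
    """Return sentence indices where a new topic is assumed to start.

    `window` (sentences on each side of a gap) and `threshold` are
    hypothetical parameters; in the abstract the window characteristics
    are determined automatically per portion by an unspecified procedure.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    boundaries = []
    # Slide over every gap that has a full window on both sides.
    for gap in range(window, len(bags) - window + 1):
        left = sum(bags[gap - window:gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries


transcript = [
    "cats purr softly",
    "cats chase mice",
    "stocks fell sharply",
    "stocks rallied later",
]
boundaries = topic_shift_boundaries(transcript, window=2)
```

In a per-portion setting, the same function would be called once for each multimedia portion's transcript, each time with the window size chosen for that portion.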
Abstract:
A system and method for partitioning a video into a series of semantic units, where each semantic unit relates to a generally complete thematic topic. The computer-implemented method comprises dividing a video into a plurality of homogeneous segments, analyzing audio and visual content of the video, extracting a plurality of keywords from the speech content of each homogeneous segment, and detecting and merging groups of semantically related and temporally adjacent homogeneous segments into a series of semantic units in accordance with the results of both the audio and visual analysis and the keyword extraction. The invention can be applied to generate tables of contents and index tables for videos to facilitate efficient topic searching and browsing.
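The merging step can be illustrated with a small sketch. It covers only the keyword side of the description: the audio and visual analysis is omitted, and the Jaccard overlap measure, the `min_overlap` threshold, and the `(start, end, keywords)` tuple layout are assumptions chosen for illustration rather than details taken from the abstract.

```python
def jaccard(a, b):
    """Jaccard overlap between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_segments(segments, min_overlap=0.25):
    """Merge temporally adjacent segments whose keywords overlap.

    `segments` is a list of (start, end, keywords) tuples in temporal
    order; `min_overlap` is a hypothetical threshold.
    """
    units = []
    for start, end, keywords in segments:
        if units and jaccard(units[-1][2], keywords) >= min_overlap:
            # Extend the current semantic unit and pool its keywords.
            prev_start, _, prev_keywords = units[-1]
            units[-1] = (prev_start, end, prev_keywords | keywords)
        else:
            units.append((start, end, set(keywords)))
    return units


segments = [
    (0, 10, {"neural", "network"}),
    (10, 20, {"network", "training"}),
    (20, 30, {"budget", "travel"}),
]
units = merge_segments(segments)
```

Here the first two segments share the keyword "network" and are merged into one unit spanning 0 to 20, while the third starts a new unit.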
Abstract:
A system and method for distinguishing between foreground content and background content in an image presentation. An initial background model is provided, and a final background model is constructed from the initial background model using the image presentation. The foreground content and background content in the image presentation are then distinguished from one another using the final background model. The invention permits foreground content and background content to be separated for further processing in different types of computer-generated image presentations, such as digital slide presentations, video presentations, and Web page presentations.
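One way to picture the two-stage background model is the sketch below. The abstract does not specify how the final model is built, so the per-pixel majority vote, the `min_share` and `tolerance` parameters, and the grayscale list-of-lists image format are all assumptions made purely for illustration.

```python
from collections import Counter


def refine_background(initial, frames, min_share=0.5):
    """Build a final background model from an initial guess.

    For each pixel, adopt the value that more than `min_share` of the
    frames agree on; otherwise keep the initial model's value.
    `frames` must be a non-empty list of images shaped like `initial`.
    """
    final = [row[:] for row in initial]
    for y in range(len(initial)):
        for x in range(len(initial[0])):
            value, count = Counter(f[y][x] for f in frames).most_common(1)[0]
            if count / len(frames) > min_share:
                final[y][x] = value
    return final


def foreground_mask(frame, background, tolerance=0):
    """Mark pixels that differ from the background by more than `tolerance`."""
    return [
        [abs(p - b) > tolerance for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


initial = [[255, 255], [255, 255]]
frames = [
    [[200, 0], [200, 200]],
    [[200, 200], [200, 50]],
    [[200, 200], [0, 200]],
]
final = refine_background(initial, frames)
mask = foreground_mask(frames[0], final)
```

With these toy slides, every pixel is 200 in at least two of the three frames, so the final model replaces the initial all-white guess, and the mask flags only the pixel where the first frame deviates.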
Abstract:
Disclosed is a general framework for extracting semantics from composite media content at various resolutions. Given a media stream, which may consist of various media modalities including audio, visual, text, and graphics information, the framework describes how various types of semantics can be extracted at different levels by exploiting and integrating different media features. The output of the framework is a series of tagged (or annotated) media segments at different scales. At the lowest resolution, the media segments are characterized in a more general and broader sense and are therefore identified at a larger scale; at the highest resolution, the media content is analyzed, inspected, and identified more specifically, which results in smaller-scale media segments.