Abstract:
A method detects events in multimedia. Features are extracted from the multimedia. The features are sampled using a sliding window to obtain samples. A context model is constructed for each sample. The context models form a time series. An affinity matrix is determined from the time series models and a commutative distance metric between each pair of context models. A second generalized eigenvector is determined for the affinity matrix, and the samples are then clustered into events according to the second generalized eigenvector.
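The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each context model is reduced to a single number, the commutative distance is taken to be the absolute difference, the affinity is a Gaussian of that distance, and the second generalized eigenvector of W y = λ D y (with D the diagonal matrix of row sums) is approximated by power iteration. The function names and the `sigma` parameter are hypothetical.

```python
import math

def affinity_matrix(models, sigma=1.0):
    """Gaussian affinity built from a symmetric distance d(a, b) = |a - b| (assumed)."""
    n = len(models)
    return [[math.exp(-((models[i] - models[j]) ** 2) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def second_generalized_eigenvector(W, iters=500):
    """Approximate the second-largest solution y of W y = lambda * D y."""
    n = len(W)
    d = [sum(row) for row in W]          # diagonal of D (row sums)
    s = [math.sqrt(x) for x in d]        # diagonal of D^(1/2)
    # Symmetric form N = D^(-1/2) W D^(-1/2), shifted to (N + I) / 2 so that
    # every eigenvalue lies in [0, 1] and power iteration finds the largest.
    M = [[0.5 * (W[i][j] / (s[i] * s[j]) + (1.0 if i == j else 0.0))
          for j in range(n)] for i in range(n)]
    # The leading eigenvector of N is D^(1/2) * 1; it is projected out below.
    norm = math.sqrt(sum(d))
    top = [x / norm for x in s]
    z = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(z, top))
        z = [a - dot * b for a, b in zip(z, top)]
        z = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in z))
        z = [a / length for a in z]
    # Map back from the symmetric problem to the generalized one.
    return [z[i] / s[i] for i in range(n)]

def cluster_by_sign(y):
    """Split samples into two groups by the sign of the eigenvector entries."""
    return [1 if value >= 0 else 0 for value in y]

# Six scalar "context models" that form two well-separated groups.
models = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = cluster_by_sign(second_generalized_eigenvector(affinity_matrix(models)))
print(labels)
```

Splitting on the sign of the eigenvector yields two clusters; more events could be found by applying the same split recursively to each group.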
Abstract:
When a retrieval condition of an attribute list is input from a user interface unit to a retrieval processing unit, the attribute list stored in an attribute list storing unit is retrieved in the retrieval processing unit. Thereafter, attribute information conforming to the retrieval condition is output to and displayed on a displaying unit. Thereafter, when a retrieval condition of the similarity retrieval is input from the user interface unit to the retrieval processing unit, image data stored in the image information storing unit is retrieved in the retrieval processing unit, and specific image data relating to a characteristic descriptor set conforming to the retrieval condition is selected in the retrieval processing unit. Thereafter, the specific image data is output to and displayed on the displaying unit.
Abstract:
Caption boxes which are embedded in video content can be located and the text within the caption boxes decoded. Real-time processing is enhanced by locating caption box regions in the compressed video domain and performing pixel-based processing operations within the region of the video frame in which a caption box is located. The caption boxes are further refined by identifying word regions within the caption boxes and then applying character and word recognition processing to the identified word regions. Domain-based models are used to improve text recognition results. The extracted caption box text can be used to detect events of interest in the video content, and a semantic model can be applied to extract a segment of video of the event of interest.
Abstract:
A method facilitates semantic event classification of a group of image records related to an event. The method uses an event detector system to: extract a plurality of visual features from each of the image records, wherein extracting the visual features includes segmenting an image record into a number of regions from which the visual features are extracted; generate a plurality of concept scores for each of the image records using the visual features, wherein each concept score corresponds to a visual concept and is indicative of a probability that the image record includes the visual concept; generate a feature vector corresponding to the event based on the concept scores of the image records; and supply the feature vector to an event classifier that identifies at least one semantic event classifier that corresponds to the event.
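The step that turns per-image concept scores into one event-level feature vector can be sketched as below. The abstract does not say how the scores are combined, so averaging each concept over the image records is an assumption made here for illustration; the function name `event_feature_vector` is hypothetical.

```python
def event_feature_vector(concept_scores):
    """Average each concept's score over all image records of one event.

    concept_scores: one list per image record, each holding one probability
    per visual concept (all lists must have the same length).
    Averaging is an assumed aggregation rule, not one stated in the abstract.
    """
    if not concept_scores:
        raise ValueError("an event needs at least one image record")
    n_images = len(concept_scores)
    n_concepts = len(concept_scores[0])
    if any(len(scores) != n_concepts for scores in concept_scores):
        raise ValueError("every image record needs one score per concept")
    return [sum(scores[c] for scores in concept_scores) / n_images
            for c in range(n_concepts)]

# Two image records scored against two hypothetical visual concepts.
scores = [[1.0, 0.0],
          [0.5, 0.5]]
print(event_feature_vector(scores))
```

The resulting vector has one entry per visual concept regardless of how many image records the event contains, which is what lets a fixed-input classifier consume events of different sizes.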
Abstract:
A system and method for semantic event detection in digital image content records is provided in which an event-level “Bag-of-Features” (BOF) representation is used to model events, and generic semantic events are detected in a concept space instead of an original low-level visual feature space based on the BOF representation.
Abstract:
A method mines unknown content of a video by first selecting one or more low-level features of the video. For each selected feature, or combination of features, time series data is generated. The time series data is then self-correlated to identify similar segments of the video according to the low-level features. The similar segments are grouped into clusters to discover high-level patterns in the unknown content of the video.
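The idea of comparing a feature time series with itself and grouping similar segments can be sketched as follows. This is a simplified stand-in, not the method of the patent: the series is cut into fixed-length segments, similarity is measured by Euclidean distance rather than a true correlation, and segments are grouped greedily against the first member of each cluster. The function names and the `threshold` parameter are hypothetical.

```python
def split_into_segments(series, length):
    """Cut a time series into consecutive, non-overlapping segments."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]

def distance(a, b):
    """Euclidean distance between two equal-length segments."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_segments(segments, threshold):
    """Greedily group segments whose distance to a cluster's first member
    is at most `threshold`; returns lists of segment indices."""
    clusters = []
    for i, seg in enumerate(segments):
        for members in clusters:
            if distance(seg, segments[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([i])
    return clusters

# A toy low-level feature (for example, per-frame motion activity).
series = [0, 0, 1, 1, 0, 0, 1, 1, 5, 5]
segments = split_into_segments(series, 2)
print(cluster_segments(segments, threshold=0.5))  # → [[0, 2], [1, 3], [4]]
```

Here segments 0 and 2 and segments 1 and 3 repeat, which is the kind of recurring pattern the abstract describes, while segment 4 stands alone.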
Abstract:
An apparatus for retrieving a video picture includes a decoder section for decoding a coded bit stream of video picture data representing an arbitrary shape object and including shape information and texture information, a retrieval condition input section for inputting a retrieval condition for retrieval of a desired picture, a retrieval section for retrieving a picture meeting the retrieval condition by using shape information of the object decoded by the decoder section, and a display section for outputting the retrieved result obtained by the retrieval section.
Abstract:
A method and concomitant apparatus for comprehensively representing video information in a manner facilitating indexing of the video information. Specifically, a method according to the invention comprises the steps of dividing a continuous video stream into a plurality of video scenes; and at least one of the steps of dividing, using intra-scene motion analysis, at least one of the plurality of scenes into one or more layers; representing, as a mosaic, at least one of the plurality of scenes; computing, for at least one layer or scene, one or more content-related appearance attributes; and storing, in a database, the content-related appearance attributes or said mosaic representations.
Abstract:
Systems and methods of presenting media objects are described. In one aspect, a group of media objects is selected from a collection based upon media object relevance to one or more data structures of a selected media file of indexed, temporally-ordered data structures. One or more of the selected media file and the media objects of the selected group are transmitted to a client for contemporaneous presentation at a selected summarization level. In another aspect, media objects in the collection are grouped into multiple clusters based upon one or more media object relevance criteria. The media object clusters are arranged into a hierarchy of two or more levels. A selected cluster is transmitted to a client for contemporaneous presentation at a selected summarization level.
Abstract:
Object-oriented methods and systems for permitting a user to locate one or more video objects from one or more video clips over an interactive network are disclosed. The system includes one or more server computers (110) comprising storage (111) for video clips and databases of video object attributes, a communications network (120), and a client computer (130). The client computer contains a query interface to specify video object attribute information, including motion trajectory information (134), a browser interface to browse through stored video object attributes within the server computers, and an interactive video player.