Abstract:
A method for determining a classification for a video segment, comprising the steps of: breaking the video segment into a plurality of short-term video slices, each including a plurality of video frames and an audio signal; analyzing the video frames for each short-term video slice to form a plurality of region tracks; analyzing each region track to form a visual feature vector and a motion feature vector; analyzing the audio signal for each short-term video slice to determine an audio feature vector; forming a plurality of short-term audio-visual atoms for each short-term video slice by combining the visual feature vector and the motion feature vector for a particular region track with the corresponding audio feature vector; and using a classifier to determine a classification for the video segment responsive to the short-term audio-visual atoms.
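The atom-forming step can be illustrated with a minimal sketch. It assumes each region track in a slice already has a fixed-length visual vector and motion vector, and that the slice has a single audio vector; the function name `form_atoms` and the use of plain concatenation are illustrative assumptions, not details stated in the abstract.

```python
import numpy as np


def form_atoms(visual, motion, audio):
    """Build one short-term audio-visual atom per region track (hypothetical helper).

    visual: array of shape (n_tracks, dv), one visual feature vector per track
    motion: array of shape (n_tracks, dm), one motion feature vector per track
    audio:  array of shape (da,), the audio feature vector of the slice
    Returns an array of shape (n_tracks, dv + dm + da).
    """
    visual = np.asarray(visual, dtype=float)
    motion = np.asarray(motion, dtype=float)
    audio = np.asarray(audio, dtype=float)
    if visual.shape[0] != motion.shape[0]:
        raise ValueError("visual and motion must describe the same region tracks")
    # Every track in the slice is paired with the same slice-level audio vector.
    audio_tiled = np.tile(audio, (visual.shape[0], 1))
    return np.hstack([visual, motion, audio_tiled])
```

For a slice with two region tracks, `form_atoms([[1, 2], [3, 4]], [[0.5], [0.25]], [9, 8, 7])` returns a 2 x 6 array; the atoms from all slices would then be passed to whatever classifier is used for the segment.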
Abstract:
A context-based concept fusion method detects a first concept in an image record. The method includes automatically determining at least one other concept in the image record which has a contextual relationship with the first concept and which is to be labeled by a user of the method; and labeling the at least one other concept by the user with a ground truth label to be used in the context-based concept fusion method to improve detection of the first concept in the image record.
Abstract:
The invention provides a system and method for integrating multimedia descriptions in a way that allows humans, software components or devices to easily identify, represent, manage, retrieve, and categorize the multimedia content. In this manner, a user who may be interested in locating a specific piece of multimedia content from a database, the Internet, or broadcast media, for example, may search for and find the multimedia content. To this end, the system and method receive multimedia content and separate it into components which are assigned to multimedia categories, such as image, video, audio, synthetic and text. Within each of the multimedia categories, the multimedia content is classified and descriptions of the multimedia content are generated. The descriptions are then formatted and integrated using a multimedia integration description scheme, and a multimedia integration description is generated for the multimedia content. The multimedia integration description is then stored in a database. As a result, a user may query a search engine, which then retrieves from the database the multimedia content whose integration description matches the query criteria specified by the user. The search engine can then provide the user with a useful search result based on the multimedia integration description.
Abstract:
A system and method are provided for editing and parsing compressed digital information. The compressed digital information may include visual information which is edited and parsed in the compressed domain. In a preferred embodiment, the present invention provides a method for detecting moving objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more captured scenes of video.
Abstract:
Systems and methods for describing image content establish image description records which include an object set (24), an object hierarchy (26) and entity relation graphs (28). For image content, image objects can include global objects (O0 8) and local objects (O1 2 and O2 6). The image objects are further defined by a number of features of different classes (36, 38 and 40), which in turn are further defined by a number of feature descriptors. The relationships between and among the objects in the object set are defined by the object hierarchy (26) and entity relation graphs (28). The image description records provide a standard vehicle for describing the content and context of image information for subsequent access and processing by computer applications such as search engines, filters, and archive systems.
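The record structure described above can be sketched as a small data model. The class names, field names, and the tuple form used for relations are hypothetical choices made for illustration; the abstract specifies only that a record holds an object set, an object hierarchy, and entity relation graphs, and that objects carry features grouped into classes with descriptors.

```python
from dataclasses import dataclass, field


@dataclass
class ImageObject:
    object_id: str
    kind: str  # "global" or "local"
    # feature class name -> {descriptor name: value}; the names are illustrative
    features: dict = field(default_factory=dict)


@dataclass
class ImageDescriptionRecord:
    object_set: dict = field(default_factory=dict)        # object id -> ImageObject
    object_hierarchy: dict = field(default_factory=dict)  # parent id -> list of child ids
    entity_relations: list = field(default_factory=list)  # (source id, relation, target id)

    def add_object(self, obj, parent=None):
        """Register an object and optionally place it under a parent in the hierarchy."""
        if parent is not None and parent not in self.object_set:
            raise KeyError(f"unknown parent object: {parent}")
        self.object_set[obj.object_id] = obj
        if parent is not None:
            self.object_hierarchy.setdefault(parent, []).append(obj.object_id)

    def relate(self, source, relation, target):
        """Add one labeled edge to the entity relation graph."""
        for object_id in (source, target):
            if object_id not in self.object_set:
                raise KeyError(f"unknown object: {object_id}")
        self.entity_relations.append((source, relation, target))

    def children(self, parent):
        """Return the ids of the objects directly below a parent."""
        return list(self.object_hierarchy.get(parent, []))
```

A record for an image with one global object and two local objects would call `add_object` three times, passing the global object's id as `parent` for the local ones, and then use `relate` for relationships such as "left of" that the hierarchy alone cannot express.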
Abstract:
Digital watermarks are embedded in image data (102) in order to enable authentication of the image data and/or replacement of rejected portions of the image data. Authentication codes are derived by comparing selected discrete cosine transform (DCT) (104) coefficients within DCT data (106) derived from the original, spatial-domain image data. The authentication codes thus generated are embedded in DCT coefficients (612) other than the ones which were used to derive the authentication codes. The resulting watermarked data can be sent or made available to one or more recipients who can compress or otherwise use the watermarked data. Image data derived from the watermarked data (e.g., compressed versions of the watermarked data) can be authenticated by: extracting the embedded authentication codes; comparing DCT coefficients derived from the coefficients from which the original authentication codes were generated; and determining whether the compared DCT coefficients are consistent with the extracted authentication codes.
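The idea of deriving codes by comparing coefficients, and later checking consistency, can be sketched as follows. This is a simplified illustration under stated assumptions: one bit is derived per coefficient position by comparing two blocks, compression is modeled as uniform quantization with a single step size, and the embedding and extraction of the bits is omitted. The function names are hypothetical. The sketch relies on the fact that rounding is monotone, so quantizing two coefficients with the same step cannot reverse their order, although it can make them equal.

```python
import numpy as np


def derive_bits(block_a, block_b, positions):
    """One authentication bit per position: 1 if block_a's coefficient is >= block_b's."""
    return [int(block_a[p] >= block_b[p]) for p in positions]


def quantize(block, step):
    """Model lossy compression as uniform quantization of every coefficient."""
    return np.round(np.asarray(block, dtype=float) / step) * step


def is_consistent(block_a, block_b, positions, bits):
    """Check that the coefficient ordering does not contradict the stored bits.

    Equality is accepted for either bit value because quantization may
    collapse two nearby coefficients onto the same level.
    """
    for p, bit in zip(positions, bits):
        diff = block_a[p] - block_b[p]
        if bit == 1 and diff < 0:
            return False
        if bit == 0 and diff > 0:
            return False
    return True
```

Bits derived from the original coefficient blocks remain consistent after both blocks pass through `quantize` with the same step, whereas changing a coefficient enough to reverse an ordering makes `is_consistent` return `False`.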
Abstract:
Systems and methods for searching a database of media content wherein the user can dynamically and interactively perform searches and navigate search results. One or more search anchors are received, and at least one of the search anchors is associated with an anchor cell on a navigation map. One or more documents assigned to at least one cell on the navigation map can be determined, and the cells are populated with search results based at least in part on the search anchors. At least one of the documents is then displayed to a user.
Abstract:
A system and method for labeling and classifying multimedia data are provided that include novel label propagation techniques and classification function characteristics. The system and method correct and propagate a small number of potentially erroneous labels to a large amount of multimedia data and generate optimal ways of ranking, classifying, and presenting the data sets. The disclosed systems and methods improve upon prior systems and methods and provide an improved approach to the problems of imbalanced data sets and incorrect label data.
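The abstract does not describe its propagation technique, so the sketch below shows only the general idea of spreading a few labels across a similarity graph. It uses a generic iterative scheme (each node repeatedly averages its neighbors' scores while being pulled back toward its initial label), which is a stand-in and not the disclosed method. The function name, the `alpha` parameter, and the row normalization are all assumptions for illustration.

```python
import numpy as np


def propagate_labels(weights, initial, alpha=0.8, iterations=50):
    """Generic graph label propagation (illustrative stand-in, not the disclosed method).

    weights: (n, n) non-negative similarity matrix between data items
    initial: (n, n_classes) one-hot rows for labeled items, zero rows for unlabeled ones
    alpha:   how strongly scores follow neighbors versus the initial labels
    Returns an (n, n_classes) score matrix; argmax over each row gives a label.
    """
    W = np.asarray(weights, dtype=float)
    Y = np.asarray(initial, dtype=float)
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0          # avoid dividing by zero for isolated items
    S = W / degree[:, None]            # row-normalized transition matrix
    F = Y.copy()
    for _ in range(iterations):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F
```

On a chain of four items where only the two end items are labeled with different classes, each unlabeled item ends up with the class of the nearer end. Because labeled items are only pulled toward their initial labels rather than clamped to them, a scheme of this kind can in principle also revise a label that disagrees with its neighborhood.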