Abstract:
An information processing apparatus includes: an extracting means for extracting a feature volume from a predetermined content; and a computing means for computing an evaluation axis that classifies a first content and a second content by using a first feature volume extracted from the first content by the extracting means or a second feature volume extracted from the second content by the extracting means.
Abstract:
According to one embodiment, an electronic apparatus extracts face images of persons from video content data, outputs time stamp information indicating the time points at which each extracted face image appears in the video content data, and displays the face images in each column of a plurality of face image display areas arranged in a matrix based on the time stamp information. The apparatus detects the presence or absence of a face area in each frame constituting the video content data and decides a cutout range for each detected face area. When the decided cutout range protrudes outside the frame, the apparatus adjusts it.
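The abstract does not say how the cutout range is adjusted. As a minimal sketch, assuming the adjustment shifts the rectangle back inside the frame (and shrinks it first if it is larger than the frame), the step might look like the following. The function name `clamp_cutout` and its parameters are hypothetical and do not come from the patent.

```python
def clamp_cutout(x, y, w, h, frame_w, frame_h):
    """Keep a face cutout rectangle inside the frame.

    (x, y) is the top-left corner of the cutout; w and h are its size.
    Assumption: the cutout is shrunk only if it is larger than the frame,
    then shifted so that no edge protrudes.
    """
    # Shrink the cutout if it cannot fit in the frame at all.
    w = min(w, frame_w)
    h = min(h, frame_h)
    # Shift the cutout so that it lies within [0, frame size].
    x = max(0, min(x, frame_w - w))
    y = max(0, min(y, frame_h - h))
    return x, y, w, h


# A cutout that protrudes past the right and bottom edges of a 100x100 frame.
adjusted = clamp_cutout(80, 90, 50, 50, 100, 100)
```

Shifting keeps the cutout size constant, which suits a fixed-size thumbnail grid. Cropping the protruding part would be an equally plausible reading of the abstract.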
Abstract:
Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.
Abstract:
A perimeter around a detected object in a frame of image data can be generated in a first coordinate system. The perimeter can be converted from the first coordinate system into a second coordinate system having the same aspect ratio as the first coordinate system. A first metadata entry can include dimensions of image data in the second coordinate system. A second metadata entry can provide a location and dimensions of the converted perimeter in the second coordinate system. Additional metadata can indicate matching objects between frames, position of an object relative to other objects in a frame, a probability that an object is correctly detected, and a total number of objects detected across multiple frames of image data.
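Because the two coordinate systems share an aspect ratio, the conversion reduces to a single uniform scale factor. The sketch below illustrates that idea. The `Rect` type and the `convert_perimeter` function are hypothetical names chosen for illustration, and a rectangular perimeter is an assumption, since the abstract does not specify the perimeter's shape.

```python
import math
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned perimeter: top-left corner plus width and height."""
    x: float
    y: float
    width: float
    height: float


def convert_perimeter(rect, src_size, dst_size):
    """Convert a perimeter between coordinate systems of equal aspect ratio.

    src_size and dst_size are (width, height) pairs for the first and
    second coordinate systems respectively.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # Equal aspect ratios mean src_w / src_h == dst_w / dst_h.
    if not math.isclose(src_w * dst_h, src_h * dst_w):
        raise ValueError("coordinate systems must share an aspect ratio")
    scale = dst_w / src_w  # the same factor applies to both axes
    return Rect(rect.x * scale, rect.y * scale,
                rect.width * scale, rect.height * scale)


# A perimeter detected in a 1920x1080 frame, expressed in a 960x540 system.
converted = convert_perimeter(Rect(192, 108, 384, 216), (1920, 1080), (960, 540))
```

The dimensions passed as `dst_size` correspond to what the first metadata entry would record, and the returned rectangle to the second.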
Abstract:
A computer-implemented method includes accessing a digital image including a plurality of faces including a first face and a second face. The computer-implemented method includes identifying a plurality of identification regions of the digital image including a first identification region associated with the first face and a second identification region associated with the second face. The computer-implemented method also includes assigning the digital image to a first face cluster of a plurality of face clusters when a difference between data descriptive of the first identification region and data descriptive of a face cluster identification region of the first face cluster satisfies a threshold. The computer-implemented method further includes assigning the digital image to a second face cluster of the plurality of face clusters based at least partially on a probability of the second face and the first face appearing together in an image.
Abstract:
The present invention provides an apparatus and method for extracting the content of a video, image, and/or audio file or podcast, analyzing the content, and then providing a targeted advertisement, search capability and/or other functionality based on the content of the file or podcast.
Abstract:
A plurality of sets of face images associated with a video is obtained. Each set of face images corresponds to a particular person depicted in the video. Of the people associated with the plurality of sets of face images, one or more of those people are selected to be included in a facial summary by analyzing the plurality of sets of face images and/or the video. For each of the selected one or more people, a face image to use in the facial summary is selected. The facial summary is laid out using the selected face images.
Abstract:
A method and apparatus for video retrieval and cueing automatically detect human faces in a video and identify face-specific video frames so as to allow retrieval and viewing of person-specific video segments. In one embodiment, the method locates human faces in the video, stores the time stamps associated with each face, displays a single image associated with each face, matches each face against a database, computes face locations with respect to a common 3D coordinate system, and provides a means of displaying: 1) information retrieved from the database associated with a selected person or people, 2) the path of travel associated with a selected person or people, 3) an interaction graph of the people in the video, and 4) video segments associated with each person and/or face. The method may also provide the ability to input and store text annotations associated with each person, face, and video segment, and the ability to enroll people in and remove people from the database. Videos of non-human objects may be processed in a similar manner. Because of the rules governing abstracts, this abstract should not be used to construe the claims.
Abstract:
The information processing apparatus according to the present invention includes: a moving picture analysis unit that analyzes moving picture data, which includes a plurality of images and audio associated with time information, and generates moving picture metadata relating to a plurality of feature quantities characterizing the moving picture; a comic display conversion unit that extracts a plurality of images from the moving picture data based on the moving picture metadata, divides a predetermined display region into frames, converts the arrangement of the extracted images into a comic-like arrangement, and generates frame information including information about the images arranged in each of the frames; and a comic display data generation unit that generates comic display data including at least the frame information, the data of the extracted images, and the audio data of the moving picture.
Abstract:
The present disclosure discloses a method for identifying individuals in a multimedia stream originating from a video conferencing terminal or a Multipoint Control Unit, including executing a face detection process on the multimedia stream; defining subsets including facial images of one or more individuals, where the subsets are ranked according to a probability that their respective one or more individuals will appear in a video stream; comparing a detected face to the subsets in consecutive order starting with a most probable subset, until a match is found; and storing an identity of the detected face as searchable metadata in a content database in response to the detected face matching a facial image in one of the subsets.
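The ranked-subset comparison described above can be sketched as a nested search that stops at the first match. This is only an illustration under stated assumptions: the function `identify_face`, the dictionary layout of each subset, and the `is_match` callback are all hypothetical, and the abstract does not specify how faces are compared or how subset probabilities are obtained.

```python
def identify_face(detected, ranked_subsets, is_match):
    """Return the identity of the first reference face that matches.

    ranked_subsets: list of dicts mapping identity -> reference face data,
        already ordered from most probable to least probable subset.
    is_match: caller-supplied function (detected, reference) -> bool.
    Returns None when no subset contains a match.
    """
    for subset in ranked_subsets:  # most probable subset first
        for identity, reference in subset.items():
            if is_match(detected, reference):
                return identity  # stop at the first match
    return None


# Toy data: face "signatures" are plain strings and matching is equality.
subsets = [
    {"alice": "sig-a", "bob": "sig-b"},  # e.g. invited meeting participants
    {"carol": "sig-c"},                  # e.g. the rest of the department
]
content_db = {}
identity = identify_face("sig-c", subsets, lambda d, r: d == r)
if identity is not None:
    content_db["detected_face"] = identity  # stored as searchable metadata
```

Searching the most probable subset first keeps the number of comparisons small in the common case, which appears to be the motivation for ranking the subsets.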