Abstract:
A method of extracting audio excerpts comprises: segmenting audio data into a plurality of audio data segments; setting fitness criteria for the plurality of audio data segments; analyzing the plurality of audio data segments based on the fitness criteria; and selecting one of the plurality of audio data segments that satisfies the fitness criteria. In various exemplary embodiments, the method of extracting audio excerpts further comprises associating the selected one of the plurality of audio data segments with video data. In such embodiments, associating the selected one of the plurality of audio data segments with video data may comprise associating the selected one of the plurality of audio data segments with a keyframe.
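The segment, analyze, and select steps described above can be illustrated with a minimal Python sketch. The abstract does not say how segments are formed or which fitness criteria are used, so the fixed-length segmentation, the minimum-length requirement, and the mean-energy score below are assumptions chosen only for illustration, and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def segment(samples: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Split samples into consecutive fixed-length segments (the last may be shorter)."""
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]


def mean_energy(seg: Sequence[float]) -> float:
    """Average squared amplitude; an assumed stand-in for a fitness score."""
    return sum(x * x for x in seg) / len(seg) if seg else 0.0


def select_excerpt(
    samples: Sequence[float],
    segment_len: int,
    min_len: int,
    score: Callable[[Sequence[float]], float] = mean_energy,
) -> Optional[Tuple[int, Sequence[float]]]:
    """Return (index, segment) of the best-scoring segment that meets min_len.

    Returns None when no segment satisfies the assumed length criterion.
    """
    candidates = [
        (i, s) for i, s in enumerate(segment(samples, segment_len)) if len(s) >= min_len
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c[1]))
```

For example, `select_excerpt(samples, segment_len=2, min_len=2)` considers only the full two-sample segments and returns the one with the highest mean energy. The returned index could then be used to associate the chosen segment with a keyframe.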
Abstract:
Systems and methods generate a video for virtual reality wherein the video is both panoramic and spatially indexed. In embodiments, a video system includes a controller, a database including spatial data, and a user interface in which a video is rendered in response to a specified action. The video includes a plurality of images retrieved from the database. Each of the images is panoramic and spatially indexed in accordance with a predetermined position along a virtual path in a virtual environment.
Abstract:
A method and apparatus for providing multi-resolution video to multiple users under hybrid human and automatic control. Initial environment and close-up images are captured using a first camera and a PTZ camera. The initial images are then stored in memory. Current environment and close-up images are captured, and an estimated difference between the initial and current images and the true image is determined. The estimated differences are weighted and compared, and the stored images are updated. A close-up image is then provided to each user of the system. The close-up camera is then directed to a portion of the environment image having high distortion, and current environment and close-up images are captured again.
Abstract:
Methods for segmenting audio-video recordings of meetings containing slide presentations by one or more speakers are described. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, these segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order. Merged clustered audio intervals corresponding to a single speaker are then used as training data for a speaker segmentation system. Using speaker identification techniques, the full video is then segmented into individual presentations based on the extent of each presenter's speech. The speaker identification system optionally includes the construction of a hidden Markov model trained on the audio data from each slide interval. A Viterbi assignment then segments the audio according to speaker.
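The final Viterbi assignment step can be sketched as follows. This is a toy illustration rather than the model described in the abstract: it assumes one HMM state per speaker, a single scalar feature per audio frame, a one-dimensional Gaussian emission model per speaker, a uniform initial distribution, and a hypothetical `stay_prob` parameter that favors remaining with the current speaker.

```python
import math
from typing import List, Sequence


def gaussian_loglik(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian (assumed emission model for one speaker)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_segment(
    frames: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    stay_prob: float = 0.9,
) -> List[int]:
    """Return the most likely speaker index for each frame."""
    n = len(means)
    if not frames:
        return []
    if n == 1:
        return [0] * len(frames)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    # Uniform initial distribution: its constant term does not affect the argmax.
    score = [gaussian_loglik(frames[0], means[s], variances[s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in frames[1:]:
        new_score, pointers = [], []
        for s in range(n):
            best_prev = max(
                range(n),
                key=lambda p: score[p] + (log_stay if p == s else log_switch),
            )
            trans = log_stay if best_prev == s else log_switch
            emit = gaussian_loglik(x, means[s], variances[s])
            new_score.append(score[best_prev] + trans + emit)
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In practice the per-speaker emission models would be trained on the clustered slide-interval audio and would operate on multi-dimensional acoustic features. The high self-transition probability is what keeps a single atypical frame from producing a spurious speaker change.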
Abstract:
Methods and systems for transferring media between media source devices and media sink devices are disclosed. Remote control units are used to indicate the media sink and media source devices for transferring media data between these elements.
Abstract:
This invention relates to a force-feedback apparatus which includes a stylus that is equipped with an electromagnetic device or a freely rotating ball. The stylus is functionally coupled to a controller which is capable of applying a magnetic field to the electromagnetic device or to the rotating ball, which results in a force being created between the stylus and a surface. This invention also relates to a method of using a force-feedback stylus including moving a force-feedback stylus over a surface, controlling a force-feedback device via a controller coupled to the force-feedback stylus and applying a force to the force-feedback stylus via the force-feedback device, the force being determined for at least features on the surface.
Abstract:
A stream of ordered information, such as, for example, audio, video and/or text data, can be windowed and parameterized. A similarity between the parameterized and windowed stream of ordered information can be determined, and a probabilistic decomposition or probabilistic matrix factorization, such as non-negative matrix factorization, can be applied to the similarity matrix. The component matrices resulting from the decomposition indicate major components or segments of the ordered information. Excerpts can then be extracted from the stream of ordered information based on the component matrices to generate a summary of the stream of ordered information.
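As a rough illustration of the pipeline above, the following pure-Python sketch builds a cosine similarity matrix from per-window feature vectors and factorizes it with non-negative matrix factorization. The abstract does not specify the similarity measure or the factorization algorithm, so the cosine measure, the Lee-Seung multiplicative updates, the random initialization, and all function names here are assumptions. The feature vectors are assumed to be non-negative (for example, magnitude spectra) so that the similarity matrix is non-negative as well.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(features: Matrix) -> Matrix:
    """Pairwise cosine similarity between window feature vectors."""
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(features, norms)]
        for fi, ni in zip(features, norms)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def nmf(v: Matrix, rank: int, iterations: int = 200, seed: int = 0, eps: float = 1e-12):
    """Factor v ~ w @ h with non-negative w, h using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


def reconstruction_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius distance between v and w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for rv, ra in zip(v, approx) for a, b in zip(rv, ra))


def dominant_components(w: Matrix) -> List[int]:
    """Index of the strongest component for each window."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in w]
```

One possible way to form a summary from the result is to take, for each component, a run of consecutive windows in which that component dominates. This is only one reading of "based on the component matrices", and other selection rules would fit the description equally well.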
Abstract:
Methods and systems for classifying images, such as photographs, allow a user to incorporate subjective judgments regarding photograph qualities when making classification decisions. A slide-show interface allows a user to classify and advance photographs with a one-key action or a single interaction event. The interface presents related information relevant to a displayed photograph that is to be classified, such as contiguous photographs, similar photographs, and other versions of the same photograph. The methods and systems provide an overview interface which allows a user to review and refine classification decisions in the context of the original sequence of photographs.
Abstract:
Methods for interactively selecting video queries consisting of training images from a video for a video similarity search and for displaying the results of the similarity search are disclosed. The user selects a time interval in the video as a query definition of training images for training an image class statistical model. Time intervals can be as short as one frame or consist of disjoint segments or shots. A statistical model of the image class defined by the training images is calculated on-the-fly from feature vectors extracted from transforms of the training images. For each frame in the video, a feature vector is extracted from the transform of the frame, and a similarity measure is calculated using the feature vector and the image class statistical model. The similarity measure is derived from the likelihood of a Gaussian model producing the frame. The similarity is then presented graphically, which allows the time structure of the video to be visualized and browsed. Similarity can be rapidly calculated for other video files as well, which enables content-based retrieval by example. A content-aware video browser featuring interactive similarity measurement is presented. A method for selecting training segments involves mouse click-and-drag operations over a time bar representing the duration of the video; similarity results are displayed as shades in the time bar. Another method involves selecting periodic frames of the video as endpoints for the training segment.
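The Gaussian likelihood similarity can be sketched in a few lines of Python. The abstract does not state the covariance structure or which image transform produces the features, so this sketch assumes a diagonal-covariance Gaussian, treats the feature vectors as already extracted, and adds a hypothetical variance floor so that a one-frame training interval still yields a usable model.

```python
import math
from typing import List, Sequence, Tuple


def fit_diagonal_gaussian(
    vectors: Sequence[Sequence[float]], var_floor: float = 1e-6
) -> Tuple[List[float], List[float]]:
    """Estimate per-dimension mean and variance from the training feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(
    vector: Sequence[float], means: Sequence[float], variances: Sequence[float]
) -> float:
    """Log-likelihood of one frame's feature vector under the diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vector, means, variances)
    )


def similarity_track(
    frames: Sequence[Sequence[float]], training: Sequence[Sequence[float]]
) -> List[float]:
    """Score every frame against a model trained on the user-selected interval."""
    means, variances = fit_diagonal_gaussian(training)
    return [log_likelihood(f, means, variances) for f in frames]
```

The resulting per-frame scores could be mapped to shades along a time bar, with higher log-likelihood meaning closer similarity to the selected interval. Because fitting needs only means and variances, the model can be recomputed each time the user changes the selection.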
Abstract:
A system in accordance with one embodiment of the present invention comprises a device for facilitating video communication between a remote participant and another location. The device can comprise a screen adapted to display the remote participant, the screen having a posture adapted to be controlled by the remote participant. A camera can be mounted adjacent to the screen, and can allow the subject to view a selected conference participant or a desired location such that when the camera is trained on the selected participant or desired location a gaze of the remote participant displayed by the screen appears substantially directed at the selected participant or desired location.