Abstract:
Methods and systems for transferring media between media source devices and media sink devices are disclosed. Remote control units are used to indicate the media sink and media source devices for transferring media data between these elements.
Abstract:
Techniques for segmenting ordered information such as audio, video and text are provided by windowing and parameterizing an ordered information stream and storing the parameterized and windowed information in a two-dimensional representation such as a matrix. The similarity between the parameter vectors is determined, and an orthogonal matrix decomposition such as singular value decomposition is applied to the similarity matrix. The singular values or eigenvalues of the resulting decomposition indicate major components or segments of the ordered information. The boundaries of the major components may be determined using the singular vectors to provide, for example, smart cut-and-paste of ordered information in which boundaries are automatically identified by the singular vectors; automatic categorization and retrieval of ordered information; and automatic summarization of ordered information.
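The decomposition step can be illustrated with a minimal sketch. It is not the disclosed implementation: cosine similarity is an assumed similarity measure, and power iteration with deflation stands in for a full singular value decomposition (for a symmetric positive semi-definite similarity matrix the eigenvalues and singular values coincide). The names `cosine`, `similarity_matrix`, `top_components` and `segment_members` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two parameter vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(frames):
    """S[i][j] holds the similarity between windowed frames i and j."""
    return [[cosine(u, v) for v in frames] for u in frames]

def top_components(S, k, iterations=200):
    """Approximate the k largest eigenvalue/eigenvector pairs of symmetric S
    by power iteration, removing each found component (deflation)."""
    n = len(S)
    R = [row[:] for row in S]  # working copy that gets deflated
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        value = 0.0
        for _step in range(iterations):
            w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
            value = math.sqrt(sum(x * x for x in w))
            if value < 1e-12:
                break
            v = [x / value for x in w]
        components.append((value, v))
        for i in range(n):
            for j in range(n):
                R[i][j] -= value * v[i] * v[j]
    return components

def segment_members(vector, fraction=0.5):
    """Frames whose weight in a vector is at least `fraction` of its peak."""
    peak = max(abs(x) for x in vector)
    return [i for i, x in enumerate(vector) if abs(x) >= fraction * peak]

# Three frames of one kind followed by two frames of another kind.
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2
components = top_components(similarity_matrix(frames), k=2)
segments = [segment_members(vec) for _, vec in components]
```

In this toy input the two largest values correspond to the two runs of similar frames, and the frames that carry weight in each vector mark where a run starts and ends.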
Abstract:
A method of extracting audio excerpts comprises: segmenting audio data into a plurality of audio data segments; setting fitness criteria for the plurality of audio data segments; analyzing the plurality of audio data segments based on the fitness criteria; and selecting one of the plurality of audio data segments that satisfies the fitness criteria. In various exemplary embodiments, the method of extracting audio excerpts further comprises associating the selected one of the plurality of audio data segments with video data. In such embodiments, associating the selected one of the plurality of audio data segments with video data may comprise associating the selected one of the plurality of audio data segments with a keyframe.
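The four steps above can be sketched as follows. The abstract does not say what the fitness criteria are, so this sketch assumes fixed-length segments and a minimum RMS energy as a purely illustrative criterion; `segment_audio`, `rms` and `select_excerpt` are hypothetical names.

```python
import math

def segment_audio(samples, segment_len):
    """Split the audio samples into consecutive fixed-length segments."""
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

def rms(segment):
    """Root-mean-square energy of a segment (assumed fitness measure)."""
    return math.sqrt(sum(x * x for x in segment) / len(segment)) if segment else 0.0

def select_excerpt(samples, segment_len, min_rms):
    """Return (index, segment) for the highest-energy full-length segment
    whose RMS is at least min_rms, or None if no segment qualifies."""
    segments = segment_audio(samples, segment_len)
    candidates = [
        (rms(seg), index)
        for index, seg in enumerate(segments)
        if len(seg) == segment_len and rms(seg) >= min_rms
    ]
    if not candidates:
        return None
    _, best_index = max(candidates, key=lambda c: c[0])
    return best_index, segments[best_index]

samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.1] * 4
excerpt = select_excerpt(samples, segment_len=4, min_rms=0.2)
no_excerpt = select_excerpt(samples, segment_len=4, min_rms=0.9)
```

Associating the selected segment with a keyframe could then be a lookup keyed on the returned segment index, although the abstract leaves that mapping open.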
Abstract:
Systems and methods generate a video for virtual reality wherein the video is both panoramic and spatially indexed. In embodiments, a video system includes a controller, a database including spatial data, and a user interface in which a video is rendered in response to a specified action. The video includes a plurality of images retrieved from the database. Each of the images is panoramic and spatially indexed in accordance with a predetermined position along a virtual path in a virtual environment.
Abstract:
A method and apparatus for providing multi-resolution video to multiple users under hybrid human and automatic control. Initial environment and close-up images are captured using a first camera and a PTZ camera. The initial images are then stored in memory. Current environment and close-up images are captured and an estimated difference between the initial and current images and the true image is determined. The estimated differences are weighted and compared and the stored images are updated. A close-up image is then provided to each user of the system. The close-up camera is then directed to a portion of the environment image having high distortion, and current environment and close-up images are captured again.
Abstract:
Methods for segmenting audio-video recordings of meetings containing slide presentations by one or more speakers are described. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, these segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order. Merged clustered audio intervals corresponding to a single speaker are then used as training data for a speaker segmentation system. Using speaker identification techniques, the full video is then segmented into individual presentations based on the extent of each presenter's speech. The speaker identification system optionally includes the construction of a hidden Markov model trained on the audio data from each slide interval. A Viterbi assignment then segments the audio according to speaker.
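The final Viterbi assignment can be sketched under strong simplifying assumptions: each speaker is modelled by a single one-dimensional Gaussian rather than a trained hidden Markov model, and the transition structure is a fixed probability of staying with the current speaker. The names `log_gauss` and `viterbi` and the `stay_prob` parameter are hypothetical.

```python
import math

def log_gauss(x, mean, var=1.0):
    """Log-density of x under a 1-D Gaussian (toy speaker model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(log_emit, stay_prob=0.9):
    """Most likely speaker index per frame.
    log_emit[t][s] is the log-likelihood of frame t under speaker model s."""
    n_states = len(log_emit[0])
    log_stay = math.log(stay_prob)
    log_switch = (math.log((1.0 - stay_prob) / (n_states - 1))
                  if n_states > 1 else float("-inf"))

    def step(prev, cur):
        return log_stay if prev == cur else log_switch

    score = list(log_emit[0])  # uniform prior adds a constant, so it is omitted
    back = []
    for frame in log_emit[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + step(p, s))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + step(best_prev, s) + frame[s])
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

speaker_means = [0.0, 5.0]  # illustrative feature means for two speakers
features = [0.1, 0.0, 2.6, -0.1, 5.0, 5.2, 4.9]
log_emit = [[log_gauss(x, m) for m in speaker_means] for x in features]
path = viterbi(log_emit)
```

The third frame is marginally closer to the second speaker's model, but the cost of switching twice keeps it assigned to the first speaker, which is the smoothing effect a Viterbi pass is meant to provide.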
Abstract:
A system and method for detecting useful images and for ranking images in order of usefulness based on a vignette score describing how closely each one resembles a “vignette,” or a central object or image surrounded by a featureless or deemphasized background. Several methods for determining an image's vignette score are disclosed as examples. Variance ratio analysis entails calculation of the ratio of variance between the edge region of the image and the entire image. Statistical model analysis entails developing a statistical classifier capable of determining a statistical model of each image class based on pre-entered training data. Spatial frequency analysis involves estimating the energy at different spatial frequencies in the central and edge regions and in the image as a whole. A vignette score is calculated as the ratio of mid-frequency energies in the edge region to the mid-frequency energies of the entire image.
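Of the analyses listed, the variance ratio is the simplest to illustrate. The sketch below assumes a grayscale image stored as a list of rows and defines the edge region as every pixel within `border` pixels of the image boundary; both choices, and the names `variance` and `variance_ratio`, are assumptions rather than details from the abstract.

```python
def variance(values):
    """Population variance of a list of pixel intensities."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_ratio(image, border=1):
    """Ratio of edge-region variance to whole-image variance.
    Returns 0.0 for a completely flat image to avoid dividing by zero."""
    height, width = len(image), len(image[0])
    all_pixels = [p for row in image for p in row]
    edge_pixels = [
        image[y][x]
        for y in range(height)
        for x in range(width)
        if y < border or y >= height - border or x < border or x >= width - border
    ]
    total = variance(all_pixels)
    return variance(edge_pixels) / total if total else 0.0

# Bright central object on a flat background.
vignette_like = [
    [0, 0, 0, 0],
    [0, 10, 10, 0],
    [0, 10, 10, 0],
    [0, 0, 0, 0],
]
# Busy border around a flat centre.
cluttered = [
    [0, 9, 0, 9],
    [9, 5, 5, 0],
    [0, 5, 5, 9],
    [9, 0, 9, 0],
]
ratio_vignette = variance_ratio(vignette_like)
ratio_cluttered = variance_ratio(cluttered)
```

Under this sketch a featureless border gives a ratio near zero, while a busy border pushes the ratio up, so a lower ratio would indicate a more vignette-like image.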
Abstract:
A system, method and apparatus for remotely annotating an object. An embodiment of the present invention includes a video camera projector that captures video images of a local object and projects annotations made by a user at a remote location onto said local object.
Abstract:
Embodiments of the present invention provide a method for producing a summary of a digital file on one or more computers. The method includes segmenting the digital file into a plurality of segments, clustering said segments into a plurality of clusters and selecting a cluster from said plurality of clusters wherein said selected cluster includes segments representative of said digital file. Upon selection of a cluster, a segment of the cluster is provided as a summary of said digital file.
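The segment, cluster and select steps can be sketched with several assumptions that are not in the abstract: each segment is reduced to a single numeric feature, clustering is a greedy threshold rule, the largest cluster is treated as the representative one, and the segment closest to that cluster's centroid is returned. `cluster_segments` and `summarize` are hypothetical names.

```python
def cluster_segments(features, threshold):
    """Greedy clustering: a segment joins the first cluster whose centroid
    lies within `threshold` of its feature, else it starts a new cluster.
    Returns clusters as lists of segment indices."""
    clusters = []
    for index, value in enumerate(features):
        for members in clusters:
            centroid = sum(features[m] for m in members) / len(members)
            if abs(value - centroid) <= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters

def summarize(features, threshold):
    """Index of the segment offered as the summary, or None if there are
    no segments. Assumes the largest cluster is the representative one."""
    if not features:
        return None
    clusters = cluster_segments(features, threshold)
    largest = max(clusters, key=len)
    centroid = sum(features[m] for m in largest) / len(largest)
    return min(largest, key=lambda m: abs(features[m] - centroid))

segment_features = [1.0, 1.4, 9.0, 0.9, 1.1, 9.3]
clusters = cluster_segments(segment_features, threshold=1.0)
summary_index = summarize(segment_features, threshold=1.0)
```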
Abstract:
Optimal summaries of a linear media source are automatically produced by parameterizing a linear media source. The parameterized linear media source is used to create a similarity array in which each array element includes the value of a similarity measurement between two portions of the parameterized media signal. A segment fitness function, adapted for measuring the similarity between a segment of the parameterized media signal and the entire parameterized media signal, is optimized to find an optimal segment location. The portion of the linear media source corresponding to the optimal segment location is selected as the optimal summary. This method produces optimal summaries of any type of linear media, such as video, audio, or text information.
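A minimal sketch of the similarity array and the segment fitness search follows. It assumes scalar features per portion, a similarity of 1 / (1 + |a - b|), a fitness equal to the average similarity between the segment's portions and all portions, and an exhaustive search over start positions for a fixed summary length. These choices and the names `similarity_array` and `best_summary` are illustrative, not taken from the abstract.

```python
def similarity_array(features):
    """S[i][j] is the similarity between portions i and j (assumed measure)."""
    return [[1.0 / (1.0 + abs(a - b)) for b in features] for a in features]

def best_summary(S, length):
    """Return (start, fitness) of the fixed-length segment whose portions
    are, on average, most similar to the whole signal."""
    n = len(S)
    if length < 1 or length > n:
        raise ValueError("length must be between 1 and the number of portions")
    row_sums = [sum(row) for row in S]
    best_start, best_fitness = 0, float("-inf")
    for start in range(n - length + 1):
        fitness = sum(row_sums[start:start + length]) / (n * length)
        if fitness > best_fitness:  # strict comparison keeps the earliest tie
            best_start, best_fitness = start, fitness
    return best_start, best_fitness

features = [0, 0, 5, 5, 5, 5, 9]
S = similarity_array(features)
start, fitness = best_summary(S, length=2)
```

Here the chosen segment falls inside the long run of identical portions, since those portions resemble the most of the rest of the signal.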