Abstract:
Techniques for classifying video frames using statistical models of transform coefficients are disclosed. After optionally being decimated in time and space, image frames are transformed using a discrete cosine transform or Hadamard transform. The methods disclosed model image composition and operate on grayscale images. The resulting transform matrices are reduced using truncation, principal component analysis, or linear discriminant analysis to produce feature vectors. Feature vectors of training images for image classes are used to compute image class statistical models. Once image class statistical models are derived, individual frames are classified by the maximum likelihood resulting from the image class statistical models. Thus, the probabilities that a feature vector derived from a frame would be produced from each of the image class statistical models are computed. The frame is classified into the image class corresponding to the image class statistical model which produced the highest probability for the feature vector derived from the frame. Optionally, frame sequence information is taken into account by applying a hidden Markov model to represent image class transitions from the previous frame to the current frame. After computing all class probabilities for all frames in the video or sequence of frames using the image class statistical models and the image class transition probabilities, the final class is selected as having the maximum likelihood. Previous frames are selected in reverse order based upon their likelihood given determined current states.
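The classification step described above can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian models for each image class and already-reduced feature vectors. The abstract does not fix the model form, and the function names (`fit_diag_gaussian`, `classify_frame`, `viterbi`) are hypothetical. The `viterbi` function shows the optional hidden Markov model stage, including the backward pass that selects previous frames' classes in reverse order.

```python
import math

def fit_diag_gaussian(vectors):
    """Estimate a per-dimension mean and variance from training feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # Floor the variance so a constant dimension cannot cause division by zero.
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(x, model):
    """Log-probability of feature vector x under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
               for xi, m, s in zip(x, mean, var))

def classify_frame(x, models):
    """Maximum-likelihood class for a single frame, ignoring its neighbors."""
    return max(models, key=lambda c: log_likelihood(x, models[c]))

def viterbi(frames, models, log_trans, log_prior):
    """Most likely class sequence given class transition log-probabilities."""
    classes = list(models)
    score = {c: log_prior[c] + log_likelihood(frames[0], models[c])
             for c in classes}
    back = []  # one back-pointer table per frame after the first
    for x in frames[1:]:
        prev, score, ptr = score, {}, {}
        for c in classes:
            best = max(classes, key=lambda p: prev[p] + log_trans[p][c])
            ptr[c] = best
            score[c] = prev[best] + log_trans[best][c] + log_likelihood(x, models[c])
        back.append(ptr)
    # Pick the best final class, then walk the back-pointers in reverse order.
    state = max(classes, key=lambda c: score[c])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With transition probabilities that favor staying in the same class, a frame that is marginally closer to another class when judged in isolation can still be assigned the class of its neighbors.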
Abstract:
A measure of importance is calculated for segmented parts of a video. The segmented parts are determined by segmenting the video into component shots and then iteratively merging the component shots based on similarity or other factors. Segmentation may also be determined by clustering frames of the video and creating segments from frames that share a cluster ID. The measure of importance is calculated based on a normalized weight of each segment and on the length and rarity of each shot or segmented part. The importance measure may be utilized to generate a video summary by selecting the most important segments and generating representative frames for the selected segments. A thresholding process is applied to the importance score to yield either a predetermined number of shots or segments, or an appropriate number generated on the fly, to be represented by frames. The representative frames are then packed into the video summary. The sizes of the frames to be packed are predetermined by their importance measure and adjusted according to space availability. Packing is based on a grid and an exhaustive search of frame combinations to fill each row in the grid. A cost algorithm and a space-filling rule are utilized to determine the best fit of frames. The video summary may be presented either as a paper interface that references, or as a web page that links, the frames of the summary to points of the video.
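One plausible way to combine length and rarity into a single score is sketched below. The abstract does not give the formula, so the expression used here (segment length multiplied by the log of the inverse normalized cluster weight) is an assumption. The function names `segment_importance` and `select_segments` are also hypothetical.

```python
import math

def segment_importance(segments):
    """Score each (cluster_id, length) segment; long and rare segments score high."""
    total = sum(length for _, length in segments)
    cluster_len = {}
    for cid, length in segments:
        cluster_len[cid] = cluster_len.get(cid, 0.0) + length
    scores = []
    for cid, length in segments:
        # Normalized weight: fraction of the video occupied by this cluster.
        weight = cluster_len[cid] / total
        # Assumed form: length times log(1 / weight), so rarity raises the score.
        scores.append(length * math.log(1.0 / weight))
    return scores

def select_segments(scores, max_frames):
    """Keep the indices of the top-scoring segments, returned in time order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:max_frames])
```

Under this assumed formula, a short segment from a rarely seen cluster can outrank a long segment from a cluster that dominates the video.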
Abstract:
A computer system includes a computer processor, an operating system operative in connection with the computer processor, and a display responsive to the operating system. The system also has a pointing device that includes a position sensor and a tactile actuator. A pointing device driver is responsive to the position sensor, and the tactile actuator is responsive to the pointing device driver. A general-purpose application is responsive to the pointing device driver and to the operating system and in communication with the display, and the pointing device driver is also responsive to the general-purpose application. The system further includes a profile that maps region changes associated with material displayed on the screen to tactile signals to be sent to the tactile actuator.
Abstract:
A system is provided for detecting an intersection between two or more panoramic video sequences and for detecting the orientation of the sequences forming the intersection. Video images and corresponding location data are received. If required, the images and location data are processed to ensure that the images contain location data. An intersection between two paths is then derived from the video images by deriving a rough intersection between two images, determining a neighborhood for the two images, and dividing each image in the neighborhood into strips. An identifying value is derived from each strip to create a row of strip values, which is then converted to the frequency domain. A distance measure is taken between strips in the frequency domain, and the intersection is determined from the images having the smallest distance measure between them. The orientation between the two paths may also be determined in the frequency domain, either by using the phases of signals representing the images in the Fourier domain or by performing a circular cross-correlation of two vectors representing the images.
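The circular cross-correlation option for estimating orientation can be sketched as follows. This is a direct (non-Fourier) computation, which yields the same correlation values as a frequency-domain version but is simpler to read. It assumes each panoramic image covers a full 360 degrees and has been reduced to one value per evenly spaced strip; `estimate_rotation` is a hypothetical name.

```python
def circular_cross_correlation(a, b):
    """Correlation of b with every circular shift of a (same-length vectors)."""
    n = len(a)
    return [sum(a[(i + s) % n] * b[i] for i in range(n)) for s in range(n)]

def estimate_rotation(a, b):
    """Return (shift, degrees) such that a[(i + shift) % n] best matches b[i]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Remove the mean so overall brightness does not dominate the correlation.
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]
    corr = circular_cross_correlation(a0, b0)
    shift = max(range(n), key=lambda s: corr[s])
    # Assumes the strips span a full 360-degree panorama.
    return shift, 360.0 * shift / n
```

The shift that maximizes the correlation is taken as the relative orientation, expressed in strips and then converted to degrees.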
Abstract:
An audio device management system (ADMS) manages remote audio devices via user selections in video links. The system enhances audio acquisition quality by receiving and processing human suggestions, forming customized two-way audio links according to user requests, and learning audio pickup strategies and camera management strategies from user operations. The ADMS control interface for a remote user is a multi-window GUI with an overview window and a selection display window. The ADMS gives users more flexibility to enhance audio signals according to their needs and makes it more convenient to form customized two-way audio links without requiring users to remember a list of phone numbers. The ADMS also automatically manages available microphones for audio pickup, based on microphone sound quality and the system's past experience, when users monitor a structured audio environment without explicitly expressing their attention in the video window.
Abstract:
Video recordings of meetings and scanned paper documents are natural digital documents that come out of a meeting. These can be placed on the Internet for easy access, with links generated between them by matching scanned documents to a segment of the video referencing the scanned document. Furthermore, annotations made on the paper documents during the meeting can be extracted and used as indexes to the video. An orthonormal transform, such as a Discrete Cosine Transform (DCT), is used to compare scanned documents to video frames.
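A minimal sketch of a DCT-based comparison is given below. It assumes the scanned document and the video frames have already been reduced to small grayscale images of equal size, keeps only a low-frequency block of coefficients, and uses Euclidean distance. The abstract specifies none of these choices, and `features` and `best_match` are hypothetical names.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def dct_2d(img):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [row][col]

def features(img, keep=4):
    """Keep the keep x keep block of lowest-frequency coefficients."""
    coeffs = dct_2d(img)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def best_match(document, frames, keep=4):
    """Index of the frame whose low-frequency DCT features are nearest."""
    doc = features(document, keep)
    dists = [math.dist(doc, features(f, keep)) for f in frames]
    return min(range(len(frames)), key=lambda i: dists[i])
```

Because the transform is orthonormal, distances between truncated coefficient vectors approximate distances between smoothed versions of the images, which makes the comparison tolerant of fine detail such as noise.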
Abstract:
A camera array captures plural component images which are combined into a single scene. In one embodiment, each camera of the array is a fixed digital camera. The images from each camera are warped to a common coordinate system and the disparity between overlapping images is reduced using disparity estimation techniques.