Abstract:
An improved image retrieval process based on relevance feedback uses a hierarchical (per-feature) approach in comparing images. Multiple query vectors are generated for an initial image by extracting multiple low-level features from the initial image. When determining how closely a particular image in an image collection matches the initial image, a distance is calculated between the query vectors and corresponding low-level feature vectors extracted from the particular image. Once these individual distances are calculated, they are combined to generate an overall distance that represents how closely the two images match. According to other aspects, relevancy feedback received regarding previously retrieved images is used during the query vector generation and the distance determination to influence which images are subsequently retrieved.
Abstract:
A system and method for teleconferencing and recording of meetings. The system uses a variety of capture devices (a novel 360° camera, a whiteboard camera, a presenter view camera, a remote view camera, and a microphone array) to provide a rich experience for people who want to participate in a meeting from a distance. The system is also combined with speaker clustering, spatial indexing, and time compression to provide a rich experience for people who miss a meeting and want to watch it afterward.
Abstract:
Indications of which participant is providing information during a multi-party conference. Each participant has equipment to display information being transferred during the conference. A sourcing signaler residing in the participant equipment provides a signal that indicates the identity of its participant when this participant is providing information to the conference. The source indicators of the other participant equipment receive the signal and cause a UI to indicate that the participant identified by the received signal is providing information (e.g. the UI can causes the identifier to change appearance). An audio discriminator is used to distinguish between an acoustic signal that was generated by a person speaking from that generated in a band-limited manner. The audio discriminator analyzes the spectrum of detected audio signals and generates several parameters from the spectrum and from past determinations to determine the source of an audio signal on a frame-by-frame basis.
Abstract:
An omni-directional camera (a 360 degree camera) is proposed with an integrated microphone array. The primary application for such a camera is videoconferencing and meeting recording, and the device is designed to be placed on a meeting room table. The microphone array is in a planar configuration, and the microphones are located as close to the desktop as possible to eliminate sound reflections from the table. The camera is connected to the microphone array base with a thin cylindrical rod, which is acoustically invisible to the microphone array for the frequency range [50-4000] Hz. This provides a direct path from the person talking to all of the microphones in the array, and can therefore be used for sound source localization (determining the location of the talker) and beam-forming (improving the sound quality of the talker by filtering only sound from a particular direction). The camera array is elevated from the table to provide a near frontal viewpoint of the meeting participants.
Abstract:
Techniques are provided for indicating workspace awareness using one or more of a write shadow, a read shadow, and/or a shadowbar providing an indication of operations performed at associated locations by various users accessing a same document. A write shadow may be used to indicate a position in a document being modified by a user. A read shadow may be used to indicate a position being viewed by a user. A shadowbar may be used to indicate areas of overlap among users with a shading and coloring indicative of a degree of overlap.
Abstract:
Audio/video programming content is made available to a receiver from a content provider, and meta data is made available to the receiver from a meta data provider. The meta data corresponds to the programming content, and identifies, for each of multiple portions of the programming content, an indicator of a likelihood that the portion is an exciting portion of the content. In one implementation, the meta data includes probabilities that segments of a baseball program are exciting, and is generated by analyzing the audio data of the baseball program for both excited speech and baseball hits. The meta data can then be used to generate a summary for the baseball program.
Abstract:
A method of digitally adding the appearance of makeup to a videoconferencing participant. The system and method for applying digital make-up operates in a loop processing sequential video frames. For each input frame, there are typically three general steps: 1) Locating the face and eye and mouth regions; 2) Applying digital make-up to the face, preferably with the exception of the eye and open mouth areas; and 3) Blending the make-up region with the rest of the face. In one embodiment of the invention, the background in the frame containing a video conferencing participant can also be modified so that other video conferencing participants cannot clearly see the background behind the participant in the image frame. In one such embodiment of the invention, the video conferencing participant tries to make his or her own image look comical or altered. In another embodiment of the invention, a particular remote participant tries to make another participant look funny to the other participants.
Abstract:
Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area, and an indication made that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.
Abstract:
A program distribution system includes a plurality of set-top boxes that receive broadcast programming and segmentation data from content and information providers. The segmentation information indicates portions of programs that are to be included in skimmed or condensed versions of the received programming, and is produced using manual or automated methods. Automated methods include the use of ancillary production data to detect the most important parts of a program. A user interface allows a user to control time scale modification and skimming during playback, and also allows the user to easily browse to different points within the current program.
Abstract:
Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area, and an indication made that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.