Abstract:
A first set of signals from an array of one or more microphones, and a second signal from a reference microphone are used to calibrate a set of filter parameters such that the filter parameters minimize a difference between the second signal and a beamformer output signal that is based on the first set of signals. Once calibrated, the filter parameters are used to form a beamformer output signal that is filtered using a non-linear adaptive filter that is adapted based on portions of a signal that do not contain speech, as determined by a speech detection sensor.
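The calibration step can be illustrated with a minimal least-squares sketch. It makes a simplifying assumption: the beamformer is reduced to one real-valued gain per array microphone, whereas a real filter-and-sum beamformer would more likely use multi-tap or per-frequency filters. The non-linear adaptive post-filter is not shown. The function names `calibrate_gains` and `beamform` are hypothetical.

```python
def _solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def calibrate_gains(array_signals, reference):
    """Gains w minimizing sum_t (reference[t] - sum_i w[i] * array_signals[i][t]) ** 2.

    Hypothetical simplification: one gain per microphone instead of a full filter.
    """
    k = len(array_signals)
    # Normal equations (X^T X) w = X^T d, with X holding one column per microphone.
    xtx = [[sum(p * q for p, q in zip(array_signals[i], array_signals[j]))
            for j in range(k)] for i in range(k)]
    xtd = [sum(p * d for p, d in zip(array_signals[i], reference)) for i in range(k)]
    return _solve(xtx, xtd)


def beamform(array_signals, gains):
    """Beamformer output: the gain-weighted sum of the array signals."""
    length = len(array_signals[0])
    return [sum(g * ch[t] for g, ch in zip(gains, array_signals)) for t in range(length)]
```

If the reference signal is an exact weighted mix of the array signals, the calibration recovers those weights. With real recordings, it instead returns the weights that minimize the squared error.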
Abstract:
Multi-device capture and spatial browsing of conferences is described. In one implementation, a system detects cameras and microphones, such as the webcams on participants' notebook computers, in a conference room, group meeting, or table game, and enlists an ad-hoc array of available devices to capture each participant and the spatial relationships between participants. A user can browse a video stream composited from the array to navigate a 3-dimensional representation of the meeting. Each participant may be represented by a video pane, a foreground object, or a 3-D geometric model of the participant's face or body. These representations are displayed in a 3-dimensional arrangement that mirrors the spatial arrangement of the meeting. The system may automatically re-orient the 3-dimensional representation as needed to best show the currently interesting event, such as the current speaker, or may give the user navigation controls for manually viewing selected participants or nuanced interactions between participants.
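The automatic re-orientation toward the current speaker can be sketched as follows. The sketch assumes that participant seat positions relative to the table centre are already known, and that the active speaker is the participant whose captured audio level is highest. Both the selection rule and the names `active_speaker` and `view_yaw_degrees` are hypothetical illustrations, not the described system's method.

```python
import math


def active_speaker(levels):
    """Index of the participant with the highest captured audio level (hypothetical rule)."""
    return max(range(len(levels)), key=lambda i: levels[i])


def view_yaw_degrees(positions, speaker):
    """Yaw in degrees (counter-clockwise from +x, in [0, 360)) that turns a virtual
    viewpoint at the table centre toward the given participant.

    positions: list of (x, y) seat coordinates relative to the table centre.
    """
    x, y = positions[speaker]
    return math.degrees(math.atan2(y, x)) % 360.0
```

A renderer could then rotate the 3-D arrangement by this yaw each time the active speaker changes.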
Abstract:
Systems and methods are described for determining a virtual sound source position by determining the output of each loudspeaker based on that loudspeaker's position relative to a listener. The output of each loudspeaker is generated using aural cues that convey the virtual position of the virtual sound source to the listener. Both a gain in intensity and a delay are simulated.
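A per-loudspeaker gain and delay can be sketched under a simple assumed model. Coordinates are given relative to the listener at the origin. Each loudspeaker receives an inverse-distance gain and a propagation-time delay based on its distance from the virtual source, both normalized to the nearest loudspeaker. This model, the 343 m/s speed of sound, and the name `speaker_feeds` are assumptions for illustration and may differ from the patented method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def speaker_feeds(source, speakers, min_dist=0.1):
    """Return (gain, delay_seconds) for each loudspeaker.

    source, speakers: (x, y) positions in metres, with the listener at the origin.
    Assumed model: inverse-distance gain and propagation delay from the virtual
    source to each loudspeaker, normalized so the nearest loudspeaker has gain 1.0
    and zero delay.
    """
    dists = [max(math.dist(source, p), min_dist) for p in speakers]
    nearest = min(dists)
    feeds = []
    for d in dists:
        gain = nearest / d                      # farther loudspeakers are attenuated
        delay = (d - nearest) / SPEED_OF_SOUND  # and fire later than the nearest one
        feeds.append((gain, delay))
    return feeds
```

With this model, the loudspeaker closest to the virtual source is both loudest and earliest, so intensity and arrival-time cues point the same way.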
Abstract:
A panoramic camera is configured to automatically determine parameters of a table upon which the camera is situated as well as positional information of the camera relative to the table. In an initialization stage, table edges are detected to create an edge map. A Hough transformation-like symmetry voting operation is performed to clean up the edge map and to determine camera offset, camera orientation and camera tilt. The table is then fit to a table model to determine table parameters. In an operational stage, table edges are detected to create an edge map and the table model is fit to the edge map. The output can then be used for further panoramic image processing such as head size normalization, zooming, compensation for camera movement, etc.
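The symmetry-voting idea can be sketched as follows. The sketch is generic and not taken from the described camera. Each edge pixel in a 360-degree panoramic edge map votes for a candidate vertical mirror axis when its reflection is also an edge pixel, and the axis with the most votes is taken as an estimate of camera orientation relative to the table. Camera offset and tilt estimation and table-model fitting are omitted. The name `symmetry_vote` is hypothetical.

```python
def symmetry_vote(edge_points, width):
    """Hough-like vote for a vertical mirror axis in a 360-degree panoramic edge map.

    edge_points: set of (x, y) edge pixels; x wraps around at `width`.
    Returns (best_axis_column, votes). Reflecting about column a is the same
    mapping as reflecting about a + width / 2, so only half the columns are tried.
    """
    best_axis, best_votes = 0, -1
    for axis in range(width // 2):
        # One vote for every edge pixel whose mirror image is also an edge pixel.
        votes = sum(((2 * axis - x) % width, y) in edge_points for x, y in edge_points)
        if votes > best_votes:
            best_axis, best_votes = axis, votes
    return best_axis, best_votes
```

Edge pixels that have no mirrored partner contribute nothing to any axis. A voting step of this kind could therefore also flag edge pixels to discard when cleaning up the edge map.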
Abstract:
In a method for compressing a video clip containing audio content and image content, the image and/or the audio portion of individual video frames of the video clip are analyzed. Next, frame scores are calculated for the video frames. Each frame score is based on at least one image attribute of the image of the video frame and/or an audio attribute of the audio portion of the video frame. Next, key frames are identified as those whose frame score exceeds a threshold frame score. Finally, a compressed video clip is formed in which the images of non-key frames are removed. A system for implementing the method is also disclosed.
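The scoring-and-thresholding steps can be sketched with a weighted sum of one image attribute and one audio attribute per frame. The attributes, the equal weights, and the names `Frame`, `frame_score`, and `compress` are hypothetical. The sketch keeps the audio of every frame and drops only non-key images, in line with the abstract's wording.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Frame:
    image: Optional[str]  # placeholder for pixel data; None once removed
    audio: str            # placeholder for the frame's audio portion
    image_attr: float     # hypothetical image attribute in [0, 1], e.g. sharpness
    audio_attr: float     # hypothetical audio attribute in [0, 1], e.g. loudness


def frame_score(f, w_image=0.5, w_audio=0.5):
    """Weighted combination of the image and audio attributes (assumed weights)."""
    return w_image * f.image_attr + w_audio * f.audio_attr


def compress(frames, threshold):
    """Keep images only for key frames (score > threshold); audio is kept for all."""
    return [f if frame_score(f) > threshold else replace(f, image=None) for f in frames]
```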
Abstract:
A classifier is trained on a first set of examples, and the trained classifier is adapted to perform on a second set of examples. The classifier implements a parameterized labeling function. Initial training of the classifier optimizes the labeling function's parameters to minimize a cost function. The classifier and its parameters are provided to an environment in which it will operate, along with an approximation function that approximates the cost function using a compact representation of the first set of examples in place of the actual first set. A second set of examples is collected, and the parameters are modified to minimize a combined cost of labeling the first and second sets of examples. The part of the combined cost that represents the cost of the modified parameters applied to the first set is calculated using the approximation function.
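The role of the compact representation can be sketched with a deliberately simple model. The sketch assumes a least-squares classifier with a single parameter `w`, labels of +1/-1, and the labeling rule sign(w * x). Under squared loss, the cost on the first set is exactly quadratic in `w`, so three summary sums are enough to stand in for the first set during adaptation. The patent's approximation function may be a different, inexact one. The names `QuadraticSummary`, `summarize`, `train`, and `adapt` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuadraticSummary:
    """Compact stand-in for a training set under squared loss: sum x^2, sum x*y, sum y^2."""
    sxx: float
    sxy: float
    syy: float

    def cost(self, w):
        # sum (y - w x)^2 expanded, computed without the original examples
        return self.syy - 2.0 * w * self.sxy + w * w * self.sxx


def summarize(examples):
    return QuadraticSummary(sum(x * x for x, _ in examples),
                            sum(x * y for x, y in examples),
                            sum(y * y for _, y in examples))


def train(examples):
    """Minimize the squared-loss cost on one set of (x, label) examples."""
    s = summarize(examples)
    return s.sxy / s.sxx


def adapt(first_summary, second_examples):
    """Minimize approximate cost(first set) + cost(second set) using only the summary."""
    s2 = summarize(second_examples)
    return (first_summary.sxy + s2.sxy) / (first_summary.sxx + s2.sxx)
```

Because the summary is exact for squared loss, adapting from the summary gives the same parameter as retraining on the pooled examples.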
Abstract:
A visual control system controls a controlled component. In one embodiment, the visual control system controls the controlled component based on a visual location of a user. In another embodiment, input from a visual perception device is used to provide focus control for an audio input device. In additional embodiments, the visual control system stops, starts, or suppresses speech recognition or other audio functions when the sound detected by the audio input device is not coming from the direction of the user's visual location.
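The gating behaviour in the last embodiment can be sketched as a bearing comparison. The sketch assumes that the audio direction of arrival and the user's visually detected direction are bearings in degrees in the same reference frame. The 20-degree tolerance and the names `angular_difference` and `should_recognize` are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def should_recognize(sound_bearing_deg, user_bearing_deg, tolerance_deg=20.0):
    """Allow speech recognition only when the sound arrives from the user's visual location."""
    return angular_difference(sound_bearing_deg, user_bearing_deg) <= tolerance_deg
```

Taking the bearing difference modulo 360 degrees keeps directions just either side of 0 degrees (for example, 355 and 5) correctly treated as close.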
Abstract:
Video images representative of a conferee's head are received and evaluated with respect to a reference model to monitor the head position of the conferee. A personalized face model of the conferee is captured to track the conferee's head position. In a stereo implementation, first and second video images of a first conferee, taken from different views, are captured concurrently. The head position of the first conferee is tracked from the first and second video images. Head-position tracking based on a personalized model can be used in a number of applications, such as human-computer interaction and eye-gaze correction for video conferencing.
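The geometric core of the stereo implementation is triangulation, shown here as a minimal sketch. It uses the standard depth-from-disparity relation Z = f * B / d for a rectified pinhole stereo pair. It assumes the same head feature has already been located in both images, and it does not reproduce the personalized model-based tracker. The name `triangulate` and its parameters are hypothetical.

```python
def triangulate(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in metres, in the left-camera frame, for a feature seen in a
    rectified stereo pair.

    u_left, u_right: feature column in the left and right images (pixels)
    v: feature row (same in both images after rectification)
    focal_px: focal length in pixels; baseline_m: camera separation in metres
    cx, cy: principal point in pixels
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("feature must have positive disparity")
    z = focal_px * baseline_m / disparity
    x = (u_left - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z
```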
Abstract:
A digital camera is presented that has a single image sensor made up of an array of filtered photosites. The photosites capture non-visible light wavelengths in addition to the standard red/green/blue (RGB) or other visible light intensity values. This is accomplished by disposing a separate filter over each photosite. Each filter exhibits a wavelength-dependent light transmission function that passes only a prescribed range of wavelengths; some filters pass light in the visible spectrum and others pass light in the non-visible spectrum. The photosites that pass non-visible wavelengths can be configured to pass light in the infrared (IR) spectrum, optionally limited to the near-infrared (NIR) spectrum, or alternatively to pass light in the ultraviolet (UV) spectrum.
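A per-photosite filter layout can be represented as a small repeating tile, as in this sketch. The 2x2 tile, in which one green site of a Bayer tile is replaced by an IR-pass filter, is a hypothetical example and not a layout stated in the abstract. The sketch only groups raw values by filter type; demosaicing is omitted. The names `PATTERN` and `split_channels` are hypothetical.

```python
# Hypothetical 2x2 tile: one green site of a Bayer tile replaced by an IR-pass filter.
PATTERN = (("R", "G"),
           ("IR", "B"))


def split_channels(raw):
    """Group raw photosite values by the filter assumed to cover each photosite."""
    channels = {}
    for y, row in enumerate(raw):
        for x, value in enumerate(row):
            name = PATTERN[y % 2][x % 2]
            channels.setdefault(name, []).append(value)
    return channels
```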
Abstract:
Image enhancement techniques are described that enhance an image in accordance with a set of training images. In one implementation, an image color tone map is generated for a facial region included in an image. The image color tone map may be normalized to a color tone map for a set of training images so that the image color tone map matches the map for the training images. The normalized color tone map may then be applied to enhance the image. In further implementations, the procedure may be updated when the average color intensity in non-facial regions differs from an accumulated mean by a threshold amount.
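The normalization and update trigger can be sketched for a single color channel, using a swapped-in technique. Instead of matching full tone maps, the sketch linearly remaps facial-region values so that their mean and standard deviation equal the training-set statistics; a per-channel statistics transfer of this kind is a simpler stand-in for the matching described. The strict ">" in the trigger is an assumed reading of "differs by a threshold amount". The names `normalize_tone` and `needs_update` are hypothetical.

```python
import statistics


def normalize_tone(face_values, target_mean, target_std):
    """Linearly remap one channel of facial-region values so their mean and
    standard deviation match the training-set statistics."""
    mean = statistics.fmean(face_values)
    std = statistics.pstdev(face_values)
    scale = target_std / std if std > 0 else 1.0
    return [target_mean + (v - mean) * scale for v in face_values]


def needs_update(nonface_mean, accumulated_mean, threshold):
    """Trigger re-estimation when the non-facial average drifts past the threshold."""
    return abs(nonface_mean - accumulated_mean) > threshold
```

In a video setting, the normalization parameters could be reused across frames and recomputed only when `needs_update` fires, for example after a lighting change.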