Abstract:
An exemplary embodiment is a method of processing audio data comprising: characterising audio data representative of a recorded sound scene into a set of sound sources occupying positions within a time and space reference frame; analysing the sound sources; and generating modified audio data representing sound captured from at least one virtual microphone configured for moving about the recorded sound scene, wherein the virtual microphone is controlled in accordance with a result of the analysis of said audio data, to conduct a virtual tour of the recorded sound scene.
Abstract:
A method of generating an audio signal comprises receiving a plurality of input audio signals from a plurality of microphones forming a microphone array, the plurality of input audio signals being representative of a set of sound sources within the auditory field of view of the microphone array at a given instant in time; receiving a motion input signal from a motion sensor, the motion input signal being representative of the motion of the microphone array; and manipulating the received plurality of input audio signals in response to the received motion input signal to generate an audio output signal that is representative of a set of sound sources within the auditory field of view of a virtual microphone, the apparent motion of the virtual microphone being independent of the motion of the microphone array.
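The decoupling of the virtual microphone from the array's motion can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the sources have already been separated into per-source samples with bearings measured in the array's frame, that the motion sensor reports a single yaw angle, and that the virtual microphone has a cardioid pickup pattern. All function names are hypothetical.

```python
import math

def wrap(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def virtual_bearing(bearing_in_array, array_yaw, virtual_yaw):
    """Bearing of a source as seen by the virtual microphone.

    bearing_in_array: source bearing relative to the physical array (assumed known)
    array_yaw:        array orientation reported by the motion sensor
    virtual_yaw:      orientation chosen for the virtual microphone
    """
    world_bearing = bearing_in_array + array_yaw  # undo the array's own rotation
    return wrap(world_bearing - virtual_yaw)

def cardioid_gain(bearing):
    """Gain of an assumed cardioid pattern: 1 on-axis, 0 directly behind."""
    return 0.5 * (1.0 + math.cos(bearing))

def virtual_mic_sample(source_samples, bearings_in_array, array_yaw, virtual_yaw):
    """Mix one output sample for the virtual microphone from per-source samples."""
    return sum(
        s * cardioid_gain(virtual_bearing(b, array_yaw, virtual_yaw))
        for s, b in zip(source_samples, bearings_in_array)
    )
```

In this sketch, a source that stays fixed in the world produces the same output whether the array has turned or not, because the array yaw is added back before the virtual microphone's orientation is applied.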
Abstract:
An image viewed by a person is recorded in response to the pointing direction of the eyes of the person by using an optical sensor arrangement that simultaneously derives image segments corresponding with images seen by the person looking forward of his head and to both sides of his head. Alternatively, the sensor arrangement includes plural optical sensors for these images. One image of the plural sensors is selected for recording based on rotation of the head.
Abstract:
Systems and methods of producing video data and/or audio-photos from a static digital image are disclosed. One such method, among others, comprises receiving input from a user indicating sequentially, in real time, a plurality of regions of the static digital image. The method also includes processing the user input to determine the visual content of each of a sequence of video frames and generating output data representative of the sequence of video frames. The sequence and composition of the video frames are determined such that the visual content of the video frames is taken from the static digital image. For each region of the static image indicated by the user, a video frame is composed such that that region occupies a substantial part of the video frame. The sequence of video frames shows the regions indicated by the user in sequential correspondence with the sequence in which the user indicated the regions and substantially in pace with the time in which the user indicated the regions.
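One way to picture the frame composition described above is as a list of crop rectangles, one per output frame. The sketch below is illustrative only: it assumes each user-indicated region is an axis-aligned rectangle with a timestamp, and that a frame is composed by expanding the region to the output aspect ratio. `Region`, `fit_crop`, and `plan_frames` are hypothetical names, not taken from the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    t: float  # time in seconds at which the user indicated the region
    x: float  # left edge, in image pixels
    y: float  # top edge, in image pixels
    w: float  # width
    h: float  # height

def fit_crop(region, aspect, img_w, img_h):
    """Crop (x, y, w, h) with the given aspect ratio (w/h), centred on the region.

    The region is expanded along one axis to match the aspect ratio, so it
    fills the frame along the other axis. The crop is shrunk if it would be
    larger than the image and shifted so it stays inside the image.
    """
    w, h = region.w, region.h
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    scale = min(1.0, img_w / w, img_h / h)
    w, h = w * scale, h * scale
    cx = region.x + region.w / 2
    cy = region.y + region.h / 2
    x = min(max(cx - w / 2, 0.0), img_w - w)
    y = min(max(cy - h / 2, 0.0), img_h - h)
    return (x, y, w, h)

def plan_frames(regions, duration, fps, aspect, img_w, img_h):
    """Return one crop per video frame, following the user's order and timing."""
    ordered = sorted(regions, key=lambda r: r.t)
    frames = []
    idx = 0
    for k in range(int(duration * fps)):
        t = k / fps
        # Advance to the most recent region indicated at or before time t.
        while idx + 1 < len(ordered) and ordered[idx + 1].t <= t:
            idx += 1
        frames.append(fit_crop(ordered[idx], aspect, img_w, img_h))
    return frames
```

Each crop would then be resampled from the static image to the output resolution. This sketch cuts directly between regions; a real system might interpolate between consecutive crops to produce pans and zooms.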
Abstract:
An attention detection system detects a condition of shared attention of plural persons. The system includes plural body language detectors each associated with a different person for detecting body language of the associated person. An analyzer receives body language information from the body language detectors, analyzes the body language of the persons and determines when said body language information indicates shared attention between the persons. The attention detection system generates a signal that captures an image of the shared attention.
Abstract:
Embodiments provide a system and method for generating images of a document with interaction of a primary user with the document in an interaction session. Briefly described, one embodiment comprises an image capture means adapted to capture an initial image of the document without interaction by a user and to subsequently capture at least one additional image of the document during an interaction session including interaction from the user during that session, and a processing means adapted to generate a data set representing the interaction session from the initial image and the additional image, the data set containing at least the initial image along with information indicative of the interaction of the user during the session obtained from the additional image.
Abstract:
A digital camera derives infrared (IR) and visible signals from IR and visible sensors. The sensor fields of view overlap or share a common field of view. An analyzer of the IR signal provides the location of and information from an active or passive IR target. The analyzer responds to the location and information signals for visible image composition control (pan/tilt/zoom/timing) and selection of visible image pictures for storage to provide ancillary information, such as personal details of a target wearer. The visible and infrared image signals are combined to correct for sensitivity of the visible and/or IR sensors to IR and visible wavelengths, respectively.
Abstract:
Apparatus for providing a composite electronic image of a scene including at least one link to other information comprises a camera for providing an image signal, a link generator responsive to the scene or the immediate environment for generating a link signal, and a combiner which combines the image and link signals to provide the electronic image. As shown, the viewed scene includes a zoo information display with a barcode which is read by the camera to provide the final part of the link information for concatenation onto a first part which is loaded into the camera at entry to the zoo (or preloaded if the camera belongs to the zoo). Clicking on the display on the viewed image provides a link to a zoo web page associated with the enclosure.
Abstract:
One embodiment is a method for reviewing videos, comprising: deriving at least two video segments from unedited video footage based upon a previously determined unique saliency, each saliency associated with a corresponding one of the video segments; and displaying a display window for each of the derived video segments substantially concurrently.
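As a rough illustration of deriving segments from previously determined saliency, the sketch below assumes each kind of saliency is available as a per-frame score track and that a segment is a run of frames whose score meets a threshold. Neither assumption comes from the abstract, and the function names are hypothetical.

```python
def salient_segments(scores, threshold):
    """Return (start, end) frame index pairs, end exclusive, where scores >= threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a salient run begins
        elif score < threshold and start is not None:
            segments.append((start, i))    # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # run extends to the last frame
    return segments

def segments_by_saliency(tracks, threshold):
    """Map each saliency name to the segments derived from its score track."""
    return {name: salient_segments(scores, threshold) for name, scores in tracks.items()}
```

Each resulting segment could then be played in its own display window, so that segments tied to different saliencies are reviewed side by side.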
Abstract:
In automatic photographic or electronic camera apparatus, sounds from a person are preferentially received or identified to provide an audio signal, used to produce a saliency signal (indicative of the person's response to circumstances) to control the camera or an image signal produced therefrom. The sounds are detected by a sound conduction microphone, e.g. on the head or throat of the person, or by voice identification circuitry used with a less specific microphone not necessarily mounted on the person, who may or may not be wearing the camera. Control can be in real time, or the sound or saliency signal can be recorded with an image signal for subsequent control of the latter. The audio signal is categorised by waveform analysis, e.g. for the recognition of sounds, such as speech or non-speech vocal sounds, and/or non-vocal sounds; signal amplitude may also be taken into account.