Abstract:
Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area. If the verification levels confirm that a human face is present, an indication is made that the candidate area includes a face. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.
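The idea of hierarchical verification can be illustrated with a cascade. Each level scores the candidate area, and the candidate is rejected at the first level whose score falls below its threshold. The sketch below orders the levels from cheapest to most expensive. The feature names (`skin_ratio`, `detector_score`), the three levels and their thresholds are all hypothetical stand-ins, not the verification levels the abstract refers to.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    width: int
    height: int
    skin_ratio: float      # hypothetical: fraction of skin-colored pixels in the area
    detector_score: float  # hypothetical: output of some face classifier, 0..1


# A level is (name, scoring function, minimum score needed to pass).
Level = Tuple[str, Callable[[Candidate], float], float]


def aspect_score(c: Candidate) -> float:
    """Cheap geometric check: faces are roughly as wide as they are tall."""
    ratio = c.width / c.height
    return 1.0 if 0.5 <= ratio <= 1.5 else 0.0


# Hypothetical levels, ordered so cheap checks reject most non-faces early.
LEVELS: List[Level] = [
    ("aspect_ratio", aspect_score, 0.5),
    ("skin_color", lambda c: c.skin_ratio, 0.3),
    ("face_detector", lambda c: c.detector_score, 0.8),
]


def verify(candidate: Candidate, levels: List[Level] = LEVELS) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if every level passes, else (False, name of first failing level)."""
    for name, score_fn, threshold in levels:
        if score_fn(candidate) < threshold:
            return False, name
    return True, None
```

Stopping at the first failing level keeps the average cost low, because the more expensive checks run only on candidates that survive the cheap ones.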
Abstract:
A system for communicating audio data signals comprises a source computer that performs an action, generates an event message corresponding to the action, converts the event message into an audio data signal, and plays the audio data signal through its speaker. A source telephone picks up, through its microphone, both a participant's voice signal and the audio data signal, and transmits them together as coherent sound via an audio communications medium. A recipient telephone receives the audio data signal within the coherent sound transmitted via the audio communications medium and plays the audio data signal through its speaker. A recipient computer receives the audio data signal through its microphone, extracts the event message from the audio data signal, and performs an action based on the event message. The audio communications medium can comprise a telephone communications system or air.
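The abstract does not specify how an event message is converted into an audio data signal. The sketch below uses one plausible scheme, binary frequency-shift keying (FSK), and is not necessarily the patented encoding. Each bit of the UTF-8 message becomes a 10 ms tone. The recipient side recovers each bit by comparing the energy at the two tone frequencies using the Goertzel algorithm. The 8 kHz sample rate, the 1000/2000 Hz tones and the bit duration are all assumptions. They are chosen so that every bit window holds a whole number of cycles, which keeps the two tones cleanly separable.

```python
import math

SAMPLE_RATE = 8000     # Hz, assumed telephone-band sample rate
SAMPLES_PER_BIT = 80   # 10 ms per bit (assumed)
FREQ_ZERO = 1000.0     # Hz tone for a 0 bit (hypothetical)
FREQ_ONE = 2000.0      # Hz tone for a 1 bit (hypothetical)


def encode_event(message: str) -> list[float]:
    """Source side: turn an event message into FSK audio samples, MSB first."""
    samples: list[float] = []
    for byte in message.encode("utf-8"):
        for i in range(7, -1, -1):
            freq = FREQ_ONE if (byte >> i) & 1 else FREQ_ZERO
            samples.extend(
                math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                for n in range(SAMPLES_PER_BIT)
            )
    return samples


def goertzel_power(block: list[float], freq: float) -> float:
    """Signal power of `block` at `freq`, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_event(samples: list[float]) -> str:
    """Recipient side: recover the event message from received samples."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        block = samples[start:start + SAMPLES_PER_BIT]
        one = goertzel_power(block, FREQ_ONE)
        zero = goertzel_power(block, FREQ_ZERO)
        bits.append(1 if one > zero else 0)
    data = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")
```

The decoder compares the two tone energies rather than applying an absolute threshold, so uniform attenuation along the speaker, phone line and microphone path does not by itself flip bits. A real acoustic channel would also need synchronization and error detection, which this sketch omits.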
Abstract:
Audio/video programming content is made available to a receiver from a content provider, and meta data is made available to the receiver from a meta data provider. The meta data corresponds to the programming content and includes, for each of multiple portions of the programming content, an indicator of the likelihood that the portion is an exciting portion of the content. In one implementation, the meta data includes probabilities that segments of a baseball program are exciting. These probabilities are generated by analyzing the audio data of the baseball program for both excited speech and baseball hits. The meta data can then be used to generate a summary of the baseball program.
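As an illustration of how per-segment probabilities could drive a summary, the sketch below keeps the most probably exciting segments that fit within a time budget and then replays them in broadcast order. The `Segment` layout, the `min_prob` cutoff and the greedy selection rule are assumptions made for illustration. The abstract does not say how the summary is assembled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float       # seconds from program start
    end: float         # seconds from program start
    p_exciting: float  # meta data: probability the segment is exciting


def summarize(segments: List[Segment], budget_seconds: float,
              min_prob: float = 0.5) -> List[Segment]:
    """Greedily pick the most likely exciting segments that fit the budget."""
    ranked = sorted(
        (s for s in segments if s.p_exciting >= min_prob),
        key=lambda s: s.p_exciting,
        reverse=True,
    )
    chosen: List[Segment] = []
    used = 0.0
    for seg in ranked:
        duration = seg.end - seg.start
        if used + duration <= budget_seconds:
            chosen.append(seg)
            used += duration
    # Play the summary back in the original order of the broadcast.
    return sorted(chosen, key=lambda s: s.start)
```

Greedy selection is a simple choice. Treating the problem as a knapsack would pack the budget more tightly, at the cost of more computation.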
Abstract:
A program distribution system includes a plurality of set-top boxes that receive broadcast programming and segmentation data from content and information providers. The segmentation data indicates which portions of programs are to be included in skimmed or condensed versions of the received programming, and is produced using manual or automated methods. Automated methods include using ancillary production data to detect the most important parts of a program. A user interface allows a user to control time-scale modification and skimming during playback, and also allows the user to easily browse to different points within the current program.
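One way to combine segmentation data with time-scale modification is sketched below. The skimmed version is modeled as the included segments played back to back at a speed factor, and a position on the playback timeline is mapped back to a position in the source program, as a browse control would need. The representation of segmentation data as sorted, non-overlapping `(start, end)` pairs in seconds and both function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical segmentation data: sorted, non-overlapping (start, end) seconds
# of the portions that belong in the skimmed version.
Segments = List[Tuple[float, float]]


def skim_duration(segments: Segments, speed: float = 1.0) -> float:
    """Playback length of the skimmed version after time-scale modification."""
    return sum(end - start for start, end in segments) / speed


def playback_to_source(t: float, segments: Segments, speed: float = 1.0) -> float:
    """Map t seconds of skimmed playback to a position in the source program."""
    if not segments:
        raise ValueError("segmentation data is empty")
    remaining = t * speed  # source seconds consumed after t seconds of playback
    for start, end in segments:
        length = end - start
        if remaining <= length:
            return start + remaining
        remaining -= length
    return segments[-1][1]  # clamp to the end of the last included segment
```

For example, with three one-minute segments played at 1.5x speed, the skim lasts two minutes. Fifty seconds of playback lands 15 seconds into the second segment.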
Abstract:
A system and process for tracking an object state over time using particle filter sensor fusion and a plurality of logical sensor modules is presented. This fusion framework combines the bottom-up and top-down approaches to sensor fusion to probabilistically fuse multiple sensing modalities. At the lower level, individual vision and audio trackers can be designed to generate effective proposals for the fuser. At the higher level, the fuser performs reliable tracking by verifying hypotheses over multiple likelihood models from multiple cues. Unlike traditional fusion algorithms, the present framework is a closed-loop system in which the fuser and trackers coordinate their tracking information. Furthermore, to handle non-stationary situations, the framework evaluates the performance of the individual trackers and dynamically updates their object states. The framework makes a real-time speaker tracking system feasible by fusing object contour, color and sound source location.
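The sketch below shows only the core of particle filter fusion. Each particle is weighted by the product of per-cue likelihoods, which assumes the cues are conditionally independent given the state. It is a plain bootstrap particle filter over a 1-D position with a random-walk motion model and systematic resampling. It does not implement the closed-loop framework described above, in which the individual trackers supply the proposals. The Gaussian cue models, their sigmas and the measurement values are hypothetical.

```python
import numpy as np


def gaussian_likelihood(particles: np.ndarray, measurement: float, sigma: float) -> np.ndarray:
    """Unnormalized likelihood of each particle under one cue's measurement."""
    return np.exp(-0.5 * ((particles - measurement) / sigma) ** 2)


def pf_step(particles, cues, rng, process_sigma=0.2):
    """One predict/weight/resample cycle; returns (new particles, state estimate)."""
    # Predict: random-walk motion model (assumed).
    particles = particles + rng.normal(0.0, process_sigma, size=particles.shape)
    # Fuse: multiply the likelihoods from every cue.
    weights = np.ones_like(particles)
    for measurement, sigma in cues:
        weights *= gaussian_likelihood(particles, measurement, sigma)
    weights += 1e-300  # avoid dividing by zero if every weight underflows
    weights /= weights.sum()
    estimate = float(np.sum(weights * particles))
    # Systematic resampling: one random offset, evenly spaced positions.
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], estimate


rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 10.0, size=500)
# Hypothetical cues as (measurement, sigma): a contour tracker and a sound-source locator.
cues = [(5.2, 0.5), (4.9, 1.0)]
for _ in range(10):
    particles, estimate = pf_step(particles, cues, rng)
print(round(estimate, 2))
```

Because the weights are multiplied, a cue with a small sigma dominates the estimate, while a vaguer cue such as sound location mainly rules out hypotheses far from the speaker.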
Abstract:
Indications are provided of which participant is providing information during a multi-party conference. Each participant has equipment to display information being transferred during the conference. A sourcing signaler residing in the participant equipment provides a signal that identifies its participant whenever that participant is providing information to the conference. The source indicators in the other participants' equipment receive the signal and cause a UI to indicate that the participant identified by the received signal is providing information (e.g., the UI can cause that participant's identifier to change appearance). An audio discriminator is used to distinguish an acoustic signal generated by a person speaking from one generated in a band-limited manner. The audio discriminator analyzes the spectrum of detected audio signals and, on a frame-by-frame basis, derives several parameters from the spectrum and from past determinations to determine the source of an audio signal.
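The abstract names neither the spectral parameters nor the decision rule. The sketch below is therefore only a hypothetical illustration of a frame-by-frame spectral discriminator. Its single parameter is the fraction of a frame's energy above 4 kHz, which should be near zero for band-limited audio such as speech relayed through a telephone channel. Past determinations enter through exponential smoothing of that ratio. The cutoff, threshold, smoothing factor and 16 kHz sample rate are all assumptions.

```python
import numpy as np


def high_band_ratio(frame: np.ndarray, sample_rate: int, cutoff_hz: float = 4000.0) -> float:
    """Fraction of (Hann-windowed) frame energy at or above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return float(spectrum[freqs >= cutoff_hz].sum() / total)


class AudioDiscriminator:
    """Hypothetical wideband (live talker) vs band-limited classifier."""

    def __init__(self, sample_rate: int, threshold: float = 0.1, smoothing: float = 0.7):
        self.sample_rate = sample_rate
        self.threshold = threshold
        self.smoothing = smoothing
        self.score = 0.0  # smoothed ratio summarizing past frames

    def classify(self, frame: np.ndarray) -> str:
        ratio = high_band_ratio(frame, self.sample_rate)
        # Blend the current frame with past determinations.
        self.score = self.smoothing * self.score + (1 - self.smoothing) * ratio
        return "wideband" if self.score > self.threshold else "band_limited"
```

The smoothing keeps a single unusual frame from flipping the decision. It also means the classifier needs a few frames to settle after the source changes.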
Abstract:
A system and method for automatically determining whether a remote client is a human or a computer. A set of human interactive proof (HIP) design guidelines that are important for ensuring the security and usability of a HIP system is described. Furthermore, one embodiment of this HIP system and method is based on human face and facial feature detection. Because the human face is the most familiar object to all human users, the embodiment employing a face is possibly the most universal HIP system so far.