Abstract:
Systems and methods for encoding/decoding a video stream. Animated talking heads are coded using partial offline encoding, multiple video streams, and multiple reference frames. The content of a face animation video that is known beforehand is encoded offline and the remaining content is encoded online and included in the video stream. To reduce bit rate, a server can stream multiple video sequences to the client and the video sequences are stored in the client's frame store. The server sends instructions to play a particular video sequence instead of streaming the particular video sequence. Multiple video streams can also be streamed to the client. Positional data and blending data are also sent to properly position one video stream relative to another video stream and to blend one video stream into another video stream.
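The positioning and blending of one video stream into another can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and a per-pixel alpha mask in the range 0 to 1; the function name `blend_overlay` and the data layout are hypothetical and are not taken from the disclosure.

```python
def blend_overlay(base, overlay, alpha, top, left):
    """Blend `overlay` into a copy of `base` at row `top`, column `left`.

    `alpha` has the same shape as `overlay`; 1.0 means the overlay pixel
    fully replaces the base pixel, 0.0 leaves the base pixel unchanged.
    Overlay pixels that fall outside the base frame are skipped.
    """
    out = [row[:] for row in base]  # leave the input frame untouched
    for dy, (o_row, a_row) in enumerate(zip(overlay, alpha)):
        for dx, (o, a) in enumerate(zip(o_row, a_row)):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = round(a * o + (1 - a) * out[y][x])
    return out


# Example: place a 1x2 mouth-region patch on a 3x3 background frame.
background = [[0] * 3 for _ in range(3)]
frame = blend_overlay(background, [[100, 200]], [[0.5, 1.0]], top=1, left=1)
```

In this sketch the positional data corresponds to `top` and `left`, and the blending data corresponds to the `alpha` mask.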
Abstract:
A method and system for detecting and counting mitotic figures in an image of a biopsy sample stained with at least one dye includes color filtering the image in a computer process to identify pixels in the image that have a color which is indicative of a mitotic figure; extracting the mitotic pixels in the image that are connected to one another in a computer process, thereby producing blobs of mitotic pixels; shape-filtering and clustering the blobs of mitotic pixels in a computer process to produce mitotic figure candidates; extracting sub-images of mitotic figures by cropping the biopsy sample image at the location of the blobs; extracting two sets of features from the mitotic figure candidates in two separate computer processes; determining which of the mitotic figure candidates are mitotic figures in a computer classification process based on the extracted sets of features; and counting the number of mitotic figures per square unit of biopsy sample tissue.
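The first stages of this pipeline (color filtering, grouping connected pixels into blobs, and filtering the blobs) can be sketched as follows. This is only an illustration under stated assumptions: the brightness threshold, the 4-connectivity rule, and the use of a simple size filter in place of the shape filter are all hypothetical choices, and the feature extraction and classification stages are not shown.

```python
from collections import deque


def is_candidate_color(rgb, max_brightness=90):
    """Hypothetical color test: treat dark (strongly stained) pixels as candidates."""
    r, g, b = rgb
    return (r + g + b) / 3 <= max_brightness


def extract_blobs(image, color_test=is_candidate_color):
    """Group 4-connected candidate pixels into blobs using breadth-first search."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not color_test(image[y][x]):
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            blob = []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and color_test(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def size_filter(blobs, min_pixels=2, max_pixels=50):
    """Stand-in for shape filtering: keep blobs within a plausible pixel count."""
    return [blob for blob in blobs if min_pixels <= len(blob) <= max_pixels]
```

Dividing the number of surviving candidates by the imaged tissue area would give a count per square unit, once a classifier has confirmed which candidates are mitotic figures.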
Abstract:
A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation, Lambertian reflection, and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. In addition, the object can be the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model is generated which is representative of the head of the user. Synthetic light is applied to a position on the model to form an illuminated model.
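The abstract does not state the virtual illumination equation itself. As a hedged illustration, the sketch below combines the three named effects using a standard Phong-style model with an inverse-quadratic attenuation term; the coefficients `kd`, `ks`, `shininess`, and `atten` are assumed values, not parameters from the disclosure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def virtual_light(point, normal, light_pos, eye_pos,
                  kd=0.7, ks=0.3, shininess=16, atten=(1.0, 0.0, 0.1)):
    """Return the added light intensity at a surface point (assumed Phong-style model)."""
    to_light = _sub(light_pos, point)
    dist = math.sqrt(_dot(to_light, to_light))
    l = _unit(to_light)                    # direction from the point to the light
    n = _unit(normal)                      # surface normal
    v = _unit(_sub(eye_pos, point))        # direction from the point to the viewer
    a0, a1, a2 = atten
    attenuation = 1.0 / (a0 + a1 * dist + a2 * dist * dist)
    n_dot_l = max(0.0, _dot(n, l))
    diffuse = kd * n_dot_l                 # Lambertian term
    # Mirror reflection of the light direction about the normal.
    r = tuple(2 * _dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = ks * max(0.0, _dot(r, v)) ** shininess if n_dot_l > 0 else 0.0
    return attenuation * (diffuse + specular)
```

Evaluating this function at each vertex of the tracked head model, and adding the result to the captured pixel values, is one plausible way to form the illuminated model.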
Abstract:
The invention provides a system and method that transforms a set of still/motion media (i.e., a series of related or unrelated still frames, web pages rendered as images, or video clips) or other multimedia into a video stream that is suitable for delivery over a display medium, such as TV, cable TV, computer displays, wireless display devices, etc. The video data stream may be presented and displayed in real time or stored and later presented through a set-top box, for example. Because these media are transformed into coded video streams (e.g., MPEG-2, MPEG-4, etc.), a user can watch them on a display screen without the need to connect to the Internet through a service provider. The user may request and interact with the desired media through a simple telephone interface, for example. Moreover, several wireless and cable-based services can be developed on top of this system. In one possible embodiment, the system for generating a coded video sequence may include an input unit that receives the multimedia input, extracts image data, and derives the virtual camera scripts and coding hints from the image data; a video sequence generator that generates a video sequence based on the extracted image data and the derived virtual camera scripts and coding hints; and a video encoder that encodes the generated video sequence using the coding hints and outputs the coded video sequence to an output device. The system may also provide customized video sequence generation services to subscribers.
Abstract:
A system and method of providing sender-customization of multi-media messages through the use of emoticons is disclosed. The sender inserts the emoticons into a text message. As an animated face audibly delivers the text, emoticons associated with the message are started a predetermined period of time or number of words prior to the position of the emoticon in the message text and completed a predetermined length of time or number of words following the location of the emoticon. The sender may insert emoticons through the use of emoticon buttons that are icons available for choosing. Upon the sender's selection of an emoticon, an icon representing the emoticon is inserted into the text at the position of the cursor. Once an emoticon is chosen, the sender may also choose the amplitude for the emoticon, and the increased or decreased amplitude is displayed in the icon inserted into the message text.
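The word-based timing rule can be sketched as follows. The sketch assumes emoticons appear as whitespace-separated tokens in the message and that timing is measured in spoken words rather than seconds; the `EMOTICONS` table, the function name, and the default lead and trail values are hypothetical.

```python
# Hypothetical mapping from emoticon tokens to facial expressions.
EMOTICONS = {":-)": "smile", ":-(": "frown"}


def schedule_emoticons(message, lead_words=2, trail_words=2):
    """Return the spoken words and a list of (expression, start_word, end_word).

    Each expression starts `lead_words` words before the emoticon's position
    and ends `trail_words` words after it, clamped to the message bounds.
    Word indices are inclusive and count spoken words only.
    """
    words = []
    marks = []  # (expression, index of the first word spoken after the emoticon)
    for token in message.split():
        if token in EMOTICONS:
            marks.append((EMOTICONS[token], len(words)))
        else:
            words.append(token)
    if not words:
        return words, []
    last = len(words) - 1
    schedule = []
    for expression, pos in marks:
        start = max(0, pos - lead_words)
        end = max(start, min(last, pos + trail_words - 1))
        schedule.append((expression, start, end))
    return words, schedule


words, schedule = schedule_emoticons("I am so happy :-) to see you today")
```

Here the smile would begin on "so" and finish on "see", so the expression builds before the emoticon's position and fades after it.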
Abstract:
A system and method of controlling the movement of a virtual agent while the agent is listening to a human user during a conversation is disclosed. The method comprises receiving speech data from the user, performing a prosodic analysis of the speech data and controlling the virtual agent movement according to the prosodic analysis.
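One simple form of prosodic analysis is to track short-term energy and react to pauses. The sketch below assumes the speech data is a list of normalized samples and emits a hypothetical "nod" cue when the user pauses after speaking; the frame length, threshold, and cue name are illustrative assumptions, and a real system would likely also use pitch and other prosodic features.

```python
import math


def frame_energy(samples, frame_len):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def listener_cues(samples, frame_len=4, silence_threshold=0.1, min_pause_frames=2):
    """Return (frame_index, cue) pairs; a pause after speech triggers a nod."""
    cues = []
    heard_speech = False
    pause = 0
    for index, energy in enumerate(frame_energy(samples, frame_len)):
        if energy >= silence_threshold:
            heard_speech = True
            pause = 0
        elif heard_speech:
            pause += 1
            if pause == min_pause_frames:
                cues.append((index, "nod"))
                heard_speech = False  # wait for more speech before the next cue
    return cues
```

The returned cues could then drive the virtual agent's head movement while it listens.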
Abstract:
A multi-modal method for locating objects in images wherein a tracking analysis is first performed using a plurality of channels which may comprise a shape channel, a color channel, and a motion channel. After a predetermined number of frames, intermediate feature representations are obtained from each channel and evaluated for reliability. Based on the evaluation of each channel, one or more channels are selected for additional tracking. The results of all representations are ultimately integrated into a final tracked output. Additionally, any of the channels may be calibrated using initial results obtained from one or more channels.
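The evaluate-select-integrate steps can be sketched as follows. The sketch assumes each channel reports a position estimate with a confidence score per frame, defines reliability as the mean confidence, and fuses the selected channels by a reliability-weighted average; these definitions and all names are hypothetical simplifications.

```python
def channel_reliability(estimates):
    """Mean confidence over a channel's (x, y, confidence) estimates."""
    return sum(conf for _, _, conf in estimates) / len(estimates)


def select_channels(history, min_reliability=0.5):
    """Score every channel and keep those at or above the threshold.

    `history` maps a channel name to its list of (x, y, confidence) estimates.
    If no channel qualifies, the single best channel is kept.
    """
    scores = {name: channel_reliability(est) for name, est in history.items()}
    selected = {name: s for name, s in scores.items() if s >= min_reliability}
    if not selected:
        best = max(scores, key=scores.get)
        selected = {best: scores[best]}
    return selected


def fuse(latest, weights):
    """Reliability-weighted average of the selected channels' (x, y) positions."""
    total = sum(weights.values())
    x = sum(latest[name][0] * w for name, w in weights.items()) / total
    y = sum(latest[name][1] * w for name, w in weights.items()) / total
    return x, y


history = {
    "shape": [(10, 10, 0.8), (12, 10, 0.8)],
    "color": [(11, 11, 0.4), (30, 30, 0.2)],
    "motion": [(10, 12, 0.8), (14, 10, 0.8)],
}
weights = select_channels(history)
position = fuse({"shape": (12, 10), "color": (30, 30), "motion": (14, 10)}, weights)
```

In this example the color channel is dropped as unreliable, so the fused position depends only on the shape and motion channels.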