Abstract:
A method and system for collecting and communicating contextual information relating to a VoIP conversation is provided. Structured hierarchies are utilized for efficient communication of various amounts and types of contextual information relating to a VoIP conversation. Information identifying at least one structured hierarchy, which will be used to carry the contextual information, is transmitted during establishment of a conversation between two VoIP enhanced devices. The structured hierarchy is selected from a set of predefined and declared structured hierarchies. Contextual information subsequently exchanged between the two VoIP enhanced devices is represented in accordance with the identified structured hierarchy. VoIP clients, network infrastructure, and various service providers can collect the contextual information based on the identified structured hierarchy and update it by adding, deleting, and/or modifying contextual data. The updated contextual information is then transmitted to other clients, network infrastructure, and service providers.
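The exchange described above can be illustrated with a minimal sketch. It assumes that each predefined hierarchy is just a named set of allowed paths, and that the hierarchy identifier is agreed once at conversation setup. The hierarchy names, paths, and the `ContextSession` class are hypothetical and are not taken from the abstract.

```python
# Hypothetical predefined hierarchies: identifier -> allowed context paths.
PREDEFINED_HIERARCHIES = {
    "call-basics": {"call/priority", "call/subject", "caller/name"},
    "device-info": {"device/type", "device/codec"},
}


class ContextSession:
    """Contextual information carried under one identified hierarchy."""

    def __init__(self, hierarchy_id):
        # The identifier is assumed to be exchanged during conversation setup.
        if hierarchy_id not in PREDEFINED_HIERARCHIES:
            raise ValueError(f"unknown hierarchy: {hierarchy_id}")
        self.hierarchy_id = hierarchy_id
        self.allowed = PREDEFINED_HIERARCHIES[hierarchy_id]
        self.context = {}

    def update(self, path, value):
        """Add or modify one item; the path must belong to the hierarchy."""
        if path not in self.allowed:
            raise KeyError(f"{path} is not part of {self.hierarchy_id}")
        self.context[path] = value

    def delete(self, path):
        """Remove one item if present."""
        self.context.pop(path, None)


session = ContextSession("call-basics")
session.update("call/priority", "high")
session.update("caller/name", "Alice")
session.delete("caller/name")
```

Because both ends already know the hierarchy, later messages in this sketch only need to carry path and value pairs rather than a full description of the structure.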
Abstract:
A transmission method for video image data using an embedded bit stream in a hierarchical table-lookup vector quantizer comprises the steps of encoding an image using hierarchical vector quantization and an embedding process to obtain an embedded bit stream for lossless transmission. The bit stream is selectively truncated and decoded to obtain a reconstructed image.
Abstract:
Correction of errors and losses in a receiver-driven layered multicast (RLM) of real-time media over a network is augmented using one or more layers of error correction information. Each receiver separately optimizes the quality of received information by subscribing to at least one error correction layer. Ideally, each source layer in an RLM has one or more associated multicast error correction data streams. Each error correction layer contains information for replacing lost packets from the associated source layer. More than one error correction layer is proposed to correct for lost packets in other error correction layers. Error correction streams are preferably generated using a pseudo-Automatic Repeat Request (ARQ) wherein a broadcaster sends both the source packets in a primary stream and delayed versions thereof in one or more redundant streams. A hybrid technique combines the pseudo-ARQ method with an adaptation of Forward Error Correction (FEC) techniques.
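The pseudo-ARQ idea can be sketched in a few lines, assuming one primary stream, one delayed redundant stream, and packets identified by sequence number. The function names, the tuple layout, and the loss sets are hypothetical, and real multicast transport is not modeled.

```python
def broadcast(packets, delay):
    """Return (primary, redundant) streams as lists of (send_time, seq, payload).

    The redundant stream carries the same packets, sent `delay` ticks later.
    """
    primary = [(t, seq, payload) for t, (seq, payload) in enumerate(packets)]
    redundant = [(t + delay, seq, payload) for t, seq, payload in primary]
    return primary, redundant


def receive(primary, redundant, lost_primary, lost_redundant=frozenset(),
            subscribe=True):
    """Collect packets, filling primary losses from the delayed stream.

    `subscribe` models the receiver's choice to join the error correction layer.
    """
    received = {seq: p for _, seq, p in primary if seq not in lost_primary}
    if subscribe:
        for _, seq, p in redundant:
            if seq not in lost_redundant:
                # Keep the primary copy if it arrived; otherwise use the late copy.
                received.setdefault(seq, p)
    return received


packets = [(0, "a"), (1, "b"), (2, "c")]
primary, redundant = broadcast(packets, delay=2)
recovered = receive(primary, redundant, lost_primary={1})
```

In this sketch a packet is unrecoverable only if it is lost on both streams, which is the case that additional error correction layers are meant to address.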
Abstract:
A method and system automatically modify an original transcription, produced as the output of a recognition operation, to produce a second, modified transcription; one example is automatically correcting an errorful transcription produced by an OCR operation. The invention uses information in an input text image of character images and in an original transcription associated with the input text image to modify aspects of a formal image source model that models as a grammar the spatial image structure of a set of text images. A recognition operation is then performed on the input text image using the modified formal image source model to produce a second, modified transcription. When the original transcription is errorful, the second transcription is a corrected transcription. Several aspects of the formal image source model may be modified; in particular, character templates to be used in the recognition operation are trained in the font of the glyphs occurring in the input text image. When errors in the original transcription are caused by matching glyphs against templates that are inadequately specified for the given input text image, the subsequently performed recognition operation on the text image using the trained, font-specific character templates produces a more accurate transcription.
Abstract:
A method is provided for segmenting, according to speaker, audio data comprising speech from a plurality of individual speakers. The method comprises providing an individual hidden Markov model (HMM) for each individual speaker, each individual HMM including at least one state, and constructing a speaker network HMM by connecting the individual HMMs in parallel. The audio data is then divided into segments by determining a most likely sequence of states through the speaker network HMM, each of the segments being associated with one of the individual HMMs. Afterward, the speaker of each of the segments is identified. The segmented data may be used to form an index into the audio data according to speaker.
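A minimal sketch of the segmentation step follows. It makes strong simplifying assumptions: each speaker HMM has a single state with a one-dimensional Gaussian emission, and the parallel network is modeled by a fixed log-probability for staying with a speaker or switching to another. The feature values, model parameters, and penalties are hypothetical. The most likely state sequence is found with the standard Viterbi algorithm.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def segment_by_speaker(frames, speakers, switch_logp=-5.0, stay_logp=-0.01):
    """Viterbi decode over parallel one-state speaker models.

    `speakers` maps a speaker name to (mean, variance).
    Returns a list of (speaker, start_frame, end_frame_exclusive).
    """
    names = list(speakers)

    def trans(prev, cur):
        return stay_logp if prev == cur else switch_logp

    score = {n: log_gauss(frames[0], *speakers[n]) for n in names}
    backpointers = []
    for x in frames[1:]:
        new_score, ptr = {}, {}
        for n in names:
            best_prev = max(names, key=lambda m: score[m] + trans(m, n))
            ptr[n] = best_prev
            new_score[n] = (score[best_prev] + trans(best_prev, n)
                            + log_gauss(x, *speakers[n]))
        backpointers.append(ptr)
        score = new_score

    # Trace the best path backwards from the best final state.
    state = max(names, key=score.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # Collapse runs of the same speaker into segments.
    segments = []
    for i, s in enumerate(path):
        if segments and segments[-1][0] == s:
            segments[-1][2] = i + 1
        else:
            segments.append([s, i, i + 1])
    return [tuple(seg) for seg in segments]


frames = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2, 0.1, 0.0]
speakers = {"A": (0.0, 1.0), "B": (5.0, 1.0)}
segments = segment_by_speaker(frames, speakers)
```

The returned (speaker, start, end) tuples are the kind of per-speaker index into the audio that the abstract mentions; the switch penalty discourages spurious one-frame speaker changes.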
Abstract:
A mediation server for controlling contents of incoming and outgoing communication information exchanged as part of a conversation is provided. The mediation server may be a centralized server between an internal (private) network and an external network, utilized for enforcing the internal network's policy and detecting a potential security compromise in the internal network. Predefined evaluation criteria are utilized to enforce internal policy or security policy within the internal network. When communication information is exchanged, the mediation server may monitor potential policy or security breaches in the communication information utilizing the predefined evaluation criteria and execute an appropriate action to prevent potential policy or security breaches.
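The role of the predefined evaluation criteria can be sketched as an ordered rule list that the mediation server checks for each message. Everything named here (the `Criterion` class, the two example rules, the message fields, and the action strings) is a hypothetical illustration rather than anything specified in the abstract.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Criterion:
    name: str
    matches: Callable[[dict], bool]  # predicate over a message
    action: str                      # what to do when the predicate holds


# Hypothetical criteria: one policy rule and one security rule.
CRITERIA = [
    Criterion(
        "card-number",
        lambda m: re.search(r"\b\d{16}\b", m["body"]) is not None,
        "block",
    ),
    Criterion(
        "external-attachment",
        lambda m: m["direction"] == "incoming" and m.get("attachment", False),
        "quarantine",
    ),
]


def mediate(message: dict, criteria=CRITERIA) -> Tuple[str, Optional[str]]:
    """Return (action, matched criterion name); forward when nothing matches."""
    for criterion in criteria:
        if criterion.matches(message):
            return criterion.action, criterion.name
    return "forward", None


decision = mediate({"direction": "outgoing", "body": "card 1234567812345678"})
```

Keeping the criteria as data rather than hard-coded branches lets the internal network's policy change without altering the mediation loop itself.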
Abstract:
Aspects of the present invention are directed at obtaining contextual information with a voicemail message. In accordance with one embodiment, a method is provided that obtains additional contextual information that is not obtained automatically when a voicemail message is received. More specifically, the method includes automatically obtaining a first set of contextual information from a client associated with the caller when the caller is transferred to a voicemail system. Then a determination is made regarding whether the callee requests that additional contextual information be obtained. If the callee requests that additional contextual information be obtained, the requested information is obtained from a third-party service or by prompting the caller.
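The flow above can be sketched as a single function. The first set of contextual information is taken from the caller's client, and each additional field the callee requests is then looked up. Trying the third-party service before prompting the caller is an assumption of this sketch, as are the function name, parameters, and field names.

```python
def collect_voicemail_context(caller_client, callee_requests,
                              third_party=None, prompt_caller=None):
    """Gather contextual information to accompany a voicemail message.

    caller_client:   dict obtained automatically from the caller's client.
    callee_requests: names of additional fields the callee asked for.
    third_party:     optional callable, field -> value or None.
    prompt_caller:   optional callable, field -> value or None.
    """
    context = dict(caller_client)  # first set, obtained automatically
    for field in callee_requests:
        if field in context:
            continue  # already available, nothing more to obtain
        value = third_party(field) if third_party else None
        if value is None and prompt_caller:
            value = prompt_caller(field)
        if value is not None:
            context[field] = value
    return context


context = collect_voicemail_context(
    {"caller_id": "555-0100"},
    ["company", "reason"],
    third_party=lambda f: {"company": "Contoso"}.get(f),
    prompt_caller=lambda f: "callback about invoice" if f == "reason" else None,
)
```

When the callee requests nothing, the function returns only the automatically obtained set, which matches the default case described in the abstract.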
Abstract:
Gaze tracking or other interest indications are used during a video conference to determine one or more audio sources that are of interest to one or more participants to the video conference, such as by determining a conversation from among multiple conversations that a subset of participants are participating in or listening to, for enhancing the audio experience of one or more of the participants.
Abstract:
The subject disclosure is directed towards an immersive conference, in which participants in separate locations are brought together into a common virtual environment (scene), such that they appear to each other to be in a common space, with geometry, appearance, and real-time natural interaction (e.g., gestures) preserved. In one aspect, depth data and video data are processed to place remote participants in the common scene from the first person point of view of a local participant. Sound data may be spatially controlled, and parallax computed to provide a realistic experience. The scene may be augmented with various data, videos and other effects/animations.