Abstract:
A system that facilitates blind source separation in a distributed microphone meeting environment for improved teleconferencing. Input sensor (e.g., microphone) signals are transformed to the frequency-domain and independent component analysis is applied to compute estimates of frequency-domain processing matrices. Modified permutations of the processing matrices are obtained based upon a maximum magnitude based de-permutation scheme. Estimates of the plurality of source signals are provided based upon the modified frequency-domain processing matrices and input sensor signals.Optionally, segments during which the set of active sources is a subset of the set of all sources can be exploited to compute more accurate estimates of frequency-domain mixing matrices. Source activity detection can be applied to determine which speaker(s), if any, are active. Thereafter, a least squares post-processing of the frequency-domain independent components analysis outputs can be employed to adjust the estimates of the source signals based on source inactivity.
Abstract:
A system that facilitates blind source separation in a distributed microphone meeting environment for improved teleconferencing. Input sensor (e.g., microphone) signals are transformed to the frequency-domain and independent component analysis is applied to compute estimates of frequency-domain processing matrices. Modified permutations of the processing matrices are obtained based upon a maximum magnitude based de-permutation scheme. Estimates of the plurality of source signals are provided based upon the modified frequency-domain processing matrices and input sensor signals.Optionally, segments during which the set of active sources is a subset of the set of all sources can be exploited to compute more accurate estimates of frequency-domain mixing matrices. Source activity detection can be applied to determine which speaker(s), if any, are active. Thereafter, a least squares post-processing of the frequency-domain independent components analysis outputs can be employed to adjust the estimates of the source signals based on source inactivity.
Abstract:
A temporal information integration dis-occlusion system and method for using historical data to reconstruct a virtual view containing an occluded area. Embodiments of the system and method use temporal information of the scene captured previously to obtain a total history. This total history is warped onto information captured by a camera at a current time in order to help reconstruct the dis-occluded areas. The historical data (or frames) from the total history match only a portion of the frames contained in the captured information. This warping yields warped history information. Warping is performed by using one of two embodiments to match points in an estimation of the current information to points in the captured information. Next, regions of current information are split using a classifier. The warped history information and the captured information then are merged to obtain an estimate for the current information and the reconstructed virtual view.
Abstract:
A method and system for processing contextual information relating to an exchange of a conversation over a communication channel is provided. Several users, and/or service providers are allowed to specify a set of rules relating to a conversation channel. Contextual information, including information relating to the specified set of rules and conditions of the users, is exchanged among the users and/or several service providers when one user requests to initiate a communication channel. The received contextual information is processed to extract a set of rules and the current conditions of users. If the current conditions of the users satisfy the set of rules, a corresponding communication channel is established among the users. However, additional contextual information may be received and processed whenever there is a change in the contextual information during the conversation. Appropriate actions to the existing communication channel may be determined based on the changes.
Abstract:
A technique for automatically training a set of character templates using unsegmented training samples uses as input a two-dimensional (2D) image of characters, called glyphs, as the source of training samples, a transcription associated with the 2D image as a source of labels for the glyph samples, and an explicit, formal 2D image source model that models as a grammar the structural and functional features of a set of 2D images that may be used as the source of training data. The input transcription may be a literal transcription associated with the 2D input image, or it may be nonliteral, for example containing logical structure tags for document formatting, such as found in markup languages. The technique uses spatial positioning information about the 2D image modeled by the 2D image source model and uses labels in the transcription to determine labeled glyph positions in the 2D image that identify locations of glyph samples. The character templates are produced using the input 2D image and the labeled glyph positions without assigning pixels to glyph samples prior to training. In one implementation, the 2D image source model is a regular grammar having the form of a finite state transition network, and the transcription is also represented as a finite state network. The two networks are merged to produce a transcription-image network, which is used to decode the input 2D image to produce labeled glyph positions that identify training data samples in the 2D image. In one implementation of the template construction process, a pixel scoring technique is used to produce character templates contemporaneously from blocks of training data samples aligned at glyph positions.
Abstract:
A multiuser scheme allowing for a number of users, sets of user, or carriers to share one or more channels is provided. In the invention, the available channel bandwidth is subdivided into a number of equal-bandwidth subchannels according to standard OFDM practice. A transmitter transmits data on a set of OFDM subchannels that need not be contiguous in the spectrum or belong to the same OFDM channel. A receiver receives and decodes the data and detects errors on subchannels. The receiver then broadcasts the identity of those subchannels on which the error rate exceeds a specific threshold, and the transmitter may select different subchannels for transmission based on this information.
Abstract:
A multiuser scheme allowing for a number of users, sets of user, or carriers to share one or more channels is provided. In the invention, the available channel bandwidth is subdivided into a number of equal-bandwidth subchannels according to standard OFDM practice. A transmitter transmits data on a set of OFDM subchannels that need not be contiguous in the spectrum or belong to the same OFDM channel. A receiver receives and decodes the data and detects errors on subchannels. The receiver then broadcasts the identity of those subchannels on which the error rate exceeds a specific threshold, and the transmitter may select different subchannels for transmission based on this information.
Abstract:
Generally described, multimode authentication over a VoIP communication channel is provided. A calling client and a called client may be authenticated for a communication channel establishment. When a calling client requests a call connection with a called client, the calling client is authenticated for the communication channel, based on exchanged contextual information between the calling client and the called client. Likewise, the called client is authenticated for the communication channel by the calling client. Upon authentication, a communication channel is established, over which the calling client and the called client are allowed to exchange more contextual and voice/multimedia information. During a conversation, when a secured service is desired by any of the clients, a series of authentication processes can be performed to grant access to the secured service over the communication channel without loss of the communication channel connection.
Abstract:
A method and system for communicating a variable set of contextual information relating to a conversation over a communication channel is provided. When the contextual information is exchanged, any authorized sending party of the contextual information can change the scope, content, or amount of the contextual information that is transmitted to a next receiving party in a determined communication channel path. Before transmitting the contextual information, a desirable scope of the contextual information may be determined based on the next receiving party, in conjunction with the sending party's rules. The contextual information may be updated by adding new contextual information and/or deleting part of the contextual information which is outside of the scope. No contextual information may be transmitted if the next destination desires no contextual information or does not have capabilities to receive any contextual information.
Abstract:
A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters in transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that match with transcription regions. The result is an output data structure that minimally identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each of the image locations of interest to a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of locations of words and their observed lengths in the text image are determined. The transcription is analyzed to identify words, and transcription word lengths are computed using an estimated image character width of glyphs in the text image. The sequence of observed image word lengths is then matched to the sequence of computed transcription word lengths using a dynamic programming algorithm that finds a best path through a two-dimensional lattice of nodes and transitions between nodes, where the transitions represent pairs of sequences of zero or more word lengths. An output data structure contains entries, each of which pairs a transcription word with a matching image word location.