Abstract:
Techniques for managing visual compositions for a multimedia conference call are described. An apparatus may comprise a processor to allocate a display object bit rate for multiple display objects where a total display object bit rate for all display objects is equal to or less than a total input bit rate, and decode video information from multiple video streams each having different video layers with different levels of spatial resolution, temporal resolution and quality for two or more display objects. Other embodiments are described and claimed.
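The bit-rate constraint described above can be illustrated with a minimal sketch. It assumes a proportional split by per-object weights and a greedy choice of video layers, neither of which is stated in the abstract; the names `allocate_bit_rates`, `select_layers`, and the example weights are hypothetical.

```python
def allocate_bit_rates(weights, total_input_bit_rate):
    """Split the total input bit rate across display objects by weight.

    Integer division rounds each share down, so the shares can never
    sum to more than total_input_bit_rate.
    """
    total_weight = sum(weights.values())
    return {
        name: total_input_bit_rate * weight // total_weight
        for name, weight in weights.items()
    }


def select_layers(layer_bit_rates, budget):
    """Pick the base layer and then enhancement layers, in order, while they fit.

    layer_bit_rates[0] is the base layer; later entries are enhancement
    layers that each depend on all earlier ones (an assumption of this sketch).
    """
    chosen, used = [], 0
    for index, rate in enumerate(layer_bit_rates):
        if used + rate > budget:
            break
        chosen.append(index)
        used += rate
    return chosen


# Hypothetical conference layout: one large active-speaker window, two thumbnails.
rates = allocate_bit_rates({"active_speaker": 4, "thumb1": 1, "thumb2": 1}, 1500)
layers_for_speaker = select_layers([200, 300, 500], rates["active_speaker"])
layers_for_thumb = select_layers([200, 300, 500], rates["thumb1"])
```

In this sketch a large display object decodes more layers of its stream than a thumbnail does, while the sum of all allocations stays within the total input bit rate.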
Abstract:
A local network coding framework and method including techniques to improve efficiency in a wireless network by reducing overhead. The local network coding method includes exchanging data availability between nodes on the wireless network by sending Bloom filters of lists of packets to neighboring nodes. Based on data availability, optimized mixing of pure packets is performed to form mixture packets for output. A separate acknowledgement buffer keeps track of the pure packets transmitted but not acknowledged. If an acknowledgement does not arrive after a certain time period, the packet is assumed to be lost and is retransmitted. An optimized packet mixing process generates mixture packets and decides which nodes to send the mixture packets to. The local network coding framework and method also include methods for representing the composition of a mixture packet and using mixing at a wireless access point to improve the performance of the wireless local area network.
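Two of the building blocks named above, a Bloom filter of packet identifiers and the mixing of pure packets, can be sketched as follows. This is an illustration under assumptions: the filter size, the SHA-256-based hashing, and XOR as the mixing operation are choices made for the sketch and are not specified in the abstract.

```python
import hashlib


class BloomFilter:
    """Compact, lossy summary of a set of packet IDs (no false negatives)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def xor_mix(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length pure packets into one mixture packet."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# A neighbor advertises which packets it holds.
neighbor_has = BloomFilter()
neighbor_has.add(b"pkt-1")

# A node holding pkt-1 can recover pkt-2 from the mixture.
pkt1, pkt2 = b"\x0f\xf0", b"\xff\x00"
mixture = xor_mix(pkt1, pkt2)
recovered = xor_mix(mixture, pkt1)
```

A sender would only mix packets when the filters suggest each receiver already holds all but one of them; because Bloom filters admit false positives, that check is probabilistic.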
Abstract:
A system and method for correcting errors and losses occurring during a receiver-driven layered multicast (RLM) of real-time media over a heterogeneous packet network such as the Internet. This is accomplished by augmenting RLM with one or more layers of error correction information. This allows each receiver to separately optimize the quality of received audio and video information by subscribing to at least one error correction layer. Ideally, each source layer in an RLM would have one or more multicast error correction data streams (i.e., layers) associated therewith. Each of the error correction layers would contain information that can be used to replace lost packets from the associated source layer. More than one error correction layer is proposed, as some of the error correction packets needed to replace the packets lost in the associated source stream may themselves be lost in transmission. A preferred process for generating the error correction streams involves the use of a unique adaptation of Forward Error Correction (FEC) techniques. This process encodes the transmission data using a linear transform which adds redundant elements. The redundancy permits losses to be corrected because the original data elements can be derived from any sufficiently large subset of the encoded elements. Thus, as long as the number of encoded data elements received equals the number of original data elements, it is possible to derive all the original elements.
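The recovery property described above can be shown with the simplest linear erasure code: k source packets plus one XOR parity packet, where any k of the k+1 packets suffice to rebuild the originals. This is a stand-in chosen for brevity, not the adaptation of FEC the abstract refers to; the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(packets):
    """Return the k source packets followed by one XOR parity packet."""
    parity = reduce(xor_bytes, packets)
    return list(packets) + [parity]


def recover(received):
    """Rebuild the k source packets from k+1 slots with at most one None.

    The XOR of all k+1 encoded packets is zero, so the XOR of the k
    packets that arrived equals the one that was lost.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss")
    received = list(received)
    if missing:
        present = [p for p in received if p is not None]
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:-1]  # drop the parity slot


source = [b"ab", b"cd", b"ef"]
encoded = encode_with_parity(source)
damaged = [encoded[0], None, encoded[2], encoded[3]]  # second packet lost
repaired = recover(damaged)
```

In a layered scheme, the parity packets would travel in a separate error correction layer, so a receiver with a clean path could simply not subscribe to it.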
Abstract:
A system and process according to the present invention involves tagging prescribed portions of the data of each layer in a layered multicast or layered presentation with an indicator of the importance or utility that the data provides to the receiver. Additionally, the data is tagged with a cost factor involved with sending the data. The aforementioned portions of the data can be an entire data stream of a layer, or some part thereof all the way down to the individual packets making up the stream. The invention also involves determining the optimized scenario for sending the data from the sender to the receiver based on the data tags.
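One way to picture the selection step is as a budgeted choice over tagged packets. The sketch below assumes integer costs, additive utilities, and no dependencies between packets, and solves the resulting 0/1 knapsack problem by dynamic programming; the abstract does not say how the optimized scenario is actually determined, and the tag values here are invented for illustration.

```python
def choose_packets(packets, budget):
    """Pick the subset of tagged packets with the highest total utility.

    packets: list of (name, utility, cost) with integer cost.
    budget:  maximum total cost the sender may spend.
    Returns (total_utility, [names]) for the best subset found.
    """
    # best[c] holds the best (utility, names) achievable with total cost <= c.
    best = [(0, [])] * (budget + 1)
    for name, utility, cost in packets:
        # Walk budgets downward so each packet is used at most once.
        for c in range(budget, cost - 1, -1):
            candidate = best[c - cost][0] + utility
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [name])
    return best[budget]


# Hypothetical tags: (name, utility to the receiver, cost to send).
tagged = [("base", 10, 4), ("enh1", 6, 3), ("enh2", 5, 3)]
total_utility, to_send = choose_packets(tagged, budget=7)
```

A real layered stream would add dependency constraints (an enhancement packet is useless without its base layer), which this sketch deliberately leaves out.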
Abstract:
To obtain real-time responses with interactive multimedia servers, the server provides at least two different audio/visual data streams. A first data stream has fewer bits per frame and provides a video image much more quickly than a second data stream with a higher number of bits and hence a higher-quality video image. The first data stream becomes available to a client much faster and may be more quickly displayed on demand, while the second data stream is sent to improve the quality as soon as the playback buffer can handle it. In one embodiment, an entire video signal is layered, with a base layer providing the first signal and further enhancement layers comprising the second. The base layer may be actual image frames or just the audio portion of a video stream. The first and second streams are gradually combined in a manner such that the playback buffer does not overflow or underflow.
Abstract:
A projection onto convex sets (POCS)-based method for consistent reconstruction of a signal from a subset of quantized coefficients received from an N×K overcomplete transform. By choosing a frame operator F to be the concatenation of two or more K×K invertible transforms, the POCS projections are calculated in R^K space using only the K×K transforms and their inverses, rather than in the larger R^N space using pseudo-inverse transforms. Practical reconstructions are enabled based on, for example, wavelet, subband, or lapped transforms of an entire image. In one embodiment, unequal error protection for multiple description source coding is provided. In particular, given a bit-plane representation of the coefficients in an overcomplete representation of the source, one embodiment of the present invention provides coding the most significant bits with the highest redundancy and the least significant bits with the lowest redundancy. In one embodiment, this is accomplished by varying the quantization step size for the different coefficients. Then, the available received quantized coefficients are decoded using a method based on alternating projections onto convex sets.
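The alternating-projection idea can be sketched for K = 2 with a frame built from two 2×2 transforms: the identity and an orthonormal Haar transform. Each received quantized coefficient confines the signal to an interval in its transform domain, and the reconstruction alternates between the two resulting convex sets. The sketch assumes orthonormal transforms, so that clipping in the transform domain is an exact Euclidean projection; the abstract only requires invertible transforms, and the cell values below are invented.

```python
import math

S = 1 / math.sqrt(2)


def identity(v):
    return tuple(v)


def haar(v):
    """Orthonormal 2x2 Haar transform; it is its own inverse."""
    a, b = v
    return (S * (a + b), S * (a - b))


def clip(v, cells):
    """Clamp each coefficient into its quantization cell (lo, hi)."""
    return tuple(min(max(c, lo), hi) for c, (lo, hi) in zip(v, cells))


def project(x, transform, inverse, cells):
    """Project x onto {x : transform(x) lies inside the given cells}."""
    return inverse(clip(transform(x), cells))


def pocs_reconstruct(cells_identity, cells_haar, iterations=50):
    """Alternate projections in R^K using only the KxK transforms."""
    x = (0.0, 0.0)
    for _ in range(iterations):
        x = project(x, identity, identity, cells_identity)
        x = project(x, haar, haar, cells_haar)
    return x


# Cells obtained by quantizing the signal (0.3, 0.7) with step size 0.5.
cells_identity = [(0.0, 0.5), (0.5, 1.0)]
cells_haar = [(0.5, 1.0), (-0.5, 0.0)]
x_hat = pocs_reconstruct(cells_identity, cells_haar)
```

The result is consistent in the sense used above: requantizing `x_hat` in either transform domain reproduces the received cells, even though `x_hat` need not equal the original signal.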
Abstract:
A technique for automatically producing, or training, a set of bitmapped character templates defined according to the sidebearing model of character image positioning uses as input a text line image of unsegmented characters, called glyphs, as the source of training samples. The training process also uses a transcription associated with the text line image, and an explicit, grammar-based text line image source model that describes the structural and functional features of a set of possible text line images that may be used as the source of training samples. The transcription may be a literal transcription of the line image, or it may be nonliteral, for example containing logical structure tags for document formatting and layout, such as found in markup languages. Spatial positioning information modeled by the text line image source model and the labels in the transcription are used to determine labeled image positions identifying the location of glyph samples occurring in the input line image, and the character templates are produced using the labeled image positions. In another aspect of the technique, a set of character templates defined by any character template model, such as a segmentation-based model, is produced using the grammar-based text line image source model and specifically using a tag transcription containing logical structure tags for document formatting and layout. Both aspects of the training technique may represent the text line image source model and the transcription as finite state networks.
Abstract:
A method of automatically identifying bitmapped image objects. Each of a set of templates in an object template library is compared with all areas of like size of a bitmapped image. A set of signals is generated for each such comparison that satisfies a defined matching criterion between the template and the image area being compared. The set of signals identifies the object based on the matching template, the location of the object in the image, and an indication of the goodness of the match between the object and the template. A series of possible parse trees are formed that describe the image, with a probability of occurrence for each tree. In each parse tree, each parent node and its child nodes satisfy a grammatical production rule, and some of the production rules define spatial relationships between objects in the image. The possible parse tree with the largest probability of occurrence is selected for further utilization.
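The template comparison step can be sketched as an exhaustive sliding-window match over a binary bitmap. The scoring rule (fraction of agreeing pixels) and the threshold are assumptions made for this sketch, and the parse-tree stage is not covered; `match_templates` is a hypothetical name.

```python
def match_templates(image, templates, threshold=1.0):
    """Compare every template with every like-sized area of a bitmap.

    image:     list of rows, each a list of 0/1 pixels.
    templates: dict mapping an object label to a 0/1 bitmap.
    Returns a list of (label, row, col, score) for areas whose score
    (fraction of agreeing pixels) meets the threshold.
    """
    matches = []
    for label, tpl in templates.items():
        th, tw = len(tpl), len(tpl[0])
        for r in range(len(image) - th + 1):
            for c in range(len(image[0]) - tw + 1):
                agree = sum(
                    image[r + i][c + j] == tpl[i][j]
                    for i in range(th)
                    for j in range(tw)
                )
                score = agree / (th * tw)
                if score >= threshold:
                    matches.append((label, r, c, score))
    return matches


image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
found = match_templates(image, {"bar": [[1], [1]]})
```

Each returned tuple plays the role of the set of signals described above: the matching template's label, the location in the image, and a goodness-of-match value.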