Abstract:
Systems and methods are disclosed for packet voice conferencing. An encoding system accepts two sound field signals, representing the same sound field sampled at two spatially-separated points. The relative delay between the two sound field signals is detected over a given time interval. The sound field signals are combined and then encoded as a single audio signal, e.g., by a method suitable for monophonic VoIP. The encoded audio payload and the relative delay are placed in one or more packets and sent to a decoding device via the packet network. The decoding device uses the relative delay to drive a playout splitter: once the encoded audio payload has been decoded, the playout splitter creates multiple presentation channels by inserting the transmitted relative delay in the decoded signal for one (or more) of the presentation channels. The listener thus perceives a speaker's voice as originating from a location related to the speaker's physical position at the other end of the conference. An advantage of these embodiments is that a pseudo-stereo conference can be conducted with virtually the same bandwidth as a monophonic conference.
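The encode and playout steps described in this abstract can be illustrated with a minimal sketch. It assumes the relative delay is estimated by brute-force cross-correlation over a bounded lag range and that the two signals are combined by simple averaging; the abstract does not specify either method. The function names (`estimate_delay`, `combine`, `playout_split`) are hypothetical, and the audio codec and packetization are omitted.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) of `right` relative to `left` that
    maximizes their cross-correlation (assumed detection method)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def combine(left, right):
    """Combine the two sound field signals into one mono signal
    (plain averaging is an assumption, not the disclosed method)."""
    return [(a + b) / 2 for a, b in zip(left, right)]


def playout_split(mono, delay):
    """Create two presentation channels from the decoded mono signal.
    A positive delay is inserted in the right channel, a negative one
    in the left channel. Returns (left, right) of equal length."""
    pad = [0.0] * abs(delay)
    if delay >= 0:
        return mono + pad, pad + mono
    return pad + mono, mono + pad


# Example: an impulse reaches the second microphone two samples later.
left = [0, 0, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 1, 0, 0, 0]
delay = estimate_delay(left, right, max_lag=3)
left_out, right_out = playout_split([1.0, 2.0], delay)
```

Only the mono payload and one small delay value per interval cross the network in this sketch, which is why the bandwidth stays close to that of a monophonic call.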
Abstract:
A system and method are disclosed for packet voice conferencing. The system and method divide a conferencing presentation sound field into sectors, and allocate one or more sectors to each conferencing endpoint. At some point between capture and playout, the voice data from each endpoint is mapped into its designated sector or sectors. Thereafter, when the voice data from a plurality of participants from multiple endpoints is combined, a listener can identify a unique apparent location within the presentation sound field for each participant. The system improves a conference participant's comprehension when multiple participants speak simultaneously, and alleviates confusion as to who is speaking at any given time.
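The sector allocation can be sketched as follows. This is only an illustration under stated assumptions: the presentation sound field is modeled as a stereo azimuth range from -90 to +90 degrees, each endpoint receives one equal-width sector, and voice data is placed with constant-power panning. None of these choices comes from the abstract, and the names `allocate_sectors` and `pan_gains` are hypothetical.

```python
import math


def allocate_sectors(endpoints, field=(-90.0, 90.0)):
    """Divide the sound field (azimuth range in degrees, an assumed model)
    into equal sectors and assign one sector per endpoint."""
    lo, hi = field
    width = (hi - lo) / len(endpoints)
    return {
        ep: (lo + i * width, lo + (i + 1) * width)
        for i, ep in enumerate(endpoints)
    }


def pan_gains(azimuth_deg, field=(-90.0, 90.0)):
    """Constant-power (left, right) gains for an azimuth in the field."""
    lo, hi = field
    theta = (azimuth_deg - lo) / (hi - lo) * (math.pi / 2)
    return math.cos(theta), math.sin(theta)


# Example: three endpoints; place endpoint "B" at the center of its sector.
sectors = allocate_sectors(["A", "B", "C"])
b_lo, b_hi = sectors["B"]
left_gain, right_gain = pan_gains((b_lo + b_hi) / 2)
```

Multiple participants at one endpoint could be spread across that endpoint's sector by choosing different azimuths between its bounds.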
Abstract:
A PC-based server platform includes a first backplane bus used for transferring data and commands to various PC peripheral devices. A network router and a telephony endpoint card are coupled to the backplane bus and separately coupled through a second Time Division Multiplexed (TDM) bus. The router includes interfaces to various packet switched networks such as a Wide Area Network (WAN) and a Local Area Network (LAN). The TDM bus is used to route telephony data between the different Internet Protocol (IP)-based networks and the telephony card independently of the host system. The PC host processor also uses the router as a standard LAN interface for transferring data packets. A DSP voice processing card is coupled between the backplane bus and the TDM bus to compress and decompress the telephony data transferred on the TDM bus.
Abstract:
A method and system for logging voice quality issues for a communication connection includes receiving a signal for logging quality information for a voice connection at an endpoint of the voice connection. Voice samples are collected from the voice connection at the endpoint. The voice samples are stored in an error log at the endpoint.
Abstract:
Devices, software and methods generate, in real time, indexing metadata for select portions of a telephone conversation or conference. The indexing metadata is generated responsive to inputs received while the conversation is being recorded live. The inputs are generated either by a user pressing a soft key on a telephone device, or by a voice conference bridge determining who is the dominant speaker in a multi-party conference.
Abstract:
A system and method for improving the intelligibility of a moderator during a multi-party communication session includes receiving a plurality of participant voice streams from a plurality of respective conference participants. An incoming moderator voice stream may be received from a moderator. The plurality of participant voice streams and the moderator voice stream are transmitted such that the intelligibility of the moderator voice stream is improved relative to at least one of the participant voice streams.
Abstract:
According to one embodiment of the invention, a method for managing time-sensitive packetized data streams at a receiver includes receiving a time-sensitive packet of a data stream, analyzing an energy level of a payload signal of the packet, and determining whether to drop the packet based on the energy level of the payload signal.
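A minimal sketch of an energy-based drop decision follows. The abstract states only that the decision is based on the energy level of the payload signal; the 16-bit PCM payload, the -50 dBFS threshold, and the rule of dropping low-energy packets only when the receive buffer is congested are assumptions made for illustration. The names `frame_energy_db` and `should_drop` are hypothetical.

```python
import math

FULL_SCALE = 32768.0  # assumes decoded 16-bit PCM samples


def frame_energy_db(samples):
    """Mean-square energy of a payload in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq / FULL_SCALE ** 2)


def should_drop(samples, buffer_is_congested, threshold_db=-50.0):
    """Drop a packet only if the buffer is congested AND the payload is
    quiet (both the condition and the threshold are assumptions)."""
    return buffer_is_congested and frame_energy_db(samples) < threshold_db


# Example: a silent 20 ms frame at 8 kHz versus a loud one.
silent = [0] * 160
loud = [10000] * 160
decisions = (should_drop(silent, True), should_drop(loud, True))
```

Dropping quiet packets first is one plausible policy because their loss is less likely to be heard than the loss of active speech.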
Abstract:
A method and system for participant control of privacy during a multiparty communication session includes receiving a request from a first participant to a multiparty communication connection for a sidebar between the first participant and a second participant to the multiparty communication connection. The sidebar is provided by at least substantially eliminating voice streams generated by the first participant and the second participant from conference output streams generated for a set of remaining participants to the multiparty communication connection.
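The sidebar behavior can be sketched as a mixing rule. The sketch assumes each output stream is a plain sample-wise sum of the other participants' streams and that sidebar members continue to hear everyone, including each other; the abstract specifies only that sidebar voices are removed from the outputs of the remaining participants. The function name `mix_outputs` is hypothetical.

```python
def mix_outputs(streams, sidebar=frozenset()):
    """Build one output stream per participant.

    streams: dict mapping participant -> list of samples (equal lengths).
    sidebar: set of participants in a private sidebar.
    A listener outside the sidebar does not hear sidebar members.
    """
    length = len(next(iter(streams.values()))) if streams else 0
    outputs = {}
    for listener in streams:
        sources = [
            p for p in streams
            if p != listener and (listener in sidebar or p not in sidebar)
        ]
        outputs[listener] = [
            sum(streams[p][i] for p in sources) for i in range(length)
        ]
    return outputs


# Example: "a" and "b" hold a sidebar; "c" and "d" are the remaining set.
streams = {"a": [1], "b": [2], "c": [4], "d": [8]}
mixed = mix_outputs(streams, sidebar={"a", "b"})
```

With distinct power-of-two sample values, each output sum shows exactly which sources were mixed into it.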
Abstract:
A test system measures performance of telephone network echo cancellers using a primary criterion of estimated user annoyance due to audible returned echo. The invention generates live telephone calls, uses real speech samples as stimulus signals and provides tail-circuit emulation using actual measured telephone tail-circuit impulse responses. These features provide better ‘real-life’ test conditions for the echo canceller system under test than current ITU standard test methods. Two methods are employed for echo canceller performance evaluation via metrics of estimated user annoyance due to echo. An energy-based method employs point-by-point comparison of talker speech and talker echo signal energy envelopes and uses variable energy thresholds for estimation of echo audibility. A perceptual-model-based method uses a Perceptual Speech Distortion Metric (PSDM), such as ITU P.861, in a unique configuration to estimate user annoyance due to audible echo. Echo canceller performance is tested under both single-talk and double-talk conditions. Innovative application of the PSDM method in double-talk tests allows estimation of the quality of received double-talk speech.
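The energy-based method can be illustrated with a rough sketch. The abstract does not give the threshold rule, so the one used here is purely hypothetical: echo in a frame counts as audible when its level exceeds the larger of an absolute floor and a margin below the talker's level in the same frame. Frame size, floor, and margin are assumed values, the two signals are assumed to be already time-aligned, and the names `envelope_db` and `audible_echo_fraction` are hypothetical.

```python
import math


def envelope_db(samples, frame=80):
    """Per-frame energy envelope in dB for samples normalized to [-1, 1].
    Silent frames are reported as -120 dB."""
    env = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        mean_sq = sum(s * s for s in chunk) / frame
        env.append(10 * math.log10(mean_sq) if mean_sq > 0 else -120.0)
    return env


def audible_echo_fraction(talker_env, echo_env,
                          floor_db=-60.0, masking_margin_db=30.0):
    """Fraction of frames in which echo exceeds a variable threshold.
    The threshold rule is an assumption, not the disclosed one."""
    if not talker_env:
        return 0.0
    audible = 0
    for talker_db, echo_db in zip(talker_env, echo_env):
        threshold = max(floor_db, talker_db - masking_margin_db)
        if echo_db > threshold:
            audible += 1
    return audible / len(talker_env)


# Example: four frames, two with loud talker speech and two nearly silent.
talker = [-20.0, -20.0, -90.0, -90.0]
echo = [-40.0, -55.0, -50.0, -70.0]
fraction = audible_echo_fraction(talker, echo)
```

A threshold that rises with the talker's own level reflects the idea that echo is harder to hear while the talker is speaking loudly.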