Abstract:
Techniques are provided for sending and receiving key frames and key frame request messages. At a video conference bridge, a key frame request message is received from a first endpoint device. The key frame request message comprises a request for a key frame from a second endpoint device. When a prior key frame request message is received before the key frame request message, a key frame request time value is determined that corresponds to an amount of time between receiving the key frame request message and receiving the prior key frame request message. This value is compared to a threshold time value. When the key frame request time is greater than the threshold time, a key frame request forwarding message is generated, and the key frame request forwarding message is sent to the second endpoint device to request the key frame from the second endpoint device.
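The forwarding decision described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name `KeyFrameRequestThrottle`, its fields, and the use of seconds are hypothetical, and the behavior when there is no prior request (forward immediately) is an assumption, since the abstract only covers the case where a prior request exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyFrameRequestThrottle:
    """Decides whether a bridge forwards a key frame request (hypothetical helper)."""
    threshold_s: float                      # threshold time value, in seconds
    last_request_s: Optional[float] = None  # arrival time of the prior request

    def should_forward(self, now_s: float) -> bool:
        prior = self.last_request_s
        self.last_request_s = now_s  # this request becomes the "prior" one next time
        if prior is None:
            # Assumption: with no prior request there is nothing to compare, so forward.
            return True
        elapsed = now_s - prior  # the key frame request time value
        return elapsed > self.threshold_s
```

Note that the sketch measures the gap between consecutive received requests, as the abstract states, rather than the gap since the last forwarded request.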
Abstract:
A method, an apparatus, and a medium encoded with instructions for providing layout selection, participant selection, and/or participant-to-participant far end camera control of the selected participant for use in a continuous presence multipoint videoconference. The method includes receiving one or more far end camera control messages over a packet network from a first participant of a multipoint videoconference; maintaining an indication of a far end camera control mode for the first participant, the mode being one of a set of modes; and depending on the far end camera control mode and on the controls possible, carrying out a control according to one or more of the camera control messages.
Abstract:
Marking a keyframe of a media stream in a communication system involves one or more entry media switches in communication with one or more endpoints. An entry media switch receives a media stream from an endpoint, where the media stream comprises a sequence of packets with a keyframe. The keyframe is detected and marked with a keyframe indicator. The keyframe indicator is operable to notify a downstream device of the keyframe. An output media stream is outputted.
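A rough sketch of the marking step at an entry media switch is shown below. The `Packet` type, the `keyframe_indicator` field, and the `is_keyframe` predicate are all hypothetical; the abstract does not say how a keyframe is detected or how the indicator is encoded, so detection is left to a caller-supplied function.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Packet:
    seq: int
    payload: bytes
    keyframe_indicator: bool = False  # flag a downstream device can read


def mark_keyframes(packets: Iterable[Packet],
                   is_keyframe: Callable[[Packet], bool]) -> Iterator[Packet]:
    """Yield the output stream, setting the indicator on detected keyframes."""
    for pkt in packets:
        # Packets that are not keyframes pass through unchanged.
        yield replace(pkt, keyframe_indicator=True) if is_keyframe(pkt) else pkt
```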
Abstract:
A method for executing a video conference is provided that includes receiving one or more audio streams associated with a video conference from one or more end points and determining an active speaker associated with one of the end points. Audio information associated with the active speaker may be received at one or more media switches. One or more video streams may be suppressed except for a selected video stream associated with the active speaker, the selected video stream propagating to one or more of the media switches during the video conference. The selected video stream may be replicated such that it may be communicated to one or more of the end points associated with a selected one of the media switches.
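The suppression step above might look like the following sketch. It assumes the active speaker is simply the end point with the highest audio level, which the abstract does not specify; the function names and the dictionary-based stream representation are hypothetical.

```python
from typing import Dict, Optional


def select_active_speaker(audio_levels: Dict[str, float]) -> Optional[str]:
    """Return the end point with the loudest audio (assumed selection rule)."""
    if not audio_levels:
        return None
    return max(audio_levels, key=audio_levels.get)


def streams_to_propagate(video_streams: Dict[str, bytes],
                         active_speaker: Optional[str]) -> Dict[str, bytes]:
    """Suppress every video stream except the active speaker's."""
    return {ep: s for ep, s in video_streams.items() if ep == active_speaker}
```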
Abstract:
A system and method for speech verification using out-of-vocabulary models includes a speech recognizer that has a model bank with system vocabulary word models, a garbage model, and one or more noise models. The model bank may reject an utterance or other sound as an invalid vocabulary word when the model bank identifies the utterance or other sound as corresponding to the garbage model or the noise models. Initial noise models may be selectively combined into a pre-determined number of final noise model clusters to effectively reduce the number of noise models that are utilized by the model bank of the speech recognizer to verify system vocabulary words.
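The rejection rule above can be illustrated with a small sketch. It assumes each model in the bank produces a score where higher means a better match, and that the model names (`"garbage"`, `"noise_1"`) and the function `verify_utterance` are hypothetical; the clustering of initial noise models into final clusters is not shown.

```python
from typing import Dict, Optional, Set


def verify_utterance(scores: Dict[str, float], vocabulary: Set[str]) -> Optional[str]:
    """Return the recognized vocabulary word, or None if the utterance is rejected.

    Assumption: `scores` maps every model in the bank (vocabulary words,
    the garbage model, and the noise models) to a match score.
    """
    best = max(scores, key=scores.get)
    # The best match being the garbage model or a noise model means rejection.
    return best if best in vocabulary else None
```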
Abstract:
A method for implementing a speech verification system for use in a noisy environment comprises the steps of generating a confidence index for an utterance using a speech verifier, and controlling the speech verifier with a processor, wherein the utterance contains frames of sound energy. The speech verifier includes a noise suppressor, a pitch detector, and a confidence determiner. The noise suppressor suppresses noise in each frame in the utterance by summing a frequency spectrum for each frame with frequency spectra of a selected number of previous frames to produce a spectral sum. The pitch detector applies a spectral comb window to each spectral sum to produce correlation values for each frame in the utterance. The pitch detector also applies an alternate spectral comb window to each spectral sum to produce alternate correlation values for each frame in the utterance. The confidence determiner evaluates the correlation values to produce a frame confidence measure for each frame in the utterance. The confidence determiner then uses the frame confidence measures to generate the confidence index for the utterance, which indicates whether the utterance is or is not speech.
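The noise suppressor and pitch detector steps above can be sketched in plain Python. This is an illustration under stated assumptions: the comb window is modeled as a simple 0/1 mask with teeth at multiples of a bin spacing, which the abstract does not define, and the function names are hypothetical. The alternate comb window and the confidence determiner are not shown.

```python
from typing import List, Sequence


def spectral_sums(spectra: Sequence[Sequence[float]], n_prev: int) -> List[List[float]]:
    """Sum each frame's spectrum with the spectra of up to n_prev previous frames."""
    sums = []
    for i in range(len(spectra)):
        window = spectra[max(0, i - n_prev): i + 1]
        # zip(*window) groups the same frequency bin across the frames in the window.
        sums.append([sum(bins) for bins in zip(*window)])
    return sums


def comb_correlation(spectral_sum: Sequence[float], spacing: int) -> float:
    """Correlate a spectral sum with a comb whose teeth sit at multiples of `spacing`.

    Assumption: the comb is a 0/1 mask; `spacing` must be at least 1.
    """
    return sum(spectral_sum[k] for k in range(spacing, len(spectral_sum), spacing))
```

Speech with harmonics at a regular spacing accumulates in the spectral sum and correlates strongly with the matching comb, while uncorrelated noise tends not to line up with any one comb.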
Abstract:
A rate adaptive video conference bridge and related techniques are provided. At a video conference bridge, a source video stream is received from a source endpoint device in a network. The source video stream is encoded using a first encoder unit and second encoder unit to generate respective first and second encoded video streams. A determination is made whether to decrease or increase a bit rate of the source video stream based on network condition information. If the bit rate is to be decreased, the first encoder unit is instructed to send the first encoded video stream to a destination endpoint device. If the bit rate is to be increased, the second encoder unit is instructed to send the second encoded video stream to the destination endpoint device.
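The selection between the two encoded streams might be sketched as below. The use of packet loss as the network condition and the 2% threshold are assumptions made for illustration; the abstract only says the decision is based on network condition information. All names are hypothetical.

```python
from enum import Enum


class RateDecision(Enum):
    DECREASE = "decrease"
    INCREASE = "increase"


def decide_rate(packet_loss: float, loss_threshold: float = 0.02) -> RateDecision:
    """Assumed rule: lower the bit rate when packet loss exceeds the threshold."""
    if packet_loss > loss_threshold:
        return RateDecision.DECREASE
    return RateDecision.INCREASE


def select_encoded_stream(decision: RateDecision,
                          first_stream: bytes, second_stream: bytes) -> bytes:
    """Send the first encoder's stream on a decrease, the second's on an increase."""
    return first_stream if decision is RateDecision.DECREASE else second_stream
```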
Abstract:
In one embodiment, an apparatus includes a first module that causes a first endpoint to receive a current speaker's video stream if the first endpoint is not the current speaker and to receive a last speaker's video stream if the first endpoint is the current speaker. The apparatus includes a second module that causes a second endpoint to receive a continuous presence, current speaker video stream if the second endpoint is not the current speaker and to receive a continuous presence, last speaker video stream if the second endpoint is the current speaker. The continuous presence, current speaker video stream comprises two or more video streams, one of which includes at least a portion of the current speaker's video stream. The continuous presence, last speaker video stream comprises two or more video streams, one of which includes at least a portion of a last speaker's video stream.
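The rule applied by the first module can be sketched in a few lines. The function name and the string identifiers for endpoints are hypothetical; the continuous presence composition performed by the second module follows the same rule but is not shown.

```python
def speaker_stream_for(endpoint: str, current_speaker: str, last_speaker: str) -> str:
    """Return whose video stream `endpoint` should receive.

    The current speaker sees the last speaker; everyone else sees the current speaker.
    """
    return last_speaker if endpoint == current_speaker else current_speaker
```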
Abstract:
In one embodiment, a method includes receiving application traffic at a network device from one or more endpoints, measuring performance of applications at the network device, optimizing TCP (Transmission Control Protocol) applications and UDP (User Datagram Protocol) applications based on the measured performance and policy input received at the network device, queuing the application traffic at the network device such that the application traffic shares available bandwidth in accordance with the measured performance and the policy input, and transmitting the application traffic over a wide area network. An apparatus is also disclosed.
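One way to picture the bandwidth sharing step is a proportional split by weight. This is only an assumed scheme: the abstract does not say how measured performance and policy input translate into shares, so the per-application weights and the function `share_bandwidth` are hypothetical.

```python
from typing import Dict


def share_bandwidth(total_kbps: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split the available bandwidth among applications in proportion to their weights.

    Assumption: each weight already combines measured performance and policy input.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {app: total_kbps * w / total_weight for app, w in weights.items()}
```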