Abstract:
According to an embodiment of the present invention, an apparatus for performing video conferencing is provided that includes an I-frame injector element operable to intercept I-frame requests from one or more end points and to attempt to service the I-frame requests such that at least a portion of the requests are prevented from propagating back to an originating sender. In more specific embodiments, when a receiver endpoint sends a fast video update (FVU) request upstream, the I-frame injector element intercepts it and, rather than passing the FVU request to the sender, replaces the next P-frame from the sender with an I-frame. The I-frame is constructed so that, when decoded, it matches the P-frame that it replaced. In still more detailed embodiments, the I-frame injector element operates in one of three modes that are associated with bandwidth parameters.
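The interception step can be illustrated with a toy model. The following is a minimal sketch, not the patented implementation: it assumes a hypothetical codec in which an I-frame carries a full picture and a P-frame carries per-sample differences from the previous picture. The names `Frame`, `decode`, and `IFrameInjector` are illustrative, and the three bandwidth-related modes are not modeled because the abstract does not describe them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    kind: str           # "I" = full picture, "P" = difference from previous picture
    payload: List[int]  # toy picture data (assumption: a flat list of samples)


def decode(prev: Optional[List[int]], frame: Frame) -> List[int]:
    """Toy decoder: an I-frame replaces the picture, a P-frame adds deltas to it."""
    if frame.kind == "I":
        return list(frame.payload)
    return [p + d for p, d in zip(prev, frame.payload)]


class IFrameInjector:
    """Sits between sender and receivers and tracks the sender's decoded picture."""

    def __init__(self) -> None:
        self.picture: Optional[List[int]] = None
        self.fvu_pending = False

    def on_fvu_request(self) -> bool:
        """Intercept an FVU request; return True only if it must go to the sender."""
        if self.picture is None:
            return True  # nothing decoded yet, so the request cannot be served locally
        self.fvu_pending = True
        return False

    def on_frame(self, frame: Frame) -> Frame:
        """Forward a frame, replacing the next P-frame with an I-frame if requested."""
        if frame.kind == "P" and self.picture is None:
            return frame  # cannot decode without a reference picture; pass through
        self.picture = decode(self.picture, frame)
        if frame.kind == "I":
            self.fvu_pending = False  # a real I-frame already satisfies the request
            return frame
        if self.fvu_pending:
            self.fvu_pending = False
            # The injected I-frame decodes to the same picture as the replaced P-frame.
            return Frame("I", list(self.picture))
        return frame
```

In this sketch the injected frame is returned in place of the P-frame for all receivers; a real element would presumably send it only to the receiver that issued the request.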
Abstract:
A method for executing a video conference is provided that includes receiving one or more audio streams associated with a video conference from one or more end points and determining an active speaker associated with one of the end points. Audio information associated with the active speaker may be received at one or more media switches. One or more video streams may be suppressed except for a selected video stream associated with the active speaker, the selected video stream propagating to one or more of the media switches during the video conference. The selected video stream may be replicated such that it may be communicated to one or more of the end points associated with a selected one of the media switches.
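The selection and suppression steps might look as follows. This is a minimal sketch under stated assumptions: audio levels are given per endpoint in decibels, the active speaker is simply the loudest endpoint, and a switching margin (not mentioned in the abstract) is added to avoid rapid toggling. All function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def pick_active_speaker(audio_levels: Dict[str, float],
                        current: Optional[str] = None,
                        margin_db: float = 3.0) -> Optional[str]:
    """Return the endpoint treated as active speaker.

    The current speaker is kept unless another endpoint is louder by at
    least `margin_db` (an assumed hysteresis, not part of the abstract).
    """
    if not audio_levels:
        return current
    loudest = max(audio_levels, key=audio_levels.get)
    if current in audio_levels and audio_levels[loudest] - audio_levels[current] < margin_db:
        return current
    return loudest


def suppress_inactive(video_streams: Dict[str, bytes],
                      active: Optional[str]) -> Dict[str, bytes]:
    """Drop every video stream except the active speaker's."""
    return {ep: s for ep, s in video_streams.items() if ep == active}


def replicate(stream: bytes, local_endpoints: Iterable[str], source: str) -> Dict[str, bytes]:
    """Copy the selected stream to each endpoint on this media switch except its source."""
    return {ep: stream for ep in local_endpoints if ep != source}
```

For example, with levels `{"a": -30.0, "b": -10.0, "c": -25.0}` and no current speaker, endpoint `b` is selected, only its video is kept, and a media switch serving `a`, `b`, and `c` replicates it to `a` and `c`.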
Abstract:
An audio mixer on a first device receives one or more incoming audio streams. Each of the one or more incoming audio streams has an associated timestamp. The audio mixer generates a mixed audio stream from the one or more incoming audio streams. The audio mixer determines differences in the time base of each of the one or more incoming audio streams and the time base for the mixed audio stream. The audio mixer generates mapping parameters associated with the determined differences and transforms the timestamp of each of the one or more incoming audio streams to a corresponding output timestamp associated with the mixed audio stream according to the mapping parameters. The mapping parameters are provided to a video mixer for similar processing and transformation such that the mixed audio stream is in synchronization with a mixed video stream.
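One way to picture the mapping parameters is as a linear transform between two clocks. The sketch below assumes that each time base is described by a clock rate plus one reference timestamp, and that the input and output reference timestamps denote the same instant. The class name `TimeMapping` and its fields are hypothetical, and timestamp wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeMapping:
    """Assumed mapping parameters between an input and an output time base."""
    in_rate: int   # ticks per second of the incoming stream
    out_rate: int  # ticks per second of the mixed stream
    in_ref: int    # incoming timestamp at the shared reference instant
    out_ref: int   # mixed-stream timestamp at the same instant

    def to_output(self, in_ts: int) -> int:
        """Transform an incoming timestamp into the mixed stream's time base."""
        elapsed = in_ts - self.in_ref
        return self.out_ref + round(elapsed * self.out_rate / self.in_rate)


# Audio arriving at 8 kHz is mixed into a 48 kHz output stream.
audio = TimeMapping(in_rate=8000, out_rate=48000, in_ref=16000, out_ref=1000)
# A video mixer using the same reference instant (assumed 90 kHz clocks).
video = TimeMapping(in_rate=90000, out_rate=90000, in_ref=500000, out_ref=3000)

audio_out = audio.to_output(16160)   # 20 ms after the reference instant
video_out = video.to_output(501800)  # also 20 ms after the reference instant
```

Here both outputs land 20 ms after their respective output reference timestamps, which is the property that keeps the mixed audio and mixed video aligned.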
Abstract:
In one embodiment, a method can include: (i) establishing an internet protocol (IP) connection; (ii) forming a buffered version of a plurality of voice frame slices from received audio packets; and (iii) when an erasure is detected, performing a packet loss concealment (PLC) to provide a synthesized speech signal for the erasure, where the PLC can include: (a) identifying first and second pitches from the buffered version of the plurality of voice frame slices; and (b) forming the synthesized speech signal by using the first and second pitches, and more if needed, followed by an overlay-add (OLA).
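The concealment step can be sketched as follows. This is an illustration only, under several assumptions not stated in the abstract: "pitches" is read as the last two pitch periods of buffered speech, the pitch period is estimated by normalized autocorrelation, and the final OLA step is a linear cross-fade. All function names are hypothetical.

```python
import math
from typing import List


def estimate_pitch_period(history: List[float], min_lag: int, max_lag: int) -> int:
    """Pick the lag with the highest normalized autocorrelation (assumed method)."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(history) - lag
        a, b = history[:n], history[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score + 1e-9:  # prefer the shortest lag on near-ties
            best_lag, best_score = lag, score
    return best_lag


def synthesize(history: List[float], period: int, length: int) -> List[float]:
    """Fill `length` samples by cycling through the last two pitch periods."""
    first = history[-2 * period:-period]  # older pitch period
    second = history[-period:]            # most recent pitch period
    template = first + second
    return [template[i % len(template)] for i in range(length)]


def overlap_add(fade_out: List[float], fade_in: List[float]) -> List[float]:
    """Cross-fade two equal-length segments with linear weights."""
    n = len(fade_out)
    out = []
    for i, (a, b) in enumerate(zip(fade_out, fade_in)):
        w = (i + 1) / (n + 1)
        out.append((1.0 - w) * a + w * b)
    return out
```

A caller would typically synthesize one frame per erasure from the buffered history, then use `overlap_add` to blend the end of the synthesized signal into the start of the next correctly received frame.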
Abstract:
Disclosed are video conferencing systems, devices, architectures, and methods for using media notifications to coordinate switching between video in a distributed arrangement. An exemplary media switch in accordance with embodiments can include: a first interface configured for a first type communication with an endpoint; a second interface configured for the first type communication with another media switch, the second interface being configured to receive a first video stream having a first characteristic and a second video stream having a second characteristic; a third interface configured for a second type communication with a stream controller, the stream controller being configured to provide a notification; and a fourth interface configured for the second type communication with a controlling server, whereby the media switch is configured to re-target an active stream in response to the notification or a difference between the first and second characteristics.
Abstract:
A method for implementing a noise suppressor in a speech recognition system comprises a filter bank for separating source speech data into discrete frequency sub-bands to generate filtered channel energy, and a noise suppressor for weighting the frequency sub-bands to improve the signal-to-noise ratio of the resultant noise-suppressed channel energy. The noise suppressor preferably includes a noise calculator for calculating background noise values, a speech energy calculator for calculating speech energy values for each channel of the filter bank, and a weighting module for applying calculated weighting values to the projected channel energy to generate the noise-suppressed channel energy.
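The three modules can be sketched per filter-bank channel. The abstract does not give the weighting formula, so the following assumes a Wiener-style weight, speech / (speech + noise), with background noise estimated by averaging frames that are known to contain no speech. The function names are hypothetical.

```python
from typing import List, Sequence


def estimate_noise(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Noise calculator: average each channel over noise-only frames (assumption)."""
    count = len(noise_frames)
    return [sum(channel) / count for channel in zip(*noise_frames)]


def speech_energy(frame: Sequence[float], noise: Sequence[float]) -> List[float]:
    """Speech energy calculator: channel energy above the noise floor, never negative."""
    return [max(e - n, 0.0) for e, n in zip(frame, noise)]


def suppress(frame: Sequence[float], noise: Sequence[float]) -> List[float]:
    """Weighting module: scale each channel by an assumed Wiener-style weight."""
    out = []
    for e, n, s in zip(frame, noise, speech_energy(frame, noise)):
        weight = s / (s + n) if (s + n) > 0 else 0.0
        out.append(weight * e)
    return out
```

With a noise estimate of `[1.0, 2.0]`, a frame of `[5.0, 2.0]` keeps most of the first channel (weight 0.8) and zeroes the second, which sits at the noise floor.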
Abstract:
A method for effectively suppressing background noise in a speech detection system comprises a filter bank for separating source speech data into discrete frequency sub-bands to generate filtered channel energy, and a noise suppressor for weighting the frequency sub-bands to improve the signal-to-noise ratio of the resultant noise-suppressed channel energy. The noise suppressor preferably includes a subspace module for using a Karhunen-Loeve transformation to create a subspace based on the background noise, a projection module for generating projected channel energy by projecting the filtered channel energy onto the created subspace, and a weighting module for applying calculated weighting values to the projected channel energy to generate the noise-suppressed channel energy.
Abstract:
A method for implementing a speech recognition system for use during conditions with background noise includes the steps of calculating, in real-time, sequential short-term delta energy parameters for speech energy from a spoken utterance, determining threshold values in the speech energy, and identifying a beginning point and an ending point for the spoken utterance based on the relationship between the threshold values and the short-term delta energy parameters.
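The endpoint detection described above can be sketched frame by frame. The abstract does not define the delta energy parameter or the threshold logic, so this sketch assumes the delta is the absolute change in frame energy between consecutive frames, and that the utterance ends once the delta stays at or below the ending threshold for `hang` consecutive frames. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def delta_energies(energies: Sequence[float]) -> List[float]:
    """Short-term delta energy: absolute change from the previous frame (assumption)."""
    return [0.0] + [abs(energies[i] - energies[i - 1]) for i in range(1, len(energies))]


def find_endpoints(energies: Sequence[float],
                   start_thresh: float,
                   end_thresh: float,
                   hang: int = 3) -> Optional[Tuple[int, int]]:
    """Return (begin_frame, end_frame) of the utterance, or None if none starts."""
    begin = None
    last_active = None
    quiet = 0
    for i, d in enumerate(delta_energies(energies)):
        if begin is None:
            if d > start_thresh:
                begin = last_active = i  # first frame whose delta crosses the threshold
            continue
        if d > end_thresh:
            last_active = i
            quiet = 0
        else:
            quiet += 1
            if quiet >= hang:
                return begin, last_active  # enough quiet frames: utterance has ended
    return (begin, last_active) if begin is not None else None
```

Because the loop only looks at past and current frames, the same logic could run incrementally on a live stream, consistent with the real-time calculation the abstract mentions.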