Abstract:
Embodiments of the present invention provide a method, device, and system for classifying voice conference minutes. The method includes: performing voice source localization on audio data of a conference site to acquire the location of the voice source corresponding to the audio data; writing the location of the voice source into additional field information of the audio data; writing a voice activation flag into the additional field information; packaging the audio data as an audio code stream; and sending the audio code stream and its additional field information to a recording server, so that the recording server classifies the audio data according to the additional field information and writes the participant identity that corresponds to the location of the voice source into the additional field information of the audio code stream.
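The flow described in this abstract can be illustrated with a minimal sketch. The abstract does not define the layout of the additional field information or how a location maps to a participant, so everything named here is an assumption made for illustration: the location is represented as an azimuth angle in degrees, and the hypothetical `seat_map` lists which angular range belongs to which participant.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdditionalFields:
    # Assumed representation: the voice source location as an azimuth angle.
    source_azimuth_deg: float
    # Voice activation flag written by the conference terminal.
    voice_active: bool
    # Left empty by the terminal; filled in by the recording server.
    participant_id: Optional[str] = None


@dataclass
class AudioPacket:
    payload: bytes            # encoded audio (the audio code stream)
    extra: AdditionalFields   # additional field information


def package(payload, azimuth_deg, voice_active):
    """Terminal side: attach location and activation flag to the audio."""
    return AudioPacket(payload, AdditionalFields(azimuth_deg, voice_active))


def classify(packets, seat_map):
    """Recording-server side: group active audio by participant.

    seat_map is a hypothetical list of (low_deg, high_deg, participant_id)
    tuples; a packet belongs to the first range containing its azimuth.
    """
    minutes = defaultdict(list)
    for packet in packets:
        if not packet.extra.voice_active:
            continue  # skip packets flagged as containing no speech
        for low, high, participant in seat_map:
            if low <= packet.extra.source_azimuth_deg < high:
                packet.extra.participant_id = participant
                minutes[participant].append(packet.payload)
                break
    return dict(minutes)
```

In this sketch the terminal would call `package` once per encoded frame, and the recording server would call `classify` on the received packets to obtain per-participant minutes.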
Abstract:
The present invention discloses an echo cancellation method. The method includes: dividing an audio signal into a high-band audio signal and a low-band audio signal; performing adaptive filtering on the low-band audio signal, and performing synthesis filtering on the adaptively filtered low-band signal and the high-band audio signal to generate a preliminary echo cancellation signal; performing envelope prediction echo suppression on the high-band signal in the preliminary echo cancellation signal, and calculating and outputting a residual echo suppression coefficient; performing echo suppression on the low-band signal in the preliminary echo cancellation signal, and outputting a processing result; and multiplying the processing result by the residual echo suppression coefficient, and outputting an echo-cancelled signal.
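The first two steps of this method (band splitting, adaptive filtering of the low band, and synthesis) can be sketched as follows. The abstract names neither the filter bank nor the adaptation algorithm, so this sketch substitutes a two-band Haar split and a normalized LMS (NLMS) filter as stand-ins. The envelope prediction suppression and the residual echo suppression coefficient are not covered, and all function names are hypothetical.

```python
def split_bands(x):
    """Two-band Haar analysis: return (low, high), each half as long as x."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high


def merge_bands(low, high):
    """Haar synthesis: exact inverse of split_bands for even-length input."""
    out = []
    for l, h in zip(low, high):
        out.extend((l + h, l - h))
    return out


def nlms(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Normalized LMS: return mic minus the estimated echo of reference."""
    weights = [0.0] * taps
    history = [0.0] * taps   # most recent reference samples, newest first
    error = []
    for r, d in zip(reference, mic):
        history = [r] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        error.append(e)
    return error


def preliminary_echo_cancel(far_end, mic):
    """Adaptive filtering on the low band only, then synthesis of both bands."""
    far_low, _ = split_bands(far_end)
    mic_low, mic_high = split_bands(mic)
    cleaned_low = nlms(far_low, mic_low)
    return merge_bands(cleaned_low, mic_high)
```

Running the adaptive filter only on the low band halves the sample rate it operates at, which is one plausible motivation for the split. In this sketch the high band passes through unchanged, which is why the method then applies a separate suppression stage to it.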
Abstract:
The present invention discloses an image controlling method, device, and system for a composed-image video conference, where the method includes: receiving audio data of multiple sites; obtaining in real time, according to the audio data of each site, a voice feature value of that site within a first specified period, where the voice feature value represents an activation state of the site; selecting a specified site from the multiple sites according to the activation state of each site; and filling a picture of the specified site into a sub-image of a composed image, to update the composed image in real time. This improves the effectiveness of the conference and the experience of participants. In addition, the quantity and locations of sub-images in the composed image may be adjusted dynamically, which further improves the effectiveness of the conference.
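The selection step can be sketched as below. The abstract does not say how the voice feature value is computed, so this sketch assumes mean signal energy over the window as the feature and a hypothetical `min_energy` threshold for treating a site as active. The function names are likewise illustrative.

```python
def mean_energy(samples):
    """Assumed voice feature value: mean squared amplitude over the window."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_sites(window_audio, num_sub_images, min_energy=1e-4):
    """Pick the most active sites for the available sub-images.

    window_audio maps a site id to the samples received from that site
    during the specified period. Sites below min_energy count as inactive.
    """
    features = {site: mean_energy(s) for site, s in window_audio.items()}
    active = [site for site, f in features.items() if f >= min_energy]
    active.sort(key=lambda site: features[site], reverse=True)
    return active[:num_sub_images]


def compose(selected, num_sub_images):
    """Fill sub-image slots in order; unused slots are left as None."""
    return selected + [None] * (num_sub_images - len(selected))
```

Calling `select_sites` once per period and passing the result to `compose` would give the real-time update described above. Changing `num_sub_images` between calls corresponds to adjusting the quantity of sub-images dynamically.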