Abstract:
Embodiments provide a method and system of text-independent speaker recognition with a complexity comparable to that of a text-dependent version. The scheme exploits the fact that speech is a quasi-stationary signal and uses this property to simplify the recognition process. The modeling allows the speaker profile to be updated progressively with new speech samples acquired during use.
Abstract:
Methods and systems of text-independent speaker recognition provide a complexity comparable to that of a text-dependent speaker recognition system. These methods and systems exploit the fact that speech is a quasi-stationary signal and use this property to simplify the recognition process. The speaker modeling allows a speaker profile to be updated progressively with new speech samples acquired over time as the speaker uses the system.
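The two speaker recognition abstracts describe the approach only at a high level. The following is a minimal sketch, under stated assumptions, of the two ideas they name: splitting speech into short frames that are each treated as stationary, and updating a speaker profile progressively as new samples arrive. The two-dimensional feature (log energy and zero-crossing rate), the 160-sample frames, and the `SpeakerProfile` class are hypothetical choices for illustration, not the patented modeling.

```python
import math
from dataclasses import dataclass, field

def frames(signal, frame_len, hop):
    """Split a signal into short overlapping frames, each treated as quasi-stationary."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def frame_features(frame):
    """Hypothetical 2-D feature per frame: log energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)
    return (math.log(energy + 1e-12), zcr)

@dataclass
class SpeakerProfile:
    """Running mean of frame features; `count` lets new speech be folded in later."""
    mean: list = field(default_factory=lambda: [0.0, 0.0])
    count: int = 0

    def update(self, signal, frame_len=160, hop=80):
        for feat in map(frame_features, frames(signal, frame_len, hop)):
            self.count += 1
            # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
            self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, feat)]

    def distance(self, signal, frame_len=160, hop=80):
        """Euclidean distance between a test utterance's mean feature and the profile."""
        feats = [frame_features(f) for f in frames(signal, frame_len, hop)]
        avg = [sum(col) / len(feats) for col in zip(*feats)]
        return math.dist(avg, self.mean)
```

At an 8 kHz sampling rate, 160 samples correspond to 20 ms, a common window over which speech is treated as stationary. Because the running mean needs only the current mean and a count, earlier audio never has to be stored to refine the profile.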
Abstract:
Embodiments of the present disclosure are directed to techniques for adjusting the amplitude of a digital audio signal in the frequency domain to hold the perceived loudness of the audio signal at a desired level. In one embodiment, a method first brings the audio signal to the desired loudness level by applying an adaptive wideband gain and then applies multi-band compression to further reduce the dynamic range of the audio signal; noise analysis and temporal masking operations are also performed to provide a pleasant sound for one or more listeners.
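The ordering in the loudness abstract (wideband gain first, then multi-band compression) can be illustrated on per-band power values from one frequency-domain frame. This is a minimal sketch under assumptions: the "adaptive" gain is modeled as one-pole smoothing across frames, the compressor uses a simple static threshold/ratio curve, and `process_frame` and its parameters are hypothetical. The noise analysis and temporal masking steps are omitted.

```python
import math

def power_db(p):
    """Power to decibels, floored to avoid log(0)."""
    return 10.0 * math.log10(max(p, 1e-12))

def wideband_gain_db(band_powers, target_db, prev_gain_db, alpha=0.9):
    """Gain that would bring total frame power to target_db, smoothed across
    frames with a one-pole filter (assumed form of the adaptation)."""
    desired = target_db - power_db(sum(band_powers))
    return alpha * prev_gain_db + (1.0 - alpha) * desired

def compressor_gain_db(level_db, threshold_db, ratio):
    """Static compression curve: above the threshold the output level rises
    only 1/ratio dB per input dB. Returns the gain to apply, in dB."""
    if level_db <= threshold_db:
        return 0.0
    return threshold_db + (level_db - threshold_db) / ratio - level_db

def process_frame(band_powers, target_db, prev_gain_db, threshold_db=-20.0, ratio=4.0):
    """Apply the wideband gain, then per-band compression.
    Returns the new band powers and the wideband gain state for the next frame."""
    g = wideband_gain_db(band_powers, target_db, prev_gain_db)
    out = []
    for p in band_powers:
        level = power_db(p) + g
        total_db = g + compressor_gain_db(level, threshold_db, ratio)
        out.append(p * 10.0 ** (total_db / 10.0))
    return out, g
```

Feeding the returned gain back in as `prev_gain_db` on the next frame makes the wideband gain track the target gradually rather than jumping, which avoids audible pumping on short transients.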
Abstract:
In an embodiment, an apparatus includes a determiner, converter, adapter, and modifier. The determiner is configured to generate a representation of a difference between a first frequency at which a first signal is sampled and a second frequency at which a second signal is sampled, and the converter is configured to generate a second sample of the first signal at a second time in response to the representation and a first sample of the first signal at a first time. The adapter is configured to generate a sample of a modifier signal in response to the second sample of the first signal, and the modifier is configured to generate a modified sample of the second signal in response to a sample of the second signal and the sample of the modifier signal. For example, such an apparatus may be able to reduce the magnitude of an echo signal in a system having an audio pickup (e.g., a microphone) near an audio output (e.g., a speaker).
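The echo-reduction example in the preceding abstract can be sketched with two textbook pieces: a converter that resamples the far-end (speaker) signal onto the microphone's sampling grid, and an adaptive filter whose echo estimate is subtracted from the microphone signal. This is an illustration only: it assumes the determiner's "representation" is simply the ratio `step = f_first / f_second`, uses linear interpolation for the converter, and uses NLMS as the adapter; none of these choices is confirmed by the abstract, and `resample_linear` and `nlms_cancel` are hypothetical names.

```python
def resample_linear(x, step):
    """Converter: output sample k is x linearly interpolated at input
    position k * step, where step = f_first / f_second (assumed representation)."""
    out = []
    k = 0
    while True:
        t = k * step
        i = int(t)
        if i + 1 >= len(x):
            break
        frac = t - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
        k += 1
    return out

def nlms_cancel(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Adapter + modifier: an NLMS filter predicts the echo from the
    (resampled) far-end signal; the prediction is subtracted from the mic."""
    w = [0.0] * taps
    buf = [0.0] * taps
    out = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]                        # newest far-end sample first
        y = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate (modifier signal)
        e = d - y                                   # modified, echo-reduced sample
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out
```

In a real system the two signals would first be aligned with `resample_linear(far, step)` so that the adaptive filter sees both at the same rate; without that step, even a small clock mismatch makes the echo path appear to drift and the filter cannot converge.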
Abstract:
Embodiments reduce the complexity of speaker-dependent speech recognition systems and methods by representing the code phrase (i.e., the word or words to be recognized) with a single Gaussian Mixture Model (GMM) adapted from a Universal Background Model (UBM). Only the parameters of the GMM need to be stored. Computation is further reduced by checking only the GMM component that is relevant to the keyword template. In this scheme, the keyword template is represented by a sequence of indices, each identifying the best-performing component of the keyword's GMM. Only one template is saved, obtained by combining the registration templates with the Longest Common Subsequence (LCS) algorithm. The word model is continuously updated by performing expectation-maximization iterations on test utterances that are accepted as the keyword.
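The template step in the keyword abstract above can be illustrated with the classic dynamic-programming LCS. In this sketch, each registration utterance is turned into a sequence of best-component indices and the sequences are folded together pairwise with LCS. Two points are assumptions rather than details from the abstract: consecutive repeated indices are collapsed, and templates are combined left to right. The function names are hypothetical.

```python
def best_component_sequence(frame_scores):
    """frame_scores[t][k]: log-likelihood of frame t under GMM component k.
    Returns the best component index per frame, collapsing consecutive
    repeats (the collapsing is an assumption)."""
    seq = []
    for scores in frame_scores:
        k = max(range(len(scores)), key=scores.__getitem__)
        if not seq or seq[-1] != k:
            seq.append(k)
    return seq

def lcs(a, b):
    """Longest common subsequence of two index sequences (dynamic programming)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCS length of a[i:], b[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:  # walk the table to recover one LCS
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def combine_templates(templates):
    """Fold several registration templates into one by repeated LCS."""
    template = templates[0]
    for t in templates[1:]:
        template = lcs(template, t)
    return template
```

Keeping only the subsequence shared by all registrations discards indices that appeared in a single noisy utterance, so the stored template is both shorter and more consistent.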
Abstract:
A method of estimating a steering vector of a sensor array of M sensors according to one embodiment of the present disclosure includes estimating a steering vector of a noise source located at an angle of θ degrees from the look direction of the array using a least-squares estimate of the gains of the sensors in the array, defining a steering vector of a desired sound source in the look direction of the array, and estimating the steering vector by performing element-by-element multiplication of the estimated noise steering vector and the complex conjugate of the steering vector of the desired sound source. The sensors may be microphones.
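The three steps of the steering-vector abstract can be sketched at a single frequency bin. The sketch assumes complex STFT snapshots recorded while only the noise source at angle θ is active, takes sensor 0 as the gain reference, and uses a far-field uniform linear array to define the desired-source steering vector; the array geometry, the reference choice, and all function names are hypothetical.

```python
import cmath
import math

def ls_gains(snapshots):
    """snapshots[t][m]: complex value of sensor m in frame t (noise only).
    Least-squares gain of sensor m relative to sensor 0:
        h_m = sum_t x_m(t) * conj(x_0(t)) / sum_t |x_0(t)|^2"""
    ref_power = sum(abs(s[0]) ** 2 for s in snapshots)
    num_sensors = len(snapshots[0])
    return [sum(s[m] * s[0].conjugate() for s in snapshots) / ref_power
            for m in range(num_sensors)]

def ula_steering(num_sensors, spacing, freq, angle_rad, c=343.0):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    return [cmath.exp(-2j * math.pi * freq * m * spacing * math.sin(angle_rad) / c)
            for m in range(num_sensors)]

def estimate_steering(noise_snapshots, desired):
    """Element-wise product of the estimated noise steering vector and the
    complex conjugate of the desired-source steering vector."""
    h = ls_gains(noise_snapshots)
    return [hm * dm.conjugate() for hm, dm in zip(h, desired)]
```

For a broadside look direction (angle 0) the desired steering vector of a uniform linear array is all ones, so the product reduces to the estimated noise gains; for other look directions the conjugate removes the desired-source phase from each element.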
Abstract:
Embodiments reduce the complexity of speaker-dependent speech recognition systems and methods by representing the code word (i.e., the word to be recognized) with a single Gaussian Mixture Model (GMM) adapted from a Universal Background Model (UBM). Only the parameters of the GMM need to be stored. Computation is further reduced by checking only the GMM component that is relevant to the keyword template. In this scheme, the keyword template is represented by a sequence of indices, each identifying the best-performing component of the keyword's GMM. Only one template is saved, obtained by combining the registration templates with the Longest Common Subsequence (LCS) algorithm. The word model is continuously updated by performing expectation-maximization iterations on test words that are accepted as the keyword.
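The abstract above says the keyword GMM is adapted from a UBM but does not specify how. One widely used option is mean-only MAP adaptation with a relevance factor, sketched here for one-dimensional features. Treat this as an assumed stand-in, not the disclosed method; `map_adapt_means` and the relevance value of 16 are illustrative. Calling it again with features from newly accepted utterances gives one simple way to keep updating the model.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def map_adapt_means(weights, means, variances, data, relevance=16.0):
    """One mean-only MAP adaptation step of a 1-D UBM toward `data`.
    Weights and variances are kept from the UBM (assumed)."""
    K = len(weights)
    n = [0.0] * K   # soft counts per component
    sx = [0.0] * K  # responsibility-weighted sums of x
    for x in data:
        logs = [math.log(w) + gauss_logpdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        probs = [math.exp(l - top) for l in logs]  # stabilized exponentials
        z = sum(probs)
        for k in range(K):
            r = probs[k] / z  # responsibility of component k for x
            n[k] += r
            sx[k] += r * x
    new_means = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # more data, more trust in new statistics
        ex = sx[k] / n[k] if n[k] > 0 else means[k]
        new_means.append(alpha * ex + (1 - alpha) * means[k])
    return new_means
```

Components that see little of the keyword data keep means close to the UBM, which is why adaptation works from a few registration utterances where training a GMM from scratch would not.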
Abstract:
A method of estimating a steering vector of a sensor array of M sensors according to one embodiment of the present disclosure includes estimating a steering vector of a noise source located at an angle of θ degrees from the look direction of the array using a least-squares estimate of the gains of the sensors in the array, defining a steering vector of a desired sound source in the look direction of the array, and estimating the steering vector by performing element-by-element multiplication of the estimated noise steering vector and the complex conjugate of the steering vector of the desired sound source. The sensors may be microphones.
Abstract:
The present invention is a system and method for digital watermarking. The disclosed system adds a watermark to an audio signal generated by a signal source and comprises: a spectrum modulator configured to perform spectrum modulation on a watermark bit and a pseudo-noise signal to be embedded into the audio signal, generating a modulated signal; a distortion controller coupled to the signal source and the spectrum modulator and configured to shape the modulated signal based on the audio signal so as to generate a shaped signal satisfying a predetermined distortion constraint; and an interference compensator coupled to the signal source and the distortion controller and configured to generate a compensation signal based on the audio signal, the pseudo-noise signal, and the shaped signal, wherein the compensation signal compensates for interference to watermark decoding caused by the audio signal.
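The interaction between the three components in the watermarking abstract can be illustrated with a time-domain spread-spectrum sketch. It rests on several assumptions. The modulation is ±1 times a ±1 pseudo-noise sequence. "Distortion control" is reduced to tying the watermark amplitude to the audio's RMS level. Interference compensation subtracts the audio's projection onto the pseudo-noise sequence, in the style of improved spread spectrum. The names `embed_bit` and `decode_bit` and the default parameters are hypothetical.

```python
import math

def embed_bit(audio, bit, pn, distortion_ratio=0.05, compensation=1.0):
    """audio, pn: equal-length lists; pn entries are +1/-1; bit is 0 or 1."""
    b = 1.0 if bit else -1.0                   # spectrum modulation: b * pn
    rms = math.sqrt(sum(a * a for a in audio) / len(audio))
    strength = distortion_ratio * rms          # assumed distortion constraint
    pn_energy = sum(p * p for p in pn)
    # Host-audio interference: how strongly the audio itself correlates with pn.
    leak = sum(a * p for a, p in zip(audio, pn)) / pn_energy
    # Shaped watermark plus a compensation term that cancels the leak.
    return [a + (strength * b - compensation * leak) * p for a, p in zip(audio, pn)]

def decode_bit(signal, pn):
    """Correlation detector: the sign of <signal, pn> gives the bit."""
    return 1 if sum(s * p for s, p in zip(signal, pn)) > 0 else 0
```

With `compensation=1.0` the decoder statistic depends only on the embedded bit, so host audio that happens to correlate with the pseudo-noise sequence can no longer flip the decision; the cost is the extra distortion introduced by the compensation term, which a real distortion controller would have to account for.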