Abstract:
A method and apparatus are disclosed for robust, text-independent (and text-dependent) speaker recognition in which identification of a speaker is based on selected spectral information from the speaker's voice. Traditionally, speaker recognition systems (i) render a speech sample in the frequency domain to produce a spectrum, (ii) produce cepstrum coefficients from the spectrum, (iii) produce a codebook from the cepstrum coefficients, and (iv) use the codebook as the feature measure for comparing training speech samples with testing speech samples. The present invention, on the other hand, introduces the important and previously unknown step of truncating the spectrum prior to producing the cepstrum coefficients. Through the use of selected spectra as the feature measure for speaker recognition, the present invention has been shown to yield significant improvements in performance over prior art systems.
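The truncation step described above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the choice of a contiguous band of bins (`lo_bin` to `hi_bin`), and the use of a DCT-II on the log-magnitude spectrum to obtain cepstrum-like coefficients are all assumptions made for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def truncated_cepstrum(frame, lo_bin, hi_bin, n_coeffs):
    """Cepstrum-like coefficients computed from spectrum bins lo_bin..hi_bin-1 only.

    Hypothetical helper: the band limits and the DCT-II are illustrative choices.
    """
    spectrum = dft(frame)
    half = len(frame) // 2 + 1
    # Small constant avoids log(0) for empty bins.
    log_mag = [math.log(abs(spectrum[k]) + 1e-12) for k in range(half)]
    selected = log_mag[lo_bin:hi_bin]  # truncation happens before the cepstrum
    m = len(selected)
    return [sum(selected[j] * math.cos(math.pi * q * (j + 0.5) / m)
                for j in range(m))
            for q in range(n_coeffs)]

# Example: a 32-sample frame, keeping bins 2..9 and four coefficients.
frame = [math.sin(2 * math.pi * 5 * t / 32) for t in range(32)]
coeffs = truncated_cepstrum(frame, 2, 10, 4)
```

In a full system the resulting coefficient vectors would feed the codebook stage; that stage is outside the scope of this sketch.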
Abstract:
A neural network is trained to transform distant-talking cepstrum coefficients, derived from a microphone array receiving speech from a speaker distant therefrom, into a form substantially similar to close-talking cepstrum coefficients that would be derived from a microphone close to the speaker, for providing robust hands-free speech and speaker recognition in adverse practical environments with existing speech and speaker recognition systems which have been trained on close-talking speech.
Abstract:
Simultaneous and temporal masking of digital speech data is applied to an MBE-based speech coding technique to achieve additional, substantial compression of coded speech over existing coding techniques, while enabling synthesis of coded speech with minimal perceptual degradation relative to the human auditory system. A real-time perceptual coder and decoder is disclosed in which speech may be sampled at 10 kHz, coded at an average rate of less than 2 bits/sample, and reproduced in a manner that is perceptually transparent to a human listener. The coder compresses speech segments that are inaudible due to simultaneous or temporal masking, while audible speech segments are not compressed.
Abstract:
A directional acoustic transducer includes a plurality of acoustic paths each having first and second ends. The second end of each path terminates in the atmosphere. An electroacoustic device is attached to an acoustic cavity, and the first ends of the acoustic paths are coupled to the acoustic cavity through an acoustic arrangement adapted to produce a predetermined transducer directional response pattern.
Abstract:
A microphone arrangement focuses on a prescribed volume in a large room such as an auditorium. The arrangement includes a plurality of directable beam microphone structures. Each beam is directed to a prescribed location. The signals produced in the microphone structures are selectively adjusted to accept sounds from a predetermined volume surrounding the location and to reject sounds outside the prescribed volume.
Abstract:
An electroacoustic transducer assembly adapted to filter sound waves in a digital communication system incorporates a plurality of tandemly arranged tubular members and a transducer. Each tubular member includes an apertured plate end, a tubular cavity and an open end. The open end of each tubular member is secured to the plate end of the adjacent tubular member to form a housing with a divided longitudinal passageway. The open end of the housing is secured to the transducer. Every tubular cavity is partitioned into longitudinal sections by structural elements to inhibit cross mode resonance. The apertures, cavity lengths and structural elements are dimensioned relative to the cavity cross sections to suppress passage of sound waves outside a predetermined frequency band.
Abstract:
In an ADPCM system, improved detection of silence intervals in a speech signal is attained by detecting the level of the logarithm step-size signal (d_n), which is representative of the energy of the speech samples. A speech pattern is converted into a sequence of adaptive digital codes. Intervals of silence in the pattern are detected and a digital code representative of each silence interval is generated. The adaptive digital codes and the silence interval codes are combined to form a digitally coded signal representative of the pattern. The conversion of the pattern to adaptive digital codes includes forming a signal corresponding to the adaptation step-size for each digital code. The silence interval detection includes producing first and second threshold signals. A silence interval is initiated when the step-size signal falls below the first threshold and is terminated when the step-size signal subsequently rises above the second threshold.
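The two-threshold rule in this abstract can be sketched as a small state machine. The function name `detect_silence` and the sample values are hypothetical, and the sketch assumes the second threshold is set higher than the first so that the detector has hysteresis; the abstract itself does not state the relative values.

```python
def detect_silence(step_sizes, low_threshold, high_threshold):
    """Return (start, end) index pairs of silence intervals; end is exclusive.

    A silence interval opens when the step-size signal drops below
    low_threshold and closes when it later rises above high_threshold.
    """
    intervals = []
    start = None  # index where the current silence interval began, if any
    for i, d in enumerate(step_sizes):
        if start is None and d < low_threshold:
            start = i
        elif start is not None and d > high_threshold:
            intervals.append((start, i))
            start = None
    if start is not None:  # signal ended while still in silence
        intervals.append((start, len(step_sizes)))
    return intervals

# Illustrative step-size values (not taken from the source).
steps = [5, 5, 1, 1, 2, 3, 6, 6, 1, 1]
print(detect_silence(steps, low_threshold=2, high_threshold=4))
# prints [(2, 6), (8, 10)]
```

Values between the two thresholds do not change the state, which keeps the detector from toggling on small fluctuations.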
Abstract:
In a speech communication system, an input speech signal is partitioned into a plurality of subband portions. For each subband portion, a signal of lesser bandwidth representative of that subband portion is generated by dividing the instantaneous phase of the subband by an integer k. Where k=2, for example, the center frequency and bandwidth of each subband are halved. The lesser-bandwidth signals are combined to form a compressed bandwidth signal representative of the input speech signal. A replica of the input speech signal is formed by partitioning the compressed bandwidth signal into subband portions; converting each compressed subband portion into a signal representative of a subband of the input speech signal; and combining the converted signals into a single speech signal replica.
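A minimal sketch of the phase-division step for a single subband follows. It assumes the instantaneous phase is taken from an FFT-style analytic signal and that the envelope is left unchanged; neither detail is stated in the abstract, and the helper names are hypothetical. The subband filtering and recombination stages are omitted.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via a naive DFT (assumes an even number of samples)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if 0 < k < n // 2:
            spec[k] *= 2   # double the positive frequencies
        elif k > n // 2:
            spec[k] = 0    # discard the negative frequencies
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def unwrap(phases):
    """Remove 2*pi jumps so the phase is continuous before division."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out

def divide_phase(subband, k):
    """Keep the envelope and divide the instantaneous phase by the integer k."""
    z = analytic_signal(subband)
    phase = unwrap([cmath.phase(v) for v in z])
    return [abs(v) * math.cos(p / k) for v, p in zip(z, phase)]

# A tone at DFT bin 8 of a 64-sample frame moves to bin 4 when k = 2.
tone = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
compressed = divide_phase(tone, 2)
```

The phase has to be unwrapped before the division; dividing a wrapped phase would introduce discontinuities at every 2π jump.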
Abstract:
A signal processing arrangement is connected to a microphone array to form at least one directable beam sound receiver. The directable beam sound receivers are adapted to receive sounds from predetermined locations in a prescribed environment such as an auditorium. Signals representative of prescribed sound features received from the plurality of predetermined locations are generated and one or more of the locations is selected responsive to the sound feature signals. A plurality of directable beam sound receivers may be used to concurrently analyze sound features from the predetermined locations. Alternatively, one directable beam sound receiver may be used to scan the predetermined locations so that the sound feature signals therefrom are compared to sound features from a currently selected location.
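The scanning variant can be sketched with a delay-and-sum beam steered to each candidate location in turn. Several things here are assumptions for illustration only: the beams are formed by delay-and-sum with integer sample delays, mean output energy stands in for the "sound feature", and the names `delay_and_sum`, `select_location`, and the steering table are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Average the channels after advancing each one by its integer sample delay."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]

def mean_energy(signal):
    """Mean squared amplitude, used here as a stand-in sound feature."""
    return sum(v * v for v in signal) / len(signal)

def select_location(channels, steering_delays):
    """Return the location whose steered beam output has the highest mean energy."""
    return max(steering_delays,
               key=lambda loc: mean_energy(delay_and_sum(channels, steering_delays[loc])))

# Two microphones; the second receives the same source three samples later.
source = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic0 = source
mic1 = [0.0, 0.0, 0.0] + source[:7]
steering = {"A": [0, 3], "B": [0, 0]}  # per-microphone delays for each location
best = select_location([mic0, mic1], steering)  # "A": its delays align the channels
```

A practical system would compare richer features than raw energy, for example to distinguish speech from noise, but the selection logic would take the same form.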
Abstract:
An electroacoustic device comprises an array of electroacoustic transducer elements for producing a prescribed directional response pattern at a first frequency. Each element includes apparatus for restricting the frequency range of sound waves incident on said element so that the directional response pattern is invariant over a prescribed frequency band.