Abstract:
A headset system is proposed including a headset unit to be worn by a user and having two or more microphones, and a base unit in wireless communication with the headset unit. Signals received from the microphones are processed using a first adaptive filter to enhance a target signal, then divided and supplied to a second adaptive filter arranged to reduce interference signals and a third filter arranged to reduce noise. The outputs of the second and third filters are combined and subjected to further processing in the frequency domain. The results are transmitted to a speech recognition engine.
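The abstract does not specify which adaptation rule the filters use; a minimal sketch of one adaptive stage using normalized LMS (a common choice for such filters — the function name and parameters here are illustrative, not taken from the patent):

```python
import numpy as np

def nlms_filter(reference, desired, taps=8, mu=0.5, eps=1e-8):
    """Normalized LMS adaptive filter: adapts weights so the filtered
    reference tracks the desired signal; the residual (desired minus
    estimate) is the enhanced output."""
    w = np.zeros(taps)
    out = np.zeros(len(desired))
    for n in range(taps - 1, len(desired)):
        x = reference[n - taps + 1:n + 1][::-1]  # most recent sample first
        y = w @ x                                # filter estimate
        e = desired[n] - y                       # error / enhanced sample
        w += mu * e * x / (x @ x + eps)          # NLMS weight update
        out[n] = e
    return out
```

In a two-stage arrangement like the one described, the residual of one such stage would feed the next filter in the chain.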
Abstract:
A method and apparatus for coding successive pitch periods (Fig. 5) of a speech signal. Based on a priori knowledge of statistical properties of successive speech periods, a shaped lattice structure is designed to cover the most probable points in the pitch space. The codebook index search starts with finding an open-loop estimate in the pitch space considering all dimensions and refining the open-loop estimate in a closed-loop search separately in each dimension based on the shaped lattice structure. The closed-loop search for the first subframe is for obtaining an absolute pitch period or a delta pitch while the closed-loop search for each of the other subframes is for obtaining a delta pitch for the respective subframe.
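The two search stages can be illustrated with a toy example: an open-loop pitch estimate over the full allowed lag range, followed by a closed-loop refinement in a small delta window around it (as would be done per subframe). The autocorrelation criterion, lag range, and function names below are illustrative assumptions, not the patent's codebook structure:

```python
import numpy as np

def open_loop_pitch(x, lo=20, hi=147):
    """Open-loop estimate: the lag maximizing the autocorrelation."""
    lags = range(lo, hi + 1)
    scores = [float(np.dot(x[l:], x[:-l])) for l in lags]
    return lo + int(np.argmax(scores))

def refine_pitch(x, coarse, delta=3):
    """Closed-loop refinement: search a small delta window around the
    open-loop estimate (in a full codec, once per subframe)."""
    cands = list(range(max(2, coarse - delta), coarse + delta + 1))
    scores = [float(np.dot(x[l:], x[:-l])) for l in cands]
    return cands[int(np.argmax(scores))]
```

Coding only a delta per subframe, as the abstract describes, keeps the per-subframe index small while tracking pitch drift.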
Abstract:
A method is provided for positioning the individual elements of a microphone arrangement including at least two such elements. The spacing among the microphone elements supports the generation of numerous combinations of the signal of interest and a sum of interfering sources. Use of the microphone element placement method leads to the formation of many types of microphone arrangements, comprising at least two microphone elements, and provides the input data to a signal processing system for sound discrimination. Many examples of these microphone arrangements are provided, some of which are integrated with everyday objects. Also, enhancements and extensions are provided for a signal separation-based processing system for sound discrimination, which uses the microphone arrangements as the sensory front end.
Abstract:
A method of interpretation of features for signal processing and pattern recognition provides a model in which the pattern or signal to be interpreted is considered as a set of N observations, M of which are corrupt, and a disjunction is performed over all possible combinations of N different values (1,...,N) taken N-M at a time. The value of M defines the order of the model, and is determined using an optimality criterion which chooses the order that corresponds to a clean signal based on comparing the state duration probability of the signal or pattern to be interpreted with that of a clean signal.
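The combinatorial step described — a disjunction over all combinations of the N observations taken N-M at a time — can be sketched directly. Here the disjunction is realized as a max over subsets of per-observation log-probabilities (one plausible reading; the patent's disjunction operator may differ), with hypothetical names:

```python
from itertools import combinations
import math

def robust_score(obs_logprobs, M):
    """Order-M model: the pattern is explained by the best-scoring subset
    of N - M observations, treating the remaining M as corrupt."""
    N = len(obs_logprobs)
    best = -math.inf
    for subset in combinations(range(N), N - M):
        best = max(best, sum(obs_logprobs[i] for i in subset))
    return best
```

With M = 1, a single wildly corrupt observation (here, log-probability -100) is simply excluded rather than dragging down the score.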
Abstract:
A system and method to identify a sound source among a group of sound sources. The invention matches the acoustic input to a number of signal models, one per source class, and produces a goodness-of-match number for each signal model. The sound source is declared to be of the same class as that of the signal model with the best goodness-of-match if that score is sufficiently high. The data are recorded with a microphone, digitized and transformed into the frequency domain. A signal detector is applied to the transient. A harmonic detection method can be used to determine if the sound source has harmonic characteristics. If at least some part of a transient contains a signal of interest, the spectrum of the signal after rescaling is compared to a set of signal models, and the input signal's parameters are fitted to the data. The average distortion is calculated to compare patterns with those of the sources that were used in training the signal models. Before classification can occur, a source model is trained with signal data. Each signal model is built by creating templates from input signal spectrograms when they are significantly different from existing templates. If an existing template is found that resembles the input pattern, the template is averaged with the pattern in such a way that the resulting template is the average of all the spectra that matched that template in the past.
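The template-building rule at the end of the abstract — average a matching pattern into the template so the template remains the mean of all spectra that ever matched it — is exactly an incremental running mean. A sketch with mean-squared distortion as the match criterion (the distortion measure and names are assumptions):

```python
import numpy as np

def update_templates(templates, counts, pattern, threshold):
    """Match pattern against templates by mean-squared distortion; average
    it into the best match (running mean) or start a new template."""
    if templates:
        dists = [np.mean((t - pattern) ** 2) for t in templates]
        i = int(np.argmin(dists))
        if dists[i] < threshold:
            counts[i] += 1
            templates[i] += (pattern - templates[i]) / counts[i]  # running mean
            return i
    templates.append(pattern.astype(float))  # significantly different: new template
    counts.append(1)
    return len(templates) - 1
```

The update `t += (p - t) / n` keeps each template equal to the arithmetic mean of its n matched spectra without storing them.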
Abstract:
The present invention discloses a computer (410) implemented signal analysis method through the Hilbert-Huang Transformation "HHT" for analyzing acoustical signals (10), which are assumed to be nonlinear and nonstationary. Empirical Mode Decomposition "EMD" and Hilbert Spectral Analysis "HSA" are used to obtain the HHT. Essentially, the acoustical signal is decomposed into its Intrinsic Mode Function components "IMFs". Once the invention decomposes the acoustic signal into its constituent components, all operations such as analyzing, identifying, and removing unwanted signals can be performed on these components. Upon transforming the IMFs into the Hilbert spectrum, the acoustical signal may be compared with other acoustical signals.
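The core of EMD is the sifting step: subtract the mean of the upper and lower envelopes through the signal's local extrema, repeated until an IMF remains. A heavily simplified single pass is sketched below — real EMD uses cubic-spline envelopes and iterates with a stopping criterion, whereas this sketch uses linear interpolation and one pass:

```python
import numpy as np

def sift_once(x):
    """One simplified sifting pass of EMD: subtract the mean of the upper
    and lower envelopes (linear interpolation through local extrema)."""
    n = np.arange(len(x))
    maxima = [i for i in range(1, len(x) - 1) if x[i] > x[i-1] and x[i] >= x[i+1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] < x[i-1] and x[i] <= x[i+1]]
    if len(maxima) < 2 or len(minima) < 2:
        return x  # too few extrema to form envelopes
    upper = np.interp(n, maxima, x[maxima])
    lower = np.interp(n, minima, x[minima])
    return x - (upper + lower) / 2.0
```

Applied to an oscillation riding on a slow trend, one pass already strips most of the trend, leaving the fast component as a candidate IMF.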
Abstract:
A speech classification technique (502-530) for robust classification of varying modes of speech to enable maximum performance of multi-mode variable bit rate encoding techniques. A speech classifier accurately classifies a high percentage of speech segments for encoding at minimal bit rates, meeting lower bit rate requirements. Highly accurate speech classification produces a lower average encoded bit rate, and higher quality decoded speech. The speech classifier considers a maximum number of parameters for each frame of speech, producing numerous and accurate speech mode classifications for each frame. The speech classifier correctly classifies numerous modes of speech under varying environmental conditions. The speech classifier inputs classification parameters from external components, generates internal classification parameters from the input parameters, sets a Normalized Auto-correlation Coefficient Function threshold and selects a parameter analyzer according to the signal environment, and then analyzes the parameters to produce a speech mode classification.
Abstract:
A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
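The "linearly related landmarks" test can be sketched for its most common special case: pair up equal fingerprints and histogram the landmark time offsets, so that a dominant offset bin signals a true match. (The patent's general linear relation allows other slopes; the constant-offset form and the function name here are simplifying assumptions.)

```python
from collections import Counter

def match_score(sample_pairs, file_pairs):
    """Each pair is (landmark_time, fingerprint). Pair up equal
    fingerprints and histogram the time offsets; the size of the largest
    bin is the evidence for a linear (here, unit-slope) relation."""
    index = {}
    for t, fp in file_pairs:
        index.setdefault(fp, []).append(t)
    offsets = Counter()
    for t_sample, fp in sample_pairs:
        for t_file in index.get(fp, []):
            offsets[t_file - t_sample] += 1
    return max(offsets.values()) if offsets else 0
```

Spurious fingerprint collisions scatter across many offset bins, while a genuine match concentrates in one, which is why the score is robust to heavy distortion.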
Abstract:
The invention provides a method of speech recognition comprising the steps of receiving a signal comprising one or more spoken words, extracting a spoken word from the signal using a Hidden Markov Model, passing the spoken word to a plurality of word models, one or more of the word models based on a Hidden Markov Model, determining the word model most likely to represent the spoken word, and outputting the word model representing the spoken word. The invention also provides a related speech recognition system and a speech recognition computer program.
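Determining "the word model most likely to represent the spoken word" among several HMMs is typically done by scoring the observation sequence under each model and taking the best. A minimal sketch using Viterbi best-path log-likelihood over discrete observations (the patent does not fix the scoring algorithm or observation type; both are assumptions here):

```python
import math

def viterbi_logscore(obs, log_init, log_trans, log_emit):
    """Best-path log-likelihood of a discrete observation sequence under
    an HMM; score each word model and pick the highest."""
    S = len(log_init)
    v = [log_init[s] + log_emit[s][obs[0]] for s in range(S)]
    for o in obs[1:]:
        v = [max(v[p] + log_trans[p][s] for p in range(S)) + log_emit[s][o]
             for s in range(S)]
    return max(v)
```

Working in log probabilities avoids numerical underflow on long observation sequences.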
Abstract:
A basilar membrane model is used to receive an input signal including a target signal in step I. In successive further steps the target signal is filtered from the input signal. After the filtering, the target signal can be used as an input for further processing, such as signal recognition or data compression. The target signal can also be applied to a substantially reverse method to obtain an improved or clean signal.