Abstract:
The invention concerns a device comprising: a memory containing a series of numbers and voice prints; an acoustic transducer for picking up a correspondent's name spoken by the user; voice recognition means for analysing the recorded correspondent's name and transforming it into a voice print; means for selectively addressing the memory, comprising associative means for finding in the memory voice print information corresponding to that supplied by the voice recognition means and, if they match, for addressing the memory at the corresponding position; and means, co-operating with the associative means, for applying the addressed directory number to the radiotelephone circuits. The voice recognition means evaluate and memorise the current sound level picked up by the transducer in the absence of a word signal; in the presence of a word signal, they subtract the previously evaluated sound level from the picked-up signal and apply to the resulting signal a DTW voice recognition algorithm with pattern recognition by dynamic programming adapted to the word, using functions for extracting dynamic parameters, in particular a dynamic predictive algorithm with forward and/or backward and/or frequency masking.
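The two core steps of the abstract — subtracting the ambient level estimated during silence, then matching the cleaned signal against stored voice prints by dynamic time warping — can be sketched as follows. This is a minimal illustration, not the patented implementation; the frame-level feature representation and the fixed noise-level subtraction are assumptions.

```python
import numpy as np

def dtw_distance(query, template):
    """Classic dynamic time warping between two feature sequences
    (rows = time frames, columns = feature dimensions)."""
    n, m = len(query), len(template)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(query[i - 1] - template[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def recognize(signal_frames, noise_level, templates):
    """Subtract the sound level evaluated in the absence of a word
    signal, then pick the stored voice print (template) with the
    smallest DTW distance."""
    cleaned = np.maximum(signal_frames - noise_level, 0.0)
    scores = {name: dtw_distance(cleaned, t) for name, t in templates.items()}
    return min(scores, key=scores.get)
```

In the patent the matched memory position then addresses the directory number to be dialled; here the function simply returns the best-matching name.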
Abstract:
In a speech recognition arrangement (Fig. 1), a plurality of reference templates (130) are stored. Each template comprises a time frame sequence of feature signals of a prescribed reference pattern. A time frame sequence of feature signals representative of an unknown speech pattern is produced (115). Responsive to the feature signals of the speech pattern and the reference pattern templates, a set of signals representative of the similarity between the speech pattern and the reference templates is formed (135). The speech pattern is identified (170) as one of the reference patterns responsive to the similarity signals. The similarity signal generation includes producing a plurality of signals for each frame of the speech pattern, each signal being representative of the correspondence between predetermined-type speech pattern features and the same predetermined-type features of the reference pattern. The similarity signal for the template is formed responsive to the plurality of predetermined-type correspondence signals.
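The per-frame, per-feature-type correspondence signals described above can be sketched as below, assuming the "predetermined types" are simply index groups within each frame's feature vector (a hypothetical grouping for illustration):

```python
import numpy as np

def framewise_similarity(speech, template, feature_groups):
    """For each frame pair, compute one correspondence signal per
    predetermined feature type (here modelled as index groups into
    the feature vector), then combine them into a single similarity
    score for the template. Higher is more similar."""
    n = min(len(speech), len(template))
    total = 0.0
    for i in range(n):
        # one correspondence signal per feature type
        per_type = [np.linalg.norm(speech[i][g] - template[i][g])
                    for g in feature_groups]
        total += sum(per_type)
    return -total
```

A real arrangement would form such a score for every stored template and identify the speech pattern as the template with the best score.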
Abstract:
Techniques related to implementing neural networks for speech recognition systems are discussed. Such techniques may include implementing frame skipping with approximated skip frames and/or distances on demand such that only those outputs needed by a speech decoder are provided via the neural network or approximation techniques.
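One simple form of the frame skipping described above evaluates the acoustic network only on every k-th frame and approximates the skipped frames from the last computed output. The sketch below uses output repetition as the approximation; the patent-style techniques also cover interpolation and computing only the outputs the decoder actually requests, which are omitted here.

```python
def score_with_frame_skipping(frames, network, skip=2):
    """Run the (caller-supplied) acoustic network only on every
    `skip`-th frame; skipped frames reuse the last computed output
    as an approximation."""
    outputs = []
    last = None
    for i, frame in enumerate(frames):
        if i % skip == 0 or last is None:
            last = network(frame)   # full forward pass
        outputs.append(last)        # approximated on skipped frames
    return outputs
```

With `skip=2` the network cost is roughly halved at the price of coarser frame-level scores.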
Abstract:
A system for identifying the content of utterances in recorded voice data based on keyword extraction is disclosed, together with an indexing method and a content identification method using the system. The indexing unit of the system receives voice data, performs phoneme-based speech recognition frame by frame to form a phoneme lattice, generates segmented indexing information for limited-duration frames each composed of a plurality of frames (the segmented indexing information including the phoneme lattice formed for each limited-duration frame), and stores it in an indexing database. Using a keyword entered by the user as a query, the search unit searches the segmented indexing information stored in the indexing database for phoneme sequences matching the query through phoneme-level comparison, and locates the voice portion corresponding to the query through detailed acoustic analysis of the matching phoneme sequences. The identification unit determines the topic words from the results returned by the search unit and outputs them to the user so that the content of the utterances in the voice data can be identified.
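The index-then-search flow can be sketched with a toy stand-in in which each time segment's phoneme lattice is reduced to a single phoneme string (real lattices hold multiple hypotheses per frame, and hits would be re-scored acoustically):

```python
def build_index(segments):
    """segments: iterable of (segment_id, phoneme list) pairs from a
    phoneme-level recognizer. Stored as searchable strings — a toy
    stand-in for the per-segment phoneme lattices of the abstract."""
    return {sid: " ".join(phonemes) for sid, phonemes in segments}

def search_keyword(index, keyword_phonemes):
    """Return ids of segments whose phoneme sequence contains the
    keyword's phoneme string; a full system would follow up with
    detailed acoustic analysis of each hit."""
    needle = " ".join(keyword_phonemes)
    return [sid for sid, text in index.items() if needle in text]
```

A keyword is first converted to its phoneme sequence, then matched against the indexed segments rather than against raw audio, which is what makes the search fast.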
Abstract:
An audio signal encoding method and apparatus are provided that, when determining a masking threshold according to a psychoacoustic model, can derive results for a short-window audio signal as accurate as those obtained with a long-window audio signal. The audio signal encoding apparatus according to the invention includes a masking threshold determination unit that determines, based on the frame length of a first window into which the audio signal is divided, a masking threshold for a second window whose frame length differs from that of the first window.
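The idea of deriving a short-window threshold from a long-window one can be illustrated with a naive energy-per-sample scaling. This is an assumption for illustration only, not the patented determination procedure, which the abstract does not specify:

```python
import numpy as np

def short_window_threshold(long_threshold, long_len, short_len):
    """Derive a masking threshold for a window of length `short_len`
    from a threshold computed on a window of length `long_len` by
    scaling with the frame-length ratio (a simple energy-per-sample
    assumption; a real psychoacoustic model does much more)."""
    return np.asarray(long_threshold) * (short_len / long_len)
```

For example, thresholds computed on a 2048-sample window can be rescaled for a 256-sample window by the factor 256/2048.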
Abstract:
Techniques for performing speech recognition in a communication device with a voice dialing function are provided. Upon receipt of a voice input in a speech recognition mode, input feature vectors are generated from the voice input. A likelihood vector sequence is then calculated from the input feature vectors, indicating the likelihood over time of an utterance of phonetic units. In a warping operation, the likelihood vector sequence is compared to phonetic word models and word model match likelihoods are calculated for those word models. After determination of a best-matching word model, the number corresponding to the name synthesized from the best-matching word model is dialed in a dialing operation.
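The warping operation — aligning a sequence of per-frame phonetic-unit likelihood vectors against a word model given as a unit sequence — can be sketched with simple dynamic programming. The stay/advance transition structure is an assumption; the abstract does not fix the warping constraints:

```python
import numpy as np

def word_match_likelihood(likelihoods, word_model):
    """likelihoods: array of shape (frames, phonetic units).
    word_model: sequence of unit indices making up the word.
    Dynamic-programming warping: at each frame either stay in the
    current model state or advance to the next; returns the best
    accumulated likelihood for the whole word."""
    T, S = len(likelihoods), len(word_model)
    score = np.full((T, S), -np.inf)
    score[0, 0] = likelihoods[0][word_model[0]]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            advance = score[t - 1, s - 1] if s > 0 else -np.inf
            score[t, s] = likelihoods[t][word_model[s]] + max(stay, advance)
    return score[T - 1, S - 1]
```

The best-matching word model is then the one with the highest match likelihood, and its associated directory number is dialed.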
Abstract:
Each phonetic unit of the first sequence is mapped onto a corresponding phonetic unit of the second sequence, a comparison cost function being increased whenever the mapping requires an insertion, an omission, or a substitution of a phonetic unit. Articulation feature vectors are taken into account in the mapping such that the comparison cost function is increased by different amounts for phonetic units with different articulation feature vectors.
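This amounts to a Levenshtein-style alignment whose substitution cost depends on how many articulation features the two units disagree on. A minimal sketch, assuming each unit maps to a fixed-length feature tuple and insertions/omissions carry a flat cost (both assumptions, as the abstract leaves the exact weighting open):

```python
def articulatory_edit_cost(seq_a, seq_b, features, ins_del=1.0):
    """Align two phonetic-unit sequences; substituting one unit for
    another costs the number of differing articulation features
    (features: unit -> feature tuple), while insertions and
    omissions each cost `ins_del`."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * ins_del
    for j in range(1, m + 1):
        d[0][j] = j * ins_del
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fa, fb = features[seq_a[i - 1]], features[seq_b[j - 1]]
            sub = sum(x != y for x, y in zip(fa, fb))
            d[i][j] = min(d[i - 1][j] + ins_del,       # omission
                          d[i][j - 1] + ins_del,       # insertion
                          d[i - 1][j - 1] + sub)       # substitution
    return d[n][m]
```

Confusing /p/ with /b/ (differing only in voicing) is thus cheaper than confusing /p/ with a vowel, which is exactly the differentiated cost increase the abstract describes.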
Abstract:
A voice recognition (VR) system is disclosed that utilizes a combination of speaker-independent (SI) (230 and 232) and speaker-dependent (SD) (234) acoustic models. At least one SI acoustic model (230 and 232) is used in combination with at least one SD acoustic model (234) to provide a level of speech recognition performance that at least equals that of a purely SI acoustic model. The disclosed hybrid SI/SD VR system continually uses unsupervised training to update the acoustic templates in the one or more SD acoustic models (234). The hybrid VR system then uses the updated SD acoustic models (234) in combination with the at least one SI acoustic model (230 and 232) to provide improved VR performance during VR testing.
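One simple way to realize such a hybrid is to interpolate per-word scores from the SI and SD models, falling back to the SI score for words the SD model has not yet learned. The interpolation weight is an illustrative assumption; the patent does not prescribe this particular combination:

```python
def hybrid_score(si_scores, sd_scores, weight=0.5):
    """Combine per-word scores from a speaker-independent and a
    speaker-dependent model by weighted interpolation; words absent
    from the SD model fall back to their SI score."""
    return {w: weight * si_scores[w]
               + (1 - weight) * sd_scores.get(w, si_scores[w])
            for w in si_scores}

def best_word(si_scores, sd_scores, weight=0.5):
    """Pick the word with the highest combined score."""
    combined = hybrid_score(si_scores, sd_scores, weight)
    return max(combined, key=combined.get)
```

With `weight=1.0` this degenerates to the purely SI system, which is why the hybrid can be made to perform at least as well as the SI baseline.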
Abstract:
A method and system are disclosed that combine voice recognition engines (104, 108, 112, 114) and resolve differences between the results of individual voice recognition engines (104, 106, 108, 112, 114) using a mapping function. A speaker-independent voice recognition engine (104) and a speaker-dependent voice recognition engine (106) are combined, as are Hidden Markov Model (HMM) engines (108, 114) and Dynamic Time Warping (DTW) engines (104, 106, 112).
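The abstract does not define the mapping function; one common, simple realization is a weighted vote over each engine's top hypothesis, sketched here purely as an illustration:

```python
def combine_engines(hypotheses, weights):
    """hypotheses: engine name -> that engine's top word hypothesis.
    weights: engine name -> vote weight (default 1.0 if absent).
    Resolves disagreements by returning the word with the largest
    total weighted vote — one hypothetical mapping function."""
    votes = {}
    for engine, word in hypotheses.items():
        votes[word] = votes.get(word, 0.0) + weights.get(engine, 1.0)
    return max(votes, key=votes.get)
```

Weights would typically reflect each engine's reliability, e.g. giving the speaker-dependent DTW engine more say once it has been trained on the user.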
Abstract:
A speech recognition algorithm is implemented as a computer program by feeding speech input into a coder (2) and processing it in a standard computer (4) using reference patterns stored in memory (6). The algorithm uses the well-known technique of dynamic programming and includes weighting and normalizing functions.