Abstract:
A contact center includes an outbound server to make a call to a callee and a media device. The media device receives an audio signal based on the call, determines a Mel-frequency cepstral coefficient for the received audio signal, and matches that coefficient to a Mel-frequency cepstral coefficient for a pre-recorded carrier message. The media device can determine the content of the audio signal based on the match.
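The matching step can be illustrated with a minimal sketch. It assumes the Mel-frequency cepstral coefficients have already been computed as lists of vectors, and it uses a mean Euclidean distance with a fixed threshold; the distance measure, the threshold, the function names, and the template labels are all hypothetical and are not taken from the abstract.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two MFCC vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_carrier_message(mfcc_frames, templates, threshold):
    """Return the label of the closest template, or None if none is close enough.

    mfcc_frames: list of MFCC vectors for the received audio (assumed precomputed).
    templates:   dict mapping a hypothetical label to a list of MFCC vectors.
    threshold:   largest mean per-frame distance accepted as a match (assumed).
    """
    best_label, best_score = None, math.inf
    for label, ref_frames in templates.items():
        # Assumption: compare frame by frame over the shorter of the two sequences.
        n = min(len(mfcc_frames), len(ref_frames))
        if n == 0:
            continue
        total = sum(frame_distance(a, b) for a, b in zip(mfcc_frames, ref_frames))
        score = total / n
        if score < best_score:
            best_label, best_score = label, score
    return best_label if best_score <= threshold else None


# Hypothetical two-dimensional "MFCC" vectors, for illustration only.
templates = {
    "number_disconnected": [[1.0, 0.0], [0.0, 1.0]],
    "voicemail_full": [[5.0, 5.0], [6.0, 6.0]],
}
received = [[1.1, 0.0], [0.0, 0.9]]
result = match_carrier_message(received, templates, threshold=0.5)
```

Here `result` is the label of the nearest pre-recorded message, which is how the content of the audio signal would be inferred from the match under these assumptions.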
Abstract:
Dynamic Time Warping (DTW Matching) is performed. Calculations are then produced that indicate how frequently each template vector (On Word Template) comparison results in a minimum local distance, or Distribution of Minimal Distance Index (DMDI). Distance scores are then re-calculated (Rescoring II Based on DMDI).
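As a rough illustration of the first two steps, the sketch below runs a textbook DTW over one-dimensional frames and counts, for each test frame, which template frame gave the minimum local distance. The scalar features, the absolute-difference local distance, and the function name are assumptions; the abstract does not state the rescoring rule, so the rescoring step is not shown.

```python
from collections import Counter


def dtw_with_dmdi(test, template):
    """Return (DTW distance, Counter of template indices with minimum local distance)."""
    n, m = len(test), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning test[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    dmdi = Counter()
    for i in range(1, n + 1):
        # Local distances between this test frame and every template frame.
        local = [abs(test[i - 1] - t) for t in template]
        # Assumption: the "minimal distance index" is the closest template frame.
        dmdi[local.index(min(local))] += 1
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = local[j - 1] + best_prev
    return cost[n][m], dmdi


distance, dmdi = dtw_with_dmdi([1, 1, 2, 3], [1, 2, 3])
```

In this example the first template frame is the closest match for two test frames and the other two template frames for one each, so the histogram is skewed toward index 0.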
Abstract:
The invention relates to pre-processing of a pronunciation dictionary for compression in a data processing device, the pronunciation dictionary comprising at least one entry, the entry comprising a sequence of character units and a sequence of phoneme units. According to one aspect of the invention, the sequence of character units and the sequence of phoneme units are aligned using a statistical algorithm. The aligned sequence of character units and the aligned sequence of phoneme units are interleaved by inserting each phoneme unit at a predetermined location relative to the corresponding character unit.
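The interleaving step can be sketched as follows. The sketch assumes the alignment has already been done and that both sequences have equal length, with a hypothetical `"_"` symbol marking an empty phoneme; it also assumes the "predetermined location" is immediately after the corresponding character. None of these choices is specified in the abstract.

```python
def interleave(chars, phonemes):
    """Interleave aligned sequences: each phoneme directly follows its character."""
    if len(chars) != len(phonemes):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for c, p in zip(chars, phonemes):
        out.append(c)
        out.append(p)
    return out


def deinterleave(units):
    """Recover the two aligned sequences from an interleaved entry."""
    return units[0::2], units[1::2]


# Hypothetical aligned entry; "_" stands for an empty phoneme (assumed notation).
chars = ["t", "h", "e"]
phonemes = ["D", "_", "@"]
entry = interleave(chars, phonemes)
```

Placing each phoneme next to its character keeps correlated symbols adjacent, which is the kind of regularity a downstream compressor can exploit.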
Abstract:
The invention belongs to the technical domain of decoding, classification, alignment and matching of data. The invention refers to new methods of keyword spotting in utterances, detection of subsequences in chains of organic matter (DNA), and recognition of objects in images. The proposed methods search, in an optimized way, for the matching that maximizes, over all possible matchings, certain confidence measures based on normalized posteriors. Three such confidence measures are used: two are inspired by prior work in speech recognition, and the third is new. Application fields for this invention are man-machine interfaces (using speech recognition, e.g., control systems, banking, flight services), coordination systems (for industrial robots and automata), and development systems for pharmaceutical products.
Abstract:
A continuous speech analyzer (Fig. 1) is adapted to recognize an utterance (101) as a string of reference words (130) for which acoustic feature signals are stored (105). Responsive to the utterance (103) and the reference word acoustic features (105), at least one reference word series is generated as a candidate for the utterance. Successive word positions for the utterance are identified. In each word position, partial candidate series are generated by a dynamic time warp partitioning circuit (110) determining a distance signal reference corresponding to a prescribed similarity of utterance segment intervals and reference template involving a partial candidate series of the preceding word position. The candidate utterance segments (130) have beginning points within a predetermined range of the utterance position endpoint for the preceding word position candidate series, to account for coarticulation and for differences between acoustic features of the utterance and those for reference words (105) spoken in isolation. A minimum distance signal (170) selected from a plurality of partial candidates identifies the candidate string closest to the utterance.
Abstract:
An automatic speech recognition system for recognizing a user's (2) voice command in a noisy environment, comprising: matching means for matching elements retrieved from speech units forming said command with templates in a template library (44); characterized by: processing means (32, 36, 38), including a MultiLayer Perceptron (38), for computing posterior templates (P(O template(q))) stored as said templates in said template library (44); and means for retrieving posterior vectors (P(O test(q))) from said speech units, said posterior vectors being used as said elements. The present invention also relates to a method for recognizing a user voice command in noisy environments.
Abstract:
A method is presented that includes selecting an initial beam width. The method also includes determining whether a value per frame is changing. A beam width is dynamically adjusted. The method further decodes a speech input with the dynamically adjusted beam width. Also, a device is presented that includes a processor (420). A speech recognition component (610) is connected to the processor (420). A memory (410) is connected to the processor (420). The speech recognition component (610) dynamically adjusts a beam width to decode a speech input.
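One possible reading of the adjustment step is sketched below. The abstract does not say which per-frame value is monitored or in which direction the beam moves, so everything here is an assumption: the beam is narrowed while the monitored value stays stable and is reset to the initial width when the value changes. The function name, tolerance, shrink factor, and limits are hypothetical.

```python
def adjust_beam_width(beam, prev_value, value,
                      initial=100.0, min_beam=10.0, shrink=0.9, tol=1e-3):
    """Return the beam width to use for the next frame.

    Assumed policy (not from the abstract): if the monitored per-frame value
    changed by more than `tol`, fall back to the initial wide beam; otherwise
    narrow the beam by `shrink`, but never below `min_beam`.
    """
    if abs(value - prev_value) > tol:
        return initial
    return max(min_beam, beam * shrink)


# Track the beam over a short, made-up sequence of per-frame values.
values = [5.0, 5.0, 5.0, 7.0, 7.0]
beam = 100.0
history = []
for prev, cur in zip(values, values[1:]):
    beam = adjust_beam_width(beam, prev, cur)
    history.append(beam)
```

A decoder would then prune, at each frame, every hypothesis whose score falls outside the current beam; that pruning loop is omitted here.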
Abstract:
Speech recognition uses a wide token builder (66), a gain and noise adapter (70), and noise-adapted Dynamic Time Warping (60). The wide token builder produces a padded test token expanded with at least one blank frame before and after the input test utterance. The gain and noise adapter adapts each padded reference template with noise and gain qualities, producing adapted reference templates that have noise frames wherever a blank frame was originally placed and noise-adapted speech where speech exists. Dynamic Time Warping (DTW) is performed on the noise-adapted templates.
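The padding and adaptation steps might look like the sketch below. It treats each frame as a single energy value, marks blank frames with `None`, and assumes a simple additive model (`gain * speech + noise`) for the speech frames; the abstract does not give the adaptation formula, so the model and all names are hypothetical.

```python
BLANK = None  # marker for a blank (padding) frame; assumed representation


def pad_with_blanks(frames, n_blank=1):
    """Add n_blank blank frames before and after a sequence of frames."""
    return [BLANK] * n_blank + list(frames) + [BLANK] * n_blank


def adapt_template(padded_template, gain, noise):
    """Replace blank frames with noise and adapt speech frames.

    Assumption: speech frames are scaled by `gain` and have `noise` added.
    """
    return [noise if f is BLANK else gain * f + noise for f in padded_template]


reference = [1.0, 2.0]
padded = pad_with_blanks(reference)
adapted = adapt_template(padded, gain=2.0, noise=0.5)
```

The adapted template and the padded test token then have the same layout (noise, speech, noise), so a standard DTW can compare them frame by frame.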