Abstract:
A process provides for searching through a written text in response to a spoken question comprising a plurality of words. First, the written text is transcribed into a first sequence of phonetic units. Then, the spoken question is segmented into a second sequence of phonetic units. The written text is then searched for an occurrence of the spoken question by aligning the first and second sequences of phonetic units.
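The alignment step in the abstract above can be illustrated with a minimal sketch. It assumes both phonetic sequences are already available as lists of unit labels, and it uses approximate substring matching by edit distance as a stand-in, since the abstract does not say which alignment technique is used. The function name `find_best_match` and the unit labels are hypothetical.

```python
def find_best_match(text_units, query_units):
    """Locate the span of text_units that best aligns with query_units.

    Returns (cost, end), where cost is the smallest number of unit
    substitutions, insertions and deletions, and end is the index in
    text_units just past the best-matching span.
    Assumption: unit-level edit distance stands in for the alignment.
    """
    m = len(query_units)
    # prev[i] = cost of aligning query_units[:i] with a span ending here
    prev = list(range(m + 1))
    best_cost, best_end = prev[m], 0
    for pos, unit in enumerate(text_units, start=1):
        cur = [0]  # a match may start at any position in the text
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (0 if query_units[i - 1] == unit else 1)
            skip_text_unit = prev[i] + 1
            skip_query_unit = cur[i - 1] + 1
            cur.append(min(substitute, skip_text_unit, skip_query_unit))
        if cur[m] < best_cost:
            best_cost, best_end = cur[m], pos
        prev = cur
    return best_cost, best_end


text = ["DH", "AH", "K", "AE", "T", "S", "AE", "T"]
print(find_best_match(text, ["K", "AE", "T"]))
```

A cost of zero means the query's phonetic units occur exactly in the text; a small non-zero cost tolerates recognition errors in the spoken question.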
Abstract:
Input data are translated into a lexical output sequence. Sub-lexical entities and various possible combinations of those entities are identified as states ei and ej of first and second language models, respectively, and are intended to be stored, with an associated likelihood value, in a table having memory areas. Each memory area is intended to contain at least one combination of the states and has an address equal to the value h[(ei:ej)] of a scalar function h applied to parameters peculiar to the combination (ei:ej). This reduces the complexity of accessing information produced by a single transducer formed by a single Viterbi machine using the models.
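The addressing scheme in the abstract above can be sketched as a small hash-addressed table. The abstract does not define the scalar function h, so the modular function below, the integer state indices, and the class name `StatePairTable` are all assumptions made for illustration.

```python
class StatePairTable:
    """Table of memory areas addressed by a scalar function h(ei, ej).

    Assumption: states are non-negative integer indices and h is a
    simple modular function; the abstract does not specify h.
    """

    def __init__(self, num_areas, num_states_second):
        self.num_areas = num_areas
        self.num_states_second = num_states_second
        self.areas = [[] for _ in range(num_areas)]

    def h(self, ei, ej):
        # Hypothetical scalar function mapping a state pair to an address.
        return (ei * self.num_states_second + ej) % self.num_areas

    def store(self, ei, ej, likelihood):
        area = self.areas[self.h(ei, ej)]
        for k, (a, b, _) in enumerate(area):
            if (a, b) == (ei, ej):
                area[k] = (ei, ej, likelihood)  # update an existing entry
                return
        area.append((ei, ej, likelihood))

    def lookup(self, ei, ej):
        # Only one memory area is scanned, whatever the table size.
        for a, b, likelihood in self.areas[self.h(ei, ej)]:
            if (a, b) == (ei, ej):
                return likelihood
        return None


table = StatePairTable(num_areas=8, num_states_second=5)
table.store(2, 3, -1.5)
print(table.lookup(2, 3))
```

Because the address is computed directly from the pair, a lookup inspects a single memory area rather than searching every stored combination.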
Abstract:
A method of synchronizing an operation for processing, by an automatic speech recognition system of a device, a voice sequence uttered by a speaker and an action of the speaker intended to trigger the processing by the device. The processing operation is effected by the device from a given time preceding the action of the speaker. A time interval between the given time and the action of the speaker corresponds to a given interval.
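One way to start processing from a time preceding the speaker's action is to keep a short rolling history of audio frames. The sketch below assumes audio arrives as discrete frames and that the given interval is expressed as a frame count; the class `PreTriggerBuffer` and its methods are hypothetical names, not taken from the abstract.

```python
from collections import deque


class PreTriggerBuffer:
    """Keeps the most recent frames so that recognition can start
    from a point before the speaker's triggering action.

    Assumption: the interval is given as a number of frames.
    """

    def __init__(self, pre_trigger_frames):
        # A bounded deque silently discards the oldest frame when full.
        self._history = deque(maxlen=pre_trigger_frames)
        self._triggered = False
        self._to_process = []

    def push(self, frame):
        if self._triggered:
            self._to_process.append(frame)
        else:
            self._history.append(frame)

    def trigger(self):
        """Called when the speaker performs the triggering action."""
        if not self._triggered:
            self._triggered = True
            # Processing begins with the frames captured before the action.
            self._to_process = list(self._history)

    def frames_for_recognizer(self):
        return list(self._to_process)


buf = PreTriggerBuffer(pre_trigger_frames=3)
for frame in range(5):
    buf.push(frame)
buf.trigger()
buf.push(5)
print(buf.frames_for_recognizer())
```

With this arrangement, speech uttered slightly before the action (for example, before a button press registers) is still passed to the recognizer.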
Abstract:
Method of synchronization between an operation for processing, by automatic speech recognition, a voice sequence (Sv) uttered by a speaker and an action of said speaker intended to trigger said processing. According to the invention, said processing operation is effected from a given time (t0) preceding said action of the speaker. Application to automatic speech recognition.
Abstract:
A speech recognition method including for a spoken expression: a) providing a vocabulary of words including predetermined subsets of words, b) assigning to each word of at least one subset an individual score as a function of the value of a criterion of the acoustic resemblance of that word to a portion of the spoken expression, c) for a plurality of subsets, assigning to each subset of the plurality of subsets a composite score corresponding to a sum of the individual scores of the words of said subset, d) determining at least one preferred subset having the highest composite score.
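Steps b) to d) above can be sketched as follows. The sketch assumes the individual acoustic-resemblance scores have already been computed and are supplied as a dictionary; treating an unscored word as contributing 0.0 is an assumption, and the function name `best_subsets` and the example vocabulary are hypothetical.

```python
def best_subsets(subsets, word_scores):
    """Return the preferred subset(s) and all composite scores.

    subsets: dict mapping a subset name to its list of words.
    word_scores: dict mapping a word to its individual score.
    Assumption: a word without an individual score contributes 0.0.
    """
    composite = {
        name: sum(word_scores.get(word, 0.0) for word in words)
        for name, words in subsets.items()
    }
    top = max(composite.values())
    preferred = [name for name, score in composite.items() if score == top]
    return preferred, composite


subsets = {"cities": ["paris", "lyon"], "names": ["harris", "leon"]}
scores = {"paris": 0.75, "lyon": 0.5, "harris": 0.5, "leon": 0.25}
print(best_subsets(subsets, scores))
```

Returning a list of preferred subsets reflects the wording "at least one preferred subset": ties for the highest composite score are all kept.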
Abstract:
A method of transmitting end-of-speech marks in a distributed speech recognition system operating in a discontinuous transmission mode, in which system speech segments (30, 40) are transmitted, followed by periods (34) of silence, each speech segment (30, 40) terminating with an end-of-speech mark (31, 41). The end-of-speech mark (31) is retransmitted continually (31a, 31b, 31c, 31d) throughout the duration of the period of silence (34) following said speech segment (30).
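The retransmission scheme above can be illustrated with a toy model of the transmitted stream. The sketch assumes the silence period is divided into discrete transmission slots and that each item on the link is a tuple; the tuple layout and the names `transmit` and `segment_ended` are hypothetical.

```python
def transmit(segments, silence_slots):
    """Build the stream of items sent over the link.

    segments: list of speech segments, each a list of frames.
    silence_slots: number of transmission slots of silence after
    each segment (assumption: silence is slotted).
    """
    stream = []
    for seg_id, (frames, slots) in enumerate(zip(segments, silence_slots)):
        stream.extend(("SPEECH", seg_id, frame) for frame in frames)
        stream.append(("EOS", seg_id, None))  # mark terminating the segment
        # The same mark is repeated throughout the following silence.
        stream.extend(("EOS", seg_id, None) for _ in range(slots))
    return stream


def segment_ended(received, seg_id):
    """True if at least one end-of-speech mark for seg_id arrived."""
    return any(kind == "EOS" and sid == seg_id for kind, sid, _ in received)


stream = transmit([["a", "b"], ["c"]], silence_slots=[3, 0])
lossy = stream[:2] + stream[3:]  # the first end-of-speech mark is lost
print(segment_ended(lossy, 0))
```

Because the mark is repeated during the silence, losing a single copy in transit does not prevent the receiver from detecting the end of the segment.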
Abstract:
Input data are translated into at least one output sentence by a decoding step in which sub-lexical entities represented by the input data are identified by a first model. During decoding, as the sub-lexical entities are identified and with reference to at least one second model, various possible combinations of the sub-lexical entities are generated. Plural possible combinations of the sub-lexical entities are stored. The most likely possible combination is intended to form the lexical output sequence.
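The decoding flow above can be sketched in a deliberately simplified form. The sketch assumes the first model yields, for each time slot, candidate sub-lexical units with log-likelihoods, and that the second model is a small lexicon mapping unit sequences to a word with a log prior. It enumerates combinations exhaustively rather than using an incremental search, which the abstract does not detail; all names and values are hypothetical.

```python
import heapq
from itertools import product


def decode(candidates, lexicon, keep=3):
    """Generate, store and rank combinations of sub-lexical units.

    candidates: per time slot, a list of (unit, log_likelihood) pairs
    (assumed output of the first model).
    lexicon: dict mapping a tuple of units to (word, log_prior)
    (assumed form of the second model).
    keep: how many combinations to store.
    """
    scored = []
    for combo in product(*candidates):
        units = tuple(unit for unit, _ in combo)
        if units not in lexicon:
            continue  # the second model rejects this combination
        word, log_prior = lexicon[units]
        score = sum(ll for _, ll in combo) + log_prior
        scored.append((score, word))
    stored = heapq.nlargest(keep, scored)  # plural combinations are kept
    best = stored[0][1] if stored else None
    return best, stored


candidates = [
    [("k", -0.5), ("g", -1.0)],
    [("a", -0.25)],
    [("t", -0.5), ("d", -0.75)],
]
lexicon = {
    ("k", "a", "t"): ("cat", -1.0),
    ("k", "a", "d"): ("cad", -2.0),
    ("g", "a", "d"): ("gad", -3.0),
}
print(decode(candidates, lexicon))
```

Keeping several scored combinations, rather than only the current best, leaves room for a later stage to reconsider alternatives before the most likely one is selected.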