Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
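The selection step described above can be illustrated with a minimal sketch. It assumes the acoustic model can be viewed as a lookup from phone sequences to scores and that utterance boundaries are padded with a `#` symbol; the function names, the padding symbol, and the dictionary representation are hypothetical and do not come from the abstract.

```python
from typing import Dict, List, Optional, Tuple

PhoneSeq = Tuple[str, ...]


def extract_test_sequences(phones: List[str], index: int, max_context: int) -> List[PhoneSeq]:
    """Return sequences centred on phones[index], widest context first.

    Boundaries are padded with '#' (an assumption) so that every sequence
    carries a different amount of context.
    """
    pad = ["#"] * max_context
    padded = pad + phones + pad
    centre = index + max_context
    return [
        tuple(padded[centre - width:centre + width + 1])
        for width in range(max_context, -1, -1)
    ]


def select_longest_match(
    sequences: List[PhoneSeq], acoustic_model: Dict[PhoneSeq, float]
) -> Optional[Tuple[PhoneSeq, float]]:
    """Pick the test sequence found in the model that has the most context."""
    found = [seq for seq in sequences if seq in acoustic_model]
    if not found:
        return None
    best = max(found, key=len)
    return best, acoustic_model[best]


# Hypothetical phones for one word and a toy model with illustrative scores.
phones = ["ae", "k", "sh", "ih", "n"]
toy_model = {("k", "sh", "ih"): -1.5, ("sh",): -3.0}
sequences = extract_test_sequences(phones, index=2, max_context=2)
selected = select_longest_match(sequences, toy_model)
```

Here the five-phone sequence is absent from the toy model, so the three-phone sequence is selected and its score would feed the score for the candidate transcription.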
Abstract:
A system (30) and method (80) for providing an automated call center inline architecture are provided. A plurality of grammar references (65) and prompts are maintained on a script engine (31). A call is received through a telephony interface (32). Audio data (39) is collected using the prompts from the script engine (31), which are transmitted to the telephony interface (32) via a message server (34). Distributed speech recognition (88) is performed on a speech server (33). The grammar references (65) are received from the script engine (31) via the message server (34). Speech results (69) are determined by applying the grammar references (65) to the audio data (39). A new grammar (70) is formed from the speech results (69). Speech recognition results (71) are identified by applying the new grammar (70) to the audio data (39). The speech recognition results (71) are received as a display on an agent console (35).
Abstract:
A method and system for real-time speech recognition are provided. The speech recognition algorithm runs on a platform having an input-output processor and a plurality of processor units. The processor units operate substantially in parallel or sequentially to perform feature extraction and pattern matching. While the input-output processor creates a frame, the processor units execute the feature extraction and the pattern matching. Shared memory is provided for supporting the parallel operation.
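The overlap between frame creation and processing can be sketched roughly as follows. This is only an illustration: Python threads stand in for the hardware processor units, a queue and a dictionary stand in for the shared memory, and the frame size, the energy feature, and the two reference templates are all assumptions rather than details from the abstract.

```python
import queue
import threading
from typing import Dict, List

FRAME_SIZE = 4  # samples per frame (illustrative value)
TEMPLATES = {"silence": 0.0, "speech": 1.0}  # hypothetical reference energies


def io_processor(samples: List[float], frames: queue.Queue) -> None:
    """Cut the sample stream into frames and place them in the shared queue."""
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frames.put((start // FRAME_SIZE, samples[start:start + FRAME_SIZE]))


def processor_unit(frames: queue.Queue, results: Dict[int, str]) -> None:
    """Extract a feature from each frame and match it against the templates."""
    while True:
        item = frames.get()
        if item is None:  # sentinel: no more frames
            break
        index, frame = item
        energy = sum(s * s for s in frame) / len(frame)  # feature extraction
        # Pattern matching: nearest template by absolute energy difference.
        results[index] = min(TEMPLATES, key=lambda name: abs(TEMPLATES[name] - energy))


def recognize(samples: List[float], n_units: int = 2) -> List[str]:
    frames: queue.Queue = queue.Queue()
    results: Dict[int, str] = {}
    units = [
        threading.Thread(target=processor_unit, args=(frames, results))
        for _ in range(n_units)
    ]
    for unit in units:
        unit.start()
    io_processor(samples, frames)  # frames are created while the units consume
    for _ in units:
        frames.put(None)
    for unit in units:
        unit.join()
    return [results[i] for i in sorted(results)]


labels = recognize([0.0] * 4 + [1.0] * 4 + [0.1] * 4)
```

Results are keyed by frame index so the output order does not depend on which unit finishes first.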
Abstract:
A method is provided for improving pattern matching in a speech recognition system having a plurality of acoustic models (20). Similarity measures for acoustic feature vectors (54) are determined in groups that are then buffered into cache memory (59). To further reduce computational processing, the acoustic data may be partitioned amongst a plurality of processing nodes (66, 67, 68). In addition, a priori knowledge of the spoken order may be used to establish the access order (124) used to copy records from the main speech parameter table (120, 200) into a sub-table (130, 204). The sub-table is processed such that the entries are in contiguous memory locations (206) and sorted according to the processing order (208). The speech processing algorithm is then directed to operate upon the sub-table (210) which causes the processor to load the sub-table into high speed cache memory (104, 212).
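The sub-table step can be pictured with a small sketch. It assumes the main speech parameter table maps a key to a short list of floating-point parameters and that the access order is already known; the function name, the key names, and the use of Python's `array` module as a stand-in for contiguous memory are illustrative choices, not part of the abstract.

```python
from array import array
from typing import Dict, List, Sequence, Tuple


def build_sub_table(
    main_table: Dict[str, Sequence[float]], access_order: List[str]
) -> Tuple[array, List[Tuple[str, int, int]]]:
    """Copy the needed records into one contiguous buffer, in processing order.

    Returns the buffer and an index of (key, start, end) slices into it.
    """
    buffer = array("d")  # a single contiguous block of doubles
    index: List[Tuple[str, int, int]] = []
    for key in access_order:
        start = len(buffer)
        buffer.extend(main_table[key])
        index.append((key, start, len(buffer)))
    return buffer, index


# Hypothetical parameter records; only two of them are needed, "three" first.
main_table = {"one": [1.0, 1.5], "two": [2.0], "three": [3.0, 3.5, 3.75]}
sub_table, sub_index = build_sub_table(main_table, ["three", "one"])
```

Because the records sit back to back in the order they will be read, a sequential pass over `sub_table` touches memory linearly, which is the access pattern that tends to keep data in cache.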
Abstract:
A speech recognition approach involves forming a series of segments associated with a spoken utterance. Each segment has a time interval within the utterance and scores characterizing the degree of match of the utterance in that time interval with a set of subword units. Based on the series of segments, the approach includes determining a set of word sequence hypotheses associated with the utterance and then computing scores for those hypotheses using a second set of subword units to represent the words in them.