摘要:
In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
摘要:
In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
摘要:
In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
摘要:
Methods and arrangements for facilitating database access in speech recognition. A plurality of possible subsequences corresponding to a database entry are ascertained, a record of such subsequences and their correspondence to database entries is created, and either or both of the following are carried out: unique signatures are ascertained via determining whether a subsequence corresponding to a given database entry does not also correspond to at least one other database entry; and/or multiple occurrences of a given subsequence are found, with corresponding database entries being grouped into a confusion set.
摘要:
A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.
摘要:
A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.
摘要:
A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.
摘要:
A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.
摘要:
A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.
摘要:
A voice processing system is provided in which sets of engines running on a plurality of servers are configured differently from one another. The sets of engines may be configured to achieve different trade-offs between performance of a task and resources required to perform the task. In the voice processing system, a task routing server is provided that assigns different sets of sub-tasks to different sets of task engines. The number of engines used to perform a task and the number of engines in each set are adjusted. By adjusting the parameters settings for the set of engines based on the type of application, the particular requirements of the application, or the nature and importance of the subtasks, for example, advantages such as improvement of resource utilization and the hardware and software costs reduction may be obtained.