摘要:
Semantically distinct items are extracted from a single utterance by repeatedly recognizing the same utterance using constraints provided by semantic items already recognized. User feedback for selection or correction of partially recognized utterance may be used in a hierarchical, multi-modal, or single step manner. An accuracy of recognition is preserved while the less structured and more natural single utterance recognition form is allowed to be used.
摘要:
A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input.
摘要:
Techniques to provide automatic speech recognition at a local device are described. An apparatus may include an audio input to receive audio data indicating a task. The apparatus may further include a local recognizer component to receive the audio data, to pass the audio data to a remote recognizer while receiving the audio data, and to recognize speech from the audio data. The apparatus may further include a federation component operative to receive one or more recognition results from the local recognizer and/or the remote recognizer, and to federate a plurality of recognition results to produce a most likely result. The apparatus may further include an application to perform the task indicated by the most likely result. Other embodiments are described and claimed.
摘要:
Biasing of language model customization due to repetitious data is substantially reduced by introducing novelty screening to data harvesting process. Novelty detection based filtering is added to ensure that an adaptation system gives more weight to representative adaptation data that is not repetitious. The value of the adaptation data is preserved and the process prevented from being polluted when the same data is seen multiple times, such as the original posting in an email thread, various versions of the same document, and the like. The screening technique may be built on top of existing data harvesting mechanisms as already seen data is used to determine the novelty of a particular portion of the data. A window into the new data, fixed or variable size, is compared against the already collected data to determine the likelihood that the data is novel.
摘要:
A speech recognizer has an acoustic model that includes word-specific models, that are specific to candidate words. The candidate words would otherwise be mapped to a series of general phones. A decoder transcribes input speech into words formed by shared phones, word-specific phones and a combination of shared word-specific phones.
摘要:
An acoustic model includes word-specific models, that are specific to candidate words. The candidate words would otherwise be mapped to a series of general phones. A sub-series of the general phones representing the candidate word is modeled by a new phone and the new phone is dedicated to the candidate word, or a small group of similar words, but the new phone is not shared among all words that otherwise map to the sub-series of general phones.
摘要:
An expected dialog-turn (ED) value is estimated for evaluating a speech application. Parameters such as a confidence threshold setting can be adjusted based on the expected dialog-turn value. In a particular example, recognition results and corresponding confidence scores are used to estimate the expected dialog-turn value. The recognition results can be associated with a possible outcome for the speech application and a cost for the possible outcome can be used to estimate the expected dialog-turn value.
摘要:
A voice user interface authoring tool is configured to use categorized example caller responses, from which callflow paths, automatic speech recognition, and natural language processing control files can be generated automatically within a single, integrated authoring user interface. A voice user interface (VUI) design component allows an author to create an application incorporating various types of action nodes, including Prompt/Response Processing (PRP) nodes. At runtime, the system uses the information from each PRP node to prompt a user to say something, and to process the user's response in order to extract its meaning. An Automatic Speech Recognition/Natural Language Processing (ASR/NLP) Control Design component allows the author to associate sample inputs with each possible meaning, and automatically generates the necessary ASR and NLP runtime control files. The VUI design component allows the author to associate the appropriate ASR and NLP control files with each PRP node, and to associate an action node with each possible meaning, as indicated by the NLP control file.
摘要:
A method of analyzing dialog between a user and an interactive application having dialog turns is provided. The method includes accessing information indicative of a plurality of dialog turns between the application and at least one user and identifying instances where the application determined a response was received before an associated prompt had completed. The accessed information includes information related to operation of the application with a first grammar to recognize the response. The method includes identifying whether the response was received in a particular limited time period from when the associated prompt began. If the response was received in the limited time period, the method determines whether the response included one or more terms from the associated prompt by performing recognition on the response using a second grammar having more information related to grammar of a language than the first grammar.
摘要:
Techniques to provide automatic speech recognition at a local device are described. An apparatus may include an audio input to receive audio data indicating a task. The apparatus may further include a local recognizer component to receive the audio data, to pass the audio data to a remote recognizer while receiving the audio data, and to recognize speech from the audio data. The apparatus may further include a federation component operative to receive one or more recognition results from the local recognizer and/or the remote recognizer, and to federate a plurality of recognition results to produce a most likely result. The apparatus may further include an application to perform the task indicated by the most likely result. Other embodiments are described and claimed.