Abstract:
Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the recorded utterance and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
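The time-stamp-based adding and removing of words described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `DynamicLanguageModel` class, the `relevant_words` filter, the 60-second retention window, and the word-overlap rescoring are hypothetical stand-ins, not the method the abstract claims. A real recognizer would bias its decoder rather than pick among finished hypotheses.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stop list; a real system would use a proper relevance test.
STOPWORDS = {"the", "and", "for", "with", "your"}


def relevant_words(captured_text):
    """Keep lowercase alphabetic tokens longer than 3 letters that are not stop words."""
    tokens = re.findall(r"[A-Za-z]+", captured_text.lower())
    return [t for t in tokens if len(t) > 3 and t not in STOPWORDS]


@dataclass
class DynamicLanguageModel:
    """Toy stand-in for a dynamic language model: boosted words with time stamps."""
    max_age_s: float = 60.0                    # assumed retention window, in seconds
    words: dict = field(default_factory=dict)  # word -> time stamp of last capture

    def add_words(self, captured_words, timestamp):
        # Record (or refresh) the time stamp of every identified word.
        for word in captured_words:
            self.words[word] = timestamp

    def expire(self, now):
        # Remove words whose time stamp is older than the retention window.
        self.words = {w: t for w, t in self.words.items() if now - t <= self.max_age_s}

    def rescore(self, hypotheses):
        # Prefer the hypothesis containing the most boosted words (ties keep the first).
        def score(hypothesis):
            return sum(1 for tok in hypothesis.lower().split() if tok in self.words)
        return max(hypotheses, key=score)


lm = DynamicLanguageModel(max_age_s=60.0)
lm.add_words(relevant_words("Account: Premium Broadband plan"), timestamp=0.0)
best = lm.rescore(["premium broad band plan", "premium broadband plan"])
```

In this sketch, text scraped from the display at time 0 favors the hypothesis that contains "broadband", and calling `lm.expire(now=120.0)` would drop all of those words again.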
Abstract:
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information.
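The combine-or-override step for speaker metadata can be sketched as follows. This is only an illustration under stated assumptions: `merge_metadata`, `build_qa_request`, the dictionary keys, and the request shape are hypothetical, and the inference of demographics from voice characteristics is not shown (the `inferred` dictionary is assumed to come from some classifier).

```python
def merge_metadata(self_reported, inferred, override=True):
    """Combine self-reported demographics with demographics inferred from speech.

    With override=True, inferred values replace self-reported ones on conflict;
    with override=False, inferred values only fill in missing keys.
    """
    merged = dict(self_reported)
    for key, value in inferred.items():
        if override or key not in merged:
            merged[key] = value
    return merged


def build_qa_request(recognized_speech, metadata):
    """Bundle the recognized speech and speaker metadata for a question-answering engine."""
    return {"query": recognized_speech, "speaker": metadata}


self_reported = {"age": "18-24", "region": "US-South"}
inferred = {"age": "35-44"}  # assumed output of a voice-based classifier
request = build_qa_request("best local restaurants", merge_metadata(self_reported, inferred))
```

Here the inferred age group overrides the self-reported one while the self-reported region is kept; passing `override=False` would keep the self-reported age instead.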
Abstract:
A system for managing a data stream that is transmitted to an environment is provided. The system includes a receiver that receives the data stream. The data stream includes a first program that is configured to be displayed in the environment. An input receives information about an individual in the environment. A processor analyzes the information, determines a demographic descriptor of the individual based on the information, and correlates the demographic descriptor of the individual with a content of the first program to determine whether a predetermined condition is satisfied. The processor further determines a second program based on the demographic descriptor of the individual and modifies the first program based on the second program when the predetermined condition is satisfied.
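The processor's decision flow can be sketched roughly as below. The abstract does not specify the predetermined condition or the form of the modification, so this sketch assumes both: the condition is "a child is present and the first program is not rated for all ages", and the modification is a plain replacement by the second program. `Program`, `select_output`, and the rating labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    title: str
    rating: str  # hypothetical content label, e.g. "all" or "adult"


def select_output(first, descriptor, alternatives):
    """Return the program to display for an individual with the given descriptor.

    Assumed predetermined condition: the descriptor is "child" and the first
    program is not rated for all ages. Assumed modification: replace the first
    program with the second program chosen for that descriptor.
    """
    condition_met = descriptor == "child" and first.rating != "all"
    if not condition_met:
        return first
    second = alternatives.get(descriptor)
    return second if second is not None else first


first_program = Program("Late Night Thriller", "adult")
alternatives = {"child": Program("Cartoon Hour", "all")}
shown = select_output(first_program, "child", alternatives)
```

With an adult descriptor, or with a first program rated for all ages, the same call returns the first program unchanged.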
Abstract:
Disclosed herein are systems, methods, and computer-readable storage media for an iterative disambiguation interface. A system practicing the method receives a search query formatted according to a standard XML markup language for containing and annotating interpretations of user input, the search query being based on a natural language spoken query from a user, and retrieves search results based on the search query. The system transmits the search results to a user device, iteratively receives multimodal input from the user to change search attributes, and transmits updated search results to the user device based on the changed search attributes. The search results can include a link to additional information, such as a video presentation, related to the search results. The standard XML markup language can be Extensible MultiModal Annotation (EMMA) markup language from W3C. The system can generate an iteration transaction history for each multimodal input and updated search result.
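The query-then-refine loop can be sketched with Python's standard `xml.etree.ElementTree` module. The EMMA wrapper elements and namespace follow the W3C EMMA 1.0 recommendation, but the payload elements (`genre`, `city`), the in-memory `CATALOG`, and the `parse_query`, `search`, and `refine` helpers are assumptions made for illustration, not the system the abstract describes.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical EMMA document; the application payload (genre, city) is assumed.
SAMPLE_QUERY = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice"
                       emma:tokens="comedy shows in boston">
    <genre>comedy</genre>
    <city>boston</city>
  </emma:interpretation>
</emma:emma>
"""

# Hypothetical catalog standing in for a real search back end.
CATALOG = [
    {"title": "Laugh Night", "genre": "comedy", "city": "boston"},
    {"title": "Second Take", "genre": "comedy", "city": "chicago"},
    {"title": "Harbor Drama", "genre": "drama", "city": "boston"},
]


def parse_query(emma_xml):
    """Extract search attributes from the first emma:interpretation element."""
    root = ET.fromstring(emma_xml.strip())
    interpretation = root.find(f"{{{EMMA_NS}}}interpretation")
    return {child.tag: child.text for child in interpretation}


def search(attrs):
    """Return titles of catalog items matching every search attribute."""
    return [item["title"] for item in CATALOG
            if all(item.get(key) == value for key, value in attrs.items())]


def refine(attrs, history, change):
    """Apply one multimodal change, rerun the search, and log the iteration."""
    updated = {**attrs, **change}
    results = search(updated)
    history.append({"change": change, "results": results})
    return updated, results


history = []
attrs = parse_query(SAMPLE_QUERY)
initial_results = search(attrs)
attrs, updated_results = refine(attrs, history, {"city": "chicago"})
```

Each call to `refine` appends one entry to `history`, which plays the role of the iteration transaction history; in a real deployment the `change` dictionary would come from the user's touch, pen, or speech input on the device.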