-
Publication No.: US11328121B2
Publication date: 2022-05-10
Application No.: US15670246
Filing date: 2017-08-07
IPC classification: G06F17/28, G06F40/216, G06F40/40, G06F40/10, G06F40/205, G06F40/242, G06F40/279, G10L15/06
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
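
The idea of treating high-perplexity documents as novelty regions can be illustrated with a small sketch. It is not the patented method: it assumes a unigram model with add-one smoothing as a stand-in for the "current language model", and the function names (`train_unigram`, `perplexity`, `rank_by_novelty`) are hypothetical.

```python
import math
from collections import Counter

def train_unigram(docs):
    """Count whitespace tokens over documents from earlier crawl cycles."""
    counts = Counter(tok for d in docs for tok in d.lower().split())
    return counts, sum(counts.values())

def perplexity(doc, counts, total):
    """Perplexity of a document under an add-one smoothed unigram model."""
    tokens = doc.lower().split()
    if not tokens:
        return 0.0
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))

def rank_by_novelty(candidates, counts, total):
    """Order candidate documents so the highest-perplexity ones come first."""
    return sorted(candidates, key=lambda d: perplexity(d, counts, total), reverse=True)

# Assumed toy data: documents already crawled, then two candidates to visit.
seen = ["the cat sat on the mat", "the dog sat on the rug"]
counts, total = train_unigram(seen)
queue = rank_by_novelty(["the cat sat", "quantum flux capacitor"], counts, total)
```

A candidate made up entirely of unseen words scores highest, so a crawler following this ordering would visit it first and use it to fill vocabulary gaps in the next cycle's model.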
-
Publication No.: US09741338B2
Publication date: 2017-08-22
Application No.: US14963479
Filing date: 2015-12-09
Inventors: Srinivas Bangalore
IPC classification: G10L17/22, G10L15/07, G10L15/18, G10L15/197, G10L25/84, G10L15/20, G10L15/25, G10L15/22
CPC classification: G10L15/075, G10L15/1822, G10L15/197, G10L15/20, G10L15/25, G10L17/22, G10L25/84, G10L2015/223, G10L2015/228
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternately, the system can use a task model to compare the current user utterance to the conversation context.
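
The accept-or-discard decision described in this abstract can be sketched as follows. The abstract names n-gram distributions and perplexity as possible bases for the score; this sketch swaps in cosine similarity over unigram counts purely for brevity. The class name, the default threshold, and the rule that the first utterance seeds an empty context are all assumptions.

```python
import math
from collections import Counter

class ConversationContext:
    """Hypothetical context that keeps only utterances similar to what it holds."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # assumed value, not taken from the patent
        self.counts = Counter()

    def similarity(self, utterance):
        """Cosine similarity between the utterance and the accumulated context."""
        u = Counter(utterance.lower().split())
        dot = sum(u[w] * self.counts[w] for w in u)
        norm = (math.sqrt(sum(v * v for v in u.values()))
                * math.sqrt(sum(v * v for v in self.counts.values())))
        return dot / norm if norm else 0.0

    def offer(self, utterance):
        """Incorporate the utterance if it scores above the threshold, else discard it."""
        # Assumption: an empty context accepts the first utterance unconditionally.
        if not self.counts or self.similarity(utterance) > self.threshold:
            self.counts.update(utterance.lower().split())
            return True
        return False

ctx = ConversationContext()
first = ctx.offer("book a flight to boston")       # seeds the context
second = ctx.offer("a flight to denver please")    # shares words with the context
third = ctx.offer("did you feed the dog")          # shares no words with the context
```

No wake word or button press is involved: each utterance is judged only by how well it fits the ongoing conversation.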
-
Publication No.: US09721558B2
Publication date: 2017-08-01
Application No.: US14965251
Filing date: 2015-12-10
Inventors: Srinivas Bangalore, Junlan Feng, Mazin Gilbert, Juergen Schroeter, Ann K. Syrdal, David Schulz
IPC classification: G10L13/06, G10L13/08, G10L13/033, G10L13/02, G10L15/197, G10L13/00
CPC classification: G10L13/033, G10L13/00, G10L13/02, G10L13/06, G10L13/08, G10L15/197
Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
-
Publication No.: US20200058318A9
Publication date: 2020-02-20
Application No.: US15911678
Filing date: 2018-03-05
IPC classification: G10L25/51, G10L15/05, G10L17/04, G10L15/30, G10L15/19, G06F3/16, G10L15/18, G10L15/07, G10L15/22, G10L15/183
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
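
The time-stamp bookkeeping in this abstract (add words captured from the screen, drop them later based on their time stamps) can be sketched as below. This is only an illustration under assumptions: the class `DynamicVocabulary`, the fixed maximum age, and whitespace tokenization are hypothetical, and a real dynamic language model would adjust probabilities rather than keep a bare word set.

```python
class DynamicVocabulary:
    """Hypothetical store of screen-captured words, each with a time stamp."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds  # assumed expiry rule: fixed age limit
        self.words = {}                 # word -> time stamp of last capture

    def add_screen_text(self, text, now):
        """Record every word scraped from the display at time `now`."""
        for w in text.lower().split():
            self.words[w] = now

    def expire(self, now):
        """Remove words whose time stamp is older than the age limit."""
        self.words = {w: t for w, t in self.words.items()
                      if now - t <= self.max_age}

    def active_words(self, now):
        """Words that should currently bias recognition, in sorted order."""
        self.expire(now)
        return sorted(self.words)

vocab = DynamicVocabulary(max_age_seconds=30)
vocab.add_screen_text("Account overdue", now=0)
vocab.add_screen_text("refund pending", now=20)
early = vocab.active_words(now=25)
late = vocab.active_words(now=40)
```

At 25 seconds all four words are still active; by 40 seconds the words captured at time 0 have aged out and only the later capture remains.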
-
Publication No.: US10546595B2
Publication date: 2020-01-28
Application No.: US15911678
Filing date: 2018-03-05
IPC classification: G10L15/00, G10L25/51, G10L15/19, G10L17/04, G10L15/18, G06F3/16, G10L15/05, G10L15/07, G10L15/30, G10L15/183, G10L15/22
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
-
Publication No.: US20180197566A1
Publication date: 2018-07-12
Application No.: US15911678
Filing date: 2018-03-05
IPC classification: G10L25/51, G10L15/05, G10L17/04, G10L15/30, G10L15/19, G06F3/16, G10L15/18, G10L15/07, G10L15/22, G10L15/183
CPC classification: G10L25/51, G06F3/162, G10L15/05, G10L15/07, G10L15/18, G10L15/183, G10L15/19, G10L15/30, G10L17/04, G10L2015/228
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
-
Publication No.: US09792904B2
Publication date: 2017-10-17
Application No.: US14338602
Filing date: 2014-07-23
IPC classification: G06F17/20, G06F17/27, G06F17/21, G10L15/28, G10L15/18, G10L15/14, G10L15/183, G10L15/19, G06F17/28
CPC classification: G10L15/183, G06F17/2818, G10L15/14, G10L15/19
Abstract: Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
-
Publication No.: US10726833B2
Publication date: 2020-07-28
Application No.: US15985107
Filing date: 2018-05-21
Inventors: Srinivas Bangalore, Robert Bell, Diamantino Antonio Caseiro, Mazin Gilbert, Patrick Haffner
IPC classification: G10L15/183, G10L15/06, G10L15/22, G10L15/065, G10L15/30
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
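
One common way to combine existing models and tune the result on a small amount of in-domain data is linear interpolation with a weight chosen to maximize the likelihood of that data. The abstract does not say which combination or tuning technique is used, so the sketch below is an assumption throughout: the models are toy unigram word distributions, and the names `interpolate`, `log_likelihood`, and `tune_weight` are hypothetical.

```python
import math

def interpolate(models, weights, word, floor=1e-6):
    """Weighted mix of per-domain word probabilities; `floor` covers unseen words."""
    return sum(w * m.get(word, floor) for m, w in zip(models, weights))

def log_likelihood(models, weights, tokens):
    """Log-likelihood of the in-domain tokens under the interpolated model."""
    return sum(math.log(interpolate(models, weights, t)) for t in tokens)

def tune_weight(model_a, model_b, tokens, steps=10):
    """Grid-search the mixing weight for model_a on a small in-domain sample."""
    grid = (i / steps for i in range(steps + 1))
    return max(grid, key=lambda lam: log_likelihood(
        [model_a, model_b], [lam, 1 - lam], tokens))

# Assumed toy models from two existing domains, plus a tiny in-domain sample.
travel = {"flight": 0.5, "hotel": 0.5}
banking = {"account": 0.5, "loan": 0.5}
sample = ["flight", "flight", "account"]
best = tune_weight(travel, banking, sample)
```

Only one parameter is fitted here, which is why a sample far too small to train a new domain model can still be enough to tune the combination.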
-
Publication No.: US10403290B2
Publication date: 2019-09-03
Application No.: US15681644
Filing date: 2017-08-21
Inventors: Srinivas Bangalore
IPC classification: G10L17/22, G10L15/18, G10L15/197, G10L25/84, G10L15/07, G10L15/20, G10L15/25, G10L15/22
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternately, the system can use a task model to compare the current user utterance to the conversation context.
-
Publication No.: US09978363B2
Publication date: 2018-05-22
Application No.: US15620461
Filing date: 2017-06-12
Inventors: Srinivas Bangalore, Robert Bell, Diamantino Antonio Caseiro, Mazin Gilbert, Patrick Haffner
IPC classification: G10L15/183, G10L15/22, G10L15/30, G10L15/065, G10L15/06
CPC classification: G10L15/183, G10L15/06, G10L15/065, G10L15/22, G10L15/30, G10L2015/0635, G10L2015/0636, G10L2015/228
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.