ACCURACY IMPROVEMENT OF SPOKEN QUERIES TRANSCRIPTION USING CO-OCCURRENCE INFORMATION
    2.
    Invention application
    Status: In force

    Publication number: US20140136197A1

    Publication date: 2014-05-15

    Application number: US14156788

    Filing date: 2014-01-16

    IPC classification: G10L15/08 G10L15/26

    Abstract: Techniques disclosed herein include systems and methods for voice-enabled searching. Techniques include a co-occurrence-based approach to improve accuracy of the 1-best hypothesis for non-phrase voice queries, as well as for phrased voice queries. A co-occurrence model is used in addition to a statistical natural language model and an acoustic model to recognize spoken queries, such as spoken queries submitted to a search engine. Given an utterance and an associated list of automated speech recognition n-best hypotheses, the system rescores the different hypotheses using co-occurrence information. For each hypothesis, the system estimates a frequency of co-occurrence within web documents. Scores from a speech recognizer and a co-occurrence engine can be combined to select a best hypothesis with a lower word error rate.
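
The rescoring step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the function names, the toy document list, the add-one smoothing, and the linear interpolation weight are all hypothetical choices made for the example.

```python
import math

def cooccurrence_log_freq(hypothesis, documents):
    """Smoothed log fraction of documents containing every word of the hypothesis."""
    words = set(hypothesis.lower().split())
    hits = sum(1 for doc in documents if words <= set(doc.lower().split()))
    # Add-one smoothing (an assumption) keeps the log finite when nothing matches.
    return math.log((hits + 1) / (len(documents) + 1))

def rescore(nbest, documents, weight=0.5):
    """Return (text, combined_score) pairs sorted from best to worst.

    nbest holds (text, asr_log_score) pairs from a speech recognizer.
    weight is a hypothetical interpolation factor between the two scores.
    """
    scored = [
        (text, (1 - weight) * asr_score
               + weight * cooccurrence_log_freq(text, documents))
        for text, asr_score in nbest
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-in for a web document collection (hypothetical data).
documents = [
    "cheap flights to boston",
    "boston red sox tickets",
    "flights from boston to denver",
    "austin city limits",
]

# The recognizer alone slightly prefers the first hypothesis.
nbest = [("flights to austin", -1.0), ("flights to boston", -1.2)]

best_text, best_score = rescore(nbest, documents)[0]
```

With the recognizer score alone (`weight=0.0`) the first hypothesis stays on top; once co-occurrence evidence is mixed in, the hypothesis whose words appear together in more documents is selected.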

    Topic specific language models built from large numbers of documents
    3.
    Granted invention patent
    Status: Lapsed

    Publication number: US07739286B2

    Publication date: 2010-06-15

    Application number: US11384226

    Filing date: 2006-03-17

    Abstract: Forming and/or improving a language model based on data from a large collection of documents, such as web data. The collection of documents is queried using queries that are formed from the language model. The language model is subsequently improved using the information thus obtained, and the improved model is in turn used to improve the query. As data is received from the collection of documents, it is compared to a rejection model that captures what rejected documents typically look like. Any document that meets this rejection test is then rejected. The documents that remain are characterized to determine whether they add information to the language model, whether they are relevant, and whether they should be independently rejected. Rejected documents are used to update the rejection model; accepted documents are used to update the language model. Each iteration improves the language model, and the documents may be analyzed again using the improved language model.
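
The query, reject, and update loop described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: both models are plain unigram counts, retrieval is a toy word-overlap search, the rejection test compares vocabulary overlap, and every name and data item is hypothetical. Unlike the abstract, the sketch does not re-analyze documents it has already seen.

```python
from collections import Counter

def tokens(text):
    return text.lower().split()

def form_query(lm, k=3):
    """Use the k most frequent words of the language model as the query."""
    return [word for word, _ in lm.most_common(k)]

def search(collection, query):
    """Toy retrieval: indices of documents sharing at least one query word."""
    wanted = set(query)
    return [i for i, doc in enumerate(collection) if wanted & set(tokens(doc))]

def overlap(model, words):
    """Fraction of a document's words that the model has already seen."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in model) / len(words)

def build_topic_model(seed_text, reject_seed, collection, iterations=3):
    lm = Counter(tokens(seed_text))           # topic language model
    rejection = Counter(tokens(reject_seed))  # model of rejected documents
    seen = set()
    for _ in range(iterations):
        # The query is re-formed from the current model on every pass.
        for i in search(collection, form_query(lm)):
            if i in seen:
                continue
            seen.add(i)
            words = tokens(collection[i])
            if overlap(rejection, words) > overlap(lm, words):
                rejection.update(words)  # rejected documents refine the rejection model
            else:
                lm.update(words)         # accepted documents refine the language model
    return lm, rejection

# Hypothetical toy collection standing in for web data.
collection = [
    "speech recognition uses an acoustic model and a language model",
    "buy a cheap speech recognition toy on sale discount",
    "the language model predicts the next word",
    "cooking pasta with tomato sauce",
]

lm, rejection = build_topic_model(
    "speech recognition acoustic model", "buy cheap discount sale", collection
)
```

In this toy run the third document is only retrieved on the second pass, after the first accepted document has raised the count of "model" enough to change the query, which illustrates how each iteration can widen what the language model reaches.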

    Topic specific language models built from large numbers of documents
    4.
    Invention application
    Status: Lapsed

    Publication number: US20060212288A1

    Publication date: 2006-09-21

    Application number: US11384226

    Filing date: 2006-03-17

    IPC classification: G06F17/21

    Abstract: Forming and/or improving a language model based on data from a large collection of documents, such as web data. The collection of documents is queried using queries that are formed from the language model. The language model is subsequently improved using the information thus obtained, and the improved model is in turn used to improve the query. As data is received from the collection of documents, it is compared to a rejection model that captures what rejected documents typically look like. Any document that meets this rejection test is then rejected. The documents that remain are characterized to determine whether they add information to the language model, whether they are relevant, and whether they should be independently rejected. Rejected documents are used to update the rejection model; accepted documents are used to update the language model. Each iteration improves the language model, and the documents may be analyzed again using the improved language model.