Representing n-gram language models for compact storage and fast retrieval

Granted invention patent

US08175878B1 Representing n-gram language models for compact storage and fast retrieval (in force)



Patent title: Representing n-gram language models for compact storage and fast retrieval
Application number: US12968108

Filing date: 2010-12-14
Publication (announcement) number: US08175878B1

Publication (announcement) date: 2012-05-08
Inventors: Ciprian Chelba, Thorsten Brants
Applicants: Ciprian Chelba, Thorsten Brants
Applicant address: Mountain View, CA, US
Patent holder: Google Inc.
Current patent holder: Google Inc.
Current patent holder address: Mountain View, CA, US
Agent: Harness, Dickey & Pierce, P.L.C.
Main classification: G10L15/18
IPC classification: G10L15/18; G10L15/06; G06F17/27


Abstract:

Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, which includes receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
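
As an illustration only, the two steps named in the abstract (building a trie from n-grams with probabilities, then scoring a string of words) can be sketched as follows. This is not the patented encoding: the class `NGramTrie`, the helpers `build_trie` and `string_log_prob`, the toy probabilities, and the reading of each stored value as the probability of the last word given the preceding words are all assumptions made for the example. The sketch uses plain dictionaries rather than a compact representation, and it shortens the context when an n-gram is missing without applying any back-off weights.

```python
import math


class NGramTrie:
    """Toy word-level trie; a node may hold the probability of the n-gram ending there."""

    def __init__(self):
        self.children = {}  # word -> NGramTrie
        self.prob = None    # assumed: P(last word | preceding words)

    def insert(self, ngram, prob):
        node = self
        for word in ngram:
            node = node.children.setdefault(word, NGramTrie())
        node.prob = prob

    def lookup(self, ngram):
        """Return the stored probability, or None if the n-gram is absent."""
        node = self
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.prob


def build_trie(ngram_probs):
    """Build a trie from a mapping of word tuples to probabilities."""
    trie = NGramTrie()
    for ngram, prob in ngram_probs.items():
        trie.insert(ngram, prob)
    return trie


def string_log_prob(trie, words, order):
    """Sum per-word log-probabilities, using the longest stored context
    of at most order - 1 words (simplification: no back-off weights)."""
    total = 0.0
    for i in range(len(words)):
        start = max(0, i - order + 1)
        prob = None
        while start <= i and prob is None:
            prob = trie.lookup(tuple(words[start:i + 1]))
            start += 1
        if prob is None:
            return float("-inf")  # word unseen even as a unigram
        total += math.log(prob)
    return total


# Hypothetical toy data, not taken from the patent.
ngram_probs = {
    ("the",): 0.5,
    ("cat",): 0.25,
    ("the", "cat"): 0.5,
}
trie = build_trie(ngram_probs)
score = string_log_prob(trie, ["the", "cat"], order=2)
```

Here `score` is the log of the product of the value stored for ("the",) and the value stored for ("the", "cat"). A production model would normally add back-off weights and a far more compact node layout.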



IPC classification:

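G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/18	..using natural language modelling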