Granted invention patent
- Title: Back-off language model compression
- Title (Chinese): 后退语言模型压缩
- Application number: US12486358
- Filing date: 2009-06-17
- Publication number: US08725509B1
- Publication date: 2014-05-13
- Inventors: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat
- Applicants: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat
- Applicant address: US CA Mountain View
- Assignee: Google Inc.
- Current assignee: Google Inc.
- Current assignee address: US CA Mountain View
- Agent: Remarck Law Group PLC
- Main classification: G10L15/00
- IPC classifications: G10L15/00; G10L15/06; G10L15/28; G06F17/21
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.
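The abstract names two ideas: a trie of n-grams stored as integer arrays, and block encoding of those arrays. It does not spell out the array layout or the encoding, so the following is only an illustrative sketch under assumptions, not the patented method. It assumes n-grams are tuples of integer word ids, that each trie level stores a word-id array plus a non-decreasing child-offset array, and that "block encoding" means fixed-size blocks holding an anchor value followed by varint-coded deltas. All function names are hypothetical, and probabilities are omitted for brevity.

```python
from typing import List, Sequence, Tuple

def build_trie_arrays(ngrams: Sequence[Tuple[int, ...]]):
    """Flatten n-grams into per-level integer arrays (assumed layout)."""
    prefixes = {g[:k] for g in ngrams for k in range(1, len(g) + 1)}
    max_len = max(len(g) for g in ngrams)
    # Sorting each level keeps the children of one parent contiguous.
    levels = [sorted(p for p in prefixes if len(p) == k)
              for k in range(1, max_len + 1)]
    words = [[p[-1] for p in level] for level in levels]
    child_offsets = []
    for parents, children in zip(levels, levels[1:]):
        index = {p: i for i, p in enumerate(parents)}
        counts = [0] * len(parents)
        for c in children:
            counts[index[c[:-1]]] += 1
        offsets = [0]  # offsets[i]..offsets[i+1] is the child range of node i
        for n in counts:
            offsets.append(offsets[-1] + n)
        child_offsets.append(offsets)
    return words, child_offsets

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varints(data: bytes) -> List[int]:
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values

def block_encode(values: Sequence[int], block_size: int):
    """Split a non-decreasing array into (anchor, delta-bytes) blocks."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        deltas = b"".join(encode_varint(b - a) for a, b in zip(chunk, chunk[1:]))
        blocks.append((chunk[0], deltas))
    return blocks

def block_lookup(blocks, block_size: int, index: int) -> int:
    """Random access: jump to the block, then decode only within it."""
    anchor, payload = blocks[index // block_size]
    return anchor + sum(decode_varints(payload)[: index % block_size])

# Hypothetical word ids: 1, 2, 3.
ngrams = [(1,), (2,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
words, child_offsets = build_trie_arrays(ngrams)
blocks = block_encode(child_offsets[0], block_size=2)
```

In this sketch, a lookup touches only one block, so the block size trades compression against per-access decoding cost.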