Granted invention patent
- Patent title: Document compression system and method for use with tokenspace repository
- Patent title (Chinese): 文档压缩系统和方法用于托管存储库
- Application number: US10917739
- Filing date: 2004-08-13
- Publication number: US07917480B2
- Publication date: 2011-03-29
- Inventors: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu
- Applicants: Jeffrey Dean, Gautham K. Thambidorai, Sanjay Ghemawat, Benedict Anthony Gomes, Olcan Sercinoglu
- Applicant address: US CA Mountain View
- Assignee: Google Inc.
- Current assignee: Google Inc.
- Current assignee address: US CA Mountain View
- Agent: Morgan, Lewis & Bockius LLP
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/00; G06F15/18
Abstract:
The disclosed embodiments enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. The mapping scheme includes a first mapping between unique tokens contained in a set of documents and unique global token identifiers (e.g., 32-bit integers) contained in a global-lexicon (i.e., dictionary). The mapping scheme also includes a second mapping between the global token identifiers and a set of fixed-length local token identifiers (e.g., 8-bit integers) contained in one or more mini-lexicons (i.e., sub-dictionaries). Each mini-lexicon is associated with a range of token positions in the tokenized documents. The first and second mappings are used to encode/decode documents into local token identifiers having fixed widths which can be compactly stored in the tokenspace repository. The use of fixed-length local token identifiers allows for fast and efficient decoding of tokenized documents.
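The two-tier mapping described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names (`build_global_lexicon`, `encode`, `decode`), the data layout, and the rule that a new mini-lexicon starts once the current one has used all of its local identifiers are assumptions made for illustration only.

```python
import bisect

def build_global_lexicon(documents):
    """First mapping: each unique token -> a unique global token ID."""
    lexicon = {}
    for doc in documents:
        for token in doc:
            if token not in lexicon:
                lexicon[token] = len(lexicon)
    return lexicon

def encode(global_ids, max_local=256):
    """Second mapping: global IDs -> fixed-width (one byte) local IDs.

    Returns (mini_lexicons, encoded). Each mini-lexicon is a pair
    (start_position, table) where table[local_id] == global_id, and it
    covers token positions from start_position up to the next start.
    Assumption: a new mini-lexicon begins when the current one is full.
    """
    if not 1 <= max_local <= 256:
        raise ValueError("local IDs must fit in one byte")
    mini_lexicons = []
    encoded = bytearray()
    local_of = {}  # global ID -> local ID within the current mini-lexicon
    for pos, gid in enumerate(global_ids):
        if gid not in local_of:
            if not mini_lexicons or len(local_of) == max_local:
                mini_lexicons.append((pos, []))
                local_of = {}
            local_of[gid] = len(local_of)
            mini_lexicons[-1][1].append(gid)
        encoded.append(local_of[gid])
    return mini_lexicons, bytes(encoded)

def decode(mini_lexicons, encoded, id_to_token, start, end):
    """Reconstruct tokens at positions [start, end) only."""
    starts = [s for s, _ in mini_lexicons]
    tokens = []
    for pos in range(start, end):
        # Find the mini-lexicon whose position range contains pos.
        table = mini_lexicons[bisect.bisect_right(starts, pos) - 1][1]
        tokens.append(id_to_token[table[encoded[pos]]])
    return tokens
```

Because every local identifier occupies exactly one byte, the token at position `p` sits at byte offset `p`, so a range of positions (for example, the text around a query hit when building a snippet) can be decoded without scanning the documents that precede it.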