Patent Application
- Title: EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE
- Title (translated from Chinese): Large-scale data storage based on efficient data encoding
- Application number: US12270873; Filing date: 2008-11-14
- Publication number: US20100030796A1; Publication date: 2010-02-04
- Inventors: Amir Netz, Cristian Petculescu, Ioan Bogdan Crivat
- Applicants: Amir Netz, Cristian Petculescu, Ioan Bogdan Crivat
- Applicant address: US WA Redmond
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: US WA Redmond
- Main classification: G06F17/00
- IPC classification: G06F17/00
Abstract:
The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
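The pipeline the abstract describes (organize by column, dictionary encode into integers, then choose between run length encoding and bit packing based on bit savings) can be illustrated with a minimal Python sketch. This is not the patented algorithm: the function names, the segment layout, and the `run_overhead_bits` threshold are all hypothetical, and the bit-savings test here is a simplified per-run comparison rather than the analysis the disclosure performs.

```python
def dictionary_encode(column):
    """Map each distinct value to an integer ID in order of first appearance."""
    dictionary = {}
    ids = [dictionary.setdefault(value, len(dictionary)) for value in column]
    # list(dictionary) yields keys in insertion order, so index == ID.
    return ids, list(dictionary)


def hybrid_encode(ids, run_overhead_bits=32):
    """Greedy per-run choice between RLE and bit packing.

    A run is stored as ("run", value, length) only when bit packing it
    would cost more than the assumed fixed overhead of a run entry
    (run_overhead_bits, a hypothetical figure). Everything else is
    collected into ("packed", [values]) segments, whose values would be
    stored at `width` bits each.
    """
    width = max(1, max(ids, default=0).bit_length())  # bits per packed value
    segments = []
    i = 0
    while i < len(ids):
        j = i
        while j < len(ids) and ids[j] == ids[i]:
            j += 1
        length = j - i
        if length * width > run_overhead_bits:
            segments.append(("run", ids[i], length))
        elif segments and segments[-1][0] == "packed":
            segments[-1][1].extend(ids[i:j])  # merge with previous packed segment
        else:
            segments.append(("packed", ids[i:j]))
        i = j
    return width, segments


def hybrid_decode(segments):
    """Expand segments back into the original integer sequence."""
    ids = []
    for segment in segments:
        if segment[0] == "run":
            ids.extend([segment[1]] * segment[2])
        else:
            ids.extend(segment[1])
    return ids


# One column of a table, stored on its own rather than row by row.
state_column = ["WA"] * 40 + ["OR", "CA", "WA", "CA"] + ["CA"] * 50
ids, dictionary = dictionary_encode(state_column)
width, segments = hybrid_encode(ids)
restored = [dictionary[i] for i in hybrid_decode(segments)]
```

In this sketch, long runs of a repeated value collapse to a single entry, while short, mixed stretches fall back to fixed-width packing, so neither technique is applied where it would cost more bits than it saves.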