Efficient column based data encoding for large-scale data storage
    1.
    Granted invention patent (status: in force)

    Publication number: US08452737B2

    Publication date: 2013-05-28

    Application number: US13347367

    Filing date: 2012-01-10

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
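
    The first layer of reduction named in this abstract, dictionary encoding, can be illustrated with a minimal Python sketch. It assumes each distinct value in a column is given an integer id in first-seen order, so the column becomes an integer sequence plus a lookup table; `dictionary_encode` and `dictionary_decode` are hypothetical names, not taken from the patent.

```python
def dictionary_encode(column):
    """Replace each value with a small integer id (first-seen order).

    Returns (dictionary, ids) such that dictionary[ids[i]] == column[i].
    """
    dictionary = []  # id -> value
    index = {}       # value -> id
    ids = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        ids.append(index[value])
    return dictionary, ids


def dictionary_decode(dictionary, ids):
    """Rebuild the original column from its dictionary and id sequence."""
    return [dictionary[i] for i in ids]


# A tiny row-oriented table, reorganized column by column before encoding.
rows = [("WA", "Seattle"), ("WA", "Redmond"), ("OR", "Portland"), ("WA", "Seattle")]
for column in zip(*rows):
    print(dictionary_encode(column))
# (['WA', 'OR'], [0, 0, 1, 0])
# (['Seattle', 'Redmond', 'Portland'], [0, 1, 2, 0])
```

    Assigning ids in sorted value order instead would let range predicates be evaluated directly on the ids; first-seen order is used here only to keep the sketch short.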

    EXPLAINING CHANGES IN MEASURES THRU DATA MINING
    2.
    Invention patent application (status: in force)

    Publication number: US20090012919A1

    Publication date: 2009-01-08

    Application number: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F15/18

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies for identification of factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space, to detect interesting sub spaces transactions. Subsequently, sub spaces that show strong variance between two slices can be selected, followed by grouping the subspaces in sub reports to measure the coverage for each sub report. A final report can then be generated that contains list of sub-reports detected in the previous acts.
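
    The steps in this abstract (group transactions into categories, keep subspaces that change strongly between two slices, group them into sub-reports with a coverage figure, emit a final report) can be sketched in Python under heavy simplifying assumptions. Here a subspace is a single attribute=value pair, "strong change" means a share of the total change at or above a hypothetical `min_share` threshold, and a sub-report's coverage is the fraction of the total change its kept subspaces account for. None of these definitions come from the patent, and `explain_change` is a hypothetical name.

```python
from collections import defaultdict


def explain_change(transactions, slice_attr, before, after, measure, min_share=0.25):
    """Explain the change in `measure` between two slices (e.g. two years).

    Returns (total_change, report); report maps each attribute to
    (kept subspaces as (value, delta) pairs, coverage of the total change).
    """
    totals = {before: 0.0, after: 0.0}
    sums = defaultdict(lambda: {before: 0.0, after: 0.0})
    # 1. Aggregate the measure per slice and per attribute=value subspace.
    for t in transactions:
        s = t[slice_attr]
        if s not in totals:
            continue  # transaction belongs to neither slice
        totals[s] += t[measure]
        for attr, value in t.items():
            if attr not in (slice_attr, measure):
                sums[(attr, value)][s] += t[measure]
    total_change = totals[after] - totals[before]
    if total_change == 0:
        return 0.0, {}
    # 2. Keep subspaces whose change is a large share of the total change,
    # 3. grouping them into one sub-report per attribute.
    sub_reports = defaultdict(list)
    for (attr, value), by_slice in sums.items():
        delta = by_slice[after] - by_slice[before]
        if abs(delta / total_change) >= min_share:
            sub_reports[attr].append((value, delta))
    # 4. Final report: each sub-report with its coverage, highest coverage first.
    scored = {
        attr: (sorted(items, key=lambda item: -abs(item[1])),
               sum(d for _, d in items) / total_change)
        for attr, items in sub_reports.items()
    }
    return total_change, dict(sorted(scored.items(), key=lambda kv: -abs(kv[1][1])))


sales = [
    {"year": 2006, "region": "East", "product": "A", "amount": 50},
    {"year": 2006, "region": "West", "product": "B", "amount": 50},
    {"year": 2007, "region": "East", "product": "A", "amount": 60},
    {"year": 2007, "region": "West", "product": "B", "amount": 140},
]
total, report = explain_change(sales, "year", 2006, 2007, "amount")
print(total)
for attr, (subspaces, coverage) in report.items():
    print(attr, subspaces, coverage)
```

    In this toy data the West region (and, equivalently, product B) accounts for 90 of the 100-unit increase, so each kept sub-report covers 0.9 of the change, while the East/A subspaces fall below the threshold.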

    Explaining changes in measures thru data mining
    3.
    Granted invention patent (status: in force)

    Publication number: US07899776B2

    Publication date: 2011-03-01

    Application number: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F17/00 G06N5/04

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies for identification of factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space, to detect interesting sub spaces transactions. Subsequently, sub spaces that show strong variance between two slices can be selected, followed by grouping the subspaces in sub reports to measure the coverage for each sub report. A final report can then be generated that contains list of sub-reports detected in the previous acts.

    EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE
    4.
    Invention patent application (status: in force)

    Publication number: US20100030796A1

    Publication date: 2010-02-04

    Application number: US12270873

    Filing date: 2008-11-14

    IPC classification: G06F17/00

    Abstract: The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
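
    Value encoding, the other reduction layer this abstract names, is not defined in the abstract itself. The Python sketch below assumes one plausible form for numeric columns: scale decimals by a common power of ten so they become integers, then subtract the column minimum so the integers start at zero and need fewer bits. `value_encode` and `value_decode` are hypothetical names, and the input is assumed to contain only finite `Decimal` values.

```python
from decimal import Decimal


def value_encode(column):
    """Map finite Decimal values to small non-negative integers.

    Returns (exponent, base, encoded) such that each original value equals
    Decimal(encoded[i] + base).scaleb(-exponent).
    """
    # Smallest power of ten that turns every value into an integer.
    exponent = max(0, max(-v.as_tuple().exponent for v in column))
    scaled = [int(v.scaleb(exponent)) for v in column]
    # Rebase on the column minimum so the integers start at 0.
    base = min(scaled)
    return exponent, base, [s - base for s in scaled]


def value_decode(exponent, base, encoded):
    """Invert value_encode."""
    return [Decimal(e + base).scaleb(-exponent) for e in encoded]


prices = [Decimal("12.50"), Decimal("13.75"), Decimal("12.00")]
exponent, base, encoded = value_encode(prices)
print(exponent, base, encoded)  # 2 1200 [50, 175, 0]
print(max(encoded).bit_length(), "bits per value instead of",
      max(e + base for e in encoded).bit_length())  # 8 bits per value instead of 11
```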

    EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE
    5.
    Invention patent application (status: in force)

    Publication number: US20120109910A1

    Publication date: 2012-05-03

    Application number: US13347367

    Filing date: 2012-01-10

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
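
    The hybrid run length encoding and bit packing step can be approximated in Python by a simplified greedy rule: for each run of equal ids, compare the bits the run would cost bit-packed (run length times bit width) with the bits for a single (id, run length) pair, and take the cheaper option. This is an assumed reading of "an analysis of bit savings", not the patented algorithm; `RUN_LENGTH_BITS`, the segment tuples, and all function names are hypothetical.

```python
from itertools import groupby

RUN_LENGTH_BITS = 16  # hypothetical fixed width of the run-length field


def hybrid_encode(ids):
    """Greedily choose RLE or bit packing for each run of equal ids.

    Returns (width, segments); each segment is ("rle", id, count)
    or ("packed", [ids...]).
    """
    if not ids:
        return 1, []
    width = max(1, max(ids).bit_length())  # bits per bit-packed id
    segments = []
    for value, group in groupby(ids):
        count = sum(1 for _ in group)
        packed_cost = count * width         # bits if the run stays bit-packed
        rle_cost = width + RUN_LENGTH_BITS  # bits for one (id, count) pair
        if rle_cost < packed_cost:
            segments.append(("rle", value, count))
        elif segments and segments[-1][0] == "packed":
            segments[-1][1].extend([value] * count)  # grow the open packed segment
        else:
            segments.append(("packed", [value] * count))
    return width, segments


def hybrid_decode(segments):
    """Expand segments back into the original id sequence."""
    out = []
    for seg in segments:
        if seg[0] == "rle":
            out.extend([seg[1]] * seg[2])
        else:
            out.extend(seg[1])
    return out


def payload_bits(width, segments):
    """Payload size in bits, ignoring per-segment headers."""
    return sum(width + RUN_LENGTH_BITS if s[0] == "rle" else width * len(s[1])
               for s in segments)


ids = [0] * 40 + [1, 2, 3, 1] + [2] * 30
width, segments = hybrid_encode(ids)
print(segments)  # [('rle', 0, 40), ('packed', [1, 2, 3, 1]), ('rle', 2, 30)]
print(payload_bits(width, segments), "bits vs", width * len(ids), "bits fully bit-packed")
```

    Short runs stay in a shared bit-packed segment because a 16-bit run-length field would cost more than it saves; long runs collapse to a single pair.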

    Efficient column based data encoding for large-scale data storage
    6.
    Granted invention patent (status: in force)

    Publication number: US08108361B2

    Publication date: 2012-01-31

    Application number: US12270873

    Filing date: 2008-11-14

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
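
    This abstract attributes part of the benefit to faster scanning and querying over the compact representation. As a hedged illustration, assume a hypothetical layout in which a dictionary-encoded column is stored as `("rle", id, run_length)` and `("packed", [ids...])` segments. A per-value row count (a GROUP BY with COUNT) can then be computed in Python without expanding the runs; `count_by_value` is a hypothetical name.

```python
from collections import Counter


def count_by_value(segments, dictionary):
    """GROUP BY value, COUNT(*) evaluated directly on encoded segments.

    segments: ("rle", id, run_length) or ("packed", [ids...]) tuples.
    dictionary: list mapping ids back to the original column values.
    """
    counts = Counter()
    for seg in segments:
        if seg[0] == "rle":
            _, value_id, run_length = seg
            counts[value_id] += run_length  # one addition covers the whole run
        else:
            counts.update(seg[1])           # packed ids are counted one by one
    # Translate ids through the dictionary only for the final result.
    return {dictionary[i]: n for i, n in counts.items()}


segments = [("rle", 0, 40), ("packed", [1, 2, 3, 1]), ("rle", 2, 30)]
print(count_by_value(segments, ["WA", "OR", "CA", "TX"]))
# {'WA': 40, 'OR': 2, 'CA': 31, 'TX': 1}
```

    The 74 encoded rows are counted with six updates (one per run plus one per packed id), which is the kind of scan saving the abstract alludes to.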

    Multidimensional data cubes with high-cardinality attributes
    10.
    Granted invention patent (status: in force)

    Publication number: US08380748B2

    Publication date: 2013-02-19

    Application number: US12042674

    Filing date: 2008-03-05

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30592

    Abstract: Computer-readable media, systems, and methods for building a multidimensional data cube having one or more high-cardinality attributes are described. In embodiments, data is extracted from one or more databases. It is determined that one or more instances of the data are fact data and one or more instances of the data are dimension data. Each member of the fact data is one instance of a dimension and each instance of the dimension data includes an attribute for grouping the fact data. Moreover, in embodiments it is determined that one or more instances of the dimension data are high-cardinality attributes. The one or more high-cardinality attributes are processed with fact data and stored in fact tables on a computer storage medium.
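
    The abstract does not say how an attribute is judged high-cardinality or how it is processed with the fact data. A minimal Python sketch, assuming a simple distinct-value threshold: attributes above the threshold are copied onto the fact rows that reference them and dropped from the dimension. The `threshold` parameter and all function names are hypothetical.

```python
def high_cardinality_attributes(dimension_rows, key, threshold):
    """Names of dimension attributes with more than `threshold` distinct values."""
    attrs = {a for row in dimension_rows for a in row if a != key}
    return sorted(a for a in attrs
                  if len({row.get(a) for row in dimension_rows}) > threshold)


def move_to_fact_table(fact_rows, dimension_rows, key, attrs):
    """Copy `attrs` onto each fact row via the dimension key and drop them
    from the dimension, so they are stored with the fact data."""
    lookup = {row[key]: row for row in dimension_rows}
    facts = [{**f, **{a: lookup[f[key]][a] for a in attrs}} for f in fact_rows]
    dimension = [{k: v for k, v in row.items() if k not in attrs}
                 for row in dimension_rows]
    return facts, dimension


customers = [
    {"customer_id": 1, "gender": "F", "email": "a@example.com"},
    {"customer_id": 2, "gender": "M", "email": "b@example.com"},
    {"customer_id": 3, "gender": "F", "email": "c@example.com"},
    {"customer_id": 4, "gender": "M", "email": "d@example.com"},
]
sales = [{"customer_id": 1, "amount": 10}, {"customer_id": 3, "amount": 5}]

high = high_cardinality_attributes(customers, "customer_id", threshold=3)
facts, dimension = move_to_fact_table(sales, customers, "customer_id", high)
print(high)          # ['email']
print(facts[0])      # {'customer_id': 1, 'amount': 10, 'email': 'a@example.com'}
print(dimension[0])  # {'customer_id': 1, 'gender': 'F'}
```

    The low-cardinality `gender` attribute stays in the dimension, where it remains useful for grouping, while the nearly unique `email` attribute travels with the fact rows.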
