Efficient column based data encoding for large-scale data storage
    21.
    Granted invention patent
    Legal status: in force

    Publication No.: US08452737B2

    Publication date: 2013-05-28

    Application No.: US13347367

    Filing date: 2012-01-10

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
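
The layered scheme in the abstract can be illustrated with a minimal sketch. It is not the patented algorithm: the function names, the 32-bit run-count size, and the simple per-run cost comparison (standing in for the "analysis of bit savings") are all assumptions made for illustration.

```python
def dictionary_encode(column):
    """Map each distinct value to an integer ID, in first-seen order."""
    ids = {}
    encoded = [ids.setdefault(value, len(ids)) for value in column]
    return encoded, list(ids)


def runs(seq):
    """Collapse a sequence into [value, run_length] pairs."""
    out = []
    for value in seq:
        if out and out[-1][0] == value:
            out[-1][1] += 1
        else:
            out.append([value, 1])
    return out


def hybrid_encode(encoded, count_bits=32):
    """Pick RLE or bit packing for each run by comparing bit costs.

    count_bits is a hypothetical fixed size for a stored run length.
    """
    width = max(1, max(encoded).bit_length()) if encoded else 1
    segments = []
    for value, length in runs(encoded):
        rle_cost = width + count_bits      # one value plus one run count
        packed_cost = width * length       # every value stored at `width` bits
        if rle_cost < packed_cost:
            segments.append(("rle", value, length))
        elif segments and segments[-1][0] == "packed":
            segments[-1][1].extend([value] * length)
        else:
            segments.append(("packed", [value] * length))
    return width, segments


def hybrid_decode(segments, dictionary):
    """Expand segments back into the original column values."""
    out = []
    for segment in segments:
        ids = [segment[1]] * segment[2] if segment[0] == "rle" else segment[1]
        out.extend(dictionary[i] for i in ids)
    return out


column = ["US"] * 100 + ["CA", "MX", "CA", "US"]
encoded, dictionary = dictionary_encode(column)
width, segments = hybrid_encode(encoded)
```

In this example the long run of one value is stored as a single run-length segment, while the short mixed tail is cheaper to bit-pack at `width` bits per value.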

    EXPLAINING CHANGES IN MEASURES THRU DATA MINING
    22.
    Invention patent application
    Legal status: in force

    Publication No.: US20090012919A1

    Publication date: 2009-01-08

    Application No.: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F15/18

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies for identification of factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space, to detect interesting sub spaces transactions. Subsequently, sub spaces that show strong variance between two slices can be selected, followed by grouping the subspaces in sub reports to measure the coverage for each sub report. A final report can then be generated that contains list of sub-reports detected in the previous acts.
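
One loose reading of the steps in the abstract is sketched below; it is not the method claimed in the filing. The single-attribute definition of a subspace, the use of a summed measure, and the names `explain_change`, `dimensions`, and `measure` are assumptions made for illustration.

```python
from collections import defaultdict


def explain_change(slice_a, slice_b, dimensions, measure, top=3):
    """Rank single-attribute subspaces by how much the measure moved.

    Each slice is a list of dict rows. "Coverage" here is the share of
    the overall change that a subspace accounts for (an assumed definition).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for index, rows in enumerate((slice_a, slice_b)):
        for row in rows:
            for dim in dimensions:
                totals[(dim, row[dim])][index] += row[measure]

    overall = (sum(row[measure] for row in slice_b)
               - sum(row[measure] for row in slice_a))

    report = []
    for subspace, (before, after) in totals.items():
        delta = after - before
        coverage = delta / overall if overall else 0.0
        report.append((subspace, delta, coverage))

    # Largest absolute shifts first; ties keep first-seen order.
    report.sort(key=lambda item: abs(item[1]), reverse=True)
    return report[:top]


last_year = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "West", "product": "B", "sales": 100},
]
this_year = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "West", "product": "B", "sales": 40},
]
report = explain_change(last_year, this_year, ["region", "product"], "sales", top=2)
```

With these two toy slices, the subspaces tied to the West region and product B account for the entire drop, so they head the report.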

    Explaining changes in measures thru data mining
    23.
    Granted invention patent
    Legal status: in force

    Publication No.: US07899776B2

    Publication date: 2011-03-01

    Application No.: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F17/00 G06N5/04

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies for identification of factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space, to detect interesting sub spaces transactions. Subsequently, sub spaces that show strong variance between two slices can be selected, followed by grouping the subspaces in sub reports to measure the coverage for each sub report. A final report can then be generated that contains list of sub-reports detected in the previous acts.
