Using a rowset as a query parameter
    4.
    Granted patent
    Legal status: In force

    Publication No.: US07451137B2

    Publication date: 2008-11-11

    Application No.: US11069121

    Filing date: 2005-02-28

    IPC classification: G06F17/30

    Abstract: Architecture that facilitates syntax processing for data mining statements. The system includes a syntax engine that receives as input a query statement, for example a data mining request. The statement can be generated from many different sources, e.g., a client application and/or a server application, and requests query processing of a data source (e.g., a relational database) to return a result set. The syntax engine includes a binding component that converts the query statement into an encapsulated statement in accordance with a predefined grammar. The encapsulated statement, which is understood by the data source, includes both data and data operations to be performed on the data of the data source. An execution component processes the encapsulated statement against the data source to return the desired result set.
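
    The abstract describes a binding step that packages data and operations together, followed by an execution step against the data source. The sketch below is one way to picture that flow, not the patented method: the `@ids` parameter syntax, the `EncapsulatedStatement` class, the `bind` and `execute` helpers, and the use of an in-memory SQLite database with a temporary table are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class EncapsulatedStatement:
    """Carries the rowset data together with the operation to run on it."""
    setup: list   # (sql, rows) pairs that materialize the rowset
    query: str    # the query, rewritten to reference the materialized rowset


def bind(query, param, columns, rows):
    """Hypothetical binding step: replace '@param' with a temp table of rows."""
    table = f"rowset_{param}"
    create = f"CREATE TEMP TABLE {table} ({', '.join(columns)})"
    insert = f"INSERT INTO {table} VALUES ({', '.join('?' for _ in columns)})"
    return EncapsulatedStatement(
        setup=[(create, None), (insert, rows)],
        query=query.replace(f"@{param}", table),
    )


def execute(conn, stmt):
    """Hypothetical execution step: load the data, then run the operation."""
    for sql, rows in stmt.setup:
        if rows is None:
            conn.execute(sql)
        else:
            conn.executemany(sql, rows)
    return conn.execute(stmt.query).fetchall()


# Assumed example data source: a small relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 30), (2, 45), (3, 52)])

# The rowset [(1,), (3,)] is passed as the query parameter @ids.
stmt = bind(
    "SELECT c.id, c.age FROM customers c JOIN @ids r ON c.id = r.id ORDER BY c.id",
    param="ids", columns=["id"], rows=[(1,), (3,)],
)
result = execute(conn, stmt)
```

    Here the encapsulated statement holds both the rowset and the rewritten query, so the data source only ever sees statements it already understands.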

    EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE
    5.
    Patent application
    Legal status: In force

    Publication No.: US20100030796A1

    Publication date: 2010-02-04

    Application No.: US12270873

    Filing date: 2008-11-14

    IPC classification: G06F17/00

    Abstract: The subject disclosure relates to column-based data encoding in which raw data to be compressed is organized by columns. Then, as first and second layers of data-size reduction, dictionary encoding and/or value encoding are applied to the data as organized by columns to create integer sequences that correspond to the columns. Next, a hybrid greedy run-length encoding and bit-packing compression algorithm further compacts the data according to an analysis of bit savings. The synergy of these hybrid data reduction techniques with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
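
    The layered reduction described in the abstract can be illustrated with a minimal sketch. It dictionary-encodes one column into an integer sequence, then splits that sequence into run-length segments and bit-packed segments. The function names and the fixed `min_run` threshold are assumptions standing in for the abstract's "analysis of bit savings"; the actual greedy criterion is not given in this listing.

```python
def dictionary_encode(column):
    """Map each distinct value to a small integer id, in first-seen order."""
    dictionary = {}
    ids = [dictionary.setdefault(v, len(dictionary)) for v in column]
    return ids, list(dictionary)  # list index == id


def bit_width(values):
    """Bits needed to store the largest value in a literal segment."""
    return max(max(values).bit_length(), 1)


def hybrid_encode(ids, min_run=3):
    """Emit ('rle', value, count) for long runs, ('bits', values, width) otherwise.

    min_run is an assumed stand-in for a real bit-savings analysis.
    """
    segments, literal, i = [], [], 0
    while i < len(ids):
        j = i
        while j < len(ids) and ids[j] == ids[i]:
            j += 1
        if j - i >= min_run:
            if literal:  # flush pending literals before the run
                segments.append(("bits", literal, bit_width(literal)))
                literal = []
            segments.append(("rle", ids[i], j - i))
        else:
            literal.extend(ids[i:j])
        i = j
    if literal:
        segments.append(("bits", literal, bit_width(literal)))
    return segments


def hybrid_decode(segments):
    """Rebuild the integer sequence from the segments."""
    out = []
    for seg in segments:
        if seg[0] == "rle":
            out.extend([seg[1]] * seg[2])
        else:
            out.extend(seg[1])
    return out


column = ["a", "a", "a", "a", "b", "c", "b", "a", "a", "a"]
ids, dictionary = dictionary_encode(column)
segments = hybrid_encode(ids)
restored = [dictionary[i] for i in hybrid_decode(segments)]
```

    For this column the encoder produces a run of four, a three-value literal segment that needs two bits per value, and a run of three, and decoding restores the original column.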

    Efficient column based data encoding for large-scale data storage
    6.
    Granted patent
    Legal status: In force

    Publication No.: US08452737B2

    Publication date: 2013-05-28

    Application No.: US13347367

    Filing date: 2012-01-10

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column-based data encoding in which raw data to be compressed is organized by columns. Then, as first and second layers of data-size reduction, dictionary encoding and/or value encoding are applied to the data as organized by columns to create integer sequences that correspond to the columns. Next, a hybrid greedy run-length encoding and bit-packing compression algorithm further compacts the data according to an analysis of bit savings. The synergy of these hybrid data reduction techniques with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.

    EXPLAINING CHANGES IN MEASURES THRU DATA MINING
    7.
    Patent application
    Legal status: In force

    Publication No.: US20090012919A1

    Publication date: 2009-01-08

    Application No.: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F15/18

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies for identifying factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space to detect interesting subspaces of transactions. Subsequently, subspaces that show strong variance between two slices can be selected, and the subspaces are then grouped into sub-reports to measure the coverage of each sub-report. A final report can then be generated that contains a list of the sub-reports detected in the previous acts.
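
    As a rough illustration of comparing two slices, the sketch below groups transactions by dimension value, measures how much each group's total changed between the slices, and reports each group's share of the overall change. This is a simplified stand-in, not the patented procedure: the `explain_change` function, the record layout, and the definition of coverage as a share of the total change are all assumptions.

```python
from collections import defaultdict


def explain_change(transactions, dims, measure, slice_key, before, after):
    """Rank (dimension, value) subspaces by how much the measure changed."""
    totals = defaultdict(lambda: {before: 0.0, after: 0.0})
    overall = 0.0
    for t in transactions:
        s = t[slice_key]
        if s not in (before, after):
            continue
        overall += t[measure] if s == after else -t[measure]
        for d in dims:
            totals[(d, t[d])][s] += t[measure]

    report = []
    for subspace, by_slice in totals.items():
        delta = by_slice[after] - by_slice[before]
        # Assumed coverage: share of the overall change explained by this subspace.
        coverage = delta / overall if overall else 0.0
        report.append((subspace, delta, coverage))
    report.sort(key=lambda row: abs(row[1]), reverse=True)
    return report


# Made-up example data: sales by region in two yearly slices.
transactions = [
    {"year": 2006, "region": "East", "amount": 100},
    {"year": 2006, "region": "West", "amount": 100},
    {"year": 2007, "region": "East", "amount": 100},
    {"year": 2007, "region": "West", "amount": 40},
]
report = explain_change(transactions, ["region"], "amount", "year", 2006, 2007)
```

    In this example the total falls by 60 between the slices, and the West subspace accounts for all of that change, so it ranks first with a coverage of 1.0.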

    EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE
    8.
    Patent application
    Legal status: In force

    Publication No.: US20120109910A1

    Publication date: 2012-05-03

    Application No.: US13347367

    Filing date: 2012-01-10

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column-based data encoding in which raw data to be compressed is organized by columns. Then, as first and second layers of data-size reduction, dictionary encoding and/or value encoding are applied to the data as organized by columns to create integer sequences that correspond to the columns. Next, a hybrid greedy run-length encoding and bit-packing compression algorithm further compacts the data according to an analysis of bit savings. The synergy of these hybrid data reduction techniques with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.

    Efficient column based data encoding for large-scale data storage
    9.
    Granted patent
    Legal status: In force

    Publication No.: US08108361B2

    Publication date: 2012-01-31

    Application No.: US12270873

    Filing date: 2008-11-14

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to column-based data encoding in which raw data to be compressed is organized by columns. Then, as first and second layers of data-size reduction, dictionary encoding and/or value encoding are applied to the data as organized by columns to create integer sequences that correspond to the columns. Next, a hybrid greedy run-length encoding and bit-packing compression algorithm further compacts the data according to an analysis of bit savings. The synergy of these hybrid data reduction techniques with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.

    Explaining changes in measures thru data mining
    10.
    Granted patent
    Legal status: In force

    Publication No.: US07899776B2

    Publication date: 2011-03-01

    Application No.: US11772480

    Filing date: 2007-07-02

    IPC classification: G06F17/00 G06N5/04

    CPC classification: G06F17/30592

    Abstract: Systems and methodologies for identifying factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space to detect interesting subspaces of transactions. Subsequently, subspaces that show strong variance between two slices can be selected, and the subspaces are then grouped into sub-reports to measure the coverage of each sub-report. A final report can then be generated that contains a list of the sub-reports detected in the previous acts.
