EFFICIENT LARGE-SCALE FILTERING AND/OR SORTING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES
    1.
    Type: invention application
    Legal status: in force

    Publication number: US20100088315A1

    Publication date: 2010-04-08

    Application number: US12363637

    Filing date: 2009-01-30

    IPC classification: G06F17/30

    Abstract: The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically with respect to complex queries implicating filter and/or sort operations for data over a defined window. In this regard, in various embodiments, a method is provided that avoids scenarios involving expensive sorting of a high percentage of, or all, rows, either by not sorting any rows at all, or by sorting only a very small number of rows consistent with or smaller than a number of rows associated with the size of the requested window over the data. In one embodiment, this is achieved by splitting an external query request into two different internal sub-requests, a first one that computes statistics about distribution of rows for any specified WHERE clauses and ORDER BY columns, and a second one that selects only the rows that match the window based on the statistics.
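
    The two-sub-request idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `windowed_query`, the dict-of-lists column layout, and the use of a per-value histogram as the "statistics" are all assumptions made for illustration. The first pass only counts rows per ORDER BY value among rows that pass the WHERE predicate; the second pass sorts just the rows whose value overlaps the requested window.

```python
from collections import Counter


def windowed_query(columns, where, order_by, skip, take):
    """Return rows [skip, skip + take) of the filtered, ordered result.

    columns  : dict mapping column name -> list of values (hypothetical layout)
    where    : callable(columns, row_index) -> bool, stands in for WHERE
    order_by : name of the ORDER BY column
    """
    n = len(columns[order_by])
    key_col = columns[order_by]

    # Sub-request 1: statistics only. Count matching rows per ORDER BY value.
    hist = Counter()
    for i in range(n):
        if where(columns, i):
            hist[key_col[i]] += 1

    # Walk the distinct values (not the rows) in order and keep the ones
    # whose run of rows overlaps the window [skip, skip + take).
    wanted = set()
    start_offset = 0  # rows of the first wanted value that precede the window
    seen = 0
    for value in sorted(hist):
        count = hist[value]
        if seen + count > skip and seen < skip + take:
            if not wanted:
                start_offset = skip - seen
            wanted.add(value)
        seen += count
        if seen >= skip + take:
            break

    # Sub-request 2: select and sort only the rows that can fall in the window.
    candidates = [i for i in range(n) if key_col[i] in wanted and where(columns, i)]
    candidates.sort(key=lambda i: (key_col[i], i))
    window = candidates[start_offset:start_offset + take]
    return [{name: col[i] for name, col in columns.items()} for i in window]


table = {
    "city": ["Oslo", "Bern", "Rome", "Bern", "Oslo", "Lima"],
    "sales": [5, 20, 30, 40, 50, 60],
}
page = windowed_query(table, lambda c, i: c["sales"][i] > 10, "city", skip=1, take=2)
```

    In this sketch the only full sort is over distinct column values, which is small when the column is dictionary encoded; the row sort touches only the candidate rows, whose count stays close to the window size unless a single value spans many rows.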

    Efficient large-scale filtering and/or sorting for querying of column based data encoded structures
    2.
    Type: granted patent
    Legal status: in force

    Publication number: US08478775B2

    Publication date: 2013-07-02

    Application number: US12363637

    Filing date: 2009-01-30

    IPC classification: G06F17/00 G06F7/00

    Abstract: The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically with respect to complex queries implicating filter and/or sort operations for data over a defined window. In this regard, in various embodiments, a method is provided that avoids scenarios involving expensive sorting of a high percentage of, or all, rows, either by not sorting any rows at all, or by sorting only a very small number of rows consistent with or smaller than a number of rows associated with the size of the requested window over the data. In one embodiment, this is achieved by splitting an external query request into two different internal sub-requests, a first one that computes statistics about distribution of rows for any specified WHERE clauses and ORDER BY columns, and a second one that selects only the rows that match the window based on the statistics.

    Multidimensional data cubes with high-cardinality attributes
    3.
    Type: granted patent
    Legal status: in force

    Publication number: US08380748B2

    Publication date: 2013-02-19

    Application number: US12042674

    Filing date: 2008-03-05

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30592

    Abstract: Computer-readable media, systems, and methods for building a multidimensional data cube having one or more high-cardinality attributes are described. In embodiments, data is extracted from one or more databases. It is determined that one or more instances of the data are fact data and one or more instances of the data are dimension data. Each member of the fact data is one instance of a dimension and each instance of the dimension data includes an attribute for grouping the fact data. Moreover, in embodiments it is determined that one or more instances of the dimension data are high-cardinality attributes. The one or more high-cardinality attributes are processed with fact data and stored in fact tables on a computer storage medium.
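
    As a rough illustration of the storage decision described in this abstract, the Python sketch below keeps high-cardinality attributes inline in the fact rows and moves the remaining attributes into small dimension tables. The distinct-value threshold, the function names, and the `_key` suffix are assumptions chosen for the example; the abstract does not say how high cardinality is determined.

```python
def split_by_cardinality(rows, dimension_attrs, threshold):
    """Partition attributes into (low, high) sets by distinct-value count.

    An attribute is treated as high-cardinality when it has more than
    `threshold` distinct values (an assumed rule, for illustration only).
    """
    distinct = {a: {r[a] for r in rows} for a in dimension_attrs}
    high = {a for a, values in distinct.items() if len(values) > threshold}
    return set(dimension_attrs) - high, high


def build_tables(rows, measures, dimension_attrs, threshold):
    """Build a fact table plus dimension tables for low-cardinality attributes."""
    low, high = split_by_cardinality(rows, dimension_attrs, threshold)
    # One lookup table per low-cardinality attribute: value -> surrogate key.
    dim_tables = {a: {} for a in sorted(low)}
    fact_table = []
    for r in rows:
        fact = {m: r[m] for m in measures}
        for a in sorted(low):
            # Reuse the existing key for a known value, else assign the next one.
            fact[a + "_key"] = dim_tables[a].setdefault(r[a], len(dim_tables[a]))
        for a in sorted(high):
            # High-cardinality attributes stay with the fact data.
            fact[a] = r[a]
        fact_table.append(fact)
    return fact_table, dim_tables


orders = [
    {"order_id": "A1", "region": "EU", "amount": 10},
    {"order_id": "A2", "region": "EU", "amount": 20},
    {"order_id": "A3", "region": "US", "amount": 30},
]
facts, dims = build_tables(orders, ["amount"], ["order_id", "region"], threshold=2)
```

    Here `order_id` has three distinct values and exceeds the assumed threshold of two, so it is stored in the fact rows, while `region` is replaced by a key into its own dimension table.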


    Multidimensional database subcubes
    6.
    Type: granted patent
    Legal status: lapsed

    Publication number: US07490106B2

    Publication date: 2009-02-10

    Application number: US11137233

    Filing date: 2005-05-25

    IPC classification: G06F7/00

    Abstract: The subject invention pertains to interaction with multidimensional data. More specifically, interactions can be constrained to a limited subset of a multidimensional data cube, namely a subcube. Subsequent to or concurrently with subcube creation, query execution or other interactions such as calculations can be consolidated or restricted to the smaller subcube query space rather than the typically enormous main cube. Multiple subcubes can also be created and nested to gradually reduce the query space. Deletion of one subcube can cause a reversion back to a previously defined or hierarchical parent subcube.
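
    The nesting and reversion behaviour described in this abstract can be sketched as a stack of filters over the main cube. The class name `CubeSession`, its method names, and the list-of-dicts cell layout are hypothetical and chosen only for this example; the sketch does not reflect any particular product's API.

```python
class CubeSession:
    """Toy model of nested subcubes as a stack of predicates."""

    def __init__(self, cells):
        self._cells = list(cells)  # the main cube
        self._subcubes = []        # innermost subcube is last

    def create_subcube(self, predicate):
        """Narrow the query space; nests inside any existing subcube."""
        self._subcubes.append(predicate)

    def drop_subcube(self):
        """Delete the innermost subcube, reverting to its parent."""
        if self._subcubes:
            self._subcubes.pop()

    def query_space(self):
        """Cells visible to queries under all active subcubes."""
        cells = self._cells
        for predicate in self._subcubes:
            cells = [c for c in cells if predicate(c)]
        return cells

    def total(self, measure):
        """Example calculation restricted to the current query space."""
        return sum(c[measure] for c in self.query_space())


session = CubeSession([
    {"year": 2007, "region": "EU", "sales": 1},
    {"year": 2008, "region": "EU", "sales": 2},
    {"year": 2008, "region": "US", "sales": 4},
])
totals = [session.total("sales")]            # main cube
session.create_subcube(lambda c: c["year"] == 2008)
totals.append(session.total("sales"))        # first subcube
session.create_subcube(lambda c: c["region"] == "EU")
totals.append(session.total("sales"))        # nested subcube
session.drop_subcube()
totals.append(session.total("sales"))        # back to the parent subcube
```

    Keeping the subcubes on a stack means that deleting the innermost one automatically restores the parent's query space, which matches the reversion described above.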
