Query-level access to external petabyte-scale distributed file systems

    Publication number: US10108682B2

    Publication date: 2018-10-23

    Application number: US15451134

    Filing date: 2017-03-06

    Abstract: A method and system for creating query-level access to an external distributed file system by identifying a location of one or more external data residing on the external distributed file system, creating a query specifying an external table within a database engine having one or more location files, wherein the location files identify metadata operations for accessing and processing the one or more external data, defining metadata operations for accessing and processing the one or more external data, wherein the processing that produces one or more result files occurs at the external distributed file system, and executing the query at the database engine to create the external table, the external table comprising the one or more location files identifying the metadata directives for processing query-level requests on the one or more external data stored on the external distributed file system.

    System and method for efficient connection management in a massively parallel or distributed database environment

    Publication number: US10180973B2

    Publication date: 2019-01-15

    Application number: US14864788

    Filing date: 2015-09-24

    Abstract: A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
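
The flow in the abstract above (obtain a query with a user preference, inspect table properties, pick a splits generator, emit query splits for mappers) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `TableInfo`, `QuerySplit`, the two generators, and the preference strings `"single"` and `"partition"` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical stand-in for the 'table data representative of properties'."""
    name: str
    partitions: Tuple[str, ...] = ()  # empty when the table is not partitioned


@dataclass(frozen=True)
class QuerySplit:
    """One unit of work handed to a mapper (hypothetical shape)."""
    query: str
    restriction: str  # which slice of the table this split covers


def single_split(query: str, table: TableInfo) -> List[QuerySplit]:
    # One split covering the whole table.
    return [QuerySplit(query, "all rows")]


def partition_splits(query: str, table: TableInfo) -> List[QuerySplit]:
    # One split per table partition.
    return [QuerySplit(query, f"partition {p}") for p in table.partitions]


def choose_generator(preference: Optional[str], table: TableInfo) -> Callable:
    # The user preference wins when it can be honored;
    # otherwise fall back to the table properties.
    if preference == "single":
        return single_split
    if preference == "partition" and table.partitions:
        return partition_splits
    return partition_splits if table.partitions else single_split


def generate_splits(query: str, preference: Optional[str],
                    table: TableInfo) -> List[QuerySplit]:
    return choose_generator(preference, table)(query, table)


if __name__ == "__main__":
    sales = TableInfo("sales", ("p2015", "p2016"))
    for split in generate_splits("SELECT * FROM sales", None, sales):
        print(split)  # each split would be executed by a separate mapper
```

Representing a split as the original query plus a restriction keeps the sketch independent of any particular SQL dialect; a real system would translate the restriction into whatever predicate or hint its database supports.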

    SYSTEM AND METHOD FOR CREATING AN INTELLIGENT SYNOPSIS OF A DATABASE USING RE-PARTITIONING BASED SAMPLING
    Patent application (status: under examination, published)

    Publication number: US20170024452A1

    Publication date: 2017-01-26

    Application number: US14809004

    Filing date: 2015-07-24

    CPC classification number: G06F16/24556

    Abstract: The present invention provides a re-partitioning-based sampling system and method for generating a synopsis from large database tables such that an aggregation query performed on the synopsis provides an approximate answer to the aggregation query that is within prescribed error bounds relative to a query on the full database. The system includes a partition function generator, a synopsis vector calculator, and a synopsis constructor. The synopsis constructed by the system is sufficiently small to be held in memory to allow quick and resource-efficient satisficing of aggregation queries.
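
To make the idea of answering an aggregation query from a small synopsis concrete, the following Python sketch uses plain stratified sampling as a stand-in. It is not the re-partitioning method of the application, whose details the abstract does not give, and it does not compute the prescribed error bounds. The function names loosely mirror the three components named in the abstract, but the fixed `fraction` parameter and the per-row weights are assumptions made for illustration.

```python
import random
from collections import defaultdict


def partition_rows(rows, partition_fn):
    """Group rows by the key returned by partition_fn (the 'partition function')."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_fn(row)].append(row)
    return dict(groups)


def synopsis_vector(groups, fraction):
    """Number of rows to keep per partition; at least one row each (assumption)."""
    return {key: max(1, round(len(group) * fraction))
            for key, group in groups.items()}


def build_synopsis(groups, sizes, rng):
    """Sample each partition and attach a scale-up weight to every kept row."""
    synopsis = []
    for key, group in groups.items():
        n = sizes[key]
        weight = len(group) / n  # each sampled row represents `weight` rows
        for row in rng.sample(group, n):
            synopsis.append((row, weight))
    return synopsis


def approximate_sum(synopsis, value_fn):
    """Estimate SUM(value) over the full table from the weighted synopsis."""
    return sum(value_fn(row) * weight for row, weight in synopsis)


if __name__ == "__main__":
    rows = [(i % 3, i) for i in range(30)]          # (group key, value)
    groups = partition_rows(rows, lambda r: r[0])
    sizes = synopsis_vector(groups, fraction=0.5)
    synopsis = build_synopsis(groups, sizes, random.Random(0))
    print("exact:", sum(v for _, v in rows))
    print("approximate:", approximate_sum(synopsis, lambda r: r[1]))
```

The synopsis here holds half the rows; the estimate varies with the random seed, which is why a production system would need the error-bound analysis the abstract refers to.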


    System and method for generating rowid range-based splits in a massively parallel or distributed database environment

    Publication number: US10380114B2

    Publication date: 2019-08-13

    Application number: US14864773

    Filing date: 2015-09-24

    Abstract: A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
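
The title of this record refers to rowid range-based splits. As a rough illustration of range-based splitting in general, the Python sketch below divides an inclusive range of row identifiers into contiguous, non-overlapping sub-ranges of near-equal size. Plain integers are used as a simplifying assumption; real database rowids are not simple integers, and the function name `range_splits` is hypothetical.

```python
def range_splits(low, high, n):
    """Split the inclusive range [low, high] into at most n contiguous ranges.

    Returns a list of (start, end) tuples that cover every identifier
    exactly once. Sizes differ by at most one.
    """
    if n < 1 or high < low:
        raise ValueError("need n >= 1 and low <= high")
    total = high - low + 1
    n = min(n, total)              # never produce empty splits
    base, extra = divmod(total, n)
    splits = []
    start = low
    for i in range(n):
        size = base + (1 if i < extra else 0)  # first `extra` splits get one more
        end = start + size - 1
        splits.append((start, end))
        start = end + 1
    return splits


if __name__ == "__main__":
    for start, end in range_splits(1, 10, 3):
        # Each pair could bound one mapper's scan of the table.
        print(f"rows {start}..{end}")
```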

    System and method for consistent reads between tasks in a massively parallel or distributed database environment

    Publication number: US10528596B2

    Publication date: 2020-01-07

    Application number: US14864792

    Filing date: 2015-09-24

    Abstract: A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.

    System and method for data transfer from JDBC to a data warehouse layer in a massively parallel or distributed database environment

    Publication number: US10089377B2

    Publication date: 2018-10-02

    Application number: US14864782

    Filing date: 2015-09-24

    Abstract: A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.

    System and method for generating partition-based splits in a massively parallel or distributed database environment

    Publication number: US10089357B2

    Publication date: 2018-10-02

    Application number: US14864776

    Filing date: 2015-09-24

    Abstract: A system and method is described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and determines a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
