Storage-Side Scanning on Non-Natively Formatted Data
    Invention application
    Status: under examination, published

    Publication No.: US20150356158A1

    Publication date: 2015-12-10

    Application No.: US14733691

    Filing date: 2015-06-08

    Abstract: A storage system communicatively coupled to a DBMS performs storage-side scanning of data sources that are not stored in the native database storage format of the DBMS. Data sources for external tables are accessible in a storage system referred to herein as a distributed data access system, e.g. a Hadoop Distributed File System. To execute a query that references an external table, a DBMS first generates an execution plan. The distributed data access system supplies the DBMS with information that specifies each portion of the data source, and specifies which data node to use to access the portion. The DBMS sends a request for each portion to the respective data node, the request requesting that the data node generate rows from data in the portion. The request may specify scanning criteria, specifying one or more columns to project and/or filter on. The request may also specify code modules for the data node to execute to generate rows or records and columns.
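
The request flow described in this abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Portion`, `ScanRequest`, `DataNode`, and `run_external_table_scan` are hypothetical, the data source is assumed to be comma-separated text, and the schema passed in the request stands in for the "code modules" that tell a data node how to turn raw data into rows and columns.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Portion:
    """One portion of the data source and the data node that serves it (hypothetical)."""
    portion_id: int
    node: str


@dataclass
class ScanRequest:
    """Request sent by the DBMS to a data node for a single portion (hypothetical)."""
    portion_id: int
    schema: List[str]                      # how to interpret raw fields as columns
    project: List[str]                     # columns to return
    predicate: Optional[Callable[[Dict[str, str]], bool]] = None  # filter criteria


class DataNode:
    """Holds raw, non-native data and performs the scan on the storage side."""

    def __init__(self, name: str, portions: Dict[int, str]):
        self.name = name
        self.portions = portions           # portion_id -> raw CSV text

    def scan(self, req: ScanRequest) -> List[Tuple[str, ...]]:
        rows = []
        for fields in csv.reader(io.StringIO(self.portions[req.portion_id])):
            record = dict(zip(req.schema, fields))
            # Filter and project here, so only matching columns travel to the DBMS.
            if req.predicate is None or req.predicate(record):
                rows.append(tuple(record[c] for c in req.project))
        return rows


def run_external_table_scan(catalog, nodes, schema, project, predicate=None):
    """DBMS side: one request per portion, sent to the node named in the catalog."""
    result = []
    for portion in catalog:
        request = ScanRequest(portion.portion_id, schema, project, predicate)
        result.extend(nodes[portion.node].scan(request))
    return result


# Example data: two portions of one data source on two data nodes.
nodes = {
    "node-a": DataNode("node-a", {0: "1,alice,30\n2,bob,17\n"}),
    "node-b": DataNode("node-b", {1: "3,carol,45\n"}),
}
catalog = [Portion(0, "node-a"), Portion(1, "node-b")]

adults = run_external_table_scan(
    catalog, nodes,
    schema=["id", "name", "age"],
    project=["name"],
    predicate=lambda rec: int(rec["age"]) >= 18,
)
print(adults)
```

The point of the design is that filtering and projection run where the data lives, so the DBMS receives finished rows rather than raw file contents.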

    Reducing data I/O using in-memory data structures

    Publication No.: US10198363B2

    Publication date: 2019-02-05

    Application No.: US15268524

    Filing date: 2016-09-16

    Abstract: Techniques are described herein for generating and using in-memory data structures to represent columns in data block sets. In an embodiment, a database management system (DBMS) receives a query for a target data set managed by the DBMS. The query may specify a predicate for a column of the target data set. The predicate may include a filtering value to be compared with row values of the column of the target data set. Prior to accessing data block sets storing the target data set from persistent storage, the DBMS identifies an in-memory summary that corresponds to a data block set, in an embodiment. The in-memory summary may include in-memory data structures, each representing a column stored in the data block set. The DBMS determines that a particular in-memory data structure exists in the in-memory summary that represents a portion of values of the column indicated in the predicate of the query. Based on the particular in-memory data structure, the DBMS determines whether or not the data block set can possibly contain the filtering value in the column of the target data set. Based on this determination, the DBMS skips or retrieves the data block set from the persistent storage as part of the query evaluation.
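
The skip-or-retrieve decision in this abstract can be illustrated with a short sketch. The abstract does not say which in-memory data structure is used, so the example assumes a simple per-column minimum/maximum range; `ColumnSummary`, `build_summary`, and `query_equals` are hypothetical names, and a plain dictionary stands in for persistent storage.

```python
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Assumed in-memory structure: the value range of one column in a data block set."""
    lo: int
    hi: int

    def may_contain(self, value: int) -> bool:
        # False means the value is definitely absent; True means it is only possible.
        return self.lo <= value <= self.hi


def build_summary(rows, columns):
    """Build the in-memory summary for one data block set (one entry per summarized column)."""
    return {
        c: ColumnSummary(min(r[c] for r in rows), max(r[c] for r in rows))
        for c in columns
    }


def query_equals(storage, summaries, column, value):
    """Evaluate `column = value`, consulting summaries before touching storage."""
    hits, block_sets_read = [], 0
    for block_id, rows in storage.items():
        col_summary = summaries[block_id].get(column)
        if col_summary is not None and not col_summary.may_contain(value):
            continue                      # skip: no read from persistent storage
        block_sets_read += 1              # retrieve the data block set and scan it
        hits.extend(r for r in rows if r[column] == value)
    return hits, block_sets_read


# Simulated persistent storage: block set id -> rows.
storage = {
    0: [{"id": 1, "qty": 5}, {"id": 2, "qty": 9}],
    1: [{"id": 3, "qty": 20}, {"id": 4, "qty": 31}],
}
summaries = {bid: build_summary(rows, ["qty"]) for bid, rows in storage.items()}

found, reads = query_equals(storage, summaries, "qty", 20)
print(found, reads)
```

A range check can only rule a block set out, never confirm a match: a query for `qty = 7` still reads block set 0, because 7 lies between 5 and 9, and then finds no rows.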

    Reducing data I/O using in-memory data structures

    Publication No.: US10042781B2

    Publication date: 2018-08-07

    Application No.: US15268524

    Filing date: 2016-09-16

    Abstract: Techniques are described herein for generating and using in-memory data structures to represent columns in data block sets. In an embodiment, a database management system (DBMS) receives a query for a target data set managed by the DBMS. The query may specify a predicate for a column of the target data set. The predicate may include a filtering value to be compared with row values of the column of the target data set. Prior to accessing data block sets storing the target data set from persistent storage, the DBMS identifies an in-memory summary that corresponds to a data block set, in an embodiment. The in-memory summary may include in-memory data structures, each representing a column stored in the data block set. The DBMS determines that a particular in-memory data structure exists in the in-memory summary that represents a portion of values of the column indicated in the predicate of the query. Based on the particular in-memory data structure, the DBMS determines whether or not the data block set can possibly contain the filtering value in the column of the target data set. Based on this determination, the DBMS skips or retrieves the data block set from the persistent storage as part of the query evaluation.
