Distributed columnar data set subset retrieval

    Publication No.: US11263175B2

    Publication Date: 2022-03-01

    Application No.: US17039584

    Filing Date: 2020-09-30

    Abstract: An apparatus includes a processor to: within each reading thread, retrieve a data set part and corresponding part metadata from storage device(s), analyze row group metadata for each row group within the data set part to identify candidate row group(s) meeting specified criteria, and store the candidate row group(s) and corresponding row group metadata within a data buffer of a queue; operate the queue as a FIFO buffer; within each provision thread, retrieve one of multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to identify rows meeting the criteria, and provide those rows to the requesting device or an application; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.
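The abstract does not say what the row group metadata contains. A common approach in columnar formats (Apache Parquet, for example, records per-column min/max statistics) is to keep the minimum and maximum of each column per row group, so a reader can skip row groups that cannot match before any row is examined. The sketch below assumes exactly that, with a single closed range [lo, hi] on one column as the "specified criteria". `RowGroup`, `make_row_group`, `is_candidate`, `matching_rows` and `retrieve_subset` are hypothetical names, and the threads and queue are omitted so that only the two filtering stages are shown. It is an illustration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical in-memory row group: column name -> list of values."""
    columns: Dict[str, List[int]]
    # Hypothetical row group metadata: column name -> (min, max) of its values.
    stats: Dict[str, Tuple[int, int]]


def make_row_group(columns: Dict[str, List[int]]) -> RowGroup:
    """Build a row group and compute its per-column min/max metadata."""
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return RowGroup(columns, stats)


def is_candidate(rg: RowGroup, column: str, lo: int, hi: int) -> bool:
    """Reading-thread step, metadata only: the row group can contain a match
    only if its [min, max] range overlaps the requested [lo, hi] range."""
    cmin, cmax = rg.stats[column]
    return cmax >= lo and cmin <= hi


def matching_rows(rg: RowGroup, column: str, lo: int, hi: int) -> List[Dict[str, int]]:
    """Provision-thread step: scan a candidate row group and keep only the
    rows whose value in `column` actually lies within [lo, hi]."""
    names = list(rg.columns)
    rows = []
    for i, value in enumerate(rg.columns[column]):
        if lo <= value <= hi:
            rows.append({name: rg.columns[name][i] for name in names})
    return rows


def retrieve_subset(groups: List[RowGroup], column: str, lo: int, hi: int):
    """Prune row groups by metadata, then filter rows in the survivors."""
    candidates = [rg for rg in groups if is_candidate(rg, column, lo, hi)]
    rows: List[Dict[str, int]] = []
    for rg in candidates:
        rows.extend(matching_rows(rg, column, lo, hi))
    return candidates, rows


groups = [
    make_row_group({"id": [1, 2, 3], "temp": [10, 12, 15]}),
    make_row_group({"id": [4, 5, 6], "temp": [30, 35, 40]}),  # pruned: 30..40
    make_row_group({"id": [7, 8], "temp": [14, 31]}),
]
candidates, rows = retrieve_subset(groups, "temp", 11, 14)
print(len(candidates), rows)  # 2 [{'id': 2, 'temp': 12}, {'id': 7, 'temp': 14}]
```

Min/max metadata can only rule row groups out. The third group passes the metadata check because its range 14..31 overlaps 11..14, yet only one of its two rows matches, which is why a second, row-level pass is still needed.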

    Distributed columnar data set retrieval

    Publication No.: US20210026805A1

    Publication Date: 2021-01-28

    Application No.: US17039314

    Filing Date: 2020-09-30

    Abstract: An apparatus includes a processor to: instantiate data buffers of a queue, reading threads, and provision threads; within each reading thread, use an identifier provided in a data buffer of the queue to retrieve the corresponding data set part and part metadata from storage device(s), and store both within the data buffer; operate the queue as a first-in, first-out (FIFO) buffer; within each provision thread, retrieve a row group from among multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to decompress at least one column, and provide the data values of the row group to the requesting device or an application routine; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.
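The reading-thread / FIFO queue / provision-thread arrangement in this abstract is a producer/consumer pipeline. Below is a minimal sketch of it using Python's standard `queue.Queue` (a thread-safe FIFO) and `threading`. Several details are assumptions rather than statements from the abstract: storage is a dict keyed by part identifier; identifiers reach the readers through a separate work queue rather than through the data buffers themselves; each column is zlib-compressed JSON, with the codec named in the row group metadata; and `adjust_readers` is a hypothetical thresholding rule standing in for the unspecified storage/processing analysis. That rule is shown as a standalone decision function and is not wired into `run`. All function names are hypothetical.

```python
import json
import queue
import threading
import zlib
from typing import Dict, List

SENTINEL = None  # tells a provision thread that no more row groups will arrive


def make_storage(n_parts: int, groups_per_part: int = 2) -> Dict[int, list]:
    """Hypothetical storage: part id -> list of (row group metadata, compressed column)."""
    storage = {}
    for part in range(n_parts):
        groups = []
        for g in range(groups_per_part):
            values = [part * 100 + g * 10 + i for i in range(3)]
            meta = {"codec": "zlib", "rows": len(values)}
            groups.append((meta, zlib.compress(json.dumps(values).encode())))
        storage[part] = groups
    return storage


def reader(part_ids: "queue.Queue[int]", storage, fifo: "queue.Queue") -> None:
    """Reading thread: fetch a part by identifier and push its row groups into the FIFO."""
    while True:
        try:
            part_id = part_ids.get_nowait()
        except queue.Empty:
            return
        for meta, blob in storage[part_id]:
            fifo.put((meta, blob))  # blocks while the bounded FIFO is full


def provider(fifo: "queue.Queue", out: List[int], lock: threading.Lock) -> None:
    """Provision thread: pop row groups in FIFO order and decompress them using the metadata."""
    while True:
        item = fifo.get()
        if item is SENTINEL:
            return
        meta, blob = item
        if meta["codec"] != "zlib":
            raise ValueError(f"unsupported codec {meta['codec']}")
        values = json.loads(zlib.decompress(blob))
        with lock:
            out.extend(values)


def adjust_readers(current: int, queued: int, capacity: int, max_readers: int) -> int:
    """Hypothetical rule: add a reader when the FIFO is under a quarter full,
    drop one when it is over three quarters full, otherwise keep the count."""
    if queued < capacity // 4 and current < max_readers:
        return current + 1
    if queued > 3 * capacity // 4 and current > 1:
        return current - 1
    return current


def run(n_parts: int = 4, n_readers: int = 2, n_providers: int = 2, capacity: int = 4) -> List[int]:
    storage = make_storage(n_parts)
    part_ids: "queue.Queue[int]" = queue.Queue()
    for pid in storage:
        part_ids.put(pid)
    fifo: "queue.Queue" = queue.Queue(maxsize=capacity)
    out: List[int] = []
    lock = threading.Lock()
    providers = [threading.Thread(target=provider, args=(fifo, out, lock)) for _ in range(n_providers)]
    readers = [threading.Thread(target=reader, args=(part_ids, storage, fifo)) for _ in range(n_readers)]
    for t in providers + readers:
        t.start()
    for t in readers:
        t.join()
    for _ in providers:
        fifo.put(SENTINEL)  # queued after all data, so FIFO order drains the data first
    for t in providers:
        t.join()
    return sorted(out)  # thread scheduling makes arrival order nondeterministic


print(run()[:6])  # [0, 1, 2, 10, 11, 12]
print(adjust_readers(2, 0, 8, 4), adjust_readers(2, 7, 8, 4))  # 3 1
```

Bounding the FIFO (`maxsize=capacity`) is what couples the two thread pools. Readers block when provision threads fall behind, so memory stays capped. The fill level is also a natural input for a rule like `adjust_readers`, which could be evaluated each time a reader stores a part.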

    Distributed data set indexing
    Granted invention patent

    Publication No.: US10303670B2

    Publication Date: 2019-05-28

    Application No.: US15984706

    Filing Date: 2018-05-21

    Abstract: An apparatus including a processor to index data records within a data cell, wherein for each data record, the processor retrieves data values from first and second data fields; determines whether the first and second data fields store unique data values; in response to the first data field storing a unique data value, adds an identifier of the data record to a first unique values index; in response to the second data field storing a unique data value, adds the identifier to a second unique values index, wherein identifiers of data records within the unique values indexes are ordered based on corresponding unique data values; and generates an indication of ranges of data values of the first and second data fields to enable a determination of whether a data value specified in search criteria is present within at least the data cell.
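Below is a minimal sketch of the per-field unique values index and range indication for a single data cell. It rests on three assumptions: "unique" means a value occurring exactly once among the cell's records; each index is a list of (value, record id) pairs sorted by value, so a binary search can locate a record; and the "indication of ranges" is simply each field's (min, max). `build_index`, `may_contain` and `lookup_unique` are hypothetical names, and the patent's actual data structures may differ.

```python
from bisect import bisect_left
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def build_index(records: Dict[int, Record], fields: List[str]):
    """Per field: a unique values index (record ids ordered by value) and the
    (min, max) range of that field's values within the data cell."""
    unique_index: Dict[str, List[Tuple[Any, int]]] = {}
    ranges: Dict[str, Tuple[Any, Any]] = {}
    for field in fields:
        counts = Counter(rec[field] for rec in records.values())
        # Assumption: a value is "unique" if it occurs exactly once in this cell.
        entries = [(rec[field], rid) for rid, rec in records.items() if counts[rec[field]] == 1]
        unique_index[field] = sorted(entries)  # ordered by the unique data value
        values = [rec[field] for rec in records.values()]
        ranges[field] = (min(values), max(values))
    return unique_index, ranges


def may_contain(ranges, field: str, value) -> bool:
    """Range check: the cell can hold `value` only if it lies within [min, max]."""
    lo, hi = ranges[field]
    return lo <= value <= hi


def lookup_unique(unique_index, field: str, value) -> Optional[int]:
    """Binary-search the value-ordered index; return the record id or None."""
    entries = unique_index[field]
    pos = bisect_left(entries, (value,))  # (value,) sorts before (value, any_id)
    if pos < len(entries) and entries[pos][0] == value:
        return entries[pos][1]
    return None


records = {
    1: {"city": "Oslo", "zip": 150},
    2: {"city": "Lima", "zip": 150},
    3: {"city": "Oslo", "zip": 220},
    4: {"city": "Kyiv", "zip": 310},
}
unique_index, ranges = build_index(records, ["city", "zip"])
print(unique_index["city"])  # [('Kyiv', 4), ('Lima', 2)]
print(ranges["zip"])  # (150, 310)
print(may_contain(ranges, "zip", 500), lookup_unique(unique_index, "city", "Lima"))  # False 2
```

The range check is cheap enough to run before any index is opened. A search for zip 500 can skip this cell entirely, while a value inside the range still needs the index (or a scan) to confirm it is present.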

    Distributed columnar data set retrieval

    Publication No.: US11347686B2

    Publication Date: 2022-05-31

    Application No.: US17039314

    Filing Date: 2020-09-30

    Abstract: An apparatus includes a processor to: instantiate data buffers of a queue, reading threads, and provision threads; within each reading thread, use an identifier provided in a data buffer of the queue to retrieve the corresponding data set part and part metadata from storage device(s), and store both within the data buffer; operate the queue as a first-in, first-out (FIFO) buffer; within each provision thread, retrieve a row group from among multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to decompress at least one column, and provide the data values of the row group to the requesting device or an application routine; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.

    Distributed columnar data set subset retrieval

    Publication No.: US20210026806A1

    Publication Date: 2021-01-28

    Application No.: US17039584

    Filing Date: 2020-09-30

    Abstract: An apparatus includes a processor to: within each reading thread, retrieve a data set part and corresponding part metadata from storage device(s), analyze row group metadata for each row group within the data set part to identify candidate row group(s) meeting specified criteria, and store the candidate row group(s) and corresponding row group metadata within a data buffer of a queue; operate the queue as a FIFO buffer; within each provision thread, retrieve one of multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to identify rows meeting the criteria, and provide those rows to the requesting device or an application; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.