DATA ACCESS AND RECOMMENDATION SYSTEM

    Publication No.: US20210286777A1

    Publication date: 2021-09-16

    Application No.: US16816511

    Filing date: 2020-03-12

    Applicant: SAP SE

    IPC classification: G06F16/21 G06F16/22 G06N5/04

    Abstract: System, method, and various embodiments for providing a data access and recommendation system are described herein. An embodiment operates by identifying a column access of one or more data values of a first column of a plurality of columns of a table of a database during a sampling period. A count of how many of the one or more data values are accessed during the column access is recorded. A first counter, corresponding to the first column and stored in a distributed hash table, is incremented by the count. The sampling period is determined to have expired. A load recommendation on how to load data values into the first column is computed based on the first counter. The load recommendation is provided for implementation into the database for one or more subsequent column accesses.
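The counting and recommendation steps in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the class `ColumnAccessSampler`, the `threshold` parameter, and the two recommendation labels do not come from the patent, and an in-process `Counter` stands in for the distributed hash table.

```python
from collections import Counter


class ColumnAccessSampler:
    """Counts accessed data values per column during one sampling period."""

    def __init__(self, threshold):
        # Stand-in for the distributed hash table of per-column counters.
        self.counters = Counter()
        # Hypothetical cutoff separating heavily and lightly accessed columns.
        self.threshold = threshold

    def record_access(self, column, values_accessed):
        # Increment the column's counter by the number of values accessed.
        self.counters[column] += values_accessed

    def recommend(self):
        # Meant to be called once the sampling period has expired.
        # The labels "full-load" and "page-load" are illustrative only.
        return {
            column: "full-load" if count >= self.threshold else "page-load"
            for column, count in self.counters.items()
        }


sampler = ColumnAccessSampler(threshold=100)
sampler.record_access("price", 80)
sampler.record_access("price", 40)
sampler.record_access("note", 3)
recommendations = sampler.recommend()
```

Under these assumptions, a column whose counter reaches the threshold is recommended for loading in full, and any other column for loading page by page.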

    Paged Inverted Index

    Publication No.: US20170154061A1

    Publication date: 2017-06-01

    Application No.: US14954736

    Filing date: 2015-11-30

    Applicant: SAP SE

    IPC classification: G06F17/30 H03M7/30

    Abstract: Disclosed herein are system and method embodiments for generating a paged inverted index. An embodiment is generated by storing a first data structure and a second data structure in a plurality of pages, where the plurality of pages are stored in the one or more memories. The first data structure is stored in the plurality of pages and includes a plurality of value identifiers, where a value identifier corresponds to an offset. The second data structure stored in the plurality of pages includes a plurality of row positions, wherein a row position is at a location that corresponds to the offset in the first data structure and identifies a position of a row in a table that stores data associated with the value ID.
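The two structures described in this abstract can be pictured with a small sketch: a directory that maps each value ID to an offset, and a list of row positions stored at those offsets, both cut into fixed-size pages. The function names, the `page_size` parameter, and the use of plain Python lists as pages are assumptions for illustration. The sketch also assumes a non-empty column whose value IDs are the integers 0 to n-1.

```python
def build_paged_inverted_index(value_ids, page_size):
    """value_ids[row] is the value ID stored at that row of the column."""
    n = max(value_ids) + 1
    counts = [0] * n
    for v in value_ids:
        counts[v] += 1

    # First structure: offsets[v] is where the row positions of value ID v begin.
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + counts[v]

    # Second structure: row positions, grouped by value ID.
    postings = [0] * len(value_ids)
    cursor = offsets[:-1]  # slicing copies, so offsets itself is not modified
    for row, v in enumerate(value_ids):
        postings[cursor[v]] = row
        cursor[v] += 1

    def paginate(seq):
        return [seq[i:i + page_size] for i in range(0, len(seq), page_size)]

    return paginate(offsets), paginate(postings)


def read(pages, i, page_size):
    # Only the page holding element i has to be touched.
    return pages[i // page_size][i % page_size]


def rows_for(value_id, offset_pages, posting_pages, page_size):
    start = read(offset_pages, value_id, page_size)
    end = read(offset_pages, value_id + 1, page_size)
    return [read(posting_pages, i, page_size) for i in range(start, end)]


offset_pages, posting_pages = build_paged_inverted_index([2, 0, 1, 0, 2, 2], page_size=2)
rows_of_value_2 = rows_for(2, offset_pages, posting_pages, page_size=2)
```

In this sketch a lookup reads two directory entries and then only the posting pages that cover the resulting range, which is the property that paging is meant to provide.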

    DESIGN AND IMPLEMENTATION OF DATA ACCESS METRICS FOR AUTOMATED PHYSICAL DATABASE DESIGN

    Publication No.: US20220269658A1

    Publication date: 2022-08-25

    Application No.: US17324874

    Filing date: 2021-05-19

    Applicant: SAP SE

    IPC classification: G06F16/215 H03M7/30

    Abstract: The present disclosure involves systems, software, and computer implemented methods for improved design and implementation of data access metrics for automated physical database design. An example method includes identifying a database workload for which index advisor access counters are to be tracked. Each SQL statement in the database workload is executed. For each SQL statement, attribute sets are determined for which a selection predicate filters a result for an SQL statement. An output cardinality of each selection predicate is determined. A logarithmic counter for an attribute set corresponding to the selection predicate is determined based on the output cardinality of the selection predicate. The determined logarithmic counter is incremented. Respective values for logarithmic counters of the determined attributes are provided to an index advisor. The index advisor determines attribute sets for which to propose an index based on the logarithmic counters of the respective attribute sets.
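One way to read "logarithmic counter" is a set of counters per attribute set, one for each order of magnitude of predicate output cardinality. The sketch below follows that reading. The base-2 bucketing, the names `bucket_for`, `record_predicate`, and `selective_hits`, and the scoring rule are assumptions for illustration and are not taken from the patent.

```python
from collections import Counter, defaultdict

# One Counter of buckets per attribute set; the layout is an assumption.
counters = defaultdict(Counter)


def bucket_for(cardinality):
    # floor(log2(cardinality)), with cardinalities 0 and 1 both mapped to bucket 0.
    return max(cardinality, 1).bit_length() - 1


def record_predicate(attributes, output_cardinality):
    # Select the counter by output cardinality and increment it.
    counters[frozenset(attributes)][bucket_for(output_cardinality)] += 1


def selective_hits(attributes, max_bucket):
    # Hypothetical advisor input: how often predicates on this attribute set
    # returned at most roughly 2 ** (max_bucket + 1) - 1 rows.
    buckets = counters[frozenset(attributes)]
    return sum(n for b, n in buckets.items() if b <= max_bucket)


record_predicate({"a"}, 1)
record_predicate({"a"}, 3)
record_predicate({"a", "b"}, 1000)
record_predicate({"b", "a"}, 600)
```

Keying on `frozenset` makes the attribute order irrelevant, so `{"a", "b"}` and `{"b", "a"}` share one set of counters.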

    FASTER ACCESS FOR COMPRESSED TIME SERIES DATA: THE BLOCK INDEX

    Publication No.: US20220138173A1

    Publication date: 2022-05-05

    Application No.: US17579336

    Filing date: 2022-01-19

    Applicant: SAP SE

    Abstract: A system and method for faster access for compressed time series data. A set of blocks is generated based on a table stored in a database of the data platform. The table stores data associated with multiple sources of data provided as consecutive values, each block containing index vectors having a range of the consecutive values. A block index is generated for each block having a field start vector representing a starting position of the block relative to the range of consecutive values, and a starting value vector representing a value of the block at the starting position. The field start vector of the block index is accessed to obtain the starting position of a field corresponding to a first block and to the range of the consecutive values of the first block. The starting value vector is then determined from the block index to determine an end and a length of the field of the first block.
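The abstract leaves the exact layout open, so the following is only a rough sketch of a lookup through a block index. It assumes that the field start vector is a sorted list of block starting positions, that the starting value vector holds one value per block, and that a block ends where the next one begins. The class name `BlockIndex` and the `total_length` parameter are hypothetical.

```python
from bisect import bisect_right


class BlockIndex:
    def __init__(self, field_starts, start_values, total_length):
        self.field_starts = field_starts  # sorted starting position of each block
        self.start_values = start_values  # value of each block at its starting position
        self.total_length = total_length  # number of positions covered overall

    def locate(self, position):
        """Return (block number, start, length, start value) for a position."""
        block = bisect_right(self.field_starts, position) - 1
        start = self.field_starts[block]
        if block + 1 < len(self.field_starts):
            end = self.field_starts[block + 1]
        else:
            end = self.total_length
        return block, start, end - start, self.start_values[block]


index = BlockIndex(field_starts=[0, 4, 10], start_values=[100, 250, 90], total_length=12)
hit = index.locate(5)
```

The binary search means a position can be resolved to its block without scanning or decompressing the preceding blocks, which is the kind of faster access the title refers to.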

    Linear run length encoding: compressing the index vector

    Publication No.: US11238023B2

    Publication date: 2022-02-01

    Application No.: US16715677

    Filing date: 2019-12-16

    Applicant: SAP SE

    IPC classification: G06F16/22 G06F16/2458

    Abstract: A system and method include storing a table of time series data in a database of a data platform, the table of time series data representing a set of time series blocks. Each time series block of the set of time series blocks has a time series of equally-incremented time intervals and a run length. Each time interval of the time series is associated with one or more values. The run length has a starting position with at least one starting value and an ending position with at least one ending value. The starting position and the at least one starting value are stored for each time series block in a column store of the database. Then, a compressed index is generated in the column store of the database for each time series block, the compressed index comprising the starting position and the at least one starting value.
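As an illustration of the general idea, the sketch below compresses an integer vector into runs that each change by a constant step, storing only the starting position, starting value, and step of every run. This is a generic linear run-length encoding written under that assumption, not the patented layout; the function names and the tuple format are hypothetical.

```python
def linear_rle_encode(values):
    """Return runs as (start_position, start_value, step) tuples."""
    runs = []
    i, n = 0, len(values)
    while i < n:
        # The step of a run is fixed by its first two elements.
        step = values[i + 1] - values[i] if i + 1 < n else 0
        j = i + 1
        while j < n and values[j] - values[j - 1] == step:
            j += 1
        runs.append((i, values[i], step))
        i = j
    return runs


def linear_rle_decode(runs, length):
    out = []
    for k, (position, value, step) in enumerate(runs):
        # A run ends where the next one starts, or at the end of the vector.
        end = runs[k + 1][0] if k + 1 < len(runs) else length
        out.extend(value + step * m for m in range(end - position))
    return out


original = [10, 11, 12, 13, 50, 50, 50, 7]
runs = linear_rle_encode(original)
restored = linear_rle_decode(runs, len(original))
```

A step of zero reduces this to classic run-length encoding, so constant stretches and evenly incremented stretches are handled by the same structure.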

    COMPRESSING TIME STAMP COLUMNS

    Publication No.: US20200057763A1

    Publication date: 2020-02-20

    Application No.: US16661993

    Filing date: 2019-10-23

    Applicant: SAP SE

    IPC classification: G06F16/2458 G06F16/22

    Abstract: Disclosed is a system and method for improving database memory consumption and performance using compression of time stamp columns. A number of time stamps of a time series is received. The time stamps have a start time, and are separated by an equal increment of time that defines an interval. The start time and interval are stored in a dictionary of a column store of a database. An index is generated in the column store of the database, the index having a number of index vectors. Using the index vectors, each time stamp of the number of time stamps can be calculated from the start time stored in the dictionary and the position in the time series based on the interval stored in the dictionary.
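The arithmetic behind this scheme is that time stamp number p equals the start time plus p times the interval. A minimal sketch follows. The function names are hypothetical, the column-store dictionary and index vectors are reduced to a plain tuple, and at least two equally spaced time stamps are assumed.

```python
from datetime import datetime, timedelta


def compress_timestamps(timestamps):
    """Reduce an equally spaced series to (start, interval, count)."""
    start = timestamps[0]
    interval = timestamps[1] - timestamps[0]
    for position, ts in enumerate(timestamps):
        if ts != start + position * interval:
            raise ValueError("time stamps are not equally spaced")
    return start, interval, len(timestamps)


def timestamp_at(start, interval, position):
    # Recompute a time stamp from its position instead of storing it.
    return start + position * interval


series = [datetime(2020, 1, 1) + timedelta(minutes=15 * i) for i in range(5)]
start, interval, count = compress_timestamps(series)
fourth = timestamp_at(start, interval, 3)
```

Storage in this sketch stays constant no matter how long the series is, since only the start, the interval, and the count are kept.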

    VALUE-ID-BASED SORTING IN COLUMN-STORE DATABASES

    Publication No.: US20180150494A1

    Publication date: 2018-05-31

    Application No.: US15363274

    Filing date: 2016-11-29

    Applicant: SAP SE

    IPC classification: G06F17/30

    Abstract: Innovations in performing sort operations for dictionary-compressed values of columns in a column-store database using value identifiers (“IDs”) are described. For example, a database system includes a data store and an execution engine. The data store stores values at positions of a column. A dictionary maps distinct values to corresponding value IDs. An inverted index stores, for each of the corresponding value IDs, a list of those of the positions that contain the associated distinct value. The execution engine processes a request to sort values at an input set of the positions and identify an output set of the positions for sorted values. In particular, the execution engine iterates through positions stored in the lists of the inverted index. For a given position, the execution engine checks if the given position is one of the input set and, if so, adds the given position to the output set.
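The iteration described in this abstract can be sketched as follows. The sketch assumes a sorted dictionary, so that ascending value IDs correspond to ascending values; the abstract does not state this, and the function names are hypothetical.

```python
def build_dictionary_and_index(column):
    # Sorted dictionary: value ID i is the i-th smallest distinct value (assumption).
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    # inverted_index[i] lists the positions that hold the value with ID i.
    inverted_index = [[] for _ in dictionary]
    for position, value in enumerate(column):
        inverted_index[value_id[value]].append(position)
    return dictionary, inverted_index


def sort_positions(inverted_index, input_positions):
    """Return the input positions ordered by the values stored at them."""
    wanted = set(input_positions)
    output = []
    for positions in inverted_index:  # ascending value ID
        for position in positions:
            if position in wanted:
                output.append(position)
                if len(output) == len(wanted):
                    return output  # every requested position has been placed
    return output


column = ["pear", "apple", "fig", "apple", "pear"]
dictionary, inverted_index = build_dictionary_and_index(column)
ordered = sort_positions(inverted_index, [0, 2, 3])
```

No value comparisons happen during the sort itself: the order comes entirely from walking the inverted index in value-ID order and filtering by membership in the input set.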

    Faster access for compressed time series data: the block index

    Publication No.: US11892999B2

    Publication date: 2024-02-06

    Application No.: US17579336

    Filing date: 2022-01-19

    Applicant: SAP SE

    Abstract: A system and method for faster access for compressed time series data. A set of blocks is generated based on a table stored in a database of the data platform. The table stores data associated with multiple sources of data provided as consecutive values, each block containing index vectors having a range of the consecutive values. A block index is generated for each block having a field start vector representing a starting position of the block relative to the range of consecutive values, and a starting value vector representing a value of the block at the starting position. The field start vector of the block index is accessed to obtain the starting position of a field corresponding to a first block and to the range of the consecutive values of the first block. The starting value vector is then determined from the block index to determine an end and a length of the field of the first block.