Multi-partition operation in combination operations

    公开(公告)号:US11151137B2

    公开(公告)日:2021-10-19

    申请号:US15713976

    申请日:2017-09-25

    Applicant: Splunk Inc.

    Abstract: In an environment where multiple datasets are to be combined, systems and methods are disclosed for allocating a group of data entries from at least one dataset into multiple partitions. For a particular partition, the subgroup in the partition can be combined with data entries from the other dataset. In some cases, groups of data entries from each dataset are assigned to different partitions. For a particular partition, a subgroup is duplicated, some of the data entries of the subgroup are reassigned to other partitions, the subgroup is reformed to include data entries from other partitions, and the reformed subgroup is combined with the subgroup from the other dataset(s).

    Dynamic resource allocation for common storage query

    公开(公告)号:US10795884B2

    公开(公告)日:2020-10-06

    申请号:US15665302

    申请日:2017-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for processing queries against a common storage utilizing dynamically allocated partitions operating on one or more worker nodes. The common storage can include one or more data stores, which collectively contain a data set divided across multiple buckets of data. To query the common storage, a query coordinator can retrieve metadata regarding the multiple buckets, in order to determine a subset of buckets that are potentially relevant to a query. The query coordinator can then dynamically allocate partitions operating on worker nodes to retrieve and intake individual buckets of the subset into a phased search process. The dynamic allocation can be selected to maximize parallelization of the buckets across partitions, thus increasing a speed at which the common storage can be searched.

    Query processing using query-resource usage and node utilization data

    公开(公告)号:US10726009B2

    公开(公告)日:2020-07-28

    申请号:US15665148

    申请日:2017-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for processing queries against one or more dataset sources. The system tracks query resource data and resource utilization data. The query-resource usage data can indicate resources used to execute queries. The node resource utilization data can indicate current utilization of nodes in the system. Upon receipt of a query that identifies a set of data to be processed and a manner of processing the set of data, the system can use the query-resource usage data and the resource utilization data to define a query processing scheme. The query can then be executed using the query processing scheme. In some cases, the query coordinator can dynamically allocate partitions operating on worker nodes to execute the query.

    Determining a Record Generation Estimate of a Processing Task

    公开(公告)号:US20190258632A1

    公开(公告)日:2019-08-22

    申请号:US16397930

    申请日:2019-04-29

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for determining a record generation estimate related to a particular processing task. The system obtains a sample set of data that includes multiple records. The system applies a processing task, such as a transform or regular expression rule to the sample set of data and determines how many records are generated by the processing task. Based on the number of records generated, the system determines a record generation estimate. The system can use the record generation estimate to allocate compute resources or determine a query execution time for at least a portion of the query based on the record generation estimate.

Patent Agency Ranking