ADDRESSING MEMORY LIMITS FOR PARTITION TRACKING AMONG WORKER NODES

    Publication number: US20200065303A1

    Publication date: 2020-02-27

    Application number: US16657867

    Filing date: 2019-10-18

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for distributed processing of a query in a first query language utilizing a query execution engine intended for single-device execution. While distributed processing provides numerous benefits over single-device processing, distributed query execution engines can be significantly more difficult to develop than single-device engines. Embodiments of this disclosure enable the use of a single-device engine to support distributed processing, by dividing a query into multiple stages, each of which can be executed by multiple, concurrent executions of a single-device engine. Between stages, data can be shuffled between executions of the engine, such that individual executions of the engine are provided with a complete set of records needed to implement an individual stage. Because single-device engines can be significantly less difficult to develop, use of the techniques described herein can enable a distributed system to rapidly support multiple query languages.

    REASSIGNING PROCESSING TASKS TO AN EXTERNAL STORAGE SYSTEM

    Publication number: US20200050607A1

    Publication date: 2020-02-13

    Application number: US16657894

    Filing date: 2019-10-18

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for reducing execution time of a query that references external data systems. The system can determine that an external data system is capable of processing one or more map or reduce phases of a map-reduce operation. When it is determined that the external data system can process a map or reduce phase, associated operations may be reassigned from the system to the external data system, reducing the processing resources used by the system to respond to the query and, in some cases, speeding up performance of the query.

    Multi-phased data execution in a data processing system

    Publication number: US10545964B2

    Publication date: 2020-01-28

    Application number: US15419883

    Filing date: 2017-01-30

    Applicant: Splunk Inc.

    Abstract: The disclosed embodiments include a method performed by a data intake and query system. The method includes receiving a search query by a search head, defining a search process for applying the search query to indexers, delegating a first portion of the search process to indexers and a second portion of the search process to intermediary node(s) communicatively coupled to the search head and the indexers. The first portion can define a search scope for obtaining partial search results of the indexers and the second portion can define operations for combining the partial search results by the intermediary node(s) to produce a combination of the partial search results. The search head then receives the combination of the partial search results, and outputs final search results for the search query, where the final search results are based on the combination of the partial search results.

    DISTRIBUTING PARTIAL RESULTS FROM AN EXTERNAL DATA SYSTEM BETWEEN WORKER NODES

    Publication number: US20190147084A1

    Publication date: 2019-05-16

    Application number: US16051304

    Filing date: 2018-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for executing a query that includes an indication to process data managed by an external data system. The system identifies the external data system that manages the data to be processed and generates a subquery for the external data system indicating that the results of the subquery are to be sent to one worker node of multiple worker nodes. The system instructs the one worker node to distribute the results received from the external data system to multiple worker nodes for processing.

    MULTI-PHASED DATA EXECUTION IN A DATA PROCESSING SYSTEM

    Publication number: US20180218045A1

    Publication date: 2018-08-02

    Application number: US15419883

    Filing date: 2017-01-30

    Applicant: Splunk Inc.

    Abstract: The disclosed embodiments include a method performed by a data intake and query system. The method includes receiving a search query by a search head, defining a search process for applying the search query to indexers, delegating a first portion of the search process to indexers and a second portion of the search process to intermediary node(s) communicatively coupled to the search head and the indexers. The first portion can define a search scope for obtaining partial search results of the indexers and the second portion can define operations for combining the partial search results by the intermediary node(s) to produce a combination of the partial search results. The search head then receives the combination of the partial search results, and outputs final search results for the search query, where the final search results are based on the combination of the partial search results.

    Producing search results by aggregating messages from multiple search peers

    Publication number: US09942318B2

    Publication date: 2018-04-10

    Application number: US15334690

    Filing date: 2016-10-26

    Applicant: Splunk Inc.

    Abstract: Asynchronous processing of messages that are received from multiple servers is disclosed. An example method may include transmitting, by a computer system, a search request to a plurality of search peers of a data aggregation and analysis system. The method may further include receiving a plurality of sub-application layer protocol packets from the plurality of search peers. The method may further include parsing, by a first processing thread of the computer system, one or more sub-application layer protocol packets of the plurality of sub-application layer protocol packets, to produce an application layer message representing a partial response to the search request. The method may further include processing, by a second processing thread of the computer system, the application layer message to produce a memory data structure representing an aggregated response to the search request.

    RESOURCE ALLOCATION FOR MULTIPLE DATASETS
    Type: Patent application

    Publication number: US20180089258A1

    Publication date: 2018-03-29

    Application number: US15665187

    Filing date: 2017-07-31

    Applicant: Splunk Inc.

    CPC classification number: G06F16/2425 G06F16/2272 G06F16/24535

    Abstract: Systems and methods are disclosed for processing queries against multiple dataset sources. One dataset source can include indexers that index and store data. The system can receive a query that identifies a set of data to be processed and a manner of processing the set of data. The set of data can include a first dataset that is accessible by one or more indexers and a second dataset that is accessible by one or more other dataset sources. A query coordinator can define a query processing scheme for obtaining and processing the set of data that includes a dynamic allocation of multiple layers of partitions. The partitions can operate on multiple worker nodes. The query can then be executed based on the query processing scheme.
