GENERATING A SUBQUERY FOR AN EXTERNAL DATA SYSTEM USING A CONFIGURATION FILE

    Publication No.: US20190147086A1

    Publication Date: 2019-05-16

    Application No.: US16147165

    Filing Date: 2018-09-28

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for receiving, at a data intake and query system, a query that includes an indication to process data managed by a third-party data storage and processing system that supports a different query language than the data intake and query system. The data intake and query system identifies a third-party data storage and processing system that manages the data to be processed and generates a subquery for execution by the third-party data storage and processing system, generates instructions for one or more worker nodes to receive and process results of the subquery from the third-party data storage and processing system, and instructs the worker nodes to provide results of the processing to the data intake and query system.

    EXECUTION OF A QUERY RECEIVED FROM A DATA INTAKE AND QUERY SYSTEM

    Publication No.: US20190138642A1

    Publication Date: 2019-05-09

    Application No.: US16051310

    Filing Date: 2018-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for receiving and executing a query received from a data intake and query system and providing results to a first group of worker nodes in a distributed execution environment. The query identifies a set of data to be processed and a manner of processing the set of data. Based on the query, the system defines a query processing scheme, and generates instructions for a second group of worker nodes to obtain the set of data from one or more dataset sources and to process the set of data. The system communicates results of the query to the first group of worker nodes.

    MULTI-PARTITION OPERATION IN COMBINATION OPERATIONS

    Publication No.: US20190095493A1

    Publication Date: 2019-03-28

    Application No.: US15713976

    Filing Date: 2017-09-25

    Applicant: Splunk Inc.

    Abstract: In an environment where multiple datasets are to be combined, systems and methods are disclosed for allocating a group of data entries from at least one dataset into multiple partitions. For a particular partition, the subgroup in the partition can be combined with data entries from the other dataset. In some cases, groups of data entries from each dataset are assigned to different partitions. For a particular partition, a subgroup is duplicated, some of the data entries of the subgroup are reassigned to other partitions, the subgroup is reformed to include data entries from other partitions, and the reformed subgroup is combined with the subgroup from the other dataset(s).
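
    Illustrative sketch (not taken from the patent): the simplest case described above, where data entries from two datasets are assigned to partitions and each partition's subgroups are combined independently, resembles a hash-partitioned join. The function names, the dictionary row format, and the choice of hash partitioning are assumptions made for illustration. The duplication and reassignment steps mentioned in the abstract are not modeled.

```python
from collections import defaultdict


def partition_by_key(rows, key, num_partitions):
    """Assign each row to a partition by hashing its join key (assumed scheme)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def partitioned_join(left, right, key, num_partitions=4):
    """Combine two datasets one partition at a time."""
    left_parts = partition_by_key(left, key, num_partitions)
    right_parts = partition_by_key(right, key, num_partitions)
    joined = []
    for pid in range(num_partitions):
        # Index this partition's right-side subgroup by join key.
        index = defaultdict(list)
        for r in right_parts.get(pid, []):
            index[r[key]].append(r)
        # Combine each left-side entry with its matching right-side entries.
        for l in left_parts.get(pid, []):
            for r in index.get(l[key], []):
                joined.append({**l, **r})
    return joined


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}]
result = partitioned_join(left, right, "id")
```

    Because rows with equal keys always hash to the same partition, each partition could in principle be combined on a separate worker without coordination.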

    GENERATING A DISTRIBUTED EXECUTION MODEL WITH UNTRUSTED COMMANDS

    Publication No.: US20190095491A1

    Publication Date: 2019-03-28

    Application No.: US15714424

    Filing Date: 2017-09-25

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for generating a distributed execution model with untrusted commands. The system can receive a query, and process the query to identify the untrusted commands. The system can use data associated with the untrusted command to identify one or more files associated with the untrusted command. Based on the files, the system can generate a data structure and include one or more identifiers associated with the data structure in the distributed execution model. The system can distribute the distributed execution model to one or more nodes in a distributed computing environment for execution.

    DYNAMIC RESOURCE ALLOCATION FOR REAL-TIME SEARCH

    Publication No.: US20180089324A1

    Publication Date: 2018-03-29

    Application No.: US15665339

    Filing Date: 2017-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for utilizing an ingested data buffer operating according to a publish-subscribe messaging model as an intake mechanism for a query system. Data from various sources can be placed into the data buffer according to different topics. Indexers can subscribe to these topics in order to ingest the data into the system for long-term storage and later search. In addition, worker nodes may directly subscribe to the topics to enable continuous or streaming searching of the data, without delays that may be caused by ingestion of the data at an indexer. When a request for a streaming search is received, a query coordinator can determine a number of message queues on the data buffer that contain potentially relevant messages. The query coordinator can then dynamically allocate partitions operating on worker nodes to retrieve and intake messages from the message queues into a phased search process.

    QUERY PROCESSING USING QUERY-RESOURCE USAGE AND NODE UTILIZATION DATA

    Publication No.: US20180089269A1

    Publication Date: 2018-03-29

    Application No.: US15665148

    Filing Date: 2017-07-31

    Applicant: Splunk Inc.

    CPC classification number: G06F16/24542 G06F16/24554 G06F16/258

    Abstract: Systems and methods are disclosed for processing queries against one or more dataset sources. The system tracks query-resource usage data and node resource utilization data. The query-resource usage data can indicate resources used to execute queries. The node resource utilization data can indicate current utilization of nodes in the system. Upon receipt of a query that identifies a set of data to be processed and a manner of processing the set of data, the system can use the query-resource usage data and the node resource utilization data to define a query processing scheme. The query can then be executed using the query processing scheme. In some cases, the query coordinator can dynamically allocate partitions operating on worker nodes to execute the query.

    DYNAMIC RESOURCE ALLOCATION FOR COMMON STORAGE QUERY

    Publication No.: US20180089262A1

    Publication Date: 2018-03-29

    Application No.: US15665302

    Filing Date: 2017-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are disclosed for processing queries against a common storage utilizing dynamically allocated partitions operating on one or more worker nodes. The common storage can include one or more data stores, which collectively contain a data set divided across multiple buckets of data. To query the common storage, a query coordinator can retrieve metadata regarding the multiple buckets, in order to determine a subset of buckets that are potentially relevant to a query. The query coordinator can then dynamically allocate partitions operating on worker nodes to retrieve and intake individual buckets of the subset into a phased search process. The dynamic allocation can be selected to maximize parallelization of the buckets across partitions, thus increasing a speed at which the common storage can be searched.
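
    Illustrative sketch (not taken from the patent): the flow described above, in which bucket metadata is used to select potentially relevant buckets and those buckets are then spread across partitions, might look like the following. The `BucketMeta` fields, the time-range relevance test, and the round-robin assignment are all assumptions chosen for illustration. The abstract does not specify what the metadata contains or how partitions are allocated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketMeta:
    """Hypothetical bucket metadata: an identifier and the time span it covers."""
    bucket_id: str
    earliest: int
    latest: int


def relevant_buckets(buckets, start, end):
    """Keep buckets whose time span overlaps the query window [start, end]."""
    return [b for b in buckets if b.earliest <= end and b.latest >= start]


def allocate(buckets, num_partitions):
    """Spread buckets across partitions round-robin so they can be read in parallel."""
    assignments = {p: [] for p in range(num_partitions)}
    for i, bucket in enumerate(buckets):
        assignments[i % num_partitions].append(bucket.bucket_id)
    return assignments


buckets = [
    BucketMeta("b1", 0, 10),
    BucketMeta("b2", 11, 20),
    BucketMeta("b3", 21, 30),
]
subset = relevant_buckets(buckets, start=5, end=15)
plan = allocate(subset, num_partitions=2)
```

    Here only `b1` and `b2` overlap the query window, so `b3` is never retrieved, and the two remaining buckets land on different partitions.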

    EXTERNAL DATASET CAPABILITY COMPENSATION

    Publication No.: US20180089259A1

    Publication Date: 2018-03-29

    Application No.: US15665248

    Filing Date: 2017-07-31

    Applicant: Splunk Inc.

    CPC classification number: G06F16/2425 G06F16/2282

    Abstract: Systems and methods are disclosed for processing queries against an external data source utilizing dynamically allocated partitions operating on one or more worker nodes. The external data source can include data that has not been processed by the system. To query the external data source, a query coordinator can generate a subquery for the external data source based on determined functionality of the data source. The subquery can identify data in the external data source for processing and a manner for processing the data. In addition, the query coordinator can dynamically allocate partitions operating on worker nodes to retrieve and intake results of the subquery. In some cases, the number of partitions allocated can be based on a number of partitions supported by the external data source.

    Reassigning processing tasks to an external storage system

    Publication No.: US12248484B2

    Publication Date: 2025-03-11

    Application No.: US16657894

    Filing Date: 2019-10-18

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for reducing execution time of a query that references external data systems. The system can determine that an external data system is capable of processing one or more map or reduce phases of a map-reduce operation. When it is determined that the external data system can process a map or reduce phase, associated operations may be reassigned from the system to the external data system, reducing the processing resources used by the system to respond to the query and, in some cases, speeding up performance of the query.
