Data field extraction by a data intake and query system

    Publication No.: US12205022B2

    Publication Date: 2025-01-21

    Application No.: US16945415

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    Abstract: Systems and methods are described for extracting data fields from logs ingested in a data processing pipeline or otherwise stored. For example, a log can be applied as an input to an artificial intelligence model trained to infer a log sourcetype of logs, and the artificial intelligence model can output an inferred log sourcetype of the log. The inferred log sourcetype can be used to select another artificial intelligence model trained to extract data fields from logs having the inferred log sourcetype, and the log can then be applied as an input to the other artificial intelligence model. The other artificial intelligence model may then output one or more data fields extracted from the log.
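The two-stage flow in this abstract (infer the sourcetype, then pick an extractor for that sourcetype) can be sketched as follows. This is only an illustration under stated assumptions: the patent describes trained artificial intelligence models, while the regex-based functions, the sourcetype names, and the field names here are hypothetical stand-ins chosen so the example runs on its own.

```python
import re
from typing import Callable, Dict


def infer_sourcetype(log: str) -> str:
    """Hypothetical stand-in for the first model: guess a sourcetype from raw text."""
    if re.match(r'^\S+ \S+ \S+ \[[^\]]+\] "', log):
        return "access_combined"
    if re.match(r"^\w{3}\s+\d+ \d{2}:\d{2}:\d{2} ", log):
        return "syslog"
    return "unknown"


def extract_access(log: str) -> Dict[str, str]:
    """Hypothetical stand-in for a per-sourcetype extraction model (web access logs)."""
    m = re.match(
        r'^(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3})',
        log,
    )
    return m.groupdict() if m else {}


def extract_syslog(log: str) -> Dict[str, str]:
    """Hypothetical stand-in for a per-sourcetype extraction model (syslog lines)."""
    m = re.match(
        r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<process>[^:\[]+)",
        log,
    )
    return m.groupdict() if m else {}


# The inferred sourcetype selects which extractor is applied to the log.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "access_combined": extract_access,
    "syslog": extract_syslog,
}


def extract_fields(log: str) -> Dict[str, str]:
    sourcetype = infer_sourcetype(log)
    extractor = EXTRACTORS.get(sourcetype)
    fields = extractor(log) if extractor else {}
    return {"sourcetype": sourcetype, **fields}
```

Calling `extract_fields` on a web access line would return the inferred sourcetype together with whatever fields the selected extractor found; a log with no matching extractor yields only the sourcetype.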

    Anomaly Detection System and Method for Implementing a Data Regularity Check and Adaptive Thresholding

    Publication No.: US20250028618A1

    Publication Date: 2025-01-23

    Application No.: US18222870

    Application Date: 2023-07-17

    Applicant: Splunk Inc.

    Abstract: Computerized methodologies are disclosed that are directed to detecting anomalies within a time-series data set. A first aspect of the anomaly detection process includes analyzing the regularity of the data points of the time-series data set and determining whether a data aggregation process is to be performed based on the regularity of the data points, which results in a time-series data set having data points occurring at regular intervals. A seasonality pattern may be determined for the time-series data set, where a silhouette score is computed to measure the quality of the fit of the seasonality pattern to the time-series data. The silhouette score may be compared to a threshold and based on the comparison, the seasonality pattern or a set of heuristics may be utilized in an anomaly detection process. When the seasonality pattern is utilized, the seasonality pattern may be utilized to generate thresholds indicating anomalous behavior.
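A minimal sketch of the steps named in this abstract: a regularity check, aggregation into regular intervals, and a silhouette score used to decide between a seasonality pattern and heuristics. The abstract does not give the exact procedure, so several details are assumptions: regularity is judged against the median gap with a hypothetical `tolerance`, data points are grouped by their position within a candidate `period` before scoring, and the decision `threshold` of 0.5 is arbitrary.

```python
from statistics import mean, median


def is_regular(timestamps, tolerance=0.1):
    """True if every gap is within `tolerance` of the median gap (needs >= 2 timestamps)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    typical = median(gaps)
    return all(abs(g - typical) <= tolerance * typical for g in gaps)


def aggregate(points, interval):
    """Average (timestamp, value) points into buckets that start every `interval` units."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % interval, []).append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def silhouette(values, period):
    """Mean silhouette score when points are grouped by phase (index modulo period)."""
    if period < 2:
        raise ValueError("period must be at least 2")
    labels = [i % period for i in range(len(values))]
    scores = []
    for i, v in enumerate(values):
        same = [abs(v - w) for j, w in enumerate(values)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # a singleton group scores 0 by convention
            continue
        a = mean(same)  # mean distance to points in the same phase group
        b = min(        # mean distance to the nearest other phase group
            mean([abs(v - w) for j, w in enumerate(values) if labels[j] == other])
            for other in set(labels) if other != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(scores)


def choose_strategy(score, threshold=0.5):
    """Use the seasonality pattern only when it fits well; otherwise fall back to heuristics."""
    return "seasonality" if score >= threshold else "heuristics"
```

For a series that alternates between two levels, a candidate period of 2 scores 1.0 and would select the seasonality path, while a mismatched period scores below zero and would fall back to heuristics.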

    Log sourcetype inference model training for a data intake and query system

    Publication No.: US11704490B2

    Publication Date: 2023-07-18

    Application No.: US16945448

    Application Date: 2020-07-31

    Applicant: Splunk Inc.

    CPC classification number: G06F40/284 G06F16/3347 G06F40/242 G06N5/04 G06N20/00

    Abstract: Systems and methods are described for training an artificial intelligence model to infer a log sourcetype of a log. For example, logs may have different log sourcetypes, and logs having the same log sourcetype may have different messagetypes. The artificial intelligence model may be a machine learning model, and can be trained using training data that includes logs with known log sourcetypes. Each log can be tokenized, filtered, converted into a vector, and applied to a machine learning model as an input to perform the training. The machine learning model may output an inferred log sourcetype, which can be compared with the known log sourcetype to update model parameters to improve the machine learning model accuracy. Once trained, the machine learning model may infer a log sourcetype of a log regardless of the messagetype of the log.
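The training loop described here (tokenize, filter, vectorize, infer, compare with the known sourcetype, update parameters) can be illustrated with a small multiclass perceptron. The abstract does not name a model type, so the perceptron, the digit-dropping filter, and the bag-of-words vector are all assumptions made for the sake of a runnable sketch.

```python
import re
from collections import Counter


def tokenize(log):
    """Split a log into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", log.lower())


def keep(token):
    """Assumed filter: drop tokens containing digits (IDs, ports, status codes)."""
    return token.isalpha()


def vectorize(log):
    """Sparse bag-of-words vector: token -> count."""
    return Counter(t for t in tokenize(log) if keep(t))


def predict(weights, vec):
    """Return the sourcetype whose weights give the highest score for this vector."""
    def score(label):
        return sum(weights[label].get(tok, 0.0) * n for tok, n in vec.items())
    return max(weights, key=score)


def train(samples, epochs=5):
    """samples: list of (log, known_sourcetype). Returns per-sourcetype weights."""
    weights = {label: {} for _, label in samples}
    for _ in range(epochs):
        for log, known in samples:
            vec = vectorize(log)
            inferred = predict(weights, vec)
            if inferred != known:  # compare inferred vs. known, then update parameters
                for tok, n in vec.items():
                    weights[known][tok] = weights[known].get(tok, 0.0) + n
                    weights[inferred][tok] = weights[inferred].get(tok, 0.0) - n
    return weights
```

Because the filter discards variable tokens such as numbers, two logs of the same sourcetype with different messagetypes can still share enough stable tokens to be classified together, which is the property the abstract highlights.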

    Conditional processing based on inferred sourcetypes

    Publication No.: US11106681B2

    Publication Date: 2021-08-31

    Application No.: US16175636

    Application Date: 2018-10-30

    Applicant: Splunk, Inc.

    Abstract: Messages of a first data stream may be accessed from an ingestion buffer in communication with a streaming data processor to receive data from the first data stream. At the streaming data processor and using an inference model, a sourcetype associated with one or more messages from the first data stream may be determined. The one or more messages may include a portion of machine data. Using the streaming data processor, a second data stream may be generated from the first data stream. The second data stream may include a subset of messages from the first data stream. A message of the subset of messages may be included in the second data stream based on a condition associated with the sourcetype for the message. At least one processing operation may be performed on at least one of the subset of messages from the second data stream.
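The conditional routing in this abstract can be sketched with generators: a second stream is derived from the first by keeping only messages whose inferred sourcetype satisfies a condition, and a processing operation is then applied to that subset. The inference function below is a hypothetical one-line stand-in for the inference model, and the `json`/`plain` sourcetype names and the JSON-parsing operation are assumptions for illustration only.

```python
import json
from typing import Callable, Iterable, Iterator, List


def infer_sourcetype(message: str) -> str:
    """Hypothetical stand-in for the inference model."""
    return "json" if message.lstrip().startswith("{") else "plain"


def second_stream(first_stream: Iterable[str],
                  condition: Callable[[str], bool]) -> Iterator[str]:
    """Yield the subset of messages whose inferred sourcetype meets the condition."""
    for message in first_stream:
        if condition(infer_sourcetype(message)):
            yield message


def parse_all(stream: Iterable[str]) -> List[dict]:
    """Example processing operation applied only to the second stream."""
    return [json.loads(message) for message in stream]


first = ['{"a": 1}', "plain text line", ' {"b": 2}']
parsed = parse_all(second_stream(first, lambda sourcetype: sourcetype == "json"))
```

Using a generator keeps the second stream lazy, so messages are filtered as they arrive rather than after the whole first stream has been buffered.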

    Exploratory data analysis system for generation of wildcards within log templates through log clustering and analysis thereof

    Publication No.: US12182174B1

    Publication Date: 2024-12-31

    Application No.: US18147639

    Application Date: 2022-12-28

    Applicant: SPLUNK Inc.

    Abstract: A search assistant engine is described that integrates with a data intake and query system and provides an intuitive user interface to assist a user in searching and evaluating indexed event data. Additionally, the search assistant engine provides logic to intelligently provide data to the user through the user interface, such as determining fields of events likely to be of interest based on determining a mutual information score for each field and determining groups of related fields based on determining a mutual information score for each field grouping. Some implementations utilize machine learning techniques in certain analyses, such as when clustering events and determining an event template for each cluster. Additionally, the search assistant engine may import terms or characters from user interaction into predetermined search query templates to generate a tailored search query for the user.
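As a rough illustration of scoring fields by mutual information, the sketch below computes the standard mutual information (in bits) between each field's values and a per-event label, then ranks the fields. The abstract does not say what each field is scored against, so the `cluster` label, the event dictionaries, and the `rank_fields` helper are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from paired observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_fields(events, label_key):
    """Rank every field except `label_key` by its mutual information with the label."""
    labels = [e[label_key] for e in events]
    fields = sorted({k for e in events for k in e if k != label_key})
    scores = {
        f: mutual_information([e.get(f) for e in events], labels) for f in fields
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical events: "status" tracks the label exactly, "host" is independent of it.
events = [
    {"status": 200, "host": "h1", "cluster": "a"},
    {"status": 200, "host": "h2", "cluster": "a"},
    {"status": 500, "host": "h1", "cluster": "b"},
    {"status": 500, "host": "h2", "cluster": "b"},
]
ranking = rank_fields(events, "cluster")
```

In this toy data, `status` carries one full bit of information about the label and `host` carries none, so `status` ranks first.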
