Abstract:
Systems and methods are provided for identifying network addresses and/or IDs of a deduplicated list among network data, machine data, and/or events derived from network data and/or machine data, and for identifying notable events by searching for the presence of network addresses and/or network IDs that are deduplicated across lists received from multiple external sources. One method includes receiving a plurality of lists of network locations, wherein each list is received from over a network, wherein each of the network locations includes a domain name or an IP address, and wherein at least two of the plurality of lists each include a same network location; aggregating the plurality of lists of network locations into a deduplicated list of unique network locations; and searching network data or machine data for a network location included in the deduplicated list of unique network locations.
Abstract:
The technology disclosed relates to formulating and refining field extraction rules that are used at query time on raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets not organized into relational structures that have not been processed by standard extraction or transformation methods. By using sample events, a focus on primary and secondary example events help formulate either a single extraction rule spanning multiple data formats, or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules, and to identify negative examples to avoid mistaken value selection. The extraction rules can be saved for query-time use, and can be incorporated into a data model for sets and subsets of event data.
Abstract:
Methods and apparatus consistent with the invention provide the ability to organize, index, search, and present time series data based on searches. Time series data are sequences of time stamped records occurring in one or more usually continuous streams, representing some type of activity. In one embodiment, time series data is organized into discrete events with normalized time stamps and the events are indexed by time and keyword. A search is received and relevant event information is retrieved based in whole or in part on the time indexing mechanism, keyword indexing mechanism, or statistical indices calculated at the time of the search.
Abstract:
Methods and apparatus consistent with the invention provide the ability to organize and build understandings of machine data generated by a variety of information-processing environments. Machine data is a product of information-processing systems (e.g., activity logs, configuration files, messages, database records) and represents the evidence of particular events that have taken place and been recorded in raw data format. In one embodiment, machine data is turned into a machine data web by organizing machine data into events and then linking events together.
Abstract:
Embodiments are directed towards real time display of event records and extracted values based on at least one extraction rule, such as a regular expression. A user interface may be employed to enable a user to have an extraction rule automatically generate and/or to manually enter an extraction rule. The user may be enabled to manually edit a previously provided extraction rule, which may result in real time display of updated extracted values. The extraction rule may be utilized to extract values from each of a plurality of records, including event records of unstructured machine data. Statistics may be determined for each unique extracted value, and may be displayed to the user in real time. The user interface may also enable the user to select at least one unique extracted value to display those event records that include an extracted value that matches the selected value.
Abstract:
Embodiments include generating data models that may give semantic meaning for unstructured or structured data that may include data generated and/or received by search engines, including a time series engine. A method includes generating a data model for data stored in a repository. Generating the data model includes generating an initial query string, executing the initial query string on the data, generating an initial result set based on the initial query string being executed on the data, determining one or more candidate fields from one or results of the initial result set, generating a candidate data model based on the one or more candidate fields, iteratively modifying the candidate data model until the candidate data model models the data, and using the candidate data model as the data model.
Abstract:
Methods and apparatus consistent with the invention provide the ability to organize and build understandings of machine data generated by a variety of information-processing environments. Machine data is a product of information-processing systems (e.g., activity logs, configuration files, messages, database records) and represents the evidence of particular events that have taken place and been recorded in raw data format. In one embodiment, machine data is turned into a machine data web by organizing machine data into events and then linking events together.
Abstract:
Embodiments are directed towards a system and method for a cloud-based front end that may abstract and enable access to the underlying cloud-hosted elements and objects that may be part of a multi-tenant application, such as a search application. Search objects may be employed to access indexed objects. An amount of indexed data accessible to a user may be based on an index storage limit selected by the user, such that data that exceeds the index storage limit may continue to be indexed. Also, one or more projects can be elastically scaled for a user to provide resources that may meet the specific needs of each project.
Abstract:
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
Abstract:
The disclosed embodiments relate to a system for monitoring a virtual-machine environment. During operation, the system identifies a parent and a set of two or more child components that are related to the parent component in the virtual-machine environment. Next, the system determines a performance metric for each child component in the set of two or more child components. The system then determines a child-component performance state for each child component in the set of two or more child components based on the performance metric for the child component and a child-component state criterion. Finally, the system determines a parent state for the parent component based on the child-component performance state for each child component in the set of two or more child components and a parent-component state criterion, wherein the parent-component state criterion includes a threshold percentage or number of child components that have a specified state.