Abstract:
Embodiments are directed towards automatically generating extraction rules for extracting fields from event records. An extraction rule application receives field data describing the fields to be extracted (including one or more examples) and a collection of event records that may be a representative sample set from a larger set of events records. The extraction rule application generates extraction rules based on the event records and the field data. These extraction rules may be ranked using a determined quality score. Quality scores for extraction rules may be determined based on various metrics related to the operation of the extraction rules and the resultant extracted values. Preferred extraction rules may be determined by ranking the extraction rules based on their quality scores. Also, natural language expressions may be used to create, edit, or modify extraction rules.
Abstract:
Embodiments are directed towards previewing results generated from indexing data raw data before the corresponding index data is added to an index store. Raw data may be received from a preview data source. After an initial set of configuration information may be established, the preview data may be submitted to an index processing pipeline. A previewing application may generate preview results based on the preview index data and the configuration information. The preview results may enable previewing how the data is being processed by the indexing application. If the preview results are not acceptable, the configuration information may be modified. The preview application enables modification of the configuration information until the generated preview results may be acceptable. If the configuration information is acceptable, the preview data may be processed and indexed in one or more index stores.
Abstract:
Embodiments are directed towards a graphical user interface to identify locations within event records with splittable timestamp information. A display of event records is provided using any of a variety of formats. A splittable timestamp selector allows a user to select one or more locations within event records as having time related information that may be split across the one or more locations, including, information based on date, time of day, day of the week, or other time information. Any of a plurality of mechanisms is used to associate the selected locations with the split timestamp information, including tags, labels, or header information within the event records. In other embodiments, a separate table, list, index, or the like may be generated that associates the selected locations with the split timestamp information. The split timestamp information may be used within extraction rules for selecting subsets of the event records.
Abstract:
Embodiments are directed towards automatically generating extraction rules for extracting fields from event records. An extraction rule application receives field data describing the fields to be extracted (including one or more examples) and a collection of event records that may be a representative sample set from a larger set of events records. The extraction rule application generates extraction rules based on the event records and the field data. These extraction rules may be ranked using a determined quality score. Quality scores for extraction rules may be determined based on various metrics related to the operation of the extraction rules and the resultant extracted values. Preferred extraction rules may be determined by ranking the extraction rules based on their quality scores. Also, natural language expressions may be used to create, edit, or modify extraction rules.
Abstract:
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.