ADVANCED FIELD EXTRACTOR WITH MULTIPLE POSITIVE EXAMPLES
    11.
    Invention application
    Legal status: in force

    Publication number: US20150149879A1

    Publication date: 2015-05-28

    Application number: US14610668

    Filing date: 2015-01-30

    Applicant: Splunk Inc.

    CPC classification number: G06F17/243 G06F17/30551

    Abstract: The technology disclosed relates to formulating and refining field extraction rules that are used at query time on raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets that are not organized into relational structures and have not been processed by standard extraction or transformation methods. By using sample events, a focus on primary and secondary example events helps formulate either a single extraction rule spanning multiple data formats, or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules, and to identify negative examples to avoid mistaken value selection. The extraction rules can be saved for query-time use, and can be incorporated into a data model for sets and subsets of event data.
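
The role of positive and negative examples described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function `check_rule`, the named group `value`, and the sample log lines are all hypothetical, and a regular expression is assumed as the form of the extraction rule.

```python
import re

def check_rule(pattern, positives, negatives):
    """Check a candidate extraction rule against marked-up examples.

    positives: (event_text, value_the_rule_should_extract) pairs
    negatives: (event_text, value_the_rule_must_not_extract) pairs
    Returns (ok, problems). All names here are illustrative assumptions.
    """
    rule = re.compile(pattern)
    problems = []
    for event, expected in positives:
        m = rule.search(event)
        got = m.group("value") if m else None
        if got != expected:
            problems.append(("missed positive", event, got))
    for event, forbidden in negatives:
        m = rule.search(event)
        if m and m.group("value") == forbidden:
            problems.append(("matched negative", event, forbidden))
    return (not problems, problems)

# Hypothetical events: the same "status" field in two field orders.
positives = [
    ("2015-01-30 GET /index.html status=200 bytes=512", "200"),
    ("2015-01-30 POST /login status=404 bytes=0", "404"),
]
negatives = [
    # Here the byte count comes first and must not be taken as the status.
    ("2015-01-30 GET /index.html bytes=512 status=200", "512"),
]

loose_rule = r"=(?P<value>\d+)"          # first number after any "="
strict_rule = r"status=(?P<value>\d+)"   # anchored on the field name

loose_ok, loose_problems = check_rule(loose_rule, positives, negatives)
strict_ok, strict_problems = check_rule(strict_rule, positives, negatives)
```

In this sketch the loose rule satisfies both positive examples but is rejected by the negative example, while the rule anchored on the field name passes all three.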

    Sampling of events to use for developing a field-extraction rule for a field to use in event searching
    12.
    Granted invention patent
    Legal status: in force

    Publication number: US09031955B2

    Publication date: 2015-05-12

    Application number: US14168888

    Filing date: 2014-01-30

    Applicant: Splunk Inc.

    Abstract: Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
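
The iterate-until-enough-clusters idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how records are clustered, so a crude format signature stands in for a real clustering step, and `signature`, `diverse_sample`, `batch_size`, `min_clusters`, and `max_batches` are hypothetical names. Reaching `min_clusters` is treated here as meeting the threshold.

```python
import re
from collections import defaultdict
from itertools import islice

def signature(record):
    """Crude cluster key (an assumption): collapse digit and letter runs."""
    collapsed = re.sub(r"\d+", "0", record)
    return re.sub(r"[A-Za-z]+", "a", collapsed)

def diverse_sample(records, batch_size=3, min_clusters=2, max_batches=10):
    """Cluster records batch by batch until enough clusters exist,
    then return one representative record per cluster."""
    clusters = defaultdict(list)
    stream = iter(records)
    for _ in range(max_batches):
        batch = list(islice(stream, batch_size))
        if not batch:
            break  # the data source is exhausted
        for record in batch:
            clusters[signature(record)].append(record)
        if len(clusters) >= min_clusters:
            break  # threshold met, no additional clustering needed
    return [members[0] for members in clusters.values()]

# Hypothetical unstructured records in two formats.
records = [
    "10:00:01 GET /a 200",
    "10:00:02 GET /b 404",
    "10:00:03 POST /c 500",
    "user=alice action=login",
    "user=bob action=logout",
]
sample = diverse_sample(records)
```

With these inputs the first batch yields a single cluster, so a second batch is pulled; the result holds one record from each of the two formats.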

    Variable representative sampling under resource constraints
    13.
    Granted invention patent
    Legal status: in force

    Publication number: US08751499B1

    Publication date: 2014-06-10

    Application number: US13747153

    Filing date: 2013-01-22

    Applicant: Splunk Inc.

    Abstract: Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.

    Event selection via graphical user interface control

    Publication number: US11651149B1

    Publication date: 2023-05-16

    Application number: US17874046

    Filing date: 2022-07-26

    Applicant: SPLUNK Inc.

    CPC classification number: G06F40/174 G06F16/2477

    Abstract: The technology disclosed relates to formulating and refining field extraction rules that are used at query time on raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets that are not organized into relational structures and have not been processed by standard extraction or transformation methods. By using sample events, a focus on primary and secondary example events helps formulate either a single extraction rule spanning multiple data formats, or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules, and to identify negative examples to avoid mistaken value selection. The extraction rules can be saved for query-time use, and can be incorporated into a data model for sets and subsets of event data.

    Selection of a representative data subset of a set of unstructured data

    Publication number: US11232124B2

    Publication date: 2022-01-25

    Application number: US16751063

    Filing date: 2020-01-23

    Applicant: SPLUNK INC.

    Abstract: Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.

    Automated extraction rule generation using a timestamp selector

    Publication number: US11106691B2

    Publication date: 2021-08-31

    Application number: US16394754

    Filing date: 2019-04-25

    Applicant: SPLUNK INC.

    Abstract: Embodiments are directed towards a graphical user interface that identifies locations within event records with splittable timestamp information. A display of event records is provided using any of a variety of formats. A splittable timestamp selector allows a user to select one or more locations within event records as having time-related information that may be split across the one or more locations, including information based on date, time of day, day of the week, or other time information. Any of a plurality of mechanisms is used to associate the selected locations with the split timestamp information, including tags, labels, or header information within the event records. In other embodiments, a separate table, list, index, or the like may be generated that associates the selected locations with the split timestamp information. The split timestamp information may be used within extraction rules for selecting subsets of the event records.

    Determining an extraction rule from positive and negative examples

    Publication number: US11042697B2

    Publication date: 2021-06-22

    Application number: US16589445

    Filing date: 2019-10-01

    Applicant: SPLUNK INC.

    Abstract: The technology disclosed relates to formulating and refining field extraction rules that are used at query time on raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets that are not organized into relational structures and have not been processed by standard extraction or transformation methods. By using sample events, a focus on primary and secondary example events helps formulate either a single extraction rule spanning multiple data formats, or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules, and to identify negative examples to avoid mistaken value selection. The extraction rules can be saved for query-time use, and can be incorporated into a data model for sets and subsets of event data.

    Determining events associated with a value

    Publication number: US10579648B2

    Publication date: 2020-03-03

    Application number: US15582668

    Filing date: 2017-04-29

    Applicant: SPLUNK, Inc.

    Abstract: Embodiments are directed towards real-time display of event records and extracted values based on at least one extraction rule, such as a regular expression. A user interface may be employed to enable a user to have an extraction rule automatically generated and/or to manually enter an extraction rule. The user may be enabled to manually edit a previously provided extraction rule, which may result in real-time display of updated extracted values. The extraction rule may be utilized to extract values from each of a plurality of records, including event records of unstructured machine data. Statistics may be determined for each unique extracted value, and may be displayed to the user in real time. The user interface may also enable the user to select at least one unique extracted value to display those event records that include an extracted value that matches the selected value.
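
The flow described in this abstract (apply a regular-expression rule to each record, compute statistics per unique value, then show the records for a selected value) can be sketched roughly as below. The function names, the sample events, and the choice of a simple count as the statistic are assumptions for illustration; the user-interface and real-time aspects are not modeled.

```python
import re
from collections import Counter

def extract_values(rule, events):
    """Apply the extraction rule to each event.

    Returns (event, value) pairs; value is None when the rule does not
    match. The first capture group is assumed to hold the field value.
    """
    pattern = re.compile(rule)
    pairs = []
    for event in events:
        match = pattern.search(event)
        pairs.append((event, match.group(1) if match else None))
    return pairs

def value_counts(pairs):
    """Count occurrences of each unique extracted value."""
    return Counter(value for _, value in pairs if value is not None)

def events_with_value(pairs, selected):
    """Return the events whose extracted value equals the selected one."""
    return [event for event, value in pairs if value == selected]

# Hypothetical event records of unstructured machine data.
events = [
    "GET /a status=200",
    "GET /b status=404",
    "GET /c status=200",
    "malformed line",
]
pairs = extract_values(r"status=(\d+)", events)
counts = value_counts(pairs)
matching = events_with_value(pairs, "404")
```

Editing the rule string and rerunning `extract_values` corresponds, in this simplified model, to the updated display of extracted values after a manual edit.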