Invention Grant
- Patent Title: Managing selection of a representative data subset according to user-specified parameters with clustering
- Application No.: US15421406
- Application Date: 2017-01-31
- Publication No.: US10585910B1
- Publication Date: 2020-03-10
- Inventor: R. David Carasso, Micah James Delfino
- Applicant: Splunk Inc.
- Applicant Address: US CA San Francisco
- Assignee: SPLUNK INC.
- Current Assignee: SPLUNK INC.
- Current Assignee Address: US CA San Francisco
- Agency: Perkins Coie LLP
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/25; G06F7/24; G06F3/0482; G06F16/28; G06F16/904; G06F3/0488

Abstract:
Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, such as a data source and one or more desired subset types: latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether the number of clusters and/or cluster types generated exceeds at least one threshold; when it does not, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
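
The iterative clustering described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how records are clustered, so the `signature` function (which collapses digit runs into a placeholder), the batch size, the cluster threshold, and the rule that clusters of size one are outliers are all assumptions made for illustration, and every name in it is hypothetical.

```python
import re
from collections import defaultdict
from itertools import islice


def signature(record: str) -> str:
    # Assumed cluster key: records that differ only in their numbers
    # share a signature. The abstract does not specify this rule.
    return re.sub(r"\d+", "#", record)


def representative_subset(records, batch_size=4, cluster_threshold=2,
                          outlier_max_size=1):
    """Cluster records batch by batch until the number of clusters
    exceeds cluster_threshold or the source is exhausted, then pick
    diverse and outlier records from the resulting clusters."""
    source = iter(records)
    clusters = defaultdict(list)
    while True:
        batch = list(islice(source, batch_size))
        if not batch:
            break  # no more records to cluster
        for record in batch:
            clusters[signature(record)].append(record)
        if len(clusters) > cluster_threshold:
            break  # enough clusters; stop pulling additional records
    # Diverse: one exemplar from each sufficiently large cluster.
    diverse = [members[0] for members in clusters.values()
               if len(members) > outlier_max_size]
    # Outliers: every record that sits in a small cluster.
    outliers = [record for members in clusters.values()
                if len(members) <= outlier_max_size
                for record in members]
    return diverse, outliers


logs = [
    "login ok user=17",
    "login ok user=42",
    "login ok user=99",
    "login ok user=3",
    "disk full on /dev/sda1",
    "login ok user=8",
    "timeout after 30s",
    "login ok user=5",
    "never read",
]
diverse, outliers = representative_subset(logs)
print(diverse)
print(outliers)
```

In this example the first batch of four records yields a single cluster, which does not exceed the threshold, so a second batch is clustered. That batch adds two more clusters and the loop stops, leaving the final record unread.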