Publication No.: US10528533B2
Publication Date: 2020-01-07
Application No.: US15428523
Filing Date: 2017-02-09
Applicant: Adobe Inc.
Inventor: Shiv Kumar Saini , Trevor Paulsen , Moumita Sinha , Gaurush Hiranandani
IPC: G06F16/00 , G06F16/215
Abstract: Techniques are disclosed for identifying anomalies in small data sets using a Generalized Extreme Studentized Deviate test (GESD test). In an embodiment, a data set, such as business data or a website metric, is checked for skewness and, if found to be skewed, is transformed to approximate a normal distribution (e.g., by applying a Box-Cox transformation). The data set is also checked for the presence of a trend and, if one is found, the trend is removed (e.g., by running a linear regression). In one embodiment, a maximum number of anomalies is estimated for the data set by applying an adjusted box plot to the data set. The data set and the estimated number of anomalies are then run through a GESD test, which identifies anomalous data points in the data set based on the provided estimate. In an embodiment, a confidence interval is generated for the identified anomalies.
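
The GESD step described in the abstract can be illustrated with a minimal Python sketch. It follows the textbook form of the test (Rosner's procedure), not necessarily the patented implementation. The function names `t_cdf`, `t_ppf`, and `gesd`, the significance level `alpha=0.05`, and the sample data are all assumptions made for illustration. The skewness transformation, trend removal, and adjusted box plot steps are omitted, so `max_anomalies` is supplied by hand here instead of being estimated. The Student's t quantile is approximated numerically with the standard library only, which is adequate for moderate degrees of freedom but is not a substitute for a statistics library.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """Student's t CDF for t >= 0, using Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Student's t quantile for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gesd(values, max_anomalies, alpha=0.05):
    """Return indices of anomalous points according to the GESD test.

    Assumes the values are roughly normal and trend-free, and that
    len(values) - max_anomalies - 1 is at least 1.
    """
    n = len(values)
    remaining = list(enumerate(values))
    removed = []
    n_anomalies = 0
    for i in range(1, max_anomalies + 1):
        xs = [v for _, v in remaining]
        mean = statistics.fmean(xs)
        sd = statistics.stdev(xs)
        if sd == 0:
            break
        # Test statistic R_i: largest studentized deviation among remaining points.
        idx, val = max(remaining, key=lambda iv: abs(iv[1] - mean))
        r_i = abs(val - mean) / sd
        # Critical value lambda_i from the t distribution.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(idx)
        remaining.remove((idx, val))
        if r_i > lam:
            n_anomalies = i  # keep the largest i whose statistic exceeds lambda_i
    return removed[:n_anomalies]


# Hypothetical daily metric with one obvious spike.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0, 10.1, 9.9, 10.0, 10.2]
print(gesd(data, max_anomalies=2))
```

The number of anomalies reported is the largest `i` for which the test statistic exceeds its critical value, which is why the loop always runs through all `max_anomalies` candidates before deciding. This is also why the abstract's estimate of the maximum number of anomalies matters: it sets the upper bound the test needs as input.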