-
Publication No.: US11138246B1
Publication Date: 2021-10-05
Application No.: US15194339
Filing Date: 2016-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Robert Mark Waugh
Abstract: Techniques for searching a corpus of textual data using probabilistic data structures are described herein. The corpus of textual data is indexed using the probabilistic data structure on a piece-by-piece basis and the pieces are combined so that the textual data can be searched. The search results are returned, indicating a likelihood that the data item is in the textual data.
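The abstract does not name the probabilistic data structure. A Bloom filter is one common choice that has the property described: each piece of text can be indexed separately, and the per-piece filters can then be combined (here by a bitwise OR) into one searchable filter. The Python sketch below makes several assumptions that do not come from the patent: it uses a Bloom filter, the sizing parameters (`num_bits`, `num_hashes`) are hypothetical, and words are found by lowercase whitespace tokenization. It is an illustration only, not the patented method.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits      # hypothetical filter size
        self.num_hashes = num_hashes  # hypothetical number of hash functions
        self.bits = 0

    def _positions(self, item: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: probably present (false positives possible).
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # Filters with the same size and hash count combine with a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def index_piece(text: str) -> BloomFilter:
    """Index one piece of the corpus, one filter entry per lowercase word."""
    bf = BloomFilter()
    for word in text.lower().split():
        bf.add(word)
    return bf


def index_corpus(pieces: list) -> BloomFilter:
    """Index piece by piece, then combine the per-piece filters."""
    combined = BloomFilter()
    for piece in pieces:
        combined = combined.union(index_piece(piece))
    return combined


if __name__ == "__main__":
    corpus = index_corpus(["Error connecting to database", "request served in 12 ms"])
    print(corpus.might_contain("database"))  # True: indexed words are never missed
```

A `False` answer rules a word out for the whole corpus. A `True` answer only means the word is probably present, which matches the abstract's wording about returning a likelihood.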
-
Publication No.: US10235417B1
Publication Date: 2019-03-19
Application No.: US14843850
Filing Date: 2015-09-02
Applicant: Amazon Technologies, Inc.
Inventor: Greg Sterin , Daniel Vassallo , Robert Mark Waugh , Emmanuel Pierre Devillard , Nitin Kesarwani , Hongqi Wang , Sheikh Naveed Zafar
IPC: G06F17/30
Abstract: A technology is provided for enabling a partitioned search to be performed on log events from multiple log streams that are stored by multiple hosts. A search query may be submitted to identify the log streams whose log events are to be searched and to indicate a time interval in which log events are to have occurred as indicated by the log events' time stamps. The multiple hosts may search stored log events in parallel and return a set of log-event search results satisfying the search query. A pagination token can be included with the set of log event search results. The pagination token may be used to resume the search if the multiple hosts were not able to completely finish searching the stored log events before the set of log-event search results had to be returned to prevent a timeout of a search client.
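The abstract leaves the search mechanics open. The sketch below assumes the following:

- Each host holds its log events as an in-memory list.
- A per-host event budget stands in for the time limit that prevents a client timeout.
- The pagination token is a JSON list of per-host offsets.

All names (`LogEvent`, `search`, `budget_per_host`) are hypothetical. The hosts are scanned in parallel with a thread pool.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEvent:
    stream: str
    timestamp: int
    message: str


def _scan_host(events, start_pos, streams, start, end, term, budget):
    """Scan one host's events from start_pos, examining at most `budget` events."""
    matches, pos = [], start_pos
    while pos < len(events) and pos - start_pos < budget:
        ev = events[pos]
        # Keep events from the requested streams, inside [start, end), containing term.
        if ev.stream in streams and start <= ev.timestamp < end and term in ev.message:
            matches.append(ev)
        pos += 1
    return matches, pos


def search(hosts, streams, start, end, term, budget_per_host, token: Optional[str] = None):
    """Search all hosts in parallel; return (matches, pagination_token or None)."""
    offsets = json.loads(token) if token else [0] * len(hosts)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda i: _scan_host(hosts[i], offsets[i], streams, start, end, term, budget_per_host),
            range(len(hosts)),
        ))
    matches = [ev for host_matches, _ in results for ev in host_matches]
    new_offsets = [pos for _, pos in results]
    finished = all(pos >= len(events) for pos, events in zip(new_offsets, hosts))
    # The token records where each host stopped so the next call can resume there.
    return matches, (None if finished else json.dumps(new_offsets))


if __name__ == "__main__":
    hosts = [
        [LogEvent("app", 1, "ERROR a"), LogEvent("app", 2, "ok"), LogEvent("app", 3, "ERROR b")],
        [LogEvent("db", 1, "ERROR c"), LogEvent("web", 2, "ERROR d")],
    ]
    token = None
    while True:
        page, token = search(hosts, {"app", "db"}, 0, 10, "ERROR", budget_per_host=2, token=token)
        print([ev.message for ev in page])
        if token is None:
            break
```

The client keeps calling `search` with the returned token until it receives `None`, so no single call has to finish the whole scan.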
-
Publication No.: US10756948B1
Publication Date: 2020-08-25
Application No.: US14830575
Filing Date: 2015-08-19
Applicant: Amazon Technologies, Inc.
Inventor: Robert Mark Waugh , Emmanuel Pierre Devillard , Daniel Vassallo , Nitin Kesarwani , Greg Sterin , Hongqi Wang
Abstract: A leader host obtains individual distributions of data sets ingested by individual hosts of a fleet of hosts over a domain. The leader host compiles the individual distributions over the domain to generate a compiled distribution. The leader host then partitions the domain based at least in part on the generated compiled distribution. These partitions of the partitioned domain are distributed to individual hosts of the fleet of hosts, which causes the individual hosts to process a portion of the distributed data set according to their respective partitions.
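Under stated assumptions, the three leader steps can be sketched as follows. The sketch assumes each host reports its distribution as a histogram over an integer-keyed domain (for example, timestamp buckets). It also assumes the leader aims for contiguous partitions of roughly equal weight. The histogram format and the equal-weight rule are both hypothetical; the abstract only says the partitioning is based at least in part on the compiled distribution.

```python
from collections import Counter


def compile_distributions(host_histograms):
    """Leader step 1: sum the per-host histograms (key -> count) over the domain."""
    compiled = Counter()
    for histogram in host_histograms:
        compiled.update(histogram)  # Counter.update adds counts for matching keys
    return compiled


def partition_domain(compiled, num_partitions):
    """Leader step 2: split the sorted domain into contiguous (low, high) ranges
    that each hold roughly total / num_partitions of the ingested data."""
    keys = sorted(compiled)
    total = sum(compiled.values())
    partitions, low, running = [], keys[0], 0
    for i, key in enumerate(keys):
        running += compiled[key]
        boundary = total * (len(partitions) + 1) / num_partitions
        # Close a partition once its cumulative share is reached, keeping keys for the rest.
        if i < len(keys) - 1 and len(partitions) < num_partitions - 1 and running >= boundary:
            partitions.append((low, key))
            low = keys[i + 1]
    partitions.append((low, keys[-1]))
    return partitions


def assign_partitions(hosts, partitions):
    """Leader step 3: hand one partition to each host."""
    return dict(zip(hosts, partitions))


if __name__ == "__main__":
    compiled = compile_distributions([{0: 30}, {1: 5, 2: 5}])
    print(assign_partitions(["host-a", "host-b"], partition_domain(compiled, 2)))
```

With a skewed compiled distribution, the heavily loaded key gets a partition of its own. A uniform split of the key range would not do this.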
-
Publication No.: US10331722B1
Publication Date: 2019-06-25
Application No.: US15607162
Filing Date: 2017-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Bhawana Goel , Wei Huang , Robert Mark Waugh
Abstract: A dynamic clustering algorithm is used to process log data to generate pattern information. A word frequency map may be generated and/or updated based at least in part on entries of the log data. The word frequency map may indicate occurrences of words in the log data. In addition, a modified word frequency map may be determined based at least in part on the frequency of adjacent words as indicated in the word frequency map. Based at least in part on the modified word frequency map, a line threshold is determined. The line threshold indicates a common frequency in the modified word frequency map. The line threshold may then be used to generate a pattern for an entry of the log data.
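A simplified sketch of the word-frequency idea follows. It omits the adjacent-word modification, because the abstract does not say how that step is computed. It also reads "a common frequency" as the most frequent count among an entry's words, which is an assumption. Words at or above that threshold are kept as constant text, and rarer words are treated as variables and replaced by a hypothetical `*` wildcard.

```python
from collections import Counter


def build_word_frequency_map(entries):
    """Count how often each word occurs across all log entries."""
    frequency = Counter()
    for entry in entries:
        frequency.update(entry.split())
    return frequency


def line_threshold(entry, frequency):
    """Hypothetical threshold: the most common frequency value among the entry's words."""
    words = entry.split()
    if not words:
        return 0
    counts = Counter(frequency[word] for word in words)
    return counts.most_common(1)[0][0]


def pattern_for_entry(entry, frequency):
    """Keep words at or above the threshold; replace rarer (variable) words with '*'."""
    threshold = line_threshold(entry, frequency)
    return " ".join(w if frequency[w] >= threshold else "*" for w in entry.split())


if __name__ == "__main__":
    logs = ["Request 123 took 5 ms", "Request 456 took 7 ms", "Request 789 took 5 ms"]
    freq = build_word_frequency_map(logs)
    print(pattern_for_entry(logs[0], freq))  # Request * took * ms
```

Entries that share a pattern can then be clustered together, which is the kind of pattern information the abstract describes.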
-
Publication No.: US10853359B1
Publication Date: 2020-12-01
Application No.: US14977497
Filing Date: 2015-12-21
Applicant: Amazon Technologies, Inc.
Inventor: Robert Mark Waugh , Greg Sterin
IPC: G06F16/24 , G06F16/245 , G06F16/951 , G06F16/18
Abstract: A computing resource monitoring service receives a request to obtain data for various computing resources. The service obtains, from the various computing resources, one or more data log streams that include the requested data. The service utilizes the one or more data log streams to generate a probabilistic data structure that can be used to indicate that data log streams have been processed. If the one or more data log streams are not completely processed prior to the end of an allotted time period for processing of the request, the service generates a token that specifies partially processed data log streams and the probabilistic data structure. The token can be used to enable resumption of processing of the request.
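The abstract does not say which probabilistic data structure is used or how the token is encoded. The sketch below makes these assumptions:

- A small Bloom filter, stored as an integer bit set, records the IDs of fully processed streams.
- A map of per-stream offsets tracks partially processed streams.
- Both are serialized together as JSON to form the token.
- A record budget stands in for the allotted time period.

The names and the sizes `NUM_BITS` and `NUM_HASHES` are hypothetical. Because Bloom filters allow false positives, a stream could occasionally be wrongly treated as done, so a real system would need to size the filter accordingly.

```python
import hashlib
import json
from typing import Optional

NUM_BITS = 256   # hypothetical filter size
NUM_HASHES = 3   # hypothetical number of hash functions


def _positions(stream_id: str):
    # Derive NUM_HASHES bit positions from salted SHA-256 digests.
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{stream_id}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def mark_processed(bits: int, stream_id: str) -> int:
    for pos in _positions(stream_id):
        bits |= 1 << pos
    return bits


def probably_processed(bits: int, stream_id: str) -> bool:
    return all((bits >> pos) & 1 for pos in _positions(stream_id))


def process_request(streams: dict, budget: int, token: Optional[str] = None):
    """Process records until `budget` runs out (a stand-in for the allotted time).

    Returns (records, token); token is None once every stream is done.
    """
    state = json.loads(token) if token else {"bits": 0, "partial": {}}
    bits, partial = state["bits"], state["partial"]
    output = []
    for stream_id in sorted(streams):
        if probably_processed(bits, stream_id):
            continue  # a false positive here would wrongly skip a stream
        records = streams[stream_id]
        pos = partial.get(stream_id, 0)
        while pos < len(records) and budget > 0:
            output.append(records[pos])
            pos += 1
            budget -= 1
        if pos < len(records):
            partial[stream_id] = pos  # partially processed: remember where to resume
            break
        bits = mark_processed(bits, stream_id)
        partial.pop(stream_id, None)
    if all(probably_processed(bits, s) for s in streams):
        return output, None
    return output, json.dumps({"bits": bits, "partial": partial})
```

The filter keeps the token compact when many streams have already been finished, while the offset map stays small because only partially processed streams appear in it.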
-
Publication No.: US10178021B1
Publication Date: 2019-01-08
Application No.: US14981646
Filing Date: 2015-12-28
Applicant: Amazon Technologies, Inc.
Inventor: Emmanuel Pierre Devillard , Daniel Vassallo , Nitin Kesarwani , Robert Mark Waugh
IPC: G06F15/173 , H04L12/715 , H04L12/24 , H04L12/707
Abstract: Systems and methods are provided for organizing data channels and processing hosts included in a system into clusters. A cluster management service may receive data from a stream of data and may route the data to a cluster associated with the data stream. A data channel routing service included in the cluster may route the data to the set of processing hosts included in the cluster through multiple data channels included in the cluster. In some instances, the data channel routing service may use any of the data channels to send data to the set of processing hosts. Because incoming data may be distributed among multiple data channels, the cluster may experience less congestion. Further, the system may also process the stream of data using the same processing hosts by routing the stream of data to the same cluster, thereby avoiding split processing of the data stream.
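The two routing levels can be sketched as follows. The sketch assumes that stream IDs are mapped to clusters with a stable hash, so that a stream always reaches the same processing hosts. It also assumes that records are spread round-robin across a cluster's channels, which reduces congestion. Both rules and all class names are hypothetical illustrations, not the patented routing scheme.

```python
import hashlib
from itertools import cycle


class Cluster:
    """A cluster with several data channels feeding one set of processing hosts."""

    def __init__(self, name: str, num_channels: int) -> None:
        self.name = name
        self.channels = [[] for _ in range(num_channels)]  # each list models one channel
        self._next = cycle(range(num_channels))

    def route(self, record) -> int:
        # Any channel can reach the hosts, so spread records round-robin to limit congestion.
        index = next(self._next)
        self.channels[index].append(record)
        return index


class ClusterManager:
    def __init__(self, clusters: list) -> None:
        self.clusters = clusters

    def cluster_for(self, stream_id: str) -> Cluster:
        # A stable hash (unlike Python's per-process randomized str hash) keeps
        # every record of a stream on the same cluster.
        digest = hashlib.sha256(stream_id.encode()).digest()
        return self.clusters[int.from_bytes(digest[:8], "big") % len(self.clusters)]

    def ingest(self, stream_id: str, record) -> str:
        cluster = self.cluster_for(stream_id)
        cluster.route(record)
        return cluster.name


if __name__ == "__main__":
    manager = ClusterManager([Cluster(f"cluster-{i}", 2) for i in range(3)])
    print({manager.ingest("stream-1", n) for n in range(4)})  # a single cluster name
```

Pinning a stream to one cluster avoids split processing, while the channel fan-out inside the cluster spreads its load.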
-
Publication No.: US20190007393A1
Publication Date: 2019-01-03
Application No.: US16127091
Filing Date: 2018-09-10
Applicant: Amazon Technologies, Inc.
Inventor: Robert Mark Waugh , Daniel Vassallo
Abstract: A record storage system maintains an interdependent series of hash values for records submitted to the record storage service by one or more clients. The record storage service generates a hash value for each record based at least in part on the content of the record and a hash value of one or more previous records. In some examples, the generated hash values are saved in an audit database by the clients. Clients may retain some, all, or none of the hash values based on the amount of auditing desired and the amount of storage space available in the audit database. The clients are able to verify the integrity of records submitted to the record storage system by retrieving the records from the system, recalculating the hash values of the records, and comparing the recalculated hash values to the hash values retained by the client.
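The interdependent hash series works like a hash chain, in which each value covers the record plus the previous value. The sketch below assumes SHA-256, one record per hash, and a hypothetical all-zero `GENESIS` value before the first record. Its audit database is modelled as a dict from record index to retained hash, which is also an assumption; the same scheme applies to the log entries described in the related logging-service record.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical hash used before the first record


def chain_hash(previous_hash: str, record: str) -> str:
    # previous_hash is always 64 hex characters, so the concatenation is unambiguous.
    return hashlib.sha256((previous_hash + record).encode()).hexdigest()


class RecordStore:
    """Service side: stores records and the interdependent series of hash values."""

    def __init__(self) -> None:
        self.records: list = []
        self.hashes: list = []

    def append(self, record: str) -> str:
        previous = self.hashes[-1] if self.hashes else GENESIS
        value = chain_hash(previous, record)
        self.records.append(record)
        self.hashes.append(value)
        return value  # the client may keep this in its audit database


def verify(records: list, audit: dict) -> bool:
    """Client side: recompute the chain and compare against the retained hashes."""
    previous = GENESIS
    for index, record in enumerate(records):
        previous = chain_hash(previous, record)
        if index in audit and audit[index] != previous:
            return False
    return True


if __name__ == "__main__":
    store = RecordStore()
    hashes = [store.append(r) for r in ["login alice", "read file", "logout alice"]]
    audit = {2: hashes[2]}  # the client kept only the latest hash
    print(verify(store.records, audit), verify(["login mallory", "read file", "logout alice"], audit))
```

Because every hash depends on all earlier records, retaining only the most recent hash is enough to detect a change to any earlier record. Retaining more hashes costs audit storage but helps locate which record changed, which is the trade-off the abstract describes.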
-
Publication No.: US10075425B1
Publication Date: 2018-09-11
Application No.: US15249136
Filing Date: 2016-08-26
Applicant: Amazon Technologies, Inc.
Inventor: Robert Mark Waugh , Daniel Vassallo
CPC classification number: H04L63/123 , H04L9/3239 , H04L63/1425 , H04L2209/38
Abstract: A logging service maintains an interdependent series of hash values for log entries submitted to the logging service by one or more clients. The logging service generates a hash value for each log entry based at least in part on the content of the log entry and a hash value of one or more previous log entries. The generated hash values are saved in an audit database by the clients. Clients may retain some, all, or none of the hash values based at least in part on the amount of auditing desired and the amount of storage space available in the audit database. The clients are able to verify the integrity of log entries submitted to the logging service by retrieving the log entries from the logging service, recalculating the hash values, and comparing the recalculated hash values to the hash values in the audit database.
-
Publication No.: US11295224B1
Publication Date: 2022-04-05
Application No.: US15373369
Filing Date: 2016-12-08
Applicant: Amazon Technologies, Inc.
Inventor: Wei Huang , Nitin Kesarwani , Robert Mark Waugh , Hasan Nuzhet Atay
Abstract: A method includes obtaining time series data, comprising a plurality of observations recorded in a plurality of respective time steps, for a usage or performance metric for computing resources in a service provider network. A prediction error is determined for a previous prediction of an observation in the time series data. The prediction error is used to update a standard deviation of a set of prediction errors for the usage or performance metric. The standard deviation and the prediction error are then used to update a confidence coefficient. A prediction limit for the usage or performance metric is then determined based on an expected value, the confidence coefficient, and the standard deviation. One or more events may be generated based on the prediction limit, which may be used to trigger a reconfiguration or auto-scaling of the computing resources.
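The abstract fixes the shape of the limit (expected value plus confidence coefficient times the standard deviation of prediction errors) but not the forecasting model or the coefficient update rule. The sketch below makes three hypothetical assumptions:

- The expected value is an exponentially weighted moving average.
- The running standard deviation of prediction errors uses Welford's algorithm.
- The coefficient follows a simple step rule: it widens after an unusually large error and relaxes back toward a base value otherwise.

The parameters `alpha`, `base_coefficient`, and `step` are illustrative only.

```python
import math


class PredictionLimit:
    """Tracks an upper prediction limit: expected + coefficient * std(prediction errors)."""

    def __init__(self, alpha: float = 0.3, base_coefficient: float = 2.0, step: float = 0.5) -> None:
        self.alpha = alpha                  # smoothing factor for the expected value
        self.base = base_coefficient        # floor for the confidence coefficient
        self.step = step                    # hypothetical coefficient adjustment
        self.coefficient = base_coefficient
        self.expected = None                # current prediction of the next observation
        self.n = 0                          # number of prediction errors seen
        self.mean_error = 0.0
        self.m2 = 0.0                       # Welford sum of squared deviations

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def upper_limit(self) -> float:
        return self.expected + self.coefficient * self.std()

    def observe(self, value: float) -> bool:
        """Record one observation; return True if it exceeded the current limit."""
        if self.expected is None:
            self.expected = value
            return False
        breached = self.n > 1 and value > self.upper_limit()
        error = value - self.expected  # error of the previous prediction
        # Update the running standard deviation of prediction errors (Welford).
        self.n += 1
        delta = error - self.mean_error
        self.mean_error += delta / self.n
        self.m2 += delta * (error - self.mean_error)
        # Hypothetical rule: widen the band after a large error, otherwise relax it.
        sd = self.std()
        if sd > 0 and abs(error) > self.coefficient * sd:
            self.coefficient += self.step
        else:
            self.coefficient = max(self.base, self.coefficient - self.step / 2)
        # Next expected value: exponentially weighted moving average.
        self.expected = self.alpha * value + (1 - self.alpha) * self.expected
        return breached


if __name__ == "__main__":
    monitor = PredictionLimit()
    for cpu in [10, 10, 11, 10, 11, 10, 50]:
        if monitor.observe(cpu):
            print(f"event: {cpu} exceeded the prediction limit")  # e.g. trigger auto-scaling
```

Each breach is the kind of event the abstract says could trigger reconfiguration or auto-scaling. Raising the coefficient after a spike keeps one outlier from producing a burst of follow-up events.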
-
Publication No.: US10904264B2
Publication Date: 2021-01-26
Application No.: US16127091
Filing Date: 2018-09-10
Applicant: Amazon Technologies, Inc.
Inventor: Robert Mark Waugh , Daniel Vassallo
Abstract: A record storage system maintains an interdependent series of hash values for records submitted to the record storage service by one or more clients. The record storage service generates a hash value for each record based at least in part on the content of the record and a hash value of one or more previous records. In some examples, the generated hash values are saved in an audit database by the clients. Clients may retain some, all, or none of the hash values based on the amount of auditing desired and the amount of storage space available in the audit database. The clients are able to verify the integrity of records submitted to the record storage system by retrieving the records from the system, recalculating the hash values of the records, and comparing the recalculated hash values to the hash values retained by the client.