-
Publication No.: US10540606B2
Publication Date: 2020-01-21
Application No.: US14460314
Filing Date: 2014-08-14
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
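The abstract does not specify how the pseudo-random number source maps chunks to the training and test sets. The sketch below assumes one plausible scheme: a seeded shuffle of chunk indices, with the iteration number folded into the seed and a fixed training fraction. Because the PRNG depends only on the metadata and the iteration, re-running an iteration reproduces the same split. `ConsistencyMetadata`, `split_chunks` and `records_for` are hypothetical names, not part of the patent.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ConsistencyMetadata:
    # Hypothetical fields: the seed for the pseudo-random number source and
    # the fraction of chunks assigned to the training set.
    seed: int
    train_fraction: float


def split_chunks(num_chunks: int, meta: ConsistencyMetadata,
                 iteration: int) -> Tuple[List[int], List[int]]:
    """Return (train_chunk_ids, test_chunk_ids) for one iteration.

    The PRNG is derived only from the metadata and the iteration number,
    so re-running an iteration reproduces exactly the same split.
    """
    if num_chunks < 2:
        raise ValueError("need at least two chunks to form disjoint sets")
    rng = random.Random(meta.seed * 1_000_003 + iteration)
    ids = list(range(num_chunks))
    rng.shuffle(ids)
    # Clamp so both the training and the test set contain at least one chunk.
    n_train = min(max(round(num_chunks * meta.train_fraction), 1), num_chunks - 1)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


def records_for(chunks: Sequence[Sequence[dict]], chunk_ids: List[int]) -> List[dict]:
    """Flatten the records of the selected chunks into one list."""
    return [rec for cid in chunk_ids for rec in chunks[cid]]
```

Splitting at chunk granularity means only chunk indices are shuffled, not individual records. A record-level split, or a cross-validation scheme that rotates the test chunks across iterations, would be equally consistent with the abstract.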
-
Publication No.: US11327953B2
Publication Date: 2022-05-10
Application No.: US16692100
Filing Date: 2019-11-22
Applicant: Amazon Technologies, Inc.
Inventor: Jon Arron McClintock, Brandon William Porter, Donghui Zhuo
IPC: G06F16/23
Abstract: Pattern-based detection of data usage is facilitated using data injection. Data values are injected in one or more storage locations accessible to a plurality of services or included in service requests. Service interactions among the services are compared to a set of patterns. The set of patterns is configured to match the data values. By comparing the service interactions to the patterns, one or more of the service interactions are determined to include individual ones of the data values. Data are generated indicating a presence of the data values in the services.
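The abstract describes a canary-style technique: inject recognizable values, then scan service interactions for them. The sketch below is one possible reading, not the patented implementation. It assumes a hypothetical marker format (`CANARY-` followed by 16 hex digits), uses a plain dict as a stand-in for a shared storage location, and represents each interaction as a source, a target and a text payload. All names are illustrative.

```python
import re
import secrets
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical marker format: a fixed prefix plus 16 random hex digits.
CANARY_PATTERN = re.compile(r"CANARY-[0-9a-f]{16}")


@dataclass
class Interaction:
    source: str   # calling service
    target: str   # called service
    payload: str  # request or response body, flattened to text


def make_canaries(n: int) -> List[str]:
    """Generate n random values that match CANARY_PATTERN."""
    return [f"CANARY-{secrets.token_hex(8)}" for _ in range(n)]


def inject(store: Dict[str, str], values: Iterable[str]) -> None:
    """Write each value into a dict standing in for a shared storage location."""
    for i, value in enumerate(values):
        store[f"injected/{i}"] = value


def detect(interactions: Iterable[Interaction],
           injected: Set[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each injected value to the (source, target) pairs whose payload contains it."""
    presence: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for it in interactions:
        # set() so a value repeated within one payload is reported once per interaction.
        for value in set(CANARY_PATTERN.findall(it.payload)):
            # Ignore strings that merely look like canaries but were never injected.
            if value in injected:
                presence[value].append((it.source, it.target))
    return dict(presence)
```

A single generic pattern keeps matching cheap. Checking candidates against the set of injected values then filters out look-alike strings that happen to fit the format.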
-
Publication No.: US20200089669A1
Publication Date: 2020-03-19
Application No.: US16692100
Filing Date: 2019-11-22
Applicant: Amazon Technologies, Inc.
Inventor: Jon Arron McClintock, Brandon William Porter, Donghui Zhuo
IPC: G06F16/23
Abstract: Pattern-based detection of data usage is facilitated using data injection. Data values are injected in one or more storage locations accessible to a plurality of services or included in service requests. Service interactions among the services are compared to a set of patterns. The set of patterns is configured to match the data values. By comparing the service interactions to the patterns, one or more of the service interactions are determined to include individual ones of the data values. Data are generated indicating a presence of the data values in the services.
-
Publication No.: US11544623B2
Publication Date: 2023-01-03
Application No.: US16591521
Filing Date: 2019-10-02
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
-
Publication No.: US11100420B2
Publication Date: 2021-08-24
Application No.: US14460312
Filing Date: 2014-08-14
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Rakesh Ramakrishnan, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation.
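The flow in this abstract can be sketched as follows: turn a request into a plan of chunk-level steps, apply those steps to chunk identifiers only, place the resulting chunks in server memory, and then run a second record-level operation on each server's records. Everything here is an assumption for illustration. The request keys (`sample_fraction`, `shuffle`, `seed`), the round-robin placement standing in for the data transfers, and all function names are hypothetical. The sketch covers only sampling and shuffling out of the operations the abstract lists.

```python
import random
from typing import Callable, Dict, List, Tuple

Record = dict
Step = Tuple[str, dict]


def make_plan(request: dict) -> List[Step]:
    """Translate a (hypothetical) extraction request into chunk-level steps."""
    seed = request.get("seed", 0)
    plan: List[Step] = []
    if "sample_fraction" in request:
        plan.append(("sample", {"fraction": request["sample_fraction"], "seed": seed}))
    if request.get("shuffle", False):
        plan.append(("shuffle", {"seed": seed}))
    return plan


def apply_chunk_ops(chunk_ids: List[int], plan: List[Step]) -> List[int]:
    """Apply chunk-level operations to chunk ids only; no records are read."""
    ids = list(chunk_ids)
    for op, params in plan:
        rng = random.Random(params["seed"])
        if op == "sample":
            k = max(1, round(len(ids) * params["fraction"]))
            ids = rng.sample(ids, k)
        elif op == "shuffle":
            rng.shuffle(ids)
        else:
            raise ValueError(f"unknown chunk-level op: {op}")
    return ids


def place_chunks(chunk_ids: List[int], servers: List[str]) -> Dict[str, List[int]]:
    """Stand-in for the data transfers: assign each chunk to one server, round-robin."""
    placement: Dict[str, List[int]] = {s: [] for s in servers}
    for i, cid in enumerate(chunk_ids):
        placement[servers[i % len(servers)]].append(cid)
    return placement


def extract(chunks: List[List[Record]], request: dict, servers: List[str],
            record_op: Callable[[List[Record]], List[Record]]) -> Dict[str, List[Record]]:
    """Run the plan, place chunks, then apply record_op to each server's records."""
    plan = make_plan(request)
    selected = apply_chunk_ops(list(range(len(chunks))), plan)
    placement = place_chunks(selected, servers)
    # Second operation (e.g. a filter or feature transform) on each server's resident records.
    return {
        server: record_op([rec for cid in cids for rec in chunks[cid]])
        for server, cids in placement.items()
    }
```

The point of the design is that sampling and shuffling touch only small lists of chunk identifiers. Records are materialized once, after placement, when the record-level operation runs.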
-
Publication No.: US20230126005A1
Publication Date: 2023-04-27
Application No.: US18146075
Filing Date: 2022-12-23
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
-
Publication No.: US20200034742A1
Publication Date: 2020-01-30
Application No.: US16591521
Filing Date: 2019-10-02
Applicant: Amazon Technologies, Inc.
Inventor: Leo Parker Dirac, Jin Li, Tianming Zheng, Donghui Zhuo
IPC: G06N20/00
Abstract: Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
-
Publication No.: US10320632B1
Publication Date: 2019-06-11
Application No.: US14014042
Filing Date: 2013-08-29
Applicant: Amazon Technologies, Inc.
Inventor: Jon Arron McClintock, Melissa Elaine Davis, Anton Vladilenovich Goldberg, Aram Grigoryan, Brandon William Porter, Matthew Paul Wenger, Donghui Zhuo
Abstract: Methods, systems, and computer-readable media for implementing pattern-based detection are disclosed. A plurality of services monitor a plurality of service interactions comprising data or metadata. The services compare the data or metadata to a set of patterns and identify one or more matched patterns among the set of patterns. The services send data indicative of the matched patterns to a central recording service. The central recording service aggregates the data indicative of the matched patterns and generates one or more data flow visualizations indicating one or more data flows between individual ones of the services for the matched patterns.
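In this abstract, each service matches its own interactions against the patterns and reports matches to a central recording service. That service aggregates the reports into per-pattern data flows and renders them. A minimal in-process sketch follows. It assumes regular expressions as the patterns (the two in `EXAMPLE_PATTERNS` are hypothetical) and uses Graphviz DOT text as a stand-in for the "data flow visualization". `MonitoredService`, `CentralRecorder` and `MatchReport` are illustrative names, not the patent's components.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical example patterns; a real deployment would define its own set.
EXAMPLE_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card": re.compile(r"\b\d{4}(?:-\d{4}){3}\b"),
}


@dataclass(frozen=True)
class MatchReport:
    pattern_name: str
    source: str
    target: str


class CentralRecorder:
    """Aggregates match reports into per-pattern flow counts between services."""

    def __init__(self) -> None:
        # pattern name -> (source, target) -> number of matching interactions
        self._flows: Dict[str, Dict[Tuple[str, str], int]] = defaultdict(lambda: defaultdict(int))

    def record(self, report: MatchReport) -> None:
        self._flows[report.pattern_name][(report.source, report.target)] += 1

    def flows(self, pattern_name: str) -> Dict[Tuple[str, str], int]:
        return dict(self._flows.get(pattern_name, {}))

    def to_dot(self, pattern_name: str) -> str:
        """Render the flows for one pattern as a Graphviz DOT digraph."""
        lines = [f'digraph "{pattern_name}" {{']
        for (src, dst), count in sorted(self.flows(pattern_name).items()):
            lines.append(f'  "{src}" -> "{dst}" [label="{count}"];')
        lines.append("}")
        return "\n".join(lines)


class MonitoredService:
    """A service that checks its own outgoing interactions against the patterns."""

    def __init__(self, name: str, patterns: Dict[str, re.Pattern],
                 recorder: CentralRecorder) -> None:
        self.name = name
        self.patterns = patterns
        self.recorder = recorder

    def observe(self, target: str, data: str) -> None:
        for pattern_name, pattern in self.patterns.items():
            if pattern.search(data):
                self.recorder.record(MatchReport(pattern_name, self.name, target))
```

Matching happens at each service, so only compact match reports travel to the recorder, not the raw data. In a distributed setting those reports would be sent over the network rather than through a direct method call.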