-
Publication No.: US11960935B2
Publication Date: 2024-04-16
Application No.: US16020819
Filing Date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
CPC classification number: G06F9/5027 , G06F8/65 , G06F9/45558 , G06N5/046 , G06N20/00 , G06T1/20 , G06F2009/4557 , G06F2009/45583 , G06F2009/45595
Abstract: Implementations detailed herein include a description of a computer-implemented method. In an implementation, the method at least includes attaching a first set of one or more accelerator slots of an accelerator appliance to an application instance of a multi-tenant provider network according to an application instance configuration, the application instance configuration defining per-accelerator-slot capabilities to be used by an application of the application instance, wherein the multi-tenant provider network comprises a plurality of computing devices configured to implement a plurality of virtual compute instances, and wherein the first set of one or more accelerator slots is implemented using physical accelerator resources accessible to the application instance; and, while performing inference using the loaded machine learning model of the application with the first set of one or more accelerator slots on the attached accelerator appliance, managing resources of the accelerator appliance using an accelerator appliance manager of the accelerator appliance.
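The attachment flow described in the abstract can be sketched as a small data model. This is a minimal illustration only — every class and field name here is hypothetical and not taken from the patent; it assumes the configuration requests per-slot compute and memory capabilities and that the appliance attaches the first free slot satisfying each request.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SlotConfig:
    """Hypothetical per-accelerator-slot capabilities requested by the application."""
    tflops: float    # requested compute capability
    memory_gb: int   # requested accelerator memory

@dataclass
class AcceleratorSlot:
    slot_id: int
    tflops: float
    memory_gb: int
    attached_to: Optional[str] = None  # application instance id once attached

class AcceleratorAppliance:
    """Appliance whose manager tracks which slots are attached to which instance."""
    def __init__(self, slots):
        self.slots = slots

    def attach(self, instance_id, configs):
        """Attach one free slot per SlotConfig that satisfies its capabilities."""
        attached = []
        for cfg in configs:
            slot = next(
                (s for s in self.slots
                 if s.attached_to is None
                 and s.tflops >= cfg.tflops
                 and s.memory_gb >= cfg.memory_gb),
                None)
            if slot is None:
                raise RuntimeError("no free slot satisfies the requested capabilities")
            slot.attached_to = instance_id
            attached.append(slot)
        return attached
```

For example, an appliance with an 8-TFLOPS/16-GB slot and a 4-TFLOPS/8-GB slot would satisfy a request for 4 TFLOPS and 8 GB with whichever free slot qualifies first.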
-
Publication No.: US11467835B1
Publication Date: 2022-10-11
Application No.: US16199129
Filing Date: 2018-11-23
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Jalaja Kurubarahalli , Samuel Oshin , Cory Pruce , Jun Wu , Eftiquar Shaikh , Pragya Agarwal , David Thomas , Karan Kothari , Daniel Evans , Umang Wadhwa , Mark Klunder , Rahul Sharma , Zdravko Pantic , Dominic Rajeev Divakaruni , Andrea Olgiati , Leo Dirac , Nafea Bshara , Bratin Saha , Matthew Wood , Swaminathan Sivasubramanian , Rajankumar Singh
Abstract: Techniques for partitioning data flow operations between execution on a compute instance and an attached accelerator instance are described. A set of operations supported by the accelerator is obtained. A set of operations associated with the data flow is obtained. A first operation in the set of operations associated with the data flow is identified based on the set of operations supported by the accelerator. The accelerator executes the first operation.
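The partitioning step above amounts to splitting the data flow's operations by membership in the accelerator's supported-operation set. A minimal sketch, assuming operations are identified by type name (the function name and representation are illustrative, not from the patent):

```python
def partition_operations(dataflow_ops, accelerator_supported):
    """Split a data flow's operations between the accelerator and the compute instance.

    An operation runs on the attached accelerator only if its type appears in the
    accelerator's supported set; everything else stays on the compute instance.
    """
    supported = set(accelerator_supported)
    accel_ops = [op for op in dataflow_ops if op in supported]
    cpu_ops = [op for op in dataflow_ops if op not in supported]
    return accel_ops, cpu_ops
```

For instance, a graph containing a custom post-processing operation the accelerator cannot execute would keep that operation on the compute instance while offloading the rest.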
-
Publication No.: US11423283B1
Publication Date: 2022-08-23
Application No.: US15933114
Filing Date: 2018-03-22
Applicant: Amazon Technologies, Inc.
Inventor: Hagay Lupesko , Dominic Rajeev Divakaruni , Jonathan Esterhazy , Sandeep Krishnamurthy , Vikram Madan , Roshani Nagmote , Naveen Mysore Nagendra Swamy , Yao Wang
Abstract: Techniques for model adaptation are described. For example, a method is detailed that includes: receiving a call to provide either a model variant or a model variant profile of a deep learning model, the call including a desired performance of the deep learning model, a deep learning model identifier, and current edge device characteristics; comparing the received current edge device characteristics to available model variants and profiles, based on the desired performance of the deep learning model, to generate or select a model variant or profile, the available model variants and profiles being determined by the model identifier; and sending the generated or selected model variant or profile to the edge device for use in inference.
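The comparison step can be sketched as filtering known variants against the device's characteristics and the desired performance. This is an assumption-laden illustration — the variant fields (RAM requirement, profiled latency) and the "fastest qualifying variant" policy are hypothetical stand-ins for whatever the actual profiles contain:

```python
from dataclasses import dataclass

@dataclass
class ModelVariant:
    model_id: str
    min_ram_mb: int     # RAM the variant needs on the edge device (assumed field)
    latency_ms: float   # profiled inference latency (assumed field)

def select_variant(variants, model_id, device_ram_mb, desired_latency_ms):
    """Pick a variant of `model_id` that fits the edge device and meets the
    desired latency; prefer the fastest qualifying variant."""
    candidates = [
        v for v in variants
        if v.model_id == model_id
        and v.min_ram_mb <= device_ram_mb
        and v.latency_ms <= desired_latency_ms
    ]
    if not candidates:
        return None  # caller could fall back to generating a new variant
    return min(candidates, key=lambda v: v.latency_ms)
```

A device with only 300 MB of free RAM would thus be steered toward a smaller, slower variant even if a faster one exists for better-equipped devices.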
-
Publication No.: US11599821B2
Publication Date: 2023-03-07
Application No.: US16020776
Filing Date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
Abstract: Implementations detailed herein include a description of a computer-implemented method. In an implementation, the method at least includes receiving an application instance configuration, an application of the application instance to utilize a portion of an attached accelerator during execution of a machine learning model, and the application instance configuration including: an indication of the central processing unit (CPU) capability to be used, an arithmetic precision of the machine learning model to be used, an indication of the accelerator capability to be used, a storage location of the application, and an indication of an amount of random access memory to use.
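The configuration fields enumerated in the abstract map naturally onto a single record. A minimal sketch — the class name, field names, and every example value (including the storage location) are hypothetical, chosen only to mirror the abstract's list:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApplicationInstanceConfiguration:
    """One record holding the configuration fields listed in the abstract."""
    cpu_capability: str                 # indication of CPU capability, e.g. "4 vCPUs"
    arithmetic_precision: str           # precision of the ML model, e.g. "fp16"
    accelerator_capability: str         # indication of accelerator capability, e.g. "1 TFLOPS"
    application_storage_location: str   # where the application is stored
    ram_gb: int                         # amount of random access memory to use
```

Making the record frozen reflects that the configuration is received once and then consulted, not mutated, during provisioning.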
-
Publication No.: US11494621B2
Publication Date: 2022-11-08
Application No.: US16020788
Filing Date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
Abstract: Implementations detailed herein include a description of a computer-implemented method. In an implementation, the method at least includes receiving an application instance configuration, an application of the application instance to utilize a portion of an attached accelerator during execution of a machine learning model, and the application instance configuration including an arithmetic precision of the machine learning model to be used in determining the portion of the accelerator to provision; provisioning the application instance and the portion of the accelerator attached to the application instance, wherein the application instance is implemented using a physical compute instance in a first location, and wherein the portion of the accelerator is implemented using a physical accelerator in a second location; loading the machine learning model onto the portion of the accelerator; and performing inference using the loaded machine learning model of the application using the portion of the accelerator on the attached accelerator.
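One way arithmetic precision can determine the accelerator portion to provision is through the model's memory footprint, since lower precisions halve or quarter the bytes per parameter. A minimal sketch of that reasoning — the byte widths are standard for these formats, but using footprint alone for sizing is an assumption, not the patent's stated mechanism:

```python
# Bytes per parameter for common arithmetic precisions
# (fp32 = 4 bytes, fp16 = 2, int8 = 1).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def accelerator_memory_needed(num_parameters, precision):
    """Estimate the accelerator-memory footprint of a model at a given
    arithmetic precision, which could inform how large a portion of the
    accelerator to provision."""
    return num_parameters * BYTES_PER_PARAM[precision]
```

By this estimate, a 25-million-parameter model provisioned at fp16 needs half the accelerator memory it would at fp32, so a smaller slice of the physical accelerator suffices.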
-
Publication No.: US11422863B2
Publication Date: 2022-08-23
Application No.: US16020810
Filing Date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
Abstract: Implementations detailed herein include a description of a computer-implemented method. In an implementation, the method at least includes provisioning an application instance and portions of at least one accelerator attached to the application instance to execute a machine learning model of an application of the application instance; loading the machine learning model onto the portions of the at least one accelerator; receiving scoring data in the application; and utilizing each of the portions of the attached at least one accelerator to perform inference on the scoring data in parallel, using only one response from the portions of the at least one accelerator.
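The "inference in parallel, keep one response" pattern above can be sketched with a thread pool, modeling each accelerator portion as a callable that scores the data. This is an illustrative assumption — the patent does not specify the selection policy; here the first response to complete wins and the rest are discarded:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def infer_in_parallel(accelerator_portions, scoring_data):
    """Submit the same scoring data to every attached accelerator portion
    (modeled as callables) and return only the first response to complete."""
    with ThreadPoolExecutor(max_workers=len(accelerator_portions)) as pool:
        futures = [pool.submit(portion, scoring_data) for portion in accelerator_portions]
        first_done = next(as_completed(futures))  # block until any portion responds
        return first_done.result()
```

Racing identical work across portions and taking the first answer trades extra compute for lower tail latency, since one slow portion no longer delays the response.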