-
Publication number: US11762803B2
Publication date: 2023-09-19
Application number: US17659642
Application date: 2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
CPC classification number: G06F15/8046, G06F7/505, G06F7/53, G06F7/5443, G06F9/3001, G06F9/3893
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
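The columnar-bus idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the function name `column_with_busses`, the round-robin assignment of processing elements to busses (`i % num_busses`), and the final summation of the per-bus results are all hypothetical choices made for illustration.

```python
def column_with_busses(inputs, weights, num_busses=2):
    """Model one column of processing elements (PEs) sharing several busses.

    Assumption: PE i sits on bus i % num_busses. The abstract does not say
    how PEs are assigned to busses; round-robin is used here for simplicity.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if num_busses < 1:
        raise ValueError("num_busses must be at least 1")

    partial_sums = [0] * num_busses  # one running partial sum per bus
    for i, (x, w) in enumerate(zip(inputs, weights)):
        bus = i % num_busses
        # Only the PE attached to this bus updates its partial sum. PEs on
        # the other busses never see or process this value.
        partial_sums[bus] += x * w
    return partial_sums


inputs = [1, 2, 3, 4]
weights = [5, 6, 7, 8]
per_bus = column_with_busses(inputs, weights)
print(per_bus)       # [26, 44]
print(sum(per_bus))  # 70, the same as the plain dot product
```

With two busses, each partial sum passes through only half of the processing elements in the column, which is one way to read the abstract's claim that busses allow either higher speed or more latency per element.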
-
Publication number: US11546336B1
Publication date: 2023-01-03
Application number: US16660715
Application date: 2019-10-22
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Mark Banse
IPC: H04L9/40, G06F16/9035, G06F30/34
Abstract: Access control lookups may be implemented that support user-configurable and host-configurable processing stages. A request may be received and evaluated to determine whether user-configured access request processing stages should be bypassed. A lookup may be determined for user-configured access control decisions, and the access control decisions can be applied, if not bypassed. A lookup may be determined for host-configured access control decisions and the access control decisions applied.
-
Publication number: US11422773B1
Publication date: 2022-08-23
Application number: US16915937
Application date: 2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Thomas Elmer, Kiran K Seshadri
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
Publication number: US12197308B1
Publication date: 2025-01-14
Application number: US17091961
Application date: 2020-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Ron Diamant
Abstract: On-circuit utilization monitoring may be performed for a systolic array. A current utilization measurement may be determined for processing elements of a systolic array and compared with a prior utilization measurement. Based on the comparison, a throttling recommendation may be provided to a management component to determine whether to perform the throttling recommendation.
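The compare-and-recommend step described in this abstract can be sketched in a few lines. This is a minimal illustration under assumptions, not the method claimed in the patent: utilization is modeled as the fraction of active processing elements, and the names `utilization`, `recommend`, and the `max_increase` threshold are hypothetical.

```python
def utilization(active_flags):
    """Fraction of processing elements that were active (assumed metric)."""
    if not active_flags:
        raise ValueError("at least one processing element is required")
    return sum(active_flags) / len(active_flags)


def recommend(current, prior, max_increase=0.25):
    """Compare current and prior utilization and return a recommendation.

    Assumption: a jump larger than max_increase triggers a throttling
    recommendation. The abstract does not state the comparison rule.
    """
    return "throttle" if current - prior > max_increase else "no_change"


prior = utilization([1, 0, 0, 0])    # 0.25
current = utilization([1, 1, 1, 1])  # 1.0
print(recommend(current, prior))     # throttle
```

In the abstract the recommendation goes to a management component, which decides whether to act on it; here the returned string stands in for that message.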
-
Publication number: US12210940B1
Publication date: 2025-01-28
Application number: US17091853
Application date: 2020-11-06
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant, Thomas A Volpe
Abstract: On-circuit activity monitoring may be performed to modify integrated circuit processing. An activity monitor may be implemented on an integrated circuit to monitor activity measurements of processing data at another portion of the integrated circuit. A change to activity measurements may be detected and cause the activity monitor to modify the rate at which data enters the other portion of the integrated circuit for processing.
-
Publication number: US11880682B2
Publication date: 2024-01-23
Application number: US17363894
Application date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
CPC classification number: G06F9/3001, G06F15/8046
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
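The bit-length reduction described in this abstract (dropping trailing bits, optionally with rounding) can be illustrated as follows. This is a sketch under assumptions: the abstract names no concrete formats, so the choice of a 32-bit IEEE 754 float as the "first bit-length" and its top 16 bits as the "second shorter bit-length" is hypothetical, as are the function names. Rounding is done by adding half of the dropped range before truncating; NaN and infinity are not treated specially.

```python
import struct


def reduce_fp32(value, keep_bits=16, round_nearest=True):
    """Keep the top keep_bits bits of a 32-bit float's bit pattern."""
    if not 1 <= keep_bits < 32:
        raise ValueError("keep_bits must be between 1 and 31")
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    drop = 32 - keep_bits
    if round_nearest:
        # Add half of the dropped range, clamped to 32 bits, then truncate.
        bits = min(bits + (1 << (drop - 1)), 0xFFFFFFFF)
    return bits >> drop  # discard the trailing bits


def expand(reduced, keep_bits=16):
    """Pad a reduced pattern with zero bits and reinterpret it as a float."""
    bits = reduced << (32 - keep_bits)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


x = 1.00390625  # 1 + 2**-8, exactly representable as a 32-bit float
print(hex(reduce_fp32(x, round_nearest=False)))  # 0x3f80
print(hex(reduce_fp32(x)))                       # 0x3f81
print(expand(reduce_fp32(x)))                    # 1.0078125
```

Plain truncation maps the example value down to 1.0, while the rounded variant moves it up to the next representable reduced value.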
-
Publication number: US11308026B1
Publication date: 2022-04-19
Application number: US16915777
Application date: 2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Vasanta Kumar Palisetti, Thomas Elmer, Kiran K Seshadri, FNU Arun Kumar
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus, and perform arithmetic operations on the input. Each processing element can generate an output partial sum based on the arithmetic operations and provide the input to a next processing element of the given row-oriented bus, without the input being processed by a processing element of the row located between the two processing elements that uses a different row-oriented bus. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
Publication number: US20230004384A1
Publication date: 2023-01-05
Application number: US17363894
Application date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
-
Publication number: US20220350775A1
Publication date: 2022-11-03
Application number: US17659642
Application date: 2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
Publication number: US11467983B1
Publication date: 2022-10-11
Application number: US16660704
Application date: 2019-10-22
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe, Mark Banse
Abstract: Access control request parameter interleaving may be implemented that supports user-configurable and host-configurable processing stages. A request may be received and evaluated to determine whether user-configured interleaving, host-configured interleaving, or both user-interleaving and host-interleaving are applied. For applied interleaving, two different portions of a request parameter may be swapped.
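The swap of "two different portions of a request parameter" can be pictured as exchanging two bit fields of an integer. The sketch below is an assumption-laden illustration, not the patented mechanism: treating the parameter as an integer, the equal-width fields, and the name `swap_fields` are all hypothetical.

```python
def swap_fields(value, pos_a, pos_b, width):
    """Swap two non-overlapping width-bit fields of a non-negative integer.

    pos_a and pos_b are the bit offsets of the fields' least significant bits.
    """
    if width < 1 or pos_a < 0 or pos_b < 0:
        raise ValueError("width must be positive and offsets non-negative")
    if abs(pos_a - pos_b) < width:
        raise ValueError("fields must not overlap")
    mask = (1 << width) - 1
    field_a = (value >> pos_a) & mask
    field_b = (value >> pos_b) & mask
    # Clear both fields, then write each one into the other's position.
    value &= ~((mask << pos_a) | (mask << pos_b))
    return value | (field_a << pos_b) | (field_b << pos_a)


param = 0xABCD
swapped = swap_fields(param, pos_a=0, pos_b=8, width=8)
print(hex(swapped))                        # 0xcdab
print(hex(swap_fields(swapped, 0, 8, 8)))  # 0xabcd, the swap undoes itself
```

Because the swap is its own inverse, a user-configured stage and a host-configured stage that apply the same swap would cancel out; in this model, applying one, the other, or both is a matter of which calls are made.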