-
公开(公告)号:US11762803B2
公开(公告)日:2023-09-19
申请号:US17659642
申请日:2022-04-18
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe , Sundeep Amirineni , Thomas Elmer
CPC classification number: G06F15/8046 , G06F7/505 , G06F7/53 , G06F7/5443 , G06F9/3001 , G06F9/3893
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
公开(公告)号:US20230010054A1
公开(公告)日:2023-01-12
申请号:US17932537
申请日:2022-09-15
Applicant: Amazon Technologies, Inc.
Inventor: Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations of at least one normalized number in a systolic array. The systolic array can obtain a first input and detect that the first input is denormal. Based on determining the first input is denormal, the systolic array can generate a first normalized number by normalizing the first input. Processing elements of the systolic array can include a multiplier and an adder. The multiplier can multiply the first normalized number by a second normal or normalized number to generate a multiplier product and the adder can add an input partial sum to the multiplier product to generate an addition result.
-
公开(公告)号:US11422773B1
公开(公告)日:2022-08-23
申请号:US16915937
申请日:2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe , Thomas Elmer , Kiran K Seshadri
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
公开(公告)号:US12182064B2
公开(公告)日:2024-12-31
申请号:US18446357
申请日:2023-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe , Sundeep Amirineni , Thomas Elmer
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
公开(公告)号:US12067375B2
公开(公告)日:2024-08-20
申请号:US17932537
申请日:2022-09-15
Applicant: Amazon Technologies, Inc.
Inventor: Thomas Elmer
CPC classification number: G06F7/5443 , G06F15/8046
Abstract: Systems and methods are provided to perform multiply-accumulate operations of at least one normalized number in a systolic array. The systolic array can obtain a first input and detect that the first input is denormal. Based on determining the first input is denormal, the systolic array can generate a first normalized number by normalizing the first input. Processing elements of the systolic array can include a multiplier and an adder. The multiplier can multiply the first normalized number by a second normal or normalized number to generate a multiplier product and the adder can add an input partial sum to the multiplier product to generate an addition result.
-
公开(公告)号:US11816446B2
公开(公告)日:2023-11-14
申请号:US16698838
申请日:2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Thomas Elmer , Thomas A. Volpe
CPC classification number: G06F7/5443 , G06F15/8046
Abstract: Systems and methods are provided to perform multiply-accumulate operations of multiple data types in a systolic array. One or more processing elements in the systolic array can include a shared multiplier and one or more adders. The shared multiplier can include a separate and/or a shared circuitry where the shared circuitry can perform at least a part of integer multiplication and at least a part of non-integer multiplication. The one or more adders can include one or more shared adders or one or more separate adders. The shared adder can include a separate and/or a shared circuitry where the shared circuitry can perform at least a part of integer addition and at least a part of non-integer addition.
-
公开(公告)号:US11467806B2
公开(公告)日:2022-10-11
申请号:US16698809
申请日:2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations of normalized numbers in a systolic array to enable greater computational density, reduce the size of systolic arrays required to perform multiply-accumulate operations of normalized numbers, and/or enable higher throughput operation. The systolic array can be provided normalized numbers by a column of normalizers and can lack support for denormal numbers. Each normalizer can normalize the inputs to each processing element in the systolic array. The systolic array can include a multiplier and an adder. The multiplier can have multiple data paths that correspond to the data type of the input. The multiplier and adder can employ expanded exponent range to operate on normalized floating-point numbers and can lack support for denormal numbers.
-
公开(公告)号:US11423313B1
公开(公告)日:2022-08-23
申请号:US16218082
申请日:2018-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni , Mohammad El-Shabani , Kenneth Wayne Patton , Thomas Elmer
Abstract: Methods and systems for performing hardware approximation of function are provided. In one example, a system comprises a controller, configurable arithmetic circuits, and a mapping table. The mapping table stores a first set of function parameters in a first mode of operation and stores a second set of function parameters in a second mode of operation. Depending on the mode of operation, the controller may configure the arithmetic circuits to compute a first approximation result of a function at an input value based on the first set of function parameters, or to compute a second approximation result of the function at the input value based on the second set of function parameters and to perform post-processing, such as quantization, of the second approximation result.
-
公开(公告)号:US11232062B1
公开(公告)日:2022-01-25
申请号:US16915783
申请日:2020-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Thomas A Volpe , Sundeep Amirineni , Thomas Elmer
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given columnar bus can receive an input from a prior element of the given columnar bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
-
公开(公告)号:US20200293284A1
公开(公告)日:2020-09-17
申请号:US16891010
申请日:2020-06-02
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Randy Huang , Ron Diamant , Thomas Elmer , Sundeep Amirineni
Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
-
-
-
-
-
-
-
-
-