-
Publication No.: US20240020514A1
Publication Date: 2024-01-18
Application No.: US18143970
Application Date: 2023-05-05
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang , Richard John Heaton , Andrea Olgiati , Ron Diamant
IPC: G06N3/045 , G06N3/04 , G06N3/08 , G06F18/214
CPC classification number: G06N3/045 , G06N3/04 , G06N3/08 , G06F18/214
Abstract: Systems and methods for performing improper input data detection are described. In one example, a system comprises: hardware circuits configured to receive input data and to perform computations of a neural network based on the input data to generate computation outputs; and an improper input detection circuit configured to: determine a relationship between the computation outputs of the hardware circuits and reference outputs; determine that the input data are improper based on the relationship; and perform an action based on determining that the input data are improper.
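The detection step described above can be sketched as follows. This is a minimal illustration only: the abstract does not specify the "relationship", so this sketch assumes a Euclidean distance between computation outputs and reference outputs, with a hypothetical threshold.

```python
import numpy as np

def detect_improper_input(outputs, reference_outputs, threshold=0.5):
    """Flag the input as improper when the computation outputs deviate
    too far from the reference outputs (distance-based relationship,
    assumed here for illustration)."""
    distance = np.linalg.norm(np.asarray(outputs) - np.asarray(reference_outputs))
    return bool(distance > threshold)

# Outputs close to the reference profile are accepted;
# outputs far from it trigger the improper-input action.
ok = detect_improper_input([0.1, 0.9], [0.12, 0.88])
bad = detect_improper_input([0.9, 0.1], [0.12, 0.88])
```

The "action" taken on detection (logging, rejecting the input, halting the accelerator) would hang off the boolean result.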
-
Publication No.: US11868875B1
Publication Date: 2024-01-09
Application No.: US16127170
Application Date: 2018-09-10
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Jeffrey T. Huynh , Sundeep Amirineni
Abstract: Provided are systems and methods for operating a neural network processor, wherein the processor includes an input selector circuit that can be configured to select the data that will be input into the processor's computational array. In various implementations, the selector circuit can determine, for a row of the array, whether the row input will be the output from a buffer memory or data that the input selector circuit has selected for a different row. The row can receive an input feature map from a set of input data or an input feature map that was selected for inputting into a different row, such that the input feature map is input into more than one row at a time. The selector circuit can also include a delay circuit, so that the duplicated input feature map can be input into the computational array later than the original input feature map.
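The row-selection and delay behavior can be sketched in software. This is a loose model, not the circuit itself: rows, feature maps, and the one-cycle delay representation are illustrative assumptions.

```python
def build_row_inputs(fmaps, duplicate_map, delay=1):
    """Build per-row input streams for the computational array.

    Each row normally receives its own feature map from the buffer memory;
    rows listed in `duplicate_map` instead reuse the feature map selected
    for another row, delayed by `delay` cycles (None marks an idle cycle).
    """
    rows = {row: [fmap] for row, fmap in enumerate(fmaps)}
    for dst_row, src_row in duplicate_map.items():
        rows[dst_row] = [None] * delay + [fmaps[src_row]]
    return rows

# Row 2 receives a delayed copy of the feature map that feeds row 0,
# so the same feature map enters more than one row.
streams = build_row_inputs(["fmap0", "fmap1"], duplicate_map={2: 0})
```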
-
Publication No.: US11841792B1
Publication Date: 2023-12-12
Application No.: US16707893
Application Date: 2019-12-09
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant
CPC classification number: G06F12/0207 , G06F9/3016 , G06F9/30181 , G06F2212/251
Abstract: In one example, a hardware accelerator comprises: a programmable hardware instruction decoder programmed to store a plurality of opcodes; a programmable instruction schema mapping table implemented as a content addressable memory (CAM) and programmed to map the plurality of opcodes to a plurality of definitions of operands in a plurality of instructions; a hardware execution engine; and a controller configured to: receive an instruction that includes a first opcode of the plurality of opcodes; control the hardware instruction decoder to extract the first opcode from the instruction; obtain, from the instruction schema mapping table and based on the first opcode, a first definition of a first operand; and forward the instruction and the first definition to the hardware execution engine to control the hardware execution engine to extract the first operand from the instruction based on the first definition, and execute the instruction based on the first operand.
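The decode flow can be sketched with a dictionary standing in for the CAM-based schema mapping table. The opcodes, field offsets, and instruction encoding below are invented for illustration; only the lookup structure mirrors the abstract.

```python
# Programmable schema table (CAM-like lookup): opcode -> operand definition.
SCHEMA = {
    0x01: {"operand": "addr", "offset": 1, "length": 2},  # hypothetical LOAD
    0x02: {"operand": "imm",  "offset": 1, "length": 1},  # hypothetical ADDI
}

def decode(instruction: bytes):
    """Extract the opcode, look up the operand definition keyed by that
    opcode, then extract the operand from the instruction bytes using the
    definition's offset and length."""
    opcode = instruction[0]
    definition = SCHEMA[opcode]
    start = definition["offset"]
    operand = int.from_bytes(instruction[start:start + definition["length"]], "big")
    return opcode, definition["operand"], operand

print(decode(bytes([0x01, 0x00, 0x10])))  # (1, 'addr', 16)
```

Because the table is programmable, new instruction formats can be supported by rewriting `SCHEMA` entries rather than changing the decoder.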
-
Publication No.: US11803736B1
Publication Date: 2023-10-31
Application No.: US16917015
Application Date: 2020-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
CPC classification number: G06N3/063 , G06F7/5443 , G06F9/3893 , G06F17/16 , G06F2207/4824
Abstract: A systolic array can implement an architecture tailored to perform matrix multiplications on constrained fine-grained sparse weight matrices. Each processing element in the systolic array may include a weight register configured to store a weight value, and a multiplexor configured to select a feature map (FMAP) input element from multiple FMAP input data buses based on metadata associated with the weight value. Each processing element may also include a multiplier configured to multiply the selected feature map input element with the weight value to generate a multiplication result, and an adder configured to add the multiplication result to a partial sum input to generate a partial sum output.
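A single processing element from this array can be sketched as a function: the weight's metadata drives the multiplexor that picks one of several FMAP input buses, and the product joins the partial-sum chain. Bus count and values are illustrative.

```python
def sparse_pe(weight, metadata, fmap_buses, partial_sum_in):
    """One processing element: metadata associated with the weight selects
    which FMAP input bus feeds the multiplier; the multiplication result is
    added to the incoming partial sum."""
    selected_fmap = fmap_buses[metadata]   # multiplexor keyed by weight metadata
    return partial_sum_in + weight * selected_fmap

# Weight 3 pairs (via its metadata) with the element on bus 2.
out = sparse_pe(weight=3, metadata=2, fmap_buses=[10, 20, 30, 40], partial_sum_in=5)
```

Skipping the zero entries of a constrained sparse weight matrix this way lets each PE store only a nonzero weight plus a small metadata index.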
-
Publication No.: US20230334294A1
Publication Date: 2023-10-19
Application No.: US18339954
Application Date: 2023-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Randy Huang , Ron Diamant
CPC classification number: G06N3/045 , G06F15/80 , G06F13/28 , G06F3/0683 , G06F3/061 , G06F3/065 , G06F13/4068
Abstract: Provided are systems, methods, and integrated circuits for neural network processing. In various implementations, an integrated circuit for neural network processing can include a plurality of memory banks storing weight values for a neural network. The memory banks can be on the same chip as an array of processing engines. Upon receiving input data, the circuit can be configured to use the set of weight values to perform a task defined for the neural network. Performing the task can include reading weight values from the memory banks, inputting the weight values into the array of processing engines, and computing a result using the array of processing engines, where the result corresponds to an outcome of performing the task.
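The read-weights-then-compute flow can be sketched numerically, with the processing-engine array modelled as a matrix multiply. The bank layout (each bank holding a row slice of the weights) is an assumption for illustration.

```python
import numpy as np

def run_task(weight_banks, inputs):
    """Read weight values from the memory banks, feed them into the array
    of processing engines (modelled here as a matmul), and return the
    result of performing the task."""
    weights = np.concatenate(weight_banks, axis=0)  # gather rows from each bank
    return inputs @ weights                         # processing-engine array

banks = [np.array([[1.0, 0.0]]), np.array([[0.0, 2.0]])]  # two on-chip banks
result = run_task(banks, np.array([3.0, 4.0]))
```

Keeping the banks on the same chip as the engines is what removes the off-chip weight-fetch from this loop.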
-
Publication No.: US20230325348A1
Publication Date: 2023-10-12
Application No.: US18210202
Application Date: 2023-06-15
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant
CPC classification number: G06F15/8046 , G06N3/02 , G06F17/16 , G06N3/063 , G06F15/173 , G06F17/15 , G06N3/045
Abstract: A processing element (PE) of a systolic array can perform neural network computations on two or more data elements of an input data set using the same weight. Thus, two or more output data elements of an output data set may be generated. Based on the size of the input data set and an input data type, the systolic array can process a single data element or multiple data elements in parallel.
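The one-weight, multiple-elements idea can be sketched as below. The 16-bit lane width is a hypothetical parameter; the abstract only says the element count depends on the input data type.

```python
def pe_multiply(elements, weight):
    """Multiply one or more input data elements by the same weight in a
    single step, yielding one output element per input element."""
    return [weight * e for e in elements]

def elements_per_lane(dtype_bits, lane_bits=16):
    """Smaller data types pack more elements per lane, e.g. two 8-bit
    elements fit where one 16-bit element would (assumed lane width)."""
    return lane_bits // dtype_bits

print(elements_per_lane(8))   # 2 elements processed in parallel
```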
-
Publication No.: US11775430B1
Publication Date: 2023-10-03
Application No.: US17000842
Application Date: 2020-08-24
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Sundeep Amirineni , Akshay Balasubramanian , Eyal Freund
IPC: G06F12/08 , G11C11/419 , G11C11/418 , G06N3/063
CPC classification number: G06F12/08 , G06N3/063 , G11C11/418 , G11C11/419
Abstract: Disclosed herein are techniques for performing memory access. In one embodiment, an integrated circuit includes a port and an access engine. The integrated circuit is coupled with a memory device. The access engine is configured to: receive, from an access requester device, a request to access data stored at a memory device; and based on receiving the request: provide, via the port, a sequential access of a plurality of portions of the data to the access requester device; and access the plurality of portions of the data in a parallel form at the memory device for the access requester device. The sequential access can include a sequential write access or a sequential read access of the plurality of portions of the data.
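The parallel-fetch, sequential-serve pattern can be sketched with a generator. Banks as dictionaries and a single shared address are illustrative assumptions.

```python
def sequential_read(banks, address):
    """Access engine: fetch all portions of the requested data in parallel
    from the memory banks, then stream them to the access requester one
    portion at a time through the port."""
    portions = [bank[address] for bank in banks]  # parallel access at the memory
    for portion in portions:                      # sequential access via the port
        yield portion

banks = [{0: 0xAA}, {0: 0xBB}, {0: 0xCC}]
stream = list(sequential_read(banks, 0))
```

A sequential write would run the same pattern in reverse: collect portions from the port, then commit them to the banks in parallel.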
-
Publication No.: US11568238B2
Publication Date: 2023-01-31
Application No.: US16456414
Application Date: 2019-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, and dividing the tensor operation into sub-operations. The sub-operations include at least two sub-operations that have no data dependency between them. The computer-implemented method further includes assigning a first sub-operation in the two sub-operations to a first computing engine, assigning a second sub-operation in the two sub-operations to a second computing engine, and generating instructions for performing, in parallel, the first sub-operation by the first computing engine and the second sub-operation by the second computing engine. An inference is then made based on a result of the first sub-operation, a result of the second sub-operation, or both. The first computing engine and the second computing engine are in a same integrated circuit device or in two different integrated circuit devices.
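A matrix multiplication split along its row dimension is one concrete case of such dependency-free sub-operations, sketched here sequentially where the hardware would run the two halves on separate engines:

```python
import numpy as np

def split_matmul(a, b):
    """Divide C = A @ B along A's rows into two sub-operations with no
    data dependency between them, compute each (conceptually on its own
    engine), and stitch the partial results back together."""
    half = a.shape[0] // 2
    sub1 = a[:half] @ b   # sub-operation for the first computing engine
    sub2 = a[half:] @ b   # sub-operation for the second computing engine
    return np.vstack([sub1, sub2])

a, b = np.arange(8.0).reshape(4, 2), np.eye(2)
result = split_matmul(a, b)
```

Because neither half reads the other's output, the generated instructions can schedule them in parallel.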
-
Publication No.: US11501145B1
Publication Date: 2022-11-15
Application No.: US16573201
Application Date: 2019-09-17
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
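The strided gather encoded in the instruction can be sketched as follows, with memory as a flat list; the address and stride values are illustrative.

```python
def gather_inputs(memory, first_address, count, skip):
    """Obtain `count` input data elements starting at `first_address`,
    skipping `skip` elements between adjacent elements obtained (the
    access pattern extracted from the instruction)."""
    step = skip + 1
    return [memory[first_address + i * step] for i in range(count)]

def weight_contributions(memory, weight, first_address, count, skip):
    """One weight data element's contribution to `count` output elements."""
    return [weight * x for x in gather_inputs(memory, first_address, count, skip)]

mem = list(range(10))
print(gather_inputs(mem, first_address=1, count=3, skip=1))  # [1, 3, 5]
```

Deriving `first_address` from the weight's coordinates and `skip` from the convolution stride lets one instruction drive the whole per-weight pass.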
-
Publication No.: US11475306B2
Publication Date: 2022-10-18
Application No.: US15933201
Application Date: 2018-03-22
Applicant: Amazon Technologies, Inc.
Inventor: Dana Michelle Vantrease , Ron Diamant , Thomas A. Volpe , Randy Huang
Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
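The configuration schedule from the abstract can be sketched with layers as plain functions standing in for engine configurations; the specific layer functions below are arbitrary placeholders.

```python
def run_two_contexts(layer1, layer2, layer3, ctx1_layer1_out, ctx2_input):
    """Schedule one computing engine across three configurations:
    first layer 2 for context 1, then layer 1 for context 2, then
    layer 3 for both contexts' pending outputs."""
    ctx1_l2_out = layer2(ctx1_layer1_out)  # config 1: layer 2, first context
    ctx2_l1_out = layer1(ctx2_input)       # config 2: layer 1, second context
    # config 3: layer 3 processes both contexts' outputs
    return layer3(ctx1_l2_out), layer3(ctx2_l1_out)

l1 = lambda x: x + 1   # placeholder layer functions
l2 = lambda x: x * 2
l3 = lambda x: x - 3
result = run_two_contexts(l1, l2, l3, ctx1_layer1_out=5, ctx2_input=4)
```

Interleaving contexts this way keeps the engine busy while each context waits for its next layer's configuration.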