MULTI-MEMORY ON-CHIP COMPUTATIONAL NETWORK

    Publication Number: US20210019600A1

    Publication Date: 2021-01-21

    Application Number: US17033573

    Filing Date: 2020-09-25

    Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks can store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
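    The staged dataflow this abstract describes (compute an intermediate result on the first engine array, copy it into the second array's memory banks, then compute the final result on the second array) might be sketched as follows. This is an illustrative model only; the class and function names are invented here and do not come from the patent, and a matmul-plus-ReLU stands in for whatever per-layer task the hardware actually performs.

```python
import numpy as np

class EngineArray:
    """Stands in for one array of processing engines with its memory banks."""
    def __init__(self, weights):
        self.banks = weights  # weights preloaded before any input data arrives

    def compute(self, x):
        # matmul + ReLU as a placeholder for the layer's actual computation
        return np.maximum(x @ self.banks, 0.0)

def run_task(x, array1, array2):
    intermediate = array1.compute(x)   # first array produces the intermediate result
    copied = intermediate.copy()       # copy into the second set of memory banks
    return array2.compute(copied)      # second array produces the final result
```

    In this sketch the copy step is the only data movement between the two stages, which mirrors the abstract's point that both stages' weights are resident before inference begins.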

    ACCELERATED QUANTIZED MULTIPLY-AND-ADD OPERATIONS

    Publication Number: US20200293284A1

    Publication Date: 2020-09-17

    Application Number: US16891010

    Filing Date: 2020-06-02

    Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural networks. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
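    The three circuits map onto the standard arithmetic of asymmetric quantization: subtract the zero point, accumulate integer products, then rescale into the high-precision format. A minimal numerical sketch, with function names invented here for illustration:

```python
import numpy as np

def quantize(x, scale, zero_point):
    # asymmetric quantization: real values -> integers in the uint8 range
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.int32)

def quantized_dot(qa, qb, za, zb, sa, sb):
    # first circuit: subtract the zero point from each quantized value
    da = qa - za
    db = qb - zb
    # second circuit: integer sum of products on the difference values
    acc = np.sum(da * db)
    # third circuit: scale the accumulator back to the high-precision format
    return acc * (sa * sb)
```

    Because the zero points are removed before multiplication, the integer accumulator needs no cross terms, which is the arithmetic simplification this family of patents exploits.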

    SCHEDULING NETWORK COMPUTATIONS
    Invention Application

    Publication Number: US20190294959A1

    Publication Date: 2019-09-26

    Application Number: US15933225

    Filing Date: 2018-03-22

    Abstract: Disclosed herein are techniques for scheduling and executing multi-layer neural network computations for multiple contexts. In one embodiment, a method comprises determining a set of computation tasks to be executed, the set of computation tasks including a first computation task and a second computation task, as well as a third computation task and a fourth computation task to provide input data for the first and second computation tasks; determining a first execution batch comprising the first and second computation tasks; determining a second execution batch comprising at least the third computation task to be executed before the first execution batch; determining whether to include the fourth computation task in the second execution batch based on whether the memory device has sufficient capacity to hold input data and output data of both the third and fourth computation tasks; and executing the second execution batch followed by the first execution batch.
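    The batching rule in this abstract (add a prerequisite task to the earlier batch only while on-chip memory can hold the input and output data of every task in that batch) can be sketched as a simple greedy check. The function name and the task-tuple layout are assumptions made for this sketch, not details from the patent:

```python
def build_prerequisite_batch(tasks, memory_capacity):
    """tasks: list of (name, input_size, output_size) tuples.

    Returns the names of the tasks that fit together in one execution
    batch under the given memory capacity; remaining tasks would be
    deferred to a later batch.
    """
    batch, used = [], 0
    for name, in_size, out_size in tasks:
        footprint = in_size + out_size  # both input and output must be resident
        if used + footprint <= memory_capacity:
            batch.append(name)
            used += footprint
        else:
            break  # this and subsequent tasks wait for a later batch
    return batch
```

    For example, with a capacity of 16 units both a third and a fourth task of footprint 8 fit in the second batch, while a capacity of 10 admits only the third.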

    ACCELERATED QUANTIZED MULTIPLY-AND-ADD OPERATIONS

    Publication Number: US20190294413A1

    Publication Date: 2019-09-26

    Application Number: US15934681

    Filing Date: 2018-03-23

    Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural networks. A computer-implemented method includes receiving low-precision inputs for a convolution operation from a storage device, and subtracting a low-precision value representing a high-precision zero value from the low-precision inputs to generate difference values, where the low-precision inputs are asymmetrically quantized from high-precision inputs. The method also includes performing multiplication and summation operations on the difference values to generate a sum of products, and generating a high-precision output by scaling the sum of products with a scaling factor.

    ON-CHIP COMPUTATIONAL NETWORK
    Invention Application

    Publication Number: US20190180183A1

    Publication Date: 2019-06-13

    Application Number: US15839017

    Filing Date: 2017-12-12

    Abstract: Provided are systems, methods, and integrated circuits for neural network processing. In various implementations, an integrated circuit for neural network processing can include a plurality of memory banks storing weight values for a neural network. The memory banks can be on the same chip as an array of processing engines. Upon receiving input data, the circuit can be configured to use the stored weight values to perform a task defined for the neural network. Performing the task can include reading weight values from the memory banks, inputting the weight values into the array of processing engines, and computing a result using the array of processing engines, where the result corresponds to an outcome of performing the task.

    MULTI-MEMORY ON-CHIP COMPUTATIONAL NETWORK
    Invention Application

    Publication Number: US20190180170A1

    Publication Date: 2019-06-13

    Application Number: US15839301

    Filing Date: 2017-12-12

    Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks can store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.

    FAST CONTEXT SWITCHING FOR COMPUTATIONAL NETWORKS

    Publication Number: US20190179795A1

    Publication Date: 2019-06-13

    Application Number: US15839157

    Filing Date: 2017-12-12

    Abstract: Provided are systems, methods, and integrated circuits for a neural network processor that can execute a fast context switch between one neural network and another. In various implementations, a neural network processor can include a plurality of memory banks storing a first set of weight values for a first neural network. When the neural network processor receives first input data, the neural network processor can compute a first result using the first set of weight values and the first input data. While computing the first result, the neural network processor can store, in the memory banks, a second set of weight values for a second neural network. When the neural network processor receives second input data, the neural network processor can compute a second result using the second set of weight values and the second input data, where the computation occurs upon completion of computation of the first result.
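    The context switch described here is essentially double buffering of weights: while one network's result is being computed from the active banks, the next network's weights are written into the spare banks so the second computation can start without a load stall. A minimal sketch, with the class name and bank layout invented here for illustration:

```python
import numpy as np

class ContextSwitchingProcessor:
    """Toy model: two weight banks, one active while the other is loaded."""
    def __init__(self, num_banks=2):
        self.banks = [None] * num_banks
        self.active = 0

    def load_weights(self, weights):
        # write into an inactive bank; in hardware this overlaps with compute
        spare = (self.active + 1) % len(self.banks)
        self.banks[spare] = weights
        return spare

    def compute(self, x, bank):
        self.active = bank                 # switch contexts
        return x @ self.banks[bank]        # matmul stands in for inference
```

    In the real processor the load and the compute proceed concurrently; this sequential sketch only shows that the two contexts never contend for the same bank.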
