Dimension shuffling using matrix processors

    Publication Number: US10949496B2

    Publication Date: 2021-03-16

    Application Number: US15395906

    Filing Date: 2016-12-30

    Abstract: In one embodiment, a matrix operation may be performed to reorder a plurality of dimensions of an input matrix stored in two-dimensional memory. Data associated with the input matrix may be accessed using one or more strided memory operations, wherein the one or more strided memory operations are configured to access the two-dimensional memory at a plurality of locations that are separated by a particular interval. The data accessed using the one or more strided memory operations may be stored in a result matrix, wherein the data accessed using each strided memory operation is stored in the result matrix in non-transpose form or transpose form.
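
    The abstract describes reordering dimensions by reading two-dimensional memory at locations separated by a fixed interval. As a rough illustration only, the sketch below gathers evenly spaced runs from a flat row-major buffer and stores them in non-transpose or transpose form; the buffer layout, the stride parameter, and the function name are assumptions, not details taken from the patent.

```python
import numpy as np

def shuffle_dims_strided(flat, rows, cols, stride, transpose=False):
    """Gather `rows` runs of `cols` elements from the flat buffer, each run
    starting `stride` elements after the previous one, and store them in a
    result matrix in non-transpose or transpose form."""
    result = np.empty((rows, cols), dtype=flat.dtype)
    for r in range(rows):
        start = r * stride                       # locations separated by a fixed interval
        result[r, :] = flat[start:start + cols]  # one strided memory access
    return result.T if transpose else result

# Example: read a row-major 4x6 buffer back as its 6x4 transpose.
buf = np.arange(24)
print(shuffle_dims_strided(buf, rows=4, cols=6, stride=6, transpose=True))
```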

    DISTRIBUTED MATRIX MULTIPLICATION FOR NEURAL NETWORKS

    Publication Number: US20190138569A1

    Publication Date: 2019-05-09

    Application Number: US16236955

    Filing Date: 2018-12-31

    Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
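
    As a loose software analogy of the partition/distribute/combine flow in the abstract, the sketch below splits one input matrix row-wise across a number of simulated processing elements, has each compute a partial product, and stacks the partial results. The row-wise split and the single-process simulation are assumptions made for illustration, not details from the patent.

```python
import numpy as np

def distributed_matmul(A, B, num_pes=4):
    # Partition the input based on the number of available processing elements.
    row_parts = np.array_split(A, num_pes, axis=0)
    # Each simulated processing element performs a partial matrix operation.
    partials = [part @ B for part in row_parts]
    # Combine the partial results into the result of the full matrix operation.
    return np.vstack(partials)

A = np.random.rand(8, 5)
B = np.random.rand(5, 3)
assert np.allclose(distributed_matmul(A, B), A @ B)
```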

    Programmable matrix processing engine

    Publication Number: US10228937B2

    Publication Date: 2019-03-12

    Application Number: US15395654

    Filing Date: 2016-12-30

    Abstract: An apparatus may comprise a multi-dimensional memory, a plurality of matrix processors, and a matrix routine memory. The matrix routine memory may store a plurality of programmable matrix routines, wherein each programmable matrix routine comprises a plurality of instructions associated with a particular matrix operation, wherein the plurality of instructions is to be executed by the plurality of matrix processors. Further, the plurality of matrix processors may be configured to: receive a command to perform a matrix operation; receive matrix data from the multi-dimensional memory; extract one or more matrix operands from the matrix data; identify a programmable matrix routine associated with the matrix operation; receive the programmable matrix routine from the matrix routine memory; execute the programmable matrix routine using the one or more matrix operands; and obtain a result of the matrix operation based on execution of the programmable matrix routine.
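
    A minimal software analogy of the routine-memory idea: a lookup table of named instruction sequences and a dispatcher that takes operands and runs the identified routine step by step. The table contents and the names matrix_routine_memory and execute_routine are illustrative assumptions, not elements of the claimed hardware.

```python
import numpy as np

# "Matrix routine memory": each entry is a short sequence of primitive instructions.
matrix_routine_memory = {
    "matmul":     [lambda acc, b: acc @ b],
    "scaled_add": [lambda acc, b: acc + b,      # add the second operand
                   lambda acc, b: 2.0 * acc],   # then scale the accumulator
}

def execute_routine(command, operand_a, operand_b):
    """Identify the routine for `command` and execute its instructions in order,
    threading each step's result (an accumulator) into the next."""
    routine = matrix_routine_memory[command]
    acc = operand_a
    for instruction in routine:
        acc = instruction(acc, operand_b)
    return acc

A, B = np.eye(2), np.ones((2, 2))
print(execute_routine("matmul", A, B))       # A @ B
print(execute_routine("scaled_add", A, B))   # 2 * (A + B)
```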

    DISTRIBUTED MATRIX MULTIPLICATION FOR NEURAL NETWORKS

    Publication Number: US20180189236A1

    Publication Date: 2018-07-05

    Application Number: US15395527

    Filing Date: 2016-12-30

    CPC classification number: G06F17/16 G06N3/084

    Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

    Pipelined convolutional operations for processing clusters

    Publication Number: US09886377B2

    Publication Date: 2018-02-06

    Application Number: US14874784

    Filing Date: 2015-10-05

    CPC classification number: G06F12/023 G06F15/76 G06F2212/251 G06T1/20

    Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and convert the operation into a set of matrix operations to operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and provide memory handles corresponding to each of the matrix operands to one of the ALUs to access the respective matrix operands when executing a matrix operation.
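
    The conversion of an operation into a set of matrix operations over sub-portions of the inputs can be pictured with an im2col-style lowering of a small convolution, processed tile by tile, as sketched below. The im2col lowering, the tile size, and the function name are assumptions made for illustration; the patent does not commit to this particular scheme.

```python
import numpy as np

def conv2d_as_tiled_matmul(image, kernel, tile_rows=4):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    # Lower each output position to one row of patch values (a 2D matrix operand).
    patches = np.array([image[r:r + kh, c:c + kw].ravel()
                        for r in range(oh) for c in range(ow)])
    weights = kernel.ravel()                  # the kernel as a vector operand
    out = np.empty(patches.shape[0])
    # Execute the matrix operation on sub-portions (tiles) of the lowered data.
    for start in range(0, patches.shape[0], tile_rows):
        out[start:start + tile_rows] = patches[start:start + tile_rows] @ weights
    return out.reshape(oh, ow)

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
print(conv2d_as_tiled_matmul(img, k))  # each entry is the sum over a 3x3 window
```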

    Distributed convolution for neural networks

    Publication Number: US11748625B2

    Publication Date: 2023-09-05

    Application Number: US15395675

    Filing Date: 2016-12-30

    CPC classification number: G06N3/084 G06F17/153 G06F17/16 G06N3/045 G06N3/063

    Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
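
    A rough illustration of partitioning a convolution across processing elements: the output rows are split among simulated elements, and the overlapping input rows each element needs stand in for the partial matrix data exchanged between neighbors. The row-wise split, the overlap handling, and the use of SciPy as a reference implementation are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import correlate2d

def distributed_conv2d(image, kernel, num_pes=2):
    kh = kernel.shape[0]
    out_rows = image.shape[0] - kh + 1
    # Partition the output rows across the available processing elements.
    bounds = np.linspace(0, out_rows, num_pes + 1, dtype=int)
    partials = []
    for p in range(num_pes):
        lo, hi = bounds[p], bounds[p + 1]
        # Each element also needs kh-1 overlapping rows: the data that would be
        # exchanged between neighboring processing elements.
        partials.append(correlate2d(image[lo:hi + kh - 1, :], kernel, mode="valid"))
    # Combine the partial results into the full convolution output.
    return np.vstack(partials)

img = np.random.rand(8, 8)
k = np.random.rand(3, 3)
assert np.allclose(distributed_conv2d(img, k), correlate2d(img, k, mode="valid"))
```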

    Distributed matrix multiplication for neural networks

    Publication Number: US10922380B2

    Publication Date: 2021-02-16

    Application Number: US16236955

    Filing Date: 2018-12-31

    Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

    PROGRAMMABLE MATRIX PROCESSING ENGINE
    Invention Application

    Publication Number: US20190171450A1

    Publication Date: 2019-06-06

    Application Number: US16264483

    Filing Date: 2019-01-31

    Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
