APPARATUS AND METHOD FOR A TENSOR PERMUTATION ENGINE

    公开(公告)号:US20210182059A1

    公开(公告)日:2021-06-17

    申请号:US17131424

    申请日:2020-12-22

    Inventor: Berkin AKIN

    Abstract: An apparatus and method for a tensor permutation engine. The TPE may include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in a first storage and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage. The TPE may include a shuffle register bank comprising a register to read tensor data elements from the plurality of read addresses generated by the read AGU, a first register bank to receive the tensor data elements, and a shift register to receive a lowest tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.

    Vector data transfer instruction
    57.
    发明授权

    公开(公告)号:US11003450B2

    公开(公告)日:2021-05-11

    申请号:US15759900

    申请日:2016-09-14

    Applicant: ARM LIMITED

    Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies a start address of the contiguous block using a base register and an immediate offset value specifies as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling which can help to improve performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, to reduce the loop control overhead.

    Vector register access
    58.
    发明授权

    公开(公告)号:US10963251B2

    公开(公告)日:2021-03-30

    申请号:US16314882

    申请日:2017-06-15

    Applicant: ARM LIMITED

    Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.

    Two address translations from a single table look-aside buffer read

    公开(公告)号:US10901913B2

    公开(公告)日:2021-01-26

    申请号:US16251795

    申请日:2019-01-18

    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.

    METHOD AND DEVICE (UNIVERSAL MULTIFUNCTION ACCELERATOR) FOR ACCELERATING COMPUTATIONS BY PARALLEL COMPUTATIONS OF MIDDLE STRATUM OPERATIONS

    公开(公告)号:US20200334042A1

    公开(公告)日:2020-10-22

    申请号:US16795758

    申请日:2020-02-20

    Applicant: Venu KANDADAI

    Inventor: Venu KANDADAI

    Abstract: This invention constitutes a method and apparatus for enabling parallel computations of intermediate operations which are generic in many algorithms in given applications and also contain most of the computationally intensive operations. The method includes designing a set of intermediate level functions suitable for predefined application, obtaining instructions corresponding to intermediate level operations from a processor, computing the addresses of the operands and the results, performing computations involved in multiple intermediate level operations. In an exemplary embodiment the apparatus consists of a local data address generator that computes the addresses of a plurality of operands and results, a programmable computational unit that performs parallels computations of the intermediate level operations and a local memory interface that is interfaced to local memory organized in multiple blocks. The local data address generator and programmable computational unit are configurable to cover any field requiring large computations.

Patent Agency Ranking