PROVIDING FLEXIBLE MATRIX PROCESSORS FOR PERFORMING NEURAL NETWORK CONVOLUTION IN MATRIX-PROCESSOR-BASED DEVICES

    Publication No.: US20190065942A1

    Publication date: 2019-02-28

    Application No.: US16117952

    Filing date: 2018-08-30

    Abstract: Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices is disclosed. In this regard, a matrix-processor-based device provides a central processing unit (CPU) and a matrix processor. The matrix processor reorganizes a plurality of weight matrices and a plurality of input matrices into swizzled weight matrices and swizzled input matrices, respectively, that have regular dimensions natively supported by the matrix processor. The matrix-processor-based device then performs a convolution operation using the matrix processor to perform matrix multiplication/accumulation operations for the regular dimensions of the weight matrices and the input matrices, and further uses the CPU to execute instructions for handling the irregular dimensions of the weight matrices and the input matrices (e.g., by executing a series of nested loops, as a non-limiting example). The matrix-processor-based device thus provides efficient hardware acceleration by taking advantage of dimensional regularity, while maintaining the flexibility to handle different variations of convolution.
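The split described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tensor layouts, the choice of kernel height and width as the "irregular" dimensions, and the names `mac_tile` and `conv2d` are all assumptions made for illustration. The outer nested loops stand in for the CPU, and `mac_tile` stands in for the matrix processor's multiply/accumulate on regular-shaped operands.

```python
def mac_tile(acc, a, b):
    """Stand-in for the matrix processor: acc += a @ b on dense operands."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))


def conv2d(inp, weights):
    """Convolution with stride 1 and no padding (assumed layout:
    inp[h][w][c_in], weights[kh][kw][c_in][c_out])."""
    H, W = len(inp), len(inp[0])
    KH, KW = len(weights), len(weights[0])
    c_out = len(weights[0][0][0])
    OH, OW = H - KH + 1, W - KW + 1
    # One accumulator row per output pixel, one column per output channel.
    out = [[0] * c_out for _ in range(OH * OW)]
    # "CPU" side: nested loops over the kernel dimensions (assumed irregular).
    for kh in range(KH):
        for kw in range(KW):
            # Reorganized operands with regular shapes:
            # a is (pixels x c_in), b is (c_in x c_out).
            a = [inp[oh + kh][ow + kw] for oh in range(OH) for ow in range(OW)]
            b = weights[kh][kw]
            mac_tile(out, a, b)
    # Reshape the flat pixel list back to out[oh][ow][c_out].
    return [out[r * OW:(r + 1) * OW] for r in range(OH)]
```

Changing the kernel size only changes the loop bounds on the CPU side; the operands passed to `mac_tile` keep the same (pixels x channels) and (channels x channels) shapes, which is the kind of flexibility the abstract attributes to the design.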

    PROVIDING EFFICIENT MULTIPLICATION OF SPARSE MATRICES IN MATRIX-PROCESSOR-BASED DEVICES

    Publication No.: US20190065150A1

    Publication date: 2019-02-28

    Application No.: US16118162

    Filing date: 2018-08-30

    IPC classification: G06F7/544 G06F15/80

    Abstract: Providing efficient multiplication of sparse matrices in matrix-processor-based devices is disclosed herein. In one aspect, a matrix processor of a matrix-processor-based device includes a plurality of sequencers coupled to a plurality of multiply/accumulate (MAC) units for performing multiplication and accumulation operations. Each sequencer determines whether a product of an element of a first input matrix to be multiplied with an element of a second input matrix has a value of zero (e.g., by determining whether the element of the first input matrix has a value of zero, or by determining whether either the element of the first input matrix or that of the second input matrix has a value of zero). If the product of the elements of the first input matrix and the second input matrix does not have a value of zero, the sequencer provides the elements to a MAC unit to perform a multiplication and accumulation operation.
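The zero-skipping behavior can be modeled in software as follows. This is a rough sketch rather than the hardware design: the function name `sparse_matmul`, the `check_both` flag, and the `mac_ops` counter are hypothetical, and a single loop plays the role of all sequencers. The flag selects between the two checks mentioned in the abstract (first operand only, or either operand).

```python
def sparse_matmul(a, b, check_both=True):
    """Multiply a (rows x inner) by b (inner x cols), skipping zero products.

    Returns the product and the number of multiply/accumulate operations
    that were actually issued.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    mac_ops = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                x, y = a[i][k], b[k][j]
                # Sequencer check: a zero operand means the product is zero,
                # so nothing is forwarded to the MAC unit.
                if x == 0 or (check_both and y == 0):
                    continue
                out[i][j] += x * y  # MAC unit: multiply and accumulate
                mac_ops += 1
    return out, mac_ops
```

For sparse inputs, `mac_ops` is smaller than the `rows * inner * cols` operations a dense multiplication would issue, while the result is unchanged.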

    Providing scalable dynamic random access memory (DRAM) cache management using DRAM cache indicator caches

    Publication No.: US10176096B2

    Publication date: 2019-01-08

    Application No.: US15228320

    Filing date: 2016-08-04

    Abstract: Providing scalable dynamic random access memory (DRAM) cache management using DRAM cache indicator caches is provided. In one aspect, a DRAM cache management circuit is provided to manage access to a DRAM cache in high-bandwidth memory. The DRAM cache management circuit comprises a DRAM cache indicator cache, which stores master table entries that are read from a master table in a system memory DRAM and that contain DRAM cache indicators. The DRAM cache indicators enable the DRAM cache management circuit to determine whether a memory line in the system memory DRAM is cached in the DRAM cache of high-bandwidth memory, and, if so, in which way of the DRAM cache the memory line is stored. Based on the DRAM cache indicator cache, the DRAM cache management circuit may determine whether to employ the DRAM cache and/or the system memory DRAM to perform a memory access operation in an optimal manner.
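The lookup path in this abstract can be sketched as a small software model. Everything below is illustrative: the class name `DramCacheManager`, the packing of four indicators per master table entry, the LRU replacement policy, and the `master_reads` counter are assumptions, none of which the abstract specifies. An indicator is modeled as either `None` (line not in the DRAM cache) or the way number holding the line.

```python
from collections import OrderedDict

LINES_PER_ENTRY = 4  # assumed: indicators packed per master table entry


class DramCacheManager:
    def __init__(self, master_table, capacity=2):
        # master_table: entry index -> list of indicators (way number or None)
        self.master_table = master_table
        self.indicator_cache = OrderedDict()  # cached entries, LRU order (assumed)
        self.capacity = capacity
        self.master_reads = 0  # reads of the master table in system memory

    def _entry(self, index):
        if index in self.indicator_cache:
            self.indicator_cache.move_to_end(index)  # mark as recently used
        else:
            self.master_reads += 1
            if len(self.indicator_cache) >= self.capacity:
                self.indicator_cache.popitem(last=False)  # evict oldest entry
            self.indicator_cache[index] = self.master_table[index]
        return self.indicator_cache[index]

    def route(self, line):
        """Decide where a read of memory line `line` should be served from."""
        entry = self._entry(line // LINES_PER_ENTRY)
        way = entry[line % LINES_PER_ENTRY]
        if way is not None:
            return ("dram_cache", way)
        return ("system_memory", None)
```

Repeated accesses to lines covered by the same master table entry are resolved from the indicator cache, so the master table in system memory is read only on an indicator cache miss.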

    PROVIDING MEMORY BANDWIDTH COMPRESSION USING COMPRESSED MEMORY CONTROLLERS (CMCs) IN A CENTRAL PROCESSING UNIT (CPU)-BASED SYSTEM

    Type: Invention application

    Status: Under examination (published)

    Publication No.: US20150339239A1

    Publication date: 2015-11-26

    Application No.: US14717552

    Filing date: 2015-05-20

    IPC classification: G06F12/10 G06F12/08

    Abstract: Providing memory bandwidth compression using compressed memory controllers (CMCs) in a central processing unit (CPU)-based system is disclosed. In this regard, in some aspects, a CMC is configured to receive a memory read request to a physical address in a system memory, and read a compression indicator (CI) for the physical address from a master directory and/or from error correcting code (ECC) bits of the physical address. Based on the CI, the CMC determines a number of memory blocks to be read for the memory read request, and reads the determined number of memory blocks. In some aspects, a CMC is configured to receive a memory write request to a physical address in the system memory, and generate a CI for write data based on a compression pattern of the write data. The CMC updates the master directory and/or the ECC bits of the physical address with the generated CI.
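The read and write paths can be sketched with a toy model. The abstract does not define the CI encoding, block size, or compression scheme, so the following are all assumptions: a 128-byte memory line split into 32-byte blocks, a CI equal to the number of blocks occupied, `zlib` as a stand-in compressor, and a master directory only (the ECC-bit variant is not modeled). The class name and the `blocks_read` counter are hypothetical.

```python
import zlib

BLOCK = 32   # assumed memory block size in bytes
LINE = 128   # assumed memory line size in bytes (4 blocks)


class CompressedMemoryController:
    def __init__(self):
        self.blocks = {}            # (physical address, block index) -> bytes
        self.master_directory = {}  # physical address -> CI (blocks occupied)
        self.blocks_read = 0

    def write(self, addr, data):
        assert len(data) == LINE
        packed = zlib.compress(data)
        # Store the raw line when compression does not save at least one block.
        payload = packed if len(packed) <= LINE - BLOCK else data
        ci = -(-len(payload) // BLOCK)  # ceiling division: blocks occupied
        for i in range(ci):
            self.blocks[(addr, i)] = payload[i * BLOCK:(i + 1) * BLOCK]
        self.master_directory[addr] = ci  # record the generated CI

    def read(self, addr):
        ci = self.master_directory[addr]  # CI determines how many blocks to read
        payload = b"".join(self.blocks[(addr, i)] for i in range(ci))
        self.blocks_read += ci
        # A CI equal to the full line size marks an uncompressed line.
        return payload if ci == LINE // BLOCK else zlib.decompress(payload)
```

In this model a highly compressible line is served with a single block read instead of four, which is the bandwidth saving the abstract describes.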
