COMPRESSION FOR DEEP LEARNING IN CASE OF SPARSE VALUES MAPPED TO NON-ZERO VALUE

    Publication No.: US20190197420A1

    Publication Date: 2019-06-27

    Application No.: US15853457

    Filing Date: 2017-12-22

    CPC classification number: G06N5/046 G06F13/28 G06F17/16 G06N20/00 G06T15/205

    Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.
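
    The abstract describes a DMA-level hardware codec that compresses sparse neural-network data and writes metadata describing the encoding. As a rough software illustration of one such zero-value compression scheme, the sketch below keeps only the non-zero activations and packs a significance bitmap as metadata; the function names, the bitmap format, and the use of zero as the frequent value are assumptions for illustration, not the claimed hardware codec.

# Illustrative sparse encode/decode in the spirit of the abstract (an assumed
# scheme, not the patented codec): store non-zero values plus a packed
# significance bitmap as metadata.
import numpy as np

def encode(feature_map: np.ndarray):
    flat = feature_map.ravel()
    mask = flat != 0                      # which positions hold non-zero values
    values = flat[mask]                   # encoded stream: non-zero values only
    metadata = np.packbits(mask)          # 1 bit per element, packed into bytes
    return values, metadata, flat.size

def decode(values, metadata, size):
    mask = np.unpackbits(metadata, count=size).astype(bool)
    flat = np.zeros(size, dtype=values.dtype)
    flat[mask] = values                   # scatter non-zero values back into place
    return flat

fmap = np.array([[0.0, 1.5, 0.0, 0.0], [2.0, 0.0, 0.0, 3.0]], dtype=np.float32)
vals, meta, n = encode(fmap)
assert np.array_equal(decode(vals, meta, n), fmap.ravel())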

    Binary multiplier for binary vector factorization

    Publication No.: US10210137B2

    Publication Date: 2019-02-19

    Application No.: US15635716

    Filing Date: 2017-06-28

    Abstract: A processor, including: decode circuitry to decode instructions; a data cache unit including circuitry to cache data for the processor; and an approximate matrix multiplication (AMM) circuit including: a data receptor circuit to receive a weight vector w and an input vector x, both of size N, and a compression regulating parameter n; a factorizer circuit to factorize w into w≅B·s, by computing a binary factorized matrix B of size N×n, and a dictionary vector s of size n; and a binary multiplier circuit to compute w^T x≅(B·s)^T x=s^T(B^T x), the binary multiplier circuit comprising a hardware accelerator circuit to compute an array product B^T x.
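
    The key structure here is the identity (B·s)^T x = s^T(B^T x): once w is factorized, the large product B^T x involves only additions of selected elements of x, followed by n multiplications with the dictionary s. The sketch below checks that identity numerically; constructing w directly from a chosen B and s is an illustrative shortcut, since in the patent it is the factorizer circuit that produces B and s for a given w.

# Illustrative check of the approximate-multiply structure (the factorization
# step is bypassed by building w from a chosen B and s).
import numpy as np

rng = np.random.default_rng(0)
N, n = 64, 8                                        # vector length, compression parameter

B = (rng.random((N, n)) < 0.5).astype(np.float64)   # binary factorized matrix
s = rng.standard_normal(n)                          # dictionary vector
w = B @ s                                           # weight vector with an exact factorization
x = rng.standard_normal(N)                          # input vector

direct = w @ x                                      # full multiply-accumulate over N terms
binary = s @ (B.T @ x)                              # B.T @ x is additions only; then n multiplies
print(np.isclose(direct, binary))                   # True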

    Binary Multiplier for Binary Vector Factorization

    Publication No.: US20190004997A1

    Publication Date: 2019-01-03

    Application No.: US15635716

    Filing Date: 2017-06-28

    Abstract: A processor, including: decode circuitry to decode instructions; a data cache unit including circuitry to cache data for the processor; and an approximate matrix multiplication (AMM) circuit including: a data receptor circuit to receive a weight vector w and an input vector x, both of size N, and a compression regulating parameter n; a factorizer circuit to factorize w into w≅B·s, by computing a binary factorized matrix B of size N×n, and a dictionary vector s of size n; and a binary multiplier circuit to compute w^T x≅(B·s)^T x=s^T(B^T x), the binary multiplier circuit comprising a hardware accelerator circuit to compute an array product B^T x.

    METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED HISTOGRAM FUNCTIONALITY

    Publication No.: US20160378716A1

    Publication Date: 2016-12-29

    Application No.: US14752054

    Filing Date: 2015-06-26

    Abstract: Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units compare each element of the first data type in the first register lane portion with a range specified by the instruction. For any elements of the first register portion in said range, corresponding elements of the second data type from the second register portion are added into one of a plurality of data fields of a destination register lane portion, selected according to the value of the corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion.
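
    To make the per-lane operation concrete, the sketch below reproduces the packed weighted histogram for a single register lane in NumPy: keys that fall in the instruction-specified range select a destination field, and the corresponding weights are accumulated there. The lane width, element types, and bin-selection rule are illustrative assumptions, not the architectural definition of the instruction.

# Illustrative per-lane packed weighted histogram (assumed lane width and
# bin-selection rule, for exposition only).
import numpy as np

def packed_weighted_histogram(keys, weights, lo, hi, num_bins):
    """Add each weight whose key lies in [lo, hi) into the destination field
    selected by the key value; keys outside the range are ignored."""
    dest = np.zeros(num_bins, dtype=weights.dtype)
    for k, w in zip(keys, weights):
        if lo <= k < hi:                   # range check specified by the instruction
            dest[k % num_bins] += w        # destination field selected by the key value
    return dest

keys    = np.array([3, 7, 1, 3, 9, 0, 3, 7], dtype=np.int32)   # elements of the first data type
weights = np.array([2, 1, 5, 1, 4, 2, 1, 3], dtype=np.int32)   # elements of the second data type
print(packed_weighted_histogram(keys, weights, lo=0, hi=8, num_bins=8))
# -> [2 5 0 4 0 0 0 4]; key 9 is out of range, so its weight is ignored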

