PER KERNEL KMEANS COMPRESSION FOR NEURAL NETWORKS

    Publication No.: US20220027704A1

    Publication Date: 2022-01-27

    Application No.: US17366919

    Filing Date: 2021-07-02

    Abstract: Methods and apparatus relating to techniques for incremental network quantization. In an example, an apparatus comprises logic, at least partially comprising hardware logic to determine a plurality of weights for a layer of a convolutional neural network (CNN) comprising a plurality of kernels; organize the plurality of weights into a plurality of clusters for the plurality of kernels; and apply a K-means compression algorithm to each of the plurality of clusters. Other embodiments are also disclosed and claimed.
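The abstract describes clustering a layer's weights per kernel and compressing each cluster with K-means. A minimal numpy-only sketch of that idea (not the patented implementation; `kmeans_1d` and `compress_per_kernel` are illustrative names, and Lloyd's algorithm in 1-D stands in for whatever K-means variant the hardware logic uses):

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    """Simple 1-D Lloyd's algorithm over a flat array of weights."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from distinct sampled weights.
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Recompute each centroid as its cluster mean (skip empty clusters).
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, labels

def compress_per_kernel(conv_weights, k=4):
    """conv_weights: array of shape (num_kernels, kh, kw).
    Runs K-means independently on each kernel, so every kernel ends up
    with at most k distinct weight values (its own centroid codebook)."""
    out = np.empty_like(conv_weights)
    for i, kernel in enumerate(conv_weights):
        flat = kernel.ravel()
        centroids, labels = kmeans_1d(flat, k)
        out[i] = centroids[labels].reshape(kernel.shape)
    return out
```

Because each kernel gets its own small codebook, the stored weights reduce to per-kernel centroid tables plus short index codes, which is the compression the abstract refers to.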

    Per kernel Kmeans compression for neural networks

    Publication No.: US11055604B2

    Publication Date: 2021-07-06

    Application No.: US15702193

    Filing Date: 2017-09-12

    Abstract: Methods and apparatus relating to techniques for incremental network quantization. In an example, an apparatus comprises logic, at least partially comprising hardware logic to determine a plurality of weights for a layer of a convolutional neural network (CNN) comprising a plurality of kernels; organize the plurality of weights into a plurality of clusters for the plurality of kernels; and apply a K-means compression algorithm to each of the plurality of clusters. Other embodiments are also disclosed and claimed.

    ONLINE ACTIVATION COMPRESSION WITH K-MEANS
    Invention Application

    Publication No.: US20190102673A1

    Publication Date: 2019-04-04

    Application No.: US15720298

    Filing Date: 2017-09-29

    Abstract: Methods and apparatus relating to online activation compression with K-means are described. In one embodiment, logic (e.g., in a processor) compresses one or more activation functions for a convolutional network based on non-uniform quantization. The non-uniform quantization for each layer of the convolutional network is performed offline, and an activation function for a specific layer of the convolutional network is quantized during runtime. Other embodiments are also disclosed and claimed.
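The abstract splits the work into an offline step (fit a non-uniform quantizer per layer) and a runtime step (quantize that layer's activations). A hedged sketch of that two-phase flow, assuming a 1-D K-means codebook per layer (the function names and the Lloyd's-iteration details are illustrative, not taken from the patent):

```python
import numpy as np

def fit_codebook(sample_activations, k=8, iters=20, seed=0):
    """Offline step: K-means over sampled activation values for one layer
    yields a non-uniform codebook (centroids need not be evenly spaced)."""
    values = np.asarray(sample_activations, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centroids[j] = members.mean()
    return np.sort(centroids)

def quantize_runtime(activations, codebook):
    """Runtime step: snap each activation to its nearest codebook centroid,
    so only the centroid indices need to be stored or transmitted."""
    a = np.asarray(activations, dtype=float)
    idx = np.abs(a[..., None] - codebook).argmin(axis=-1)
    return codebook[idx]
```

The expensive clustering runs once per layer ahead of time; the runtime path is just a nearest-centroid lookup, which matches the abstract's split between offline quantizer design and online activation quantization.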
