COMPRESSION OF MACHINE LEARNING MODELS VIA SPARSIFICATION AND QUANTIZATION

    Publication Number: US20250094864A1

    Publication Date: 2025-03-20

    Application Number: US18602951

    Application Date: 2024-03-12

    Abstract: Machine learning is a process that learns a model from a given dataset, where the model can then be used to make a prediction about new data. In order to reduce the size, computation, and latency of a machine learning model, a compression technique can be employed which includes model sparsification and quantization. To limit the extent to which the quality of the model is impacted when uniformly applying sparsification and quantization to all values of the model, the present disclosure provides for a hybrid sparsification and quantization of the model.
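The combination the abstract describes can be illustrated with a minimal sketch: prune the smallest-magnitude weights to zero, then quantize the survivors to a low-precision integer grid. All function names, the magnitude-based pruning criterion, and the symmetric linear quantizer here are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def hybrid_compress(weights, sparsity=0.5, num_bits=8):
    """Sparsify then quantize a weight tensor (illustrative sketch).

    Sparsification: zero out the `sparsity` fraction of weights with
    the smallest magnitudes. Quantization: map the surviving values
    onto a symmetric `num_bits` integer grid.
    """
    w = weights.astype(float).flatten()  # work on a copy
    k = int(len(w) * sparsity)
    # Zero out the k smallest-magnitude weights.
    smallest = np.argsort(np.abs(w))[:k]
    w[smallest] = 0.0
    # Symmetric linear quantization of the survivors.
    max_abs = np.abs(w).max()
    scale = max_abs / (2 ** (num_bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.round(w / scale).astype(np.int8)
    return q.reshape(weights.shape), scale
```

Dequantizing with `q * scale` recovers an approximation of the pruned weights; the two steps together reduce both the number of stored values and the bits per value.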

    PRUNING AND ACCELERATING NEURAL NETWORKS WITH HIERARCHICAL FINE-GRAINED STRUCTURED SPARSITY

    Publication Number: US20230062503A1

    Publication Date: 2023-03-02

    Application Number: US17681967

    Application Date: 2022-02-28

    Abstract: Hierarchical structured sparse parameter pruning and processing improves runtime performance and energy efficiency of neural networks. In contrast with conventional (non-structured) pruning which allows for any distribution of the non-zero values within a matrix that achieves the desired sparsity degree (e.g., 50%) and is consequently difficult to accelerate, structured hierarchical sparsity requires each multi-element unit at the coarsest granularity of the hierarchy to be pruned to the desired sparsity degree. The global desired sparsity degree is a function of the per-level sparsity degrees. Distribution of non-zero values within each multi-element unit is constrained according to the per-level sparsity degree at the particular level of the hierarchy. Each level of the hierarchy may be associated with a hardware (e.g., logic or circuit) structure that can be enabled or disabled according to the per-level sparsity. Hierarchical sparsity provides performance improvements for a greater variety of sparsity patterns, granularity, and sparsity degrees.
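A two-level version of the scheme in the abstract can be sketched as follows: a coarse level keeps only the strongest groups within each tile, and a fine level keeps only the strongest elements within each surviving group, so the global density is the product of the per-level keep ratios. The group sizes, the L1-norm selection criterion, and the function name are all illustrative assumptions rather than details from the patent.

```python
import numpy as np

def hierarchical_prune(matrix, fine_keep=2, fine_group=4,
                       coarse_keep=2, coarse_group=4):
    """Two-level structured pruning (illustrative sketch).

    Coarse level: within each tile of `coarse_group` fine groups,
    keep the `coarse_keep` groups with the largest L1 norm.
    Fine level: within each surviving group of `fine_group` elements,
    keep the `fine_keep` largest-magnitude values.
    Global density = (coarse_keep/coarse_group) * (fine_keep/fine_group).
    """
    w = matrix.astype(float).reshape(-1, coarse_group, fine_group)
    mask = np.zeros_like(w)
    for tile in range(w.shape[0]):
        # Coarse level: rank fine groups in this tile by L1 norm.
        norms = np.abs(w[tile]).sum(axis=1)
        for g in np.argsort(norms)[-coarse_keep:]:
            # Fine level: keep the largest-magnitude elements in the group.
            top = np.argsort(np.abs(w[tile, g]))[-fine_keep:]
            mask[tile, g, top] = 1.0
    return (w * mask).reshape(matrix.shape)
```

With the defaults (keep 2 of 4 at each level), 25% of the values survive, and every nonzero obeys the per-level constraints, which is what makes the pattern amenable to hardware that can skip disabled groups.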

    GENERATING SPARSE NEURAL NETWORKS
    Invention Publication

    Publication Number: US20240152407A1

    Publication Date: 2024-05-09

    Application Number: US18222916

    Application Date: 2023-07-17

    CPC classification number: G06F9/5083 G06F7/5443

    Abstract: Apparatuses, systems, and techniques to determine a configuration based at least in part on data stored by at least one data structure of a workload at runtime, and transform the workload into a sparse workload based at least in part on the configuration. In at least one embodiment, one or more sparse workloads (e.g., one or more sparse neural networks) are generated based at least in part on, for example, one or more workloads (e.g., one or more neural networks).
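The runtime decision the abstract describes can be sketched as: inspect the data a workload actually stores, and if it is sparse enough, transform the workload into a sparse representation with a matching sparse kernel. The density threshold, the CSR-style layout, and both function names below are illustrative assumptions, not details from the patent.

```python
import numpy as np

DENSITY_THRESHOLD = 0.5  # illustrative cutoff, not from the patent

def maybe_sparsify(weight):
    """Pick a dense or sparse representation from runtime data.

    Returns ('dense', weight) unchanged, or ('sparse', payload) where
    payload is a CSR-like triple (values, col_idx, row_ptr).
    """
    density = np.count_nonzero(weight) / weight.size
    if density > DENSITY_THRESHOLD:
        return 'dense', weight
    values, cols, row_ptr = [], [], [0]
    for row in weight:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        cols.extend(nz)
        row_ptr.append(len(values))
    return 'sparse', (np.array(values), np.array(cols, dtype=int),
                      np.array(row_ptr))

def matvec(kind, payload, x):
    """Matrix-vector product dispatched on the chosen representation."""
    if kind == 'dense':
        return payload @ x
    values, cols, row_ptr = payload
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        s, e = row_ptr[i], row_ptr[i + 1]
        y[i] = values[s:e] @ x[cols[s:e]]  # only touch nonzeros
    return y
```

The sparse path stores and multiplies only the nonzero entries, so the transformation pays off exactly when the runtime inspection finds the data below the density cutoff.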
