-
1.
Publication number: US20250094864A1
Publication date: 2025-03-20
Application number: US18602951
Application date: 2024-03-12
Applicant: NVIDIA Corporation
Inventor: Po-An Tsai , Geonhwa Jeong , Jeffrey Michael Pool
IPC: G06N20/00
Abstract: Machine learning is a process that learns a model from a given dataset, where the model can then be used to make a prediction about new data. In order to reduce the size, computation, and latency of a machine learning model, a compression technique can be employed which includes model sparsification and quantization. To limit the extent to which the quality of the model is impacted when uniformly applying sparsification and quantization to all values of the model, the present disclosure provides for a hybrid sparsification and quantization of the model.
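The abstract describes combining sparsification with quantization instead of applying either uniformly to all values of a model. As a purely illustrative sketch (not the method claimed in the application), the NumPy snippet below prunes some hypothetical layers with a 2:4 structured pattern and quantizes the remaining layers to int8; the layer names and the rule that splits them between the two treatments are invented for this example.

```python
# Hybrid compression sketch: prune some layers, quantize others (illustrative only;
# the layer split and parameters are hypothetical, not taken from the application).
import numpy as np

def prune_2_to_4(w: np.ndarray) -> np.ndarray:
    """Keep the 2 largest-magnitude values in every group of 4 along the last axis."""
    w = w.copy()
    groups = w.reshape(-1, 4)
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]   # 2 smallest per group
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(w.shape)

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization; returns (quantized values, scale)."""
    scale = np.abs(w).max() / 127.0 if w.size else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Hypothetical model: sparsify the attention weights, quantize the MLP weights.
layers = {"attn.qkv": np.random.randn(8, 16), "mlp.fc": np.random.randn(4, 8)}
compressed = {}
for name, w in layers.items():
    if "attn" in name:                     # arbitrary illustrative split rule
        compressed[name] = prune_2_to_4(w)
    else:
        compressed[name] = quantize_int8(w)
```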
-
2.
Publication number: US20230062503A1
Publication date: 2023-03-02
Application number: US17681967
Application date: 2022-02-28
Applicant: NVIDIA Corporation
Inventor: Yannan Wu , Po-An Tsai , Saurav Muralidharan , Joel Springer Emer
IPC: G06N3/08
Abstract: Hierarchical structured sparse parameter pruning and processing improves runtime performance and energy efficiency of neural networks. In contrast with conventional (non-structured) pruning which allows for any distribution of the non-zero values within a matrix that achieves the desired sparsity degree (e.g., 50%) and is consequently difficult to accelerate, structured hierarchical sparsity requires each multi-element unit at the coarsest granularity of the hierarchy to be pruned to the desired sparsity degree. The global desired sparsity degree is a function of the per-level sparsity degrees. Distribution of non-zero values within each multi-element unit is constrained according to the per-level sparsity degree at the particular level of the hierarchy. Each level of the hierarchy may be associated with a hardware (e.g., logic or circuit) structure that can be enabled or disabled according to the per-level sparsity. Hierarchical sparsity provides performance improvements for a greater variety of sparsity patterns, granularity, and sparsity degrees.
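As an informal illustration of hierarchical structured sparsity, and of a global sparsity degree composed from per-level degrees, the sketch below (not the claimed implementation) prunes a vector at two levels: 50% of coarse blocks are zeroed by L2 norm, and a 2:4 pattern is applied inside each surviving block, so the global density is 0.5 × 0.5 = 0.25, i.e. 75% global sparsity. The block size and per-level degrees are arbitrary choices for the example, and the multiplicative composition is an assumption of this sketch.

```python
# Two-level hierarchical structured pruning sketch (illustrative assumptions only).
import numpy as np

def hierarchical_prune(row: np.ndarray, block: int = 8) -> np.ndarray:
    out = row.copy().reshape(-1, block)            # coarse multi-element units
    # Level 1: keep the half of the blocks with the largest L2 norm (50% sparsity).
    norms = np.linalg.norm(out, axis=1)
    kill = np.argsort(norms)[: len(norms) // 2]
    out[kill] = 0.0
    # Level 2: inside every block, keep the 2 largest of each group of 4 (50% sparsity).
    groups = out.reshape(-1, 4)
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(row.shape)

row = np.random.randn(32)
pruned = hierarchical_prune(row)
print("global sparsity:", np.mean(pruned == 0))    # 0.75 for this configuration
```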
-
3.
Publication number: US20240152407A1
Publication date: 2024-05-09
Application number: US18222916
Application date: 2023-07-17
Applicant: NVIDIA Corporation
Inventor: Geonhwa Jeong , Po-An Tsai , Jeffrey Michael Pool
CPC classification number: G06F9/5083 , G06F7/5443
Abstract: Apparatuses, systems, and techniques to determine a configuration based at least in part on data stored by at least one data structure of a workload at runtime, and transform the workload into a sparse workload based at least in part on the configuration. In at least one embodiment, one or more sparse workloads (e.g., one or more sparse neural networks) are generated based at least in part on, for example, one or more workloads (e.g., one or more neural networks).
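As one hypothetical reading of this abstract, the sketch below inspects a workload's stored data at runtime to determine a configuration (here, an N:M structured-sparsity pattern chosen from simple per-layer weight statistics) and then transforms the workload into a sparse workload by applying that pattern. The statistics, the threshold, and the 2:4 / 4:8 choices are invented for illustration and are not taken from the application.

```python
# Runtime configuration selection followed by sparsification (illustrative only).
import numpy as np

def choose_config(weights: np.ndarray) -> tuple[int, int]:
    """Pick a hypothetical N:M pattern from the weight magnitude distribution."""
    small_fraction = np.mean(np.abs(weights) < 0.1 * np.abs(weights).max())
    return (2, 4) if small_fraction > 0.5 else (4, 8)   # invented rule

def apply_n_of_m(weights: np.ndarray, n: int, m: int) -> np.ndarray:
    """Keep the n largest-magnitude values in every group of m."""
    out = weights.copy().reshape(-1, m)
    drop = np.argsort(np.abs(out), axis=1)[:, : m - n]
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(weights.shape)

workload = {"layer0": np.random.randn(16, 16), "layer1": np.random.randn(8, 16)}
sparse_workload = {}
for name, w in workload.items():
    n, m = choose_config(w)              # configuration determined from the data
    sparse_workload[name] = apply_n_of_m(w, n, m)
```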
-