-
Publication No.: US20240413839A1
Publication Date: 2024-12-12
Application No.: US18807785
Filing Date: 2024-08-16
Applicant: QUALCOMM Incorporated
Inventor: Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN
IPC: H03M7/30 , G06F16/22 , G06N3/0495 , G06N3/08
Abstract: Techniques and apparatuses to decompress data that has been stack compressed are described. Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, an uncompressed data block is compressed into a compressed data block by removing one or more zero words from the uncompressed data block. Map metadata that maps the zero words of the uncompressed data block is generated during compression. Using the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.
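The abstracts in this family do not pin down the metadata encoding; as a minimal sketch, assuming the map metadata is a one-bit-per-word bitmap (bit i set when word i of the block is nonzero, which is an assumption rather than the patent's specified format), compression could look like:

    def stack_compress(block):
        # Drop zero words; record in a bitmap which word slots were
        # nonzero so decompression can put the zeros back.
        bitmap = 0
        compressed = []
        for i, word in enumerate(block):
            if word != 0:
                bitmap |= 1 << i
                compressed.append(word)
        return compressed, bitmap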
-
Publication No.: US20240118902A1
Publication Date: 2024-04-11
Application No.: US18339797
Filing Date: 2023-06-22
Applicant: QUALCOMM Incorporated
Inventor: Eric Wayne MAHURIN , Erich PLONDKE , Hitesh Kumar GUPTA , Colin Beaton VERRILLI , Rexford Alan HILL
CPC classification number: G06F9/3887 , G06F9/30178
Abstract: An aspect of the disclosure relates to a data processing system, including: an input medium configured to include a first set of blocks of data including a first set of blocks of compressed data and a first set of metadata, respectively; an output medium configured to include a first set of blocks of decompressed data each having a predetermined number of decompressed elements; and a set of single instruction multiple data (SIMD) processors configured to: access the first set of blocks of data from the input medium, respectively; decompress the first set of blocks of compressed data to generate the first set of blocks of decompressed data based on the first set of metadata, respectively; and provide the first set of blocks of decompressed data to the output medium, respectively.
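As an illustration only (the abstract specifies no particular metadata format), one lane of the per-block decompression can be modeled with NumPy standing in for a SIMD processor; the bitmap here is the same hypothetical one-bit-per-element metadata as in the sketch above:

    import numpy as np

    def simd_decompress(compressed, bitmap, block_len):
        # Model of one SIMD processor handling one block: scatter the
        # compressed elements into the element positions the metadata
        # marks as nonzero; all other positions stay zero.
        out = np.zeros(block_len, dtype=compressed.dtype)
        positions = [i for i in range(block_len) if bitmap & (1 << i)]
        out[positions] = compressed[:len(positions)]
        return out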
-
Publication No.: US20220284271A1
Publication Date: 2022-09-08
Application No.: US17194202
Filing Date: 2021-03-05
Applicant: QUALCOMM Incorporated
Inventor: Hee Jun PARK , Colin Beaton VERRILLI
Abstract: A method for an artificial neural network includes receiving a set of input values to be convolved with multiple kernels via multiple computing units. One or more thermally-stressed computing units of the multiple computing units are determined. The multiple kernels are mapped to the multiple computing units of a system-on-chip (SOC) based on the one or more thermally-stressed computing units. A convolution of the set of input values with the most sparse kernel of the multiple kernels is performed on the most thermally-stressed computing unit.
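The abstract implies a pairing rule: sparse kernels cause less switching activity and heat, so they are steered to the hottest units. A rough sketch of such a mapping (the sparsity metric, data layout, and one-kernel-per-unit assignment are assumptions):

    def map_kernels_to_units(kernels, unit_temperatures):
        # Pair the sparsest kernels (most zero weights, hence least
        # compute and heat) with the most thermally stressed units.
        # Assumes kernels are given as flat lists of weights.
        def sparsity(kernel):
            return sum(1 for w in kernel if w == 0) / len(kernel)
        by_sparsity = sorted(kernels, key=sparsity, reverse=True)
        by_heat = sorted(range(len(unit_temperatures)),
                         key=lambda u: unit_temperatures[u], reverse=True)
        return dict(zip(by_heat, by_sparsity))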
-
Publication No.: US20210351789A1
Publication Date: 2021-11-11
Application No.: US16870873
Filing Date: 2020-05-08
Applicant: QUALCOMM Incorporated
Inventor: Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN
Abstract: Techniques and apparatuses to decompress data that has been stack compressed are described. Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, an uncompressed data block is compressed into a compressed data block by removing one or more zero words from the uncompressed data block. Map metadata that maps the zero words of the uncompressed data block is generated during compression. Using the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.
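The decompression side, as a counterpart to the compression sketch earlier in this list (same hypothetical one-bit-per-word bitmap encoding):

    def stack_decompress(compressed, bitmap, block_len):
        # Walk the map metadata: emit the next stored word where the
        # bitmap records a nonzero word, and a zero word elsewhere.
        words = iter(compressed)
        return [next(words) if bitmap & (1 << i) else 0
                for i in range(block_len)]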
-
Publication No.: US20230223954A1
Publication Date: 2023-07-13
Application No.: US17997619
Filing Date: 2021-05-07
Applicant: QUALCOMM Incorporated
Inventor: Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN
IPC: H03M7/30 , G06N3/0495
CPC classification number: H03M7/70 , H03M7/3066 , H03M7/6005 , G06N3/0495
Abstract: Techniques and apparatuses to decompress data that has been stack compressed are described. Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, an uncompressed data block is compressed into a compressed data block by removing one or more zero words from the uncompressed data block. Map metadata that maps the zero words of the uncompressed data block is generated during compression. Using the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.
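A round-trip check of the compression and decompression sketches above, with a purely illustrative 8-word block:

    block = [0, 7, 0, 0, 3, 0, 0, 9]
    compressed, bitmap = stack_compress(block)
    assert compressed == [7, 3, 9]
    assert bitmap == 0b10010010            # bits 1, 4, and 7 are set
    assert stack_decompress(compressed, bitmap, 8) == block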
-
Publication No.: US20200250545A1
Publication Date: 2020-08-06
Application No.: US16783047
Filing Date: 2020-02-05
Applicant: QUALCOMM Incorporated
Abstract: A method for accelerating machine learning on a computing device is described. The method includes hosting a neural network in a first inference accelerator and a second inference accelerator. The neural network is split between the first inference accelerator and the second inference accelerator. The method also includes routing intermediate inference request results directly between the first inference accelerator and the second inference accelerator. The method further includes generating a final inference request result from the intermediate inference request results.
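A schematic of the flow, where accel_a, accel_b, and their run methods are hypothetical stand-ins for the two inference accelerators:

    def run_split_inference(input_batch, accel_a, accel_b):
        # accel_a hosts the front half of the split network.
        intermediate = accel_a.run(input_batch)
        # The intermediate result is routed directly to the second
        # accelerator (e.g., over a device-to-device link) instead of
        # bouncing through host memory.
        return accel_b.run(intermediate)  # final inference result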
-
Publication No.: US20240411718A1
Publication Date: 2024-12-12
Application No.: US18333377
Filing Date: 2023-06-12
Applicant: QUALCOMM Incorporated
Inventor: Sandeep PANDE , Satish SINGH , Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN , Vinay MURTHY
Abstract: A machine learning (ML)-accelerator system-on-chip (SoC) is described. The ML-accelerator SoC includes a set of ML-accelerator cores. The ML-accelerator SoC also includes a network-on-chip (NoC) coupled to the set of ML-accelerator cores. The ML-accelerator SoC further includes an inference video post processing (infVPP) module coupled to the NoC. The ML-accelerator SoC also includes a video decoder coupled to the NoC.
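The abstract is a structural parts list; the implied dataflow over the NoC, with every name below hypothetical, would be roughly decode, infer, then post-process:

    def process_stream(frames, video_decoder, ml_cores, infvpp):
        # Decoded frames fan out round-robin to the ML-accelerator
        # cores; inference outputs go to the inference video
        # post-processing (infVPP) module.
        for i, frame in enumerate(frames):
            decoded = video_decoder.decode(frame)
            result = ml_cores[i % len(ml_cores)].infer(decoded)
            yield infvpp.postprocess(result)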
-
Publication No.: US20240095872A1
Publication Date: 2024-03-21
Application No.: US17946753
Filing Date: 2022-09-16
Applicant: QUALCOMM Incorporated
Inventor: Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN , Matthew SIMPSON , Geoffrey Carlton BERRY , Sandeep PANDE
IPC: G06F3/06
CPC classification number: G06F3/064 , G06F3/0604 , G06F3/0673
Abstract: A processor-implemented method for a memory storage format to accelerate machine learning (ML) on a computing device is described. The method includes receiving an image in a first layer storage format of a neural network. The method also includes assigning addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format. The method further includes storing the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels. The method also includes accelerating inference video processing of the image according to the assigned addresses for the image pixels corresponding to the blocked ML storage acceleration format.
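The abstract does not publish the address formula; one plausible blocked layout (4x4 pixel tiles laid out tile-by-tile, with the three channels of each pixel stored contiguously; every parameter here is an assumption) would assign addresses like:

    def blocked_address(c, y, x, width, block=4, channels=3):
        # Tile the image into block x block pixel tiles; within a
        # tile, pixels are row-major and the channels of each pixel
        # are contiguous. Assumes width is a multiple of block.
        tiles_per_row = width // block
        tile = (y // block) * tiles_per_row + (x // block)
        within = (y % block) * block + (x % block)
        return (tile * block * block + within) * channels + c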
-
Publication No.: US20230078079A1
Publication Date: 2023-03-16
Application No.: US17472412
Filing Date: 2021-09-10
Applicant: QUALCOMM Incorporated
Inventor: Francois Ibrahim ATALLAH , Hoan Huu NGUYEN , Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN
IPC: G06N3/063 , G11C11/412
Abstract: A compute-in-memory array is provided that implements a filter for a layer in a neural network. The filter multiplies a plurality of activation bits by a plurality of filter weight bits for each channel in a plurality of channels through a charge accumulation from a plurality of capacitors. The accumulated charge is digitized to provide the output of the filter.
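A purely digital model of the analog operation described, for one bit position of the filter (the AND stands in for each capacitor's one-bit multiply, and the sum for charge accumulation followed by digitization):

    def cim_filter_column(activation_bits, weight_bits):
        # Each capacitor contributes charge equal to the AND of one
        # activation bit and one filter-weight bit; the shared line
        # accumulates the charge and an ADC digitizes it to a count.
        return sum(a & w for a, w in zip(activation_bits, weight_bits))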
-
Publication No.: US20200073830A1
Publication Date: 2020-03-05
Application No.: US16556094
Filing Date: 2019-08-29
Applicant: Qualcomm Incorporated
Inventor: Colin Beaton VERRILLI , Natarajan VAIDHYANATHAN , Rexford Alan HILL
Abstract: A method, apparatus, and system for an architecture for machine learning acceleration are presented. An apparatus includes a plurality of processing elements, each including a tightly-coupled memory (TCM), and a memory system coupled to the processing elements. A global synchronization manager is coupled to the plurality of the processing elements and to the memory system. The processing elements do not implement a coherency protocol with respect to the memory system. The processing elements implement direct memory access with respect to the memory system, and the global synchronization manager is configured to synchronize operations of the plurality of processing elements through the TCMs.
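Since the processing elements are non-coherent, synchronization has to go through explicit state in the TCMs; a minimal software model of that idea (the flag name and dict-based TCMs are assumptions, not the patent's mechanism):

    class GlobalSyncManager:
        # Without a coherency protocol, the manager synchronizes the
        # processing elements by reading and clearing flags that each
        # element posts into its own tightly-coupled memory (TCM).
        def __init__(self, tcms):
            self.tcms = tcms  # one flag store per processing element

        def post_done(self, pe_index):
            self.tcms[pe_index]["done"] = True

        def try_release(self):
            # Release only once every element has posted "done".
            if all(tcm.get("done") for tcm in self.tcms):
                for tcm in self.tcms:
                    tcm["done"] = False
                return True
            return False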