Patent search ap:("QUALCOMM INCORPORATED") AND inv:"Colin Beaton Verrilli" Page 3

21.

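Granted invention patent
Inline decompression (Status: In force)

Publication (announcement) number: US11362672B2

Publication (announcement) date: 2022-06-14

Application number: US16870873

Application date: 2020-05-08

Applicant: QUALCOMM Incorporated

Inventor: Colin Beaton Verrilli, Natarajan Vaidhyanathan

IPC: H03M7/30, G06F16/22, G06N3/08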

Abstract: Stack compression refers to compression of data in one or more dimensions. For uncompressed data blocks that are very sparse, i.e., data blocks that contain many zeros, stack compression can be effective. In stack compression, an uncompressed data block is compressed into a compressed data block by removing one or more zero words from the uncompressed data block. Map metadata that maps the zero words of the uncompressed data block is generated during compression. With the use of the map metadata, the compressed data block can be decompressed to restore the uncompressed data block.
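
As an illustration only, the zero-word removal and map metadata described in this abstract can be sketched in one dimension. The function names `compress` and `decompress`, the use of a list of booleans as the map, and the sample block are assumptions made for this sketch; the patent itself covers one or more dimensions and may encode the map differently.

```python
from typing import List, Tuple


def compress(block: List[int]) -> Tuple[List[int], List[bool]]:
    """Drop zero words and record where they were (hypothetical helper)."""
    zero_map = [word == 0 for word in block]          # map metadata
    compressed = [word for word in block if word != 0]
    return compressed, zero_map


def decompress(compressed: List[int], zero_map: List[bool]) -> List[int]:
    """Rebuild the original block by re-inserting zeros from the map."""
    words = iter(compressed)
    return [0 if is_zero else next(words) for is_zero in zero_map]


block = [0, 7, 0, 0, 3, 0, 0, 9]      # a sparse block of eight words
packed, zero_map = compress(block)
restored = decompress(packed, zero_map)
```

In this sketch the map costs one bit per word, so the scheme only pays off when enough words are zero to outweigh that overhead.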

22.

Invention application
PROVIDING EFFICIENT FLOATING-POINT OPERATIONS USING MATRIX PROCESSORS IN PROCESSOR-BASED SYSTEMS (Status: Under examination, published)

Publication (announcement) number: US20190065146A1

Publication (announcement) date: 2019-02-28

Application number: US16118099

Application date: 2018-08-30

Applicant: QUALCOMM Incorporated

Inventor: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Robert Dreyer, Colin Beaton Verrilli, Koustav Bhattacharya

IPC: G06F7/483, G06F15/80

Abstract: Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive sum accumulator or the negative sum accumulator. After processing all pairs of floating-point operands, the matrix processor subtracts the negative partial sum accumulator from the positive partial sum accumulator to generate a final sum, then renormalizes the final sum a single time.
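
The accumulation structure in this abstract (two accumulators split by sign, one subtraction at the end) can be sketched at a high level. This is a simplified model using ordinary Python floats: the function name `dot_sign_split` is hypothetical, and the hardware-level alignment of each product with the accumulator's partial sum fraction is not modeled here.

```python
from typing import Sequence


def dot_sign_split(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product that keeps positive and negative products apart."""
    positive_sum = 0.0   # stands in for the positive partial sum accumulator
    negative_sum = 0.0   # stands in for the negative partial sum accumulator
    for x, y in zip(a, b):
        product = x * y
        if product >= 0:
            positive_sum += product
        else:
            negative_sum += -product   # store the magnitude only
    # A single subtraction at the end yields the final sum.
    return positive_sum - negative_sum


result = dot_sign_split([1.0, 2.0, 3.0], [4.0, -5.0, 6.0])
```

Because each accumulator only ever grows in magnitude, no cancellation occurs during the loop; in the abstract's terms, that is what allows renormalization to be deferred to a single step after the final subtraction.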

23.

Invention application
PROVIDING MEMORY BANDWIDTH COMPRESSION USING BACK-TO-BACK READ OPERATIONS BY COMPRESSED MEMORY CONTROLLERS (CMCs) IN A CENTRAL PROCESSING UNIT (CPU)-BASED SYSTEM (Status: Under examination, published)

Publication (announcement) number: US20160224241A1

Publication (announcement) date: 2016-08-04

Application number: US14844516

Application date: 2015-09-03

Applicant: QUALCOMM Incorporated

Inventor: Colin Beaton Verrilli, Mattheus Cornelis Antonius Adrianus Heddes, Brian Joel Schuh, Michael Raymond Trombley, Natarajan Vaidhyanathan

IPC: G06F3/06

CPC classification number: G06F3/061, G06F3/0659, G06F3/0661, G06F3/0679, G06F11/1004, G06F11/1048, G06F12/023, G06F12/08, G06F12/0811, G06F12/084, G06F12/0862, G06F2212/1024, G06F2212/1044, G06F2212/401

Abstract: Providing memory bandwidth compression using back-to-back read operations by compressed memory controllers (CMCs) in a central processing unit (CPU)-based system is disclosed. In this regard, in some aspects, a CMC is configured to receive a memory read request to a physical address in a system memory, and read a compression indicator (CI) for the physical address from error correcting code (ECC) bits of a first memory block in a memory line associated with the physical address. Based on the CI, the CMC determines whether the first memory block comprises compressed data. If not, the CMC performs a back-to-back read of one or more additional memory blocks of the memory line in parallel with returning the first memory block. Some aspects may further improve memory access latency by writing compressed data to each of a plurality of memory blocks of the memory line, rather than only to the first memory block.
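
A rough software model of the read path in this abstract is sketched below. The `MemoryBlock` class, the `read_line` function, the two-block line, and the use of `zlib` as the compressor are all assumptions for illustration. The sketch shows only the decision made from the compression indicator and the number of block reads it causes; it does not model the parallelism between returning the first block and reading the rest.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MemoryBlock:
    data: bytes
    compressed: bool   # stands in for the CI stored in the block's ECC bits


def read_line(line: List[MemoryBlock],
              decompress: Callable[[bytes], bytes]) -> Tuple[bytes, int]:
    """Return (line data, number of block reads issued)."""
    first = line[0]
    reads = 1
    if first.compressed:
        # The whole line fits in the first block: one read is enough.
        return decompress(first.data), reads
    parts = [first.data]
    for block in line[1:]:          # back-to-back reads of the remaining blocks
        parts.append(block.data)
        reads += 1
    return b"".join(parts), reads


raw = bytes(128)                    # a highly compressible 128-byte line
compressed_line = [MemoryBlock(zlib.compress(raw), True), MemoryBlock(b"", False)]
plain_line = [MemoryBlock(b"a" * 64, False), MemoryBlock(b"b" * 64, False)]

data_c, reads_c = read_line(compressed_line, zlib.decompress)
data_p, reads_p = read_line(plain_line, zlib.decompress)
```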

24.

Invention application
MEMORY CONTROLLERS EMPLOYING MEMORY CAPACITY AND/OR BANDWIDTH COMPRESSION WITH NEXT READ ADDRESS PREFETCHING, AND RELATED PROCESSOR-BASED SYSTEMS AND METHODS (Status: In force)

Publication (announcement) number: US20150339237A1

Publication (announcement) date: 2015-11-26

Application number: US14716108

Application date: 2015-05-19

Applicant: QUALCOMM Incorporated

Inventor: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Colin Beaton Verrilli

IPC: G06F12/08

CPC classification number: G06F12/0875, G06F12/0246, G06F12/0862, G06F12/1009, G06F2212/1016, G06F2212/1056, G06F2212/251, G06F2212/401, G06F2212/45, G06F2212/602, H03M7/30, Y02D10/13

Abstract: Memory controllers employing memory capacity and/or bandwidth compression with next read address prefetching, and related processor-based systems and methods are disclosed. In certain aspects, memory controllers are employed that can provide memory capacity compression. In certain aspects disclosed herein, a next read address prefetching scheme can be used by a memory controller to speculatively prefetch data from system memory at another address beyond the currently accessed address. Thus, when memory data is addressed in the compressed memory, if the next read address is stored in metadata associated with the memory block at the accessed address, the memory data at the next read address can be prefetched by the memory controller to be available in case a subsequent read operation issued by a central processing unit (CPU) has been prefetched by the memory controller.
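
The prefetching idea in this abstract can be sketched as a small model. The class `PrefetchingController` and its fields are hypothetical. The abstract does not say how the next read address gets into the metadata, so the learning rule used here (record whichever address was read after this one last time) is an assumption of the sketch.

```python
from typing import Dict, Optional


class PrefetchingController:
    """Toy model: per-address metadata stores the next read address."""

    def __init__(self, memory: Dict[int, str]) -> None:
        self.memory = memory
        self.next_addr: Dict[int, int] = {}    # metadata: address -> next read address
        self.prefetched: Dict[int, str] = {}   # data fetched speculatively
        self.last_addr: Optional[int] = None
        self.hits = 0

    def read(self, addr: int) -> str:
        if addr in self.prefetched:
            data = self.prefetched.pop(addr)   # served from the prefetch buffer
            self.hits += 1
        else:
            data = self.memory[addr]
        # Assumed learning rule: remember that `addr` followed `last_addr`.
        if self.last_addr is not None:
            self.next_addr[self.last_addr] = addr
        self.last_addr = addr
        # Speculatively fetch the address stored in this block's metadata.
        nxt = self.next_addr.get(addr)
        if nxt is not None:
            self.prefetched[nxt] = self.memory[nxt]
        return data


controller = PrefetchingController({0x100: "A", 0x240: "B", 0x080: "C"})
trace = [0x100, 0x240, 0x080] * 2              # the same pattern, read twice
values = [controller.read(a) for a in trace]
```

In this run the first pass only populates the metadata; on the second pass the reads of `0x240` and `0x080` are served from prefetched data.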

25.

Invention application
MEMORY CONTROLLERS EMPLOYING MEMORY CAPACITY COMPRESSION, AND RELATED PROCESSOR-BASED SYSTEMS AND METHODS (Status: Under examination, published)

Publication (announcement) number: US20150339228A1

Publication (announcement) date: 2015-11-26

Application number: US14716001

Application date: 2015-05-19

Applicant: QUALCOMM Incorporated

Inventor: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Colin Beaton Verrilli

IPC: G06F12/08

CPC classification number: G06F12/0802, G06F12/023, G06F2212/1016, G06F2212/1056, G06F2212/251, G06F2212/305, G06F2212/401, G06F2212/608, Y02D10/13

Abstract: Aspects disclosed herein include memory controllers employing memory capacity compression, and related processor-based systems and methods. In certain aspects, compressed memory controllers are employed that can provide memory capacity compression. In some aspects, a line-based memory capacity compression scheme can be employed where additional translation of a physical address (PA) to a physical buffer address is performed to allow compressed data in a system memory at the physical buffer address for efficient compressed data storage. A translation lookaside buffer (TLB) may also be employed to store TLB entries comprising PA tags corresponding to a physical buffer address in the system memory to more efficiently perform the translation of the PA to the physical buffer address in the system memory. In certain aspects, a line-based memory capacity compression scheme, a page-based memory capacity compression scheme, or a hybrid line-page-based memory capacity compression scheme can be employed.
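
The extra translation step in this abstract, from a physical address (PA) to a physical buffer address with a TLB in front of it, might be modeled as follows. The class `TranslatingController`, the dictionary used as the backing translation table, the use of the full PA as the tag, and the least-recently-used eviction policy are all assumptions of this sketch and are not taken from the patent.

```python
from collections import OrderedDict
from typing import Dict


class TranslatingController:
    """Toy PA -> physical buffer address translation with a small TLB."""

    def __init__(self, table: Dict[int, int], tlb_capacity: int = 2) -> None:
        self.table = table                        # full translation table (assumed)
        self.tlb: "OrderedDict[int, int]" = OrderedDict()
        self.tlb_capacity = tlb_capacity
        self.tlb_hits = 0

    def translate(self, pa: int) -> int:
        if pa in self.tlb:
            self.tlb.move_to_end(pa)              # mark as most recently used
            self.tlb_hits += 1
            return self.tlb[pa]
        buffer_addr = self.table[pa]              # slower lookup on a TLB miss
        self.tlb[pa] = buffer_addr
        if len(self.tlb) > self.tlb_capacity:
            self.tlb.popitem(last=False)          # evict least recently used (assumed policy)
        return buffer_addr


mmu = TranslatingController({0x1000: 0x40, 0x2000: 0x80, 0x3000: 0xC0})
first = mmu.translate(0x1000)    # miss, fills the TLB
second = mmu.translate(0x1000)   # hit
```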

26.

Granted invention patent
Memory storage format for supporting machine learning acceleration (Status: In force)

Publication (announcement) number: US12165237B2

Publication (announcement) date: 2024-12-10

Application number: US17946753

Application date: 2022-09-16

Applicant: QUALCOMM Incorporated

Inventor: Colin Beaton Verrilli, Natarajan Vaidhyanathan, Matthew Simpson, Geoffrey Carlton Berry, Sandeep Pande

IPC: G06T1/60, G06N3/063

Abstract: A processor-implemented method for a memory storage format to accelerate machine learning (ML) on a computing device is described. The method includes receiving an image in a first layer storage format of a neural network. The method also includes assigning addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format. The method further includes storing the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels. The method also includes accelerating inference video processing of the image according to the assigned addresses for the image pixels corresponding to the blocked ML storage acceleration format.

27.

Granted invention patent
Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems (Status: In force)

Publication (announcement) number: US11144368B2

Publication (announcement) date: 2021-10-12

Application number: US16443954

Application date: 2019-06-18

Applicant: QUALCOMM Incorporated

Inventor: Colin Beaton Verrilli, Natarajan Vaidhyanathan

IPC: G06F9/46, G06F9/52

Abstract: Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems is disclosed. In one aspect, a synchronization management circuit provides a semaphore including a counting semaphore value indicator, a current wait count indicator, and a target wait count indicator. When a consumer completes a wait operation, the synchronization management circuit adjusts the value of the current wait count indicator towards the value of the target wait count indicator, and compares the value of the current wait count indicator to the value of the target wait count indicator. If the value of the current wait count indicator has reached the value of the target wait count indicator, the synchronization management circuit infers that all consumers have observed the semaphore, and accordingly resets both the counting semaphore value indicator and the current wait count indicator to an initial wait value to place the semaphore in its initial state for reuse.
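
The reset rule in this abstract can be sketched with a single-threaded model. The class `SelfResettingSemaphore` and its method names are hypothetical, and counting the wait count upward from zero is an assumption (the abstract only says the current count moves toward the target). The sketch models the bookkeeping only, not the blocking behaviour or the hardware synchronization management circuit.

```python
class SelfResettingSemaphore:
    """Toy model of the counters described in the abstract."""

    def __init__(self, target_wait_count: int, initial_value: int = 0) -> None:
        self.initial_value = initial_value
        self.value = initial_value             # counting semaphore value indicator
        self.current_wait_count = 0            # current wait count indicator
        self.target_wait_count = target_wait_count

    def signal(self) -> None:
        """A producer posts to the semaphore."""
        self.value += 1

    def complete_wait(self) -> bool:
        """A consumer finishes its wait; returns True if this caused a reset."""
        self.current_wait_count += 1           # move toward the target
        if self.current_wait_count == self.target_wait_count:
            # All consumers have observed the semaphore: restore the initial state.
            self.value = self.initial_value
            self.current_wait_count = 0
            return True
        return False


sem = SelfResettingSemaphore(target_wait_count=3)
sem.signal()
sem.signal()                                   # two producers
resets = [sem.complete_wait() for _ in range(3)]   # three consumers
```

In this model no consumer has to reset the semaphore explicitly; the last of the expected consumers triggers the reset as a side effect of completing its wait.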

28.

Granted invention patent
Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices (Status: In force)

Publication (announcement) number: US10936943B2

Publication (announcement) date: 2021-03-02

Application number: US16117952

Application date: 2018-08-30

Applicant: QUALCOMM Incorporated

Inventor: Colin Beaton Verrilli, Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Koustav Bhattacharya, Robert Dreyer

IPC: G06N3/063, G06F15/80, G06F17/16, G06N3/04

Abstract: Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices is disclosed. In this regard, a matrix-processor-based device provides a central processing unit (CPU) and a matrix processor. The matrix processor reorganizes a plurality of weight matrices and a plurality of input matrices into swizzled weight matrices and swizzled input matrices, respectively, that have regular dimensions natively supported by the matrix processor. The matrix-processor-based device then performs a convolution operation using the matrix processor to perform matrix multiplication/accumulation operations for the regular dimensions of the weight matrices and the input matrices, and further uses the CPU to execute instructions for handling the irregular dimensions of the weight matrices and the input matrices (e.g., by executing a series of nested loops, as a non-limiting example). The matrix-processor-based device thus provides efficient hardware acceleration by taking advantage of dimensional regularity, while maintaining the flexibility to handle different variations of convolution.

29.

Granted invention patent
Providing efficient floating-point operations using matrix processors in processor-based systems (Status: In force)

Publication (announcement) number: US10747501B2

Publication (announcement) date: 2020-08-18

Application number: US16118099

Application date: 2018-08-30

Applicant: QUALCOMM Incorporated

Inventor: Mattheus Cornelis Antonius Adrianus Heddes, Natarajan Vaidhyanathan, Robert Dreyer, Colin Beaton Verrilli, Koustav Bhattacharya

IPC: G06F7/483, G06F7/544, G06F15/80, G06F7/499, G06F15/78

Abstract: Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive sum accumulator or the negative sum accumulator. After processing all pairs of floating-point operands, the matrix processor subtracts the negative partial sum accumulator from the positive partial sum accumulator to generate a final sum, then renormalizes the final sum a single time.

30.

Invention application
PROVIDING MATRIX MULTIPLICATION USING VECTOR REGISTERS IN PROCESSOR-BASED DEVICES (Status: Under examination, published)

Publication (announcement) number: US20190079903A1

Publication (announcement) date: 2019-03-14

Application number: US16129480

Application date: 2018-09-12

Applicant: QUALCOMM Incorporated

Inventor: Robert Dreyer, Mattheus Cornelis Antonius Adrianus Heddes, Colin Beaton Verrilli, Natarajan Vaidhyanathan, Koustav Bhattacharya

IPC: G06F17/16, G06F15/80

Abstract: Providing matrix multiplication using vector registers in processor-based devices is disclosed. In one aspect, a method for providing matrix multiplication comprises rearranging elements of a first submatrix and a second submatrix into first and second vectors, respectively, which are stored in first and second vector registers. A matrix multiplication vector operation using the first and second vector registers as input operands is then performed to generate an output vector that is stored in an output vector register. Each element E of the output vector, where 0≤E
