-
公开(公告)号:US12229215B2
公开(公告)日:2025-02-18
申请号:US18487918
申请日:2023-10-16
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Gang Zhong , Fei Wei , Yibin Zhang , Jing Han , Hongjiang Shang , Elina Kamenetskaya , Minjie Huang , Alexei Vladimirovich Bourd , Chun Yu , Andrew Evan Gruber , Eric Demers
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
-
公开(公告)号:US11829439B2
公开(公告)日:2023-11-28
申请号:US17137226
申请日:2020-12-29
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Gang Zhong , Fei Wei , Yibin Zhang , Jing Han , Hongjiang Shang , Elina Kamenetskaya , Minjie Huang , Alexei Vladimirovich Bourd , Chun Yu , Andrew Evan Gruber , Eric Demers
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
-
公开(公告)号:US11204765B1
公开(公告)日:2021-12-21
申请号:US17003600
申请日:2020-08-26
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Fei Wei , Gang Zhong , Minjie Huang , Jian Jiang , Zilin Ying , Baoguang Yang , Yang Xia , Jing Han , Liangxiao Hu , Chihong Zhang , Chun Yu , Andrew Evan Gruber , Eric Demers
Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
-
公开(公告)号:US11132760B2
公开(公告)日:2021-09-28
申请号:US16714052
申请日:2019-12-13
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Andrew Evan Gruber , Chihong Zhang , Gang Zhong , Jian Jiang , Fei Wei , Minjie Huang , Zilin Ying , Yang Xia , Jing Han , Chun Yu , Eric Demers
Abstract: Methods, systems, and devices for graphic processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.
-
公开(公告)号:US20210183005A1
公开(公告)日:2021-06-17
申请号:US16714052
申请日:2019-12-13
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Andrew Evan Gruber , Chihong Zhang , Gang Zhong , Jian Jiang , Fei Wei , Minjie Huang , Zilin Ying , Yang Xia , Jing Han , Chun Yu , Eric Demers
Abstract: Methods, systems, and devices for graphic processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.
-
-
-
-