-
Publication No.: US10558460B2
Publication date: 2020-02-11
Application No.: US15379195
Application date: 2016-12-14
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Liang Han , Lin Chen , Chihong Zhang , Hongjiang Shang , Jing Wu , Zilin Ying , Chun Yu , Guofang Jiao , Andrew Gruber , Eric Demers
Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on the latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.
-
Publication No.: US10223436B2
Publication date: 2019-03-05
Application No.: US15258207
Application date: 2016-09-07
Applicant: QUALCOMM Incorporated
Inventor: Alexei Vladimirovich Bourd , Vladislav Shimanskiy , Maxim Kazakov , Yun Du
Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.
-
Publication No.: US09747104B2
Publication date: 2017-08-29
Application No.: US14275047
Application date: 2014-05-12
Applicant: QUALCOMM Incorporated
Inventor: Lin Chen , Yun Du , Sumesh Udayakumaran , Chihong Zhang , Andrew Evan Gruber
CPC classification number: G06F9/3012 , G06F9/30032 , G06F9/3017 , G06F9/3869 , G06F9/3875
Abstract: In one example, a method includes, responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR: copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register; copying, by the initial logic unit and during a second clock cycle, the second value to the initial pipeline register; copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR; and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
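The cycle-by-cycle ordering in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented design: the abstract names just an initial and a final pipeline register, so the two-stage depth, the one-stage-per-cycle advance, and the names `pipelined_moves`, `gprs`, and `moves` are hypothetical.

```python
def pipelined_moves(gprs, moves, stages=2):
    """Simulate register moves through a short pipeline (hypothetical model).

    gprs   -- dict mapping register name to value, updated in place
    moves  -- list of (source, destination) register names
    stages -- assumed pipeline depth; pipeline[0] plays the role of the
              initial pipeline register, pipeline[-1] the final one
    Returns a log of (cycle, action, register) tuples.
    """
    pipeline = [None] * stages
    pending = list(moves)
    log = []
    cycle = 0
    while pending or any(slot is not None for slot in pipeline):
        cycle += 1
        # Whatever sat in the final pipeline register retires this cycle.
        retiring = pipeline[-1]
        # Every in-flight value advances one stage.
        for i in range(stages - 1, 0, -1):
            pipeline[i] = pipeline[i - 1]
        pipeline[0] = None
        # Initial logic unit: read the next source GPR into the first stage.
        if pending:
            src, dst = pending.pop(0)
            pipeline[0] = (gprs[src], dst)
            log.append((cycle, "read", src))
        # Final logic unit: write the retiring value to its destination GPR.
        if retiring is not None:
            value, dst = retiring
            gprs[dst] = value
            log.append((cycle, "write", dst))
    return log


registers = {"r0": 10, "r1": 20, "r2": 0, "r3": 0}
trace = pipelined_moves(registers, [("r0", "r2"), ("r1", "r3")])
```

In this model the two reads land in cycles 1 and 2 and the two writes in cycles 3 and 4, matching the order the abstract describes. Because each source is read before any destination is written, the same sketch also handles a swap (`[("r0", "r1"), ("r1", "r0")]`) without a temporary GPR.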
-
Publication No.: US09633411B2
Publication date: 2017-04-25
Application No.: US14316391
Application date: 2014-06-26
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Andrew Evan Gruber , Lin Chen , Guofang Jiao , Chun Yu
CPC classification number: G06T1/60 , G06T15/80 , G09G5/363 , G09G2352/00 , G09G2360/06
Abstract: Techniques are described for determining whether the data of a variable is the same for each of a plurality of graphics items. If the data is determined to be the same, the techniques store the data in a storage location of a specialized shared general purpose register that is associated with the variable.
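The uniformity check in this abstract can be sketched in a few lines. This is an illustrative model only: the function name `place_variable`, the `"shared"` and `"per_item"` labels, and the dictionary layout are assumptions made for the example and do not come from the patent.

```python
from typing import Dict, List, Sequence


def place_variable(values: Sequence[float]) -> Dict[str, object]:
    """Choose storage for one variable across a group of graphics items.

    If every item holds the same value, keep a single copy in a
    (hypothetical) shared register slot; otherwise keep one value per item.
    """
    if len(values) > 0 and all(v == values[0] for v in values):
        # Uniform across all items: one shared copy is enough.
        return {"kind": "shared", "data": [values[0]]}
    # Values differ between items: fall back to per-item storage.
    data: List[float] = list(values)
    return {"kind": "per_item", "data": data}


uniform = place_variable([0.5, 0.5, 0.5, 0.5])
divergent = place_variable([0.5, 0.25, 0.5, 0.5])
```

Under these assumptions, the uniform case stores one value instead of four, which is the storage saving the abstract points at.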
-
Publication No.: US20160054998A1
Publication date: 2016-02-25
Application No.: US14462932
Application date: 2014-08-19
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Lin Chen , Andrew Evan Gruber , Chihong Zhang , Chun Yu
CPC classification number: G06F9/30098 , G06F8/441 , G06F9/30145 , G06F9/30181 , G06F9/3828 , G06F9/3859 , G06T1/20 , G06T2200/28
Abstract: Techniques are described in which an indication marks the last use of an intermediate value, generated as part of determining a final value, so that the intermediate value is not stored in a general purpose register (GPR). A processing unit avoids storing the intermediate value in the GPR based on the indication because the intermediate value is no longer needed for determining the final value.
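A toy interpreter can illustrate the idea of skipping the GPR write for a value that is consumed immediately. This is a sketch under assumptions, not the patented mechanism: the instruction tuple layout, the `no_store` flag, the `"fwd"` operand name, and the single forwarding latch are all hypothetical.

```python
import operator

# Hypothetical two-operand instruction set for the example.
OPS = {"add": operator.add, "mul": operator.mul}


def execute(program, gprs):
    """Run (op, dst, srcs, no_store) instructions against a dict of GPRs.

    A source named "fwd" reads the previous instruction's result from an
    assumed forwarding latch instead of a GPR. When no_store is True the
    result is only forwarded and never written to the register file.
    Returns the number of GPR writes performed.
    """
    forwarded = None
    writes = 0
    for op, dst, srcs, no_store in program:
        args = [forwarded if s == "fwd" else gprs[s] for s in srcs]
        result = OPS[op](*args)
        forwarded = result
        if not no_store:
            gprs[dst] = result
            writes += 1
    return writes


# Compute out = (x * y) + z; the product is an intermediate value whose
# only use is the following add, so it is flagged as not stored.
registers = {"x": 3, "y": 4, "z": 5}
program = [
    ("mul", "t", ("x", "y"), True),
    ("add", "out", ("fwd", "z"), False),
]
gpr_writes = execute(program, registers)
```

In this model only the final value reaches the register file; the intermediate product `t` never occupies a GPR.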
-
Publication No.: US12056790B2
Publication date: 2024-08-06
Application No.: US17758219
Application date: 2020-01-31
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Andrew Evan Gruber , Chun Yu , Chihong Zhang , Thomas Edwin Frisinger , Richard Hammerstone , Zilin Ying , Heng Qi , Quanquan Xu , Sheng Gu
IPC: G06T1/60
CPC classification number: G06T1/60
Abstract: The present disclosure relates to methods and apparatus for graphics processing. For example, disclosed techniques facilitate improving bindless state processing at a graphics processor. Aspects of the present disclosure can receive, at a graphics processor, a shader program including a preamble section and a main instructions section. Aspects of the present disclosure can also execute, with a scalar processor dedicated to processing preamble sections, instructions of the preamble section to implement a bindless mechanism for loading constant data associated with the shader program. Additionally, aspects of the present disclosure can distribute the main instructions section and the constant data to a streaming processor for executing the shader program.
-
Publication No.: US11954758B2
Publication date: 2024-04-09
Application No.: US17652478
Application date: 2022-02-24
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Andrew Evan Gruber , Zilin Ying , Chunling Hu , Baoguang Yang , Yang Xia , Gang Zhong , Chun Yu , Eric Demers
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
-
Publication No.: US11829439B2
Publication date: 2023-11-28
Application No.: US17137226
Application date: 2020-12-29
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Gang Zhong , Fei Wei , Yibin Zhang , Jing Han , Hongjiang Shang , Elina Kamenetskaya , Minjie Huang , Alexei Vladimirovich Bourd , Chun Yu , Andrew Evan Gruber , Eric Demers
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
-
Publication No.: US11204765B1
Publication date: 2021-12-21
Application No.: US17003600
Application date: 2020-08-26
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Fei Wei , Gang Zhong , Minjie Huang , Jian Jiang , Zilin Ying , Baoguang Yang , Yang Xia , Jing Han , Liangxiao Hu , Chihong Zhang , Chun Yu , Andrew Evan Gruber , Eric Demers
Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
-
Publication No.: US11132760B2
Publication date: 2021-09-28
Application No.: US16714052
Application date: 2019-12-13
Applicant: QUALCOMM Incorporated
Inventor: Yun Du , Andrew Evan Gruber , Chihong Zhang , Gang Zhong , Jian Jiang , Fei Wei , Minjie Huang , Zilin Ying , Yang Xia , Jing Han , Chun Yu , Eric Demers
Abstract: Methods, systems, and devices for graphics processing are described. The methods, systems, and devices may include or be associated with identifying a graphics instruction, determining that the graphics instruction is alias enabled for the device, partitioning an alias lookup table into one or more slots, allocating a slot of the alias lookup table based on the partitioning and determining that the graphics instruction is alias enabled, generating an alias instruction based on allocating the slot of the alias lookup table and determining that the graphics instruction is alias enabled, and processing the alias instruction.