Patent search ap:("QUALCOMM INCORPORATED") AND inv:"Yun Du" Page 1

1.

Granted invention patent
Performing matrix multiplication in a streaming processor (in force)

Publication number: US12229215B2

Publication date: 2025-02-18

Application number: US18487918

Filing date: 2023-10-16

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers

IPC: G06F17/16, G06F7/57, G06F9/30, G06F9/38

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
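
The data flow in this abstract can be sketched in a few lines. The Python below is a toy model only, not the patented hardware design: `system_memory`, `local_memory`, `load`, `matmul` and `gpr` are hypothetical names standing in for the first memory, the second memory, the load instructions, the ALU operation and a general purpose register.

```python
# Toy model of the abstract's data flow; all names are illustrative assumptions.

def load(src, dst, key):
    """Model a load instruction: copy one matrix from src memory to dst memory."""
    dst[key] = [row[:] for row in src[key]]

def matmul(a, b):
    """Plain row-by-column matrix product, standing in for the ALU operation."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

system_memory = {            # "first memory"
    "input":  [[1, 2], [3, 4]],
    "weight": [[5, 6], [7, 8]],
}
local_memory = {}            # "second memory"
gpr = {}                     # register storage visible to the ALU

load(system_memory, local_memory, "input")    # first load instruction
load(system_memory, local_memory, "weight")   # second load instruction
gpr["output"] = matmul(local_memory["input"], local_memory["weight"])
```

In this sketch the output stays in `gpr` rather than being written back to `system_memory`, which mirrors the abstract's final step of storing the output matrix at a register accessible to the ALU.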

2.

Granted invention patent
GPR optimization in a GPU based on a GPR release mechanism (in force)

Publication number: US11763419B2

Publication date: 2023-09-19

Application number: US18046901

Filing date: 2022-10-14

Applicant: QUALCOMM Incorporated

Inventor: Andrew Evan Gruber, Yun Du

IPC: G06T15/50, G06T1/60, G06F9/30, G06T1/20

CPC classification number: G06T1/60, G06F9/30098, G06T1/20

Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
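
One way to picture the release calculation is the Python sketch below. It is an assumption-laden toy, not the claimed mechanism: the branch table, the `taken_if` predicates, the `use_complex` constant and the GPR counts are all hypothetical, and the sketch simply treats any GPRs beyond the largest reachable branch's need as releasable.

```python
# Toy model; the branch table and GPR counts are illustrative assumptions.

def gprs_to_release(allocated, branches, constants):
    """Return how many allocated GPRs no reachable branch needs."""
    reachable = [b for b in branches if b["taken_if"](constants)]
    needed = max((b["gprs"] for b in reachable), default=0)
    return max(allocated - needed, 0)

branches = [
    {"name": "simple_path",  "gprs": 8,  "taken_if": lambda c: not c["use_complex"]},
    {"name": "complex_path", "gprs": 32, "taken_if": lambda c: c["use_complex"]},
]

# With the constant fixed to False, complex_path is an unutilized branch.
released = gprs_to_release(32, branches, {"use_complex": False})
```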

3.

Granted invention patent
Methods and apparatus for constant data storage (in force)

Publication number: US11657471B2

Publication date: 2023-05-23

Application number: US17356434

Filing date: 2021-06-23

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Andrew Evan Gruber, Chihong Zhang, Jian Jiang, Gang Zhong, Baoguang Yang, Yang Xia, Chun Yu, Eric Demers

IPC: G06T1/20, G06T1/60

CPC classification number: G06T1/20, G06T1/60

Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may generate a table including a plurality of entries to store data associated with at least one of a constant value or an immediate value. The apparatus may also process, upon generating the table, first data including at least one of a constant value or an immediate value. Further, the apparatus may store, in the generated table, at least one of the constant value or the immediate value of the first data. The apparatus may also transmit, upon storing at least one of the constant value or the immediate value in the table, the table including the stored at least one of the constant value or the immediate value of the first data.

4.

Granted invention patent
Methods and apparatus for wave slot management (in force)

Publication number: US11055808B2

Publication date: 2021-07-06

Application number: US16455641

Filing date: 2019-06-27

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Andrew Evan Gruber, Chun Yu, Zilin Ying

IPC: G06T1/20, G06T1/60

Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, the apparatus can determine one or more context states of at least one context register in each of multiple wave slots. The apparatus can also send information corresponding to the one or more context states in one of the multiple wave slots to a context queue. Further, the apparatus can convert the information corresponding to the one or more context states to context information compatible with the context queue. The apparatus can also store the context information compatible with the context queue in the context queue. In some aspects, the apparatus can send the context information compatible with the context queue to one of the multiple wave slots. Additionally, the apparatus can convert the context information compatible with the context queue to the information corresponding to the one or more context states.

5.

Invention patent application
GENERAL PURPOSE REGISTER ALLOCATION IN STREAMING PROCESSOR (under examination, published)

Publication number: US20180165092A1

Publication date: 2018-06-14

Application number: US15379195

Filing date: 2016-12-14

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers

IPC: G06F9/30, G06F9/38

Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.

6.

Granted invention patent
Operand conflict resolution for reduced port general purpose register (in force)

Publication number: US09632783B2

Publication date: 2017-04-25

Application number: US14505854

Filing date: 2014-10-03

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Hongjiang Shang, Haikun Zhu

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/30145, G06F9/3012, G06F9/30141, G06F9/3017, G06F9/30181, G06F9/30189, G06F9/383, G06F9/3832, G06F9/3885, G06F9/3887

Abstract: Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell and another value that is an operand of the operation would be read from the conflict queue.
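
The conflict check described here can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the patented circuit: the one-read-per-cell limit, the modulo mapping in `cell_of`, and the function `plan_reads` are all hypothetical.

```python
# Toy model; the cell mapping and port count are illustrative assumptions.
from collections import Counter

READ_PORTS_PER_CELL = 1  # assumed: each memory cell allows one read per instruction

def cell_of(register, num_cells=4):
    """Hypothetical mapping from a register index to a memory cell."""
    return register % num_cells

def plan_reads(source_registers):
    """Split operands into direct reads and reads staged in the conflict queue."""
    used = Counter()
    direct, conflict_queue = [], []
    for reg in source_registers:
        cell = cell_of(reg)
        if used[cell] < READ_PORTS_PER_CELL:
            used[cell] += 1
            direct.append(reg)          # read from the memory cell at execution
        else:
            conflict_queue.append(reg)  # copied out before execution begins
    return direct, conflict_queue

# r0 and r4 map to the same cell, so one of them must come from the queue.
direct, queued = plan_reads([0, 4, 1])
```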

7.

Granted invention patent
Runtime mechanism to optimize shader execution flow (in force)

Publication number: US12229864B2

Publication date: 2025-02-18

Application number: US17817815

Filing date: 2022-08-05

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Eric Demers, Andrew Evan Gruber, Chun Yu, Baoguang Yang, Chihong Zhang, Yuehai Du, Avinash Seetharamaiah, Jonnala Gadda Nagendra Kumar, Gang Zhong, Zilin Ying, Fei Wei

IPC: G06T15/00, G06T15/80

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

8.

Granted invention patent
Run-time mechanism for optimal shader (in force)

Publication number: US12067666B2

Publication date: 2024-08-20

Application number: US17664033

Filing date: 2022-05-18

Applicant: QUALCOMM Incorporated

Inventor: Yun Du, Eric Demers, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Baoguang Yang, Yuehai Du, Gang Zhong, Avinash Seetharamaiah, Jonnala Gadda Nagendra Kumar

IPC: G06T15/00, G06T1/60

CPC classification number: G06T15/005, G06T1/60

Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
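
The selection rule in this abstract reduces to a single comparison, sketched below in Python. The sketch assumes the run-time parameter is a data size in bytes and that the constant-memory variant is chosen when that data fits. The abstract does not state which side of the comparison selects which program, and the names `select_shader` and `CONSTANT_MEMORY_SIZE` and the 1024-byte capacity are hypothetical.

```python
# Toy model; sizes are in bytes and every name is an illustrative assumption.

CONSTANT_MEMORY_SIZE = 1024  # assumed capacity of the constant memory

def select_shader(runtime_param_size, constant_memory_size=CONSTANT_MEMORY_SIZE):
    """Pick the shader variant for a set of draw call instructions."""
    if runtime_param_size <= constant_memory_size:
        return "constant_memory_variant"   # a "second shader program"
    return "system_memory_variant"         # the "first shader program"

choices = [select_shader(size) for size in (256, 1024, 4096)]
```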

9.

Granted invention patent
Fast incremental shared constants (in force)

Publication number: US11694384B2

Publication date: 2023-07-04

Application number: US17085272

Filing date: 2020-10-30

Applicant: QUALCOMM Incorporated

Inventor: Thomas Edwin Frisinger, Richard Hammerstone, Andrew Evan Gruber, Gang Zhong, Yun Du, Jonnala Gadda Nagendra Kumar

IPC: G06T15/00, G06F9/30, G06T1/20, G06T1/60, G06T15/80

CPC classification number: G06T15/005, G06F9/30101, G06F9/30123, G06T1/20, G06T1/60, G06T15/80

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.

10.

Granted invention patent
Per-shader preamble for graphics processing (in force)

Publication number: US09799089B1

Publication date: 2017-10-24

Application number: US15162272

Filing date: 2016-05-23

Applicant: QUALCOMM Incorporated

Inventor: Lin Chen, Yun Du, Andrew Evan Gruber, Guofang Jiao, Chun Yu, David Rigel Garcia Garcia

IPC: G06T1/20, G06T15/80, G06T1/60

CPC classification number: G06T1/20, G06T1/60, G06T15/80

Abstract: A method for processing data in a graphics processing unit including receiving a code block of instructions common to a plurality of groups of threads of a shader, executing the code block of instructions common to the plurality of groups of threads of the shader creating a result by a first group of threads of the plurality of groups of threads, storing the result of the code block of instructions common to the plurality of groups of threads of the shader in on-chip random access memory (RAM), the on-chip RAM accessible by each of the plurality of groups of threads, and upon a determination that storing the result of the code block of instructions common to the plurality of groups of threads of the shader has completed, returning the result of the code block of instructions common to the plurality of groups of threads of the shader from on-chip RAM.
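
The preamble idea, where one group of threads computes a shared result and the others reuse it, can be sketched as follows. This Python is a sequential toy, not the claimed method: `OnChipRam`, `preamble`, `run_group` and the arithmetic are hypothetical, and real hardware would need synchronization between concurrently running groups.

```python
# Toy model of a per-shader preamble; every name here is an illustrative assumption.

class OnChipRam:
    """Stands in for on-chip RAM shared by all thread groups of one shader."""
    def __init__(self):
        self.result = None
        self.stored = False   # set once the preamble result is fully written

preamble_runs = 0

def preamble():
    """Code block common to every thread group, e.g. a uniform computation."""
    global preamble_runs
    preamble_runs += 1
    return 6 * 7

def run_group(group_id, ram):
    """Run one thread group: the first executes the preamble, later groups reuse it."""
    if not ram.stored:
        ram.result = preamble()
        ram.stored = True
    shared = ram.result            # returned from on-chip RAM
    return shared + group_id       # hypothetical group-specific main body

ram = OnChipRam()
outputs = [run_group(g, ram) for g in range(4)]
```

Here the common block runs once for four groups, which is the saving the abstract describes.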
