-
Publication number: US20240037183A1
Publication date: 2024-02-01
Application number: US18487918
Filing date: 2023-10-16
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Gang ZHONG , Fei WEI , Yibin ZHANG , Jing HAN , Hongjiang SHANG , Elina KAMENETSKAYA , Minjie HUANG , Alexei Vladimirovich BOURD , Chun YU , Andrew Evan GRUBER , Eric DEMERS
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
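The data flow in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: `load_block` and `matmul_staged` are hypothetical names, plain `std::vector` buffers stand in for the first memory, the second memory, and the general purpose registers, and the staging granularity (one input row at a time, the whole weight matrix once) is chosen for readability rather than taken from the disclosure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model; none of these names come from the disclosure.
using Matrix = std::vector<float>;  // row-major storage

// Models a load instruction: copy `count` values from the "first memory"
// (src) into a fresh staging buffer that plays the role of the "second memory".
std::vector<float> load_block(const Matrix& src, std::size_t offset, std::size_t count) {
    assert(offset + count <= src.size());
    return std::vector<float>(src.begin() + offset, src.begin() + offset + count);
}

// Multiplies input (m x k) by weight (k x n); both are row-major.
Matrix matmul_staged(const Matrix& input, const Matrix& weight,
                     std::size_t m, std::size_t k, std::size_t n) {
    assert(input.size() == m * k && weight.size() == k * n);

    // "Second load instruction": stage the weight data once.
    std::vector<float> w_local = load_block(weight, 0, k * n);

    // The output buffer stands in for the general purpose registers.
    Matrix output(m * n, 0.0f);

    for (std::size_t row = 0; row < m; ++row) {
        // "First load instruction": stage one row of input data.
        std::vector<float> in_local = load_block(input, row * k, k);

        // "ALU component": multiply-accumulate using only staged data.
        for (std::size_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (std::size_t i = 0; i < k; ++i) {
                acc += in_local[i] * w_local[i * n + col];
            }
            output[row * n + col] = acc;
        }
    }
    return output;
}
```

In this model the inner loop never reads `input` or `weight` directly, which mirrors the separation between the load step and the ALU step described above.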
-
Publication number: US20230252717A1
Publication date: 2023-08-10
Application number: US17665341
Filing date: 2022-02-04
Applicant: QUALCOMM Incorporated
Inventor: David Kirk MCALLISTER , Fei WEI , Alexei Vladimirovich BOURD
CPC classification number: G06T15/06 , G06T15/005
Abstract: Systems and techniques are provided for enhancing operations of a ray tracing processor. For instance, a process can include obtaining one or more nodes of an acceleration data structure. Each node of the one or more nodes includes the same number of bytes. The node(s) can be stored in a cache associated with a ray tracing processor. Each of the stored node(s) is cache line-aligned with the cache associated with the ray tracing processor. A first stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a first clock cycle of the ray tracing processor. A second stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a second clock cycle of the ray tracing processor.
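A fixed-size, cache line-aligned node layout of the kind described here might look like the following sketch. The 64-byte line size, the `BvhNode` field layout, and the `node_at` helper are all assumptions made for illustration; the abstract specifies only that every node has the same number of bytes and is aligned to the cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; the disclosure does not state a value.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical node layout, padded so that every node is exactly one line.
struct alignas(kCacheLineBytes) BvhNode {
    float bounds_min[3];        // 12 bytes
    float bounds_max[3];        // 12 bytes
    std::uint32_t first_child;  //  4 bytes
    std::uint32_t child_count;  //  4 bytes
    std::uint8_t padding[32];   // fills the node out to 64 bytes
};

static_assert(sizeof(BvhNode) == kCacheLineBytes, "every node has the same size");
static_assert(alignof(BvhNode) == kCacheLineBytes, "every node starts on a line boundary");

// Byte offset of node `index` within a contiguous node array.
constexpr std::size_t node_offset(std::size_t index) {
    return index * sizeof(BvhNode);
}

// Fetches one node and checks that it does not straddle two cache lines.
const BvhNode& node_at(const BvhNode* base, std::size_t index) {
    const BvhNode* node = base + index;
    assert(reinterpret_cast<std::uintptr_t>(node) % kCacheLineBytes == 0);
    return *node;
}
```

Because each node occupies exactly one line, a single line fetch yields a complete node, which is consistent with the abstract's description of handing one stored node to the processor per clock cycle.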
-
Publication number: US20240046543A1
Publication date: 2024-02-08
Application number: US17817815
Filing date: 2022-08-05
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Eric DEMERS , Andrew Evan GRUBER , Chun YU , Baoguang YANG , Chihong ZHANG , Yuehai DU , Avinash SEETHARAMAIAH , Jonnala Gadda NAGENDRA KUMAR , Gang ZHONG , Zilin YING , Fei WEI
CPC classification number: G06T15/005 , G06T15/80
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
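The two-iteration flow in this abstract can be modeled in a few lines. This is a toy sketch, not the disclosed mechanism: `ShaderOp`, `configure_predicates`, and `run_predicated` are hypothetical names, and the rule used to set a predicate (skip an operation whose scale factor is exactly 1, since it cannot change the result) is an assumption picked only to make the example concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shader operation: multiplies a value by `scale`.
struct ShaderOp {
    float scale;
};

// First iteration: inspect the data for each operation and configure one
// predication value per operation. Assumed rule: an operation with scale 1
// has no effect, so its predicate is false.
std::vector<bool> configure_predicates(const std::vector<ShaderOp>& ops) {
    std::vector<bool> predicates;
    predicates.reserve(ops.size());
    for (const ShaderOp& op : ops) {
        predicates.push_back(op.scale != 1.0f);
    }
    return predicates;
}

// Second iteration: execute, or refrain from executing, each operation
// according to its predicate. `executed` counts the operations that ran.
float run_predicated(const std::vector<ShaderOp>& ops,
                     const std::vector<bool>& predicates,
                     float value, std::size_t& executed) {
    assert(ops.size() == predicates.size());
    executed = 0;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (!predicates[i]) {
            continue;  // skipped by the adjusted execution flow
        }
        value *= ops[i].scale;
        ++executed;
    }
    return value;
}
```

With four operations of which two are no-ops, the second pass runs only two of them and still produces the same result as running all four.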
-
Publication number: US20210200836A1
Publication date: 2021-07-01
Application number: US17137226
Filing date: 2020-12-29
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Gang ZHONG , Fei WEI , Yibin ZHANG , Jing HAN , Hongjiang SHANG , Elina KAMENETSKAYA , Minjie HUANG , Alexei Vladimirovich BOURD , Chun YU , Andrew Evan GRUBER , Eric DEMERS
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
-
Publication number: US20200312006A1
Publication date: 2020-10-01
Application number: US16364829
Filing date: 2019-03-26
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Andrew Evan GRUBER , Chun YU , Chihong ZHANG , Hongjiang SHANG , Zilin YING , Fei WEI
Abstract: Example techniques are described for generating graphics content by: obtaining texture operation instructions corresponding to a texture operation; in response to determining that at least one of insufficient general purpose register space or insufficient wave slots is available for the texture operation, generating an indication that the texture operation corresponds to a deferred wave; executing the texture operation; sending, to a texture processor, initial texture sample instructions corresponding to the texture operation that was executed; and receiving texture mapped data corresponding to the initial texture sample instructions.
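The deferred-wave decision in this abstract reduces to a resource check, sketched below. The `Resources` and `TextureOp` structures, their fields, and the `admit` helper are hypothetical; the abstract states only the condition (insufficient general purpose register space or insufficient wave slots), not how resources are tracked.

```cpp
#include <cassert>

// Hypothetical view of what the processor currently has free.
struct Resources {
    unsigned free_gpr_bytes;
    unsigned free_wave_slots;
};

// Hypothetical resource requirements of one texture operation.
struct TextureOp {
    unsigned gpr_bytes_needed;
    unsigned wave_slots_needed;
};

// Returns true when the operation should be flagged as a deferred wave:
// either the register space or the wave slots are insufficient.
bool is_deferred_wave(const Resources& res, const TextureOp& op) {
    return op.gpr_bytes_needed > res.free_gpr_bytes ||
           op.wave_slots_needed > res.free_wave_slots;
}

// Reserves resources for an operation that is not deferred.
void admit(Resources& res, const TextureOp& op) {
    assert(!is_deferred_wave(res, op));
    res.free_gpr_bytes -= op.gpr_bytes_needed;
    res.free_wave_slots -= op.wave_slots_needed;
}
```

Either shortfall alone is enough to produce the deferred indication, matching the "at least one of" wording in the abstract.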