-
Publication No.: US09183014B2
Publication date: 2015-11-10
Application No.: US13028574
Filing date: 2011-02-16
CPC classification: G06F9/547 , G06F9/30061 , G06F9/449 , G06F9/455 , G06F15/8007
Abstract: Systems and methods of enabling virtual calls in a single instruction multiple data (SIMD) environment may involve detecting a virtual call of a function and using a single dispatch of the function to invoke the virtual call for two or more channels of the virtual call. In one example, it is determined that the two or more channels share a common target address and a single dispatch of the function is conducted with respect to the common target address. The process may be iterated for additional channels of the virtual call that share a common target address.
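The grouping idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the callee objects stand in for target addresses, and the names `dispatch_virtual_call`, `double`, and `negate` are hypothetical, introduced only for illustration. Each loop iteration picks the target of the first pending channel, dispatches that function once for every pending channel sharing the same target, and repeats until no channel remains.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a callee takes the arguments of all grouped channels at once
# and returns one result per channel, mimicking a single SIMD dispatch.
Callee = Callable[[List[int]], List[int]]


def dispatch_virtual_call(targets: Sequence[Callee],
                          args: Sequence[int]) -> Tuple[List[int], int]:
    """Invoke a per-channel virtual call with one dispatch per distinct target.

    targets[i] is the callee resolved for channel i (a stand-in for the
    channel's target address); args[i] is that channel's argument.
    Returns the per-channel results and the number of dispatches issued.
    """
    results: List[int] = [0] * len(targets)
    pending = list(range(len(targets)))
    dispatches = 0
    while pending:
        # Take the target of the first pending channel as the common target.
        common = targets[pending[0]]
        # Collect every pending channel that shares this target.
        group = [ch for ch in pending if targets[ch] is common]
        # Single dispatch of the function for the whole group.
        outputs = common([args[ch] for ch in group])
        for ch, out in zip(group, outputs):
            results[ch] = out
        dispatches += 1
        # Iterate over the channels that still have a different target.
        pending = [ch for ch in pending if targets[ch] is not common]
    return results, dispatches


def double(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def negate(xs: List[int]) -> List[int]:
    return [-x for x in xs]


# Four channels, two distinct targets: two dispatches instead of four.
results, dispatches = dispatch_virtual_call(
    [double, negate, double, double], [1, 2, 3, 4])
print(results, dispatches)
```

With four channels and two distinct targets, the sketch issues two dispatches rather than one per channel, which is the saving the abstract describes.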
-
Publication No.: US20120210098A1
Publication date: 2012-08-16
Application No.: US13028574
Filing date: 2011-02-16
IPC classification: G06F9/38
CPC classification: G06F9/547 , G06F9/30061 , G06F9/449 , G06F9/455 , G06F15/8007
Abstract: Systems and methods of enabling virtual calls in a single instruction multiple data (SIMD) environment may involve detecting a virtual call of a function and using a single dispatch of the function to invoke the virtual call for two or more channels of the virtual call. In one example, it is determined that the two or more channels share a common target address and a single dispatch of the function is conducted with respect to the common target address. The process may be iterated for additional channels of the virtual call that share a common target address.
-
Publication No.: US20180285106A1
Publication date: 2018-10-04
Application No.: US15477033
Filing date: 2017-04-01
Applicant: Abhishek R. Appu , Altug Koker , Joydeep Ray , Kamal Sinha , Kiran C. Veernapu , Subramaniam Maiyuran , Prasoonkumar Surti , Guei-Yuan Lueh , David Puffer , Supratim Pal , Eric J. Hoekstra , Travis T. Schluessler , Linda L. Hurd
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Kamal Sinha , Kiran C. Veernapu , Subramaniam Maiyuran , Prasoonkumar Surti , Guei-Yuan Lueh , David Puffer , Supratim Pal , Eric J. Hoekstra , Travis T. Schluessler , Linda L. Hurd
Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
-
Publication No.: US20180095785A1
Publication date: 2018-04-05
Application No.: US15281260
Filing date: 2016-09-30
Applicant: Altug Koker , Prassonkumar Surti , Guei-Yuan Lueh , Subramaniam Maiyuran , Tomas G. Akenine-Moller , David J. Cowperthwaite , Balaji Vembu
Inventor: Altug Koker , Prassonkumar Surti , Guei-Yuan Lueh , Subramaniam Maiyuran , Tomas G. Akenine-Moller , David J. Cowperthwaite , Balaji Vembu
IPC classification: G06F9/48
CPC classification: G06F9/4831 , G06F9/4881
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a thread dispatcher to assign a priority class to each of a plurality of processing threads prior to dispatching the one or more processing threads, a plurality of execution units to process the threads, a shared resource coupled to each of the plurality of execution units and an arbitration unit to grant access to the shared resource to a first of the plurality of execution units based on the priority class of a thread being executed at the first execution unit.
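As a rough illustration of priority-based arbitration, the following Python sketch grants a shared resource to one requesting execution unit per round. The abstract does not specify the policy details, so everything here is an assumption: the `Request` type, the `arbitrate` function, the convention that a larger `priority_class` is more urgent, and the tie-break toward the lowest execution unit id.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    eu_id: int            # execution unit asking for the shared resource
    priority_class: int   # class of the thread running on that unit
                          # (assumption: larger value means more urgent)


def arbitrate(requests: List[Request]) -> Optional[int]:
    """Return the id of the execution unit granted access, or None if idle.

    The winner is the request whose thread has the highest priority class.
    Ties go to the lowest execution unit id (an assumed tie-break rule).
    """
    if not requests:
        return None
    winner = max(requests, key=lambda r: (r.priority_class, -r.eu_id))
    return winner.eu_id


# Three units compete; units 1 and 2 both run class-3 threads.
granted = arbitrate([Request(0, 1), Request(1, 3), Request(2, 3)])
print(granted)
```

A real arbiter would likely also need a fairness or aging mechanism so that low-priority threads are not starved; that is left out of this sketch.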
-
Publication No.: US10649956B2
Publication date: 2020-05-12
Application No.: US15477027
Filing date: 2017-04-01
Applicant: Altug Koker , Prasoonkumar Surti , David Puffer , Subramaniam Maiyuran , Guei-Yuan Lueh , Abhishek R. Appu , Joydeep Ray , Balaji Vembu , Tomer Bar-On , Andrew T. Lauritzen , Hugues Labbe , John G. Gierach , Gabor Liktor
Inventor: Altug Koker , Prasoonkumar Surti , David Puffer , Subramaniam Maiyuran , Guei-Yuan Lueh , Abhishek R. Appu , Joydeep Ray , Balaji Vembu , Tomer Bar-On , Andrew T. Lauritzen , Hugues Labbe , John G. Gierach , Gabor Liktor
IPC classification: G06F16/13 , G06F9/38 , G06F9/30 , G06F16/11 , G06F16/172 , G06F9/46 , G06F12/1036 , G06F12/1045 , G06F12/0831
Abstract: In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively coupled to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.
-
Publication No.: US20180096446A1
Publication date: 2018-04-05
Application No.: US15281276
Filing date: 2016-09-30
Applicant: Kaiyu Chen , Guei-Yuan Lueh , Subramaniam Maiyuran
Inventor: Kaiyu Chen , Guei-Yuan Lueh , Subramaniam Maiyuran
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate dynamic renaming of the plurality of registers by logically partitioning the plurality of registers in the register file into a set of fixed registers and a set of shared registers.
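One way to picture a register file split into fixed and shared sets is the Python sketch below. It is a simplified model under stated assumptions, not the mechanism claimed in the application: the `RegisterFile` class, its per-thread fixed block, and the first-come allocation of shared registers are all hypothetical. Logical registers below the fixed count map directly to a per-thread block, while higher logical registers are renamed on demand to entries from a shared pool.

```python
from typing import Dict, List, Tuple


class RegisterFile:
    """Toy model: per-thread fixed registers plus a dynamically renamed pool."""

    def __init__(self, num_threads: int, fixed_per_thread: int,
                 shared_count: int) -> None:
        self.fixed_per_thread = fixed_per_thread
        # Physical layout (assumed): all fixed blocks first, then the pool.
        shared_base = num_threads * fixed_per_thread
        self.free_shared: List[int] = list(
            range(shared_base, shared_base + shared_count))
        # Rename table: (thread, logical register) -> physical register.
        self.rename: Dict[Tuple[int, int], int] = {}

    def physical(self, thread: int, logical: int) -> int:
        """Translate a thread's logical register to a physical index."""
        if logical < self.fixed_per_thread:
            # Fixed registers use a static mapping; no renaming needed.
            return thread * self.fixed_per_thread + logical
        key = (thread, logical)
        if key not in self.rename:
            if not self.free_shared:
                raise RuntimeError("no shared registers free")
            # Allocate the next free shared register to this logical name.
            self.rename[key] = self.free_shared.pop(0)
        return self.rename[key]

    def release(self, thread: int) -> None:
        """Return all shared registers held by a thread to the pool."""
        for key in [k for k in self.rename if k[0] == thread]:
            self.free_shared.append(self.rename.pop(key))


rf = RegisterFile(num_threads=2, fixed_per_thread=4, shared_count=2)
print(rf.physical(1, 2), rf.physical(0, 5), rf.physical(1, 4))
```

The design point the sketch tries to show is that threads needing few registers leave the shared pool available to threads that need more, instead of every thread reserving a worst-case allocation.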
-
Publication No.: US20170178384A1
Publication date: 2017-06-22
Application No.: US14976122
Filing date: 2015-12-21
Abstract: Reducing SIMD fragmentation for SIMD execution widths of 32 or even 64 channels in a single hardware thread leads to better EU utilization. Increasing SIMD execution widths to 32 or 64 channels per thread enables handling more vertices, patches, primitives and triangles per EU hardware thread. Modified 3D pipeline shader payloads can handle multiple patches in case of domain shaders or multiple primitives when primitive object instance count is greater than one in the case of geometry shaders and multiple triangles in case of pixel shaders.
-
Publication No.: US20170178274A1
Publication date: 2017-06-22
Application No.: US14976306
Filing date: 2015-12-21
CPC classification: G06F9/46 , G06F8/41 , G06F9/50 , G06F12/0842 , G06F2209/507 , G06T1/60 , G06T15/005 , G06T17/20
Abstract: To use SIMD lanes efficiently for domain shader execution, domain point data from different domain shader patches may be packed together into a single SIMD thread. To generate an efficient code sequence, each domain point occupies one SIMD lane and all attributes for the domain point reside in their own partition of General Register File (GRF) space. This technique is called the multiple-patch SIMD dispatch mode.
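The packing described in this abstract can be sketched in Python as follows. This is an illustrative model only: the function name `pack_patches`, the (u, v) representation of a domain point, and the choice to store each attribute as its own lane-indexed row (standing in for a GRF partition) are assumptions, not details taken from the application. Domain points from consecutive patches are flattened into one stream and cut into SIMD-width threads, so a thread can mix points from different patches instead of leaving lanes idle.

```python
from typing import Dict, List, Tuple

DomainPoint = Tuple[float, float]  # assumed (u, v) coordinates


def pack_patches(patches: List[List[DomainPoint]],
                 simd_width: int) -> List[Dict[str, list]]:
    """Pack domain points from several patches into SIMD threads.

    Each domain point takes one lane. Every attribute (patch id, u, v) is
    kept in its own lane-indexed row, a stand-in for a register partition.
    """
    # Flatten all patches into one stream, remembering each point's patch.
    flat = [(patch_id, u, v)
            for patch_id, points in enumerate(patches)
            for (u, v) in points]
    threads: List[Dict[str, list]] = []
    for start in range(0, len(flat), simd_width):
        chunk = flat[start:start + simd_width]
        threads.append({
            "patch_id": [p for p, _, _ in chunk],
            "u": [u for _, u, _ in chunk],
            "v": [v for _, _, v in chunk],
        })
    return threads


# Two small patches share one 4-wide thread; the remainder spills over.
patches = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.5, 0.5), (0.25, 0.75)],
]
for thread in pack_patches(patches, simd_width=4):
    print(thread)
```

Dispatching one patch per thread would use three of four lanes for the first patch and two of four for the second; packing fills the first thread completely.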
-
Publication No.: US20180285374A1
Publication date: 2018-10-04
Application No.: US15477027
Filing date: 2017-04-01
Applicant: Altug Koker , Prasoonkumar Surti , David Puffer , Subramaniam Maiyuran , Guei-Yuan Lueh , Abhishek R. Appu , Joydeep Ray , Balaji Vembu , Tomer Bar-On , Andrew T. Lauritzen , Hugues Labbe , John G. Gierach , Gabor Liktor
Inventor: Altug Koker , Prasoonkumar Surti , David Puffer , Subramaniam Maiyuran , Guei-Yuan Lueh , Abhishek R. Appu , Joydeep Ray , Balaji Vembu , Tomer Bar-On , Andrew T. Lauritzen , Hugues Labbe , John G. Gierach , Gabor Liktor
Abstract: In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively coupled to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.
-
Publication No.: US20180285158A1
Publication date: 2018-10-04
Application No.: US15477026
Filing date: 2017-04-01
Applicant: Abhishek R. Appu , Altug Koker , Balaji Vembu , Joydeep Ray , Kamal Sinha , Prasoonkumar Surti , Kiran C. Veernapu , Subramaniam Maiyuran , Sanjeev S. Jahagirdar , Eric J. Asperheim , Guei-Yuan Lueh , David Puffer , Wenyin Fu , Nikos Kaburlasos , Bhushan M. Borole , Josh B. Mastronarde , Linda L. Hurd , Travis T. Schluessler , Tomasz Janczak , Abhishek Venkatesh , Kai Xiao , Slawomir Grajewski
Inventor: Abhishek R. Appu , Altug Koker , Balaji Vembu , Joydeep Ray , Kamal Sinha , Prasoonkumar Surti , Kiran C. Veernapu , Subramaniam Maiyuran , Sanjeev S. Jahagirdar , Eric J. Asperheim , Guei-Yuan Lueh , David Puffer , Wenyin Fu , Nikos Kaburlasos , Bhushan M. Borole , Josh B. Mastronarde , Linda L. Hurd , Travis T. Schluessler , Tomasz Janczak , Abhishek Venkatesh , Kai Xiao , Slawomir Grajewski
Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.