-
Publication No.: US20230289212A1
Publication Date: 2023-09-14
Application No.: US17691808
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventors: Jerome F. DULUK, JR., Gentaro HIROTA, Ronny KRASHINSKY, Greg PALMER, Jeff TUCKEY, Kaushik NADADHUR, Philip Browning JOHNSON, Praveen JOGINIPALLY
CPC Classification: G06F9/4856, G06F9/461
Abstract: Processing hardware of a processor is virtualized to provide a façade between a consistent programming interface and specific hardware instances. Hardware processor components can be permanently or temporarily disabled when not needed to support the consistent programming interface and/or to balance hardware processing across a hardware arrangement such as an integrated circuit. Executing software can be migrated from one hardware arrangement to another without the need to reset the hardware.
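The abstract does not name any concrete interface, so the following is only a conceptual sketch of the façade idea: a fixed set of logical processor IDs stays constant for software while a remapping table decides which physical units back them. All names here (VirtualizationTable, kLogicalUnits, remap) are hypothetical and are not an NVIDIA API.

    #include <array>

    constexpr int kLogicalUnits  = 8;   // what the consistent programming interface promises
    constexpr int kPhysicalUnits = 10;  // what one particular chip actually provides

    // Hypothetical façade: software always addresses logical units 0..7; which
    // physical units back them can change when units are disabled or work migrates.
    struct VirtualizationTable {
        std::array<bool, kPhysicalUnits> enabled{};             // disabled units are simply skipped
        std::array<int,  kLogicalUnits>  logical_to_physical{};

        // Rebuild the mapping after enabling or disabling physical units.
        bool remap() {
            int next = 0;
            for (int logical = 0; logical < kLogicalUnits; ++logical) {
                while (next < kPhysicalUnits && !enabled[next]) ++next;
                if (next == kPhysicalUnits) return false;       // not enough working units
                logical_to_physical[logical] = next++;
            }
            return true;  // the interface seen by software is unchanged
        }
    };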
-
Publication No.: US20230289211A1
Publication Date: 2023-09-14
Application No.: US17691872
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventors: Gentaro HIROTA, Tanmoy MANDAL, Jeff TUCKEY, Kevin STEPHANO, Chen MEI, Shayani DEB, Naman GOVIL, Rajballav DASH, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS
CPC Classification: G06F9/4843, G06F9/505
Abstract: A processor supports new thread group hierarchies by centralizing work distribution to provide hardware-guaranteed concurrent execution of thread groups in a thread group array through speculative launch and load balancing across processing cores. Efficiencies are realized by distributing grid rasterization among the processing cores.
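The abstract mentions distributing grid rasterization among the processing cores. As a minimal illustration of what grid rasterization computes, the sketch below converts a linear thread group index into (x, y, z) coordinates within a launch grid; the names are hypothetical, and nothing here models how the hardware divides this work across cores.

    // Hypothetical illustration of the arithmetic behind grid rasterization:
    // turning a linear thread group index into (x, y, z) coordinates in the grid.
    struct Coord3 { unsigned x, y, z; };

    Coord3 rasterize_linear_id(unsigned linear_id, Coord3 grid_dim) {
        Coord3 c;
        c.x = linear_id % grid_dim.x;
        c.y = (linear_id / grid_dim.x) % grid_dim.y;
        c.z = linear_id / (grid_dim.x * grid_dim.y);
        return c;
    }
    // Example: in a 4 x 2 x 2 grid, linear index 9 rasterizes to (1, 0, 1).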
-
Publication No.: US20230289189A1
Publication Date: 2023-09-14
Application No.: US17691690
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
IPC Classification: G06F3/06
CPC Classification: G06F3/064, G06F3/0604, G06F3/0679
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks using interconnects that do not interfere with the processing cores' access to main or global memory such as memory backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
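The distributed shared memory described here resembles the cluster-scoped shared memory that CUDA exposes on GPUs of compute capability 9.0. Assuming that mapping holds, a minimal device-side sketch using the public cooperative_groups cluster API could look like the following; the kernel name, tile size, and launch shape are illustrative only.

    #include <cooperative_groups.h>
    namespace cg = cooperative_groups;

    // Each block in a cluster fills its own shared-memory tile, then reads a
    // neighboring block's tile directly: remote shared memory is mapped into
    // this block's address space (compile for sm_90).
    __global__ void __cluster_dims__(2, 1, 1) dsmem_example(int *out) {
        __shared__ int tile[64];                        // this block's local shared memory
        cg::cluster_group cluster = cg::this_cluster();
        unsigned rank = cluster.block_rank();           // position of this block in the cluster

        tile[threadIdx.x] = rank * 1000 + threadIdx.x;  // produce local data
        cluster.sync();                                 // make every block's tile visible cluster-wide

        unsigned peer = (rank + 1) % cluster.num_blocks();
        int *remote = cluster.map_shared_rank(tile, peer);   // pointer into the peer block's tile
        out[blockIdx.x * blockDim.x + threadIdx.x] = remote[threadIdx.x];

        cluster.sync();                                 // keep peer tiles alive until all reads finish
    }
    // Example launch: dsmem_example<<<2, 64>>>(out); the two blocks form one cluster.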
-
Publication No.: US20200043123A1
Publication Date: 2020-02-06
Application No.: US16053341
Filing Date: 2018-08-02
Applicant: NVIDIA Corporation
Inventors: Rajballav DASH, Gregory PALMER, Gentaro HIROTA, Lacky SHAH, Jack CHOQUETTE, Emmett KILGARIFF, Sriharsha NIVERTY, Milton LEI, Shirish GADRE, Omkar PARANJAPE, Lei YANG, Rouslan DIMITROV
Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
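There is no public API for the hardware arbiter described above, so the following is purely a conceptual restatement of the rule in the abstract (a monitored metric compared against a user-configured threshold decides whether work from the non-preferred queue may run); every name in it is hypothetical.

    // Hypothetical, conceptual sketch only; not an NVIDIA interface.
    enum class Mode { GraphicsGreedy, ComputeGreedy };

    struct ArbiterSketch {
        Mode  mode;        // which kind of work this processing unit currently prefers
        float threshold;   // user-configured threshold from the abstract

        // 'metric' stands for whichever performance/utilization counter is monitored.
        // Returns true when one or more items from the non-preferred queue should run
        // (compute items in graphics-greedy mode, graphics items in compute-greedy mode).
        bool admit_other_queue(float metric) const {
            return metric > threshold;
        }
    };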
-
Publication No.: US20230289215A1
Publication Date: 2023-09-14
Application No.: US17691621
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
CPC Classification: G06F9/4881, G06F9/3851, G06F9/3009, G06F9/544
Abstract: A new level (or levels) of hierarchy, Cooperative Group Arrays (CGAs), and an associated new hardware-based work distribution/execution model are described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed and executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization among all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
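At the programming-model level, CGAs as described here appear to correspond to thread block clusters in CUDA 12 on compute capability 9.0 GPUs. Assuming that correspondence, the sketch below requests a cluster shape at launch time so that the blocks of each cluster are co-scheduled and may synchronize with one another; the grid, block, and cluster sizes are arbitrary examples.

    #include <cooperative_groups.h>
    #include <cuda_runtime.h>
    namespace cg = cooperative_groups;

    // All blocks of a cluster are guaranteed to be resident concurrently, so a
    // cluster-wide barrier is legal; compile for sm_90.
    __global__ void cga_kernel(float *data) {
        cg::cluster_group cluster = cg::this_cluster();
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i] += 1.0f;
        cluster.sync();   // synchronize every thread of every block in this cluster
    }

    // Host side: request a cluster shape of 4 x 1 x 1 thread blocks at launch time.
    void launch_with_clusters(float *data) {
        cudaLaunchAttribute attr{};
        attr.id = cudaLaunchAttributeClusterDimension;
        attr.val.clusterDim.x = 4;   // 4 blocks per cluster
        attr.val.clusterDim.y = 1;
        attr.val.clusterDim.z = 1;

        cudaLaunchConfig_t cfg{};
        cfg.gridDim  = dim3(16);     // must be a multiple of the cluster size
        cfg.blockDim = dim3(128);
        cfg.attrs    = &attr;
        cfg.numAttrs = 1;
        cudaLaunchKernelEx(&cfg, cga_kernel, data);  // expects data with 16 * 128 floats
    }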
-
Publication No.: US20230288471A1
Publication Date: 2023-09-14
Application No.: US17691759
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventors: Jerome F. DULUK, Gentaro HIROTA, Ronny KRASHINSKY, Greg PALMER, Jeff TUCKEY, Kaushik NADADHUR, Philip Browning JOHNSON, Praveen JOGINIPALLY
IPC Classification: G01R31/28
CPC Classification: G01R31/2884, G01R31/2889, G01R31/2896, G01R31/2839
Abstract: Processing hardware of a processor is virtualized to provide a façade between a consistent programming interface and specific hardware instances. Hardware processor components can be permanently or temporarily disabled when not needed to support the consistent programming interface and/or to balance hardware processing across a hardware arrangement such as an integrated circuit. Executing software can be migrated from one hardware arrangement to another without the need to reset the hardware.
-
Publication No.: US20230236878A1
Publication Date: 2023-07-27
Application No.: US17583957
Filing Date: 2022-01-25
Applicant: NVIDIA CORPORATION
Inventors: Jack Hilaire CHOQUETTE, Rajballav DASH, Shayani DEB, Gentaro HIROTA, Ronny M. KRASHINSKY, Ze LONG, Chen MEI, Manan PATEL, Ming Y. SIU
IPC Classification: G06F9/48
CPC Classification: G06F9/4881
Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.
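The pre-exit trigger described here is at least conceptually similar to CUDA's programmatic dependent launch on compute capability 9.0 GPUs. Assuming that reading, the sketch below lets the producer resolve its scheduling dependency early while the consumer still waits on the data dependency before reading; kernel names and sizes are illustrative.

    #include <cuda_runtime.h>

    // Producer: tells the scheduler that dependent work may launch before this
    // kernel exits (the scheduling dependency is resolved by a pre-exit trigger).
    __global__ void producer(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = 2.0f * i;                 // produce the data
        cudaTriggerProgrammaticLaunchCompletion();     // scheduling dependency resolved here
        // ...remaining work that dependents do not need could follow...
    }

    // Consumer: may begin launching early, but must wait here before reading
    // the producer's output (the data dependency is still enforced).
    __global__ void consumer(float *data, float *out, int n) {
        cudaGridDependencySynchronize();
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = data[i] + 1.0f;
    }

    // Host side: opt the consumer in to the overlapped (programmatic) launch.
    void launch_pair(float *data, float *out, int n, cudaStream_t stream) {
        producer<<<(n + 255) / 256, 256, 0, stream>>>(data, n);

        cudaLaunchAttribute attr{};
        attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
        attr.val.programmaticStreamSerializationAllowed = 1;

        cudaLaunchConfig_t cfg{};
        cfg.gridDim  = dim3((n + 255) / 256);
        cfg.blockDim = dim3(256);
        cfg.stream   = stream;
        cfg.attrs    = &attr;
        cfg.numAttrs = 1;
        cudaLaunchKernelEx(&cfg, consumer, data, out, n);
    }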