-
Publication No.: US20230367633A1
Publication date: 2023-11-16
Application No.: US18055169
Filing date: 2022-11-14
CPC classification: G06F9/4881, G06T1/20
Abstract: A GPU and a method for the GPU are provided. The GPU includes stream multiprocessors, an available hardware resource table, a resource comparator, a stream scheduler, and a global dispatcher. Each stream multiprocessor executes at least one thread block. The available hardware resource table records the hardware resources available to the stream multiprocessors. Based on the available hardware resource table, the resource comparator selects, from the first-priority kernel codes in the kernel streams, at least one first dispatchable kernel code whose required hardware resources are less than the hardware resources available to the stream multiprocessors. The stream scheduler selects one kernel code from the at least one first dispatchable kernel code as the selected kernel code. The global dispatcher dispatches the thread blocks of the selected kernel code to the stream multiprocessors for execution and updates the available hardware resource table according to the usage of the stream multiprocessors' hardware resources.
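The selection flow in this abstract (comparator, then scheduler, then dispatcher with a table update) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented design. The resource names, `Kernel`, `fits`, `dispatchable_streams`, and `schedule_once` are hypothetical. The "pick the lowest-index stream" scheduling policy is also assumed, and so is the reading of "less than" as "does not exceed".

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical resource dimensions; the abstract does not list them.
RESOURCES = ("registers", "shared_mem", "threads")


@dataclass
class Kernel:
    name: str
    per_block: dict   # resources one thread block of this kernel needs
    num_blocks: int   # thread blocks not yet dispatched


def fits(need, avail):
    # Assumption: "required < available" is read as "required does not exceed available".
    return all(need[r] <= avail[r] for r in RESOURCES)


def dispatchable_streams(streams, table):
    """Resource comparator: streams whose head (first-priority) kernel fits on some SM."""
    return [s for s in streams
            if s and any(fits(s[0].per_block, avail) for avail in table)]


def schedule_once(streams, table):
    """One decision of the stream scheduler plus the global dispatcher."""
    candidates = dispatchable_streams(streams, table)
    if not candidates:
        return None, []
    stream = candidates[0]          # assumed policy: lowest-index stream wins
    kernel = stream[0]
    placements = []                 # SM index chosen for each dispatched thread block
    while kernel.num_blocks > 0:
        sm_id = next((i for i, a in enumerate(table)
                      if fits(kernel.per_block, a)), None)
        if sm_id is None:
            break                   # out of resources; remaining blocks wait in the stream
        for r in RESOURCES:         # update the available hardware resource table
            table[sm_id][r] -= kernel.per_block[r]
        placements.append(sm_id)
        kernel.num_blocks -= 1
    if kernel.num_blocks == 0:
        stream.popleft()            # kernel fully dispatched; next kernel becomes first-priority
    return kernel.name, placements


# Usage: the head of stream 0 is too large for any SM, so stream 1's kernel is chosen.
table = [dict(registers=1000, shared_mem=100, threads=512) for _ in range(2)]
streams = [deque([Kernel("big", dict(registers=2000, shared_mem=10, threads=64), 1)]),
           deque([Kernel("small", dict(registers=400, shared_mem=40, threads=128), 3)])]
name, placements = schedule_once(streams, table)
```

Blocks that do not fit stay attached to the head kernel rather than being dropped. This models the blocks waiting until the dispatcher frees resources and updates the table.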
-
Publication No.: US20230394617A1
Publication date: 2023-12-07
Application No.: US18046097
Filing date: 2022-10-12
Inventors: YUAN GAO, FEI SUN, HAORAN LI, GUYUE HUANG, CHEN ZHANG, RUIGUANG ZHONG
Abstract: The present application discloses a warp execution method for the SPs of an SM of a GPU, and an associated GPU. The SPs share a scratchpad memory, and the warp execution method includes: when the predetermined time point for warp loading is reached, checking a first indicator to obtain the size of the space with the status of blank in the scratchpad memory, to determine whether to load the warp, wherein the first indicator indicates the starting position of the space with the status of data-in-use and the ending position of the space with the status of blank; and when the predetermined time point for computing is reached, checking a second indicator and a third indicator to obtain the size of the space with the status of data-not-in-use in the scratchpad memory, to determine whether to compute the warp.
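The three indicators can be read as pointers into a circular scratchpad buffer, which Python can model. This minimal sketch rests on assumptions that the abstract does not spell out. One assumption is that the second and third indicators bound the data-not-in-use region, with blank space beginning at the third. The other is that finished warps return their space to blank. `ScratchpadRing`, `try_load`, `try_compute`, and `release` are hypothetical names. Monotonic counters stand in for wrapped addresses, so the physical offset of a counter would be `c % capacity`.

```python
class ScratchpadRing:
    """Hypothetical model of a scratchpad shared by the SPs of one SM.

    Regions, using monotonic counters (no full/empty ambiguity):
      [p1, p2)             data-in-use      (being computed)
      [p2, p3)             data-not-in-use  (loaded, waiting for compute)
      [p3, p1 + capacity)  blank            (free)
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.p1 = 0   # first indicator: start of data-in-use == end of blank
        self.p2 = 0   # second indicator: start of data-not-in-use (assumed)
        self.p3 = 0   # third indicator: end of data-not-in-use (assumed)

    def blank_size(self):
        return self.capacity - (self.p3 - self.p1)

    def not_in_use_size(self):
        return self.p3 - self.p2

    def try_load(self, warp_bytes):
        """At a warp-loading time point: load only if the blank space is large enough."""
        if self.blank_size() < warp_bytes:
            return False
        self.p3 += warp_bytes          # loaded data becomes data-not-in-use
        return True

    def try_compute(self, warp_bytes):
        """At a computing time point: compute only if a loaded warp's data is waiting."""
        if self.not_in_use_size() < warp_bytes:
            return False
        self.p2 += warp_bytes          # waiting data becomes data-in-use
        return True

    def release(self, warp_bytes):
        """After computing finishes, the data-in-use space turns blank again."""
        assert self.p2 - self.p1 >= warp_bytes, "releasing more than is in use"
        self.p1 += warp_bytes
```

With a 256-byte scratchpad, two 128-byte loads fill it. A third load is refused until one warp has been computed and released.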
-
Publication No.: US20230367741A1
Publication date: 2023-11-16
Application No.: US17937973
Filing date: 2022-10-04
Inventors: HAORAN LI, FEI SUN, YUAN GAO, GUYUE HUANG, RUIGUANG ZHONG, CHEN ZHANG
CPC classification: G06F15/82, G06F9/3013
Abstract: The present application discloses a GPU and a method of the same. The GPU includes: a plurality of streaming multiprocessors (SMs), each including a plurality of streaming processors (SPs), each SP including a register, wherein each SP has a predetermined upper bound on its warp number and the register has a predetermined upper bound on its capacity; and a global dispatcher, including: a register occupancy status table for recording the warp number and the register occupancy status of each SP of each SM; a thread block (TB) dispatch module for dispatching a TB to a first SM of the SMs according to a warp type classification table and the register occupancy status table; and a warp dispatch module for dispatching a plurality of warps to the plurality of SPs of the first SM according to the warp type classification table and the register occupancy status table.
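The two-level dispatch here (TB to an SM, then warps to SPs) can be illustrated in Python, checking both per-SP upper bounds. This minimal sketch relies on several assumptions. The warp type classification table is assumed to map a warp type to its per-warp register need. The limits `MAX_WARPS_PER_SP` and `MAX_REGS_PER_SP` are hypothetical values. The placement policies (first SM that fits, least-loaded SP first) are also assumed. `SPStatus`, `plan_warps`, and `dispatch_tb` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical values for the "predetermined upper bounds".
MAX_WARPS_PER_SP = 4
MAX_REGS_PER_SP = 1024

# Hypothetical warp type classification table: warp type -> registers per warp.
WARP_TYPE_REGS = {"light": 64, "heavy": 256}


@dataclass
class SPStatus:          # one row of the register occupancy status table
    warps: int = 0
    regs: int = 0


def can_host(sp, warp_type):
    return (sp.warps + 1 <= MAX_WARPS_PER_SP and
            sp.regs + WARP_TYPE_REGS[warp_type] <= MAX_REGS_PER_SP)


def plan_warps(sm, warp_types):
    """Warp dispatch on a scratch copy: least-occupied register file first (assumed policy)."""
    trial = [SPStatus(sp.warps, sp.regs) for sp in sm]
    plan = []
    for wt in warp_types:
        options = [i for i, sp in enumerate(trial) if can_host(sp, wt)]
        if not options:
            return None                      # this SM cannot take the whole TB
        i = min(options, key=lambda k: trial[k].regs)
        trial[i].warps += 1
        trial[i].regs += WARP_TYPE_REGS[wt]
        plan.append(i)
    return plan


def dispatch_tb(table, warp_types):
    """TB dispatch: first SM whose SPs can absorb every warp; then commit to the table."""
    for sm_id, sm in enumerate(table):
        plan = plan_warps(sm, warp_types)
        if plan is not None:
            for sp_id, wt in zip(plan, warp_types):
                sm[sp_id].warps += 1
                sm[sp_id].regs += WARP_TYPE_REGS[wt]
            return sm_id, plan
    return None
```

Planning on a copy before committing means that a TB is never partially placed on an SM that cannot hold all of its warps.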
-
Publication No.: US20230367630A1
Publication date: 2023-11-16
Application No.: US18055205
Filing date: 2022-11-14
CPC classification: G06F9/4875, G06F9/52, G06F1/20
Abstract: A stream multiprocessor, a GPU, and related methods are provided. The stream multiprocessor executes thread blocks, each of which includes warps. The stream multiprocessor includes stream processors and a local dispatcher. Each stream processor executes one or more warps. The local dispatcher includes a warp state table, a warp resource detection unit, and a warp launching unit. The warp state table records the dispatching states and processing states of the warps of the thread blocks. Based on the hardware resources available to the stream multiprocessor and the hardware resources required by the thread blocks, the warp resource detection unit selects all the first warps of a first thread block and at least one second warp of a second thread block. The warp launching unit dispatches the first warps to idle stream processors and the at least one second warp to at least one idle stream processor.
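The warp-level co-scheduling described here (the whole first thread block, plus leftover warps of a second one) can be modeled in Python. This minimal sketch makes three assumptions. Registers are treated as the only tracked resource. Each idle stream processor receives exactly one warp. `regs_per_warp` is positive. `ThreadBlock`, `select_warps`, and `launch` are hypothetical names, and the state-table fields are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ThreadBlock:
    tb_id: int
    num_warps: int
    regs_per_warp: int   # hypothetical per-warp resource requirement (> 0)


def select_warps(tb1, tb2, free_regs, idle_sps):
    """Warp resource detection (sketch): all warps of tb1, then as many warps
    of tb2 as the leftover registers and idle SPs allow."""
    need1 = tb1.num_warps * tb1.regs_per_warp
    if need1 > free_regs or tb1.num_warps > len(idle_sps):
        return None                          # first thread block does not fit whole
    left_regs = free_regs - need1
    left_sps = len(idle_sps) - tb1.num_warps
    n2 = min(tb2.num_warps, left_sps, left_regs // tb2.regs_per_warp)
    return ([(tb1.tb_id, w) for w in range(tb1.num_warps)] +
            [(tb2.tb_id, w) for w in range(n2)])


def launch(warps, idle_sps, state_table):
    """Warp launching (sketch): one warp per idle SP; record dispatch/processing state."""
    for (tb_id, w), sp in zip(warps, idle_sps):
        state_table[(tb_id, w)] = {"dispatched": True, "sp": sp, "processing": "running"}
    return state_table
```

For example, with 450 free registers and five idle SPs, a 2-warp block needing 100 registers per warp is taken whole. The remaining 250 registers then admit two warps of a second block with the same per-warp need.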
-