HIERARCHICAL WORK SCHEDULING
    61.
    发明公开

    公开(公告)号:US20240111578A1

    公开(公告)日:2024-04-04

    申请号:US17957714

    申请日:2022-09-30

    CPC classification number: G06F9/4881

    Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.

    Precise suspend and resume of workloads in a processing unit

    公开(公告)号:US11609791B2

    公开(公告)日:2023-03-21

    申请号:US15828059

    申请日:2017-11-30

    Abstract: A first workload is executed in a first subset of pipelines of a processing unit. A second workload is executed in a second subset of the pipelines of the processing unit. The second workload is dependent upon the first workload. The first and second workloads are suspended and state information for the first and second workloads is stored in a first memory in response to suspending the first and second workloads. In some cases, a third workload executes in a third subset of the pipelines of the processing unit concurrently with executing the first and second workloads. In some cases, a fourth workload is executed in the first and second pipelines after suspending the first and second workloads. The first and second pipelines are resumed on the basis of the stored state information in response to completion or suspension of the fourth workload.

    DUAL VECTOR ARITHMETIC LOGIC UNIT
    64.
    发明申请

    公开(公告)号:US20220188076A1

    公开(公告)日:2022-06-16

    申请号:US17121354

    申请日:2020-12-14

    Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.

    Selective prefetching in multithreaded processing units

    公开(公告)号:US11226819B2

    公开(公告)日:2022-01-18

    申请号:US15818304

    申请日:2017-11-20

    Abstract: A processing unit includes a plurality of processing elements and one or more caches. A first thread executes a program that includes one or more prefetch instructions to prefetch information into a first cache. Prefetching is selectively enabled when executing the first thread on a first processing element dependent upon whether one or more second threads previously executed the program on the first processing element. The first thread is then dispatched to execute the program on the first processing element. In some cases, a dispatcher receives the first thread four dispatching to the first processing element. The dispatcher modifies the prefetch instruction to disable prefetching into the first cache in response to the one or more second threads having previously executed the program on the first processing element.

    Broadcast synchronization for dynamically adaptable arrays

    公开(公告)号:US11200060B1

    公开(公告)日:2021-12-14

    申请号:US17132002

    申请日:2020-12-23

    Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer received a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.

    PREFETCH KERNELS ON DATA-PARALLEL PROCESSORS
    69.
    发明申请

    公开(公告)号:US20200210341A1

    公开(公告)日:2020-07-02

    申请号:US16813075

    申请日:2020-03-09

    Abstract: Embodiments include methods, systems and non-transitory computer-readable computer readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU), such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of memory operations in the processing kernel.

    Reconfigurable virtual graphics and compute processor pipeline

    公开(公告)号:US10664942B2

    公开(公告)日:2020-05-26

    申请号:US15331278

    申请日:2016-10-21

    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.

Patent Agency Ranking