Platform agnostic atomic operations

    公开(公告)号:US11461045B2

    公开(公告)日:2022-10-04

    申请号:US16370035

    申请日:2019-03-29

    Inventor: Mark Fowler

    Abstract: A processing unit is configured to access a first memory that supports atomic operations and a second memory via an interface. The second memory or the interface does not support atomicity of the atomic operations. A trap handler is configured to trap atomic operations and enforce atomicity of the trapped atomic operations. The processing unit selectively provides atomic operations to the trap handler in response to detecting that memory access requests in the atomic operations are directed to the second memory via the interface. In some cases, the processing unit detects a frequency of traps that result from atomic operations that include memory access requests to a page stored in the second memory. The processing unit transfers the page from the second memory to the first memory in response to the trap frequency exceeding a threshold.

    Memory controller with flexible address decoding

    公开(公告)号:US10403333B2

    公开(公告)日:2019-09-03

    申请号:US15211887

    申请日:2016-07-15

    Abstract: A memory controller includes a host interface for receiving memory access requests including access addresses, a memory interface for providing memory accesses to a memory system, and an address decoder coupled to the host interface for programmably mapping the access addresses to selected ones of a plurality of regions. The address decoder is programmable to map the access addresses to a first region having a non-power-of-two size using a primary decoder and a secondary decoder each having power-of-two sizes, and providing a first region mapping signal in response. A command queue stores the memory access requests and region mapping signals. An arbiter picks the memory access requests from the command queue based on a plurality of criteria, which are evaluated based in part on the region mapping signals, and provides corresponding memory accesses to the memory interface in response.

    SHARED VIRTUAL ADDRESS SPACE FOR HETEROGENEOUS PROCESSORS
    5.
    发明申请
    SHARED VIRTUAL ADDRESS SPACE FOR HETEROGENEOUS PROCESSORS 审中-公开
    用于异构处理器的共享虚拟地址空间

    公开(公告)号:US20160378674A1

    公开(公告)日:2016-12-29

    申请号:US14747944

    申请日:2015-06-23

    Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.

    Abstract translation: 处理器对处理器的异构处理单元使用相同的虚拟地址空间。 处理器对不同类型的处理单元(例如CPU和GPU)采用不同的页表,其中存储器管理单元使用每组页表来将虚拟地址空间的虚拟地址转换为存储器模块的相应物理地址 与处理器相关联。 随着数据在内存模块之间迁移,可以更新页表中的物理地址,以反映每个处理单元的数据的物理位置。

    Hybrid Render with Deferred Primitive Batch Binning
    6.
    发明申请
    Hybrid Render with Deferred Primitive Batch Binning 审中-公开
    混合渲染与延迟原始批量分类

    公开(公告)号:US20140292756A1

    公开(公告)日:2014-10-02

    申请号:US13853422

    申请日:2013-03-29

    CPC classification number: G06T15/005

    Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.

    Abstract translation: 提供了一种系统,方法和计算机程序产品,用于具有延迟原始批次分组的混合渲染。 从原始序列生成原始批次。 初始批次拦截中的原始字符串标识。 识别用于处理的仓。 该箱对应于屏幕空间的一个区域。 处理识别的仓的图元的像素。 识别旁边的截距,同时处理拦截识别的bin的原语。

    CONFIGURABLE MULTIPLE-DIE GRAPHICS PROCESSING UNIT

    公开(公告)号:US20240193844A1

    公开(公告)日:2024-06-13

    申请号:US18077424

    申请日:2022-12-08

    CPC classification number: G06T15/005 G06F9/3802

    Abstract: A graphics processing unit (GPU) of a processing system is partitioned into multiple dies (referred to as GPU chiplets) that are configurable to collectively function and interface with an application as a single GPU in a first mode and as multiple GPUs in a second mode. By dividing the GPU into multiple GPU chiplets, the processing system flexibly and cost-effectively configures an amount of active GPU physical resources based on an operating mode. In addition, a configurable number of GPU chiplets are assembled into a single GPU, such that multiple different GPUs having different numbers of GPU chiplets can be assembled using a small number of tape-outs and a multiple-die GPU can be constructed out of GPU chiplets that implement varying generations of technology.

    VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES

    公开(公告)号:US20230195626A1

    公开(公告)日:2023-06-22

    申请号:US17558008

    申请日:2021-12-21

    CPC classification number: G06F12/0806 G06F12/10 G06F2212/1016

    Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.

    Dynamic modification of coherent atomic memory operations

    公开(公告)号:US11604737B1

    公开(公告)日:2023-03-14

    申请号:US17516860

    申请日:2021-11-02

    Abstract: A processing device determines a scope indicating at least a portion of the processing system and target data from atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.

Patent Agency Ranking