-
公开(公告)号:US12026380B2
公开(公告)日:2024-07-02
申请号:US17854903
申请日:2022-06-30
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Mark Fowler , Anthony Asaro , Vydhyanathan Kalyanasundharam
IPC: G06F3/06
CPC classification number: G06F3/0631 , G06F3/0604 , G06F3/0679
Abstract: A processing system including a parallel processing unit selectively allocating pages of memory for interleaving across configurable subsets of channels based on a mode of allocation. In some embodiments, in a first mode, a page of memory is allocated to and interleaved across a plurality of channels, and in a second mode, a page of memory is allocated to and interleaved across a subset of the plurality of channels.
-
公开(公告)号:US11461045B2
公开(公告)日:2022-10-04
申请号:US16370035
申请日:2019-03-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Mark Fowler
Abstract: A processing unit is configured to access a first memory that supports atomic operations and a second memory via an interface. The second memory or the interface does not support atomicity of the atomic operations. A trap handler is configured to trap atomic operations and enforce atomicity of the trapped atomic operations. The processing unit selectively provides atomic operations to the trap handler in response to detecting that memory access requests in the atomic operations are directed to the second memory via the interface. In some cases, the processing unit detects a frequency of traps that result from atomic operations that include memory access requests to a page stored in the second memory. The processing unit transfers the page from the second memory to the first memory in response to the trap frequency exceeding a threshold.
-
公开(公告)号:US10403333B2
公开(公告)日:2019-09-03
申请号:US15211887
申请日:2016-07-15
Applicant: Advanced Micro Devices, Inc.
Inventor: Kevin M. Brandl , Thomas Hamilton , Hideki Kanayama , Kedarnath Balakrishnan , James R. Magro , Guanhao Shen , Mark Fowler
IPC: G11C7/10 , G06F12/1018 , G11C11/408
Abstract: A memory controller includes a host interface for receiving memory access requests including access addresses, a memory interface for providing memory accesses to a memory system, and an address decoder coupled to the host interface for programmably mapping the access addresses to selected ones of a plurality of regions. The address decoder is programmable to map the access addresses to a first region having a non-power-of-two size using a primary decoder and a secondary decoder each having power-of-two sizes, and providing a first region mapping signal in response. A command queue stores the memory access requests and region mapping signals. An arbiter picks the memory access requests from the command queue based on a plurality of criteria, which are evaluated based in part on the region mapping signals, and provides corresponding memory accesses to the memory interface in response.
-
公开(公告)号:US20180165872A1
公开(公告)日:2018-06-14
申请号:US15374752
申请日:2016-12-09
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Laurent Lefebvre , Michael Mantor , Mark Fowler , Mikko Alho , Mika Tuomi , Kiia Kallio , Patrick Klas Rudolf Buss , Jari Antero Komppa , Kaj Tuomi , Christopher J. Brennan
CPC classification number: G06T15/405 , G06T11/40 , G06T15/005 , G06T15/80
Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, he fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.
-
5.
公开(公告)号:US20160378674A1
公开(公告)日:2016-12-29
申请号:US14747944
申请日:2015-06-23
Applicant: ATI Technologies ULC , Advanced Micro Devices, Inc.
Inventor: Gongxian Jeffrey Cheng , Mark Fowler , Philip J. Rogers , Benjamin T. Sander , Anthony Asaro , Mike Mantor , Raja Koduri
IPC: G06F12/10
CPC classification number: G06F12/1009 , G06F12/1072 , G06F12/1081 , G06F15/163 , G06F2212/1016 , G06F2212/151 , G06F2212/152 , G06F2212/251
Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.
Abstract translation: 处理器对处理器的异构处理单元使用相同的虚拟地址空间。 处理器对不同类型的处理单元(例如CPU和GPU)采用不同的页表,其中存储器管理单元使用每组页表来将虚拟地址空间的虚拟地址转换为存储器模块的相应物理地址 与处理器相关联。 随着数据在内存模块之间迁移,可以更新页表中的物理地址,以反映每个处理单元的数据的物理位置。
-
公开(公告)号:US20140292756A1
公开(公告)日:2014-10-02
申请号:US13853422
申请日:2013-03-29
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Michael MANTOR , Laurent Lefebvre , Mark Fowler , Timothy Kelley , Mikko Alho , Mika Tuomi , Kallio Kia , Patrick Klas Rudolf Buss , Jari Antero Komppa , Kaj Tuomi
CPC classification number: G06T15/005
Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.
Abstract translation: 提供了一种系统,方法和计算机程序产品,用于具有延迟原始批次分组的混合渲染。 从原始序列生成原始批次。 初始批次拦截中的原始字符串标识。 识别用于处理的仓。 该箱对应于屏幕空间的一个区域。 处理识别的仓的图元的像素。 识别旁边的截距,同时处理拦截识别的bin的原语。
-
公开(公告)号:US20240193844A1
公开(公告)日:2024-06-13
申请号:US18077424
申请日:2022-12-08
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Mark Fowler , Samuel Naffziger , Michael Mantor , Mark Leather
CPC classification number: G06T15/005 , G06F9/3802
Abstract: A graphics processing unit (GPU) of a processing system is partitioned into multiple dies (referred to as GPU chiplets) that are configurable to collectively function and interface with an application as a single GPU in a first mode and as multiple GPUs in a second mode. By dividing the GPU into multiple GPU chiplets, the processing system flexibly and cost-effectively configures an amount of active GPU physical resources based on an operating mode. In addition, a configurable number of GPU chiplets are assembled into a single GPU, such that multiple different GPUs having different numbers of GPU chiplets can be assembled using a small number of tape-outs and a multiple-die GPU can be constructed out of GPU chiplets that implement varying generations of technology.
-
公开(公告)号:US20240135626A1
公开(公告)日:2024-04-25
申请号:US18402315
申请日:2024-01-02
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Michael Mantor , Laurent Lefebvre , Mark Fowler , Timothy Kelley , Mikko Alho , Mika Tuomi , Kiia Kallio , Patrick Klas Rudolf Buss , Jari Antero Komppa , Kaj Tuomi
IPC: G06T15/00
CPC classification number: G06T15/005
Abstract: A method, computer system, and a non-transitory computer-readable storage medium for performing primitive batch binning are disclosed. The method, computer system, and non-transitory computer-readable storage medium include techniques for generating a primitive batch from a plurality of primitives, computing respective bin intercepts for each of the plurality of primitives in the primitive batch, and shading the primitive batch by iteratively processing each of the respective bin intercepts computed until all of the respective bin intercepts are processed.
-
公开(公告)号:US20230195626A1
公开(公告)日:2023-06-22
申请号:US17558008
申请日:2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Guennadi Riguer , Mark Fowler , Randy Ramsey
IPC: G06F12/0806 , G06F12/10
CPC classification number: G06F12/0806 , G06F12/10 , G06F2212/1016
Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
-
公开(公告)号:US11604737B1
公开(公告)日:2023-03-14
申请号:US17516860
申请日:2021-11-02
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Joseph L. Greathouse , Steven Tony Tye , Mark Fowler , Milind N. Nemlekar
IPC: G06F12/00 , G06F12/0891 , G06F12/0831 , G06F9/448 , G06F9/30 , G06F12/0888
Abstract: A processing device determines a scope indicating at least a portion of the processing system and target data from atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.
-
-
-
-
-
-
-
-
-