-
Publication No.: US12026380B2
Publication date: 2024-07-02
Application No.: US17854903
Filing date: 2022-06-30
IPC classification: G06F3/06
CPC classification: G06F3/0631, G06F3/0604, G06F3/0679
Abstract: A processing system includes a parallel processing unit that selectively allocates pages of memory for interleaving across configurable subsets of channels based on a mode of allocation. In some embodiments, in a first mode, a page of memory is allocated to and interleaved across a plurality of channels, and in a second mode, a page of memory is allocated to and interleaved across a subset of the plurality of channels.
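The two allocation modes in this abstract can be illustrated with a minimal sketch. The channel count (8), the interleave stride (256 bytes), the page size (4096 bytes), and all function names are assumptions made for illustration; the abstract does not specify any of them.

```python
ALL_CHANNELS = list(range(8))  # assumed: 8 memory channels
STRIDE = 256                   # assumed: bytes per interleave chunk
PAGE_SIZE = 4096               # assumed: bytes per page


def channels_for_page(mode, subset=None):
    """Pick the channels a page is interleaved across (hypothetical helper)."""
    if mode == "first":
        return ALL_CHANNELS            # first mode: all channels
    if mode == "second":
        if not subset or not set(subset) <= set(ALL_CHANNELS):
            raise ValueError("second mode needs a non-empty subset of channels")
        return list(subset)            # second mode: a configurable subset
    raise ValueError(f"unknown mode: {mode}")


def channel_for_offset(offset, channels):
    """Round-robin interleave: which channel serves this byte offset."""
    return channels[(offset // STRIDE) % len(channels)]


def channels_touched(channels):
    """Set of channels that one whole page ends up using."""
    return {channel_for_offset(o, channels) for o in range(0, PAGE_SIZE, STRIDE)}


full_page = channels_touched(channels_for_page("first"))
subset_page = channels_touched(channels_for_page("second", subset=[0, 1]))
```

Under these assumptions a first-mode page spreads over all eight channels, while a second-mode page stays on the two channels it was given.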
-
Publication No.: US11461045B2
Publication date: 2022-10-04
Application No.: US16370035
Filing date: 2019-03-29
Inventors: Mark Fowler
Abstract: A processing unit is configured to access a first memory that supports atomic operations and a second memory via an interface. The second memory or the interface does not support atomicity of the atomic operations. A trap handler is configured to trap atomic operations and enforce atomicity of the trapped atomic operations. The processing unit selectively provides atomic operations to the trap handler in response to detecting that memory access requests in the atomic operations are directed to the second memory via the interface. In some cases, the processing unit detects a frequency of traps that result from atomic operations that include memory access requests to a page stored in the second memory. The processing unit transfers the page from the second memory to the first memory in response to the trap frequency exceeding a threshold.
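The threshold-driven migration described in this abstract can be sketched as follows. This is a simplified model, not the patented implementation: it approximates trap frequency with a plain per-page trap count, and the class, method, and memory names are hypothetical.

```python
from collections import Counter


class AtomicTrapTracker:
    """Counts trapped atomics per page and migrates hot pages (illustrative)."""

    def __init__(self, threshold):
        self.threshold = threshold    # assumed: traps tolerated before migrating
        self.trap_counts = Counter()  # page -> traps seen so far
        self.location = {}            # page -> "first" or "second" memory

    def place(self, page, memory):
        self.location[page] = memory

    def atomic_access(self, page):
        """Return how the atomic was handled: 'native' or 'trapped'."""
        if self.location[page] == "first":
            return "native"           # first memory supports atomics directly
        self.trap_counts[page] += 1   # the trap handler enforces atomicity
        if self.trap_counts[page] > self.threshold:
            self.location[page] = "first"  # move the hot page to the first memory
            del self.trap_counts[page]
        return "trapped"


tracker = AtomicTrapTracker(threshold=2)
tracker.place("page0", "second")
results = [tracker.atomic_access("page0") for _ in range(4)]
```

With a threshold of 2, the third trap pushes the count past the threshold, so the fourth atomic runs natively against the first memory.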
-
Publication No.: US10403333B2
Publication date: 2019-09-03
Application No.: US15211887
Filing date: 2016-07-15
Inventors: Kevin M. Brandl, Thomas Hamilton, Hideki Kanayama, Kedarnath Balakrishnan, James R. Magro, Guanhao Shen, Mark Fowler
IPC classification: G11C7/10, G06F12/1018, G11C11/408
Abstract: A memory controller includes a host interface for receiving memory access requests including access addresses, a memory interface for providing memory accesses to a memory system, and an address decoder coupled to the host interface for programmably mapping the access addresses to selected ones of a plurality of regions. The address decoder is programmable to map the access addresses to a first region having a non-power-of-two size using a primary decoder and a secondary decoder each having power-of-two sizes, and providing a first region mapping signal in response. A command queue stores the memory access requests and region mapping signals. An arbiter picks the memory access requests from the command queue based on a plurality of criteria, which are evaluated based in part on the region mapping signals, and provides corresponding memory accesses to the memory interface in response.
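One way to picture a non-power-of-two region built from two power-of-two decoders is the sketch below. It assumes the region mapping signal is simply the OR of the two decoder matches and uses a 6 GiB region (4 GiB primary plus 2 GiB secondary) as an example; the abstract does not say how the decoders are combined, and every name here is hypothetical.

```python
GIB = 1 << 30


def pow2_decoder(base, size):
    """Build a matcher for a power-of-two-sized, size-aligned address window."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if base % size:
        raise ValueError("base must be aligned to size")
    mask = ~(size - 1)  # keeps only the address bits above the window
    return lambda addr: (addr & mask) == base


# Assumed example: a 6 GiB region = 4 GiB primary + 2 GiB secondary window.
primary = pow2_decoder(0, 4 * GIB)
secondary = pow2_decoder(4 * GIB, 2 * GIB)


def region_mapping_signal(addr):
    """True when the address falls in the combined non-power-of-two region."""
    return primary(addr) or secondary(addr)
```

Each decoder needs only a mask-and-compare, which is why power-of-two windows are cheap to match; the combined region covers addresses 0 through 6 GiB minus 1.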
-
Publication No.: US20180165872A1
Publication date: 2018-06-14
Application No.: US15374752
Filing date: 2016-12-09
Inventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan
CPC classification: G06T15/405, G06T11/40, G06T15/005, G06T15/80
Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, the fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.
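The buffer behavior in this abstract can be sketched in a few lines. This sketch takes the "removed" variant rather than the "marked as overlapping" one, represents a fragment as a hypothetical `(screen_position, payload)` tuple, and leaves out buffer capacity and flush timing, none of which the abstract specifies.

```python
from collections import deque


class FragmentBuffer:
    """FIFO of post-z-cull fragments that drops older overlapped fragments."""

    def __init__(self):
        self.fifo = deque()

    def push(self, pos, payload):
        # Remove any buffered fragment at the same screen position...
        self.fifo = deque(f for f in self.fifo if f[0] != pos)
        # ...and in either case add the new fragment to the buffer.
        self.fifo.append((pos, payload))

    def drain(self):
        """Hand the surviving fragments to the pixel shader, oldest first."""
        out = list(self.fifo)
        self.fifo.clear()
        return out


buf = FragmentBuffer()
buf.push((0, 0), "a")
buf.push((1, 0), "b")
buf.push((0, 0), "c")  # overlaps "a", so "a" is dropped
shaded = buf.drain()
```

Only the later fragment at each screen position reaches the shader, so the overwritten fragment "a" is never shaded.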
-
Publication No.: US20160378674A1
Publication date: 2016-12-29
Application No.: US14747944
Filing date: 2015-06-23
Inventors: Gongxian Jeffrey Cheng, Mark Fowler, Philip J. Rogers, Benjamin T. Sander, Anthony Asaro, Mike Mantor, Raja Koduri
IPC classification: G06F12/10
CPC classification: G06F12/1009, G06F12/1072, G06F12/1081, G06F15/163, G06F2212/1016, G06F2212/151, G06F2212/152, G06F2212/251
Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.
-
Publication No.: US20140292756A1
Publication date: 2014-10-02
Application No.: US13853422
Filing date: 2013-03-29
Inventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
CPC classification: G06T15/005
Abstract: A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.
-
Publication No.: US20220277508A1
Publication date: 2022-09-01
Application No.: US17745410
Filing date: 2022-05-16
Inventors: Michael Mantor, Laurent Lefebvre, Mark Fowler, Timothy Kelley, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi
IPC classification: G06T15/00
Abstract: A method, computer system, and a non-transitory computer-readable storage medium for performing primitive batch binning are disclosed. The method, computer system, and non-transitory computer-readable storage medium include techniques for generating a primitive batch from a plurality of primitives, computing respective bin intercepts for each of the plurality of primitives in the primitive batch, and shading the primitive batch by iteratively processing each of the respective bin intercepts computed until all of the respective bin intercepts are processed.
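The three steps named in this abstract (form a batch, compute bin intercepts, process bin by bin) can be sketched as below. The sketch approximates each primitive by an inclusive pixel bounding box, uses an assumed 32-pixel square bin, and visits bins in sorted order; these choices and all names are illustrative assumptions rather than details from the patent.

```python
BIN_SIZE = 32  # assumed: bins are 32x32 pixel screen regions


def bin_intercepts(bbox):
    """Bins touched by an inclusive pixel bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    return {
        (bx, by)
        for bx in range(x0 // BIN_SIZE, x1 // BIN_SIZE + 1)
        for by in range(y0 // BIN_SIZE, y1 // BIN_SIZE + 1)
    }


def shade_batch(batch):
    """Process a primitive batch one bin at a time; returns the visit order."""
    per_bin = {}
    for name, bbox in batch:  # submission order is kept within each bin
        for b in bin_intercepts(bbox):
            per_bin.setdefault(b, []).append(name)
    # Iterate until every bin intercept has been processed.
    return [(b, per_bin[b]) for b in sorted(per_bin)]


batch = [("A", (0, 0, 10, 10)), ("B", (30, 0, 40, 10)), ("C", (10, 0, 35, 5))]
order = shade_batch(batch)
```

Processing all primitives that touch one bin before moving to the next keeps the working set confined to a single screen region at a time, which is the usual motivation for binning.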
-
Publication No.: US11100004B2
Publication date: 2021-08-24
Application No.: US14747944
Filing date: 2015-06-23
Inventors: Gongxian Jeffrey Cheng, Mark Fowler, Philip J. Rogers, Benjamin T. Sander, Anthony Asaro, Mike Mantor, Raja Koduri
IPC classification: G06F12/1009
Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.
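A toy model of one virtual address space backed by per-unit page tables is sketched below. The 4096-byte page size, the single-level dictionary page tables, the module names, and the class and method names are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


class SharedAddressSpace:
    """One virtual address space, one page table per processing-unit type."""

    def __init__(self, units=("cpu", "gpu")):
        self.tables = {unit: {} for unit in units}  # unit -> {vpn: (module, frame)}

    def map_page(self, vpn, module, frame):
        for table in self.tables.values():
            table[vpn] = (module, frame)

    def translate(self, unit, vaddr):
        """Translate a virtual address using the given unit's page table."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        module, frame = self.tables[unit][vpn]
        return module, frame * PAGE_SIZE + offset

    def migrate(self, vpn, module, frame):
        """After data moves between memory modules, update every table."""
        self.map_page(vpn, module, frame)


space = SharedAddressSpace()
space.map_page(vpn=2, module="system", frame=7)
before = space.translate("cpu", 2 * PAGE_SIZE + 12)
space.migrate(vpn=2, module="device", frame=3)
after_cpu = space.translate("cpu", 2 * PAGE_SIZE + 12)
after_gpu = space.translate("gpu", 2 * PAGE_SIZE + 12)
```

The virtual address stays the same for both units across the migration; only the physical location recorded in each page table changes.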
-
Publication No.: US11074075B2
Publication date: 2021-07-27
Application No.: US15442412
Filing date: 2017-02-24
Inventors: Mark Fowler, Brian D. Emberling
Abstract: Systems, apparatuses, and methods for maintaining separate pending load and store counters are disclosed herein. In one embodiment, a system includes at least one execution unit, a memory subsystem, and a pair of counters for each thread of execution. In one embodiment, the system implements a software based approach for managing dependencies between instructions. In one embodiment, the execution unit(s) maintains counters to support the software-based approach for managing dependencies between instructions. The execution unit(s) are configured to execute instructions that are used to manage the dependencies during run-time. In one embodiment, the execution unit(s) execute wait instructions to wait until a given counter is equal to a specified value before continuing to execute the instruction sequence.
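The per-thread counter pair and the wait condition can be modeled with a small sketch. It assumes each counter increments when a memory operation issues and decrements when it completes, which the abstract does not state explicitly; the class and method names are hypothetical.

```python
class ThreadCounters:
    """Separate pending-load and pending-store counters for one thread."""

    def __init__(self):
        self.pending = {"load": 0, "store": 0}

    def issue(self, kind):
        self.pending[kind] += 1   # assumed: incremented when the op is issued

    def complete(self, kind):
        if self.pending[kind] == 0:
            raise RuntimeError(f"no pending {kind} to complete")
        self.pending[kind] -= 1   # assumed: decremented when memory responds

    def wait_satisfied(self, kind, value):
        """A wait instruction may proceed once the counter equals `value`."""
        return self.pending[kind] == value


t = ThreadCounters()
t.issue("load")
t.issue("load")
t.issue("store")
blocked = not t.wait_satisfied("load", 0)   # two loads still outstanding
t.complete("load")
t.complete("load")
can_use_loaded_data = t.wait_satisfied("load", 0)
store_still_pending = not t.wait_satisfied("store", 0)
```

Because loads and stores are counted separately, a thread waiting only on its loads can continue while a store is still in flight.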
-
Publication No.: US10943389B2
Publication date: 2021-03-09
Application No.: US15374752
Filing date: 2016-12-09
Inventors: Laurent Lefebvre, Michael Mantor, Mark Fowler, Mikko Alho, Mika Tuomi, Kiia Kallio, Patrick Klas Rudolf Buss, Jari Antero Komppa, Kaj Tuomi, Christopher J. Brennan
Abstract: Techniques for removing or identifying overlapping fragments in a fragment stream after z-culling are disclosed. The techniques include maintaining a first-in-first-out buffer that stores post-z-cull fragments. Each time a new fragment is received at the buffer, the screen position of the fragment is checked against all other fragments in the buffer. If the screen position of the fragment matches the screen position of a fragment in the buffer, then the fragment in the buffer is removed or marked as overlapping. If the screen position of the fragment does not match the screen position of any fragment in the buffer, then no modification is performed to fragments already in the buffer. In either case, the fragment is added to the buffer. The contents of the buffer are transmitted to the pixel shader for pixel shading at a later time.
-