-
公开(公告)号:US20230195626A1
公开(公告)日:2023-06-22
申请号:US17558008
申请日:2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Guennadi Riguer , Mark Fowler , Randy Ramsey
IPC: G06F12/0806 , G06F12/10
CPC classification number: G06F12/0806 , G06F12/10 , G06F2212/1016
Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
-
公开(公告)号:US11604737B1
公开(公告)日:2023-03-14
申请号:US17516860
申请日:2021-11-02
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Joseph L. Greathouse , Steven Tony Tye , Mark Fowler , Milind N. Nemlekar
IPC: G06F12/00 , G06F12/0891 , G06F12/0831 , G06F9/448 , G06F9/30 , G06F12/0888
Abstract: A processing device determines a scope indicating at least a portion of the processing system and target data from atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.
-
公开(公告)号:US11521342B2
公开(公告)日:2022-12-06
申请号:US17230140
申请日:2021-04-14
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Maxim V. Kazakov , Mark Fowler
Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.
-
公开(公告)号:US20200042348A1
公开(公告)日:2020-02-06
申请号:US16050948
申请日:2018-07-31
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Anirudh R. Acharya , Michael J. Mantor , Rex Eldon McCrary , Anthony Asaro , Jeffrey Gongxian Cheng , Mark Fowler
Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
-
公开(公告)号:US10540802B1
公开(公告)日:2020-01-21
申请号:US16263986
申请日:2019-01-31
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Maxim V. Kazakov , Mark Fowler
Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.
-
公开(公告)号:US20190122417A1
公开(公告)日:2019-04-25
申请号:US16179376
申请日:2018-11-02
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Michael Mantor , Laurent Lefebvre , Mark Fowler , Timothy Kelley , Mikko Alho , Mika Tuomi , Kiia Kallio , Patrick Klas Rudolf Buss , Jari Antero Komppa , Kaj Tuomi
IPC: G06T15/00
Abstract: A system, method and a non-transitory computer readable storage medium are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from one or more primitives. A bin is identified for processing the primitive batch. At least a portion of each primitive intersecting the identified bin is processed and a next bin for processing the primitive batch is identified based on an intercept walk order. The processing is iteratively repeated for the one or more primitives in the primitive batch for successive bins until all primitives of the primitive batch are completely processed. Then, the one or more primitives in the primitive batch are further processed.
-
公开(公告)号:US20180246724A1
公开(公告)日:2018-08-30
申请号:US15442412
申请日:2017-02-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Mark Fowler , Brian D. Emberling
IPC: G06F9/30 , G06F12/0875
Abstract: Systems, apparatuses, and methods for maintaining separate pending load and store counters are disclosed herein. In one embodiment, a system includes at least one execution unit, a memory subsystem, and a pair of counters for each thread of execution. In one embodiment, the system implements a software based approach for managing dependencies between instructions. In one embodiment, the execution unit(s) maintains counters to support the software-based approach for managing dependencies between instructions. The execution unit(s) are configured to execute instructions that are used to manage the dependencies during run-time. In one embodiment, the execution unit(s) execute wait instructions to wait until a given counter is equal to a specified value before continuing to execute the instruction sequence.
-
公开(公告)号:US20180173649A1
公开(公告)日:2018-06-21
申请号:US15385566
申请日:2016-12-20
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Rostyslav Kyrychynskyi , Anthony Asaro , Kostantinos Danny Christidis , Mark Fowler , Michael J. Mantor , Robert Scott Hartog
CPC classification number: G06F13/161 , G06F13/1673 , G06F13/4068
Abstract: A system and method for efficient arbitration of memory access requests are described. One or more functional units generate memory access requests for a partitioned memory. An arbitration unit stores the generated requests and selects a given one of the stored requests. The arbitration unit identifies a given partition of the memory which stores a memory location targeted by the selected request. The arbitration unit determines whether one or more other stored requests access memory locations in the given partition. The arbitration unit sends each of the selected memory access request and the identified one or more other memory access requests to the memory to be serviced out of order.
-
公开(公告)号:US20180165221A1
公开(公告)日:2018-06-14
申请号:US15374788
申请日:2016-12-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Mark Fowler
IPC: G06F12/128 , G06F12/122
CPC classification number: G06F12/128 , G06F12/0888 , G06F12/122 , G06F12/126 , G06F2212/1024 , G06F2212/455 , G06F2212/621 , G06F2212/69 , G06F2212/70
Abstract: A system and method for efficiently performing data allocation in a cache memory are described. A lookup is performed in a cache responsive to detecting an access request. If the targeted data is found in the cache and the targeted data is of a no allocate data type indicating the targeted data is not expected to be reused, then the targeted data is read from the cache without updating cache replacement policy information for the targeted data responsive to the access. If the lookup results in a miss, to the targeted data is prevented from being allocated in the cache.
-
公开(公告)号:US20250004653A1
公开(公告)日:2025-01-02
申请号:US18756976
申请日:2024-06-27
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Mark Fowler , Anthony Asaro , Vydhyanathan Kalyanasundharam
IPC: G06F3/06
Abstract: A processing system including a parallel processing unit selectively allocating pages of memory for interleaving across configurable subsets of channels based on a mode of allocation. In some embodiments, in a first mode, a page of memory is allocated to and interleaved across a plurality of channels, and in a second mode, a page of memory is allocated to and interleaved across a subset of the plurality of channels.
-
-
-
-
-
-
-
-
-