-
Publication number: US20180165790A1
Publication date: 2018-06-14
Application number: US15377998
Filing date: 2016-12-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Daniel Schneider , Fataneh Ghodrat
IPC: G06T1/60 , G06F12/0877 , G06F12/0815 , G06F15/80
CPC classification number: G06T1/60 , G06F12/0815 , G06F12/0877 , G06F15/8007 , G06F2212/455 , G06F2212/60 , G06F2212/621 , G06T1/20 , G06T15/005
Abstract: Techniques for allowing cache access returns out of order are disclosed. A return ordering queue exists for each of several cache access types and stores outstanding cache accesses in the order in which those accesses were made. When a cache access request for a particular type is at the head of the return ordering queue for that type and the cache access is available for return to the wavefront that made that access, the cache system returns the cache access to the wavefront. Thus, cache accesses can be returned out of order with respect to cache accesses of different types. Allowing out-of-order returns can help to improve latency, for example in the situation where a relatively low-latency access type (e.g., a read) is issued after a relatively high-latency access type (e.g., a texture sampler operation).
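The per-type return ordering described above can be modeled in a few lines. This is a minimal sketch, not the patented hardware. The `Access` and `ReturnOrderingQueues` names, the `"read"`/`"sample"` type labels, and the `ready` flag are all hypothetical. Each access type gets its own FIFO. An access returns only when it is ready and at the head of its own queue, so a fast read is never held behind a slow sampler operation, while accesses of the same type still return in issue order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Access:
    ident: int
    kind: str            # hypothetical access-type label, e.g. "read" or "sample"
    ready: bool = False  # set once the cache has the data for this access


class ReturnOrderingQueues:
    """One FIFO per access type; returns stay in order only within a type."""

    def __init__(self, kinds):
        self.queues = {k: deque() for k in kinds}

    def issue(self, access):
        # Record the access in issue order for its own type.
        self.queues[access.kind].append(access)

    def drain(self):
        """Return every access that is ready AND at the head of its type's queue."""
        returned = []
        progress = True
        while progress:
            progress = False
            for q in self.queues.values():
                if q and q[0].ready:
                    returned.append(q.popleft().ident)
                    progress = True
        return returned


roq = ReturnOrderingQueues(["sample", "read"])
slow, fast = Access(0, "sample"), Access(1, "read")
roq.issue(slow)
roq.issue(fast)
fast.ready = True    # the later, low-latency read completes first
print(roq.drain())   # [1]: the read returns ahead of the older sampler op
```

Within one type the queue still enforces order. A second read that becomes ready before an older read waits until the older one returns.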
-
Publication number: US20180165314A1
Publication date: 2018-06-14
Application number: US15824771
Filing date: 2017-11-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Steven R. Havlir , Patrick J. Shyvers
IPC: G06F17/30
CPC classification number: G06F16/2264 , G06F9/3844 , G06F16/2246 , G06F16/2255
Abstract: Described herein is a system and method for multiplexer tree (muxtree) indexing. Muxtree indexing performs hashing and row reduction in parallel by use of each select bit only once in a particular path of the muxtree. The muxtree indexing generates a different final index as compared to conventional hashed indexing but still results in a fair hash, where all table entries get used with equal distribution with uniformly random selects.
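The "fair hash" property can be checked by brute force. The sketch below is a hypothetical muxtree, not the one in the patent. At tree level `i`, a 2:1 mux chooses select bit `a[i]` or `b[i]` depending on the parity of the index bits already chosen on that path. As a result, each select bit is used at most once along any root-to-leaf path. A conventional XOR-hashed index is included for comparison. Enumerating all inputs shows that every table entry is reached equally often, even though the two schemes produce different final indices.

```python
from collections import Counter
from itertools import product


def xor_index(a, b):
    """Conventional hashed index: XOR the two bit vectors bitwise."""
    return sum((ai ^ bi) << i for i, (ai, bi) in enumerate(zip(a, b)))


def muxtree_index(a, b):
    """Hypothetical muxtree: at level i the mux picks a[i] or b[i] as the next
    index bit based on the parity of bits chosen so far on this path, so no
    select bit is consumed twice along one path."""
    index, parity = 0, 0
    for i, (ai, bi) in enumerate(zip(a, b)):
        bit = ai if parity == 0 else bi
        index |= bit << i
        parity ^= bit
    return index


n = 3  # 2**n table entries, 2*n select bits
counts = Counter(muxtree_index(a, b)
                 for a in product((0, 1), repeat=n)
                 for b in product((0, 1), repeat=n))
print(sorted(counts.values()))  # [8, 8, 8, 8, 8, 8, 8, 8]: uniform use of all entries
```

Fairness holds here because the bit picked at each level comes from two select bits that no earlier level has consumed. With uniformly random selects, each index bit is therefore uniform given the bits above it.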
-
Publication number: US20180165221A1
Publication date: 2018-06-14
Application number: US15374788
Filing date: 2016-12-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Mark Fowler
IPC: G06F12/128 , G06F12/122
CPC classification number: G06F12/128 , G06F12/0888 , G06F12/122 , G06F12/126 , G06F2212/1024 , G06F2212/455 , G06F2212/621 , G06F2212/69 , G06F2212/70
Abstract: A system and method for efficiently performing data allocation in a cache memory are described. A lookup is performed in a cache responsive to detecting an access request. If the targeted data is found in the cache and the targeted data is of a no allocate data type indicating the targeted data is not expected to be reused, then the targeted data is read from the cache without updating cache replacement policy information for the targeted data responsive to the access. If the lookup results in a miss, the targeted data is prevented from being allocated in the cache.
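The no-allocate behavior maps naturally onto an LRU cache. The following is a minimal sketch that assumes an LRU replacement policy and a per-access `no_allocate` flag. Both are illustrative choices, and the `Cache` class is hypothetical. On a no-allocate hit, the line is returned without touching its recency. On a no-allocate miss, the data comes from backing memory and is never installed.

```python
from collections import OrderedDict


class Cache:
    """LRU cache sketch; `no_allocate` marks data not expected to be reused."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data; iteration order = LRU first

    def access(self, addr, memory, no_allocate=False):
        if addr in self.lines:
            if not no_allocate:
                self.lines.move_to_end(addr)  # normal hit updates replacement info
            return self.lines[addr]
        data = memory[addr]  # miss: fetch from backing memory
        if not no_allocate:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict the LRU line
            self.lines[addr] = data
        return data


mem = {a: a * 10 for a in range(8)}
c = Cache(2)
c.access(0, mem)
c.access(1, mem)                   # cache holds 0, 1 (0 is LRU)
c.access(0, mem, no_allocate=True) # hit, but 0 stays LRU
c.access(2, mem)                   # evicts 0, not 1
c.access(5, mem, no_allocate=True) # miss, not allocated
print(list(c.lines))               # [1, 2]
```

Because the streaming hit on address 0 did not refresh it, the later fill evicts 0 instead of the still-useful line 1.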
-
Publication number: US20180165214A1
Publication date: 2018-06-14
Application number: US15377537
Filing date: 2016-12-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Amin Farmahini Farahani , David A. Roberts
IPC: G06F12/0888 , G06F12/0806
Abstract: A processing system fills a memory access request for data from a processor core by bypassing a cache when a write congestion condition is detected, and when transferring the data to the cache would cause eviction of a dirty cache line. The cache is bypassed by transferring the requested data to the processor core or to a different cache. Accordingly, the processing system can temporarily bypass the cache storing the dirty cache line when filling a memory access request, thereby avoiding the eviction and write back to main memory of a dirty cache line when a write congestion condition exists.
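The bypass decision combines two conditions, and the sketch below shows how they interact. It assumes an LRU victim choice and models "write congestion" as a pending write-back queue that has reached a fixed limit. The class, the `write_queue_limit` parameter, and the congestion test are all hypothetical. A fill is redirected to the core (`"bypass"`) only when the victim is dirty and writes are congested. Otherwise the dirty victim is queued for write-back as usual.

```python
from collections import OrderedDict


class BypassingCache:
    def __init__(self, capacity, write_queue_limit):
        self.capacity = capacity
        self.lines = OrderedDict()   # addr -> (data, dirty), LRU first
        self.write_queue = []        # hypothetical pending write-backs to memory
        self.write_queue_limit = write_queue_limit

    def write_congested(self):
        # Assumed congestion test: the write-back queue is at its limit.
        return len(self.write_queue) >= self.write_queue_limit

    def fill(self, addr, data):
        """Fill a miss. Returns 'cache' if allocated, 'bypass' if sent to the core."""
        if len(self.lines) >= self.capacity:
            victim_addr, (_, victim_dirty) = next(iter(self.lines.items()))
            if victim_dirty and self.write_congested():
                return "bypass"      # keep the dirty line; deliver data directly
            self.lines.popitem(last=False)
            if victim_dirty:
                self.write_queue.append(victim_addr)  # write back the dirty victim
        self.lines[addr] = (data, False)
        return "cache"


c = BypassingCache(capacity=1, write_queue_limit=1)
c.lines[0] = ("x", True)   # one dirty line occupies the cache
c.write_queue.append(99)   # write-back queue already full: congested
print(c.fill(1, "y"))      # bypass (dirty line 0 is kept)
c.write_queue.clear()      # congestion clears
print(c.fill(1, "y"))      # cache (line 0 queued for write-back, line 1 allocated)
```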
-
Publication number: US09990203B2
Publication date: 2018-06-05
Application number: US14981310
Filing date: 2015-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Leonardo de Paula Rosa Piga , Abhinandan Majumdar , Indrani Paul , Wei Huang , Manish Arora , Joseph L. Greathouse
IPC: G06F9/30
CPC classification number: G06F9/30192 , G06F9/30014 , G06F9/30083 , G06F9/30145 , G06F11/00
Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.
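The precision-capture idea can be modeled in software, with one substitution. The sketch below replaces bit-level analysis of an operand with an exact round-trip test through narrower IEEE formats (`struct` format codes `e` for 16-bit and `f` for 32-bit). A simple counter then stands in for the hardware counter circuit. `min_float_width` and `PrecisionCounter` are hypothetical names. The sketch assumes instructions operate on 64-bit floats.

```python
import struct


def min_float_width(x):
    """Smallest IEEE width (16, 32, or 64 bits) that represents x exactly."""
    for fmt, width in (("<e", 16), ("<f", 32)):
        try:
            if struct.unpack(fmt, struct.pack(fmt, x))[0] == x:
                return width
        except OverflowError:
            pass  # magnitude too large for this format
    return 64


class PrecisionCounter:
    """Hypothetical model of a hardware counter: counts operands that would
    fit in a narrower type than the one the instruction actually uses."""

    def __init__(self, instruction_width=64):
        self.instruction_width = instruction_width
        self.reducible = 0
        self.total = 0

    def observe(self, value):
        self.total += 1
        if min_float_width(value) < self.instruction_width:
            self.reducible += 1


ctr = PrecisionCounter()
for v in (0.5, 0.1, 1024.0, 1.0 / 3.0):
    ctr.observe(v)
print(ctr.reducible, ctr.total)  # 2 4
```

A debugger or runtime could read `reducible / total` as a hint that the instruction could run at lower precision.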
-
Publication number: US09983655B2
Publication date: 2018-05-29
Application number: US14963352
Filing date: 2015-12-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Mitesh R. Meswani , David A. Roberts , Dmitri Yudanov , Arkaprava Basu , Sergey Blagodurov
CPC classification number: G06F1/3243 , G06F9/3885
Abstract: A method and apparatus for performing inter-lane power management includes de-energizing one or more execution lanes upon a determination that the one or more execution lanes are to be predicated. Energy from the predicated execution lanes is redistributed to one or more active execution lanes.
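The redistribution step reduces to simple arithmetic. This sketch assumes each lane starts with an equal budget and that freed energy is split evenly across the lanes that remain active. The abstract does not specify the split policy, and `redistribute_lane_power` is a hypothetical name. Predicated-off lanes receive zero, and the total budget is preserved.

```python
def redistribute_lane_power(budget_per_lane, predicate_mask):
    """De-energize lanes whose predicate bit is 0 and hand their share of the
    total power budget evenly to the active lanes (even split is assumed)."""
    active = [i for i, on in enumerate(predicate_mask) if on]
    if not active:
        return [0.0] * len(predicate_mask)  # everything predicated off
    total = budget_per_lane * len(predicate_mask)
    share = total / len(active)
    return [share if on else 0.0 for on in predicate_mask]


print(redistribute_lane_power(1.0, [1, 0, 1, 0]))  # [2.0, 0.0, 2.0, 0.0]
```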
-
Publication number: US09983652B2
Publication date: 2018-05-29
Application number: US14959669
Filing date: 2015-12-04
Applicant: Advanced Micro Devices, Inc.
Inventor: Leonardo Piga , Indrani Paul , Wei Huang
CPC classification number: G06F1/3203 , G06F1/3206 , G06F1/3287 , Y02D10/171
Abstract: Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.
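The cluster agent's decision can be sketched as a single rebalancing pass. This is one way it might work, under stated assumptions. Wait predictions are taken as given. A gated node drops to a fixed `gated_power` level. Freed power is split evenly among busy nodes. `Node` and `rebalance` are hypothetical names. Only idle nodes whose predicted wait exceeds the threshold are gated. A node expecting a short wait stays powered, which avoids the cost of gating and waking it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    power_cap: float             # watts currently assigned to this node
    busy: bool                   # still working on its task
    predicted_wait: float = 0.0  # predicted seconds until it is needed again


def rebalance(nodes, gate_threshold, gated_power=0.0):
    """Gate idle nodes whose predicted wait exceeds the threshold and
    split the power they free evenly among the busy nodes."""
    freed = 0.0
    for n in nodes:
        if not n.busy and n.predicted_wait > gate_threshold:
            freed += n.power_cap - gated_power
            n.power_cap = gated_power  # node drops to a reduced-power state
    active = [n for n in nodes if n.busy]
    if active:
        for n in active:
            n.power_cap += freed / len(active)
    return freed


nodes = [Node("a", 100.0, True),
         Node("b", 100.0, False, predicted_wait=5.0),  # long wait: gated
         Node("c", 100.0, False, predicted_wait=0.1)]  # short wait: stays up
print(rebalance(nodes, gate_threshold=1.0, gated_power=10.0))  # 90.0
print([n.power_cap for n in nodes])                             # [190.0, 10.0, 100.0]
```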
-
Publication number: US20180143907A1
Publication date: 2018-05-24
Application number: US15360205
Filing date: 2016-11-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Daniel Clifton , Michael J. Mantor , Hans Burton
IPC: G06F12/0846 , G06F9/38
CPC classification number: G06F12/0848 , G06F9/3887 , G06F9/5077 , G06F9/526 , G06F2212/282
Abstract: A system and method for efficiently processing access requests for a shared resource are described. Each of many requestors is assigned to a partition of a shared resource. When a controller determines no requestor generates an access request for an unassigned partition, the controller permits simultaneous access to the assigned partitions for active requestors. When the controller determines at least one active requestor generates an access request for an unassigned partition, the controller allows a single active requestor to gain exclusive access to the entire shared resource while stalling access for the other active requestors. The controller alternates exclusive access among the active requestors. In various embodiments, the shared resource is a local data store in a graphics processing unit and each of the multiple requestors is a single instruction multiple data (SIMD) compute unit.
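The two arbitration modes can be captured in a single-cycle function. This is a minimal sketch in which requestors are small integers and each request names one partition. It assumes round-robin as the alternation policy; the abstract does not fix the policy. `arbitrate` and its return tuple are hypothetical. If every active request targets the requestor's own partition, all are granted in parallel. Otherwise exactly one requestor is granted the whole resource, and the others stall.

```python
def arbitrate(requests, assignment, last_exclusive=-1):
    """One arbitration cycle.

    requests:   requestor -> partition it wants this cycle (active requestors only)
    assignment: requestor -> its assigned partition
    Returns (granted requestors, mode, requestor that last held exclusive access).
    """
    if not requests:
        return [], "idle", last_exclusive
    if all(part == assignment[r] for r, part in requests.items()):
        return sorted(requests), "parallel", last_exclusive
    # Some request targets an unassigned partition: grant the whole resource
    # to one requestor, rotating round-robin after the previous holder.
    order = sorted(requests)
    nxt = next((r for r in order if r > last_exclusive), order[0])
    return [nxt], "exclusive", nxt


assignment = {0: 0, 1: 1, 2: 2}
print(arbitrate({0: 0, 1: 1}, assignment))     # ([0, 1], 'parallel', -1)
print(arbitrate({0: 1, 1: 1}, assignment))     # ([0], 'exclusive', 0)
print(arbitrate({0: 1, 1: 1}, assignment, 0))  # ([1], 'exclusive', 1)
```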
-
Publication number: US20180143781A1
Publication date: 2018-05-24
Application number: US15360518
Filing date: 2016-11-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Joseph L. Greathouse , Christopher D. Erb , Michael G. Collins
CPC classification number: G06F3/0647 , G06F3/0619 , G06F3/0656 , G06F3/0685 , G06F9/4443 , G06F9/451 , G06T1/20 , G06T1/60
Abstract: A processing apparatus is provided that includes a plurality of memory regions each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions, create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer, assign one or more values to the one or more additional memory regions and detect a change to the one or more values at the one or more additional memory regions.
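The padded-buffer technique resembles the familiar "canary" approach to overflow detection. In the sketch below, memory is modeled as a Python list. The allocation is extended by `guard` extra regions on each side, and those regions are filled with an assumed sentinel value. Any later change to them is reported as an out-of-bounds write. `GuardedBuffer`, the sentinel `0xDEADBEEF`, and the guard size are hypothetical choices, not details from the patent.

```python
class GuardedBuffer:
    """Allocate a buffer plus guard regions on both sides, fill the guards
    with a known value, and later check whether any guard value changed."""

    CANARY = 0xDEADBEEF  # assumed sentinel value

    def __init__(self, size, guard=2):
        self.size, self.guard = size, guard
        self.mem = [0] * (guard + size + guard)
        for i in self._guard_indices():
            self.mem[i] = self.CANARY

    def _guard_indices(self):
        tail_start = self.guard + self.size
        return list(range(self.guard)) + list(range(tail_start, len(self.mem)))

    def write(self, offset, value):
        # Unchecked write relative to the user region; may land in a guard.
        self.mem[self.guard + offset] = value

    def overflow_detected(self):
        return any(self.mem[i] != self.CANARY for i in self._guard_indices())


b = GuardedBuffer(4)
b.write(3, 7)
print(b.overflow_detected())  # False: last valid element
b.write(4, 7)
print(b.overflow_detected())  # True: one element past the end
```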
-
Publication number: US09977609B2
Publication date: 2018-05-22
Application number: US15063186
Filing date: 2016-03-07
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan S. Jayasena , Dong Ping Zhang , Paula Aguilera Diez
CPC classification number: G06F3/0613 , G06F3/0658 , G06F3/0673 , G06F12/084 , G06F12/0888 , G06F12/10 , G06F2212/1024
Abstract: Systems, apparatuses, and methods for implementing efficient queues and other data structures. A queue may be shared among multiple processors and/or threads without using explicit software atomic instructions to coordinate access to the queue. System software may allocate an atomic queue and corresponding queue metadata in system memory and return, to the requesting thread, a handle referencing the queue metadata. Any number of threads may utilize the handle for accessing the atomic queue. The logic for ensuring the atomicity of accesses to the atomic queue may reside in a management unit in the memory controller coupled to the memory where the atomic queue is allocated.
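The handle-based interface can be illustrated with a software stand-in. This is only a sketch of the programming model. `QueueManagementUnit` is a hypothetical object that plays the role of the memory-controller management unit, and its internal `threading.Lock` merely models the serialization that the hardware would provide. Callers hold only an integer handle and call `push`/`pop` without any atomic instructions of their own.

```python
import threading
from collections import deque


class QueueManagementUnit:
    """Stand-in for a memory-side unit that owns atomic queues. Callers hold
    an integer handle; every operation is serialized inside the unit."""

    def __init__(self):
        self._queues = {}              # handle -> queue metadata and storage
        self._lock = threading.Lock()  # models the unit's internal serialization
        self._next_handle = 0

    def allocate(self, capacity):
        with self._lock:
            handle = self._next_handle
            self._next_handle += 1
            self._queues[handle] = {"capacity": capacity, "items": deque()}
            return handle

    def push(self, handle, item):
        with self._lock:
            q = self._queues[handle]
            if len(q["items"]) >= q["capacity"]:
                return False           # queue full
            q["items"].append(item)
            return True

    def pop(self, handle):
        with self._lock:
            items = self._queues[handle]["items"]
            return items.popleft() if items else None

    def size(self, handle):
        with self._lock:
            return len(self._queues[handle]["items"])


mu = QueueManagementUnit()
h = mu.allocate(capacity=1000)


def producer(tid):
    for i in range(100):
        mu.push(h, (tid, i))  # caller uses no atomics; the unit serializes


threads = [threading.Thread(target=producer, args=(t,)) for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(mu.size(h))  # 400
```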
-