-
Publication No.: US11204871B2
Publication Date: 2021-12-21
Application No.: US14755401
Filing Date: 2015-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
IPC: G06F12/08, G06F12/084
Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
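The abstract does not say how the effective number is chosen, so the minimal Python sketch below takes it as an input and marks the lowest-numbered compute units as priority units (a hypothetical rule). It models the three effects the abstract lists: shared-cache eligibility, preferential workgroup dispatch, and a memory arbiter that serves priority requests first. All class and function names are illustrative assumptions, not AMD's design.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputeUnit:
    cu_id: int
    free_slots: int          # workgroup slots still available on this CU
    priority: bool = False


def designate_priority(cus, effective_number):
    """Mark `effective_number` CUs as priority (hypothetical rule: lowest-numbered first)."""
    if effective_number > 0:
        for cu in cus[:effective_number]:
            cu.priority = True


def may_use_shared_cache(cu):
    """Only priority CUs may access the shared cache."""
    return cu.priority


def dispatch(cus) -> Optional[int]:
    """Place a workgroup on a priority CU with room, else on any CU with room."""
    # sorted() is stable, so priority CUs come first and keep their original order.
    for cu in sorted(cus, key=lambda c: not c.priority):
        if cu.free_slots > 0:
            cu.free_slots -= 1
            return cu.cu_id
    return None  # no free slot anywhere; a real dispatcher would retry later


class MemoryArbiter:
    """Serve requests from priority CUs ahead of non-priority CUs, FIFO within each class."""

    def __init__(self):
        self.high = deque()
        self.low = deque()

    def enqueue(self, cu, request):
        (self.high if cu.priority else self.low).append(request)

    def next_request(self):
        for queue in (self.high, self.low):
            if queue:
                return queue.popleft()
        return None
```

With four single-slot CUs and an effective number of 2, the first two workgroups land on CUs 0 and 1 before any non-priority CU is used.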
-
Publication No.: US20210390998A1
Publication Date: 2021-12-16
Application No.: US16902204
Filing Date: 2020-06-15
Applicant: Advanced Micro Devices, Inc.
Inventor: SeyedMohammad SeyedzadehDelcheh
IPC: G11C11/4078, G06F12/121, G06F12/0862, G11C11/408
Abstract: A method includes adding a set of one or more victim rows to a first probabilistic filter and to a second probabilistic filter, in response to a memory access request, identifying a candidate victim row adjacent to a memory address specified by a memory access request, identifying the candidate victim row as a victim row in the set of victim rows based on performing a lookup of the candidate victim row in a selected filter, where the selected filter includes one of the first probabilistic filter and the second probabilistic filter, in response to identifying the candidate row as the victim row, enabling a row hammering countermeasure, clearing the first probabilistic filter in each of a first set of time periods, and clearing the second probabilistic filter in each of a second set of time periods interleaved with the first set of time periods.
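A minimal Python sketch of the dual-filter scheme, assuming Bloom filters as the probabilistic filters and assuming the lookup uses whichever filter was cleared less recently (the abstract says only that one of the two is selected). Because the filters are cleared in alternating periods, the selected filter always holds at least one full period of history. `BloomFilter`, `VictimRowTracker`, `end_period` and `check_access` are hypothetical names.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # May return false positives, never false negatives.
        return all(self.bits[pos] for pos in self._positions(key))

    def clear(self):
        self.bits = bytearray(self.num_bits)


class VictimRowTracker:
    def __init__(self):
        self.filters = [BloomFilter(), BloomFilter()]
        self.last_cleared = 1  # index of the most recently cleared filter (hypothetical start)

    def add_victims(self, rows):
        # Every victim row goes into both filters.
        for row in rows:
            for f in self.filters:
                f.add(row)

    def end_period(self):
        # Interleaved schedule: successive periods clear alternate filters.
        self.last_cleared ^= 1
        self.filters[self.last_cleared].clear()

    def check_access(self, accessed_row):
        """Return adjacent rows flagged as victims; non-empty means enable the countermeasure."""
        selected = self.filters[self.last_cleared ^ 1]  # the filter with the longer history
        return [r for r in (accessed_row - 1, accessed_row + 1) if selected.might_contain(r)]
```

A victim row stays visible for at least one period after it is added, and disappears once both filters have been cleared without it being re-added.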
-
Publication No.: US11200724B2
Publication Date: 2021-12-14
Application No.: US15853207
Filing Date: 2017-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Skyler Jonathon Saleh, Maxim V. Kazakov, Vineet Goel
Abstract: A texture processor based ray tracing accelerator method and system are described. The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
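The division of labor can be sketched in software, assuming axis-aligned box nodes tested with the standard slab method. Here `intersect_node` plays the role of the ray intersection engine (it returns the children hit, ordered near to far, as the traversal indication) and `traverse` plays the shader, using that ordering to pick the next node. The node layout and `leaf_id` values are hypothetical; the sketch omits triangle nodes and the texture-path plumbing entirely.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Ray:
    origin: Vec3
    direction: Vec3


@dataclass
class BoxNode:
    lo: Vec3
    hi: Vec3
    children: List[int] = field(default_factory=list)  # indices of child nodes
    leaf_id: Optional[int] = None                        # hypothetical primitive id for leaves


def ray_box(ray, lo, hi) -> Optional[float]:
    """Slab test: entry distance t if the ray hits the box, else None."""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(ray.origin, ray.direction, lo, hi):
        if d == 0:
            if o < a or o > b:  # parallel to this slab and outside it
                return None
            continue
        t1, t2 = (a - o) / d, (b - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near if t_near <= t_far else None


def intersect_node(ray, nodes, index) -> List[int]:
    """'Intersection engine': test a node's children, return the hits near-to-far."""
    hits = []
    for child in nodes[index].children:
        t = ray_box(ray, nodes[child].lo, nodes[child].hi)
        if t is not None:
            hits.append((t, child))
    return [child for _, child in sorted(hits)]


def traverse(ray, nodes, root=0) -> List[int]:
    """'Shader' loop: use the returned ordering to decide which node to visit next."""
    if ray_box(ray, nodes[root].lo, nodes[root].hi) is None:
        return []
    stack, leaves = [root], []
    while stack:
        index = stack.pop()
        node = nodes[index]
        if node.leaf_id is not None:
            leaves.append(node.leaf_id)
            continue
        # Push far children first so the nearest child is popped next.
        stack.extend(reversed(intersect_node(ray, nodes, index)))
    return leaves
```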
-
Publication No.: US20210383528A1
Publication Date: 2021-12-09
Application No.: US17030254
Filing Date: 2020-09-23
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Nicholas Malaya, Max Kiehn, Stanislav Ivashkevich
Abstract: A technique for detecting a glitch in an image is provided. The technique includes providing an image to a plurality of individual classifiers to generate a plurality of individual classifier outputs and providing the plurality of individual classifier outputs to an ensemble classifier to generate a glitch classification.
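A toy version of the two-level structure, assuming grayscale images given as lists of pixel rows. The two individual classifiers are hypothetical hand-written heuristics, and a fixed weighted average with a threshold stands in for the ensemble classifier; the abstract does not specify how the ensemble classifier combines the individual outputs.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # grayscale pixel rows, values 0..255


def stuck_pixel_score(img: Image) -> float:
    """Hypothetical individual classifier: fraction of pure black or pure white pixels."""
    pixels = [p for row in img for p in row]
    return sum(p in (0, 255) for p in pixels) / len(pixels)


def row_discontinuity_score(img: Image) -> float:
    """Hypothetical individual classifier: fraction of adjacent rows whose mean jumps by > 64."""
    means = [sum(row) / len(row) for row in img]
    jumps = sum(abs(a - b) > 64 for a, b in zip(means, means[1:]))
    return jumps / max(len(means) - 1, 1)


def ensemble_classify(
    img: Image,
    classifiers: Sequence[Callable[[Image], float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> bool:
    """Stand-in ensemble: weighted average of individual outputs, then a threshold."""
    outputs = [clf(img) for clf in classifiers]
    score = sum(w * o for w, o in zip(weights, outputs)) / sum(weights)
    return score >= threshold
```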
-
Publication No.: US20210382718A1
Publication Date: 2021-12-09
Application No.: US16895825
Filing Date: 2020-06-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, John Kalamatianos
Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
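One possible control policy, sketched under assumptions the abstract does not state: the record is a per-CTI count of which prediction functional block supplied the selected prediction, and once a CTI has enough history, only the blocks that have actually been selected for it are used. The block names and the `min_history` threshold are hypothetical.

```python
from collections import Counter, defaultdict


class PredictorController:
    """Record which prediction block was selected per CTI and restrict future lookups."""

    def __init__(self, block_names, min_history=4):
        self.block_names = list(block_names)
        self.min_history = min_history      # hypothetical: selections needed before restricting
        self.record = defaultdict(Counter)  # CTI address -> Counter of selected block names

    def note_selection(self, cti_address, block_name):
        self.record[cti_address][block_name] += 1

    def blocks_to_use(self, cti_address):
        history = self.record[cti_address]
        if sum(history.values()) < self.min_history:
            return list(self.block_names)  # not enough history: use every block
        # Keep only blocks that have supplied a selected prediction for this CTI.
        return [b for b in self.block_names if history[b] > 0]
```

Blocks that are never selected for a CTI could then be left idle for that CTI, which is the kind of saving such a record makes possible.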
-
Publication No.: US11196657B2
Publication Date: 2021-12-07
Application No.: US15850616
Filing Date: 2017-12-21
Applicant: Advanced Micro Devices, Inc.
IPC: H04L12/751, G06F13/42, G06F13/364, H04L12/933, H04L12/741, G06F13/40, G06F16/901, G06F16/9038, H04L29/06, G06F21/57
Abstract: A system for automatically discovering fabric topology includes at least one or more processing units, one or more memory devices, a security processor, and a communication fabric with an unknown topology coupled to the processing unit(s), memory device(s), and security processor. The security processor queries each component of the fabric to retrieve various attributes associated with the component. The security processor utilizes the retrieved attributes to create a network graph of the topology of the components within the fabric. The security processor generates routing tables from the network graph and programs the routing tables into the fabric components. Then, the fabric components utilize the routing tables to determine how to route incoming packets.
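The query, graph, route sequence can be sketched as follows, assuming `query(node)` stands in for the security processor reading a component's attributes and returns that component's neighbor ids, and assuming shortest-hop (breadth-first) routing. The abstract does not specify the routing algorithm or table format; both functions are hypothetical.

```python
from collections import deque


def discover_topology(query, start):
    """Walk the fabric from `start`, building an adjacency-list network graph."""
    graph, frontier = {}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node in graph:
            continue
        graph[node] = list(query(node))  # hypothetical attribute query: neighbor ids
        frontier.extend(n for n in graph[node] if n not in graph)
    return graph


def routing_tables(graph):
    """For every component, map each reachable destination to the next hop (BFS)."""
    tables = {}
    for src in graph:
        next_hop, visited, queue = {}, {src}, deque()
        for n in graph[src]:
            if n not in visited:
                visited.add(n)
                next_hop[n] = n          # direct neighbor: send straight to it
                queue.append(n)
        while queue:
            node = queue.popleft()
            for n in graph[node]:
                if n not in visited:
                    visited.add(n)
                    next_hop[n] = next_hop[node]  # inherit the first hop on the path
                    queue.append(n)
        tables[src] = next_hop
    return tables
```

In a chain where the security processor's only neighbor is a CPU, every entry in the security processor's table resolves to that CPU as the next hop.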
-
Publication No.: US11195326B2
Publication Date: 2021-12-07
Application No.: US16137830
Filing Date: 2018-09-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Ruijin Wu, Young In Yeo, Sagar S. Bhandare, Vineet Goel, Martin G. Sarov, Christopher J. Brennan
Abstract: Described herein are techniques for improving the effectiveness of depth culling. In a first technique, a binner is used to sort primitives into depth bins. Each depth bin covers a range of depths. The binner transmits the depth bins to the screen space pipeline for processing in near-to-far order. Processing the near bins first results in the depth buffer being updated, allowing fragments for the primitives in the farther bins to be culled more aggressively than if the depth binning did not occur. In a second technique, a buffer is used to initiate two-pass processing through the screen space pipeline. In the first pass, primitives are sent down to update the depth block and are then culled. The fragments are processed normally in the second pass, with the benefit of the updated depth values.
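The first technique can be illustrated with a tiny software model, assuming each primitive is a list of `((x, y), depth)` fragments, bins are keyed by each primitive's nearest depth, and the early depth test is less-than. Submitting a far primitive before a near one shades both; binning reorders them so the far fragments fail the depth test. The function names are hypothetical and the two-pass technique is not modeled.

```python
def depth_bins(primitives, num_bins, z_near=0.0, z_far=1.0):
    """Sort primitives into depth bins by nearest fragment depth (hypothetical key)."""
    bins = [[] for _ in range(num_bins)]
    width = (z_far - z_near) / num_bins
    for prim in primitives:
        nearest = min(depth for _, depth in prim)
        index = min(int((nearest - z_near) / width), num_bins - 1)
        bins[index].append(prim)
    return bins


def shade(primitives_in_order):
    """Early depth test (less-than); return (shaded, culled) fragment counts."""
    depth_buffer, shaded, culled = {}, 0, 0
    for prim in primitives_in_order:
        for pixel, depth in prim:
            if depth < depth_buffer.get(pixel, float("inf")):
                depth_buffer[pixel] = depth
                shaded += 1
            else:
                culled += 1
    return shaded, culled
```

Flattening the bins in near-to-far order (`[p for b in bins for p in b]`) before calling `shade` reproduces the ordering benefit the abstract describes.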
-
Publication No.: US11194583B2
Publication Date: 2021-12-07
Application No.: US16658688
Filing Date: 2019-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Krishnan V. Ramani
Abstract: Speculative execution using a page-level tracked load order queue includes: determining that a first load instruction targets a determined memory region; and in response to the first load instruction targeting the determined memory region, adding an entry to a page-level tracked load order queue instead of a load order queue, where the entry indicates a page address of a target of the first load instruction.
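A minimal model of the routing decision, assuming 4 KiB pages, a hypothetical list of tracked address ranges standing in for the "determined memory region", and a conservative replay check on invalidation (the abstract does not describe how entries are checked). Loads in a tracked region share one page-level entry per page instead of each occupying a load order queue slot.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages
LINE_BYTES = 64   # assumption: 64-byte cache lines


class LoadTracker:
    def __init__(self, tracked_regions):
        self.tracked_regions = tracked_regions  # hypothetical list of (start, end) ranges
        self.load_order_queue = []              # (load_id, full address) per load
        self.page_level_queue = {}              # page address -> load ids on that page

    def _in_region(self, address):
        return any(start <= address < end for start, end in self.tracked_regions)

    def issue_load(self, load_id, address):
        if self._in_region(address):
            # One entry per page: the entry records only the page address.
            self.page_level_queue.setdefault(address >> PAGE_SHIFT, []).append(load_id)
        else:
            self.load_order_queue.append((load_id, address))

    def loads_to_replay(self, invalidated_address):
        """Hypothetical check: loads that may have observed stale data."""
        line = invalidated_address // LINE_BYTES
        hits = [lid for lid, addr in self.load_order_queue if addr // LINE_BYTES == line]
        # Page-level entries match at page granularity, which is coarser but conservative.
        hits += self.page_level_queue.get(invalidated_address >> PAGE_SHIFT, [])
        return sorted(hits)
```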
-
Publication No.: US20210374898A1
Publication Date: 2021-12-02
Application No.: US17318523
Filing Date: 2021-05-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Mangesh P. Nijasure, Tad Litwiller, Todd Martin, Nishank Pathak
Abstract: A graphics pipeline reduces the number of tessellation factors written to and read from a graphics memory. A hull shader stage of the graphics pipeline detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same and, in some embodiments, whether at least the threshold percentage of the tessellation factors for a thread group of patches have a same value that either indicates that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline. In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same (or, additionally, that at least the threshold percentage of the tessellation factors have a value that either indicates that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline), the hull shader stage bypasses writing at least a subset of the tessellation factors for the thread group of patches to the graphics memory, thus reducing bandwidth and increasing efficiency of the graphics pipeline.
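The threshold test can be sketched as follows, assuming 90% as the threshold percentage and assuming that a factor of 0.0 marks the patches for culling (a hypothetical encoding). When enough factors share one value, only that value and the disagreeing factors would need to be written; otherwise every factor is written.

```python
from collections import Counter

CULL_FACTOR = 0.0  # assumption: a factor of 0.0 marks patches for culling


def pack_tess_factors(factors, threshold=0.9):
    """Return (common_value, factors_to_write) for one thread group of patches.

    If at least `threshold` of the factors share one value, only the (index, factor)
    pairs that differ from it must be written; otherwise common_value is None and
    every factor must be written.
    """
    value, count = Counter(factors).most_common(1)[0]
    if count / len(factors) >= threshold:
        outliers = [(i, f) for i, f in enumerate(factors) if f != value]
        return value, outliers
    return None, list(enumerate(factors))


def patch_fate(common_value):
    """Decide whether the shared value culls the patches or sends them to the tessellator."""
    return "cull" if common_value == CULL_FACTOR else "tessellate"
```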
-
Publication No.: US20210373975A1
Publication Date: 2021-12-02
Application No.: US17029935
Filing Date: 2020-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexandru Dutu, Matthew David Sinclair, Bradford Beckmann, David A. Wood
Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.
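A single-threaded simulation of the monitoring idea, in which atomicity is trivially guaranteed and a context switch is modeled as removing the workgroup from the runnable set. The abstract does not specify how waiting WGs are resumed, so this sketch wakes a WG when its synchronization variable is stored with the expected value. `WaitMonitor` and its methods are hypothetical.

```python
class WaitMonitor:
    def __init__(self):
        self.memory = {}   # synchronization variable address -> value
        self.waiting = {}  # workgroup id -> (address, expected value)

    def waiting_atomic(self, wg_id, address, expected):
        """Read `address`; if it differs from `expected`, mark the WG as waiting."""
        value = self.memory.get(address, 0)
        if value != expected:
            self.waiting[wg_id] = (address, expected)  # WG would be context-switched out
            return False
        return True

    def store(self, address, value):
        """Release a synchronization variable; return the WGs that become runnable."""
        self.memory[address] = value
        ready = [wg for wg, (addr, exp) in self.waiting.items()
                 if addr == address and exp == value]
        for wg in ready:
            del self.waiting[wg]
        return sorted(ready)

    def runnable(self, wg_ids):
        return [wg for wg in wg_ids if wg not in self.waiting]
```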
-