System performance management using prioritized compute units

    Publication No.: US11204871B2

    Publication date: 2021-12-21

    Application No.: US14755401

    Filing date: 2015-06-30

    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
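
A minimal Python sketch of the designation, dispatch, and request-ordering steps described in this abstract. The names `ComputeUnit`, `designate_priority`, `dispatch`, and `serve_order` are hypothetical, and both the choice of which units receive priority and the least-loaded dispatch rule are assumptions made for illustration, not the patented method.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeUnit:
    cu_id: int
    priority: bool = False
    queue: list = field(default_factory=list)


def designate_priority(units, effective_number):
    """Mark the first `effective_number` units as priority.

    Assumption: the abstract does not say which units are chosen.
    """
    for index, cu in enumerate(units):
        cu.priority = index < effective_number
    return units


def dispatch(units, workgroup):
    """Send a workgroup to the least-loaded priority unit.

    Falls back to all units when no unit is designated (effective number 0).
    """
    candidates = [cu for cu in units if cu.priority] or units
    target = min(candidates, key=lambda cu: len(cu.queue))
    target.queue.append(workgroup)
    return target.cu_id


def serve_order(requests):
    """Order (unit, request) pairs so priority units are served first.

    The sort is stable, so arrival order is kept within each class.
    """
    return sorted(requests, key=lambda pair: not pair[0].priority)
```

Here "preferential" dispatch is simplified to "always pick a priority unit when one exists"; a real dispatcher would presumably also weigh occupancy and resource limits.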

    Method and apparatus for wordline crosstalk mitigation in deeply-scaled DRAM

    Publication No.: US20210390998A1

    Publication date: 2021-12-16

    Application No.: US16902204

    Filing date: 2020-06-15

    Abstract: A method includes: adding a set of one or more victim rows to a first probabilistic filter and to a second probabilistic filter; in response to a memory access request, identifying a candidate victim row adjacent to a memory address specified by the memory access request; identifying the candidate victim row as a victim row in the set of victim rows based on performing a lookup of the candidate victim row in a selected filter, where the selected filter is one of the first probabilistic filter and the second probabilistic filter; in response to identifying the candidate row as the victim row, enabling a row hammering countermeasure; clearing the first probabilistic filter in each of a first set of time periods; and clearing the second probabilistic filter in each of a second set of time periods interleaved with the first set of time periods.
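
The two-filter scheme with interleaved clearing can be sketched in Python as below. This is an illustration under stated assumptions: the probabilistic filters are modeled as Bloom filters, the "selected" filter is taken to be the one that was not cleared most recently, adjacency means row number plus or minus one, and the class and method names (`BloomFilter`, `VictimTracker`, `on_access`) are hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

    def clear(self):
        self.bits = bytearray(self.num_bits)


class VictimTracker:
    def __init__(self):
        self.filters = [BloomFilter(), BloomFilter()]
        self.period = 0

    def add_victim(self, row):
        # Every victim row goes into both filters.
        for bloom in self.filters:
            bloom.add(row)

    def end_period(self):
        # Interleaved clearing: filter 0 after even periods, filter 1 after odd.
        self.filters[self.period % 2].clear()
        self.period += 1

    def on_access(self, row):
        # Assumption: look up in the filter that is due to be cleared next,
        # i.e. the one holding the longer history.
        selected = self.filters[self.period % 2]
        # Rows returned here are the ones needing a countermeasure.
        return [r for r in (row - 1, row + 1) if r in selected]
```

With this arrangement a victim row stays visible for at least one full period and at most two, so clearing one filter never erases all recent history at once.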

    Texture processor based ray tracing acceleration method and system

    Publication No.: US11200724B2

    Publication date: 2021-12-14

    Application No.: US15853207

    Filing date: 2017-12-22

    Abstract: A texture processor based ray tracing accelerator method and system are described. The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
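
As an illustration of the kind of test a ray intersection engine might run against a box-type BVH node, here is the standard slab method for a ray versus an axis-aligned bounding box. The abstract does not specify the algorithm, so treat this as a generic sketch; the function name `ray_hits_box` is hypothetical.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Return True if the ray (t >= 0) intersects the axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t values for which the ray is inside all slabs.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

In the arrangement the abstract describes, a result like this would be returned to the shader, which then decides which child node to visit next.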

    Controlling prediction functional blocks used by a branch predictor in a processor

    Publication No.: US20210382718A1

    Publication date: 2021-12-09

    Application No.: US16895825

    Filing date: 2020-06-08

    Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
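
One way to picture the predictor controller is a record of how often each prediction functional block supplied the selected prediction, used to decide which blocks stay in use. The sketch below is hypothetical throughout: the abstract does not state the control policy, so the share threshold, the class name `PredictorController`, and the block names in the usage note are all assumptions.

```python
from collections import Counter


class PredictorController:
    def __init__(self, block_names, min_share_percent=5):
        self.block_names = list(block_names)
        # Assumed policy: keep a block only if it supplied at least this
        # percentage of the selected predictions.
        self.min_share_percent = min_share_percent
        self.record = Counter()

    def note_selection(self, block_name):
        """Record that the branch predictor selected this block's prediction."""
        self.record[block_name] += 1

    def enabled_blocks(self):
        """Blocks the branch predictor should keep using, per the record."""
        total = sum(self.record.values())
        if total == 0:
            return list(self.block_names)  # no history yet: use every block
        return [
            name
            for name in self.block_names
            if self.record[name] * 100 >= self.min_share_percent * total
        ]
```

For example, with blocks named `"bimodal"`, `"tage"`, and `"loop"`, a block that never supplies a selected prediction would drop out of `enabled_blocks()` once any history exists.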

    Method and system for depth pre-processing and geometry sorting using binning hardware

    Publication No.: US11195326B2

    Publication date: 2021-12-07

    Application No.: US16137830

    Filing date: 2018-09-21

    Abstract: Described herein are techniques for improving the effectiveness of depth culling. In a first technique, a binner is used to sort primitives into depth bins. Each depth bin covers a range of depths. The binner transmits the depth bins to the screen space pipeline for processing in near-to-far order. Processing the near bins first results in the depth buffer being updated, allowing fragments for the primitives in the farther bins to be culled more aggressively than if the depth binning did not occur. In a second technique, a buffer is used to initiate two-pass processing through the screen space pipeline. In the first pass, primitives are sent down to update the depth block and are then culled. The fragments are processed normally in the second pass, with the benefit of the updated depth values.
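
The first technique (sorting primitives into depth bins and emitting the bins near to far) can be sketched as follows. This assumes each primitive is summarized by a single representative depth, that smaller depth means nearer, and that bins split the depth range evenly; `bin_by_depth` and `near_to_far` are hypothetical names.

```python
def bin_by_depth(primitives, num_bins, z_near=0.0, z_far=1.0):
    """Place (name, depth) primitives into equal-width depth bins."""
    bins = [[] for _ in range(num_bins)]
    width = (z_far - z_near) / num_bins
    for name, depth in primitives:
        index = int((depth - z_near) / width)
        index = min(max(index, 0), num_bins - 1)  # clamp out-of-range depths
        bins[index].append(name)
    return bins


def near_to_far(bins):
    """Flatten the bins in near-to-far order, keeping submission order per bin."""
    return [name for depth_bin in bins for name in depth_bin]
```

Submitting the nearest bins first means the depth buffer already holds close values when farther primitives arrive, so more of their fragments can fail the depth test early.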

    Speculative execution using a page-level tracked load order queue

    Publication No.: US11194583B2

    Publication date: 2021-12-07

    Application No.: US16658688

    Filing date: 2019-10-21

    Abstract: Speculative execution using a page-level tracked load order queue includes: determining that a first load instruction targets a determined memory region; and in response to the first load instruction targeting the determined memory region, adding an entry to a page-level tracked load order queue instead of a load order queue, where the entry indicates a page address of a target of the first load instruction.
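
A small sketch of the routing decision described here: loads that target a determined memory region get an entry in the page-level tracked queue (holding only the page address), while other loads go to the ordinary load order queue. The 4 KiB page size, the representation of regions as half-open address ranges, and the names `LoadTracker` and `record_load` are assumptions for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class LoadTracker:
    def __init__(self, tracked_regions):
        # Each region is a half-open (start, end) address range.
        self.tracked_regions = list(tracked_regions)
        self.load_order_queue = []   # full addresses
        self.page_level_queue = []   # page addresses only

    def targets_tracked_region(self, address):
        return any(start <= address < end for start, end in self.tracked_regions)

    def record_load(self, address):
        """Add an entry for a load and report which queue received it."""
        if self.targets_tracked_region(address):
            self.page_level_queue.append(address >> PAGE_SHIFT)
            return "page_level"
        self.load_order_queue.append(address)
        return "load_order"
```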

    Reduced bandwidth tessellation factors

    Publication No.: US20210374898A1

    Publication date: 2021-12-02

    Application No.: US17318523

    Filing date: 2021-05-12

    Abstract: A graphics pipeline reduces the number of tessellation factors written to and read from a graphics memory. A hull shader stage of the graphics pipeline detects whether at least a threshold percentage of the tessellation factors for a thread group of patches are the same and, in some embodiments, whether at least the threshold percentage of the tessellation factors for a thread group of patches have a same value that either indicates that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline. In response to detecting that at least the threshold percentage of the tessellation factors for the thread group are the same (or, additionally, that at least the threshold percentage of the tessellation factors have a value that either indicates that the plurality of patches are to be culled or that the plurality of patches are to be passed to a tessellator stage of the graphics pipeline), the hull shader stage bypasses writing at least a subset of the tessellation factors for the thread group of patches to the graphics memory, thus reducing bandwidth and increasing efficiency of the graphics pipeline.
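
The threshold check at the heart of this abstract can be sketched as below: find the most common tessellation factor in a thread group and decide whether enough of the group shares it to skip the write. The function name `bypass_decision`, the default 90 percent threshold, and the return shape are hypothetical; the abstract leaves the threshold and the exact subset that is bypassed unspecified.

```python
from collections import Counter


def bypass_decision(tess_factors, threshold_percent=90):
    """Return (bypass, shared_value) for one thread group's tessellation factors.

    bypass is True when at least threshold_percent of the factors are equal;
    shared_value is that common factor, or None when no bypass applies.
    """
    if not tess_factors:
        return (False, None)
    value, count = Counter(tess_factors).most_common(1)[0]
    # Integer comparison avoids floating-point rounding at the threshold.
    if count * 100 >= threshold_percent * len(tess_factors):
        return (True, value)
    return (False, None)
```

When the decision is to bypass, only the shared value (plus any outliers) would need to reach later stages, which is where the bandwidth saving comes from.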

    Workgroup synchronization and processing

    Publication No.: US20210373975A1

    Publication date: 2021-12-02

    Application No.: US17029935

    Filing date: 2020-09-23

    Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.
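
The waiting-atomic idea can be modeled in a few lines of Python: a workgroup reads a synchronization variable, the result is compared with the expected value, and on a mismatch the workgroup is parked until the variable is released with that value. This is a single-threaded simulation under assumptions; `WaitScheduler`, `waiting_atomic`, and `release` are hypothetical names, and a real system would perform the context switch in hardware or firmware.

```python
class WaitScheduler:
    def __init__(self):
        self.memory = {}    # synchronization variables by address
        self.waiting = {}   # wg_id -> (address, expected value)

    def waiting_atomic(self, wg_id, address, expected):
        """Compare the variable with `expected`; park the workgroup on mismatch."""
        if self.memory.get(address, 0) != expected:
            self.waiting[wg_id] = (address, expected)
            return False
        self.waiting.pop(wg_id, None)
        return True

    def runnable(self, wg_id):
        """A workgroup in the waiting state must not be scheduled."""
        return wg_id not in self.waiting

    def release(self, address, value):
        """Write the variable and wake workgroups that were waiting for it."""
        self.memory[address] = value
        resumed = [
            wg for wg, (addr, exp) in self.waiting.items()
            if addr == address and exp == value
        ]
        for wg in resumed:
            del self.waiting[wg]
        return resumed
```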
