ACCELERATING PREDICATED INSTRUCTION EXECUTION IN VECTOR PROCESSORS

    公开(公告)号:US20240004656A1

    公开(公告)日:2024-01-04

    申请号:US17853790

    申请日:2022-06-29

    CPC classification number: G06F9/30145 G06F9/3851 G06F9/3887

    Abstract: Methods and systems are disclosed for processing a vector by a vector processor. Techniques disclosed include receiving predicated instructions by a scheduler, each of which is associated with an opcode, a vector of elements, and a predicate. The techniques further include executing the predicated instructions. Executing a predicated instruction includes compressing, based on an index derived from a predicate of the instruction, elements in a vector of the instruction, where the elements in the vector are contiguously mapped, then, after the mapped elements are processed, decompressing the processed mapped elements, where the processed mapped elements are reverse mapped based on the index.

    Memory request priority assignment techniques for parallel processors

    公开(公告)号:US11507522B2

    公开(公告)日:2022-11-22

    申请号:US16706421

    申请日:2019-12-06

    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.

    Adaptive cache reconfiguration via clustering

    公开(公告)号:US11360891B2

    公开(公告)日:2022-06-14

    申请号:US16355168

    申请日:2019-03-15

    Abstract: A method of dynamic cache configuration includes determining, for a first clustering configuration, whether a current cache miss rate exceeds a miss rate threshold. The first clustering configuration includes a plurality of graphics processing unit (GPU) compute units clustered into a first plurality of compute unit clusters. The method further includes clustering, based on the current cache miss rate exceeding the miss rate threshold, the plurality of GPU compute units into a second clustering configuration having a second plurality of compute unit clusters fewer than the first plurality of compute unit clusters.

    MEMORY ACCESS RESPONSE MERGING IN A MEMORY HIERARCHY

    公开(公告)号:US20220091980A1

    公开(公告)日:2022-03-24

    申请号:US17031706

    申请日:2020-09-24

    Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until the lower level shared cache is ready to issue the merged memory access response. An intermediate router in the communication fabric broadcasts the merged memory access response into multiple memory access responses to send to corresponding compute units.

    Mechanism for distributed-system-aware difference encoding/decoding in graph analytics

    公开(公告)号:US11068458B2

    公开(公告)日:2021-07-20

    申请号:US16202082

    申请日:2018-11-27

    Abstract: A portion of a graph dataset is generated for each computing node in a distributed computing system by, for each subject vertex in a graph, recording for the computing node an offset for the subject vertex, where the offset references a first position in an edge array for the computing node, and for each edge of a set of edges coupled with the subject vertex in the graph, calculating an edge value for the edge based on a connected vertex identifier identifying a vertex coupled with the subject vertex via the edge. When the edge value is assigned to the first position, the edge value is determined by a first calculation, and when the edge value is assigned to position subsequent to the first position, the edge value is determined by a second calculation. In the computing node, the edge value is recorded in the edge array.

Patent Agency Ranking