Compressing micro-operations in scheduler entries in a processor

    Publication Number: US11513802B2

    Publication Date: 2022-11-29

    Application Number: US17033883

    Filing Date: 2020-09-27

    Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
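    The pairing described in the abstract can be sketched in software terms. The sketch below is hypothetical: the function names, the tuple representation of a two-slot scheduler entry, and the example "same execution unit" compressibility rule are all assumptions for illustration, not details from the patent.

    ```python
    def compress_queue(uops, compressible):
        """Walk a micro-operation queue, packing adjacent compressible
        micro-ops into a single two-slot scheduler entry (modeled as a
        2-tuple); non-pairable micro-ops occupy an entry alone."""
        entries = []
        i = 0
        while i < len(uops):
            if i + 1 < len(uops) and compressible(uops[i], uops[i + 1]):
                # Both micro-ops stored in different portions of one entry.
                entries.append((uops[i], uops[i + 1]))
                i += 2
            else:
                entries.append((uops[i], None))
                i += 1
        return entries

    # Example compressibility rule (an assumption): pair only micro-ops
    # destined for the same execution unit.
    same_unit = lambda a, b: a["unit"] == b["unit"]
    ```

    With this rule, a queue of two ALU micro-ops followed by one AGU micro-op would occupy two scheduler entries instead of three.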

    Throttling hull shaders based on tessellation factors in a graphics pipeline

    Publication Number: US11508124B2

    Publication Date: 2022-11-22

    Application Number: US17121965

    Filing Date: 2020-12-15

    Inventor: Nishank Pathak

    Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.
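    The throttling decision can be sketched as a comparison between an estimated domain-shader launch interval and the hull shader latency. Everything below is a behavioral sketch: the quadratic tessellation-factor scaling, the per-primitive cost parameter, and the function name are assumptions, not the patented circuit.

    ```python
    def should_throttle(tess_factors, hull_latency, per_prim_cost):
        """Estimate the domain shader's primitive launch time interval
        from the tessellation factors, then throttle hull shader thread
        group launches when the domain shader cannot keep up."""
        # Assumption: subdivisions grow roughly quadratically with the
        # tessellation factor (quad domain), so more tessellation means
        # more primitives for the domain shader to consume.
        primitives = sum(tf * tf for tf in tess_factors)
        launch_interval = primitives * per_prim_cost
        # Throttle when draining the primitives takes longer than the
        # hull shader latency, i.e. the buffer would back up.
        return launch_interval > hull_latency
    ```

    In hardware the abstract describes this with two counters, one incremented per thread-group launch and one adjusting for measured domain shader latency; the sketch collapses that bookkeeping into a single comparison.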

    Temperature-based adjustments for in-memory matrix multiplication

    Publication Number: US11507641B2

    Publication Date: 2022-11-22

    Application Number: US16428903

    Filing Date: 2019-05-31

    Abstract: Techniques for performing in-memory matrix multiplication while accounting for temperature variations in the memory are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Techniques are therefore provided to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low-temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high-temperature regions. The matrix multiplication is then performed.
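    The effective-to-cool mapping can be sketched as a sort-and-pair step. This is a minimal sketch under stated assumptions: using row magnitude as the "effectiveness" proxy and a per-region temperature list are both illustrative choices, not details from the patent.

    ```python
    def map_rows_to_regions(rows, region_temps):
        """Rank matrix rows by an assumed effectiveness proxy (sum of
        absolute values) and assign the most effective rows to the
        coolest regions of the in-memory multiplier."""
        # Row indices, most effective first.
        by_effectiveness = sorted(
            range(len(rows)),
            key=lambda r: -sum(abs(v) for v in rows[r]))
        # Region indices, coolest first.
        cool_first = sorted(range(len(region_temps)),
                            key=lambda i: region_temps[i])
        # Pair rank-for-rank: effective rows -> low-temperature regions.
        return {row: cool_first[rank]
                for rank, row in enumerate(by_effectiveness)}
    ```

    A row of zeros contributes nothing to any dot product, so placing it in a hot, less accurate region costs no accuracy; that is the intuition the classification exploits.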

    Memory request priority assignment techniques for parallel processors

    Publication Number: US11507522B2

    Publication Date: 2022-11-22

    Application Number: US16706421

    Filing Date: 2019-12-06

    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
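    The two vector operations in the abstract, indexing with lane-specific information and applying a promotion vector, can be sketched as follows. The modulo indexing and the element-wise-addition form of "applying" the promotion vector are assumptions for illustration.

    ```python
    def assign_priorities(lane_info, priority_vector):
        """Assign a memory request priority per work-item by indexing
        into the priority vector with lane-specific information."""
        n = len(priority_vector)
        return [priority_vector[h % n] for h in lane_info]

    def promote(priority_vector, promotion_vector):
        """On a detected event, derive a second priority vector by
        applying the promotion vector to the first (modeled here as
        element-wise addition, an assumed encoding)."""
        return [p + d for p, d in zip(priority_vector, promotion_vector)]
    ```

    Subsequent wavefronts would then call the same indexing routine against the promoted vector, which is how the scheme raises priorities for straggling lanes and narrows memory divergence.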

    Scalable region-based directory
    Invention Grant

    Publication Number: US11507517B2

    Publication Date: 2022-11-22

    Application Number: US17033212

    Filing Date: 2020-09-25

    Abstract: Disclosed is a cache subsystem including one or more cache directories, each configurable to interchange, within each cache directory entry, at least one bit between a first field and a second field to change the size of the memory region represented and the number of cache lines tracked in the cache subsystem.
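    The bit interchange amounts to repartitioning a fixed-width entry between a region tag and a line-presence vector. The sketch below assumes a plain bit-vector encoding (one presence bit per tracked line); the function name and return shape are illustrative, not from the patent.

    ```python
    def repartition_entry(tag_bits, presence_bits, grow_region):
        """Move one bit between a directory entry's region-tag field and
        its line-presence field. Widening the presence field lets the
        entry represent a larger region and track more cache lines, at
        the cost of one tag bit (and vice versa)."""
        if grow_region:
            tag_bits, presence_bits = tag_bits - 1, presence_bits + 1
        else:
            tag_bits, presence_bits = tag_bits + 1, presence_bits - 1
        # Assumption: a simple bit-vector, one presence bit per line.
        lines_tracked = presence_bits
        return tag_bits, presence_bits, lines_tracked
    ```

    Because the entry width is fixed, the trade is zero-sum: coverage scales up only by giving up tag precision, which is what makes the directory "scalable" without growing its storage.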

    DATA STRUCTURE ENGINE
    Invention Application

    Publication Number: US20220365725A1

    Publication Date: 2022-11-17

    Application Number: US17741403

    Filing Date: 2022-05-10

    Abstract: A method includes receiving from a compute element a command for performing a requested operation on data stored in a memory device, and in response to receiving the command, performing the requested operation by generating a plurality of memory access requests based on the command and issuing the plurality of memory access requests to the memory device.
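    The command-to-requests expansion can be sketched directly. The "sum over an array region" operation, the command dictionary fields, and the request tuples below are all hypothetical stand-ins for whatever data-structure operations the engine actually supports.

    ```python
    def expand_command(cmd):
        """Translate one offloaded data-structure command into the
        sequence of memory access requests that carries it out.
        Example op (an assumption): sum over a contiguous array."""
        if cmd["op"] == "sum":
            # One read request per element of the array region.
            return [("read", cmd["base"] + i * cmd["elem_size"])
                    for i in range(cmd["count"])]
        raise ValueError("unknown op: " + cmd["op"])
    ```

    The point of the arrangement is that the compute element issues one command and the engine, sitting near the memory device, fans it out into the many fine-grained requests on its behalf.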

    Prefetch kernels on data-parallel processors

    Publication Number: US11500778B2

    Publication Date: 2022-11-15

    Application Number: US16813075

    Filing Date: 2020-03-09

    Abstract: Embodiments include methods, systems, and non-transitory computer-readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU) such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of the memory operations in the processing kernel.
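    The relationship between the two kernels can be sketched as deriving the prefetch kernel's operation stream from the processing kernel's: keep the memory operations (or a subset of them) and drop the compute, so the run-ahead kernel carries almost no intermediate state. The trace representation and predicate below are illustrative assumptions.

    ```python
    def make_prefetch_trace(processing_trace, is_memory_op):
        """Derive a prefetch kernel's operation stream from a processing
        kernel's trace by keeping only its memory operations; the compute
        operations (and their intermediate state) are dropped."""
        return [op for op in processing_trace if is_memory_op(op)]
    ```

    Launching this reduced stream ahead of the processing kernel warms the caches for the addresses the real kernel will touch, which is the stated purpose of the prefetch kernel.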

    WRITE MASKED LATCH BIT CELL
    Invention Application

    Publication Number: US20220358996A1

    Publication Date: 2022-11-10

    Application Number: US17359254

    Filing Date: 2021-06-25

    Abstract: A write masked latch bit cell of an SRAM includes a write mask circuit that is responsive to assertion of a first write mask signal to cause a value of a write data node to be a first value and is responsive to assertion of a second write mask signal to cause the value of the write data node to have a second value. A pass gate supplies the data on the write data node to an internal node of the bit cell responsive to write word line signals being asserted. A keeper circuit maintains the value of the internal node independently of the values of the write word line signals while the first write mask signal and the second write mask signal are deasserted.
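    The cell's behavior can be modeled at the signal level. This is a behavioral sketch only, not the circuit: the class name and method are hypothetical, and the first/second values are assumed to be 1 and 0.

    ```python
    class MaskedLatchBit:
        """Behavioral model of the described bit cell: two write-mask
        signals force the write data node to one value or the other, the
        pass gate commits it on the write word line, and a keeper holds
        the stored value whenever both masks are deasserted."""

        def __init__(self):
            self.value = 0  # internal node, held by the keeper circuit

        def update(self, mask_set, mask_clear, word_line):
            if word_line and mask_set:
                self.value = 1      # first mask signal -> first value
            elif word_line and mask_clear:
                self.value = 0      # second mask signal -> second value
            # Otherwise the keeper circuit maintains the stored value,
            # so a write with both masks deasserted leaves it untouched.
    ```

    The masked case is the interesting one: asserting the word line with both mask signals deasserted does not disturb the cell, which is what makes per-bit write masking work.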
