Content addressable memory with sub-field minimum and maximum clamping

    公开(公告)号:US11537319B2

    公开(公告)日:2022-12-27

    申请号:US16710563

    申请日:2019-12-11

    Abstract: A processing system includes a content addressable memory (CAM) in an input/output path to selectively modify register writes on a per-pipeline basis. The CAM compares an address of a register write to an address field of each entry of the CAM. If a match is found, the CAM modifies the register write data as defined by a function for the matching entry of the CAM. In some embodiments, each entry of the CAM includes a data mask defining subfields of the register write data, wherein each subfield includes subfield data including one or more bits.

    Hybrid library latch array
    353.
    发明授权

    公开(公告)号:US11527270B2

    公开(公告)日:2022-12-13

    申请号:US17359253

    申请日:2021-06-25

    Abstract: A static random access memory (SRAM) includes fast SRAM bit cells and fast multiplexer circuits that are formed in a first row of fast cells in a hybrid standard cell architecture. Slow SRAM bit cells and slow multiplexer circuits are formed in a second row of slow cells. The slow multiplexer circuits provide a column output for the fast SRAM bit cells and the fast multiplexer circuits provide a column output for the slow SRAM bit cells. Thus, one SRAM column has fast bit cells and slow multiplexer stages while the adjacent SRAM column has slow bit cells and fast multiplexer stages to thereby provide an improved performance balance when reading the SRAM.

    Residency map descriptors
    354.
    发明授权

    公开(公告)号:US11521342B2

    公开(公告)日:2022-12-06

    申请号:US17230140

    申请日:2021-04-14

    Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.

    METHOD AND APPARATUS FOR EFFICIENT PROGRAMMABLE INSTRUCTIONS IN COMPUTER SYSTEMS

    公开(公告)号:US20220382550A1

    公开(公告)日:2022-12-01

    申请号:US17886855

    申请日:2022-08-12

    Inventor: Andrew G. Kegel

    Abstract: Systems, apparatuses, and methods for implementing as part of a processor pipeline a reprogrammable execution unit capable of executing specialized instructions are disclosed. A processor includes one or more reprogrammable execution units which can be programmed to execute different types of customized instructions. When the processor loads a program for execution, the processor loads a bitfile associated with the program. The processor programs a reprogrammable execution unit with the bitfile so that the reprogrammable execution unit is capable of executing specialized instructions associated with the program. During execution, a dispatch unit dispatches the specialized instructions to the reprogrammable execution unit for execution. The results of other instructions, such as integer and floating point instructions, are available immediately to instructions executing on the reprogrammable execution unit since the reprogrammable execution unit shares the processor registers with the integer and floating point execution units.

    Compressing micro-operations in scheduler entries in a processor

    公开(公告)号:US11513802B2

    公开(公告)日:2022-11-29

    申请号:US17033883

    申请日:2020-09-27

    Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.

    Throttling hull shaders based on tessellation factors in a graphics pipeline

    公开(公告)号:US11508124B2

    公开(公告)日:2022-11-22

    申请号:US17121965

    申请日:2020-12-15

    Inventor: Nishank Pathak

    Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.

    Temperature-based adjustments for in-memory matrix multiplication

    公开(公告)号:US11507641B2

    公开(公告)日:2022-11-22

    申请号:US16428903

    申请日:2019-05-31

    Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.

    Memory request priority assignment techniques for parallel processors

    公开(公告)号:US11507522B2

    公开(公告)日:2022-11-22

    申请号:US16706421

    申请日:2019-12-06

    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.

    Scalable region-based directory
    360.
    发明授权

    公开(公告)号:US11507517B2

    公开(公告)日:2022-11-22

    申请号:US17033212

    申请日:2020-09-25

    Abstract: Disclosed is a cache directory including one or more cache directories configurable to interchange within each cache directory entry at least one bit between a first field and a second field to change the size of the region of memory represented and the number of cache lines tracked in the cache subsystem.

Patent Agency Ranking