DYNAMICALLY ADAPTABLE ARRAYS FOR VECTOR AND MATRIX OPERATIONS

    公开(公告)号:US20220100813A1

    公开(公告)日:2022-03-31

    申请号:US17032314

    申请日:2020-09-25

    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.

    PROBE INTERRUPT DELIVERY
    422.
    发明申请

    公开(公告)号:US20220100686A1

    公开(公告)日:2022-03-31

    申请号:US17548385

    申请日:2021-12-10

    Abstract: Systems, apparatuses, and methods for routing interrupts on a coherency probe network are disclosed. A computing system includes a plurality of processing nodes, a coherency probe network, and one or more control units. The coherency probe network carries coherency probe messages between coherent agents. Interrupts that are detected by a control unit are converted into messages that are compatible with coherency probe messages and then routed to a target destination via the coherency probe network. Interrupts are generated with a first encoding while coherency probe messages have a second encoding. Cache subsystems determine whether a message received via the coherency probe network is an interrupt message or a coherency probe message based on an encoding embedded in the received message. Interrupt messages are routed to interrupt controller(s) while coherency probe messages are processed in accordance with a coherence probe action field embedded in the message.

    PREFETCH DISABLE OF MEMORY REQUESTS TARGETING DATA LACKING LOCALITY

    公开(公告)号:US20220100664A1

    公开(公告)日:2022-03-31

    申请号:US17132769

    申请日:2020-12-23

    Abstract: A system and method for efficiently processing memory requests are described. A processing unit includes at least a processor core, a cache, and a non-cache storage buffer capable of storing data prevented from being stored in the cache. While processing a memory request targeting the non-cache storage buffer, the processor core inspects a flag stored in a tag of the memory request. The processor core prevents data prefetching into one or more of the non-cache storage buffer and the cache based on determining the flag specifies preventing data prefetching into one or more of the non-cache storage buffer and the cache using the target address of the memory request during processing of this instance of the memory request. While processing a prefetch hint instruction, the processor core determines from the tag whether to prevent prefetching.

    Allocation of Memory Access Bandwidth to Clients an an Electronic Device

    公开(公告)号:US20220100567A1

    公开(公告)日:2022-03-31

    申请号:US17120215

    申请日:2020-12-13

    Inventor: Guhan Krishnan

    Abstract: An electronic device includes a memory; a plurality of clients; at least one arbiter circuit; and a management circuit. A given client of the plurality of clients communicates a request to the management circuit requesting an allocation of memory access bandwidth for accesses of the memory by the given client. The management circuit then determines, based on the request, a set of memory access bandwidths including a respective memory access bandwidth for each of the given client and other clients of the plurality of clients that are allocated memory access bandwidth. The management circuit next configures the at least one arbiter circuit to use respective memory access bandwidths from the set of memory access bandwidths for the given client and the other clients for subsequent accesses of the memory.

    OPERATING VOLTAGE ADJUSTMENT FOR AGING CIRCUITS

    公开(公告)号:US20220100249A1

    公开(公告)日:2022-03-31

    申请号:US17127681

    申请日:2020-12-18

    Abstract: A system and method for updating power supply voltages due to variations from aging are described. A functional unit includes a power supply monitor capable of measuring power supply variations in a region of the functional unit. An age counter measures an age of the functional unit. A control unit notifies the power supply monitor to measure an operating voltage reference. When the control unit receives a measured operating voltage reference, the control unit determines an updated age of the region different from the current age based on the measured operating voltage reference. The control unit updates the age counter with the corresponding age, which is younger than the previous age in some cases due to the region not experiencing predicted stress and aging. The control unit is capable of determining a voltage adjustment for the operating voltage reference based on an age indicated by the age counter.

    MEMORY ACCESS RESPONSE MERGING IN A MEMORY HIERARCHY

    公开(公告)号:US20220091980A1

    公开(公告)日:2022-03-24

    申请号:US17031706

    申请日:2020-09-24

    Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until the lower level shared cache is ready to issue the merged memory access response. An intermediate router in the communication fabric broadcasts the merged memory access response into multiple memory access responses to send to corresponding compute units.

    FINE-GRAINED CONDITIONAL DISPATCHING

    公开(公告)号:US20220091880A1

    公开(公告)日:2022-03-24

    申请号:US17031424

    申请日:2020-09-24

    Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.

Patent Agency Ranking