Matrix transpose hardware acceleration

    Publication No.: US12125124B1

    Publication Date: 2024-10-22

    Application No.: US18118251

    Filing Date: 2023-03-07

    Inventors: Kun Xu; Ron Diamant

    Abstract: In one example, an apparatus comprises: a buffer memory; and a memory access circuit configured to: fetch, from a first memory, a set of first groups of data elements of a first matrix, each first group of data elements being stored at consecutive memory addresses at the first memory; based on a first configuration, store the set of first groups of data elements at consecutive memory addresses or at non-consecutive memory addresses at the buffer memory; based on a second configuration that defines a memory address offset, fetch a set of second groups of the data elements from the buffer memory, each second group of the data elements being stored at consecutive memory addresses of the buffer memory, each second group being separated by the memory address offset in the buffer memory; and store each fetched second group at consecutive addresses of a destination memory to form a second matrix.
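
    The flow below is a minimal Python sketch of the two-pass address pattern the abstract describes, using plain lists as stand-ins for the source memory, buffer memory, and destination memory; the group size and offset choices are illustrative assumptions, not taken from the patent.

    def transpose_via_buffer(src, rows, cols):
        # Pass 1: fetch groups of consecutive source elements (here, whole rows)
        # and scatter them to non-consecutive buffer addresses, so that elements
        # of one source column end up adjacent in the buffer.
        buf = [0] * (rows * cols)
        for r in range(rows):
            for c in range(cols):
                buf[c * rows + r] = src[r * cols + c]

        # Pass 2: fetch consecutive groups of `rows` elements, each group starting
        # a fixed address offset after the previous one, and store them
        # contiguously in the destination to form the transposed matrix.
        offset = rows
        dst = []
        for start in range(0, rows * cols, offset):
            dst.extend(buf[start:start + offset])
        return dst

    # Example: a 2x3 row-major matrix becomes a 3x2 row-major matrix.
    src = [1, 2, 3,
           4, 5, 6]
    assert transpose_via_buffer(src, rows=2, cols=3) == [1, 4, 2, 5, 3, 6]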

    Method and apparatus for controlling cache line storage in cache memory

    Publication No.: US12124373B2

    Publication Date: 2024-10-22

    Application No.: US18185058

    Filing Date: 2023-03-16

    Inventor: David A. Roberts

    Abstract: A method and apparatus physically partition clean and dirty cache lines into separate memory partitions, such as one or more banks, so that during low power operation a cache memory controller reduces power consumption of the cache memory containing only clean data. The cache memory controller controls refresh operation so that data refresh does not occur, or occurs at a reduced rate, for banks holding only clean data. Partitions that store dirty data can also store clean data; other partitions, however, are designated for storing only clean data so that their refresh rate can be reduced or refresh stopped for periods of time. When multiple DRAM dies or packages are employed, the partitioning can occur at the die or package level rather than at the bank level within a die.
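
    As a rough illustration of the policy (not the patented hardware), the Python sketch below models a controller that steers dirty lines into a dirty-capable bank, keeps clean fills in a clean-only bank, and skips refresh in low-power mode for any bank that holds no dirty lines; all names are made up for the example.

    class PartitionedCache:
        def __init__(self):
            # Two banks acting as partitions: one may hold dirty lines, one may not.
            self.banks = {"clean_only": {}, "dirty_ok": {}}  # addr -> (data, dirty)

        def fill(self, addr, data):
            # Clean fills from backing memory go to the clean-only partition.
            self.banks["clean_only"][addr] = (data, False)

        def write(self, addr, data):
            # Dirty data must live in a partition allowed to hold dirty lines.
            self.banks["clean_only"].pop(addr, None)
            self.banks["dirty_ok"][addr] = (data, True)

        def refresh(self, low_power=False):
            refreshed = []
            for name, lines in self.banks.items():
                holds_dirty = any(dirty for _, dirty in lines.values())
                # In low-power operation, skip refresh for banks with no dirty
                # lines; their contents can be re-fetched from backing memory.
                if low_power and not holds_dirty:
                    continue
                refreshed.append(name)
            return refreshed

    cache = PartitionedCache()
    cache.fill(0x100, "A")                 # clean line
    cache.write(0x200, "B")                # dirty line
    print(cache.refresh(low_power=True))   # only the dirty-capable bank refreshes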

    INFORMATION PROCESSING SYSTEM

    Publication No.: US20240345774A1

    Publication Date: 2024-10-17

    Application No.: US18630146

    Filing Date: 2024-04-09

    IPC Classes: G06F3/06 G06F12/0862

    Abstract: According to one embodiment, an information processing system includes a processor, a first memory device, and a second memory device including a nonvolatile memory. The nonvolatile memory is accessed by a load/store command. Before issuing a load command to load data stored in the nonvolatile memory, the processor is configured to write, to the first memory device, a request instructing that the data be prefetched. The second memory device includes a controller configured to prefetch the data stored in the nonvolatile memory based on the request written to the first memory device.
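
    The Python sketch below walks through the interaction described in the abstract, with plain objects standing in for the hardware: the processor writes a prefetch request into the first memory device, the second device's controller picks it up and stages the data from its nonvolatile memory, and the later load is served from that staging buffer. The request format and buffer are illustrative assumptions.

    nonvolatile_memory = {0x1000: b"payload"}   # slow NVM, load/store addressable
    first_memory = {}                           # fast memory both sides can see

    class SecondDeviceController:
        def __init__(self):
            self.prefetch_buffer = {}

        def poll_requests(self):
            # Read the prefetch request the processor wrote earlier and stage
            # the named data from the nonvolatile memory into a fast buffer.
            req = first_memory.pop("prefetch_request", None)
            if req is not None:
                addr = req["addr"]
                self.prefetch_buffer[addr] = nonvolatile_memory[addr]

        def load(self, addr):
            # A later load command hits the prefetch buffer when possible,
            # hiding the nonvolatile-memory access latency.
            if addr in self.prefetch_buffer:
                return self.prefetch_buffer[addr]
            return nonvolatile_memory[addr]

    device = SecondDeviceController()
    first_memory["prefetch_request"] = {"addr": 0x1000}  # processor writes request
    device.poll_requests()                               # controller prefetches
    assert device.load(0x1000) == b"payload"             # load served from buffer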

    Prefetch of microservices for incoming requests

    Publication No.: US12117936B1

    Publication Date: 2024-10-15

    Application No.: US18190176

    Filing Date: 2023-03-27

    IPC Classes: G06F12/0862

    CPC Classes: G06F12/0862 G06F2212/1024

    Abstract: Prefetch of microservices for incoming requests. For an incoming request, the method determines a Service Level Objective (SLO) latency requirement for the request's type. The method generates a set of possible microservice sequences for the request, including a probability of occurrence for each sequence, and determines a set of prefetch permutations for the set of possible microservice sequences. A latency score is calculated for each prefetch permutation, and any prefetch permutations that do not meet the SLO latency requirement for the request type are eliminated. An optimal prefetch permutation is then selected from the remaining permutations by considering its total cost, based on the cost of running each microservice in the set of sequences.
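
    The selection step lends itself to a small worked example. The Python sketch below enumerates prefetch permutations (subsets of microservices to prefetch), drops those whose expected latency exceeds the SLO, and keeps the cheapest survivor; the sequences, probabilities, latencies, and costs are invented inputs, since the method derives them from the actual request workload.

    from itertools import combinations

    # Possible microservice sequences for the request and their probabilities.
    sequences = {("auth", "cart", "pay"): 0.7, ("auth", "search"): 0.3}
    latency = {"auth": 40, "cart": 30, "pay": 50, "search": 20}   # ms if not prefetched
    cost = {"auth": 1.0, "cart": 0.5, "pay": 2.0, "search": 0.4}  # cost of prefetching
    slo_ms = 100
    services = sorted({s for seq in sequences for s in seq})

    def expected_latency(prefetched):
        # Prefetched microservices are assumed to add no latency; others add theirs.
        return sum(p * sum(latency[s] for s in seq if s not in prefetched)
                   for seq, p in sequences.items())

    best = None
    for r in range(len(services) + 1):
        for perm in combinations(services, r):     # one prefetch permutation
            if expected_latency(perm) > slo_ms:    # eliminate SLO violators
                continue
            total_cost = sum(cost[s] for s in perm)
            if best is None or total_cost < best[1]:
                best = (perm, total_cost)

    print(best)  # cheapest prefetch permutation that still meets the SLO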

    Processor instructions for data compression and decompression

    Publication No.: US12106104B2

    Publication Date: 2024-10-01

    Application No.: US17133328

    Filing Date: 2020-12-23

    Applicant: Intel Corporation

    IPC Classes: G06F9/30 G06F12/0862 H03M7/30

    Abstract: A processor is provided that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and to store the compressed read-only data block in multiple adjacent blocks in the memory. During execution of an application that operates on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer, and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
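
    As a rough read-side illustration (not Intel's implementation), the Python sketch below uses zlib as a stand-in for the compression instructions and a dict as the memory controller's prefetch buffer: the first access decompresses the compressed region and stages every constituent block, and a request for an adjacent block is then served straight from the buffer. Block size and layout are assumptions for the example.

    import zlib

    BLOCK = 64
    blocks = [bytes([i]) * BLOCK for i in range(4)]       # 4 adjacent read-only blocks
    compressed_region = zlib.compress(b"".join(blocks))   # stored back in memory

    class MemoryController:
        def __init__(self):
            self.prefetch_buffer = {}

        def read_block(self, index):
            # Hit: an adjacent block requested after the first access is served
            # directly from the prefetch buffer, with no second decompression.
            if index in self.prefetch_buffer:
                return self.prefetch_buffer[index]
            # Miss: read the compressed region, decompress it in the controller,
            # and stage every constituent uncompressed block in the buffer.
            data = zlib.decompress(compressed_region)
            for i in range(len(data) // BLOCK):
                self.prefetch_buffer[i] = data[i * BLOCK:(i + 1) * BLOCK]
            return self.prefetch_buffer[index]

    mc = MemoryController()
    assert mc.read_block(0) == blocks[0]   # first access decompresses and prefetches
    assert mc.read_block(1) == blocks[1]   # adjacent block comes from the buffer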