Range-based cache flushing
    1.
    Granted Patent

    Publication Number: US12197329B2

    Publication Date: 2025-01-14

    Application Number: US18078298

    Filing Date: 2022-12-09

    Abstract: Systems and methods of cache flushing include receiving, from a software application, a first cache flush request to perform a range-based cache flush of a contiguous virtual address range within a virtual memory that maps to a physical memory. The virtual address range corresponds to a contiguous physical address range, and in response to the first cache flush request, a second cache flush request to the cache triggers a single cache walk. The single cache walk performs the range-based cache flush over the contiguous physical address range, from its beginning address to its ending address.
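
    As an illustration only, the following minimal Python sketch models the mechanism this abstract describes: one walk over the cache that writes back and invalidates every line whose physical address falls within the requested range. The names (Cache, CacheLine, flush_range, write_back) are assumptions for illustration, not taken from the patent.

        # Hypothetical model of a range-based cache flush via a single cache walk.

        class CacheLine:
            def __init__(self, phys_addr, dirty=False):
                self.phys_addr = phys_addr  # physical address tagging the line
                self.dirty = dirty          # dirty lines need write-back

        def write_back(line):
            print(f"write back line at {line.phys_addr:#x}")

        class Cache:
            def __init__(self):
                self.lines = {}  # phys_addr -> CacheLine

            def flush_range(self, begin, end):
                # Single cache walk: write back and invalidate every line
                # whose physical address falls in [begin, end).
                for addr in list(self.lines):
                    if begin <= addr < end:
                        line = self.lines.pop(addr)  # invalidate
                        if line.dirty:
                            write_back(line)

        # The OS would translate the application's contiguous virtual range
        # to the matching contiguous physical range before issuing the flush.
        cache = Cache()
        cache.lines[0x1000] = CacheLine(0x1000, dirty=True)
        cache.lines[0x9000] = CacheLine(0x9000, dirty=True)
        cache.flush_range(0x1000, 0x2000)  # flushes only the line at 0x1000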

    CACHE COHERENCE FOR PROCESSING IN MEMORY
    5.
    Patent Application

    Publication Number: US20170344479A1

    Publication Date: 2017-11-30

    Application Number: US15169118

    Filing Date: 2016-05-31

    Abstract: A cache coherence bridge protocol provides an interface between the cache coherence protocol of a host processor and the cache coherence protocol of a processor-in-memory, thereby decoupling the coherence mechanisms of the two. The bridge protocol requires only limited changes to existing host processor cache coherence protocols. It may be used to facilitate interoperability between host processors and processor-in-memory devices designed by different vendors; both the host processor and the processor-in-memory may implement their own coherence techniques among the computing units within each device. The bridge protocol may also support a different granularity of cache coherence permissions than those used by the cache coherence protocols of the host processor and/or the processor-in-memory. It uses a shadow directory that maintains status information indicating an aggregate view of the copies of data cached in the system external to the processor-in-memory that contains the data.
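
    For intuition, here is a small Python sketch of a shadow directory in the sense described above: a per-block aggregate view of copies of PIM-resident data cached outside the processor-in-memory. The state names and methods (ExtState, on_host_fetch, pim_write_needs_recall) are assumptions for illustration, not the bridge protocol's actual interface.

        from enum import Enum

        class ExtState(Enum):
            NONE = 0    # no copies cached outside the PIM
            SHARED = 1  # clean copies cached externally (e.g., in host caches)
            OWNED = 2   # a potentially dirty copy cached externally

        class ShadowDirectory:
            # Aggregate, per-block status of externally cached copies of
            # data that resides in this processor-in-memory.

            def __init__(self):
                self.entries = {}  # block address -> ExtState

            def state(self, block):
                return self.entries.get(block, ExtState.NONE)

            def on_host_fetch(self, block, exclusive):
                # A host cache took a copy: record the aggregate external state.
                self.entries[block] = ExtState.OWNED if exclusive else ExtState.SHARED

            def on_host_writeback(self, block):
                self.entries.pop(block, None)

            def pim_write_needs_recall(self, block):
                # A PIM-side write must first recall/invalidate external copies.
                return self.state(block) != ExtState.NONE

        sd = ShadowDirectory()
        sd.on_host_fetch(block=0x40, exclusive=True)
        print(sd.pim_write_needs_recall(0x40))  # True: the host holds a copy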

    INFRASTRUCTURE TO SUPPORT ACCELERATOR COMPUTATION MODELS FOR ACTIVE STORAGE
    6.
    Patent Application
    Pending (Published)

    Publication Number: US20160335064A1

    Publication Date: 2016-11-17

    Application Number: US14709915

    Filing Date: 2015-05-12

    CPC classification number: G06F8/447 G06F8/4434

    Abstract: A method, a system, and a non-transitory computer readable medium for generating application code to be executed on an active storage device are presented. The parts of an application that can be executed on the active storage device are identified. The parts that will not be executed on the active storage device are converted into code to be executed on a host device, and the parts that will be executed on the active storage device are converted into code of the instruction set architecture of a processor in the active storage device.
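
    A toy Python sketch of that partitioning flow follows; the offloadable markers and the two compile_for_* stand-ins are hypothetical and only show the shape of the split, not an actual toolchain.

        def compile_for_host(part):
            return f"host-object({part})"

        def compile_for_device_isa(part):
            # Targets the instruction set of the processor in the storage device.
            return f"device-object({part})"

        def generate(application):
            # Split an application into host and active-storage parts and
            # lower each part for its respective target.
            host_code, device_code = [], []
            for part, offloadable in application:
                if offloadable:  # determined to be executable on the device
                    device_code.append(compile_for_device_isa(part))
                else:
                    host_code.append(compile_for_host(part))
            return host_code, device_code

        app = [("parse_query", False), ("scan_filter", True), ("aggregate", True)]
        print(generate(app))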

    RANGE-BASED CACHE FLUSHING
    7.
    Published Application

    Publication Number: US20240193083A1

    Publication Date: 2024-06-13

    Application Number: US18078298

    Filing Date: 2022-12-09

    CPC classification number: G06F12/0804 G06F2212/1024

    Abstract: Systems and methods of cache flushing include receiving, from a software application, a first cache flush request to perform a range-based cache flush of a contiguous virtual address range within a virtual memory that maps to a physical memory. The virtual address range corresponds to a contiguous physical address range, and in response to the first cache flush request, a second cache flush request to the cache triggers a single cache walk. The single cache walk performs the range-based cache flush over the contiguous physical address range, from its beginning address to its ending address.

    Compressing Micro-Operations in Scheduler Entries in a Processor

    Publication Number: US20220100501A1

    Publication Date: 2022-03-31

    Application Number: US17033883

    Filing Date: 2020-09-27

    Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
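
    To make the pairing concrete, here is an illustrative Python model of scheduler-entry compression; the specific compressibility rule shown (same kind, limited combined source operands) is an assumed stand-in for the patent's rules.

        from collections import namedtuple

        MicroOp = namedtuple("MicroOp", "kind dest srcs")

        def compressible(a, b):
            # Assumed rule: same operation kind, and few enough combined
            # operands to fit the two portions of one scheduler entry.
            return a.kind == b.kind and len(a.srcs) + len(b.srcs) <= 3

        def fill_scheduler(uop_queue):
            # Consume the micro-op queue, storing a compressible pair in the
            # two portions of a single entry and a lone micro-op in its own.
            entries = []
            i = 0
            while i < len(uop_queue):
                if i + 1 < len(uop_queue) and compressible(uop_queue[i], uop_queue[i + 1]):
                    entries.append((uop_queue[i], uop_queue[i + 1]))  # one entry
                    i += 2
                else:
                    entries.append((uop_queue[i], None))
                    i += 1
            return entries

        q = [MicroOp("add", "r1", ("r2",)), MicroOp("add", "r3", ("r4",)),
             MicroOp("load", "r5", ("r6", "r7"))]
        print(fill_scheduler(q))  # the two adds share one scheduler entry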

    Coherency directory entry allocation based on eviction costs

    Publication Number: US10705958B2

    Publication Date: 2020-07-07

    Application Number: US16108696

    Filing Date: 2018-08-22

    Abstract: A processor partitions a coherency directory into different regions for different processor cores and manages the number of entries allocated to each region based at least in part on monitored recall costs indicating expected resource costs for reallocating entries. Examples of monitored recall costs include a number of cache evictions associated with entry reallocation, the hit rate of each region of the coherency directory, and the like, or a combination thereof. By managing the entries allocated to each region based on the monitored recall costs, the processor ensures that processor cores associated with denser memory access patterns (that is, memory access patterns that more frequently access cache lines associated with the same memory pages) are assigned more entries of the coherency directory.
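
    The following Python sketch illustrates one reading of that policy: per-core regions of the directory gain or lose entries according to a monitored recall cost. The rebalancing rule shown (shift entries from the cheapest region to the costliest) is an assumed simplification, not the patent's algorithm.

        class DirectoryPartitioner:
            def __init__(self, num_cores, total_entries):
                # Start with an even split of directory entries per core region.
                self.entries = [total_entries // num_cores] * num_cores
                self.recall_cost = [0] * num_cores  # e.g., evictions caused
                                                    # by entry reallocation

            def record_recall(self, core, evictions):
                self.recall_cost[core] += evictions

            def rebalance(self, step=1):
                # Move entries toward the region with the highest recall cost,
                # so cores with denser access patterns get more entries.
                cores = range(len(self.entries))
                hot = max(cores, key=lambda c: self.recall_cost[c])
                cold = min(cores, key=lambda c: self.recall_cost[c])
                if hot != cold and self.entries[cold] > step:
                    self.entries[cold] -= step
                    self.entries[hot] += step

        p = DirectoryPartitioner(num_cores=4, total_entries=1024)
        p.record_recall(core=2, evictions=37)
        p.rebalance()
        print(p.entries)  # core 2's region grew at the expense of core 0's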

    DECOMPOSING MATRICES FOR PROCESSING AT A PROCESSOR-IN-MEMORY

    Publication Number: US20230102296A1

    Publication Date: 2023-03-30

    Application Number: US17490037

    Filing Date: 2021-09-30

    Abstract: A processing unit decomposes a matrix for partial processing at a processor-in-memory (PIM) device. The processing unit receives a matrix to be used as an operand in an arithmetic operation (e.g., a matrix multiplication operation). In response, the processing unit decomposes the matrix into two component matrices: a sparse component matrix and a dense component matrix. The processing unit itself performs the arithmetic operation with the dense component matrix, but sends the sparse component matrix to the PIM device for execution of the arithmetic operation. The processing unit thereby offloads at least some of the processing overhead to the PIM device, improving overall efficiency of the processing system.
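
    As a concrete illustration, a short NumPy sketch of the decomposition follows; the magnitude threshold used to split the matrix and the pim_matvec stand-in are assumptions for illustration only.

        import numpy as np

        def decompose(a, threshold):
            # Split `a` into a sparse component (the few large outliers) and
            # a dense component (everything else); a == sparse + dense.
            sparse = np.where(np.abs(a) > threshold, a, 0.0)
            dense = a - sparse
            return sparse, dense

        def pim_matvec(sparse, x):
            # Stand-in for dispatching the operation to a processor-in-memory.
            return sparse @ x

        def matvec_with_pim(a, x, threshold=1.0):
            sparse, dense = decompose(a, threshold)
            partial_host = dense @ x             # processing unit: dense part
            partial_pim = pim_matvec(sparse, x)  # offloaded: sparse part
            return partial_host + partial_pim

        a = np.array([[0.1, 5.0], [0.2, 0.3]])
        x = np.array([1.0, 2.0])
        print(matvec_with_pim(a, x))  # matches a @ x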
