APPARATUS AND METHOD FOR CONSIDERING SPATIAL LOCALITY IN LOADING DATA ELEMENTS FOR EXECUTION
    Invention application (legal status: in force)

    Publication No.: US20160170883A1

    Publication date: 2016-06-16

    Application No.: US14567602

    Application date: 2014-12-11

    Abstract: In one embodiment of the invention, a processor comprises an upper-level cache and at least one processor core. The at least one processor core includes one or more registers and a plurality of instruction processing stages: a decode unit to decode an instruction requiring an input of a plurality of data elements, wherein the size of each of the data elements is less than the cache line size of the processor; and an execution unit to load the plurality of data elements into the one or more registers of the processor without loading either the data elements or data elements spatially adjacent to them into the upper-level cache.
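
    To make the contrast in this abstract concrete, here is a minimal toy model, assuming a 64-byte cache line and 4-byte elements (neither value comes from the abstract). The names ToyCache, load_with_fill, load_no_fill and gather are hypothetical. A conventional load allocates the whole line, adjacent elements included, in the upper-level cache; the locality-aware load returns only the requested element, standing in for a register write, and leaves the cache untouched. The model ignores coherence and any lookup of data already present in the cache.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # assumed cache line size in bytes
ELEM_SIZE = 4   # assumed element size; smaller than a cache line


@dataclass
class ToyCache:
    """Toy upper-level cache: maps a line-aligned address to that line's bytes."""
    memory: bytes
    lines: dict = field(default_factory=dict)

    def _line_base(self, addr: int) -> int:
        # Align an address down to the start of its cache line.
        return addr - addr % LINE_SIZE

    def load_with_fill(self, addr: int) -> int:
        # Conventional load: the whole line, including spatially adjacent
        # elements, is allocated in the cache before the element is read.
        base = self._line_base(addr)
        if base not in self.lines:
            self.lines[base] = self.memory[base:base + LINE_SIZE]
        off = addr - base
        return int.from_bytes(self.lines[base][off:off + ELEM_SIZE], "little")

    def load_no_fill(self, addr: int) -> int:
        # Locality-aware load: only the requested element is read and returned
        # (standing in for a register write); no cache line is allocated.
        return int.from_bytes(self.memory[addr:addr + ELEM_SIZE], "little")


def gather(cache: ToyCache, addrs: list, use_fill: bool) -> list:
    """Load one element per address, with or without filling the cache."""
    load = cache.load_with_fill if use_fill else cache.load_no_fill
    return [load(a) for a in addrs]


if __name__ == "__main__":
    # 256 consecutive 4-byte integers: the element at byte address a holds a // 4.
    mem = b"".join(i.to_bytes(ELEM_SIZE, "little") for i in range(256))
    addrs = [0, 128, 256, 512]  # strided: each element sits in a different line

    normal = ToyCache(mem)
    print(gather(normal, addrs, use_fill=True), len(normal.lines))
    aware = ToyCache(mem)
    print(gather(aware, addrs, use_fill=False), len(aware.lines))
```

    Both gathers return [0, 32, 64, 128], but only the conventional one leaves four lines (256 bytes) in the cache to deliver 16 bytes of useful data, which is the kind of pollution the abstract's approach avoids for strided or scattered accesses.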

    Automatic identification and generation of non-temporal store and load operations in a dynamic optimization environment

    Publication No.: US10379827B2

    Publication date: 2019-08-13

    Application No.: US15393931

    Application date: 2016-12-29

    Inventor: Ruchira Sasanka

    Abstract: Techniques are disclosed to identify a frequently-executed region of code during runtime execution of the code, generate initial profiling code for the frequently-executed region of code, cause the initial profiling code to be executed for a minimum number of processing cycles of the computer, and identify replacement candidate store instruction(s) that store a value that is not read by the frequently-executed region of code during execution of the initial profiling code. Replacement candidate load instruction(s) may also be identified that load a value that is not stored or loaded by the frequently-executed region of code during execution of the initial profiling code. Optimized code for the frequently-executed region of code may be generated by replacing each of the replacement candidate store or load instruction(s) with a non-temporal store or load instruction. The optimized code may be executed instead of the frequently-executed region of code during subsequent runtime execution.
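
    The candidate-selection step described in this abstract can be sketched over a profiling trace, assuming the hot region has already been found and profiled and its memory accesses recorded as (pc, kind, addr) tuples. Access and find_nt_candidates are hypothetical names, and the selection rules are one possible reading of the abstract: a store qualifies if no load in the region touches any address it writes (a conservative test that ignores ordering), and a load qualifies if each address it reads is never stored to and never loaded by any other access in the region.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Access(NamedTuple):
    pc: int    # address of the memory instruction inside the hot region
    kind: str  # "load" or "store"
    addr: int  # memory address touched by this dynamic instance


def find_nt_candidates(trace: Iterable[Access]) -> dict:
    """Map pc -> "nt_store" / "nt_load" for instructions that qualify."""
    trace = list(trace)
    loads_per_addr = defaultdict(int)
    stores_per_addr = defaultdict(int)
    for a in trace:
        counts = loads_per_addr if a.kind == "load" else stores_per_addr
        counts[a.addr] += 1

    store_ok: dict = {}
    load_ok: dict = {}
    for a in trace:
        if a.kind == "store":
            # Conservative: any load of the address, earlier or later,
            # counts as reading the stored value.
            ok = loads_per_addr[a.addr] == 0
            store_ok[a.pc] = store_ok.get(a.pc, True) and ok
        else:
            # The loaded value must not be stored, nor loaded again,
            # anywhere else in the region.
            ok = stores_per_addr[a.addr] == 0 and loads_per_addr[a.addr] == 1
            load_ok[a.pc] = load_ok.get(a.pc, True) and ok

    result = {pc: "nt_store" for pc, ok in store_ok.items() if ok}
    result.update({pc: "nt_load" for pc, ok in load_ok.items() if ok})
    return result


if __name__ == "__main__":
    trace = (
        [Access(0x10, "store", 0x1000 + 64 * i) for i in range(3)]  # output stream
        + [Access(0x20, "load", 0x2000 + 64 * i) for i in range(2)]  # input stream
        + [Access(0x30, "store", 0x3000), Access(0x40, "load", 0x3000)]  # reused
    )
    print(find_nt_candidates(trace))
```

    Only the streaming instructions at 0x10 and 0x20 are selected; the store and load at 0x30 and 0x40 share an address, so both keep their ordinary (temporal) forms. A code generator would then emit non-temporal variants for the selected program counters.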

    OPPORTUNISTIC INCREASE OF WAYS IN MEMORY-SIDE CACHE

    Publication No.: US20190095335A1

    Publication date: 2019-03-28

    Application No.: US16203847

    Application date: 2018-11-29

    Inventor: Ruchira Sasanka

    Abstract: A processor includes a processor core and a cache controller coupled to the processor core. The cache controller is to allocate, for a memory, a plurality of cache entries in a cache, wherein the processor core is to: detect an amount of the memory installed in a computing system and, responsive to detecting less than a maximum allowable amount of the memory for the computing system, direct the cache controller to increase a number of ways of the cache in which to allocate the plurality of cache entries.
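
    The control flow in this abstract (detect installed memory, compare it with the platform maximum, and ask the cache controller for more ways if it falls short) can be sketched as follows. MemorySideCacheController, configure_ways and every way count below are hypothetical; the abstract does not say how many ways are added or how the controller uses them.

```python
from dataclasses import dataclass


@dataclass
class MemorySideCacheController:
    """Toy controller; way counts are illustrative, not taken from the patent."""
    physical_ways: int  # ways that exist in the cache hardware
    active_ways: int    # ways currently used to allocate cache entries

    def increase_ways(self, target: int) -> None:
        # Grow the allocation, never beyond the physically present ways.
        self.active_ways = max(self.active_ways, min(target, self.physical_ways))


def configure_ways(ctrl: MemorySideCacheController, installed_gib: int,
                   max_allowed_gib: int, boosted_ways: int) -> None:
    """Core-side policy: use more ways when less than the maximum memory is installed."""
    if not 0 < installed_gib <= max_allowed_gib:
        raise ValueError("installed memory must be positive and within the maximum")
    if installed_gib < max_allowed_gib:
        ctrl.increase_ways(boosted_ways)


if __name__ == "__main__":
    full = MemorySideCacheController(physical_ways=16, active_ways=8)
    configure_ways(full, installed_gib=512, max_allowed_gib=512, boosted_ways=12)
    partial = MemorySideCacheController(physical_ways=16, active_ways=8)
    configure_ways(partial, installed_gib=256, max_allowed_gib=512, boosted_ways=12)
    print(full.active_ways, partial.active_ways)
```

    With the maximum installed, the controller keeps its default 8 ways; with half the maximum installed, it is directed to allocate across 12. Capping at physical_ways keeps the request within what the hardware actually provides.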