Memory load to load fusing
    1.
    Granted Patent

    Publication Number: US10372452B2

    Publication Date: 2019-08-06

    Application Number: US15615811

    Application Date: 2017-06-06

    Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.
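    The cascaded case this abstract targets is pointer chasing, where the second load's address is the first load's result. Below is a minimal functional model of that flow; the helper names, the 8-byte little-endian granularity, and the masking stand-in for the corrected-alignment mux are illustrative assumptions, not the patented hardware.

```python
def cascaded_loads(mem, addr1):
    """Functional model of fusing two immediately consecutive dependent
    loads ('load x; load [x]'): the first load's raw cache data, with
    its alignment corrected, is forwarded as the second load's address
    in parallel with the first load's own align/sign-extend stage."""
    def read8(buf, addr):
        # One cache read: an 8-byte little-endian word.
        return int.from_bytes(buf[addr:addr + 8], "little")

    raw1 = read8(mem, addr1)    # first cache read
    addr2 = raw1 & ~0x7         # address-forwarded result: corrected
                                # alignment only, bypassing the full
                                # align/sign-extend path of result #1
    result1 = raw1              # (align/extend proceeds in parallel)
    result2 = read8(mem, addr2) # second cache read uses the forwarded address
    return result1, result2
```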

    Efficient fill-buffer data forwarding supporting high frequencies
    3.
    Granted Patent (Active)

    Publication Number: US09418018B2

    Publication Date: 2016-08-16

    Application Number: US14337211

    Application Date: 2014-07-21

    Abstract: A Fill Buffer (FB) based data forwarding scheme that stores a combination of Virtual Address (VA), TLB (Translation Look-aside Buffer) entry# or an indication of a location of a Page Table Entry (PTE) in the TLB, and a TLB page size information in the FB and uses these values to expedite FB forwarding. Load (Ld) operations send their non-translated VA for an early comparison against the VA entries in the FB, and are then further qualified with the TLB entry# to determine a “hit.” This hit determination is fast and enables FB forwarding at higher frequencies without waiting for a comparison of Physical Addresses (PA) to conclude in the FB. A safety mechanism may detect a false hit in the FB and generate a late load cancel indication to cancel the earlier-started FB forwarding by ignoring the data obtained as a result of the Ld execution. The Ld is then re-executed later and tries to complete successfully with the correct data.
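    The early-hit path described above can be modeled as a comparison on the untranslated VA plus the TLB entry number, with the late physical-address check as the safety net that cancels a false hit. All field names and the cancel predicate below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class FBEntry:
    vaddr: int      # virtual address of the fill
    tlb_entry: int  # TLB entry # that produced the translation
    paddr: int      # physical address (resolved later than the VA)
    data: bytes

def early_fb_hit(entries, load_vaddr, load_tlb_entry):
    """Fast hit check using only the VA and the TLB entry #, so
    forwarding can start before the PA comparison concludes."""
    for e in entries:
        if e.vaddr == load_vaddr and e.tlb_entry == load_tlb_entry:
            return e
    return None

def late_cancel(entry, load_paddr):
    """Safety mechanism: a PA mismatch flags a false hit, so the
    forwarded data is dropped and the load re-executes."""
    return entry.paddr != load_paddr
```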


    High-frequency and low-power L1 cache and associated access technique

    Publication Number: US11048637B2

    Publication Date: 2021-06-29

    Application Number: US16547557

    Application Date: 2019-08-21

    Inventor: Karthik Sundaram

    Abstract: A high-frequency and low-power L1 cache and associated access technique. The method may include inspecting a virtual address of an L1 data cache load instruction, and indexing into a row and a column of a way predictor table using metadata and a virtual address associated with the load instruction. The method may include matching information stored at the row and the column of the way predictor table to a location of a cache line. The method may include predicting the location of the cache line within the L1 data cache based on the information match. A hierarchy of way predictor tables may be used, with higher level way predictor tables refreshing smaller lower level way predictor tables. The way predictor tables may be trained to make better predictions over time. Only selected circuit macros need to be enabled based on the predictions, thereby saving power.
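    A single-level version of the way predictor table can be sketched as follows; the table geometry, the virtual-address bit fields used for indexing, and the training policy are illustrative assumptions rather than the patented design:

```python
class WayPredictor:
    """Predicts which L1 way holds a line so only the matching circuit
    macro needs to be enabled; a miss prediction falls back to None
    (enable all ways) and the table is refreshed after the true hit."""

    def __init__(self, rows=64, cols=4):
        self.rows, self.cols = rows, cols
        self.table = [[None] * cols for _ in range(rows)]

    def _index(self, vaddr):
        # Row from the set-index bits, column from higher VA bits
        # (stand-ins for the metadata described in the abstract).
        row = (vaddr >> 6) % self.rows
        col = (vaddr >> 12) % self.cols
        return row, col

    def predict(self, vaddr):
        """Return the predicted way, or None when untrained."""
        row, col = self._index(vaddr)
        return self.table[row][col]

    def train(self, vaddr, actual_way):
        """Refresh the entry once the true hit way is known, improving
        predictions over time."""
        row, col = self._index(vaddr)
        self.table[row][col] = actual_way
```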

    Memory load and arithmetic load unit (ALU) fusing

    Publication Number: US10275217B2

    Publication Date: 2019-04-30

    Application Number: US15612963

    Application Date: 2017-06-02

    Abstract: According to one general aspect, a load unit may include a load circuit configured to load at least one piece of data from a memory. The load unit may include an alignment circuit configured to align the data to generate aligned data. The load unit may also include a mathematical operation execution circuit configured to generate a resultant of a predetermined mathematical operation with the at least one piece of data as an operand. The load unit is configured to, if an active instruction is associated with the predetermined mathematical operation, bypass the alignment circuit and input the piece of data directly to the mathematical operation execution circuit.
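    The bypass can be viewed functionally: when a fused arithmetic operation is active, the raw loaded data skips the alignment stage and feeds the operation directly. A toy sketch, where the 32-bit mask standing in for the alignment circuit is an assumption:

```python
def fused_load_op(mem, addr, op=None):
    """Load mem[addr]; if a fused arithmetic op is active, bypass the
    alignment stage and apply the op to the raw data directly."""
    raw = mem[addr]
    if op is not None:
        return op(raw)           # fused path: alignment circuit bypassed
    return raw & 0xFFFFFFFF      # plain load: stand-in for the
                                 # alignment circuit
```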

    Address re-ordering mechanism for efficient pre-fetch training in an out-of-order processor
    6.
    Granted Patent (Active)

    Publication Number: US09542323B2

    Publication Date: 2017-01-10

    Application Number: US14498878

    Application Date: 2014-09-26

    Abstract: A computing system includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to filter the program instruction when the program instruction is a hit in a cache-line in a prefetch filter. The computing system further includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to: allocate a tag in a tag module for the program instruction in program order, allocate a virtual address in a virtual address module for the program instruction out of order relative to the program order, and insert a pointer associated with the tag to link the tag to the virtual address.
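    The tag/VA decoupling can be sketched as follows: tags are allocated in program order at dispatch, virtual addresses arrive out of order at execution, and the tag acts as the pointer linking the two so addresses can be replayed in program order for pre-fetch training. All names below are illustrative assumptions:

```python
class AddressReorderer:
    """Links program-order tags to out-of-order virtual addresses so a
    prefetch trainer sees addresses in program order."""

    def __init__(self):
        self.tags = []    # tag module: allocated in program order
        self.vaddrs = {}  # virtual-address module: tag -> VA

    def dispatch(self, tag):
        """Allocate a tag in program order at instruction dispatch."""
        self.tags.append(tag)

    def execute(self, tag, vaddr):
        """VAs arrive out of order; the tag is the pointer that links
        the program-order slot to its virtual address."""
        self.vaddrs[tag] = vaddr

    def program_order_stream(self):
        """Replay resolved VAs in program order for pre-fetch training."""
        return [self.vaddrs[t] for t in self.tags if t in self.vaddrs]
```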


    COMPUTING SYSTEM WITH STRIDE PREFETCH MECHANISM AND METHOD OF OPERATION THEREOF
    7.
    Patent Application (Pending, Published)

    Publication Number: US20160054997A1

    Publication Date: 2016-02-25

    Application Number: US14832547

    Application Date: 2015-08-21

    Abstract: A computing system includes: an instruction dispatch module configured to receive an address stream; a prefetch module, coupled to the instruction dispatch module, configured to: train to concurrently detect a single-stride pattern or a multi-stride pattern from the address stream, speculatively fetch a program data based on the single-stride pattern or the multi-stride pattern, and continue to train for the single-stride pattern with a larger value for a stride count or for the multi-stride pattern.
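    Single-stride training of the kind described can be sketched as a confidence counter over consecutive address deltas; the threshold and names are illustrative assumptions, and the concurrent multi-stride case is omitted for brevity:

```python
class StridePrefetcher:
    """Trains on an address stream and speculatively predicts the next
    address once a consistent stride has repeated enough times."""

    def __init__(self, train_threshold=2):
        self.last_addr = None
        self.stride = None
        self.confidence = 0  # consecutive matches of the current stride
        self.train_threshold = train_threshold

    def observe(self, addr):
        """Feed one demand address; return a speculative prefetch
        address once the single-stride pattern is trained, else None."""
        prefetch = None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if delta == self.stride and delta != 0:
                self.confidence += 1  # continue training: larger count
            else:
                self.stride = delta   # retrain on the new stride
                self.confidence = 0
            if self.confidence >= self.train_threshold:
                prefetch = addr + self.stride
        self.last_addr = addr
        return prefetch
```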


    Memory load to load fusing
    8.
    Granted Patent

    Publication Number: US10956155B2

    Publication Date: 2021-03-23

    Application Number: US16421463

    Application Date: 2019-05-23

    Abstract: A system and a method to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.

    Address re-ordering mechanism for efficient pre-fetch training in an out-of-order processor

    Publication Number: US10031851B2

    Publication Date: 2018-07-24

    Application Number: US15401515

    Application Date: 2017-01-09

    Abstract: A computing system includes: an instruction dispatch module configured to receive a program instruction; and an address reordering module, coupled to the instruction dispatch module, configured to filter the program instruction when the program instruction is a hit in a cache-line in a prefetch filter. The computing system further includes: an instruction dispatch module configured to receive a program instruction; an address reordering module, coupled to the instruction dispatch module, configured to: allocate a tag in a tag module for the program instruction in program order, allocate a virtual address in a virtual address module for the program instruction out of order relative to the program order, and insert a pointer associated with the tag to link the tag to the virtual address.

    Pre-fetch chaining
    10.
    Granted Patent (Active)

    Publication Number: US09569361B2

    Publication Date: 2017-02-14

    Application Number: US14325343

    Application Date: 2014-07-07

    CPC classification number: G06F12/0862 G06F12/10 G06F2212/6022

    Abstract: According to one general aspect, an apparatus may include a cache pre-fetcher, and a pre-fetch scheduler. The cache pre-fetcher may be configured to predict, based at least in part upon a virtual address, data to be retrieved from a memory system. The pre-fetch scheduler may be configured to convert the virtual address of the data to a physical address of the data, and request the data from one of a plurality of levels of the memory system. The memory system may include a plurality of levels, each level of the memory system configured to store data.
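    The split between the pre-fetcher (virtual-address prediction) and the scheduler (translation plus level selection) can be sketched as follows; the translation function and the first-hit level-selection policy are illustrative assumptions:

```python
def schedule_prefetch(predicted_vaddr, translate, levels):
    """Pre-fetch scheduler step: convert the pre-fetcher's predicted
    virtual address to a physical address, then request the data from
    the first memory level that holds it."""
    paddr = translate(predicted_vaddr)
    for name, contents in levels:  # e.g. [("L2", {...}), ("DRAM", {...})]
        if paddr in contents:
            return name, contents[paddr]
    raise KeyError("physical address not present in any memory level")
```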

