GATHER USING INDEX ARRAY AND FINITE STATE MACHINE
    4.
    发明申请
    GATHER USING INDEX ARRAY AND FINITE STATE MACHINE 有权
    使用索引阵列和有限状态机

    公开(公告)号:US20130326160A1

    公开(公告)日:2013-12-05

    申请号:US13487184

    申请日:2012-06-02

    IPC分类号: G06F12/00

    摘要: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.

    摘要翻译: 公开了使用索引阵列和有限状态机进行散射/收集操作的方法和装置。 设备的实施例可以包括:解码逻辑以解码分散/收集指令并生成一组微操作,以及索引阵列以保存一组索引和相应的一组掩码元素。 有限状态机有助于收集操作。 地址生成逻辑从针对具有第一值的对应掩模元素中的至少每一个的索引集合的索引生成地址。 如果mask元素具有第一个值,则访问地址以加载相应的数据元素。 根据相应的注册位置的索引,将数据元素写入到目的地向量寄存器的寄存器位置。 响应于其相应负载的完成,对应的屏蔽元件的值从第一值改变为第二值。

    Gather using index array and finite state machine
    5.
    发明授权
    Gather using index array and finite state machine 有权
    收集使用索引数组和有限状态机

    公开(公告)号:US08972697B2

    公开(公告)日:2015-03-03

    申请号:US13487184

    申请日:2012-06-02

    IPC分类号: G06F12/02

    摘要: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.

    摘要翻译: 公开了使用索引阵列和有限状态机进行散射/收集操作的方法和装置。 设备的实施例可以包括:解码逻辑以解码分散/收集指令并生成一组微操作,以及索引阵列以保存一组索引和相应的一组掩码元素。 有限状态机有助于收集操作。 地址生成逻辑从针对具有第一值的对应掩模元素中的至少每一个的索引集合的索引生成地址。 如果mask元素具有第一个值,则访问地址以加载相应的数据元素。 根据相应的注册位置的索引,将数据元素写入到目的地向量寄存器的寄存器位置。 响应于其相应负载的完成,对应的屏蔽元件的值从第一值改变为第二值。

    Snoop filter having centralized translation circuitry and shadow tag array
    6.
    发明授权
    Snoop filter having centralized translation circuitry and shadow tag array 有权
    具有集中翻译电路和阴影标签阵列的窥探滤波器

    公开(公告)号:US09268697B2

    公开(公告)日:2016-02-23

    申请号:US13730956

    申请日:2012-12-29

    IPC分类号: G06F12/08 G06F12/10

    摘要: A processor is described that includes a plurality of processing cores. The processor includes an interconnection network coupled to each of said processing cores. The processor includes snoop filter logic circuitry coupled to the interconnection network and associated with coherence plane logic circuitry of the processor. The snoop filter logic circuitry contains circuitry to hold information that identifies not only which of the processing cores are caching specific cache lines that are cached by the processing cores, but also, where in respective caches of the processing cores the cache lines are cached.

    摘要翻译: 描述了包括多个处理核的处理器。 处理器包括耦合到每个所述处理核心的互连网络。 处理器包括连接到互连网络并与处理器的相干平面逻辑电路相关联的窥探滤波器逻辑电路。 监听滤波器逻辑电路包含用于保存信息的电路,该信息不仅识别哪个处理核心缓存由处理核心高速缓存的特定高速缓存线,而且在处理核心的高速缓存中缓存高速缓存行被缓存。

    METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING
    7.
    发明申请
    METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING 有权
    使用商店预购切割高级商店的方法和装置

    公开(公告)号:US20140223105A1

    公开(公告)日:2014-08-07

    申请号:US13993508

    申请日:2011-12-30

    IPC分类号: G06F9/38 G06F12/08

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.

    摘要翻译: 根据本文公开的实施例,提供了使用商店预取来切割高级商店延迟的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括集成电路或乱序处理器装置,其处理不一致的指令并对高速缓存执行按顺序的要求。 这样的集成电路或不按顺序的处理器装置还包括用于接收存储指令的装置; 用于执行所述存储指令的地址生成和转换以计算由所述存储指令访问的存储器的物理地址的装置; 以及用于在存储指令退出之前基于所述存储指令和所计算的物理地址来执行用于高速缓存行的预取的装置。

    Methods and apparatus for efficient communication between caches in hierarchical caching design
    8.
    发明授权
    Methods and apparatus for efficient communication between caches in hierarchical caching design 有权
    用于层次化缓存设计中高速缓存之间高效通信的方法和设备

    公开(公告)号:US09411728B2

    公开(公告)日:2016-08-09

    申请号:US13994399

    申请日:2011-12-23

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers and one or more dataless buffers. The data buffers in such an embodiment being communicably interfaced with the data bus, and each of the one or more data buffers having a buffer memory to buffer a full cache line, one or more control bits to indicate state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment being incapable of storing a full cache line and having one or more control bits to indicate state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus.

    摘要翻译: 根据本文公开的实施例,提供了用于在分级缓存设计中实现高速缓存之间的有效通信的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括具有数据总线的集成电路; 与数据总线可通信地接口的低级缓存; 与数据总线可通信地接口的更高级别的缓存; 一个或多个数据缓冲器和一个或多个无数据缓冲器。 这种实施例中的数据缓冲器与数据总线可通信地接口,并且一个或多个数据缓冲器中的每一个具有缓冲存储器以缓冲全高速缓存线,一个或多个控制位以指示各个数据缓冲器的状态,以及 与完整缓存行相关联的地址。 在这种实施例中的无数据缓冲器不能存储完整的高速缓存行并且具有一个或多个控制位以指示相应无数据缓冲器的状态和与相应无数据缓冲器相关联的高速缓存间传输线的地址。 在这样的实施例中,高速缓存间传输逻辑是经由数据总线从高级缓存请求高速缓存间传输线,并且进一步将数据总线上的缓存间传输线写入低级缓存。

    METHODS AND APPARATUS FOR EFFICIENT COMMUNICATION BETWEEN CACHES IN HIERARCHICAL CACHING DESIGN
    9.
    发明申请
    METHODS AND APPARATUS FOR EFFICIENT COMMUNICATION BETWEEN CACHES IN HIERARCHICAL CACHING DESIGN 有权
    用于分层缓存设计中的高速缓存之间的有效通信的方法和设备

    公开(公告)号:US20130326145A1

    公开(公告)日:2013-12-05

    申请号:US13994399

    申请日:2011-12-23

    IPC分类号: G06F12/08

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers and one or more dataless buffers. The data buffers in such an embodiment being communicably interfaced with the data bus, and each of the one or more data buffers having a buffer memory to buffer a full cache line, one or more control bits to indicate state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment being incapable of storing a full cache line and having one or more control bits to indicate state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus.

    摘要翻译: 根据本文公开的实施例,提供了用于在分级缓存设计中实现高速缓存之间的有效通信的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括具有数据总线的集成电路; 与数据总线可通信地接口的低级缓存; 与数据总线可通信地接口的更高级别的缓存; 一个或多个数据缓冲器和一个或多个无数据缓冲器。 这种实施例中的数据缓冲器与数据总线可通信地接口,并且一个或多个数据缓冲器中的每一个具有缓冲存储器以缓冲全高速缓存线,一个或多个控制位以指示各个数据缓冲器的状态,以及 与完整缓存行相关联的地址。 在这种实施例中的无数据缓冲器不能存储完整的高速缓存行并且具有一个或多个控制位以指示相应无数据缓冲器的状态和与相应无数据缓冲器相关联的高速缓存间传输线的地址。 在这样的实施例中,高速缓存间传输逻辑是经由数据总线从高级缓存请求高速缓存间传输线,并且进一步将数据总线上的缓存间传输线写入低级缓存。

    Method and apparatus for cutting senior store latency using store prefetching
    10.
    发明授权
    Method and apparatus for cutting senior store latency using store prefetching 有权
    使用存储预取来切割高级存储延迟的方法和装置

    公开(公告)号:US09405545B2

    公开(公告)日:2016-08-02

    申请号:US13993508

    申请日:2011-12-30

    IPC分类号: G06F12/08 G06F9/38

    摘要: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.

    摘要翻译: 根据本文公开的实施例,提供了使用商店预取来切割高级商店延迟的方法,系统,机制,技术和装置。 例如,在一个实施例中,这种装置可以包括集成电路或乱序处理器装置,其处理不一致的指令并对高速缓存执行按顺序的要求。 这样的集成电路或不按顺序的处理器装置还包括用于接收存储指令的装置; 用于执行所述存储指令的地址生成和转换以计算由所述存储指令访问的存储器的物理地址的装置; 以及用于在存储指令退出之前基于所述存储指令和所计算的物理地址来执行用于高速缓存行的预取的装置。