APPARATUS, METHODS, AND SYSTEMS FOR INTEGRATED PERFORMANCE MONITORING IN A CONFIGURABLE SPATIAL ACCELERATOR

    Publication No.: US20190303263A1

    Publication date: 2019-10-03

    Application No.: US15941888

    Filing date: 2018-03-30

    IPC classification: G06F11/34 G06F11/30

    Abstract: Systems, methods, and apparatuses relating to integrated performance monitoring in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first performance monitoring circuit coupled to a first proper subset of processing elements by a network to receive at least one monitoring value from each of the first plurality of the processing elements, generate a first aggregated monitoring value based on the at least one monitoring value from each of the first plurality of the processing elements, and send the first aggregated monitoring value to a performance manager circuit on a different network when a first threshold value is exceeded by the first aggregated monitoring value; and the performance manager circuit is to perform an action based on the first aggregated monitoring value.
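
The aggregate-then-report behaviour in this abstract can be modelled in a few lines of Python. This is only a software sketch under stated assumptions: the class names `PerformanceMonitor` and `PerformanceManager` are hypothetical, summation is assumed as the aggregation function, and the manager's "action" is reduced to recording the value. The abstract does not specify any of these details.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PerformanceManager:
    """Hypothetical stand-in for the performance manager circuit."""
    received: List[int] = field(default_factory=list)

    def on_aggregate(self, value: int) -> None:
        # The abstract only says the manager performs "an action";
        # recording the value is a placeholder for that action.
        self.received.append(value)


@dataclass
class PerformanceMonitor:
    """Hypothetical stand-in for one performance monitoring circuit."""
    threshold: int
    manager: PerformanceManager

    def collect(self, monitoring_values: List[int]) -> int:
        # Assumed aggregation: sum one value per processing element.
        aggregated = sum(monitoring_values)
        # Forward to the manager only when the threshold is exceeded.
        if aggregated > self.threshold:
            self.manager.on_aggregate(aggregated)
        return aggregated
```

In this model, a monitor with a threshold of 10 stays silent for the values `[1, 2, 3]` and forwards the aggregate for `[5, 4, 3]`, so traffic toward the manager occurs only for values of interest.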

    Domain state
    2.
    Granted patent
    Status: in force

    Publication No.: US09588889B2

    Publication date: 2017-03-07

    Application No.: US13995991

    Filing date: 2011-12-29

    IPC classification: G06F12/08 G06F13/00

    Abstract: Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.

    Short circuit of probes in a chain
    3.
    Granted patent
    Status: in force

    Publication No.: US09201792B2

    Publication date: 2015-12-01

    Application No.: US13996012

    Filing date: 2011-12-29

    IPC classification: G06F12/08

    CPC classification: G06F12/084 G06F12/082

    Abstract: A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.

    Probe speculative address file
    4.
    Granted patent
    Status: lapsed

    Publication No.: US08438335B2

    Publication date: 2013-05-07

    Application No.: US12892476

    Filing date: 2010-09-28

    IPC classification: G06F12/00

    CPC classification: G06F12/0815 G06F2212/507

    Abstract: An apparatus to resolve cache coherency is presented. In one embodiment, the apparatus includes a microprocessor comprising one or more processing cores. The apparatus also includes a probe speculative address file unit, coupled to a cache memory, comprising a plurality of entries. Each entry includes a timer and a tag associated with a memory line. The apparatus further includes control logic to determine whether to service an incoming probe based at least in part on a timer value.
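
A rough software model of the entry structure (a timer plus a tag per memory line) might look like the following Python sketch. The abstract does not say how the timer value influences the decision, so the policy here is an assumption: a probe is deferred while the matching entry's timer is non-zero and serviced once it reaches zero or when no entry matches. The names `Entry`, `ProbeSpeculativeAddressFile`, `track`, `tick`, and `should_service` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    tag: int    # identifies the memory line
    timer: int  # remaining ticks; these semantics are an assumption


class ProbeSpeculativeAddressFile:
    """Hypothetical model of the probe speculative address file unit."""

    def __init__(self) -> None:
        self.entries: Dict[int, Entry] = {}

    def track(self, tag: int, ticks: int) -> None:
        # Allocate (or refresh) an entry for a memory line.
        self.entries[tag] = Entry(tag, ticks)

    def tick(self) -> None:
        # Advance time by one step; timers saturate at zero.
        for entry in self.entries.values():
            if entry.timer > 0:
                entry.timer -= 1

    def should_service(self, tag: int) -> bool:
        # Assumed policy: defer a probe while the timer is still running.
        entry = self.entries.get(tag)
        return entry is None or entry.timer == 0
```

Under this assumed policy, a line tracked with `track(0x40, 2)` rejects probes until `tick()` has been called twice, while probes to untracked lines are serviced immediately.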

    Cache spill management techniques using cache spill prediction
    5.
    Granted patent
    Status: lapsed

    Publication No.: US08407421B2

    Publication date: 2013-03-26

    Application No.: US12639214

    Filing date: 2009-12-16

    IPC classification: G06F12/00

    CPC classification: G06F12/0806 G06F12/12

    Abstract: An apparatus and method are described herein for intelligently spilling cache lines. Usefulness of cache lines previously spilled from a source cache is learned, such that later evictions of useful cache lines from a source cache are intelligently selected for spill. Furthermore, another learning mechanism, cache spill prediction, may be implemented separately or in conjunction with usefulness prediction. The cache spill prediction is capable of learning the effectiveness of remote caches at holding spilled cache lines for the source cache. As a result, cache lines are capable of being intelligently selected for spill and intelligently distributed among remote caches based on the effectiveness of each remote cache in holding spilled cache lines for the source cache.

    METHOD AND APPARATUS FOR OPTIMIZING THE USAGE OF CACHE MEMORIES
    6.
    Patent application
    Status: in force

    Publication No.: US20120159077A1

    Publication date: 2012-06-21

    Application No.: US12974907

    Filing date: 2010-12-21

    IPC classification: G06F12/08

    Abstract: A method and apparatus to reduce unnecessary write backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be reduced by eliminating write backs of cache memory lines that have information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit.
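
The difference between an ordinary write of zeros and a write-zero command can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the `Cache` class, the 64-byte `LINE_BYTES`, the `zeroed` set that marks cleared lines, and the `bytes_transferred` counter used as a stand-in for data movement (and hence power). The abstract does not describe how the hardware represents a cleared line.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

LINE_BYTES = 64  # assumed cache line size


@dataclass
class Cache:
    """Toy model; counts payload bytes moved into the cache."""
    lines: Dict[int, bytes] = field(default_factory=dict)
    zeroed: Set[int] = field(default_factory=set)  # lines marked all-zero
    bytes_transferred: int = 0

    def write(self, addr: int, data: bytes) -> None:
        # Ordinary write: the payload travels to the cache line.
        self.bytes_transferred += len(data)
        self.lines[addr] = data
        self.zeroed.discard(addr)

    def write_zero(self, addr: int) -> None:
        # Command only: no payload of zeros is moved (assumed mechanism).
        self.lines.pop(addr, None)
        self.zeroed.add(addr)

    def read(self, addr: int) -> bytes:
        if addr in self.zeroed:
            return bytes(LINE_BYTES)  # synthesize zeros on read
        return self.lines[addr]
```

Clearing a line with `write_zero` leaves `bytes_transferred` unchanged, whereas clearing it with `write(addr, bytes(LINE_BYTES))` would add another 64 bytes of traffic.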

    CACHE SPILL MANAGEMENT TECHNIQUES
    7.
    Patent application
    Status: lapsed

    Publication No.: US20110145501A1

    Publication date: 2011-06-16

    Application No.: US12639214

    Filing date: 2009-12-16

    IPC classification: G06F12/08 G06F12/00

    CPC classification: G06F12/0806 G06F12/12

    Abstract: An apparatus and method are described herein for intelligently spilling cache lines. Usefulness of cache lines previously spilled from a source cache is learned, such that later evictions of useful cache lines from a source cache are intelligently selected for spill. Furthermore, another learning mechanism, cache spill prediction, may be implemented separately or in conjunction with usefulness prediction. The cache spill prediction is capable of learning the effectiveness of remote caches at holding spilled cache lines for the source cache. As a result, cache lines are capable of being intelligently selected for spill and intelligently distributed among remote caches based on the effectiveness of each remote cache in holding spilled cache lines for the source cache.

    Systems and methods for executing across at least one memory barrier employing speculative fills
    8.
    Granted patent
    Status: in force

    Publication No.: US07360069B2

    Publication date: 2008-04-15

    Application No.: US10756639

    Filing date: 2004-01-13

    IPC classification: G06F9/00

    Abstract: Multi-processor systems and methods are provided. One embodiment relates to a multi-processor system that may comprise a processor having a processor pipeline that executes program instructions across at least one memory barrier with data from speculative data fills that are provided in response to source requests, and a log that retains executed load instruction entries associated with executed program instructions. The executed load instruction entries may be retired if a cache line associated with data of the speculative data fill has not been invalidated in an epoch that is different from the epoch in which the executed load instruction is executed.

    Mechanism for selectively imposing interference order between page-table fetches and corresponding data fetches
    9.
    Granted patent
    Status: lapsed

    Publication No.: US06286090B1

    Publication date: 2001-09-04

    Application No.: US09084621

    Filing date: 1998-05-26

    IPC classification: G06F12/00

    CPC classification: G06F12/1054 G06F12/0813

    Abstract: A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.

    High performance recoverable communication method and apparatus for write-only networks
    10.
    Granted patent
    Status: lapsed

    Publication No.: US6049889A

    Publication date: 2000-04-11

    Application No.: US6115

    Filing date: 1998-01-13

    IPC classification: H04L29/06 H04L29/14 G06F3/00

    CPC classification: H04L29/06 H04L69/40

    Abstract: A multi-node computer network includes a plurality of nodes coupled together via a data link. Each of the nodes includes a local memory, which further comprises a shared memory. Certain items of data that are to be shared by the nodes are stored in the shared portion of memory. Associated with each of the shared data items is a data structure. When a node sharing data with other nodes in the system seeks to modify the data, it transmits the modifications over the data link to the other nodes in the network. Each update is received in order by each node in the cluster. As part of the last transmission by the modifying node, an acknowledgement request is sent to the receiving nodes in the cluster. Each node that receives the acknowledgement request returns an acknowledgement to the sending node. The returned acknowledgement is written to the data structure associated with the shared data item. If there is an error during the transmission of the message, the receiving node does not transmit an acknowledgement, and the sending node is thereby notified that an error has occurred.
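
The acknowledgement scheme in this abstract can be sketched as a small Python simulation. It is a simplified model under stated assumptions: the names `SharedItem`, `Node`, and `broadcast` are hypothetical, a transmission error is simulated with a `faulty` flag, and the per-item data structure is reduced to a set of node names that have acknowledged. Real hardware would detect a missing acknowledgement by other means, such as a timeout, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SharedItem:
    """Shared data item plus its associated acknowledgement structure."""
    value: int = 0
    acks: Set[str] = field(default_factory=set)


class Node:
    def __init__(self, name: str, faulty: bool = False) -> None:
        self.name = name
        self.faulty = faulty  # simulates an error during transmission
        self.copy: Dict[str, int] = {}

    def receive(self, key: str, value: int, ack_requested: bool) -> bool:
        if self.faulty:
            return False  # on error, no acknowledgement is returned
        self.copy[key] = value
        return ack_requested


def broadcast(item: SharedItem, key: str, updates: List[int],
              receivers: List[Node]) -> Set[str]:
    """Send updates in order; request acks only on the last transmission.

    Returns the names of receivers that did not acknowledge.
    """
    item.acks.clear()
    for i, value in enumerate(updates):
        is_last = i == len(updates) - 1
        for node in receivers:
            if node.receive(key, value, ack_requested=is_last):
                item.acks.add(node.name)  # ack written to the item's structure
    if updates:
        item.value = updates[-1]
    return {node.name for node in receivers} - item.acks
```

A non-empty return value tells the sender which receivers failed to acknowledge, which corresponds to the error notification described at the end of the abstract.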
