21. Reducing number of rejected snoop requests by extending time to respond to snoop request
    Granted Patent (Expired)

    Publication No.: US07340568B2

    Publication Date: 2008-03-04

    Application No.: US11056740

    Filing Date: 2005-02-11

    IPC Class: G06F12/00

    CPC Class: G06F12/0831

    Abstract: A cache, system, and method for reducing the number of rejected snoop requests. A "stall/reorder unit" in a cache receives a snoop request from an interconnect. Information about the snoop request, such as its address, is stored in a queue of the stall/reorder unit. The stall/reorder unit forwards the snoop request to a selector, which also receives a request from a processor. An arbitration mechanism selects either the snoop request or the request from the processor. If the snoop request is denied by the arbitration mechanism, information about the snoop request, e.g., its address, may be maintained in the stall/reorder unit, and the request may later be resent to the selector. This process may be repeated for up to "n" clock cycles. By giving the snoop request additional opportunities (n clock cycles) to be accepted by the arbitration mechanism, fewer snoop requests are ultimately denied.

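The retry loop described in the abstract can be sketched as a toy Python model. The arbitration policy, the request representation, and all names here are illustrative assumptions, not details taken from the patent.

```python
from collections import deque

def arbitrate(snoop_req, cpu_req):
    # Toy policy (assumed): a competing processor request always wins.
    return cpu_req if cpu_req is not None else snoop_req

def process_snoop(snoop_addr, cpu_traffic, n_cycles):
    """Retry a snoop request for up to n_cycles before rejecting it.

    cpu_traffic[i] is the processor request competing in cycle i (None
    if the processor is idle that cycle). Returns the cycle in which the
    snoop won arbitration, or -1 if it was ultimately rejected.
    """
    queue = deque([snoop_addr])        # stall/reorder unit holds the address
    for cycle in range(n_cycles):
        cpu_req = cpu_traffic[cycle] if cycle < len(cpu_traffic) else None
        if arbitrate(queue[0], cpu_req) == queue[0]:
            queue.popleft()
            return cycle               # snoop accepted this cycle
    return -1                          # denied after n attempts
```

With a burst of processor traffic the snoop wins as soon as a free cycle appears; only if the contention outlasts the n-cycle window is it rejected.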

22. Facilitating data coherency using in-memory tag bits and tag test instructions
    Granted Patent (Expired)

    Publication No.: US08656121B2

    Publication Date: 2014-02-18

    Application No.: US13109254

    Filing Date: 2011-05-17

    IPC Class: G06F12/00

    Abstract: Fine-grained detection of modification of original data is provided by associating separate guard bits with granules of memory storing original data from which translated data has been obtained. The guard bits indicate whether the original data stored in the associated granule is protected for data coherency, and are set and cleared by special-purpose instructions. Responsive to an attempt to access translated data obtained from the original data, the guard bit(s) associated with the original data are checked to determine whether they fail to indicate coherency of the original data; if so, discarding of the translated data is initiated to facilitate maintaining data coherency between the original data and the translated data.

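The guard-bit check can be illustrated with a minimal Python sketch. The granule size, the `tag_set`/`tag_clear` names (standing in for the patent's special-purpose instructions), and the dictionary-based state are all assumptions for illustration.

```python
GRANULE = 64  # bytes per guarded granule (assumed size)

class GuardedMemory:
    def __init__(self):
        self.guard = {}        # granule index -> protected for coherency?
        self.translated = {}   # granule index -> cached translated data

    def tag_set(self, addr):
        # Analogue of the special-purpose instruction that sets a guard bit.
        self.guard[addr // GRANULE] = True

    def tag_clear(self, addr):
        # Original data was modified: clear protection for its granule.
        self.guard[addr // GRANULE] = False

    def access_translated(self, addr):
        """Return the translated data only if its guard bit still
        indicates coherency; otherwise discard the stale translation."""
        g = addr // GRANULE
        if not self.guard.get(g, False):
            self.translated.pop(g, None)   # initiate discard
            return None
        return self.translated.get(g)
```

Clearing the guard bit after a store to the original data causes the next access to the translation to discard it rather than use stale results.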

23. Facilitating data coherency using in-memory tag bits and tag test instructions
    Granted Patent (Expired)

    Publication No.: US08645644B2

    Publication Date: 2014-02-04

    Application No.: US13451682

    Filing Date: 2012-04-20

    IPC Class: G06F12/00

    Abstract: A method is provided for fine-grained detection of modification of original data by associating separate guard bits with granules of memory storing original data from which translated data has been obtained. The guard bits indicate whether the original data stored in the associated granule is protected for data coherency, and are set and cleared by special-purpose instructions. Responsive to an attempt to access translated data obtained from the original data, the guard bit(s) associated with the original data are checked to determine whether they fail to indicate coherency of the original data; if so, discarding of the translated data is initiated to facilitate maintaining data coherency between the original data and the translated data.


24. Victim cache lateral castout targeting
    Granted Patent (Expired)

    Publication No.: US08489819B2

    Publication Date: 2013-07-16

    Application No.: US12340511

    Filing Date: 2008-12-19

    IPC Class: G06F12/00 G06F13/00 G06F13/28

    Abstract: A data processing system includes a plurality of processing units coupled by an interconnect fabric. In response to a data request, a victim cache line is selected for castout from a first lower level cache of a first processing unit, and a target lower level cache of one of the plurality of processing units is selected based upon the architectural proximity of the target lower level cache to the home system memory to which the address of the victim cache line is assigned. The first processing unit issues on the interconnect fabric a lateral castout (LCO) command that identifies the victim cache line to be castout from the first lower level cache and indicates that the target lower level cache is the intended destination. In response to a coherence response indicating success of the LCO command, the victim cache line is removed from the first lower level cache and held in the second lower level cache.

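Target selection by proximity to the victim line's home memory can be sketched as follows. The address-interleaved home assignment and the absolute-difference "hop distance" are stand-ins for a real fabric topology, assumed purely for illustration.

```python
def home_node(addr, n_nodes):
    # Assumed address-interleaved assignment of lines to home memories.
    return addr % n_nodes

def select_lco_target(victim_addr, caches, n_nodes):
    """Pick the peer lower level cache architecturally closest to the
    home system memory of the victim line.

    caches: list of (cache_id, node_id) pairs; distance is modeled as
    |node_id - home_node| in place of real fabric hop counts.
    """
    home = home_node(victim_addr, n_nodes)
    return min(caches, key=lambda c: abs(c[1] - home))[0]
```

A victim line homed on node 1, for example, would be laterally cast out to the peer cache residing on (or nearest to) node 1.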

25. Cache-based speculation of stores following synchronizing operations
    Granted Patent (Expired)

    Publication No.: US08412888B2

    Publication Date: 2013-04-02

    Application No.: US12985590

    Filing Date: 2011-01-06

    IPC Class: G06F12/00

    CPC Class: G06F12/0837 G06F12/0895

    Abstract: A store request is enqueued in a store queue of a cache memory of the data processing system. The store request identifies a target memory block by a target address and specifies store data. While the store request and a barrier request older than the store request are enqueued in the store queue, a read-claim machine of the cache memory is dispatched to acquire coherence ownership of the target memory block of the store request. After coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, a cache array of the cache memory is updated with the store data.

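The key ordering property, that coherence ownership may be acquired speculatively behind a barrier while the cache array update waits for the barrier to retire, can be sketched in a toy Python model. The queue entries and single-entry-per-cycle retirement are simplifying assumptions.

```python
class StoreQueue:
    def __init__(self):
        self.q = []              # FIFO of ("ST", addr, data) or ("BARRIER",)
        self.owned = set()       # lines whose coherence ownership is held
        self.cache = {}          # the cache array

    def enqueue(self, entry):
        self.q.append(entry)

    def cycle(self):
        # Speculatively dispatch read-claim machines: ownership of a
        # store's target block may be acquired even behind a barrier.
        for e in self.q:
            if e[0] == "ST":
                self.owned.add(e[1])
        # Retire one entry from the head; a store updates the cache
        # array only at the head, i.e., after any older barrier retired.
        if self.q:
            head = self.q.pop(0)
            if head[0] == "ST":
                self.cache[head[1]] = head[2]
```

After one cycle the store behind the barrier already owns its line but has not touched the array; the array update lands only once the barrier is gone.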

26. Aggregate data processing system having multiple overlapping synthetic computers
    Granted Patent (Active)

    Publication No.: US08370595B2

    Publication Date: 2013-02-05

    Application No.: US12643800

    Filing Date: 2009-12-21

    IPC Class: G06F12/00

    Abstract: A first SMP computer has first and second processing units and a first system memory pool, a second SMP computer has third and fourth processing units and a second system memory pool, and a third SMP computer has at least fifth and sixth processing units and third, fourth and fifth system memory pools. The fourth system memory pool is inaccessible to the third, fourth and sixth processing units and accessible to at least the second and fifth processing units, and the fifth system memory pool is inaccessible to the first, second and sixth processing units and accessible to at least the fourth and fifth processing units. A first interconnect couples the second processing unit for load-store coherent, ordered access to the fourth system memory pool, and a second interconnect couples the fourth processing unit for load-store coherent, ordered access to the fifth system memory pool.

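The overlapping pool visibility spelled out in the abstract can be captured as a small accessibility table. The entries for pools 1-3 (each SMP's own pool visible to its own units) are an assumption; the abstract only constrains pools 4 and 5 explicitly.

```python
# Which processing units (1..6) may access which system memory pools (1..5).
# Pools 4 and 5 follow the abstract; pools 1-3 are assumed local-only.
ACCESS = {
    1: {1, 2},   # first SMP's pool (assumed local to units 1, 2)
    2: {3, 4},   # second SMP's pool (assumed local to units 3, 4)
    3: {5, 6},   # third SMP's own pool (assumed local to units 5, 6)
    4: {2, 5},   # overlap pool: unit 2 (via first interconnect) and unit 5
    5: {4, 5},   # overlap pool: unit 4 (via second interconnect) and unit 5
}

def can_access(unit, pool):
    """True if the given processing unit may access the given pool."""
    return unit in ACCESS[pool]
```

The two overlap pools are what stitch the three SMPs into "synthetic computers": unit 2 shares pool 4 with the third SMP, and unit 4 shares pool 5 with it.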

27. MEMORY COHERENCE DIRECTORY SUPPORTING REMOTELY SOURCED REQUESTS OF NODAL SCOPE
    Patent Application (Expired)

    Publication No.: US20120203976A1

    Publication Date: 2012-08-09

    Application No.: US13445010

    Filing Date: 2012-04-12

    IPC Class: G06F12/08

    CPC Class: G06F12/0817

    Abstract: A data processing system includes at least first through third processing nodes coupled by an interconnect fabric. The first processing node includes a master, a plurality of snoopers capable of participating in interconnect operations, and a node interface that receives a request of the master and transmits it to the second processing node with a nodal scope of transmission limited to the second processing node. The second processing node includes a node interface having a directory. The node interface of the second processing node permits the request to proceed with the nodal scope of transmission if the directory does not indicate that a target memory block of the request is cached other than in the second processing node, and prevents the request from succeeding if the directory indicates that the target memory block of the request is cached other than in the second processing node.

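The directory filter at the node interface reduces to a set-containment check, sketched below. Representing the directory as a mapping from address to the set of nodes caching the block, and fixing the local node id, are assumptions for illustration.

```python
LOCAL_NODE = 2   # the second processing node, per the abstract

def permit_nodal_request(directory, addr):
    """Node-interface filter for a remotely sourced nodal-scope request.

    directory maps addr -> set of node ids currently caching the block.
    The request may proceed with nodal scope only if no copy of the
    target block is cached outside the local node; otherwise it must
    not succeed (and would need a broader scope).
    """
    holders = directory.get(addr, set())
    return holders <= {LOCAL_NODE}
```

A block cached nowhere, or only within node 2, passes; any copy held by another node forces the request to fail at nodal scope.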

28. SELECTIVE CACHE-TO-CACHE LATERAL CASTOUTS
    Patent Application (Pending, published)

    Publication No.: US20120203973A1

    Publication Date: 2012-08-09

    Application No.: US13445646

    Filing Date: 2012-04-12

    IPC Class: G06F12/12

    CPC Class: G06F12/0811 G06F12/12

    Abstract: A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory, based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.

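The confidence-based choice between a lateral castout and a castout to memory can be sketched in a few lines. The numeric confidence scale and the threshold are assumed tuning parameters, not values from the patent.

```python
def castout_destination(confidence, threshold=2):
    """Choose the victim line's destination from its confidence indicator.

    A high confidence value (the line is likely to be re-referenced)
    favors a lateral castout to a peer lower level cache, keeping the
    line on-chip; a low value sends the line to system memory instead.
    threshold is an assumed, tunable cut-off.
    """
    return "LCO_to_peer_L3" if confidence >= threshold else "castout_to_memory"
```

The selectivity matters because an LCO consumes capacity in the peer cache; lines unlikely to be reused are better written back to memory.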

29. Partial cache line storage-modifying operation based upon a hint
    Granted Patent (Active)

    Publication No.: US08140771B2

    Publication Date: 2012-03-20

    Application No.: US12024424

    Filing Date: 2008-02-01

    IPC Class: G06F12/04 G06F9/312

    CPC Class: G06F12/0822

    Abstract: In at least one embodiment, a method of data processing in a data processing system having a memory hierarchy includes a processor core executing a storage-modifying memory access instruction to determine a memory address. The processor core transmits to a cache memory within the memory hierarchy a storage-modifying memory access request including the memory address, an indication of a memory access type, and, if present, a partial cache line hint signaling access to less than all granules of a target cache line of data associated with the memory address. In response to the storage-modifying memory access request, the cache memory performs a storage-modifying access to all granules of the target cache line of data if the partial cache line hint is not present, and performs a storage-modifying access to less than all granules of the target cache line of data if the partial cache line hint is present.

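The hint's effect on the granule-level access can be modeled directly. The line width of four granules and the hint encoding as a list of granule indices are illustrative assumptions.

```python
GRANULES_PER_LINE = 4   # assumed number of granules per cache line

def storage_modify(line, store_data, partial_hint=None):
    """Apply a storage-modifying access to a cache line's granules.

    Without a hint, all granules of the target line are accessed; with a
    partial cache line hint, only the listed granule indices are touched.
    """
    targets = range(GRANULES_PER_LINE) if partial_hint is None else partial_hint
    for g in targets:
        line[g] = store_data
    return line
```

Touching only the hinted granules is what lets the hierarchy avoid moving or modifying the full line when the software knows it needs less.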

30. Bandwidth of a cache directory by slicing the cache directory into two smaller cache directories and replicating snooping logic for each sliced cache directory
    Granted Patent (Active)

    Publication No.: US08135910B2

    Publication Date: 2012-03-13

    Application No.: US11056721

    Filing Date: 2005-02-11

    IPC Class: G06F12/00

    CPC Class: G06F12/0831 G06F12/0851

    Abstract: A cache, system, and method for improving the snoop bandwidth of a cache directory. A cache directory may be sliced into two smaller cache directories, each with its own snooping logic. By having two cache directories that can be accessed simultaneously, the bandwidth can be essentially doubled. Furthermore, a "frequency matcher" may shift the cycle speed to a lower speed upon receiving snoop addresses from the interconnect, thereby slowing the rate at which requests are transmitted to the dispatch pipelines. Each dispatch pipeline is coupled to a sliced cache directory and is configured to search the cache directory to determine whether data at the received addresses is stored in the cache memory. As a result of slowing the rate at which requests are transmitted to the dispatch pipelines and accessing the two sliced cache directories simultaneously, the bandwidth or throughput of the cache directory may be improved.

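Slicing the directory by an address bit and servicing two snoops per cycle when they fall in different slices can be sketched as follows. The choice of address bit 6 as the slice selector, and the set-based directory, are assumptions for illustration.

```python
def slice_of(addr):
    # One address bit (assumed here to be bit 6) steers each snoop
    # to one of the two sliced directories.
    return (addr >> 6) & 1

class SlicedDirectory:
    def __init__(self):
        self.slices = [set(), set()]   # two smaller cache directories

    def install(self, addr):
        self.slices[slice_of(addr)].add(addr)

    def snoop_pair(self, addr_a, addr_b):
        """Look up two snoop addresses in the same cycle when they map
        to different slices (the doubled-bandwidth case). Returns None
        on a slice conflict, which would have to be serialized."""
        sa, sb = slice_of(addr_a), slice_of(addr_b)
        if sa == sb:
            return None
        return (addr_a in self.slices[sa], addr_b in self.slices[sb])
```

Two snoops that hash to different slices complete together, which is where the roughly doubled directory bandwidth comes from; same-slice pairs still serialize.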