Reducing bandwidth and areas needed for non-inclusive memory hierarchy by using dual tags
    1.
    Invention Grant (Lapsed)

    Publication No.: US6073212A

    Publication Date: 2000-06-06

    Application No.: US940217

    Filing Date: 1997-09-30

    IPC Classes: G06F12/08 G06F12/00

    CPC Classes: G06F12/0811 G06F12/0831

    Abstract: An apparatus and method for optimizing a non-inclusive hierarchical cache memory system that includes a first and second cache for storing information. The first and second cache are arranged in a hierarchical manner, such as a level two and level three cache in a cache system having three levels of cache. The level two and level three cache hold information non-inclusively, while a dual directory holds tags and states that are duplicates of the tags and states held for the level two cache. All snoop requests (snoops) are passed to the dual directory by a snoop queue. The dual directory is used to determine whether a snoop request sent by the snoop queue is relevant to the contents of the level two cache, avoiding the need to send the snoop request to the level two cache if there is a "miss" in the dual directory. This increases the cache bandwidth that the second cache can make available, since the number of snoops appropriating the cache bandwidth of the second cache is reduced by the filtering effect of the dual directory. Also, the third cache is limited to holding read-only information and receiving write-invalidation snoop requests. Only snoops relating to write-invalidation requests are passed to a directory holding tags and state information corresponding to the third cache. Limiting snoop requests to write-invalidation requests minimizes snoop requests to the third cache, increasing the amount of cache memory bandwidth available for servicing cache fetches from the third cache. In the event that a cache hit occurs in the third cache, the information found in the third cache must be transferred to the second cache before a modification can be made to that information.

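    The filtering role of the dual directory can be pictured with a small model. The Python sketch below is a hypothetical illustration of the idea, not the patent's implementation; the class, method names, tag encoding, and bank of addresses used in the example are assumptions.

```python
class DualDirectory:
    """Duplicate copy of the L2 tags, kept purely to filter snoops.

    Illustrative model only; names and encoding are assumptions,
    not details taken from the patent text.
    """

    def __init__(self):
        self.tags = set()

    def track_allocation(self, tag):
        self.tags.add(tag)      # mirrors an L2 fill

    def track_eviction(self, tag):
        self.tags.discard(tag)  # mirrors an L2 victimization

    def may_hit_l2(self, tag):
        # A miss here proves the line is not in L2, so the snoop
        # never needs to consume L2 cache bandwidth.
        return tag in self.tags


def route_snoop(tag, is_write_invalidate, dual_dir, l3_tags):
    """Decide which caches a queued snoop must actually visit."""
    targets = []
    if dual_dir.may_hit_l2(tag):
        targets.append("L2")
    # The L3 holds read-only lines, so only write-invalidations concern it.
    if is_write_invalidate and tag in l3_tags:
        targets.append("L3")
    return targets


# Usage: the first snoop is filtered entirely; the second reaches L2 only.
d = DualDirectory()
d.track_allocation(0x40)
print(route_snoop(0x80, False, d, l3_tags={0x80}))  # []
print(route_snoop(0x40, True, d, l3_tags=set()))    # ['L2']
```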

Reducing cache misses by snarfing writebacks in non-inclusive memory systems
    2.
    Invention Grant (Lapsed)

    Publication No.: US5909697A

    Publication Date: 1999-06-01

    Application No.: US940219

    Filing Date: 1997-09-30

    IPC Classes: G06F12/08 G06F12/02

    CPC Classes: G06F12/0831 G06F12/0811

    Abstract: A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents stored in the second cache are limited to read-only attributes, so that if any copy of a cache content held in the second cache exists elsewhere in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available in the first cache, if required, by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, where write access is available. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing the silicon real estate used when the tags are placed on-chip to improve cache bandwidth.

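    The victim flow described above lends itself to a short illustration. The following Python sketch is a hypothetical model of a read-only second cache that snarfs writebacks; the class name, eviction policy, and capacity handling are assumptions, not details from the patent.

```python
from collections import OrderedDict


class ReadOnlyVictimCache:
    """Second-level cache that only ever holds read-only (snarfed) victims."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # tag -> data; a single valid bit per
                                     # tag suffices, no full MOESI state

    def snarf(self, tag, data):
        # Capture a writeback/victim evicted from the first cache.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # drop the oldest read-only copy
        self.lines[tag] = data

    def promote(self, tag):
        # A hit that needs write access: the line must move back to the
        # first cache before it can be modified.
        return self.lines.pop(tag, None)


# Usage: an L1 victim is snarfed; a later write hit pulls it back to L1.
l2 = ReadOnlyVictimCache(capacity=2)
l2.snarf(0x100, b"victim line")
line = l2.promote(0x100)   # moved to the first cache before modification
```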

    Apparatus and method for dual access to a banked and pipelined data cache memory unit
    3.
    Invention Grant (In Force)

    Publication No.: US06973557B2

    Publication Date: 2005-12-06

    Application No.: US10358023

    Filing Date: 2003-02-04

    IPC Classes: G06F12/08 G06F12/10

    CPC Classes: G06F12/0846 G06F12/1045

    Abstract: In a data cache unit that exchanges data signal groups with at least two execution units, the operation of the data cache unit is implemented as a three-stage pipeline in order to access data at the speed of the system clock. For a READ operation, virtual address components are applied to a storage cell bank unit implemented in SAM technology to begin accessing the storage cells holding the data signal group identified by the virtual address components. The virtual address components are also applied to a microtag unit, which identifies a subgroup of the signal group identified by the address components. Simultaneously, the virtual address is formed from the two virtual address components and applied to a translation table unit, to a valid-bit array unit, and to a tag unit. The translation table unit and the tag unit determine whether the correct data signal subgroup identified by the address signal group is stored in the data cache memory unit. The selected data signal subgroup and a HIT/MISS signal are transmitted to the execution unit during the same cycle. For a WRITE operation, only two pipeline stages are required. In addition, a WRITE operation can store either a single data signal group or a plurality of data signal groups in the data cache memory. Because the storage cells are arranged in banks, simultaneous interaction by the two execution units is possible.

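    The banked arrangement that lets two execution units proceed in the same cycle can be shown with a minimal sketch. The Python below is a hypothetical model; the bank count, line size, and index derivation are assumptions, not figures from the patent.

```python
def bank_index(address, n_banks=4):
    """Pick a bank from low-order line-address bits.

    Illustrative only: a 64-byte line and four banks are assumed here;
    the patent's actual indexing and bank count may differ.
    """
    line = address >> 6            # strip the byte offset within a line
    return line % n_banks


def can_issue_same_cycle(addr_a, addr_b, n_banks=4):
    # Two execution units can access the cache simultaneously as long as
    # their requests fall in different banks.
    return bank_index(addr_a, n_banks) != bank_index(addr_b, n_banks)


# Usage: distinct banks allow dual access; a shared bank forces serialization.
print(can_issue_same_cycle(0x0000, 0x0040))  # True: banks 0 and 1
print(can_issue_same_cycle(0x0000, 0x0100))  # False: both map to bank 0
```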

    Multiple data hazards detection and resolution unit
    4.
    Invention Grant (In Force)

    Publication No.: US07555634B1

    Publication Date: 2009-06-30

    Application No.: US10830244

    Filing Date: 2004-04-22

    IPC Classes: G06F13/00

    Abstract: Order indication logic can be recycled for at least two different data hazards, thus reducing the amount of processor real estate consumed by data hazard resolution logic. The logic also allows a single priority picker to be utilized for coloring without the cost of additional pipeline stages. A single priority picker can be utilized to identify memory operations for performing RAW bypass and for resolving OERs. For instance, a data hazard resolution unit resolves at least two different data hazards between resident memory operations and incoming memory operations with a set of logic that indicates the order of the resident memory operations relative to the incoming memory operations. The indicated order corresponds to the data hazard being resolved. The data hazard resolution unit includes a priority picker to select one of the indicated resident memory operations for either data hazard.

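    The idea of reusing one set of order-indication bits and a single priority picker for more than one hazard type can be sketched as follows. This Python model is illustrative only; the age encoding, function names, and the choice of lowest-index priority are assumptions, not the patent's circuitry.

```python
def order_bits(resident_ages, incoming_age):
    """One bit per resident memory operation, set when that operation is
    older than the incoming one (hypothetical encoding)."""
    return [age < incoming_age for age in resident_ages]


def priority_pick(hazard_flags, order):
    """A single priority picker shared by both hazard types: return the
    lowest-indexed resident operation that is flagged for the hazard and
    ordered ahead of the incoming operation, or None."""
    for i, (flagged, older) in enumerate(zip(hazard_flags, order)):
        if flagged and older:
            return i
    return None


# Usage: the same picker serves RAW-bypass candidates and OER candidates.
order = order_bits([3, 7, 1], incoming_age=5)      # [True, False, True]
print(priority_pick([False, True, True], order))   # 2
```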

    Apparatus and method for snoop access in a dual access, banked and pipelined data cache memory unit
    5.
    Invention Grant (In Force)

    Publication No.: US07020752B2

    Publication Date: 2006-03-28

    Application No.: US10360686

    Filing Date: 2003-02-07

    IPC Classes: G06F13/00

    Abstract: In a data cache unit that exchanges data signal groups with at least two execution units, the operation of the data cache unit is implemented as a three-stage pipeline in order to access data at the speed of the system clock. The data cache unit has a plurality of storage cell banks. Each storage cell bank has a valid-bit array unit and a tag unit for each execution unit incorporated therein. Each valid-bit array unit has a valid/invalid storage cell associated with each data group stored in the storage cell bank. The valid-bit array units have a read/write address port and a snoop address port. During a read operation, the associated valid/invalid signal is retrieved to determine whether the data signal group should be processed by the associated execution unit. In a write operation, a valid bit is set in the valid/invalid bit location(s) associated with the storage of a data signal group (or groups) during memory access. The valid-bit array unit responds to a snoop address and a control signal from the tag unit by setting an invalid bit in the valid/invalid bit address location associated with the snoop address. The tag unit can be divided into a plurality of tag subunits to expedite processing.

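    The dual-ported valid-bit array can be modeled with a brief sketch. The Python below is a hypothetical software illustration of a hardware structure; the entry count and method names are assumptions, not details from the patent.

```python
class ValidBitArray:
    """Per-bank valid/invalid bits with a separate snoop port (illustrative)."""

    def __init__(self, n_entries):
        self.valid = [False] * n_entries

    def read(self, index):
        # Read/write address port: consulted during a load to decide whether
        # the data signal group should be handed to the execution unit.
        return self.valid[index]

    def set_valid(self, index):
        # Write path: mark the entry valid when a data signal group is stored.
        self.valid[index] = True

    def snoop_invalidate(self, index, tag_match):
        # Snoop address port: clear the bit only when the tag unit signals a
        # match for the snoop address.
        if tag_match:
            self.valid[index] = False


# Usage: a fill sets the bit; a matching snoop through the snoop port clears it.
vb = ValidBitArray(n_entries=8)
vb.set_valid(3)
vb.snoop_invalidate(3, tag_match=True)
print(vb.read(3))  # False
```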