Reducing bandwidth and areas needed for non-inclusive memory hierarchy
by using dual tags
    1.
    Granted patent
    Reducing bandwidth and areas needed for non-inclusive memory hierarchy by using dual tags (expired)

    Publication No.: US6073212A

    Publication date: 2000-06-06

    Application No.: US940217

    Filing date: 1997-09-30

    IPC classes: G06F12/08 G06F12/00

    CPC classes: G06F12/0811 G06F12/0831

    Abstract: An apparatus and method for optimizing a non-inclusive hierarchical cache memory system that includes a first and second cache for storing information. The first and second caches are arranged in a hierarchical manner, such as the level-two and level-three caches in a cache system having three levels of cache. The level-two and level-three caches hold information non-inclusively, while a dual directory holds tags and states that duplicate the tags and states held for the level-two cache. All snoop requests (snoops) are passed to the dual directory by a snoop queue. The dual directory is used to determine whether a snoop request sent by the snoop queue is relevant to the contents of the level-two cache, avoiding the need to send the snoop request to the level-two cache if there is a "miss" in the dual directory. This increases the cache bandwidth the second cache can make available, since the filtering effect of the dual directory reduces the number of snoops appropriating the second cache's bandwidth. Also, the third cache is limited to holding read-only information and receiving write-invalidation snoop requests. Only snoops relating to write-invalidation requests are passed to a directory holding tags and state information corresponding to the third cache. Limiting snoop requests to write-invalidation requests minimizes snoop requests to the third cache, increasing the amount of cache memory bandwidth available for servicing cache fetches from the third cache. In the event that a cache hit occurs in the third cache, the information found in the third cache must be transferred to the second cache before a modification can be made to that information.
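    The snoop-filtering idea in this abstract can be sketched in software. The following is a hypothetical Python model, not the patented hardware; the names (`DualDirectory`, `filter_snoop`) are invented for illustration.

```python
# Hypothetical model of the dual-directory snoop filter: the dual
# directory mirrors the L2 tag array, and a snoop is forwarded to L2
# only on a dual-directory hit.

class DualDirectory:
    """Holds duplicate tags and states for the level-two cache."""
    def __init__(self):
        self.tags = {}  # tag -> coherence state, mirrors the L2 tag array

    def update(self, tag, state):
        self.tags[tag] = state

    def evict(self, tag):
        self.tags.pop(tag, None)

    def hit(self, tag):
        return tag in self.tags


def filter_snoop(dual_dir, snoop_tag, l2_snoop_queue):
    """Forward a snoop to L2 only if the dual directory hits.

    A miss in the dual directory proves the line is not in L2, so the
    snoop is dropped and L2 bandwidth stays available for CPU fetches.
    """
    if dual_dir.hit(snoop_tag):
        l2_snoop_queue.append(snoop_tag)  # relevant: pass to L2
        return True
    return False                          # irrelevant: filtered out
```

    In this model, only snoops that could actually affect L2 contents consume L2 bandwidth, which is the filtering effect the abstract describes.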


    Reducing cache misses by snarfing writebacks in non-inclusive memory
systems
    2.
    Granted patent
    Reducing cache misses by snarfing writebacks in non-inclusive memory systems (expired)

    Publication No.: US5909697A

    Publication date: 1999-06-01

    Application No.: US940219

    Filing date: 1997-09-30

    IPC classes: G06F12/08 G06F12/02

    CPC classes: G06F12/0831 G06F12/0811

    Abstract: A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents stored in the second cache are limited to read-only attributes, so that if any copies of the cache contents in the second cache exist elsewhere in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available, if required, in the first cache by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, rendering the first cache available for write access. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing the silicon real estate used when the cache is placed on-chip to improve cache bandwidth.
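    The status-bit saving can be illustrated with a small calculation. This is a sketch of the encoding argument only, under the assumption that states are binary-encoded per tag; it is not the patent's circuit.

```python
# Illustrative arithmetic: per-tag state storage for a full MOESI cache
# versus a read-only second-level cache. Binary state encoding assumed.
import math

MOESI_STATES = ["M", "O", "E", "S", "I"]  # five distinct line states

def status_bits(num_states):
    """Minimum bits needed to encode num_states distinct line states."""
    return max(1, math.ceil(math.log2(num_states)))

moesi_bits = status_bits(len(MOESI_STATES))  # 3 bits per tag
readonly_bits = status_bits(2)               # valid/invalid only: 1 bit
```

    A cache restricted to read-only lines never holds Modified or Owned data, so a single valid bit suffices, which is the single status bit per tag the abstract claims for the second cache.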


    Cache memory system with independently accessible subdivided cache tag
arrays
    3.
    Granted patent
    Cache memory system with independently accessible subdivided cache tag arrays (expired)

    Publication No.: US5675765A

    Publication date: 1997-10-07

    Application No.: US604687

    Filing date: 1996-02-21

    IPC classes: G06F12/08 G06F12/12 G06F13/16

    Abstract: Two independently accessible subdivided cache tag arrays and cache control logic are provided to a set-associative cache system. Each tag entry is stored across two subdivided cache tag arrays, a physical tag array and a set tag array, such that each physical tag array entry has a corresponding set tag array entry. Each physical tag array entry stores the tag addresses and control bits for a set of cache lines. The control bits comprise at least one validity bit indicating whether the data stored in the corresponding cache line is valid. Each set tag array entry stores the descriptive bits for a set of cache lines, including a most-recently-used (MRU) field identifying the most recently used cache lines of the cache set. Each subdivided tag array is provided with its own interface, so that the arrays can be accessed concurrently but independently by the cache control logic, which performs read and write operations against the cache. The cache control logic makes concurrent, independent accesses to the separate tag arrays to read and write the control and descriptive information in the tag entries. The accesses are grouped by type of operation to be performed, and each type of access is made during predesignated time slots in an optimized manner, enabling the cache control logic to perform certain selected read/write accesses to the physical tag array while concurrently performing other, independent read/write accesses to the set tag array.


    Broadcast demap for deallocating memory pages in a multiprocessor system
    4.
    Granted patent
    Broadcast demap for deallocating memory pages in a multiprocessor system (expired)

    Publication No.: US5497480A

    Publication date: 1996-03-05

    Application No.: US282170

    Filing date: 1994-07-29

    Abstract: A method and apparatus for removing a page table entry from a plurality of translation lookaside buffers ("TLBs") in a multiprocessor computer system. The multiprocessor computer system includes at least two processors coupled to a packet-switched bus. Page table entries are removed from a plurality of TLBs in the multiprocessor computer system by first broadcasting a demap request packet on the packet-switched bus in response to one of the processors requesting that a page table entry be removed from its associated TLB. The demap request packet includes a virtual address and context information specifying the page table entry. Controllers reply to the demap request packet by sending a first reply packet to the controller that sent the original demap request packet, indicating receipt of the demap request packet. If a controller removes the page table entry from its associated TLB, that controller sends a second demap reply packet indicating that the page table entry has been removed from its associated TLB.
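    The two-reply handshake described above can be modelled as follows. This is a hypothetical Python sketch of the protocol flow, with invented names (`TlbController`, `broadcast_demap`); the actual packet formats are not reproduced.

```python
# Model of the broadcast-demap handshake: every controller acks receipt
# of the demap request, and controllers that actually remove the page
# table entry send a second reply.

class TlbController:
    def __init__(self, tlb):
        # tlb: dict mapping (virtual_addr, context) -> page table entry
        self.tlb = tlb

    def handle_demap(self, virtual_addr, context):
        replies = ["received"]         # first reply: packet received
        if (virtual_addr, context) in self.tlb:
            del self.tlb[(virtual_addr, context)]
            replies.append("removed")  # second reply: entry removed
        return replies


def broadcast_demap(controllers, virtual_addr, context):
    """Broadcast a demap request to every controller on the (modelled)
    packet-switched bus and collect their replies."""
    return [c.handle_demap(virtual_addr, context) for c in controllers]
```

    The requester can thus distinguish controllers that merely saw the request from those that actually held, and dropped, the stale translation.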


    Instruction and data cache with a shared TLB for split accesses and
snooping in the same clock cycle
    5.
    Granted patent
    Instruction and data cache with a shared TLB for split accesses and snooping in the same clock cycle (expired)

    Publication No.: US5440707A

    Publication date: 1995-08-08

    Application No.: US875692

    Filing date: 1992-04-29

    IPC classes: G06F12/08 G06F12/10

    CPC classes: G06F12/1054 G06F12/0831

    Abstract: A caching arrangement that can work efficiently in a superscalar, multiprocessing environment includes separate caches for instructions and data and a single translation lookaside buffer (TLB) shared by them. During each clock cycle, retrievals from both the instruction cache and the data cache may be performed, one on the rising edge of the clock and one on the falling edge. The TLB is capable of translating two addresses per clock cycle. Because the TLB is faster than the tag arrays, which in turn are faster than the cache data arrays, virtual addresses may be supplied to all three components concurrently and the retrieval made in one phase of a clock cycle. When an instruction retrieval is being performed, snooping for snoop broadcasts may be performed for the data cache, and vice versa. Thus, in every clock cycle, an instruction retrieval and a data cache retrieval may be performed as well as snooping.


    Cache miss buffer adapted to satisfy read requests to portions of a
cache fill in progress without waiting for the cache fill to complete
    6.
    Granted patent
    Cache miss buffer adapted to satisfy read requests to portions of a cache fill in progress without waiting for the cache fill to complete (expired)

    Publication No.: US5353426A

    Publication date: 1994-10-04

    Application No.: US875983

    Filing date: 1992-04-29

    IPC classes: G06F12/08 G06F13/00

    CPC classes: G06F12/0859

    Abstract: A cache array, a cache tag and comparator unit, and a cache multiplexor are provided to a cache memory. Each cache operation performed against the cache array, read or write, takes only half a clock cycle. The cache tag and comparator unit comprises a cache tag array, a cache miss buffer, and control logic. Each cache operation performed against the cache tag array, read or write, also takes only half a clock cycle. The cache miss buffer holds descriptive information identifying the current state of a cache fill in progress. The control logic comprises a plurality of combinational logic blocks for performing tag-match operations. In addition to standard tag-match operations, the control logic also conditionally tag-matches an accessing address against the address tag stored in the cache miss buffer. Depending on the results of the tag-match operations, and further on the state of the current cache fill if the accessing address falls within the memory block frame of the current cache fill, the control logic provides appropriate signals to the cache array, the cache multiplexor, the main memory, and the instruction/data destination. As a result, subsequent instruction/data requests that are part of a cache fill in progress can be satisfied without waiting for the fill to complete, further reducing cache miss penalties and function-unit idle time.
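    The forwarding condition the abstract describes, hit only on words of the in-progress fill that have already arrived, can be sketched as a small model. Names (`MissBuffer`, `try_read`) are invented for illustration; this is not the patented logic.

```python
# Sketch of the cache-miss-buffer idea: while a block fill is in
# progress, a read that tag-matches the fill's block and targets a word
# that has already returned from memory is satisfied immediately.

class MissBuffer:
    def __init__(self, block_tag, words_per_block):
        self.block_tag = block_tag
        # Words fill in as memory returns them; None means "not yet here".
        self.words = [None] * words_per_block

    def fill_word(self, index, value):
        """Record one word of the block as it arrives from main memory."""
        self.words[index] = value

    def try_read(self, tag, index):
        """Return (hit, value); hit only if the word already arrived."""
        if tag == self.block_tag and self.words[index] is not None:
            return True, self.words[index]
        return False, None
```

    A read that misses here would wait (or go to memory) as usual; the point is that reads to already-filled portions need not stall for the rest of the block.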


    Methods and apparatus for implementing a pseudo-LRU cache memory
replacement scheme with a locking feature
    7.
    Granted patent
    Methods and apparatus for implementing a pseudo-LRU cache memory replacement scheme with a locking feature (expired)

    Publication No.: US5353425A

    Publication date: 1994-10-04

    Application No.: US875357

    Filing date: 1992-04-29

    IPC classes: G06F12/12 G06F13/14

    CPC classes: G06F12/126 G06F12/125

    Abstract: In a memory system having a main memory and a faster cache memory, a cache memory replacement scheme with a locking feature is provided. Locking bits associated with each line in the cache are supplied in the tag table. These locking bits are preferably set and reset by the executing application program/process, and are used in conjunction with cache replacement bits by the cache controller to determine which lines in the cache to replace. The lock bit and replacement bit for a cache line are ORed to create a composite bit for the cache line. If the composite bit is set, the cache line is not removed from the cache. When all composite bits are set, which would otherwise deadlock replacement, all replacement bits are cleared. One cache line is always kept non-lockable. The locking bits "lock" the line of data in the cache until the process resets the lock bit. Because the process controls the state of the lock bits, the knowledge the process has about the frequency of use of certain memory locations can be exploited to provide a more efficient cache.
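    The composite-bit rule above can be written out directly. This is a simplified model under the assumption that way 0 is the non-lockable line; it shows the OR and the deadlock-avoidance reset, not the patent's full pseudo-LRU tree.

```python
# Model of the lock/replacement composite-bit victim selection:
# composite = lock OR replace; a set composite bit protects the line.

def choose_victim(lock_bits, replace_bits):
    """Pick the first way whose composite (lock OR replace) bit is clear.

    If every composite bit is set, replacement would deadlock, so all
    replacement bits are cleared (lock bits are untouched). Way 0 is
    assumed non-lockable, so a victim then always exists.
    """
    composite = [l | r for l, r in zip(lock_bits, replace_bits)]
    if all(composite):
        replace_bits[:] = [0] * len(replace_bits)  # deadlock avoidance
        composite = list(lock_bits)                # only locks remain
    for way, bit in enumerate(composite):
        if not bit:
            return way
    return 0  # unreachable when way 0 is non-lockable
```

    Keeping one line non-lockable is what guarantees the loop terminates with a victim even when software has locked everything else.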


    Performing overlapping burst memory accesses and interleaved memory
accesses on cache misses
    8.
    Granted patent
    Performing overlapping burst memory accesses and interleaved memory accesses on cache misses (expired)

    Publication No.: US5987570A

    Publication date: 1999-11-16

    Application No.: US881557

    Filing date: 1997-06-24

    IPC classes: G06F12/08 G06F13/16 G06F13/28

    CPC classes: G06F12/0884

    Abstract: A high-performance microprocessor bus protocol for improving system throughput. The bus protocol enables overlapping read-burst and write-burst bus transactions to a cache, and interleaved bus transactions during external fetch cycles for missed cache lines. The bus protocol is implemented in a system comprising a CPU and a secondary cache. The secondary cache comprises an SRAM array cache and a cache controller. The CPU contains an instruction pipeline and a primary cache system.


    Method and apparatus for testing cache RAM residing on a microprocessor
    9.
    Granted patent
    Method and apparatus for testing cache RAM residing on a microprocessor (expired)

    Publication No.: US5781721A

    Publication date: 1998-07-14

    Application No.: US714515

    Filing date: 1996-09-16

    Abstract: An apparatus and method for enabling the cache controller and the address and data buses of a microprocessor with an on-board cache to provide an SRAM test mode for testing the on-board cache. Upon assertion of an SRAM test signal on an SRAM test pin of the microprocessor chip, the cache and bus controllers cease normal operation and permit data to be written to, and read from, individual addresses within the on-board cache as though the on-board cache were simple SRAM. After the chip is reset, standard SRAM tests can then be run by reading and writing data at selected cache memory addresses as though the cache memory were SRAM. Upon completion of the tests, the SRAM test signal is deasserted and the cache and bus controllers resume normal operation. A reset signal is then applied to the microprocessor to reinitialize the control logic employed within the microprocessor. In this way, cache memory on board a microprocessor can be tested using standard SRAM testing algorithms and equipment, eliminating the need for specialized test equipment to test cache memory contained on a microprocessor chip.


Method and apparatus for a coherent copy-back buffer in a multiprocessor
computer system
    10.
    Granted patent
    Method and apparatus for a coherent copy-back buffer in a multiprocessor computer system (expired)

    Publication No.: US5708792A

    Publication date: 1998-01-13

    Application No.: US681602

    Filing date: 1996-07-29

    IPC classes: G06F12/08

    CPC classes: G06F12/0833

    Abstract: A method and apparatus for maintaining cache coherency in a multiprocessor system having a plurality of processors and a shared main memory. Each of the plurality of processors is coupled to at least one cache unit and a store buffer. The method comprises the steps of: a first cache unit writing a dirty line to its store buffer when the first cache unit experiences a cache miss; the first cache unit gaining control of the bus; the first cache unit reading a new line from the shared main memory over the bus; writing the dirty line to the shared main memory if the bus is available to the first cache unit, and, if it is not available, the first cache unit checking for snooping by a second cache unit of a second processor; and comparing an address from the second cache unit with the tag of the dirty line, the tag being stored in content-addressable memory coupled to the store buffer, and, on a hit, supplying the dirty line to the second cache unit for updating.
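    The coherent copy-back buffer can be sketched as follows. This is a hypothetical model of the tag-matching behaviour only (names like `CopyBackBuffer` are invented); the bus arbitration and timing are not modelled.

```python
# Sketch of a coherent copy-back (store) buffer: an evicted dirty line
# waits in the buffer for the bus, and a snoop from another cache unit
# is tag-matched against it (modelling the content-addressable memory)
# so the freshest copy of the data is supplied.

class CopyBackBuffer:
    def __init__(self):
        self.entry = None  # (tag, data) of the pending dirty line

    def hold(self, tag, data):
        """Stash the dirty line evicted on a cache miss."""
        self.entry = (tag, data)

    def snoop(self, tag):
        """CAM lookup: return the dirty data on a tag hit, else None."""
        if self.entry is not None and self.entry[0] == tag:
            return self.entry[1]
        return None
```

    Matching snoops against the buffer closes the coherency window between eviction and the eventual write to shared main memory.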
