Reducing bandwidth and areas needed for non-inclusive memory hierarchy by using dual tags
    1.
    Granted Patent (Expired)

    Publication No.: US6073212A

    Publication Date: 2000-06-06

    Application No.: US940217

    Filing Date: 1997-09-30

    IPC Classification: G06F12/08 G06F12/00

    CPC Classification: G06F12/0811 G06F12/0831

    Abstract: An apparatus and method for optimizing a non-inclusive hierarchical cache memory system that includes a first and second cache for storing information. The first and second cache are arranged in a hierarchical manner, such as a level two and level three cache in a cache system having three levels of cache. The level two and level three cache hold information non-inclusively, while a dual directory holds tags and states that are duplicates of the tags and states held for the level two cache. All snoop requests (snoops) are passed to the dual directory by a snoop queue. The dual directory is used to determine whether a snoop request sent by the snoop queue is relevant to the contents of the level two cache, avoiding the need to send the snoop request to the level two cache if there is a "miss" in the dual directory. This increases the cache bandwidth that the second cache can make available, since the number of snoops consuming the bandwidth of the second cache is reduced by the filtering effect of the dual directory. Also, the third cache is limited to holding read-only information and receiving write-invalidation snoop requests. Only snoops relating to write-invalidation requests are passed to a directory holding tags and state information corresponding to the third cache. Limiting snoop requests to write-invalidation requests minimizes snoop requests to the third cache, increasing the amount of cache memory bandwidth available for servicing cache fetches from the third cache. In the event that a cache hit occurs in the third cache, the information found in the third cache must be transferred to the second cache before a modification can be made to that information.

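    A minimal sketch of the snoop-filtering idea in this abstract: the dual directory mirrors the level-two cache's tags, and the snoop queue forwards a snoop to the level-two cache only when the dual directory hits. The class and method names below are illustrative assumptions, not the patent's implementation.

```python
class DualDirectory:
    """Duplicate of the L2 tag/state array, consulted by the snoop queue."""
    def __init__(self):
        self.tags = {}                     # address tag -> coherence state

    def update(self, tag, state):
        if state is None:
            self.tags.pop(tag, None)       # line evicted from L2
        else:
            self.tags[tag] = state         # mirror an L2 allocation / state change

    def hit(self, tag):
        return tag in self.tags


class L2Cache:
    def __init__(self):
        self.snoops_serviced = 0

    def snoop(self, tag):
        # Servicing a snoop consumes L2 bandwidth that could serve CPU fetches.
        self.snoops_serviced += 1


def filter_snoops(snoop_queue, dual_dir, l2):
    """Forward only the snoops that are relevant to L2's contents."""
    for tag in snoop_queue:
        if dual_dir.hit(tag):              # relevant: pass it on to L2
            l2.snoop(tag)
        # on a miss the snoop is absorbed here, preserving L2 bandwidth


if __name__ == "__main__":
    dual_dir, l2 = DualDirectory(), L2Cache()
    dual_dir.update(0x1A0, "S")
    filter_snoops([0x1A0, 0x2B0, 0x3C0], dual_dir, l2)
    print(l2.snoops_serviced)              # only 1 of 3 snoops reached L2
```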

    Reducing cache misses by snarfing writebacks in non-inclusive memory systems
    2.
    Granted Patent (Expired)

    Publication No.: US5909697A

    Publication Date: 1999-06-01

    Application No.: US940219

    Filing Date: 1997-09-30

    IPC Classification: G06F12/08 G06F12/02

    CPC Classification: G06F12/0831 G06F12/0811

    Abstract: A non-inclusive multi-level cache memory system is optimized by removing a first cache content from a first cache, so as to provide cache space in the first cache. In response to a cache miss in the first and second caches, the removed first cache content is stored in a second cache. All cache contents stored in the second cache are limited to read-only attributes, so that if any copies of the cache contents in the second cache exist in the cache memory system, a processor or equivalent device must seek permission to access the location in which that copy exists, ensuring cache coherency. If the first cache content is required by a processor (e.g., when a cache hit occurs in the second cache for the first cache content), room is again made available, if required, in the first cache by selecting a second cache content from the first cache and moving it to the second cache. The first cache content is then moved from the second cache to the first cache, rendering the first cache available for write access. Limiting the second cache to read-only access reduces the number of status bits per tag that are required to maintain cache coherency. In a cache memory system using a MOESI protocol, the number of status bits per tag is reduced to a single bit for the second cache, reducing tag overhead and minimizing the silicon real estate used when the cache is placed on-chip to improve cache bandwidth.

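    The eviction and promotion flow described above can be illustrated with a small sketch: lines removed from the first cache are parked, read-only, in the second cache, and a line that hits there must be moved back into the first cache before it can be modified. All names, sizes, and the replacement policy below are assumptions for illustration.

```python
class Cache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}                          # addr -> data

    def full(self):
        return len(self.lines) >= self.capacity

    def victim(self):
        return next(iter(self.lines))            # placeholder replacement policy


def write(addr, data, first, second):
    """Write access must be satisfied from the first cache."""
    if addr in second.lines:                     # hit in the read-only second cache:
        promote(addr, first, second)             # move the line up before modifying it
    if addr not in first.lines and first.full():
        demote(first.victim(), first, second)
    first.lines[addr] = data                     # first cache holds writable copies


def promote(addr, first, second):
    if first.full():
        demote(first.victim(), first, second)
    first.lines[addr] = second.lines.pop(addr)


def demote(addr, first, second):
    if second.full():
        second.lines.pop(second.victim())        # cast out of the hierarchy
    second.lines[addr] = first.lines.pop(addr)   # second-cache copy stays read-only


if __name__ == "__main__":
    l2, l3 = Cache(1), Cache(2)                  # illustrative sizes
    write(0x100, "a", l2, l3)                    # fills the first cache
    write(0x200, "b", l2, l3)                    # 0x100 is demoted into the second cache
    write(0x100, "c", l2, l3)                    # hit in second cache: promote, then modify
    print(l2.lines, l3.lines)
```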

    Speculative cache line write backs to avoid hotspots
    3.
    Granted Patent (Expired)

    Publication No.: US6119205A

    Publication Date: 2000-09-12

    Application No.: US995779

    Filing Date: 1997-12-22

    IPC Classification: G06F12/08

    CPC Classification: G06F12/0804

    Abstract: A cache system including a data cache memory comprising a plurality of cache lines. A tag store has an entry representing each line in the cache memory, where each entry comprises tag information for accessing the data cache. The tag information includes state information indicating whether the represented cache line includes dirty data. A speculative write back unit monitors the state information and is operative to initiate a write back of a cache line having more than a preselected amount of dirty data.

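    A short sketch of the speculative write-back rule stated above, under assumed names and an assumed byte-granularity dirty counter: a monitoring unit flushes any line whose dirty data exceeds a preselected threshold.

```python
DIRTY_THRESHOLD = 32              # bytes; the "preselected amount" from the abstract

class TagEntry:
    def __init__(self):
        self.dirty_bytes = 0      # state information kept alongside the tag

class SpeculativeWriteBackUnit:
    def __init__(self, tag_store, write_back):
        self.tag_store = tag_store        # line index -> TagEntry
        self.write_back = write_back      # callback that flushes a line to memory

    def monitor(self):
        for index, entry in self.tag_store.items():
            if entry.dirty_bytes > DIRTY_THRESHOLD:
                self.write_back(index)    # flush early, before eviction forces it
                entry.dirty_bytes = 0     # the line is now clean

if __name__ == "__main__":
    tags = {0: TagEntry(), 1: TagEntry()}
    tags[1].dirty_bytes = 48
    unit = SpeculativeWriteBackUnit(tags, lambda i: print(f"write back line {i}"))
    unit.monitor()                        # only line 1 exceeds the threshold
```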

    Memory management in a shared memory system
    4.
    Granted Patent (Active)

    Publication No.: US08001333B2

    Publication Date: 2011-08-16

    Application No.: US12591406

    Filing Date: 2009-11-18

    Applicant: Fong Pong

    Inventor: Fong Pong

    IPC Classification: G06F12/00

    Abstract: Methods, systems, and computer program products for maintaining cache coherency in a System On Chip (SOC) that is part of a distributed shared memory system are described. A local SOC unit that includes a local controller and an on-chip memory is provided. In response to receiving a request from a remote controller of a remote SOC to access a memory location, the local controller determines whether the local SOC has exclusive ownership of the requested memory location, sends data from the memory location if the local SOC has exclusive ownership, and stores an entry in the on-chip memory that identifies the remote SOC as having requested data from the memory location. The entry specifies whether the request from the remote SOC is for exclusive ownership of the memory location. The entry also includes a field that identifies the remote SOC as the requester. The requested memory location may be external or internal to the local SOC unit.

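    A minimal sketch of the bookkeeping the abstract describes: when a remote SOC requests a location the local SOC owns exclusively, the local controller returns the data and records an on-chip directory entry naming the requester and whether exclusive ownership was requested. Field and class names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class DirectoryEntry:
    address: int
    requester_id: int        # field identifying the remote SOC as the requester
    exclusive: bool          # whether the request was for exclusive ownership

class LocalController:
    def __init__(self, memory, owned_exclusive):
        self.memory = memory                      # on-chip memory: addr -> data
        self.owned_exclusive = owned_exclusive    # addresses owned exclusively by this SOC
        self.directory = []                       # entries stored in on-chip memory

    def handle_remote_request(self, addr, requester_id, want_exclusive):
        if addr not in self.owned_exclusive:
            return None                           # not ours to serve
        self.directory.append(DirectoryEntry(addr, requester_id, want_exclusive))
        return self.memory.get(addr)              # send the data to the remote SOC

if __name__ == "__main__":
    ctrl = LocalController(memory={0x80: 0xBEEF}, owned_exclusive={0x80})
    print(ctrl.handle_remote_request(0x80, requester_id=3, want_exclusive=True))
    print(ctrl.directory)
```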

    High bandwidth split bus
    5.
    Granted Patent (Active)

    Publication No.: US07904624B2

    Publication Date: 2011-03-08

    Application No.: US12348603

    Filing Date: 2009-01-05

    IPC Classification: G06F13/00

    CPC Classification: G06F12/0831 G06F13/4045

    Abstract: A system includes a first bus segment and a second bus segment. The first bus segment is operatively coupled to one or more first bus agents, which are configured for writing messages to the first bus segment and reading messages from it. The second bus segment, which is separate from the first bus segment, is operatively coupled to one or more second bus agents. The system also includes first electrical circuitry operably coupled to the first bus segment and the second bus segment and configured to read messages written on the first bus segment and to write those messages onto the second bus segment, and second electrical circuitry operably coupled to the first bus segment and the second bus segment and configured to read messages written on the second bus segment and to write those messages onto the first bus segment.

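    The split-bus arrangement can be sketched as two message lists bridged by two forwarding units, each repeating one segment's locally originated traffic onto the other segment. The names and the origin-tagging used to stop re-forwarding are illustrative assumptions.

```python
class BusSegment:
    def __init__(self, name):
        self.name = name
        self.messages = []               # (originating segment, payload)

    def write(self, payload):            # used by agents attached to this segment
        self.messages.append((self.name, payload))

class Bridge:
    """Repeats messages written on the source segment onto the target segment."""
    def __init__(self, source, target):
        self.source, self.target = source, target
        self.seen = 0

    def run(self):
        while self.seen < len(self.source.messages):
            origin, payload = self.source.messages[self.seen]
            if origin == self.source.name:           # skip already-bridged traffic
                self.target.messages.append((origin, payload))
            self.seen += 1

if __name__ == "__main__":
    seg1, seg2 = BusSegment("seg1"), BusSegment("seg2")
    to_seg2, to_seg1 = Bridge(seg1, seg2), Bridge(seg2, seg1)
    seg1.write("agent A: read 0x100")
    seg2.write("agent B: data for 0x100")
    to_seg2.run(); to_seg1.run()
    print(seg1.messages)                 # agent A's message plus agent B's bridged message
    print(seg2.messages)
```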

    Method and system for hash table based routing via a prefix transformation
    6.
    Granted Patent (Expired)

    Publication No.: US07852851B2

    Publication Date: 2010-12-14

    Application No.: US11776652

    Filing Date: 2007-07-12

    Applicant: Fong Pong

    Inventor: Fong Pong

    IPC Classification: H04L12/28 H04L12/56

    Abstract: Aspects of a method and system for hash table based routing via prefix transformation are provided. Aspects of the invention may enable translating one or more network addresses into a coefficient set of a polynomial, and routing data in a network based on a quotient and a remainder derived from the coefficient set. In this regard, the quotient and the remainder may be calculated via modulo-2 division of the polynomial by a primitive generator polynomial. In one example, the remainder may be calculated with the aid of a remainder table. The primitive generator polynomial may be x^16 + x^8 + x^6 + x^5 + x^4 + x^2 + 1. Additionally, entries in one or more hash tables may comprise a calculated quotient and may be indexed by a calculated remainder. In this manner, the hash tables may be accessed to determine a longest prefix match for the one or more network addresses. The hash tables may comprise 2^deg(g(x)) sets, where deg(g(x)) is the degree of the primitive generator polynomial. Accordingly, the hash tables may be set associative, and multiple entries may be indexed by the same remainder. Furthermore, entries in the hash tables may comprise a next hop address utilized in routing network traffic.

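    A small sketch of the quotient/remainder hashing described above: a prefix is treated as a GF(2) polynomial and divided modulo 2 by the primitive generator polynomial from the abstract; the remainder selects a set and the stored quotient disambiguates entries within it. The table layout, prefix-length handling, and next-hop values are assumptions for illustration.

```python
# g(x) = x^16 + x^8 + x^6 + x^5 + x^4 + x^2 + 1, as an integer bit mask
G = (1 << 16) | (1 << 8) | (1 << 6) | (1 << 5) | (1 << 4) | (1 << 2) | 1

def poly_divmod(dividend, divisor=G):
    """Modulo-2 (GF(2)) division of polynomials represented as integer bit masks."""
    deg_g = divisor.bit_length() - 1
    quotient, remainder = 0, dividend
    while remainder and remainder.bit_length() - 1 >= deg_g:
        shift = remainder.bit_length() - 1 - deg_g
        quotient |= 1 << shift
        remainder ^= divisor << shift
    return quotient, remainder               # remainder has at most deg(g(x)) bits

class HashRouter:
    def __init__(self):
        self.sets = {}                       # remainder -> list of entries (set associative)

    def add_route(self, prefix_bits, prefix_len, next_hop):
        q, r = poly_divmod(prefix_bits)
        self.sets.setdefault(r, []).append((q, prefix_len, next_hop))

    def lookup(self, addr, addr_len=32):
        """Return the next hop for the longest matching prefix of addr."""
        for plen in range(addr_len, 0, -1):            # try longer prefixes first
            prefix = addr >> (addr_len - plen)
            q, r = poly_divmod(prefix)
            for quotient, entry_len, next_hop in self.sets.get(r, []):
                if quotient == q and entry_len == plen:
                    return next_hop
        return None

if __name__ == "__main__":
    router = HashRouter()
    router.add_route(0xC0A8, 16, "port 1")             # 192.168.0.0/16
    router.add_route(0xC0A80100 >> 8, 24, "port 2")    # 192.168.1.0/24
    print(router.lookup(0xC0A80105))                   # longest match -> "port 2"
```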

    Global address space management
    7.
    Patent Application (Active)

    Publication No.: US20100106899A1

    Publication Date: 2010-04-29

    Application No.: US12654248

    Filing Date: 2009-12-15

    Applicant: Fong Pong

    Inventor: Fong Pong

    IPC Classification: G06F12/00

    CPC Classification: G06F12/0223

    Abstract: Methods, systems, and computer program products for global address space management are described herein. A System on Chip (SOC) unit configured for a global address space is provided. The SOC includes an on-chip memory, a first controller, and a second controller. The first controller is enabled to decode addresses that map to memory locations in the on-chip memory, and the second controller is enabled to decode addresses that map to memory locations in an off-chip memory.

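    A minimal sketch, under assumed address ranges and names, of the two-controller decode described above: accesses to the global address space are routed either to the controller for on-chip memory or to the controller for off-chip memory.

```python
ON_CHIP_BASE, ON_CHIP_SIZE = 0x0000_0000, 0x0100_0000   # illustrative 16 MiB window

class OnChipController:
    def __init__(self):
        self.memory = bytearray(ON_CHIP_SIZE)

    def read(self, addr):
        return self.memory[addr - ON_CHIP_BASE]

class OffChipController:
    def __init__(self, dram):
        self.dram = dram                      # dict standing in for external DRAM

    def read(self, addr):
        return self.dram.get(addr, 0)

def decode(addr, on_chip, off_chip):
    """Route a global-address-space access to the controller that owns it."""
    if ON_CHIP_BASE <= addr < ON_CHIP_BASE + ON_CHIP_SIZE:
        return on_chip.read(addr)
    return off_chip.read(addr)

if __name__ == "__main__":
    on_chip, off_chip = OnChipController(), OffChipController(dram={0x8000_0000: 7})
    on_chip.memory[0x10] = 42
    print(decode(0x10, on_chip, off_chip), decode(0x8000_0000, on_chip, off_chip))
```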

    Ring-based cache coherent bus
    8.
    Granted Patent (Active)

    Publication No.: US07500031B2

    Publication Date: 2009-03-03

    Application No.: US11290940

    Filing Date: 2005-11-30

    Applicant: Fong Pong

    Inventor: Fong Pong

    IPC Classification: G06F3/00 G06F15/167

    CPC Classification: G06F13/4247

    Abstract: Managing data traffic among three or more bus agents configured in a topological ring includes numbering each bus agent sequentially and injecting messages that include a binary polarity value from the bus agents into the ring in sequential order, according to the numbering of the bus agents, during cycles of bus agent activity. Messages from the ring are received into two or more receive buffers of a receiving bus agent, and the binary polarity value is alternated after succeeding cycles of bus ring activity. The received messages are ordered for processing by the receiving bus agent based on the polarity value of the messages and the time at which each message was received.

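    The ordering rule in the abstract can be sketched as follows: every message carries the polarity of its injection cycle, the polarity flips each cycle, and a receiving agent drains its buffers by processing the older polarity first, each in arrival order. The data structures and names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: int       # sequential number of the injecting bus agent
    polarity: int     # 0 or 1, the polarity of the injection cycle
    arrival: int      # time at which the receiving agent pulled it off the ring
    payload: str

class ReceivingAgent:
    def __init__(self):
        self.buffers = {0: [], 1: []}        # two receive buffers, one per polarity

    def receive(self, msg):
        self.buffers[msg.polarity].append(msg)

    def drain(self, current_polarity):
        """Process the older polarity first, each buffer in arrival order."""
        older = current_polarity ^ 1
        ordered = (sorted(self.buffers[older], key=lambda m: m.arrival)
                   + sorted(self.buffers[current_polarity], key=lambda m: m.arrival))
        self.buffers = {0: [], 1: []}
        return ordered

if __name__ == "__main__":
    rx = ReceivingAgent()
    rx.receive(Message(sender=2, polarity=1, arrival=5, payload="invalidate 0x40"))
    rx.receive(Message(sender=1, polarity=0, arrival=6, payload="read 0x80"))
    # polarity flipped to 1 on the current cycle, so polarity-0 traffic is older
    for m in rx.drain(current_polarity=1):
        print(m.sender, m.payload)
```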

    Shared memory architecture
    9.
    Patent Application (Active)

    Publication No.: US20080301379A1

    Publication Date: 2008-12-04

    Application No.: US11807986

    Filing Date: 2007-05-31

    Applicant: Fong Pong

    Inventor: Fong Pong

    IPC Classification: G06F12/00

    Abstract: Disclosed herein is an apparatus which may comprise a plurality of nodes. In one example embodiment, each of the plurality of nodes may include one or more central processing units (CPUs), a random access memory device, and a parallel link input/output port. The random access memory device may include a local memory address space and a global memory address space. The local memory address space may be accessible to the one or more CPUs of the node that comprises the random access memory device. The global memory address space may be accessible to the CPUs of all the nodes. The parallel link input/output port may be configured to send data frames to, and receive data frames from, the global memory address space comprised by the random access memory device(s) of the other nodes.

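    A small sketch, with assumed names and sizes, of the per-node memory split described above: each node's RAM has a local region visible only to its own CPUs and a global region that remote nodes reach over the parallel link port.

```python
class Node:
    def __init__(self, node_id, local_size=1024, global_size=1024):
        self.node_id = node_id
        self.local = bytearray(local_size)        # accessible only to this node's CPUs
        self.global_mem = bytearray(global_size)  # accessible to every node
        self.fabric = {}                          # node_id -> Node, stands in for the link

    def connect(self, other):
        self.fabric[other.node_id] = other
        other.fabric[self.node_id] = self

    def global_write(self, target_id, offset, value):
        """Send a data frame over the parallel link port to a remote global region."""
        if target_id == self.node_id:
            self.global_mem[offset] = value
        else:
            self.fabric[target_id].global_mem[offset] = value

    def global_read(self, target_id, offset):
        target = self if target_id == self.node_id else self.fabric[target_id]
        return target.global_mem[offset]

if __name__ == "__main__":
    a, b = Node(0), Node(1)
    a.connect(b)
    a.global_write(1, 16, 0x2A)       # node 0 writes into node 1's global region
    print(b.global_read(1, 16))       # node 1 sees the value locally
```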

    Apparatus and methods for a high performance hardware network protocol processing engine
    10.
    Patent Application (Pending, Published)

    Publication No.: US20060274789A1

    Publication Date: 2006-12-07

    Application No.: US11228863

    Filing Date: 2005-09-16

    Applicant: Fong Pong

    Inventor: Fong Pong

    Abstract: Certain embodiments of the invention may be found in a method for a high performance hardware network protocol processing engine. The method may comprise processing TCP packets via a plurality of pipelined hardware stages on a single network chip. Headers of received TCP packets may be parsed, and Ethernet frame CRC digests, IP checksums, and TCP checksums may be validated, at a first stage of the pipelined hardware stages. IP addresses of the received TCP packets may also be validated at the first stage. The TCB index of the received TCP packets may be looked up at a second stage. TCB data for the TCP packets may be looked up at a third stage, and receive processing of the TCP packets may be performed at a fourth stage. A fifth stage may initiate transfer of the processed TCP packets to an application layer.

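    A software sketch of the five receive stages enumerated above; the stage boundaries follow the abstract, while the packet fields, lookup tables, and validation stubs are illustrative assumptions rather than the hardware design.

```python
def stage1_parse_and_validate(pkt):
    # Parse headers; validate Ethernet CRC, IP checksum, TCP checksum, and IP address.
    assert pkt["crc_ok"] and pkt["ip_csum_ok"] and pkt["tcp_csum_ok"]
    assert pkt["dst_ip"] in LOCAL_IPS
    return pkt

def stage2_tcb_index(pkt):
    # Look up the TCB index for the connection 4-tuple.
    pkt["tcb_index"] = TCB_INDEX[(pkt["src_ip"], pkt["src_port"],
                                  pkt["dst_ip"], pkt["dst_port"])]
    return pkt

def stage3_tcb_data(pkt):
    pkt["tcb"] = TCB_TABLE[pkt["tcb_index"]]     # fetch per-connection state
    return pkt

def stage4_receive_processing(pkt):
    tcb = pkt["tcb"]
    if pkt["seq"] == tcb["rcv_nxt"]:             # in-order segment
        tcb["rcv_nxt"] += len(pkt["payload"])
    return pkt

def stage5_deliver(pkt, app_queue):
    app_queue.append(pkt["payload"])             # hand the data to the application layer

LOCAL_IPS = {"10.0.0.2"}
TCB_INDEX = {("10.0.0.1", 5000, "10.0.0.2", 80): 0}
TCB_TABLE = {0: {"rcv_nxt": 1000}}

if __name__ == "__main__":
    app = []
    packet = {"crc_ok": True, "ip_csum_ok": True, "tcp_csum_ok": True,
              "src_ip": "10.0.0.1", "src_port": 5000,
              "dst_ip": "10.0.0.2", "dst_port": 80,
              "seq": 1000, "payload": b"hello"}
    stage5_deliver(stage4_receive_processing(stage3_tcb_data(
        stage2_tcb_index(stage1_parse_and_validate(packet)))), app)
    print(app, TCB_TABLE[0]["rcv_nxt"])          # [b'hello'] 1005
```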