Hardware-assisted method for scheduling threads using data cache locality
    1.
    Invention Grant
    Hardware-assisted method for scheduling threads using data cache locality (Expired)

    Publication No.: US06938252B2

    Publication Date: 2005-08-30

    Application No.: US09737129

    Filing Date: 2000-12-14

    Abstract: A method is provided for scheduling threads in a multi-processor system. In a first structure, thread IDs are stored for threads associated with a context switch; each thread ID identifies one thread. In a second structure, entries are stored for groups of contiguous cache lines. Each entry is arranged such that a thread ID in the first structure can be associated with at least one contiguous cache line in at least one group, the thread identified by that thread ID having accessed the cache line. The entries are mined for patterns that locate multiples of the same thread ID repeating across at least two groups. Threads identified by the located multiples of the same thread ID are mapped to at least one native thread and are scheduled on the same processor together with the other threads associated with those groups.

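    A minimal C++ sketch of the scheduling idea in the abstract above, assuming hypothetical names and layouts (ThreadTable, GroupEntry, find_affine_threads, schedule_on_same_processor) rather than anything specified in the patent. It only illustrates the mining step: find thread IDs that repeat across at least two groups of contiguous cache lines and co-schedule them on one processor.

        #include <cstdint>
        #include <map>
        #include <set>
        #include <vector>

        // First structure (hypothetical layout): thread IDs recorded at a context switch.
        using ThreadTable = std::vector<uint32_t>;

        // Second structure (hypothetical layout): for each group of contiguous cache
        // lines, the thread IDs that accessed at least one line in the group.
        struct GroupEntry {
            uint32_t group_id;                 // identifies a group of contiguous cache lines
            std::vector<uint32_t> thread_ids;  // threads that touched lines in this group
        };

        // Mine the entries for thread IDs that repeat in at least two groups; such
        // threads share data locality and are candidates for the same processor.
        std::set<uint32_t> find_affine_threads(const std::vector<GroupEntry>& entries) {
            std::map<uint32_t, int> groups_touched;
            for (const GroupEntry& e : entries) {
                std::set<uint32_t> unique(e.thread_ids.begin(), e.thread_ids.end());
                for (uint32_t tid : unique) ++groups_touched[tid];
            }
            std::set<uint32_t> affine;
            for (const auto& [tid, count] : groups_touched)
                if (count >= 2) affine.insert(tid);   // repeats across at least two groups
            return affine;
        }

        // Map the affine threads to native threads pinned to one processor
        // (the actual binding call is platform specific and omitted here).
        void schedule_on_same_processor(const std::set<uint32_t>& affine, int processor) {
            for (uint32_t tid : affine) {
                (void)tid; (void)processor;   // e.g. bind tid's native thread to 'processor'
            }
        }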

    Method and system for dynamically changing page types in unified scalable shared-memory architectures
    2.
    Invention Grant
    Method and system for dynamically changing page types in unified scalable shared-memory architectures (Expired)

    Publication No.: US06360302B1

    Publication Date: 2002-03-19

    Application No.: US09435222

    Filing Date: 1999-11-05

    IPC Classification: G06F12/00

    Abstract: According to one aspect of the invention, there is provided a method for dynamically changing page types in a unified scalable shared-memory architecture. The method includes the step of assigning the default page type of a given page as simple cache-only memory architecture (SCOMA). Upon n memory references, a first parameter of the given page is calculated. A second parameter of the given page is calculated when the first parameter is greater than a first threshold. The page type of the given page is dynamically changed to cache-coherent non-uniform memory architecture (ccNUMA) when the second parameter is greater than a second threshold. The first and second parameters are each one of a page reference probability and one minus a page utilization, the second parameter being different from the first. According to another aspect of the invention, the n memory references correspond to all pages; according to yet another aspect, the n memory references correspond only to the given page.

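    A minimal C++ sketch of the two-threshold decision in the abstract above, assuming hypothetical names (PageStats, maybe_promote_page) and taking the first parameter to be the page reference probability and the second to be one minus the page utilization; counters and thresholds are illustrative only.

        #include <cstdint>

        enum class PageType { SCOMA, CCNUMA };   // the default page type is SCOMA

        // Hypothetical per-page statistics gathered over n memory references.
        struct PageStats {
            PageType type = PageType::SCOMA;
            uint64_t refs_to_page = 0;     // references to this page
            uint64_t refs_total = 0;       // references observed (all pages or this page,
                                           // depending on which aspect of the method is used)
            uint64_t lines_touched = 0;    // distinct lines of the page that were referenced
            uint64_t lines_per_page = 64;  // illustrative page geometry
        };

        // Evaluate the two parameters in order and change the page type to ccNUMA
        // only when both exceed their thresholds.
        void maybe_promote_page(PageStats& p, double threshold1, double threshold2) {
            if (p.type != PageType::SCOMA || p.refs_total == 0) return;

            // First parameter: page reference probability.
            double ref_probability = static_cast<double>(p.refs_to_page) / p.refs_total;
            if (ref_probability <= threshold1) return;

            // Second parameter (different from the first): one minus page utilization.
            double utilization = static_cast<double>(p.lines_touched) / p.lines_per_page;
            if (1.0 - utilization > threshold2)
                p.type = PageType::CCNUMA;   // dynamically change the page type
        }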

    Cache coherence for lazy entry consistency in lockup-free caches
    3.
    Invention Grant
    Cache coherence for lazy entry consistency in lockup-free caches (Expired)

    Publication No.: US6094709A

    Publication Date: 2000-07-25

    Application No.: US886222

    Filing Date: 1997-07-01

    IPC Classification: G06F12/08 G06F12/00

    CPC Classification: G06F12/0808 G06F12/0828

    Abstract: A method of reducing false sharing in a shared memory system by enabling two caches to modify the same line at the same time. More specifically, with this invention a lock associated with a segment of shared memory is acquired, where the segment will then be used exclusively by the processor of the shared memory system that has acquired the lock. For each line of the segment, an invalidation request is sent to a number of caches of the system. When a cache receives the invalidation request, it invalidates each line of the segment that is in the cache. When each line of the segment has been invalidated, an invalidation acknowledgement is sent to the global directory. For each line of the segment that has been updated or modified, the update data is written back to main memory. Then, an acquire signal is sent to the requesting processor, which then has exclusive use of the segment.

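    A minimal C++ sketch of the acquire-time invalidation flow in the abstract above, with hypothetical types (Line, RemoteCache, GlobalDirectory, acquire_segment) standing in for hardware that would actually implement the protocol in the coherence controller.

        #include <cstdint>
        #include <vector>

        struct Line { uint64_t addr; bool valid = true; bool dirty = false; };

        // Hypothetical remote cache: on an invalidation request it invalidates every
        // line of the segment it holds and writes updated data back to main memory.
        struct RemoteCache {
            std::vector<Line> lines;
            void invalidate_segment(uint64_t base, uint64_t limit) {
                for (Line& l : lines) {
                    if (l.valid && l.addr >= base && l.addr < limit) {
                        if (l.dirty) { /* write the update data back to main memory */ }
                        l.valid = false;
                    }
                }
            }
        };

        struct GlobalDirectory { int pending_acks = 0; };

        // Acquire the lock on a segment: send invalidation requests for the segment's
        // lines to the other caches, collect invalidation acknowledgements at the
        // global directory, then signal the requester that it has exclusive use.
        bool acquire_segment(GlobalDirectory& dir, std::vector<RemoteCache>& caches,
                             uint64_t base, uint64_t limit) {
            dir.pending_acks = static_cast<int>(caches.size());
            for (RemoteCache& c : caches) {
                c.invalidate_segment(base, limit);  // invalidation request to this cache
                --dir.pending_acks;                 // its invalidation acknowledgement
            }
            return dir.pending_acks == 0;           // acquire signal to the requesting processor
        }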

    Home node migration for distributed shared memory systems
    4.
    Invention Grant
    Home node migration for distributed shared memory systems (Expired)

    Publication No.: US5893922A

    Publication Date: 1999-04-13

    Application No.: US813814

    Filing Date: 1997-03-06

    IPC Classification: G06F9/50 G06F12/08 G06F13/00

    CPC Classification: G06F9/5016 G06F12/0813

    Abstract: A mechanism to dynamically migrate the home node of a global page to a more suitable node, improving the performance of parallel applications running on S-COMA and other DSM systems. More specifically, consultation counts are maintained at each client node of a shared memory system, where a consultation count indicates the number of times the client node has consulted the dynamic home node for lines of the page. This information is then used, along with other information, to decide whether to change the dynamic home node to a more suitable node.

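    A minimal C++ sketch of the consultation-count bookkeeping in the abstract above, assuming hypothetical names (PageHomeState, record_consultation, maybe_migrate_home). The migration rule shown (move the home to the client that consults it far more often than the current home uses it) is one plausible policy, not the patent's exact decision procedure.

        #include <cstdint>
        #include <map>

        // Hypothetical per-page state kept for home-node migration decisions.
        struct PageHomeState {
            int home_node = 0;                       // current dynamic home node
            std::map<int, uint64_t> consultations;   // client node -> times it consulted the home
        };

        // Each time a client node consults the dynamic home node for lines of the
        // page, bump that client's consultation count.
        void record_consultation(PageHomeState& page, int client_node) {
            ++page.consultations[client_node];
        }

        // Decide whether to migrate the home node to a more suitable node: here, the
        // client with the highest consultation count, if it clearly dominates.
        void maybe_migrate_home(PageHomeState& page, double dominance_ratio = 2.0) {
            int best_node = page.home_node;
            uint64_t best_count = 0;
            for (const auto& [node, count] : page.consultations)
                if (count > best_count) { best_count = count; best_node = node; }

            uint64_t home_count = page.consultations.count(page.home_node)
                                      ? page.consultations.at(page.home_node) : 0;
            if (best_node != page.home_node &&
                static_cast<double>(best_count) >
                    dominance_ratio * static_cast<double>(home_count + 1))
                page.home_node = best_node;          // migrate the dynamic home node
        }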

    Cache coherence protocol for reducing the effects of false sharing in non-bus-based shared-memory multiprocessors
    5.
    Invention Grant
    Cache coherence protocol for reducing the effects of false sharing in non-bus-based shared-memory multiprocessors (Expired)

    Publication No.: US5822763A

    Publication Date: 1998-10-13

    Application No.: US635071

    Filing Date: 1996-04-19

    IPC Classification: G06F12/08 G06F12/00 G06F13/00

    CPC Classification: G06F12/0817

    Abstract: A cache coherence protocol for a multiprocessor system. Each processor in the system has an associated cache capable of storing multiple-word data lines. The system also includes a plurality of main memory modules, each having an associated distributed global directory storing directory information for the lines stored in that main memory module. Each main memory module is connected to each processor by means of a multi-stage interconnection network. When a processor attempts to overwrite an individual word in a line stored in its associated cache, a write request signal is sent to the appropriate global directory, and each other processor whose cache stores a copy of the line is notified of the request. When every other processor has responded with an acknowledgement, the first processor is allowed to proceed with the write.

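    A minimal C++ sketch of the write-notification handshake in the abstract above, with hypothetical types (DirectoryEntry, notify_sharers, write_word). Directory lookup, the multi-stage interconnection network, and per-word merging are abstracted away; only the request / notify / acknowledge / write ordering is shown.

        #include <cstdint>
        #include <set>

        // Hypothetical distributed global-directory entry for one line: the set of
        // processors whose caches store a copy of the line.
        struct DirectoryEntry {
            std::set<int> sharers;
        };

        // Notify every other processor caching the line of the write request and
        // count their acknowledgements (in hardware these travel over the network).
        int notify_sharers(const DirectoryEntry& entry, int writer) {
            int acks = 0;
            for (int p : entry.sharers)
                if (p != writer)
                    ++acks;   // each notified processor responds with an acknowledgement
            return acks;
        }

        // A processor overwriting an individual word of a shared line: send the write
        // request to the appropriate directory, wait for all acknowledgements, then write.
        void write_word(DirectoryEntry& entry, int writer, uint64_t addr, uint32_t value) {
            int expected = static_cast<int>(entry.sharers.size())
                           - static_cast<int>(entry.sharers.count(writer));
            int received = notify_sharers(entry, writer);   // stand-in for the network round trip
            if (received == expected) {
                (void)addr; (void)value;   // proceed with the word write
            }
        }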

    Method for providing virtual atomicity in multi processor environment having access to multilevel caches
    6.
    Invention Grant
    Method for providing virtual atomicity in multi processor environment having access to multilevel caches (Expired)

    Publication No.: US06175899B1

    Publication Date: 2001-01-16

    Application No.: US08858135

    Filing Date: 1997-05-19

    IPC Classification: G06F12/00

    CPC Classification: G06F12/0811 G06F12/0808

    Abstract: A method for assuring virtually atomic invalidation in a multilevel cache system wherein lower-level cache locations store portions of a line stored at a higher-level cache location. Upon receipt of an invalidation signal, the higher-level cache invalidates the line and sets a HOLD bit on the invalidated line. Thereafter, the higher-level cache sends invalidation signals to all lower-level caches that store portions of the invalidated line. Each lower-level cache invalidates its portion of the line and sets a HOLD bit on its portion of the line. The HOLD bits are reset after all line-portion invalidations have been completed.

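    A minimal C++ sketch of the HOLD-bit sequencing in the abstract above, with hypothetical types (LinePortion, LowerLevelCache, HigherLevelLine, invalidate_line). It models only the ordering: invalidate and hold at the higher level, propagate to the lower-level portions, and reset every HOLD bit once all portion invalidations have completed.

        #include <vector>

        // Hypothetical portion of the line held in a lower-level cache.
        struct LinePortion { bool valid = true; bool hold = false; };

        struct LowerLevelCache {
            LinePortion portion;                 // this cache's portion of the line
            void invalidate_portion() {
                portion.valid = false;
                portion.hold = true;             // HOLD bit set on the invalidated portion
            }
        };

        // Hypothetical higher-level line plus the lower-level caches holding portions of it.
        struct HigherLevelLine {
            bool valid = true;
            bool hold = false;
            std::vector<LowerLevelCache*> lower_caches;
        };

        // On receipt of an invalidation signal at the higher level: invalidate the line,
        // set its HOLD bit, send invalidations to the lower-level caches, and reset the
        // HOLD bits only after every portion has been invalidated.
        void invalidate_line(HigherLevelLine& line) {
            line.valid = false;
            line.hold = true;

            for (LowerLevelCache* c : line.lower_caches)
                c->invalidate_portion();         // invalidation signal to each lower-level cache

            for (LowerLevelCache* c : line.lower_caches)
                c->portion.hold = false;         // all portion invalidations completed
            line.hold = false;
        }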

    Hierarchical bus simple COMA architecture for shared memory multiprocessors having a bus directly interconnecting caches between nodes
    7.
    Invention Grant
    Hierarchical bus simple COMA architecture for shared memory multiprocessors having a bus directly interconnecting caches between nodes (Expired)

    Publication No.: US6148375A

    Publication Date: 2000-11-14

    Application No.: US023754

    Filing Date: 1998-02-13

    CPC Classification: G06F12/0811 G06F12/0831

    Abstract: A method of maintaining cache coherency in a shared memory multiprocessor system having a plurality of nodes, where each node is itself a shared memory multiprocessor. With this invention, an additional shared-owner state is maintained, so that if a cache at the highest level of cache memory in the system issues a read or write request for a cache line that misses at the highest cache level of the system, the owner of the cache line places the line on the bus interconnecting the highest-level cache memories.

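    A minimal C++ sketch of the shared-owner idea in the abstract above, with a hypothetical coherence-state enum and a supply_on_top_bus routine. Only the role of the SHARED_OWNER state on a top-level miss is shown; bus arbitration and the rest of the protocol are omitted.

        #include <vector>

        // Hypothetical coherence states for a line in a node's top-level cache; the
        // additional SHARED_OWNER state marks the copy responsible for supplying the line.
        enum class LineState { INVALID, SHARED, SHARED_OWNER, MODIFIED };

        struct TopLevelCacheLine {
            int node;
            LineState state = LineState::INVALID;
        };

        // When a read or write request misses at the highest cache level of the system,
        // the owner places the cache line on the bus that directly interconnects the
        // top-level caches of the nodes.
        int supply_on_top_bus(const std::vector<TopLevelCacheLine>& copies) {
            for (const TopLevelCacheLine& c : copies) {
                if (c.state == LineState::SHARED_OWNER || c.state == LineState::MODIFIED)
                    return c.node;   // this node drives the line onto the inter-node bus
            }
            return -1;               // no cached owner: the line is supplied by memory instead
        }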

    Invalidation bus optimization for multiprocessors using directory-based cache coherence protocols in which an address of a line to be modified is placed on the invalidation bus simultaneously with sending a modify request to the directory
    9.
    Invention Grant
    Invalidation bus optimization for multiprocessors using directory-based cache coherence protocols in which an address of a line to be modified is placed on the invalidation bus simultaneously with sending a modify request to the directory (Expired)

    Publication No.: US5778437A

    Publication Date: 1998-07-07

    Application No.: US533044

    Filing Date: 1995-09-25

    IPC Classification: G06F12/08 G06F12/00

    CPC Classification: G06F12/0826 G06F12/0813

    Abstract: An optimization scheme for a directory-based cache coherence protocol for multistage interconnection network-based multiprocessors improves system performance by reducing network latency. The optimization scheme is scalable, targeting multiprocessor systems having a moderate number of processors; the modification of shared data is the dominant contributor to performance degradation in these systems. The directory-based cache coherence scheme uses an invalidation bus on the processor side of the network. The invalidation bus connects all the private caches in the system and processes the invalidation requests, thereby eliminating the need to send invalidations across the network. In operation, a processor that attempts to modify data places the address of the data to be modified on the invalidation bus simultaneously with sending a store request for the data modification to the global directory, and the global directory sends to the processor attempting to modify the data, in addition to the permission signal, a count of the number of invalidation acknowledgments the processor should receive.

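    A minimal C++ sketch of the request flow in the abstract above, with hypothetical names (InvalidationBus, GlobalDirectory, request_store). It shows the key ordering only: the address goes onto the processor-side invalidation bus at the same time as the store request goes to the directory, and the directory's reply carries both the permission and the count of invalidation acknowledgments to expect.

        #include <cstdint>
        #include <set>

        // Hypothetical processor-side invalidation bus: broadcasting an address here is
        // snooped by all private caches, so no invalidations cross the network.
        struct InvalidationBus {
            void broadcast(uint64_t addr) { (void)addr; }
        };

        // Hypothetical global directory entry: which processors share the line.
        struct GlobalDirectory {
            std::set<int> sharers;

            struct Reply { bool permission; int expected_acks; };

            // Reply to a store request with permission plus the number of invalidation
            // acknowledgments the writing processor should receive.
            Reply store_request(int writer, uint64_t addr) {
                (void)addr;
                int others = static_cast<int>(sharers.size())
                             - static_cast<int>(sharers.count(writer));
                return Reply{true, others};
            }
        };

        // The writer places the address on the invalidation bus simultaneously with
        // sending the store request to the global directory, then waits for the
        // indicated number of acknowledgments before completing the modification.
        void request_store(InvalidationBus& bus, GlobalDirectory& dir, int writer, uint64_t addr) {
            bus.broadcast(addr);                                         // goes out at the same time...
            GlobalDirectory::Reply r = dir.store_request(writer, addr);  // ...as the store request
            int acks_received = 0;
            while (r.permission && acks_received < r.expected_acks)
                ++acks_received;   // stand-in for counting acknowledgments snooped from the bus
        }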

    Parallel network communications protocol using token passing
    10.
    Invention Grant
    Parallel network communications protocol using token passing (Expired)

    Publication No.: US5742812A

    Publication Date: 1998-04-21

    Application No.: US520346

    Filing Date: 1995-08-28

    Abstract: A protocol for achieving atomic multicast in a parallel or distributed computing environment. The protocol guarantees concurrency atomicity with a maximum of m-1 message passes among the m server nodes of the system. Under one embodiment of the protocol, an access component message is transferred to the server nodes storing the data to be accessed. The first server node of the plurality generates a token to be passed among the accessed nodes. A node cannot process its request until it receives the token. A node may pass the token on immediately upon ensuring that it is the current expected token.

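    A minimal C++ sketch of the token-passing idea in the abstract above, with hypothetical names (ServerNode, atomic_multicast). It models the essentials only: access component messages reach the accessed servers, the first accessed server generates the token, each node processes its part of the request only while holding the expected token, and forwarding the token along the m accessed servers needs at most m-1 passes.

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Hypothetical server node taking part in one atomic multicast access.
        struct ServerNode {
            int id;
            bool has_request = false;   // received its access component message
            bool processed = false;
        };

        // One multicast access over the servers that store the data being accessed.
        void atomic_multicast(std::vector<ServerNode>& accessed, uint64_t multicast_id) {
            if (accessed.empty()) return;

            for (ServerNode& n : accessed)
                n.has_request = true;                // access component message arrives

            uint64_t token = multicast_id;           // token generated by the first server

            for (std::size_t i = 0; i < accessed.size(); ++i) {
                ServerNode& n = accessed[i];
                // A node may process its request only while it holds the expected token.
                if (n.has_request && token == multicast_id)
                    n.processed = true;
                // The token is then passed immediately to the next accessed node; the
                // last node needs no pass, so m servers require at most m-1 passes.
            }
        }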