Extended write combining using a write continuation hint flag
    1.
    Invention grant (expired)

    Publication number: US08458282B2

    Publication date: 2013-06-04

    Application number: US11768593

    Filing date: 2007-06-26

    Abstract: A computing apparatus reduces the amount of processing in a network computing system. It includes a network system device at a receiving node for receiving electronic messages comprising data transmitted from a sending node. The network system device determines when more data of a specific electronic message is being transmitted. A memory device stores the electronic message data and communicates with the network system device, and a memory subsystem communicates with the memory device. The memory subsystem stores a portion of the electronic message when more data of the specific message will be received, and a buffer combines that portion with later received data and moves the combined data to the memory device for accessible storage.

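    The receive-side idea in this abstract can be pictured with a short C sketch: a combining buffer holds a partial message line while a write-continuation hint promises more data, and flushes the combined line to memory otherwise. The buffer size, structure names, and flush_to_memory() stand-in below are assumptions for illustration, not details taken from the patent.

        /* A minimal sketch, assuming a 32-byte combining buffer and a
         * hypothetical flush_to_memory() backend. */
        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        #define LINE_BYTES 32

        struct combine_buf {
            uint64_t addr;            /* memory line being combined */
            uint8_t  data[LINE_BYTES];
            size_t   filled;          /* bytes accumulated so far   */
            int      valid;
        };

        /* Stand-in for the write into the memory device. */
        static void flush_to_memory(struct combine_buf *b)
        {
            printf("flush %zu bytes to 0x%llx\n", b->filled,
                   (unsigned long long)b->addr);
            b->valid = 0;
            b->filled = 0;
        }

        /* Receive path for one arriving chunk of a message. The hint is set
         * when the network device knows more data of this message follows. */
        static void receive_chunk(struct combine_buf *b, uint64_t addr,
                                  const uint8_t *chunk, size_t len,
                                  int continuation_hint)
        {
            if (!b->valid) {
                b->valid = 1;
                b->addr = addr;
                b->filled = 0;
            }
            size_t room = LINE_BYTES - b->filled;
            size_t n = len < room ? len : room;
            memcpy(b->data + b->filled, chunk, n);
            b->filled += n;

            /* Hold the partial line while more data is promised; otherwise,
             * or once the line is full, write the combined data back. */
            if (!continuation_hint || b->filled == LINE_BYTES)
                flush_to_memory(b);
        }

        int main(void)
        {
            struct combine_buf b = {0};
            uint8_t part[16] = {0};
            receive_chunk(&b, 0x1000, part, sizeof part, 1); /* held           */
            receive_chunk(&b, 0x1000, part, sizeof part, 0); /* combined+flush */
            return 0;
        }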

    LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION
    2.
    Invention application (expired)

    Publication number: US20070204112A1

    Publication date: 2007-08-30

    Application number: US11617276

    Filing date: 2006-12-28

    IPC classification: G06F12/14

    Abstract: Low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that supports synchronization between the multiple processors and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by a store, so that the processor performs only a read operation and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory; the pointers, rather than some other predictive algorithm, determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.

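    The two mechanisms described above lend themselves to a rough C sketch: lock acquisition that needs only a single load because the hardware locking device performs the ownership write, and a memory line extended with a pointer that names the next line to prefetch. The register encoding, structure layout, and function names below are invented for illustration.

        /* A minimal sketch, assuming a memory-mapped lock register whose read
         * both queries and (in hardware) claims the lock. */
        #include <stddef.h>
        #include <stdint.h>

        #define LOCK_GRANTED 1u

        /* One lock register per shared resource, exposed by the locking
         * device; the device, not the processor, performs the ownership
         * write as a side effect of the read. */
        static inline int try_acquire(volatile uint32_t *lock_reg)
        {
            return *lock_reg == LOCK_GRANTED;   /* single load, no store */
        }

        static inline void acquire(volatile uint32_t *lock_reg)
        {
            while (!try_acquire(lock_reg))
                ;                               /* retry until granted */
        }

        static inline void release(volatile uint32_t *lock_reg)
        {
            *lock_reg = 0;                      /* owner returns the lock */
        }

        /* Memory line extended with a pointer to the next line to prefetch,
         * so repetitive but non-contiguous access patterns can be followed
         * without a predictive algorithm. */
        struct mem_line {
            uint8_t          data[32];
            struct mem_line *next_prefetch;
        };

        static void walk_and_prefetch(struct mem_line *line)
        {
            while (line) {
                if (line->next_prefetch)
                    __builtin_prefetch(line->next_prefetch); /* GCC/Clang hint */
                /* ... use line->data ... */
                line = line->next_prefetch;
            }
        }

        int main(void)
        {
            /* Software stand-in for the device register, preloaded to
             * "granted" so the sketch runs without real hardware. */
            volatile uint32_t fake_lock = LOCK_GRANTED;
            acquire(&fake_lock);
            /* ... access the shared resource ... */
            release(&fake_lock);

            struct mem_line b = { .next_prefetch = NULL };
            struct mem_line a = { .next_prefetch = &b };
            walk_and_prefetch(&a);
            return 0;
        }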

    EXTENDED WRITE COMBINING USING A WRITE CONTINUATION HINT FLAG
    3.
    Invention application (expired)

    Publication number: US20090006605A1

    Publication date: 2009-01-01

    Application number: US11768593

    Filing date: 2007-06-26

    IPC classification: G06F17/30 G06F15/173

    Abstract: A computing apparatus reduces the amount of processing in a network computing system. It includes a network system device at a receiving node for receiving electronic messages comprising data transmitted from a sending node. The network system device determines when more data of a specific electronic message is being transmitted. A memory device stores the electronic message data and communicates with the network system device, and a memory subsystem communicates with the memory device. The memory subsystem stores a portion of the electronic message when more data of the specific message will be received, and a buffer combines that portion with later received data and moves the combined data to the memory device for accessible storage.

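    This application shares its abstract with the grant listed as item 1, so the sketch here takes the complementary sender-side view: each packet of a multi-packet message carries a write-continuation hint telling the receiver that more data of the same message follows. The header layout and send_packet() stand-in are assumptions.

        /* A sender-side sketch under the same assumptions; layout and
         * function names are hypothetical. */
        #include <stddef.h>
        #include <stdint.h>
        #include <stdio.h>

        #define PKT_PAYLOAD 256

        struct packet_header {
            uint32_t msg_id;
            uint32_t offset;
            uint8_t  write_continuation; /* 1: more data of this message follows */
        };

        /* Stand-in for handing a packet to the network device. */
        static void send_packet(const struct packet_header *h,
                                const uint8_t *payload, size_t len)
        {
            (void)payload;
            printf("msg %u off %u len %zu cont=%u\n", (unsigned)h->msg_id,
                   (unsigned)h->offset, len, (unsigned)h->write_continuation);
        }

        static void send_message(uint32_t msg_id, const uint8_t *data, size_t len)
        {
            for (size_t off = 0; off < len; off += PKT_PAYLOAD) {
                size_t chunk = len - off < PKT_PAYLOAD ? len - off : PKT_PAYLOAD;
                struct packet_header h = {
                    .msg_id = msg_id,
                    .offset = (uint32_t)off,
                    /* set on every packet except the last one */
                    .write_continuation = (off + chunk < len),
                };
                send_packet(&h, data + off, chunk);
            }
        }

        int main(void)
        {
            uint8_t buf[600] = {0};
            send_message(7, buf, sizeof buf);
            return 0;
        }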

    Novel snoop filter for filtering snoop requests
    4.
    Invention application (in force)

    Publication number: US20060224838A1

    Publication date: 2006-10-05

    Application number: US11093152

    Filing date: 2005-03-29

    IPC classification: G06F13/28

    Abstract: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated with and operatively connected to it. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters corresponding to the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that concurrently filter snoop requests received from the respective dedicated memory writing sources and forward a subset of those requests to the associated processing unit.

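    A compact C sketch of the port-filter idea follows: one dedicated filter per memory writing source, each deciding on its own (and, in hardware, in parallel with the others) whether a snoop request needs to reach the local processing unit. The per-port cache of recently snooped lines is just one plausible sub-filter; the sizes and names are assumptions.

        /* A sketch assuming a per-port cache of recently snooped lines as
         * the sub-filter; hardware would evaluate the ports in parallel. */
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define NUM_SOURCES      4   /* dedicated ports, one per writing source */
        #define SNOOP_CACHE_SIZE 8   /* lines remembered per port               */

        struct port_filter {
            uint64_t recent[SNOOP_CACHE_SIZE]; /* lines already snooped (0 = empty) */
            unsigned next;                     /* round-robin replacement index     */
        };

        struct snoop_filter {
            struct port_filter port[NUM_SOURCES];
        };

        /* True if the request must be forwarded to the local processing unit,
         * false if this port's filter can safely drop it. */
        static bool filter_snoop(struct snoop_filter *f, unsigned src, uint64_t line)
        {
            struct port_filter *p = &f->port[src];
            for (unsigned i = 0; i < SNOOP_CACHE_SIZE; i++)
                if (p->recent[i] == line)
                    return false;              /* already snooped: drop it  */
            p->recent[p->next] = line;         /* remember it for next time */
            p->next = (p->next + 1) % SNOOP_CACHE_SIZE;
            return true;                       /* forward to the processor  */
        }

        int main(void)
        {
            struct snoop_filter f = {0};
            printf("%d\n", filter_snoop(&f, 0, 0x80)); /* 1: forwarded          */
            printf("%d\n", filter_snoop(&f, 0, 0x80)); /* 0: filtered           */
            printf("%d\n", filter_snoop(&f, 1, 0x80)); /* 1: port 1's own state */
            return 0;
        }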

    Snoop filtering system in a multiprocessor system
    5.
    Invention application (in force)

    Publication number: US20060224835A1

    Publication date: 2006-10-05

    Application number: US11093127

    Filing date: 2005-03-29

    IPC classification: G06F13/28

    Abstract: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled with it. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit and comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing the snoop requests received there and either forwarding them or preventing their forwarding to the associated processing unit. Each of the parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms to identify snoop requests for data that is determined not to be cached locally at the associated processing unit, and preventing those requests from being forwarded to the processor unit. In this manner, the number of snoop requests forwarded to a processing unit is reduced, thereby increasing performance of the computing environment.

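    The parallel sub-filter aspect can be sketched as two different filter algorithms examining the same snoop request, with the request dropped if either can prove the line is not cached locally. Both algorithms shown below, a snoop-history check and an address-range check, are illustrative stand-ins rather than the patent's specific filters.

        /* A sketch assuming two stand-in sub-filter algorithms; hardware
         * would run them simultaneously on the same request. */
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define HISTORY 8

        struct subfilter_state {
            uint64_t snooped[HISTORY];       /* lines already invalidated locally */
            unsigned next;
            uint64_t range_base, range_len;  /* address range the cache may hold  */
        };

        /* Sub-filter A: the line was already snooped, so it cannot be cached. */
        static bool not_cached_by_history(const struct subfilter_state *s,
                                          uint64_t line)
        {
            for (unsigned i = 0; i < HISTORY; i++)
                if (s->snooped[i] == line)
                    return true;
            return false;
        }

        /* Sub-filter B: the line lies outside everything the cache may hold. */
        static bool not_cached_by_range(const struct subfilter_state *s,
                                        uint64_t line)
        {
            return line < s->range_base || line >= s->range_base + s->range_len;
        }

        /* Forward only if no sub-filter can rule the line out. */
        static bool forward_snoop(struct subfilter_state *s, uint64_t line)
        {
            bool drop = not_cached_by_history(s, line) ||
                        not_cached_by_range(s, line);
            if (!drop) {
                s->snooped[s->next] = line;  /* record the snoop we forward */
                s->next = (s->next + 1) % HISTORY;
            }
            return !drop;
        }

        int main(void)
        {
            struct subfilter_state s = { .range_base = 0x1000, .range_len = 0x1000 };
            printf("%d\n", forward_snoop(&s, 0x1200)); /* 1: forwarded             */
            printf("%d\n", forward_snoop(&s, 0x1200)); /* 0: history hit, dropped  */
            printf("%d\n", forward_snoop(&s, 0x9000)); /* 0: out of range, dropped */
            return 0;
        }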

    METHOD AND APPARATUS FOR EFFICIENTLY TRACKING QUEUE ENTRIES RELATIVE TO A TIMESTAMP
    7.
    Invention application (expired)

    Publication number: US20090006672A1

    Publication date: 2009-01-01

    Application number: US11768800

    Filing date: 2007-06-26

    IPC classification: G06F3/00 G06F1/04

    CPC classification: G06F12/0835 G06F12/0831

    Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises coherence logic units, each having a plurality of queue structures, with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks the number of coherence event signals that were enqueued in the queue structure at receipt of a timestamp signal and have not yet been dequeued. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time the timestamp signal was received have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.

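    The counting scheme can be pictured with a minimal C sketch: asserting the timestamp snapshots the queue occupancy, dequeues drain that snapshot first in FIFO order, and a drained flag goes high once every pre-timestamp entry has left the queue. The software framing and names are assumptions; the patent describes hardware counters.

        /* A minimal sketch, assuming FIFO queue order. */
        #include <stdbool.h>

        struct tracked_queue {
            unsigned occupancy;        /* entries currently enqueued              */
            unsigned pre_ts_remaining; /* entries that predate the last timestamp */
        };

        static void enqueue(struct tracked_queue *q)
        {
            q->occupancy++;            /* new entries are post-timestamp */
        }

        static void dequeue(struct tracked_queue *q)
        {
            if (q->occupancy)
                q->occupancy--;
            if (q->pre_ts_remaining)   /* FIFO: the oldest (pre-timestamp) */
                q->pre_ts_remaining--; /* entries drain first              */
        }

        /* Asserted at the start of a memory synchronization operation. */
        static void timestamp(struct tracked_queue *q)
        {
            q->pre_ts_remaining = q->occupancy;
        }

        /* High once every entry present at the timestamp has been dequeued;
         * usable as part of the sync operation's completion condition. */
        static bool drained(const struct tracked_queue *q)
        {
            return q->pre_ts_remaining == 0;
        }

        int main(void)
        {
            struct tracked_queue q = {0};
            enqueue(&q);
            enqueue(&q);                /* two coherence events pending  */
            timestamp(&q);              /* sync starts: both are counted */
            enqueue(&q);                /* later event, not counted      */
            dequeue(&q);
            dequeue(&q);                /* the two pre-timestamp events  */
            return drained(&q) ? 0 : 1; /* drained: sync may complete    */
        }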

    Method and apparatus for efficiently tracking queue entries relative to a timestamp
    8.
    Invention grant (expired)

    Publication number: US08756350B2

    Publication date: 2014-06-17

    Application number: US11768800

    Filing date: 2007-06-26

    IPC classification: G06F3/00 G06F5/00

    CPC classification: G06F12/0835 G06F12/0831

    Abstract: An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises coherence logic units, each having a plurality of queue structures, with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and a counter tracks the number of coherence event signals that were enqueued in the queue structure at receipt of a timestamp signal and have not yet been dequeued. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time the timestamp signal was received have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation, and the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.

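    Complementing the sketch under item 7, the fragment below shows how the per-queue drained outputs could feed the completion condition of a memory synchronization operation. The reduction of each queue to a single counter field and the polling form are assumptions.

        /* A fragment assuming the queue state is reduced to the one counter
         * relevant here. */
        #include <stdbool.h>
        #include <stddef.h>

        struct queue_tracker {
            unsigned pre_ts_remaining;   /* entries older than the timestamp */
        };

        /* The sync completes only when every sender's queue has dequeued all
         * coherence events that were pending when the timestamp was asserted. */
        static bool msync_complete(const struct queue_tracker *q, size_t nqueues)
        {
            for (size_t i = 0; i < nqueues; i++)
                if (q[i].pre_ts_remaining != 0)
                    return false;
            return true;
        }

        int main(void)
        {
            struct queue_tracker queues[3] = { {0}, {1}, {0} };
            while (!msync_complete(queues, 3))
                queues[1].pre_ts_remaining--;  /* stand-in for real dequeues */
            return 0;
        }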

    MULTIPLE NODE REMOTE MESSAGING
    9.
    Invention application (in force)

    Publication number: US20090006546A1

    Publication date: 2009-01-01

    Application number: US11768784

    Filing date: 2007-06-26

    IPC classification: G06F15/16

    CPC classification: G06F15/16

    Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes, in which a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes controlling a DMA engine at the first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).

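    The descriptor flow can be sketched in C: node A injects one descriptor whose payload is itself a list of descriptors, and node B's DMA engine re-injects those embedded descriptors so that B sends the requested messages. The descriptor fields, FIFO layout, and function names below are illustrative, not the actual DMA engine interface.

        /* A sketch under stated assumptions; not the real DMA interface. */
        #include <stdint.h>

        struct msg_descriptor {
            uint32_t dest_node;       /* where the described message should go */
            uint64_t payload_addr;    /* source buffer on the sending node     */
            uint32_t payload_len;
        };

        #define FIFO_DEPTH 16

        struct injection_fifo {
            struct msg_descriptor slot[FIFO_DEPTH];
            unsigned head, tail;
        };

        static int fifo_put(struct injection_fifo *f, struct msg_descriptor d)
        {
            unsigned next = (f->tail + 1) % FIFO_DEPTH;
            if (next == f->head)
                return -1;            /* full */
            f->slot[f->tail] = d;
            f->tail = next;
            return 0;
        }

        /* Node A: one descriptor is injected locally; its payload carries the
         * descriptors that node B should itself inject and send. */
        static void remote_send_request(struct injection_fifo *a_fifo,
                                        uint32_t node_b,
                                        const struct msg_descriptor *remote_descs,
                                        uint32_t count)
        {
            struct msg_descriptor first = {
                .dest_node    = node_b,
                .payload_addr = (uint64_t)(uintptr_t)remote_descs,
                .payload_len  = count * (uint32_t)sizeof remote_descs[0],
            };
            fifo_put(a_fifo, first);  /* the single remote message from A to B */
        }

        /* Node B's receive side: the DMA engine re-injects each embedded
         * descriptor so B sends the requested messages without processor help. */
        static void on_remote_message(struct injection_fifo *b_fifo,
                                      const struct msg_descriptor *descs,
                                      uint32_t count)
        {
            for (uint32_t i = 0; i < count; i++)
                fifo_put(b_fifo, descs[i]);
        }

        int main(void)
        {
            struct injection_fifo a = {0}, b = {0};
            struct msg_descriptor wanted = {
                .dest_node = 3, .payload_addr = 0x2000, .payload_len = 64,
            };
            remote_send_request(&a, /*node_b=*/1, &wanted, 1);
            on_remote_message(&b, &wanted, 1);   /* as if delivered to node B */
            return 0;
        }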

    Multiple node remote messaging
    10.
    Invention grant (in force)

    Publication number: US07788334B2

    Publication date: 2010-08-31

    Application number: US11768784

    Filing date: 2007-06-26

    IPC classification: G06F15/167 G06F13/28

    CPC classification: G06F15/16

    Abstract: A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes, in which a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes controlling a DMA engine at the first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).

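    As a usage-oriented footnote to the sketch under item 9, one natural application of the mechanism is a remote get, where node A asks node B to send a region of B's memory back to A. The helper below builds the descriptor that B would inject; it is an assumption about how the mechanism might be used, not a claim from the patent.

        /* An illustrative helper only; field names follow the sketch under
         * item 9, not the real DMA interface. */
        #include <stdint.h>

        struct msg_descriptor {
            uint32_t dest_node;
            uint64_t payload_addr;
            uint32_t payload_len;
        };

        /* Descriptor that node B will inject: it tells B's DMA engine to send
         * 'len' bytes starting at 'remote_addr' (in B's memory) to node A. */
        static struct msg_descriptor make_remote_get(uint32_t node_a,
                                                     uint64_t remote_addr,
                                                     uint32_t len)
        {
            struct msg_descriptor d = {
                .dest_node    = node_a,
                .payload_addr = remote_addr,
                .payload_len  = len,
            };
            return d;
        }

        int main(void)
        {
            struct msg_descriptor d = make_remote_get(/*node_a=*/0, 0x4000, 128);
            (void)d;
            return 0;
        }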