Abstract:
While an asynchronous memory move (AMM) operation is ongoing, a prefetch request for data from the source effective address or the destination effective address triggers cache injection by the AMM mover of relevant data from the stream of data being moved in the physical memory. The memory controller forwards the first prefetched line to the prefetch engine and L1 cache, the next cache lines in the sequence of data to the L2 cache, and a subsequent set of cache lines to the L3 cache. The memory controller then forwards the remaining data to the destination memory location. Quick access to prefetch data is enabled by buffering the stream of data in the upper caches rather than placing all the moved data within the memory. Also, the memory controller places moved data into only a subset of the available cache lines of the upper level cache.
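The tiered placement described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `distribute_lines` and the per-level counts `l2_count` and `l3_count` are hypothetical, since the abstract does not say how many cache lines go to each level.

```python
def distribute_lines(lines, l2_count=2, l3_count=4):
    """Split a stream of moved cache lines across cache levels and memory.

    l2_count and l3_count are illustrative values, not taken from the abstract.
    """
    placement = {"L1": [], "L2": [], "L3": [], "memory": []}
    for i, line in enumerate(lines):
        if i == 0:
            # First prefetched line goes to the prefetch engine / L1 cache.
            placement["L1"].append(line)
        elif i <= l2_count:
            # Next lines in the sequence go to the L2 cache.
            placement["L2"].append(line)
        elif i <= l2_count + l3_count:
            # A subsequent set of lines goes to the L3 cache.
            placement["L3"].append(line)
        else:
            # Remaining data goes to the destination memory location.
            placement["memory"].append(line)
    return placement


placement = distribute_lines(list(range(10)))
```

With ten lines and the assumed counts, one line lands in L1, two in L2, four in L3, and the remaining three go to memory.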
Abstract:
A system for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture is provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of the processor chips are synchronized, since the heartbeat signals, which are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
Abstract:
A mechanism is provided for transmitting data in a data network. A first processor of the data network receives data to be transmitted to a second processor within the data network. A determination is made as to whether the data has previously been routed through an indirect communication link from a source processor, the indirect communication link being a communication link that does not directly couple the source processor to a final destination processor which is to receive the data. A communication link is selected over which to transmit the data from the first processor to the second processor based on the results of determining whether the data has previously been routed through an indirect communication link. Finally, the data is transmitted from the first processor to the second processor using the selected communication link.
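The selection step can be sketched as follows. The abstract does not state the selection policy, so this example assumes one: data that has already used an indirect link must continue on a direct link, while other data may fall back to an indirect link when every direct link is congested. The names `Packet`, `select_link`, and `congested` are hypothetical, and at least one direct link is assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str
    took_indirect: bool = False  # set once an indirect link has been used


def select_link(packet, links, congested=frozenset()):
    """Pick a link name from links, a list of (name, is_direct) pairs.

    Assumed policy (not from the abstract): at most one indirect hop.
    """
    direct = [name for name, is_direct in links if is_direct]
    indirect = [name for name, is_direct in links if not is_direct]
    if packet.took_indirect:
        # Already routed indirectly once: only direct links are allowed.
        return direct[0]
    for name in direct:
        if name not in congested:
            return name
    for name in indirect:
        if name not in congested:
            packet.took_indirect = True  # record the indirect hop
            return name
    return direct[0]  # everything congested: fall back to a direct link


links = [("d0", True), ("i0", False)]
pkt = Packet(dest="X")
first_choice = select_link(pkt, links, congested={"d0"})
second_choice = select_link(pkt, links, congested={"d0"})
```

Here the first call takes the indirect link because the direct link is congested, and the second call is forced back onto the direct link because the packet is already marked as indirectly routed.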
Abstract:
A distributed data processing system executes multiple tasks within a parallel job, including a first local task on a local node and at least one task executing on a remote node, with a remote memory having real address (RA) locations mapped to one or more of the source effective address (EA) and destination EA of a data move operation initiated by a task executing on the local node. On initiation of the data move operation, remote asynchronous data move (RADM) logic identifies that the operation moves data to/from a first EA that is memory mapped to an RA of the remote memory. The local processor/RADM logic initiates a RADM operation that moves a copy of the data directly from/to the remote memory by completing the RADM operation using the network interface cards (NICs) of the source and destination processing nodes, which are determined by accessing a data center for the node IDs of the remote memory.
Abstract:
A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane.
Abstract:
A mechanism is provided for reliable communication in a data processing system. A first processor of the data processing system determines a current state of links coupled to its ports. Each port of the first processor comprises a plurality of links to a corresponding port on a second processor of the data processing system. The current state of the links indicates a level of error associated with each link. The first processor determines, for each link, whether the level of error associated with the link exceeds a threshold. For each link whose level of error exceeds the threshold, the first processor tags the link with an error identifier in a switch associated with the ports of the first processor. The first processor reduces a level of usage for transmitting data on ports associated with links tagged with the error identifier.
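The tagging and usage-reduction steps can be modeled in a few lines. This is a minimal sketch: the function names, the error-level units, and the `penalty` weight are all assumptions, since the abstract does not say how usage is reduced.

```python
def tag_links(error_levels, threshold):
    """Return the set of link IDs whose error level exceeds the threshold."""
    return {link for link, err in error_levels.items() if err > threshold}


def port_weights(ports, tagged, penalty=0.5):
    """Assign a usage weight per port.

    ports maps a port name to its list of link IDs. A port with any tagged
    link gets the reduced (hypothetical) penalty weight; others get 1.0.
    """
    return {
        port: (penalty if any(link in tagged for link in port_links) else 1.0)
        for port, port_links in ports.items()
    }


error_levels = {"l0": 0.01, "l1": 0.20, "l2": 0.03}
tagged = tag_links(error_levels, threshold=0.05)
weights = port_weights({"p0": ["l0", "l1"], "p1": ["l2"]}, tagged)
```

In this example only `l1` exceeds the threshold, so port `p0` is used at the reduced weight while `p1` keeps full usage.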
Abstract:
A wake-and-go mechanism is provided for a data processing system. When a thread is waiting for an event, rather than performing a series of get-and-compare sequences, the thread updates a wake-and-go array with a target address associated with the event. Software may save the state of the thread. The thread is then put to sleep. When the wake-and-go array snoops a kill at a given target address, logic associated with the wake-and-go array may generate an exception, which may result in a switch to kernel mode, wherein the operating system performs some action before returning control to the originating process. In this case, the trap causes other software, such as the operating system or a background sleeper thread, to reload the thread from thread state storage and to continue processing of the active threads on the processor.
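As a rough software analogy of the wake-and-go array, the sketch below records which threads are sleeping on which target address and returns the threads to wake when a kill is snooped at that address. The class and method names are hypothetical, and the real mechanism is hardware logic rather than a dictionary.

```python
class WakeAndGoArray:
    """Toy model: maps a target address to the IDs of threads sleeping on it."""

    def __init__(self):
        self._waiting = {}

    def sleep_on(self, thread_id, target_addr):
        # The thread registers its target address instead of polling it.
        self._waiting.setdefault(target_addr, []).append(thread_id)

    def snoop_kill(self, addr):
        # A kill at addr wakes every thread registered on it;
        # an address nobody waits on wakes no one.
        return self._waiting.pop(addr, [])


array = WakeAndGoArray()
array.sleep_on(thread_id=7, target_addr=0x1000)
array.sleep_on(thread_id=9, target_addr=0x1000)
unrelated = array.snoop_kill(0x2000)
woken = array.snoop_kill(0x1000)
```

The point of the design is that the waiting threads consume no execution resources between `sleep_on` and the matching `snoop_kill`, in contrast to a get-and-compare polling loop.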
Abstract:
A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. If executing a portion of the set of helper thread binaries results in the retrieval of data needed by the set of main thread binaries, then that retrieved data is utilized by the set of main thread binaries.
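The helper-thread idea can be sketched with ordinary threads. Everything here is illustrative: `slow_load` stands in for an expensive data retrieval, the shared dictionary stands in for the cache, and the helper is joined before the main work starts only to keep the example deterministic.

```python
import threading


def slow_load(key):
    # Stand-in for an expensive memory or I/O access.
    return key * 2


def helper(keys, cache):
    # Helper thread: retrieve data ahead of the main thread.
    for k in keys:
        cache[k] = slow_load(k)


def main_work(keys, cache):
    # Main thread: use helper-retrieved data when present, else load it.
    total, hits = 0, 0
    for k in keys:
        if k in cache:
            hits += 1
            total += cache[k]
        else:
            total += slow_load(k)
    return total, hits


cache = {}
t = threading.Thread(target=helper, args=([1, 2, 3], cache))
t.start()
t.join()  # joined here only so the result is deterministic
total, hits = main_work([1, 2, 3, 4], cache)
```

The main thread gets the same result whether or not the helper retrieved a given item; the helper only changes how much of the retrieval cost the main thread pays.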
Abstract:
A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing, for example. The heterogeneous processing element model puts special purpose processing elements on the same playing field as processors from a programming perspective, an operating system perspective, and a power perspective. The operating system can dispatch work to a security engine, for example, in the same way it does to a processor.
Abstract:
A method is provided within a data processing system by which a processor executes an asynchronous memory move (AMM) store (ST) instruction to complete a corresponding AMM operation in parallel with an ongoing (not yet completed), previously issued barrier operation. The processor receives the AMM ST instruction after executing the barrier operation (or SYNC instruction) and before the completion of the barrier operation or SYNC on the system fabric. The processor continues executing the AMM ST instruction, which performs a move in virtual address space and then triggers the generation of the AMM operation. The AMM operation proceeds while the barrier operation continues, independent of the processor. The processor stops further execution of all other memory access requests, excluding AMM ST instructions, that are received after the barrier operation but before completion of the barrier operation.
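The issue rule in this abstract can be illustrated with a simple filter over memory access requests that arrive while a barrier is still outstanding. This is a sketch only: the request representation and the opcode strings `"AMM_ST"`, `"LOAD"`, and `"STORE"` are hypothetical.

```python
def split_during_barrier(requests):
    """Partition memory access requests received while a barrier is pending.

    requests is a list of (opcode, tag) pairs. AMM ST instructions may
    proceed; every other memory access request is held until the barrier
    completes.
    """
    issued = [r for r in requests if r[0] == "AMM_ST"]
    stalled = [r for r in requests if r[0] != "AMM_ST"]
    return issued, stalled


pending = [("LOAD", "a"), ("AMM_ST", "b"), ("STORE", "c"), ("AMM_ST", "d")]
issued, stalled = split_during_barrier(pending)
```

In this example the two AMM ST requests are issued in their original order while the load and the store wait for the barrier to complete.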