摘要:
A processor element, structured to execute a 32-bit fixed length instruction set architecture, is backward compatible with a 16-bit fixed length instruction set architecture by translating each of the 16-bit instructions into a sequence of one or more 32-bit instructions. Switching between 16-bit instruction execution and 32-bit instruction execution is accomplished by branch instructions that employ a least significant bit position of the address of the target of the branch to identify whether the target instruction is a 16-bit instruction or a 32-bit instruction.
摘要:
System, method, and apparatus for improving the performance of collective operations in High Performance Computing (HPC). Compute nodes in a networked HPC environment form collective groups to perform collective operations. A spanning tree is formed including the compute nodes and switches and links used to interconnect the compute nodes, wherein the spanning tree is configured such that there is only a single route between any pair of nodes in the tree. The compute nodes implement processes for performing the collective operations, which includes exchanging messages between processes executing on other compute nodes, wherein the messages contain indicia identifying collective operations they belong to. Each switch is configured to implement message forwarding operations for its portion of the spanning tree. Each of the nodes in the spanning tree implements a ratcheted cyclical state machine that is used for synchronizing collective operations, along with status messages that are exchanged between nodes. Transaction IDs are also used to detect out-of-order and lost messages.
摘要:
Methods of managing a cache memory system in a data processing system are disclosed. The data processing system executes instructions and stores and receives data from a memory having locations in a memory space. The entries of the cache memory are in locations in a register space separate from the memory space. A first instruction that operates only on locations in a register space but not on locations in memory space may be executed to obtain address information from at least one entry of the cache memory. The obtained address information be compared with target address information. If the comparison between the obtained address information and the target address information results in a correspondence, then a first operation may be performed on the entry of the cache memory. If the comparison between the obtained address information and the target address information does not result in a correspondence, then the fit first operations not performed on the entry of the cache memory. Management operations may thus be performed on the cache memory without using locations in memory space. The first operation may include invalidate, flush or purge operations. The cache memory may be a virtual cache memory that has a plurality of entries each including physical address information and logical address information. The obtained address information, may be logical address information or physical address information. The first instruction may be a GET instruction for reading information from entries of the translation lookaside buffer or the cache memory. The second instruction may be a PUT instruction for writing information to entries of the translation lookaside buffer or the cache memory.
摘要:
Methods and apparatus for sending packets using optimized PIO write sequences without sfences and out-of-order credit returns. Sequences of Programmed Input/Output (PIO) write instructions to write packet data to a PIO send memory are received by a processor in an original order and executed out of order, resulting in the packet data being written to send blocks in the PIO send memory out of order, while the packets themselves are stored in sequential order once all of the packet data is written. The packets are egressed out of order by egressing packet data contained in the send blocks to an egress block using a non-sequential packet order that is different than the sequential packet order. In conjunction with egressing the packets, corresponding credits are returned in the non-sequential packet order. A block list comprising a linked list and a free list are used to facilitate out-of-order packet egress and corresponding out-of-order credit returns.
摘要:
A method and apparatus for improved performance for reloading translation look-aside buffers in multithreading, multi-core processors. TSB prediction is accomplished by hashing a plurality of data parameters and generating an index that is provided as an input to a predictor array to predict the TSB page size. In one embodiment of the invention, the predictor array comprises two-bit saturating up-down counters that are used to enhance the accuracy of the TSB prediction. The saturating up-down counters are configured to avoid making rapid changes in the TSB prediction upon detection of an error. Multiple misses occur before the prediction output is changed. The page size specified by the predictor index is searched first. Using the technique described herein, errors are minimized because the counter leads to the correct result at least half the time.
摘要:
In an embodiment, an out-of-order, reliable, end-to-end protocol is provided that can enable direct user-level data placement and atomic operations between nodes of a multi-node network. The protocol may be optimized for low-loss environments such as High Performance Computing (HPC) applications, and may enable loss detection and de-duplication of packets through the use of a robust window state manager at a target node. A multi-node network implementing the protocol may have increased system reliability, packet throughput, and increased tolerance for adaptively routed traffic, while still allowing atomic operations to be idempotently applied directly to a user memory location.
摘要:
System, method, and apparatus for improving the performance of collective operations in High Performance Computing (HPC). Compute nodes in a networked HPC environment form collective groups to perform collective operations. A spanning tree is formed including the compute nodes and switches and links used to interconnect the compute nodes, wherein the spanning tree is configured such that there is only a single route between any pair of nodes in the tree. The compute nodes implement processes for performing the collective operations, which includes exchanging messages between processes executing on other compute nodes, wherein the messages contain indicia identifying collective operations they belong to. Each switch is configured to implement message forwarding operations for its portion of the spanning tree. Each of the nodes in the spanning tree implements a ratcheted cyclical state machine that is used for synchronizing collective operations, along with status messages that are exchanged between nodes. Transaction IDs are also used to detect out-of-order and lost messages.
摘要:
Methods of widening the permission for a memory access in a data processing system having a virtual cache memory and a translation lookaside buffer are disclosed. A memory access operation is initiated on a predetermined memory location based on logical address information and permission information associated with the memory access operation. The virtual cache memory is accessed and a determination may be made if there is a match between logical address information of the memory access operation and logical address information stored in the entries of the virtual cache. In the event of a match, then a determination may be made based on the permission information of the memory access operation and the permission information of the particular entry of the virtual cache memory as to whether the memory access operation is permitted. If the memory access operation is not permitted by the permission information of the particular entry of the virtual cache memory, then the translation lookaside buffer may be accessed based on the logical address information of the particular entry of the virtual cache memory. If there is a match between the logical address information of the particular entry of the virtual cache memory and the logical address information of a particular entry of the translation lookaside buffer, then a determination may be made based on the permission information of the memory access operation and the permission information of the particular entry of the translation lookaside buffer as to whether the memory access operation is permitted by the permission information of the particular entry of the translation lookaside buffer. If the memory access operation is permitted by the permission information of the particular entry of the translation lookaside buffer, then the permission information of the particular entry of the virtual cache memory may be updated based on the permission information of the particular entry of the translation lookaside buffer and the memory access operation may be completed.
摘要:
Methods of widening the permission for a memory access in a data processing system having a virtual cache memory and a translation lookaside buffer are disclosed. A memory access operation is initiated on a predetermined memory location based on logical address information and permission information associated with the memory access operation. The virtual cache memory is accessed and a determination may be made if there is a match between logical address information of the memory access operation and logical address information stored in the entries of the virtual cache. In the event of a match, then a determination may be made based on the permission information of the memory access operation and the permission information of the particular entry of the virtual cache memory as to whether the memory access operation is permitted. If the memory access operation is not permitted by the permission information of the particular entry of the virtual cache memory, then the translation lookaside buffer may be accessed based on the logical address information of the particular entry of the virtual cache memory. If there is a match between the logical address information of the particular entry of the virtual cache memory and the logical address information of a particular entry of the translation lookaside buffer, then a determination may be made based on the permission information of the memory access operation and the permission information of the particular entry of the translation lookaside buffer as to whether the memory access operation is permitted by the permission information of the particular entry of the translation lookaside buffer. If the memory access operation is permitted by the permission information of the particular entry of the translation lookaside buffer, then the permission information of the particular entry of the virtual cache memory may be updated based on the permission information of the particular entry of the translation lookaside buffer and the memory access operation may be completed.