摘要:
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.
摘要:
A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
摘要:
A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
摘要:
Apparatus, systems, and methods disclosed herein may download one or more sets of presentation rendering property values (PRPVs) to a client processing device. Each set of PRPVs may be associated with a set of presentation characteristics of a sensory output device (SOD) used to present a multimedia presentation, a state of the SOD, or a state of a filter used to filter the presentation. A subset of PRPVs may be associated with a presentation element. A presentation page containing the presentation element may be assembled at a multi-rendered presentation server from a template, a set of controls, a view, and a dataset. The page may be assembled such that the set of PRPVs, acting upon the presentation element at the client processing device, is capable of rendering the presentation element for presentation at the SOD with a predetermined set of perceptual characteristics compatible with SOD presentation characteristics.
摘要:
Identified herein are sperm ligand proteins located in the membrane of sperm, which proteins interact with the membrane of oocytes. Methods of using these proteins, or fragments or derivatives or analogs thereof, are also described. These include methods of increasing (or reducing) successful fertilization, for instance through improved sperm-oocyte binding, fusion or activation (or the blocking thereof); methods of preventing fertilization of an oocyte, for instance by inducing an immune response to at least one sperm ligand that promotes sperm-oocyte binding, sperm-oocyte fusion or oocyte activation; and methods for enhancing assisted reproductive technologies, for instance through stimulation of activation with nuclear transfer, stimulation of inactive or weak sperm, and so forth.
摘要:
A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each of the memory writing sources is directly connected to the dedicated input ports of all other snoop filter devices associated with all other processing units in a point-to-point interconnect fashion. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
摘要:
A technique for solving a set of wave equations in a region uses points arranged in a grid spanning the region or coefficients of wave expansion for objects located in the region. The grid points or the coefficients are partitioned into blocks on multiple levels, and block impedance matrices encoding the wave equations is derived for pairs of blocks. The block impedance matrix need not be calculated as it is written as the product of two non-square matrices, denoted U and V. Each of U and V have one linear dimension which is only of the order of the rank of the block impedance matrix levels. The rank is predetermined by coarse sampling. Two examples of the use of the invention are given: solving surface integral equations and Foldy Lax equations for partial waves.
摘要:
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
摘要:
A fault isolation technique for checking the accuracy of data packets transmitted between nodes of a parallel processor. An independent crc is kept of all data sent from one processor to another, and received from one processor to another. At the end of each checkpoint, the crcs are compared. If they do not match, there was an error. The crcs may be cleared and restarted at each checkpoint. In the preferred embodiment, the basic functionality is to calculate a CRC of all packet data that has been successfully transmitted across a given link. This CRC is done on both ends of the link, thereby allowing an independent check on all data believed to have been correctly transmitted. Preferably, all links have this CRC coverage, and the CRC used in this link level check is different from that used in the packet transfer protocol. This independent check, if successfully passed, virtually eliminates the possibility that any data errors were missed during the previous transfer period.
摘要:
Disclosed are an error recovery method and system for use with a communication system having first and second nodes, each of said nodes having a receiver and a sender, the sender of the first node being connected to the receiver of the second node by a first cable, and the sender of the second node being connected to the receiver of the first node by a second cable. The method comprising the step of after one of the nodes detects an error, both of the nodes entering the same defined state. In particular, the receiver of the first node enters an error state, stays in the error state for a defined period of time T, and, after said defined period of time T, enters a wait state. Also, the sender of the first node sends to the receiver of the second node an error message for a defined period of time Te, and after the defined period of time Te, the sender of the first node enters an idle state.