Abstract:
A method and apparatus including a cache controller coupled to a cache memory, wherein the cache controller receives a plurality of cache access requests, performs a pre-sorting of the plurality of cache access requests by a first stage of the cache controller to order the plurality of cache access requests, wherein the first stage functions by performing a pre-sorting and pre-clustering process on the plurality of cache access requests in parallel to map the plurality of cache access requests from a first position to a second position corresponding to ports or banks of the cache memory, performs the combining and splitting of the plurality of cache access requests by a second stage of the cache controller, and applies the plurality of cache access requests to the cache memory at line speed.
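The first-stage mapping can be pictured as grouping incoming requests by their target bank. The following is a minimal sketch of that idea, assuming a simple modulo bank-selection rule and illustrative request fields; none of the names or the bank count below come from the abstract itself.

```python
from collections import defaultdict

NUM_BANKS = 4  # illustrative bank count, not taken from the abstract

def pre_sort_requests(requests):
    """First-stage sketch: map each request from its arrival position to a
    target bank (its "second position") so requests aimed at different banks
    can be handled in parallel."""
    clusters = defaultdict(list)
    for req in requests:
        bank = req["address"] % NUM_BANKS  # hypothetical bank-selection rule
        clusters[bank].append(req)
    return dict(clusters)

# Example: eight requests arriving together in one cycle.
incoming = [{"id": i, "address": a} for i, a in enumerate([3, 7, 4, 12, 5, 9, 8, 1])]
print(pre_sort_requests(incoming))
```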
Abstract:
An apparatus for localized and random data access is described herein. The apparatus includes a multi-bank memory, a queue, and an output buffer. The multi-bank memory is to store address locations of imaging data. The queue corresponds to each bank of the multi-bank memory, and the queue is to store addresses from the multi-bank memory for data access. The output buffer is to store data accessed based on addresses from the queue.
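A minimal software sketch of the described arrangement follows, assuming one address queue per bank and a shared output buffer; the class and method names are hypothetical and only illustrate the data flow.

```python
from collections import deque

class MultiBankAccess:
    """Sketch: per-bank address queues feeding a shared output buffer."""
    def __init__(self, num_banks, bank_data):
        self.banks = bank_data                              # bank -> {address: data}
        self.queues = [deque() for _ in range(num_banks)]   # one queue per bank
        self.output_buffer = []

    def enqueue(self, bank, address):
        self.queues[bank].append(address)

    def drain_one_cycle(self):
        # Each bank services at most one queued address per cycle.
        for bank, q in enumerate(self.queues):
            if q:
                addr = q.popleft()
                self.output_buffer.append(self.banks[bank][addr])

# Illustrative use with two banks holding imaging samples.
mem = MultiBankAccess(2, [{0: "px00", 1: "px01"}, {0: "px10", 1: "px11"}])
mem.enqueue(0, 1)
mem.enqueue(1, 0)
mem.drain_one_cycle()
print(mem.output_buffer)   # ['px01', 'px10']
```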
Abstract:
Systems and methods are provided for providing scatter/gather data processing. In accordance with an embodiment, such a system can include a cluster of one or more high performance computing systems, each including one or more processors and a high performance memory. The cluster communicates over an InfiniBand network. The system can also include a middleware environment, executing on the cluster, that includes one or more application server instances. The system can further include a plurality of muxers. Each application server instance includes at least one muxer, and each muxer is operable to collect data from a plurality of locations in the high performance memory, and transfer the data in bulk.
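The core muxer behavior, gathering scattered buffers and handing them off as a single bulk transfer, can be sketched as below. The in-process dictionary and the send callback are stand-ins for the high performance memory and the InfiniBand transport; they are assumptions for illustration only.

```python
def gather_and_send(memory, locations, send):
    """Muxer-style gather sketch: collect buffers from several memory
    locations, then hand them off as one bulk transfer."""
    chunks = [memory[loc] for loc in locations]  # scattered reads
    send(b"".join(chunks))                       # single bulk transfer

# Illustrative stand-ins for high performance memory and the network send.
memory = {0x10: b"hdr", 0x20: b"payload", 0x30: b"crc"}
gather_and_send(memory, [0x10, 0x20, 0x30], send=lambda buf: print(len(buf), buf))
```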
Abstract:
The disclosure includes, in general, among other aspects, an apparatus having multiple programmable units integrated within a processor. The apparatus has circuitry to map addresses in a single address space to resources within the multiple programmable units where the single address space includes addresses for different ones of the resources in different ones of the multiple programmable units and where there is a one-to-one correspondence between respective addresses in the single address space and resources within the multiple programmable units.
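The single-address-space idea can be illustrated by building a flat map in which every address names exactly one resource of one programmable unit, and every such resource has exactly one address. The layout below is a hypothetical sketch, not the circuitry described in the disclosure.

```python
# Sketch: a flat address space over per-unit resources with a one-to-one
# correspondence between addresses and (unit, resource) pairs.
units = {
    0: ["reg_a", "reg_b", "scratch"],   # resources of programmable unit 0 (hypothetical)
    1: ["reg_a", "reg_b", "scratch"],   # resources of programmable unit 1 (hypothetical)
}

address_map = {}
next_addr = 0
for unit, resources in units.items():
    for res in resources:
        address_map[next_addr] = (unit, res)
        next_addr += 1

# One-to-one: every address names a single resource, and no resource repeats.
assert len(set(address_map.values())) == len(address_map)
print(address_map[4])   # (1, 'reg_b')
```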
Abstract:
A system and method can support in-memory session replication in a server cluster using a lazy deserialization approach. The server cluster can include a primary application server and a secondary application server. The primary application server operates to receive a request associated with a session from a client and maintains session information associated with the session. Based on the session information, the primary application server can respond to the client. The secondary application server operates to receive and maintain serialized session information from the primary application server. The secondary application server operates to update the serialized session information based on one or more session updates received from the primary application server. When the primary application server fails, the secondary application server can generate deserialized session information based on the updated serialized session information and respond to the client.
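The lazy-deserialization flow can be sketched as follows: the primary works with live session objects and replicates only their serialized form, and the secondary deserializes only on failover. The classes, pickle-based serialization, and method names are assumptions chosen for illustration, not the patented implementation.

```python
import pickle

class SecondaryServer:
    """Sketch: holds serialized bytes and deserializes lazily, on failover."""
    def __init__(self):
        self.serialized = {}

    def store(self, session_id, blob):
        self.serialized[session_id] = blob

    def failover(self, session_id):
        # Only now is the session information deserialized and served.
        return pickle.loads(self.serialized[session_id])

class PrimaryServer:
    """Sketch: keeps live session objects and ships serialized copies."""
    def __init__(self, secondary):
        self.sessions = {}
        self.secondary = secondary

    def handle_request(self, session_id, update):
        session = self.sessions.setdefault(session_id, {})
        session.update(update)
        # Replicate the serialized form only; the secondary does not deserialize yet.
        self.secondary.store(session_id, pickle.dumps(session))
        return session

secondary = SecondaryServer()
primary = PrimaryServer(secondary)
primary.handle_request("s1", {"user": "alice"})
print(secondary.failover("s1"))   # {'user': 'alice'}
```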
Abstract:
A method for line speed interconnect processing. The method includes receiving initial inputs from an input communications path, performing a pre-sorting of the initial inputs by using a first stage interconnect parallel processor to create intermediate inputs, and performing the final combining and splitting of the intermediate inputs by using a second stage interconnect parallel processor to create resulting outputs. The method further includes transmitting the resulting outputs out of the second stage at line speed.
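The second-stage step, combining small intermediate inputs and splitting oversized ones into outputs of a fixed downstream width, is sketched below. The packing rule and the width parameter are illustrative assumptions, not taken from the method itself.

```python
def combine_and_split(intermediate, max_payload):
    """Second-stage sketch: merge small adjacent inputs and split oversized
    ones so every resulting output fits the downstream width."""
    outputs, current, size = [], [], 0

    def flush():
        nonlocal current, size
        if current:
            outputs.append("".join(current))
            current, size = [], 0

    for item in intermediate:
        if len(item) > max_payload:
            flush()
            # Split: an oversized input is emitted in max_payload-sized pieces.
            while len(item) > max_payload:
                outputs.append(item[:max_payload])
                item = item[max_payload:]
        # Combine: pack small inputs together until the output width is reached.
        if size + len(item) > max_payload:
            flush()
        current.append(item)
        size += len(item)
    flush()
    return outputs

print(combine_and_split(["ab", "c", "defghijk", "lm"], max_payload=4))
# ['abc', 'defg', 'hijk', 'lm']
```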
Abstract:
A calibratable communications link includes multiple parallel lines. Calibration is performed at dynamically variable and/or interruptible intervals determined by an automated mechanism. Calibration is preferably initiated responsive to a command generated by an executable software process, which initiates calibration responsive to detection of a probable impending need as indicated by, e.g., temperature change, calibrated parameter drift, error rate, etc. Calibration is also preferably initiated according to probable minimal disruption of device function, as indicated by low activity level. Furthermore, in one aspect calibration may be temporarily suspended to transmit data and then resumed.
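The trigger policy described above, calibrate when a probable impending need is detected and prefer moments of low link activity, can be sketched as a simple predicate. All thresholds and parameter names below are hypothetical and only illustrate the decision structure.

```python
def should_calibrate(temp_delta, drift, error_rate, activity,
                     temp_limit=5.0, drift_limit=0.1, error_limit=1e-6):
    """Sketch of the calibration trigger: fire on a probable impending need
    (temperature change, parameter drift, error rate), preferably when the
    link is mostly idle. Thresholds are illustrative, not from the source."""
    need = (abs(temp_delta) > temp_limit or
            abs(drift) > drift_limit or
            error_rate > error_limit)
    low_activity = activity < 0.2    # hypothetical "low activity" cutoff
    return need and low_activity

print(should_calibrate(temp_delta=7.3, drift=0.02, error_rate=0.0, activity=0.05))  # True
print(should_calibrate(temp_delta=7.3, drift=0.02, error_rate=0.0, activity=0.90))  # False
```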
Abstract:
In a parallel computer, performing a rooted-v collective operation by an operational group of compute nodes includes: identifying, in source code by a collective algorithm selection optimizing module, a gather operation followed by a rooted-v collective operation; replacing, by the collective algorithm selection optimizing module, the gather operation with an allgather operation; executing, by the compute nodes, the allgather operation; selecting, by each compute node in dependence upon results of the allgather operation, an algorithm for effecting the rooted-v collective operation; and executing, by each compute node, the rooted-v collective operation with the selected algorithm.
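The reason the gather is replaced with an allgather is that afterwards every compute node holds the same global view and can independently select the same algorithm for the rooted-v collective. The sketch below simulates this in a single process; the selection rule and the size threshold are assumptions for illustration.

```python
def allgather(contributions):
    """In-process stand-in for an allgather: every rank sees every contribution."""
    return {rank: dict(contributions) for rank in contributions}

def select_algorithm(all_counts):
    """Hypothetical selection rule: choose an algorithm for the rooted-v
    collective from the element counts now known on every rank."""
    total = sum(all_counts.values())
    return "binomial_tree" if total < 1024 else "pipelined_ring"

# Each rank's element count for the subsequent rooted-v (e.g. gatherv) operation.
counts = {0: 128, 1: 64, 2: 512, 3: 700}
everyone = allgather(counts)
choices = {rank: select_algorithm(view) for rank, view in everyone.items()}
print(choices)   # every rank reaches the same choice independently
```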
Abstract:
In a parallel computer, a largest logical plane from a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: identifying, by each compute node of the subcommunicator, all logical planes that include the compute node; calculating, by each compute node for each identified logical plane that includes the compute node, an area of the identified logical plane; initiating, by a root node of the subcommunicator, a gather operation; receiving, by the root node from each compute node of the subcommunicator, each node's calculated areas as contribution data to the gather operation; and identifying, by the root node in dependence upon the received calculated areas, a logical plane of the subcommunicator having the greatest area.
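The steps above, each node computing the areas of the planes it belongs to, the root gathering those areas, and the root picking the plane with the greatest area, can be mimicked in a single process as below. The plane records, node membership, and area formula are illustrative assumptions.

```python
def planes_including(node, planes):
    """Logical planes of the subcommunicator that contain the given node."""
    return [p for p in planes if node in p["nodes"]]

def area(plane):
    return plane["width"] * plane["height"]

# Illustrative logical planes formed of compute nodes of the subcommunicator.
planes = [
    {"name": "P0", "nodes": {0, 1, 2, 3}, "width": 2, "height": 2},
    {"name": "P1", "nodes": {0, 1, 2, 3, 4, 5}, "width": 3, "height": 2},
]
nodes = range(6)

# Each node calculates the areas of the planes that include it ...
contributions = {n: {p["name"]: area(p) for p in planes_including(n, planes)}
                 for n in nodes}

# ... and the root node gathers the contributions and picks the largest plane.
gathered = {}
for areas in contributions.values():
    gathered.update(areas)
print(max(gathered, key=gathered.get))   # 'P1'
```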