Abstract:
An interconnect structure is disclosed comprising a collection of input ports, a collection of output ports, and a switching element. Data enters the switching element only at specific data entry times. The interconnect structure includes a collection of synchronizing elements. Data in the form of packets enter the input ports in an asynchronous fashion. The data packets pass from the input ports to the synchronizing units. The data exits the synchronizing units and enters the switching element with each packet arriving at the switching element at a specific data entry time.
Abstract:
An interconnection network has a first stage network and a second stage network and a collection of devices outside the network so that a first device is capable of sending data to a second device. The first stage network is connected to inputs of the second stage network. The first and second stage networks each have more outputs than inputs. The data is first sent from the first device to the first stage network and then from the first stage network to the second stage network. The data is sent to the second device from the second stage network. The number of inputs to a device w the collection of devices from the second stage network exceeds the number of outputs from device w into the first stage network. The device w with N p input ports is capable of simultaneously receiving data from N p devices in the collection of devices. The latency through the entire system may be a fixed constant.
Abstract:
Techniques are disclosed relating to performing Fast Fourier Transforms (FFTs) using distributed processing. In some embodiments, results of local transforms that are performed in parallel by networked processing nodes are scattered across processing nodes in the network and then aggregated. This may transpose the local transforms and store data in the correct placement for performing further local transforms to generate a final FFT result. The disclosed techniques may allow latency of the scattering and aggregating to be hidden behind processing time, in various embodiments, which may greatly reduce the time taken to perform FFT operations on large input data sets.
Abstract:
Techniques are disclosed relating to parallel computing. In some embodiments, fine-grained data communication facilitates operations on large data sets such as multiplication of a sparse matrix by a vector. In this example, a first data set (the matrix) and a second data set (the vector) are distributed across multiple processing nodes. Performance of the overall multiplication operation may require communication of data among the processing nodes. In various embodiments, fine-grained communication of this data may reduce processing times and/or power consumption by avoiding congestion.
Abstract:
An interconnect structure is disclosed comprising a collection of input ports, a collection of output ports, and a switching element. Data enters the switching element only at specific data entry times. The interconnect structure includes a collection of synchronizing elements. Data in the form of packets enter the input ports in an asynchronous fashion. The data packets pass from the input ports to the synchronizing units. The data exits the synchronizing units and enters the switching element with each packet arriving at the switching element at a specific data entry time.
Abstract:
This invention is directed to a parallel information generation, distribution and processing system (900). This scalable, pipelined control and switching system (900) efficiently and fairly manages a plurality of incoming data streams (132, 134), and applies class and quality of service requirements. The present invention also uses scalable MLML switch fabrics to control a data packet switch (930), including a request-processing switch (104) used to control the data-packet switch (930). Also included is a request processor (106) for each output port, which manages and approves all data flow to that output port, and an answer switch (108) which transmits answer packets from request processors (106) back to requesting input ports.
Abstract:
An interconnect structure comprises a plurality of network-connected devices and a logic (130) adapted to control a first subset os the network-connected devices (120) to transmit data and simultaneously control a second subset of the network-connected devices (140) to prepare for data transmission at a future time. The logic can execute an operation that actives a data transmission action upon realization of at least one predetermined criterion.
Abstract:
In a system, a memory controller separates a memory into multiple banks and enables a plurality of selected banks to be accessed concurrently. The memory controller further comprises a logic that creates a representation of a tree structure in memory and builds routing tables accessed by pointers at nodes in the tree memory structure, and a logic that finds a target memory address based on a received Internet Protocol (IP) address used by the tree memory structure and the routing table.
Abstract:
An interconnect structure (100) having a plurality of input ports and a plurality of output ports, including an input controller (150) which requests permission from predetermined logic within the structure to inject an entire message through two stages of data switches. The request contains only a portion of the address for a message target output with the amount of target output addresses supplied by the input controller (150) depending upon the data rate of the target output port.
Abstract:
An interconnect structure and method of communicating messages on the interconnect structure assists high priority messages to travel through the interconnect structure at a faster rate than normal or low priority messages. An interconnect structure includes a plurality of nodes with a plurality of interconnect lines selectively coupling the nodes in a hierarchical multiple-level structure. Data moves from an uppermost source level to a lowermost destination level. Nodes in the structure are arranged in columns and levels. Nodes in the structure are arranged in columns and levels. Data wormholes through the structure and, in a given time-step, data always moves from onee column to an adjacent column and while remaining on the same level or moving down to a lower level. When data moves down a level, an additional bit of the target output is fixed so data exiting from the bottom of the structure arrives at the proper target output port.