Abstract:
A Hardware Description Language (HDL) utilizing a Term Rewriting System (TRS) is provided that simplifies the handling of clocks and of signaling between the various clock domains of a multi-clock-domain circuit specification. A specific clock data type is supplied for use with clock signals. Using the clock data type and other requirements of a circuit specification, clock domain crossing between domains whose clocks belong to the same clock family is handled implicitly. For clock domain crossing between clock domains driven by clocks of different clock families, a “hardware approach” and a “linguistic approach” are provided. The “hardware approach” provides facilities to explicitly specify a synchronizer, using, in part, TRS rules. The “linguistic approach” allows a designer to abstract away the instantiation of synchronizers and instead write the circuit specification in terms of differently clocked interfaces.
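The crossing rule can be modeled in a few lines. This is a Python sketch, not the HDL itself: `Clock`, `crossing_kind`, and `TwoFlopSync` are hypothetical names, and the two-register synchronizer is a common textbook design assumed here for illustration, not necessarily the synchronizer the abstract has in mind.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clock:
    """Hypothetical clock value: a name plus the clock family it belongs to."""
    name: str
    family: str

def crossing_kind(src: Clock, dst: Clock) -> str:
    """Classify a signal crossing from the src domain into the dst domain."""
    if src == dst:
        return "same domain"
    if src.family == dst.family:
        # Same family: per the abstract, such crossings are handled implicitly.
        return "implicit"
    # Different families: an explicit synchronizer ("hardware approach") or a
    # differently clocked interface ("linguistic approach") is needed.
    return "synchronizer required"

class TwoFlopSync:
    """Two registers in series, both clocked by the destination clock."""
    def __init__(self) -> None:
        self.stage1 = 0
        self.stage2 = 0

    def dst_edge(self, async_in: int) -> int:
        # On each destination-clock edge the two registers shift together.
        self.stage2, self.stage1 = self.stage1, async_in
        return self.stage2

core = Clock("core", family="pll0")
core_div2 = Clock("core_div2", family="pll0")
usb = Clock("usb", family="xtal48")

print(crossing_kind(core, core_div2))  # implicit
print(crossing_kind(core, usb))        # synchronizer required

sync = TwoFlopSync()
print([sync.dst_edge(1) for _ in range(3)])  # [0, 1, 1]
```

The synchronized value reaches the output on the second destination edge, which is the latency a designer accepts in exchange for reducing metastability risk.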
Abstract:
A multithreaded parallel data processing system has at least one processing element for processing multiple threads of computation. Threads are described by thread descriptors, which are stored in a thread descriptor storage while waiting to be processed. Each thread descriptor comprises an instruction pointer and a frame pointer. The instruction pointer points to the next instruction to be executed, and the frame pointer points to a frame of memory locations on which that instruction will operate. The instruction set of the at least one processing element includes a load instruction that loads global data into local processing element memory in two phases: a request phase and a response phase. Also included are instructions to fork a thread into two threads and to join two threads into a single thread.
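The two-phase load can be sketched as follows, assuming a single processing element whose "global memory" is a Python dict. `ThreadDescriptor`, `ProcessingElement`, `load_request`, and `deliver_responses` are hypothetical names; the point is only that the request phase returns immediately, so other ready threads can run, and the response phase fills a frame slot and makes a continuation thread ready.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreadDescriptor:
    ip: str  # instruction pointer: label of the next instruction to run
    fp: int  # frame pointer: index of the frame that instruction operates on

class ProcessingElement:
    def __init__(self, global_mem: dict) -> None:
        self.ready = deque()     # thread descriptor storage
        self.frames = {}         # local processing element memory, by frame
        self.global_mem = global_mem
        self.pending = deque()   # load requests still in flight

    def load_request(self, addr: int, td: ThreadDescriptor,
                     slot: str, cont_ip: str) -> None:
        """Request phase: record the load and return without blocking."""
        self.pending.append((addr, td.fp, slot, cont_ip))

    def deliver_responses(self) -> None:
        """Response phase: write each value into its frame slot, then make
        the continuation thread ready."""
        while self.pending:
            addr, fp, slot, cont_ip = self.pending.popleft()
            self.frames[fp][slot] = self.global_mem[addr]
            self.ready.append(ThreadDescriptor(cont_ip, fp))

pe = ProcessingElement(global_mem={0x100: 42})
pe.frames[0] = {}
pe.load_request(0x100, ThreadDescriptor("start", 0), slot="x", cont_ip="use_x")
pe.ready.append(ThreadDescriptor("other_work", 1))  # runs while the load is in flight
pe.deliver_responses()
print(pe.frames[0])     # {'x': 42}
print(list(pe.ready))   # [ThreadDescriptor(ip='other_work', fp=1), ThreadDescriptor(ip='use_x', fp=0)]
```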
Abstract:
A multiprocessor system comprises a plurality of processing nodes, each node processing multiple threads of computation. Each node includes a data processor that sequentially processes blocks of code, each block defining a thread of computation. The code includes instructions to send start messages with data values to start new threads of computation. Each node also includes a synchronization coprocessor that processes start messages from the same node and from other nodes of the system. The coprocessor takes messages from a message queue and uses them to store values as operands for threads of computation, to determine when all operands required for a thread of computation have been received, and to place in a continuation queue an indication to the data processor that a thread of computation may be initiated. The data processor subsequently initiates that thread of computation nonsynchronously. Alternatively, a single processor may perform the continuation and message processing functions in an interleaved sequence. The data processor creates messages to remote nodes using a global virtual address, which is translated before transmission into a node designation and a local virtual address at the remote node.
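The coprocessor's message handling and the address translation can be sketched in Python. `Frame`, `SyncCoprocessor`, `translate`, and `LOCAL_BITS` are hypothetical; counting received operands per frame is one plausible way to "determine when all operands have been received", and the address layout (high bits name the node, low 24 bits are the local virtual address) is an assumption made only for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Tuple

@dataclass
class Frame:
    needed: int                                   # operands required to start
    operands: dict = field(default_factory=dict)  # slot -> received value

class SyncCoprocessor:
    def __init__(self) -> None:
        self.message_queue = deque()       # start messages from any node
        self.continuation_queue = deque()  # (ip, frame_id) ready to run
        self.frames = {}

    def process_messages(self) -> None:
        while self.message_queue:
            frame_id, slot, value, ip = self.message_queue.popleft()
            frame = self.frames[frame_id]
            frame.operands[slot] = value   # store the value as an operand
            if len(frame.operands) == frame.needed:
                # All operands present: signal the data processor.
                self.continuation_queue.append((ip, frame_id))

LOCAL_BITS = 24  # hypothetical: low 24 bits hold the local virtual address

def translate(gva: int) -> Tuple[int, int]:
    """Split a global virtual address into (node, local virtual address)."""
    return gva >> LOCAL_BITS, gva & ((1 << LOCAL_BITS) - 1)

cp = SyncCoprocessor()
cp.frames[7] = Frame(needed=2)
cp.message_queue.extend([(7, "a", 3, "add"), (7, "b", 4, "add")])
cp.process_messages()
print(list(cp.continuation_queue))  # [('add', 7)]
print(translate(0x0200_1234))       # (2, 4660)
```

Only the second message completes the frame, so exactly one continuation is queued; the data processor can then pick it up whenever it is free.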
Abstract:
A system for learning and applying a task and data parallel strategy to an application that includes at least one task for processing an input data stream to produce an output data stream includes the following components: a controller that measures an execution of the application to generate an action space representing a task and data parallel strategy, and a run-time system that applies the action space to implement the task and data parallel strategy.
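As a rough illustration of the controller/run-time split, the sketch below assumes the action space is just a list of candidate worker counts (so it covers only the data-parallel dimension) and that "learning" means timing each candidate on a sample input and keeping the fastest. `learn_strategy` and `run_strategy` are hypothetical names; the abstract does not describe the actual measurement or learning method.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def run_strategy(task: Callable, data: list, workers: int) -> list:
    """Run-time system (sketch): apply a data-parallel strategy with `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, data))

def learn_strategy(task: Callable, sample: list, action_space: List[int]) -> int:
    """Controller (sketch): time each candidate on a sample input, keep the fastest."""
    timings = {}
    for workers in action_space:
        start = time.perf_counter()
        run_strategy(task, sample, workers)
        timings[workers] = time.perf_counter() - start
    return min(timings, key=timings.get)

def square(x: int) -> int:
    return x * x

best = learn_strategy(square, list(range(1000)), action_space=[1, 2, 4])
print(best in (1, 2, 4))                      # True
print(run_strategy(square, [1, 2, 3], best))  # [1, 4, 9]
```

Which worker count wins depends on the machine and the task; the sketch only shows where measurement feeds the choice that the run-time system then applies.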
Abstract:
A system for integrating task and data parallelism in a dynamic application that includes at least one task for processing an input data stream to produce an output data stream replaces the at least one task with the following components: a splitter task that partitions the input data stream into a plurality of data chunks; a plurality of worker tasks that process subsets of the data chunks, each worker task being an instance of the at least one task; and a joiner task that combines the processed data chunks to produce the output data stream.
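The splitter/worker/joiner replacement can be sketched with Python threads and queues. `split_work_join`, `double_all`, and the sequence-numbering scheme used by the joiner to restore stream order are assumptions for illustration; the abstract does not say how the joiner orders chunks.

```python
import queue
import threading
from typing import Callable

_DONE = object()  # sentinel marking the end of the stream

def split_work_join(task: Callable[[list], list], stream: list,
                    chunk_size: int, n_workers: int) -> list:
    """Replace one stream task with a splitter, n worker instances, and a joiner."""
    work_q = queue.Queue()
    done_q = queue.Queue()

    def splitter() -> None:
        # Partition the input stream into numbered chunks.
        for seq, i in enumerate(range(0, len(stream), chunk_size)):
            work_q.put((seq, stream[i:i + chunk_size]))
        for _ in range(n_workers):
            work_q.put(_DONE)  # one stop signal per worker

    def worker() -> None:
        # Each worker is an instance of the original task.
        while True:
            item = work_q.get()
            if item is _DONE:
                done_q.put(_DONE)
                return
            seq, chunk = item
            done_q.put((seq, task(chunk)))

    threads = [threading.Thread(target=splitter)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # Joiner: collect processed chunks until every worker has stopped.
    results, finished = {}, 0
    while finished < n_workers:
        item = done_q.get()
        if item is _DONE:
            finished += 1
        else:
            seq, out = item
            results[seq] = out
    for t in threads:
        t.join()
    # Restore stream order by chunk sequence number.
    return [y for seq in sorted(results) for y in results[seq]]

def double_all(chunk: list) -> list:
    return [2 * x for x in chunk]

print(split_work_join(double_all, list(range(10)), chunk_size=3, n_workers=2))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because each worker emits its results before its stop signal, the joiner has seen every chunk once it has counted `n_workers` stop signals.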
Abstract:
In a computer system, a memory is allocated to a plurality of ports, which are arranged in a spatial ordering. A plurality of variously sized data items are temporally ordered in each of the ports. Each data item includes a time-stamp that indicates its position in the temporal ordering. The data items are accessed atomically by a plurality of threads using space and time coordinates, which uniquely identify each data item.
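A minimal Python model of this space-time memory follows. `SpaceTimeMemory` and its methods are hypothetical; the port's list index stands in for the spatial coordinate, the time-stamp is a dict key, and a single lock is assumed as the simplest way to make each access atomic.

```python
import threading
from typing import Any, Dict, List

class SpaceTimeMemory:
    """Sketch: ports in a fixed spatial order, each holding time-stamped items."""

    def __init__(self, n_ports: int) -> None:
        # List index = spatial coordinate; dict key = temporal coordinate.
        self._ports: List[Dict[int, Any]] = [dict() for _ in range(n_ports)]
        self._lock = threading.Lock()

    def put(self, port: int, timestamp: int, item: Any) -> None:
        with self._lock:  # atomic with respect to other threads
            if timestamp in self._ports[port]:
                raise KeyError(f"({port}, {timestamp}) already holds an item")
            self._ports[port][timestamp] = item

    def get(self, port: int, timestamp: int) -> Any:
        with self._lock:
            return self._ports[port][timestamp]

    def items_in_order(self, port: int) -> list:
        """Return a port's items sorted by time-stamp (the temporal ordering)."""
        with self._lock:
            return [self._ports[port][t] for t in sorted(self._ports[port])]

stm = SpaceTimeMemory(n_ports=2)

def producer(port: int) -> None:
    for t in (2, 0, 1):                 # items may arrive out of time order
        stm.put(port, t, bytes(t + 1))  # variously sized items

threads = [threading.Thread(target=producer, args=(p,)) for p in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(stm.get(1, 2))          # b'\x00\x00\x00'
print(stm.items_in_order(0))  # [b'\x00', b'\x00\x00', b'\x00\x00\x00']
```

Rejecting a second write to the same (port, time-stamp) pair keeps the coordinates a unique identifier for each item.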
Abstract:
A multithreaded parallel data processing system has at least one processing element for processing multiple threads of computation. Threads are described by thread descriptors, or tokens, which are stored in a thread descriptor storage while waiting to be processed. Each thread descriptor comprises an instruction pointer and a frame pointer. The instruction pointer points to the next instruction to be executed, and the frame pointer points to a frame of memory locations on which that instruction will operate. The instruction set of the at least one processing element includes: a fork instruction that generates two thread descriptors, which are added to the current thread descriptors; a start instruction, executed on a first processor, that sends a message containing a thread descriptor to a second processor; and a join instruction that joins two threads by producing a single thread descriptor once both of the joining threads have reached the join instruction.
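The fork, start, and join instructions can be modeled as operations on token queues. `Token`, `Processor`, and the join counter keyed by (frame pointer, continuation label) are hypothetical; keeping the counter in a dict is an assumption standing in for a counter held in frame memory.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    ip: str  # instruction pointer
    fp: int  # frame pointer

class Processor:
    def __init__(self, name: str) -> None:
        self.name = name
        self.tokens = deque()    # current thread descriptors
        self.join_counters = {}  # (fp, continuation ip) -> arrivals so far

    def fork(self, t: Token, ip_a: str, ip_b: str) -> None:
        # fork: add two descriptors that share the forking thread's frame.
        self.tokens.extend([Token(ip_a, t.fp), Token(ip_b, t.fp)])

    def start(self, t: Token, other: "Processor") -> None:
        # start: send a message carrying a thread descriptor to another processor.
        other.tokens.append(t)

    def join(self, t: Token, cont_ip: str) -> None:
        # join: the first arrival is absorbed; the second yields one descriptor.
        key = (t.fp, cont_ip)
        arrived = self.join_counters.get(key, 0) + 1
        if arrived == 2:
            del self.join_counters[key]
            self.tokens.append(Token(cont_ip, t.fp))
        else:
            self.join_counters[key] = arrived

p1, p2 = Processor("p1"), Processor("p2")
p1.fork(Token("main", 0), "left", "right")
left, right = p1.tokens.popleft(), p1.tokens.popleft()
p1.start(Token("remote_work", 5), p2)
p1.join(left, "after")   # first arrival: no new thread yet
print(list(p1.tokens))   # []
p1.join(right, "after")  # second arrival: one continuation thread
print(list(p1.tokens))   # [Token(ip='after', fp=0)]
print(list(p2.tokens))   # [Token(ip='remote_work', fp=5)]
```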