摘要:
Various load and store instructions may be used to transfer multiple vector elements between registers in a register file and memory. A cnt parameter may be used to indicate a total number of elements to be transferred to or from memory, and an rcnt parameter may be used to indicate a maximum number of vector elements that may be transferred to or from a single register within a register file. Also, the instructions may use a variety of different addressing modes. The memory element size may be specified independently from the register element size such that source and destination sizes may differ within an instruction. With some instructions, a vector stream may be initiated and conditionally enqueued or dequeued. Truncation or rounding fields may be provided such that source data elements may be truncated or rounded when transferred. Also, source data elements may be sign- or unsigned-extended when transferred.
摘要:
A re-configurable, streaming vector processor (100) is provided which includes a number of function units (102), each having one or more inputs for receiving data values and an output for providing a data value, a re-configurable interconnection switch (104) and a micro-sequencer (118). The re-configurable interconnection switch (104) includes one or more links, each link operable to couple an output of a function unit (102) to an input of a function unit (102) as directed by the micro-sequencer (118). The vector processor may also include one or more input-stream units (122) for retrieving data from memory. Each input-stream unit is directed by a host processor and has a defined interface (116) to the host processor. The vector processor also includes one or more output-stream units (124) for writing data to memory or to the host processor. The defined interface of the input-stream and output-stream units forms a first part of the programming model. The instructions stored in a memory, in the sequence that direct the re-configurable interconnection switch, form a second part of the programming model.
摘要:
A method for producing a formatted description of a computation representable by a data-flow graph and computer for performing a computation so described. A source instruction is generated for each input of the data-flow graph, a computational instruction is generated for each node of the data-flow graph, and a sink instruction is generated for each output of the data-flow graph. The computational instruction for a node includes a descriptor of an operation performed at the node and a descriptor of each instruction that produces an input to the node. The formatted description is a sequential instruction list comprising source instructions, computational instructions and sink instructions. Each instruction has an instruction identifier and the descriptor of each instruction that produces an input to the node is the instruction identifier. The computer is directed by a program of instructions to implement a computation representable by a data-flow graph.
摘要:
A method for scheduling a computation for execution on a computer with a number of interconnected functional units. The computation is representable by a data-flow graph with a number of nodes connected by edge. A loop-period of the computation is calculated and the nodes are scheduled for throughput by assigning an execution cycle and a functional unit to each node of the data-flow graph. The scheduling of flexible nodes is adjusted to minimize the number of interconnections required in each execution cycle. The edges of the data-flow graph are allocated to one or more of the interconnections between functional units. The scheduling method may be used, for example, to optimize the interconnection fabric design for an ASIC or as part of a compiler for a re-configurable streaming vector processor.
摘要:
An interconnection device (300) with a number of links (306, 308, 310, 312 and 314), each link having a number of link input ports (302), link output ports (304) and storage registers (316). An input selection switch (402) is coupled to a selected link input port to receive an input data token. The storage registers (316) may be used to store input data tokens. A storage access switch (404) is coupled to the input selection switch (402) and to the storage registers (316) and may be used to select the current input data token or a token from the storage registers as an output data token. An output selection switch (406) receives the output data token and provides it to a selected link output port. The interconnection device may, for example, be used to connect the inputs and outputs of the processing elements of a vector processor or digital signal processor.
摘要:
A method and apparatus for performing pipelined computations that include cross-iteration computations. The apparatus includes a functional unit having at least one input and an output, each input being operable to receive an input data value and an associated input data validity tag indicative of the validity of the input data value and the output being operable to provide an output data value and an associated output data validity tag indicative of the validity of the output data value. The first functional unit is operable in a first mode in which an output data value from the first functional unit is valid if all of the input data values are valid, and in a second mode in which the output data value from the first functional unit is valid if any of the input data values is valid.