摘要:
A method and apparatus for the elimination of prolog and epilog instructions in a vector processor. To eliminate the prolog, a functional unit of the vector processor has at least one input for receiving an input data value tagged with a data validity tag and an output for outputting an intermediate result tagged with a data validity tag. The data validity tags indicate the validity of the data. Before a loop is executed, the data validity tags are set to indicate that the associated data values are invalid. During execution of the loop body a functional unit checks the validity of input data. If all of the input data values are valid the functional operation is performed, the corresponding data validity tag set to indicate that the result is valid. If any of the input data values is invalid, the data validity tag of the result is set to indicate that the result is invalid. To eliminate the epilog, an iteration counter is associated with each sink unit of the vector processor. When a specified number of data values have been produced by a particular sink, no more data values are produced by that sink. The instructions for the pipelined loop body may be repeated, without alteration, to eliminate prolog and epilog instructions.
摘要:
Various load and store instructions may be used to transfer multiple vector elements between registers in a register file and memory. A cnt parameter may be used to indicate a total number of elements to be transferred to or from memory, and an rcnt parameter may be used to indicate a maximum number of vector elements that may be transferred to or from a single register within a register file. Also, the instructions may use a variety of different addressing modes. The memory element size may be specified independently from the register element size such that source and destination sizes may differ within an instruction. With some instructions, a vector stream may be initiated and conditionally enqueued or dequeued. Truncation or rounding fields may be provided such that source data elements may be truncated or rounded when transferred. Also, source data elements may be sign- or unsigned-extended when transferred.
摘要:
A method for producing a formatted description of a computation representable by a data-flow graph and computer for performing a computation so described. A source instruction is generated for each input of the data-flow graph, a computational instruction is generated for each node of the data-flow graph, and a sink instruction is generated for each output of the data-flow graph. The computational instruction for a node includes a descriptor of an operation performed at the node and a descriptor of each instruction that produces an input to the node. The formatted description is a sequential instruction list comprising source instructions, computational instructions and sink instructions. Each instruction has an instruction identifier and the descriptor of each instruction that produces an input to the node is the instruction identifier. The computer is directed by a program of instructions to implement a computation representable by a data-flow graph.
摘要:
A re-configurable, streaming vector processor (100) is provided which includes a number of function units (102), each having one or more inputs for receiving data values and an output for providing a data value, a re-configurable interconnection switch (104) and a micro-sequencer (118). The re-configurable interconnection switch (104) includes one or more links, each link operable to couple an output of a function unit (102) to an input of a function unit (102) as directed by the micro-sequencer (118). The vector processor may also include one or more input-stream units (122) for retrieving data from memory. Each input-stream unit is directed by a host processor and has a defined interface (116) to the host processor. The vector processor also includes one or more output-stream units (124) for writing data to memory or to the host processor. The defined interface of the input-stream and output-stream units forms a first part of the programming model. The instructions stored in a memory, in the sequence that direct the re-configurable interconnection switch, form a second part of the programming model.
摘要:
A method for scheduling a computation for execution on a computer with a number of interconnected functional units. The computation is representable by a data-flow graph with a number of nodes connected by edge. A loop-period of the computation is calculated and the nodes are scheduled for throughput by assigning an execution cycle and a functional unit to each node of the data-flow graph. The scheduling of flexible nodes is adjusted to minimize the number of interconnections required in each execution cycle. The edges of the data-flow graph are allocated to one or more of the interconnections between functional units. The scheduling method may be used, for example, to optimize the interconnection fabric design for an ASIC or as part of a compiler for a re-configurable streaming vector processor.
摘要:
An interconnection device (300) with a number of links (306, 308, 310, 312 and 314), each link having a number of link input ports (302), link output ports (304) and storage registers (316). An input selection switch (402) is coupled to a selected link input port to receive an input data token. The storage registers (316) may be used to store input data tokens. A storage access switch (404) is coupled to the input selection switch (402) and to the storage registers (316) and may be used to select the current input data token or a token from the storage registers as an output data token. An output selection switch (406) receives the output data token and provides it to a selected link output port. The interconnection device may, for example, be used to connect the inputs and outputs of the processing elements of a vector processor or digital signal processor.
摘要:
A memory interface device (100) providing a fractional address interface between a data processor (104) and a memory system (102) and a method for retrieving intermediate data values from a memory system using fractional addressing. The device includes an address generator (108) for generating first and second memory addresses, the first memory address being less than or equal to a specified fractional address, the second memory address being greater than or equal to the fractional address. The device also includes a memory access unit (110) coupled to the address generator (108) for retrieving first and second data values from the memory system (102) at the first and second memory addresses, respectively. The device also includes a data access unit (112) for interpolating between the first and second data values and passing the interpolated value to the data processor (104). The memory interface has application in a variety of data processing systems, including digital signal processors and streaming vector processors.
摘要:
A cache for storing data elements is disclosed. The cache includes a cache memory having one or more lines and one or more cache line counters, each associated with a line of the cache memory. In operation, a cache line counter of the one or more of cache line counters is incremented when a request is received to prefetch a data element into the cache memory and is decremented when the data element is consumed. Optionally, one or more reference queues may be used to store the locations of data elements in the cache memory. In one embodiment, data cannot be evicted from cache lines unless the associated cache line counters indicate that the prefetched data has been consumed.
摘要:
A toy or plaything in the form of a puzzle comprises a first member, and a second member constituting a housing within which the first member is rotatably mounted. The second member is such that access is provided to the first member to enable manual rotation thereof. A peg is mounted on one of said members and a maze is defined in a surface of the other of the first and second members, the peg engaging the maze such that upon selectively rotating the first member within the second member the peg is caused to move through the maze. There is further provided means to indicate when the peg is at a predetermined destination in the maze.
摘要:
A system includes a first gas sensor [110] to detect a first concentration of a predetermined gas and to determine a first rate of change in the first concentration over a time interval. A second gas sensor [115] detects a second concentration of the predetermined gas and determines a second rate of change in the second concentration over the time interval. A third gas sensor [120] detects a third concentration of the predetermined gas and determines a third rate of change in the third concentration over the time interval. The first, second, and third gas sensors each have a known location. At least one processing device [510] (a) determines respective distances between a gas leak location and the respective locations of the gas sensors based on the detected rates of change, and (b) calculates a location of the gas leak based on a triangulation of the first distance, the second distance, and the third distance.