Abstract:
Various load and store instructions may be used to transfer multiple vector elements between memory and the registers of a register file. A cnt parameter may indicate the total number of elements to be transferred to or from memory, and an rcnt parameter may indicate the maximum number of vector elements that may be transferred to or from a single register within the register file. The instructions may also use a variety of addressing modes. The memory element size may be specified independently of the register element size, so that source and destination sizes may differ within an instruction. With some instructions, a vector stream may be initiated and conditionally enqueued or dequeued. Truncation or rounding fields may be provided so that source data elements are truncated or rounded when transferred. Source data elements may also be sign-extended or unsigned-extended when transferred.
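The roles of cnt and rcnt can be illustrated with a minimal Python sketch. The function names `vector_load` and `extend`, the unit-stride addressing, and the list-based model of memory and registers are assumptions made for illustration; the abstract does not specify an instruction encoding or a particular addressing mode.

```python
def vector_load(memory, base, cnt, rcnt):
    """Load cnt consecutive elements starting at memory[base],
    placing at most rcnt elements in each register (hypothetical model)."""
    if cnt < 0 or rcnt <= 0:
        raise ValueError("cnt must be >= 0 and rcnt must be > 0")
    registers = []
    for start in range(0, cnt, rcnt):
        n = min(rcnt, cnt - start)  # the last register may be partially filled
        registers.append(memory[base + start : base + start + n])
    return registers


def extend(value, mem_bits, signed):
    """Widen a mem_bits-wide memory element to a register element,
    using sign extension or unsigned (zero) extension."""
    value &= (1 << mem_bits) - 1
    if signed and value >> (mem_bits - 1):
        value -= 1 << mem_bits  # top bit set: interpret as negative
    return value
```

Under these assumptions, a transfer of 7 elements with rcnt equal to 3 fills two registers completely and places the one remaining element in a third.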
Abstract:
A cache for storing data elements is disclosed. The cache includes a cache memory having one or more lines and one or more cache line counters, each associated with a line of the cache memory. In operation, a cache line counter is incremented when a request is received to prefetch a data element into the cache memory and is decremented when the data element is consumed. Optionally, one or more reference queues may be used to store the locations of data elements in the cache memory. In one embodiment, data cannot be evicted from a cache line unless the associated cache line counter indicates that the prefetched data has been consumed.
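The counter behavior can be sketched as follows. The class name `CountedCache`, the use of a tag per line, and the choice to refuse a prefetch that would evict unconsumed data are assumptions for illustration; the abstract states only the increment, decrement, and eviction conditions.

```python
class CountedCache:
    """Toy cache in which each line has a counter of outstanding prefetches."""

    def __init__(self, num_lines):
        self.lines = [None] * num_lines  # tag held by each line, None if empty
        self.counters = [0] * num_lines  # prefetched-but-unconsumed elements

    def prefetch(self, line, tag):
        """Request that `tag` be present in `line`.
        Returns False if that would evict data not yet consumed."""
        if self.lines[line] != tag:
            if self.counters[line] != 0:
                return False  # eviction blocked: counter shows pending data
            self.lines[line] = tag  # safe to evict and refill
        self.counters[line] += 1
        return True

    def consume(self, line):
        """Mark one prefetched element in `line` as consumed."""
        if self.counters[line] == 0:
            raise RuntimeError("no prefetched data to consume")
        self.counters[line] -= 1
```

In this sketch a line becomes eligible for replacement only after every prefetch request directed at it has been matched by a consume.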
Abstract:
A bus filter and filtering method for translating between virtual and physical memory addresses. The bus filter may be used to couple a processing device, such as an accelerator, to a system having a core processor and an external memory unit coupled by a bus. The bus filter includes a first bus interface connected to the system bus for receiving a virtual memory address and a second interface connected to the system bus for transmitting a physical memory address. An address translation unit, such as a translation lookaside buffer, determines the physical memory address from the virtual memory address.
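The translation step can be sketched in Python. The class name `BusFilter`, the 4096-byte page size, the dictionary standing in for a translation lookaside buffer, and the error raised on a miss are all assumptions for illustration; the abstract does not describe page sizes or miss handling.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class BusFilter:
    """Toy model: receives a virtual address, emits a physical address."""

    def __init__(self, page_map):
        # Maps virtual page number -> physical page number (stands in for a TLB).
        self.tlb = dict(page_map)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:
            # Miss handling is not described in the abstract; raising is a placeholder.
            raise KeyError(f"no translation for virtual page {vpn}")
        return self.tlb[vpn] * PAGE_SIZE + offset
```

With a mapping from virtual page 1 to physical page 7, the virtual address 4116 (page 1, offset 20) translates to the physical address 28692.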
Abstract:
A method for generating a sequence of memory addresses for a multi-dimensional data structure and an address generation unit are disclosed. The address generation unit includes an ADDRESS register, a STRIDE register, and a plurality of skip generators, each having SKIP, SPAN and COUNT registers. An address value is initialized to a first address and each COUNT register is initialized. For each address of the sequence, the address value is output and a stride value is added to the address value. For each dimension of the data structure, the COUNT register associated with the dimension is updated as each address is generated. For each dimension, when its COUNT register value becomes zero, the skip value associated with the dimension is added to the address value and the COUNT register is reset to a specified value.
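The address sequence can be sketched in Python. The function name `generate_addresses` is hypothetical, and two details are assumptions: each COUNT register is decremented once per generated address, and it is reset to the SPAN value of its dimension when it reaches zero. The abstract says only that COUNT is "updated" and reset to "a specified value".

```python
def generate_addresses(start, stride, skips, n):
    """Return n addresses.

    skips is a list of (skip, span) pairs, one per skip generator.
    """
    address = start
    counts = [span for _, span in skips]  # initialize each COUNT register
    out = []
    for _ in range(n):
        out.append(address)  # output the current address value
        address += stride    # then add the stride
        for i, (skip, span) in enumerate(skips):
            counts[i] -= 1   # assumed update rule: count down
            if counts[i] == 0:
                address += skip   # jump over the elements outside the span
                counts[i] = span  # assumed reset value: SPAN
    return out
```

For example, reading a block 3 elements wide from a row-major array with 8 elements per row uses a stride of 1, a span of 3, and a skip of 5, so `generate_addresses(10, 1, [(5, 3)], 6)` walks addresses 10, 11, 12 and then 18, 19, 20.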
Abstract:
A method and apparatus for performing pipelined computations that include cross-iteration computations. The apparatus includes a first functional unit having at least one input and an output, each input being operable to receive an input data value and an associated input data validity tag indicative of the validity of the input data value, and the output being operable to provide an output data value and an associated output data validity tag indicative of the validity of the output data value. The first functional unit is operable in a first mode, in which its output data value is valid if all of the input data values are valid, and in a second mode, in which its output data value is valid if any of the input data values is valid.
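The two validity modes can be sketched in Python. The function name `functional_unit`, the mode labels "all" and "any", and the decision to apply the operation only to the valid inputs are assumptions for illustration; the abstract defines how the output validity tag is derived but not how the output value is computed when some inputs are invalid.

```python
def functional_unit(op, inputs, mode):
    """inputs is a list of (value, valid) pairs; returns (value, valid).

    mode "all": output is valid only if every input is valid.
    mode "any": output is valid if at least one input is valid.
    """
    valid_values = [value for value, valid in inputs if valid]
    if mode == "all":
        out_valid = len(valid_values) == len(inputs)
    elif mode == "any":
        out_valid = len(valid_values) > 0
    else:
        raise ValueError("mode must be 'all' or 'any'")
    if not out_valid:
        return (None, False)
    # Assumption: invalid inputs are ignored when computing the value.
    return (op(valid_values), True)
```

With one valid input of 3 and one invalid input, the "all" mode yields an invalid output, while the "any" mode with `sum` as the operation yields the valid value 3.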