Abstract:
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to operational units for use as operands. A promotion unit optionally increases date element data size by an integral power of 2 either zero filing or sign filling the additional bits. A decimation unit optionally decimates data elements by an integral factor of 2. For ease of implementation the promotion factor must be greater than or equal to the decimation factor.
Abstract:
Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a steam head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer which are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.
Abstract:
This invention addresses implements a range of interesting technologies into a single block. Each DSP CPU has a streaming engine. The streaming engines include: a SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in system, but not writes that occur after stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error.
Abstract:
This invention is data processing apparatus and method. Data is protecting from corruption using an error correction code by generating an error correction code corresponding to the data. In this invention the data and the corresponding error correction code are carried forward to another set of registers without regenerating the error correction code or using the error correction code for error detection or correction. Only later are error correction detection and correction actions taken. The differing data/error correction code registers may be in differing pipeline phases in the data processing apparatus. This invention forwards the error correction code with the data through the entire datapath that carries the data. This invention provides error protection to the whole datapath without requiring extensive hardware or additional time.
Abstract:
A statically scheduled processor compiler schedules a speculative load in the program before the data is needed. The compiler inserts a conditional instruction confirming or disaffirming the speculative load before the program behavior changes due to the speculative load. The condition is not based solely upon whether the speculative load address is correct but preferably includes dependence according to the original source code. The compiler may statically schedule two or more branches in parallel with orthogonal conditions.
Abstract:
A statically scheduled processor compiler schedules a speculative load in the program before the data is needed. The compiler inserts a conditional instruction confirming or disaffirming the speculative load before the program behavior changes due to the speculative load. The condition is not based solely upon whether the speculative load address is correct but preferably includes dependence according to the original source code. The compiler may statically schedule two or more branches in parallel with orthogonal conditions.
Abstract:
A predication method for vector processors that minimizes the use of embedded predicate fields in most instructions by using separate condition code extensions. Dedicated predicate registers provide fine grain predication of vector instructions where each bit of a predicate register controls 8 bit of the vector data.
Abstract:
A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
Abstract:
A digital signal processor having at least one streaming address generator, each with dedicated hardware, for generating addresses for writing multi-dimensional streaming data that comprises a plurality of elements. Each at least one streaming address generator is configured to generate a plurality of offsets to address the streaming data, and each of the plurality of offsets corresponds to a respective one of the plurality of elements. The address of each of the plurality of elements is the respective one of the plurality of offsets combined with a base address.
Abstract:
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.