Abstract:
A computer-implemented method includes fetching a fetch-packet containing a first hyper-block from a first address of a memory. The fetch-packet contains a bitwise distance from an entry point of the first hyper-block to a predicted exit point. The method further includes executing a first branch instruction of the first hyper-block. The first branch instruction corresponds to a first exit point. The first branch instruction includes an address corresponding to an entry point of a second hyper-block. The method also includes storing, responsive to executing the first branch instruction, a bitwise distance from the entry point of the first hyper-block to the first exit point. The method further includes moving a program counter from the first exit point of the first hyper-block to the entry point of the second hyper-block.
Abstract:
A nested loop controller includes a first register having a first value initialized to an initial first value, a second register having a second value initialized to an initial second value, and a third register configured as a predicate FIFO, initialized to have a third value. The second value is advanced in response to a tick instruction during execution of a loop. In response to the second value reaching a second threshold, the second register is reset to the initial second value. The nested loop controller further includes a comparator coupled to the second register and to the predicate FIFO and configured to provide an outer loop indicator value as input to the predicate FIFO when the second value is equal to the second threshold, and provide an inner loop indicator value as input to the predicate FIFO when the second value is not equal to the second threshold.
Abstract:
Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction configured to cause the processor to output a first data value to a first address in a first data cache, outputting, by the processor, the first data value to a second address in a second data cache, receiving a second instruction configured to cause a streaming engine associated with the processor to prefetch data from the first data cache, determining that the first data value has not been outputted from the second data cache to the first data cache, stalling execution of the second instruction, receiving an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache, and resuming execution of the second instruction based on the received indication.
Abstract:
A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
Abstract:
A computer-implemented method includes fetching a fetch-packet containing a first hyper-block from a first address of a memory, the fetch-packet containing a bitwise distance from an entry point of the first hyper-block to a predicted exit point; executing a first branch instruction of the first hyper-block, wherein the first branch instruction corresponds to a first exit point, and wherein the first branch instruction includes an address corresponding to an entry point of a second hyper-block; storing, responsive to executing the first branch instruction, a bitwise distance from the entry point of the first hyper-block to the first exit point; and moving a program counter from the first exit point of the first hyper-block to the entry point of the second hyper-block.
Abstract:
This invention addresses implements a range of interesting technologies into a single block. Each DSP CPU has a streaming engine. The streaming engines include: a SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in system, but not writes that occur after stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error.
Abstract:
Disclosed embodiments relate to a security firewall having a security hierarchy including: secure master (SM); secure guest (SG); and non-secure (NS). There is one secure master and n secure guests. The firewall includes one secure region for secure master and one secure region for secure guests. The SM region only allows access from the secure master and the SG region allows accesses from any secure transaction. Finally, the non-secure region can be implemented two ways. In a first option, non-secure regions may be accessed only upon non-secure transactions. In a second option, non-secure regions may be accessed any processing core. In this second option, the access is downgraded to a non-secure access if the security identity is secure master or secure guest. If the two security levels are not needed the secure master can unlock the SM region to allow any secure guest access to the SM region.
Abstract:
This invention is a digital signal processor form plural sums of absolute values (SAD) in a single operation. An operational unit performing a sum of absolute value operation comprising two sets of a plurality of rows, each row producing a SAD output. Plural absolute value difference units receive corresponding packed candidate pixel data and packed reference pixel data. A row summer sums the output of the absolute value difference units in the row. The candidate pixels are offset relative to the reference pixels by one pixel for each succeeding row in a set of rows. The two sets of rows operate on opposite halves of the candidate pixels packed within an instruction specified operand. The SAD operations can be performed on differing data widths employing carry chain control in the absolute difference unit and the row summers.
Abstract:
An asynchronous dual domain bridge is implemented between the cache coherent master and the coherent system interconnect. The bridge has 2 halves, one in each clock/powerdown domain—master and interconnect. The asynchronous bridge is aware of the bus protocols used by each individual processor within the attached subsystem, and can perform the appropriate protocol conversion on each processor's transactions to adapt the transaction to/from the bus protocol used by the interconnect.
Abstract:
A statically scheduled processor compiler schedules a speculative load in the program before the data is needed. The compiler inserts a conditional instruction confirming or disaffirming the speculative load before the program behavior changes due to the speculative load. The condition is not based solely upon whether the speculative load address is correct but preferably includes dependence according to the original source code. The compiler may statically schedule two or more branches in parallel with orthogonal conditions.