Abstract:
A multithreaded processor device is disclosed and includes a first program thread and second program thread. The second program thread is execution linked to the first program thread in a lock step manner. As such, when the first program thread experiences a stall event, the second program thread is instructed to perform a no operation instruction in order to keep the second program thread execution linked to the first program thread. Also, the second program thread performs a no operation instruction during each clock cycle that the first program thread is stalled due to the stall event. When the first program thread performs a first successful operation after the stall event, the second program thread restarts normal execution.
Abstract:
Techniques for processing transmissions in a communications (e.g., CDMA) system. A multithreaded processor processes a plurality of threads operating via a plurality of processor pipelines associated with the multithreaded processor and predetermines a triggering event for the multithreaded processor to switch from a first thread to a second thread. The triggering event is variably and dynamically determined to optimize multithreaded processor performance. The triggering event may be a dynamically determined number of processor cycles, the number being determined to optimize the performance of the multithreaded processor, or a variably and dynamically determined event, such as a cache or instruction miss.
Abstract:
Certain aspects of the present disclosure provide techniques for generating execution schedules, comprising receiving a data flow graph for a process, where data flow graph comprises a plurality of nodes and a plurality of edge; generating a topological ordering for the data flow graph based at least in part on memory utilization of the process; generating a first modified topological ordering by inserting, into the topological ordering, one or more new nodes corresponding to memory access based on a predefined memory capacity; allocating units of memory in the memory based on the first modified topological ordering; and generating a second modified topological ordering by rearranging one or more nodes in the first modified topological ordering, where the second modified topological ordering enables increased parallel utilization of a plurality of hardware components.
Abstract:
A processor includes a vector register configured to load data responsive to a special purpose load instruction. The processor also includes circuitry configured to replicate a selected sub-vector value from the vector register.
Abstract:
A processor includes an integer multiplier configured to execute an integer multiply instruction to multiply significand bits of at least one floating point operand of a floating point multiply operation. The processor also includes a floating point multiplier configured to execute a special purpose floating point multiply accumulate instruction with respect to an intermediate result of the floating point multiply operation and the at least one floating point operand to generate a final floating point multiplication result.
Abstract:
Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value.
Abstract:
A method and system to perform shifting and rounding operations within a microprocessor, such as, for example, a digital signal processor, during execution of a single instruction are described. An instruction to shift and round data within a source register unit of a register file structure is received within a processing unit. The instruction includes a shifting bit value indicating the bit amount for a right shift operation and is subsequently executed to shift data within the source register unit to the right by an encoded bit value, calculated by subtracting a single bit from the shifting bit value contained within the instruction. A predetermined bit extension is further inserted within the vacated bit positions adjacent to the shifted data. Subsequently, an addition operation is performed on the shifted data and a unitary integer value is added to the shifted data to obtain resulting data. Finally, the resulting data is further shifted to the right by a single bit value and a predetermined bit extension is inserted within the vacated bit position to obtain the final rounded data results to be stored within a destination register unit.
Abstract:
A method and system to indicate which page within a software-managed page table triggers an exception within a microprocessor, such as, for example, a digital signal processor, wherein a software-managed translation lookaside buffer (TLB) module receives a virtual address produced by an instruction within a Very Long Instruction Word (VLIW) packet, such as, for example, a fetch instruction, and further compares the virtual address to each stored TLB entry. If a match exists, then the TLB module outputs a corresponding mapped physical address for the instruction. Otherwise, if the VLIW packet spans two pages, where a first page is present as a TLB entry within the TLB module and the second page is missing from the stored TLB entries, an indication bit within a data field of a control register is set to identify the TLB miss exception to a software management unit. The software management unit retrieves the indication bit information from the register and further performs a page table look-up within the software-managed page table using the indication bit information in order to retrieve the missing page information. Subsequently, the missing page information is written into a new TLB entry within the TLB module for subsequent virtual address translation and execution of the packet of instructions.
Abstract:
A shared translation look-aside buffer method comprises saving data stored in a first selected set of registers to a predetermined section of a thread-specific area in memory upon encountering an exception/interrupt, re-enabling exceptions and optionally interrupts, addressing a cause of the exception/interrupt while safely permitting another exception, and restoring the saved data to the first selected set of registers.
Abstract:
A processor device is disclosed and includes a memory and a sequencer that is responsive to the memory. The sequencer can support very long instruction word (VLIW) instructions and superscalar instructions. The processor device further includes a first instruction execution unit responsive to the sequencer, a second instruction execution unit responsive to the sequencer, a third instruction execution unit responsive to the sequencer, and a fourth instruction execution unit responsive to the sequencer. Further, the processor device includes a plurality of register files and each of the plurality of register files includes a plurality of registers. The plurality of register files are coupled to the sequencer and coupled to the first instruction execution unit, the second instruction execution unit, the third instruction execution unit, and the fourth instruction execution unit.