Abstract:
An apparatus and method for non-blocking execution of a static scheduled processor, the apparatus including a processor to process at least one operation using transferred input data, and an input buffer used to transfer the input data to the processor, and store a result of processing the at least one operation, wherein the processor may include at least one functional unit (FU) to execute the at least one operation, and the at least one FU may process the transferred input data using at least one of a regular latency operation and an irregular latency operation.
Abstract:
A method and apparatus for efficient and consistent validation/conflict detection in a Software Transactional Memory (STM) system are herein described. A version check barrier is inserted after a load to compare versions of loaded values before and after the load. In addition, a global timestamp (GTS) is utilized to track the latest committed transaction. Each transaction is associated with a local timestamp (LTS) initialized to the GTS value at the start of the transaction. As a transaction commits, it updates the GTS to a new value and sets the versions of modified locations to the new value. Pending transactions compare versions determined in read barriers to their LTS. If the version is greater than their LTS, indicating that another transaction has committed after the pending transaction started and initialized its LTS, then the pending transaction validates its read set to maintain efficient and consistent transactional execution.
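The timestamp scheme in this abstract can be illustrated with a minimal single-threaded sketch. It is not the patented implementation: the names VersionedCell, GlobalClock, Transaction, and ConflictError are hypothetical, locking and atomicity are omitted, and validating again at commit time is an assumption the abstract does not state.

```python
class ConflictError(Exception):
    """Raised when a transaction observes an inconsistent read set."""


class VersionedCell:
    """Hypothetical shared location carrying a value and a version stamp."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class GlobalClock:
    """Holds the global timestamp (GTS) of the latest committed transaction."""
    def __init__(self):
        self.gts = 0


class Transaction:
    def __init__(self, clock):
        self.clock = clock
        self.lts = clock.gts      # LTS is initialized to the GTS at start
        self.read_set = {}        # cell -> version observed at first read
        self.write_set = {}       # cell -> buffered new value

    def validate(self):
        # Every previously read location must still carry the version we saw.
        for cell, seen in self.read_set.items():
            if cell.version != seen:
                raise ConflictError("read set is no longer consistent")

    def read(self, cell):
        if cell in self.write_set:
            return self.write_set[cell]
        before = cell.version
        value = cell.value
        after = cell.version      # version check barrier after the load
        if before != after:
            raise ConflictError("location changed during the load")
        if after > self.lts:
            # Another transaction committed after this one started.
            self.validate()
        self.read_set.setdefault(cell, after)
        return value

    def write(self, cell, value):
        self.write_set[cell] = value

    def commit(self):
        self.validate()           # assumption: re-validate before publishing
        self.clock.gts += 1       # commit moves the GTS to a new value
        for cell, value in self.write_set.items():
            cell.value = value
            cell.version = self.clock.gts
```

In this sketch a read is cheap while versions stay at or below the LTS; the full read-set validation runs only when a newer version is observed.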
Abstract:
A system and method for efficient reliable execution on a simultaneous multithreading machine. A processor is placed in a reliable execution mode (REM) to detect possible errors during execution of a software application. Only two threads may be configured to operate in this mode. Floating-point store and integer-transfer unary instructions may be converted to new instructions. Each new instruction has two source operands, each corresponding to a different thread and each specified by the same logical register number as the single source operand of the original unary instruction. All other instructions are replicated, wherein the original instruction and its twin are assigned to different threads. Simultaneous multi-threaded (SMT) floating-point logic may only be able to provide lockstep execution when it communicates using the new instruction with instantiated integer independent clusters. The new instruction cannot begin until both source operands are ready; the operands are subsequently compared to detect any mismatches or errors.
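The behavior of a converted store can be modeled in a few lines. This is a software sketch under stated assumptions, not the hardware described: checked_store, MismatchError, the dictionary register files, and the use of None for a not-yet-ready operand are all hypothetical.

```python
class MismatchError(Exception):
    """Raised when the two threads' copies of an operand disagree."""


def checked_store(regs_thread0, regs_thread1, reg, memory, addr):
    """Model a converted unary store with one source operand per thread.

    Both operands are named by the same logical register number `reg`.
    Returns False if either operand is not ready (modeled as None or missing),
    raises MismatchError if the values differ, and otherwise performs the store.
    """
    a = regs_thread0.get(reg)
    b = regs_thread1.get(reg)
    if a is None or b is None:
        return False              # cannot begin until both operands are ready
    if a != b:
        raise MismatchError(f"register {reg}: {a!r} != {b!r}")
    memory[addr] = a              # values agree, so a single store is issued
    return True
```

For example, with both register files holding 3.5 in "f1", the call stores 3.5 once; if one thread's copy differs, the mismatch is reported instead of being written to memory.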
Abstract:
A processor core (102) is provided that is a programmable digital signal processor (DSP) with variable instruction length, offering both high code density and easy programming. Architecture and instruction set are optimized for low power consumption and high efficiency execution of DSP algorithms, such as for wireless telephones, as well as pure control tasks. A cache (814) located within a megacell on a single integrated circuit (800) is provided to reduce instruction access time. The cache is for instructions only so that cache coherency measures due to writing data are not needed. Cache coherence circuitry (816) is included within the megacell and monitors selected signals to maintain coherence within the cache during emulation and debugging operations.
Abstract:
The present invention relates to a data processor which comprises a first pipeline for decoding and executing data instructions, a second pipeline for decoding and executing address instructions, a third pipeline for executing loop instructions, a unit for issuing multiple instructions to said pipelines, a first set of registers coupled with said first pipeline, and a second set of registers coupled with said second pipeline and said third pipeline, wherein the first, second, and third pipelines process data in parallel.
Abstract:
A microcomputer (MCU) adopting the general-purpose register method is enabled to have a small program capacity, or a high efficiency of program memory use, and a low system cost, while enjoying the advantage of simplified instruction decoding as in the prior-art RISC machine having a fixed-length instruction format, by adopting an instruction format of a fixed length of 2^n bits which is smaller than the length of the maximum data word fed to instruction execution means. The control of the coded division is executed by noting the code bits.