摘要:
High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic.
摘要:
The present invention provides for calculating a shift amount as a function of a plurality of numbers. At least one decoder and the at least one adder are coupled in parallel. A shifter is configured to compute a value in a plurality of shift stages, and wherein a bit group of the shift amount is employable to affect at least one of the plurality of shift stages, thereby decreasing processing time.
摘要:
The present invention provides a data access ring. The data access ring has a plurality of attached processor units (APUs) and a local store associated with each APU. The data access ring has a data command ring, coupled to the plurality of APUs. The data command ring is employable to carry indicia of a selection of one of the plurality of APUs to the APUs. The data access ring also has a data address ring, coupled to the plurality of APUs. The data address ring is further employable to carry indicia of a memory location to the selected APU a predetermined number of clock cycles after the data command ring carries the indicia of the selection of one of the plurality of APUs. The data access ring also has a data transfer ring, coupled to the plurality of APUs. The data transfer ring is employable to transfer data to or from the memory location associated with the APU a predetermined number of clock cycles after the data address ring carries the indicia of the memory location to the selected APU.
摘要:
Disclosed is an apparatus for and a method of overcoming signal delay problems in a read-out path occurring in connection with pipelined memory circuits. A latch type sense amplifier (SA) is used to receive the memory cell logic levels during a pre-charge state in a cycle prior to read-out. Thus, the SA may quickly provide an output signal during a read latch clock cycle. The SA output is passed through a dynamically enabled logic circuit to a latch circuit for holding the receiving logic value for use in the next clock cycle.
摘要:
In one embodiment a multiprocessing apparatus includes a first processor and a second processor. Each of the processors have their own data and instruction caches to support independent operation. In a normal mode the processors independently execute separate instruction streams. Each of the processors has a respective signature generator. The system also includes a compare unit coupled to the signature generators. In a high reliability mode, both processors execute the same instruction stream. That is, each processor computes a version of a result for ones of the instructions in the stream. Responsive to the respective versions, the respective signature generators assert signatures to the compare unit, so that a faulting instruction may be detected. In another aspect, each processor has its own respective commit logic. The compare unit signals the commit logic in each respective processor that the possibility has been eliminated of a calculation interrupt arising for that instruction, once the compare unit receives signatures for corresponding versions of a result, but only if the signatures match. This permits the commit logic to commit the result. If the signatures do not match, the compare unit signals the commit logic that the corresponding instruction has faulted. The commit logic permits instructions prior to the faulting instruction in program order to continue execution, but initiates flushing of results that were produced by the faulting instruction and at least some instructions subsequent in program order to the faulting instruction.
摘要:
A latching dynamic logic structure is disclosed including a static logic interface, a dynamic logic gate, and a static latch. The static logic interface receives a data signal, a select signal, and a clock signal, and produces a first intermediate signal such that when the select signal is active, the first intermediate signal is dependent upon the data signal for a period of time following a clock signal transition. The dynamic logic gate discharges a dynamic node following the clock signal transition dependent upon the first intermediate signal. The static latch produces an output signal assuming one of two logic levels following the clock signal transition, and assuming the other logic level in the event the dynamic node is discharged. A scan-testing-enabled version of the latching dynamic logic structure is described, as is an integrated circuit including the latching dynamic logic structure.
摘要:
A method and apparatus for synthesizing logic circuits with synchronized outputs is disclosed. A logic designer selects a fixed number of levels in which to synthesize the circuit, each level implementing a plurality of different logic function all having the same propagation delay. Circuit outputs are synchronized by ensuring that each logic function is synthesized by connecting logic functions from level to level such that each signal path passes through each level once and only once.
摘要:
A dynamic incrementer, implemented in the Self Resetting Complementary Metal Oxide Semiconductor (SRCMOS) circuit family, which internally performs single rail calculations and which generates the dual rail result using a strobing technique. The carry-lookahead function is implemented with an OR tree using the complement input signals, resulting in a very fast and economical incrementer.
摘要:
A method of providing simultaneous, or overlapped, access to multiple cache levels to reduce the latency penalty for a higher level cache miss. A request for a value (data or instruction) is issued by the processor, and is forwarded to the lower level of the cache before determining whether a cache miss of the value has occurred at the higher level of the cache. In the embodiment wherein the lower level is an L2 cache, the L2 cache may supply the value directly to the processor. Address decoders are operated in parallel at the higher level of the cache to satisfy a plurality of simultaneous memory requests. One of the addresses (selected by priority logic based on hit/miss information from the higher level of the cache) is gated by a multiplexer to a plurality of memory array word line drivers of the lower level of the cache. Some bits in the address which do not require virtual-to-real translation can be immediately decoded.
摘要:
Fixed point instructions ADD, ROTATE, COMPARE-TO-ZERO, AND, OR and COUNT-LEADING-ZEROS are each performable in one circuit or macro. Such fixed point instructions may be implemented within an execution unit in a microprocessor, microcontroller, or digital signal processor.