摘要:
A method and apparatus for adding and multiplying floating-point operands such that a fixed-size mantissa result is produced. In accordance with the present addition method, the mantissa of a first floating-point operand is shifted in accordance with relative operand exponent information. Next, the first operand mantissa is added to the second operand mantissa. The addition step includes replacing a least significant non-overlapped portion of the first operand mantissa with a randomly-generated carry-in bit. In accordance with the multiplication method, a partial product array is generated from a pair of floating-point operand mantissas. Next, prior to compressing the partial product array into a compressed mantissa result, a lower-order bit portion of the partial product array is replaced with a randomly generated carry-in value.
摘要:
A method and apparatus for adding and multiplying floating-point operands such that a fixed-size mantissa result is produced. In accordance with the present addition method, the mantissa of a first floating-point operand is shifted in accordance with relative operand exponent information. Next, the first operand mantissa is added to the second operand mantissa. The addition step includes replacing a least significant non-overlapped portion of the first operand mantissa with a randomly-generated carry-in bit. In accordance with the multiplication method, a partial product array is generated from a pair of floating-point operand mantissas. Next, prior to compressing the partial product array into a compressed mantissa result, a lower-order bit portion of the partial product array is replaced with a randomly generated carry-in value.
摘要:
A processor-cache operational scheme and topology within a multi-processor data processing system having a shared lower level cache (or memory) by which the number of coherency busses is reduced and more efficient snoop resolution and coherency operations with the processor caches are provided. A copy of the internal (L1) cache directory is provided within the lower level (L2) cache or memory. The snoop operations and coherency maintenance operations of the L1 directory are completed by comparing the snoop addresses with the address tags of the copy of the L1 directory in the L2 cache. Updates to the coherency states of the copy of the L1 directory are mirrored in the L1 directory and L1 cache. This eliminates the need for the individual coherency buses of each processor that is coupled to the L2 cache and speeds up coherency operations because the snoops do not have to be transmitted to the L1 caches.
摘要:
An apparatus for controlling rounding modes in a single instruction multiple data (SIMD) floating-point unit is disclosed. The SIMD floating-point unit includes a floating-point status-and-control register (FPSCR) having a first rounding mode bit field and a second rounding mode bit field. The SIMD floating-point unit also includes means for generating a first slice and a second slice. During a floating-point operation, the SIMD floating-point unit concurrently performs a first rounding operation on the first slice and a second rounding operation on the second slice according to a bit in the first rounding mode bit field and a bit in the second rounding mode bit field within the FPSCR, respectively.
摘要:
High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic.
摘要:
A package for integrated circuit chips. The package contains a silicon substrate having a top surface and a bottom surface. The package also contains a first means for electrically connecting the integrated circuits to the substrate attached to the top surface of the substrate. A multilevel wiring is located at the top surface and is coupled to the first connecting means and serves as a communication link among a plurality of the first connecting means to enable multi-chip processing. A via containing means for coupling the multilevel wiring at the top surface to the bottom surface runs through the substrate from the bottom surface to the top surface. A second means is also present for connecting the coupling means at the bottom surface of the substrate with external components.
摘要:
A circuit for generating control signals used in a microprocessor has a storage array, such as a read-only memory (ROM) array, which contains a plurality of predefined logic patterns. An entry of the ROM array is selected, such as by the use of an address decoder, to choose a specific pattern, and the specific pattern is then modified based on a dynamic signal to generate an output control signal. The microprocessor may further predecode a base instruction using operation and operand source bits to yield a predecoded instruction having an address field whose value corresponds to the specific pattern. The dynamic signal can be based on whether an operand should be forwarded from a microprocessor component, and the specific pattern is then equivalent to a value for control signals required to execute an instruction when assuming that the operand should not be forwarded. Special control states can also be implemented, such as stall, halt, or scan data, through the use of particular code points in the ROM.
摘要:
Disclosed is an electronic chip containing a plurality of electronic circuit partitions, distributed over the area of the chip, each including a processor core and a clock phase domain different from cores in other partitions of the chip. A source of same frequency, but different phase clock signals representing different clock domains, provides different phase signals to adjacent partitions for the purpose of reducing instantaneous magnitude switching currents. Intra-chip communication circuitry distributes control and data signals between partitions.
摘要:
A communications bus (300) includes a number of alternate transmission paths (311, 312) between a given source node (301) and respective destination node (305) on a common substrate. The source node (301) receives a signal from a first circuit (309) serviced by the bus (300) while the respective destination node (305) transfers that signal to a second circuit (310) serviced by the bus. The communications bus (300) includes two switching arrangements for switching between the alternate transmission paths (311, 312). A source switching arrangement (318) is interposed between the source node (301) and the respective alternate transmission path (311, 312). This source switching arrangement (318) selectively connects the respective source node (301) to a selected one of the alternate transmission paths (311, 312) and disconnects the source node (301) from each other alternate transmission path. A destination switching arrangement (319) is interposed between the destination node (305) and respective alternate transmission paths (311, 312). The destination switching arrangement (319) selectively connects the respective destination node (305) to the selected alternate transmission path and disconnects the respective destination node from each other alternate transmission path.
摘要:
A method and structure for optimizing a solution for a complex problem typically solved by software tools includes selectively converting problem data into a format appropriate for one or more preselected vendor's set of solution tools and inputting the formatted design data into the one or more preselected vendor's set of solution tools. If more than one vendor has been preselected, resultant solution results are compared and the optimum solution is selected.