摘要:
A processor system that is switchable between a normal mode of operation without precise floating point exceptions and a debug mode of operation with precise floating point exceptions. The processor system includes a dispatch for dispatching integer and floating point instructions, an integer unit having a multi-stage integer pipeline for executing the integer instructions, and a floating point unit having a multi-stage floating point pipeline for executing the floating point instructions. The system begins operation in the normal mode, and upon receipt of an instruction to "switch to debug mode," the processor switches to the debug mode of operation with precise exceptions. In the debug mode, once a floating point instruction has been dispatched, all other instructions are prevented from being committed until the system determines whether the floating point instruction generates an exception. Thus, permitting the system to signal precise exceptions when not in the normal mode.
摘要:
A split level cache memory system for a data processor includes a single chip integer unit, an army processor such as a floating point unit, an external main memory and a split level cache. The split level cache includes an on-chip, fast local cache with low latency for use by the integer unit for loads and stores of integer and address data and an off-chip, pipelined global cache for storing arrays of data such as floating point data for use by the array processor and integer and address data for refilling the local cache. Coherence between the local cache and global cache is maintained by writing through to the global cache during integer stores. Local cache words are invalidated when data is written to the global cache during an army processor store.
摘要:
A method for preventing data loss and deadlock in a multi-processor computer system wherein at least one processor in the computer system includes a split-level cache. The split-level cache has a byte-writable first-level and a word-writable second level. The method monitors the second level cache to determine if a forced atomic (FA) instruction is in a second level cache pipeline. If an FA instruction is determined to be in the second level cache pipeline, then interventions to the second level cache are delayed until the FA instruction exits the second level cache pipeline. In this manner data written by operation of cache memory access instruction that cause the interventions is not destroyed by the execution of the FA instruction, thereby preventing data loss. The method also monitors the second level cache pipeline to determine if a possible miss (PM) instruction is in the second level cache pipeline. If a PM instruction is determined to be in the second level cache pipeline, the FA instructions are prevented from entering the second level cache pipeline such that execution of interventions to the second level cache is not prevented when an instruction in the second level cache may be detained to process an intervention in its behalf, thereby preventing deadlock between processing units of the computer system.
摘要:
A method for preventing deadlock due to the need for data exclusivity when performing forced atomic instructions in a multi-level cache in a multi-processor system. The system determines whether an aligned multi-byte word in which the data of a forced atomic instruction, such as an integer store operation, is exclusive in a first level cache. If so, the forced atomic instruction is allowed to enter a second level cache pipeline. If not, the forced atomic instruction is prevented from entering the second level cache pipeline and a cache miss and fill operation is initiated to cause the aligned word to be exclusive in the first level cache.
摘要:
A method and system for managing memory in a multiprocessor system includes defining the plurality of processor coherence domains within a system coherence domain of the multiprocessor system. The processor coherence domains each include a plurality of processors and a processor memory. Shared access to data in the processor memory of each processor coherence domain is provided only to elements of the multiprocessor system within the processor coherence domain. Non-shared access to data in the processor memory of each processor coherence domain is provided to elements of the multiprocessor system within and outside of the processor coherence domain.
摘要:
A memory access arbitration scheme is provided where transactions to a Shared memory are stored in an arbitration queue. Prior to arbitration, the transactions are compared against the contents of cache memory, to determine which transactions will hit in cache, which will miss and which will be victims. Also prior to arbitration, the entries in the arbitration queue are grouped according to a transaction parameter, such as DRAM bank, Write to Bank, Read to Bank, etc. Arbitration is the performed among those groups which are ready for service. From the group winning arbitration, the oldest transaction is selected for servicing. Preferably, a collapsible queuing structure and method is used, such that once a transaction is serviced, higher order entries ripple down in the queue to make room for new entries while maintaining an oldest to newest relationship among the queue entries.
摘要:
A multiprocessor system and method includes a processing sub-system including a plurality of processors in a processor memory system. A network is operable to couple the processing sub-system to an input/output (I/O) sub-system. The I/O sub-system includes a plurality of I/O interfaces each operable to couple a peripheral device to the multiprocessor system. The I/O interfaces each include a local memory operable to store exclusive read-only copies of data from the processor memory system for use by a corresponding peripheral device.
摘要翻译:多处理器系统和方法包括在处理器存储器系统中包括多个处理器的处理子系统。 网络可操作以将处理子系统耦合到输入/输出(I / O)子系统。 I / O子系统包括多个I / O接口,每个I / O接口可操作以将外围设备耦合到多处理器系统。 I / O接口各自包括本地存储器,其可操作以存储来自处理器存储器系统的数据的专用只读副本以供对应的外围设备使用。
摘要:
The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
摘要:
The computer system having a first circuit board with a processor for processing information and a slot for receiving an IC card. The slot includes multiple pins for connection to the IC card. The IC card includes a second processor coupled to a second circuit board, where the processor is contained within outer framing structure. An interface coupled to the circuit board may be coupled to the multiple pins in the slot, such that the second processor in the integrated circuit card is able to control the computer system.
摘要:
A mechanism for dividing an integer dividend by an integer divisor to generate an integer quotient operates by aligning the divisor relative to the dividend such that a right-most bit of the divisor is aligned with a bit M of the dividend. The divisor is compared to an integer value whose right-most bits are equal to bits of the dividend which are aligned with bits of the divisor. As a result of this comparison, quotient bits which positionally correspond to the dividend bit M and to bits of the dividend which are located to the left of the dividend bit M are cleared to zero. Also as a result of the comparison, the dividend is divided by the divisor as aligned relative to the dividend to thereby generate values for any uncleared quotient bits.