Abstract:
A system for evaluating the performance of a computer system having a processor that passes through a plurality of processor states during operation and an associated system memory includes an operating unit for receiving a request from a user to monitor specific processor states. Firmware causes the processor to enter the desired processor state requested by the user. Hardware identifies the occurrence of the desired processor state. Information relating to the occurrence of the desired processor state is accumulated in the memory. The accumulated information is read from the memory and a report is provided to the user.
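The following C sketch is not the claimed hardware; it only illustrates the flow described above (request, state detection, accumulation in memory, report), and every name in it is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical illustration of the monitoring flow: a user requests a
 * processor state to monitor, hardware counts occurrences of that state,
 * the counts accumulate in system memory, and a report is produced. */

#define NUM_STATES 8

static uint64_t state_counts[NUM_STATES];   /* accumulation area in system memory */

/* Hardware/firmware side: record one occurrence of a monitored state. */
static void record_state_occurrence(unsigned state)
{
    if (state < NUM_STATES)
        state_counts[state]++;
}

/* User side: read the accumulated information and produce a report. */
static void report_state(unsigned state)
{
    if (state < NUM_STATES)
        printf("state %u occurred %llu times\n",
               state, (unsigned long long)state_counts[state]);
}

int main(void)
{
    record_state_occurrence(3);   /* stand-in for hardware detecting the state */
    record_state_occurrence(3);
    report_state(3);              /* report provided to the user */
    return 0;
}
```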
Abstract:
A macropipelined microprocessor chip adheres to strict read and write ordering by sequentially buffering operands in queues during instruction decode, then removing the operands in order during instruction execution. Any instruction that requires additional access to memory inserts the requests into the queued sequence (in a specifier queue) such that read and write ordering is preserved. A specifier queue synchronization counter captures synchronization points to coordinate memory request operations among the autonomous instruction decode unit, instruction execution unit, and memory sub-system. The synchronization method does not restrict the benefit of overlapped execution in the pipeline. Another feature is the treatment of a variable bit field operand type that does not restrict the location of operand data. Instruction execution flows in a pipelined processor having such an operand type are vastly different depending on whether operand data resides in registers or memory. Thus, an operand context queue (field queue) is used to simplify context-dependent execution flow and increase overlap. The field queue allows the instruction decode unit to issue instructions with variable bit field operands normally, sequentially identifying and fetching operands, and communicating the operand context that specifies register or memory residence across the pipeline boundaries to the autonomous execution unit. The mechanism creates the opportunity for increasing the overlap of pipelined functions and greatly simplifies the splitting of execution flows.
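As a purely illustrative model of the in-order queueing idea (not the patented logic), the sketch below shows a small FIFO in C in which decode enqueues operand specifiers together with a register-or-memory context flag and execution dequeues them in program order; all names are invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative-only sketch: the decode stage enqueues operand specifiers in
 * program order; the execution stage dequeues them in the same order,
 * preserving read/write ordering.  The context flag stands in for a field
 * queue entry recording register or memory residence of the operand. */

#define QSIZE 16

struct specifier { unsigned operand_id; bool in_register; };

static struct specifier q[QSIZE];
static unsigned head, tail;

static bool enqueue_specifier(unsigned operand_id, bool in_register)
{
    if ((tail + 1) % QSIZE == head)        /* queue full: decode must stall */
        return false;
    q[tail] = (struct specifier){ operand_id, in_register };
    tail = (tail + 1) % QSIZE;
    return true;
}

static bool dequeue_specifier(struct specifier *out)
{
    if (head == tail)                      /* queue empty: execute must wait */
        return false;
    *out = q[head];
    head = (head + 1) % QSIZE;
    return true;
}

int main(void)
{
    enqueue_specifier(1, true);            /* decode: operand in a register */
    enqueue_specifier(2, false);           /* decode: operand in memory     */
    struct specifier s;
    while (dequeue_specifier(&s))          /* execute: consume in order     */
        printf("operand %u from %s\n", s.operand_id,
               s.in_register ? "register" : "memory");
    return 0;
}
```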
Abstract:
An operation of a processor is traced while fetching instructions from a memory to operate the processor. The tracing involves detecting an unpredictable fetching of instructions on the assumption that a predictable fetching can be reconstructed without any further input. The unpredictable fetching is identified as being due to computable, conditional, or unanticipated events. Upon detecting the events, process control information, such as the next instruction to be fetched, is recorded in a queue, and from the queue the information can be stored in a trace buffer. During reconstruction of the operation, the trace buffer and the image including the instructions can be examined to analyze the real-time operation of the processor.
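A minimal C sketch of the recording rule described above, assuming a simple classification of fetch events; it is illustrative only and every identifier is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: trace logic records control information (such as the
 * next fetch address) only when the fetch cannot be predicted from the
 * program image, e.g. a computed, conditional, or unanticipated change of
 * flow; sequential fetches are reconstructed offline and not recorded. */

enum fetch_event { SEQUENTIAL, COMPUTED, CONDITIONAL, UNANTICIPATED };

#define TRACE_DEPTH 64
static uint32_t trace_buffer[TRACE_DEPTH];
static unsigned trace_len;

static void trace_fetch(enum fetch_event ev, uint32_t next_pc)
{
    if (ev == SEQUENTIAL)                 /* predictable: no record needed */
        return;
    if (trace_len < TRACE_DEPTH)          /* unpredictable: queue the target */
        trace_buffer[trace_len++] = next_pc;
}

int main(void)
{
    trace_fetch(SEQUENTIAL,  0x1004);     /* not recorded */
    trace_fetch(CONDITIONAL, 0x2000);     /* recorded     */
    trace_fetch(COMPUTED,    0x3000);     /* recorded     */
    for (unsigned i = 0; i < trace_len; i++)
        printf("trace[%u] = 0x%x\n", i, (unsigned)trace_buffer[i]);
    return 0;
}
```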
Abstract:
A data flow of a processor is traced while accessing data stored in a memory and in a plurality of registers during operation of the processor. The tracing involves detecting an unpredictable accessing of data on the assumption that a predictable accessing can be reconstructed without any further input. The unpredictable accessing is identified by setting and clearing a trace bit associated with each of the registers according to identifying the accessing as direct memory-to-register, register-to-register, constant-to-register, and indirect memory. If a trace bit is set on a register storing data being used as a base address during the indirect memory accessing, data flow control information, such as the base address stored in the register being used during the indirect accessing, is recorded in a queue, and from the queue the information can be stored in a trace buffer. During reconstruction of the operation, the trace buffer and a copy of the data having an initial state can be examined to analyze the data flows during the real-time operation of the processor.
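The sketch below models the per-register trace bits in C. The specific set/clear policy shown (for example, clearing the bit on a constant-to-register access) is an assumption made only for illustration, not the policy defined by the patent, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: a trace bit is kept per register and is updated by
 * the type of access; when a register whose trace bit is set supplies the
 * base address of an indirect memory access, the base address is recorded
 * so the data flow can be reconstructed later. */

#define NUM_REGS 32
static bool trace_bit[NUM_REGS];

static void on_constant_to_register(unsigned rd)      { trace_bit[rd] = false; }
static void on_direct_memory_to_register(unsigned rd) { trace_bit[rd] = false; }

static void on_register_to_register(unsigned rd, unsigned rs)
{
    trace_bit[rd] = trace_bit[rs];        /* the bit follows the data */
}

static void on_indirect_memory(unsigned base_reg, uint32_t base_addr)
{
    if (trace_bit[base_reg])              /* only unpredictable accesses recorded */
        printf("record base address 0x%x\n", (unsigned)base_addr);
}

int main(void)
{
    trace_bit[5] = true;                  /* assume r5 holds unreconstructable data */
    on_register_to_register(6, 5);        /* the trace bit propagates to r6 */
    on_indirect_memory(6, 0x8000);        /* base address is recorded */
    on_constant_to_register(6);           /* r6 becomes reconstructable again */
    on_indirect_memory(6, 0x9000);        /* not recorded */
    return 0;
}
```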
Abstract:
A group of bit set and bit clear instructions is provided for a microprocessor to allow atomic modification of privileged architecture control registers. The bit set and bit clear instructions include an opcode that designates to the microprocessor that the instructions are to execute in privileged (kernel) state only, and that the instructions are to communicate with privileged control registers. Two operands are provided for the instructions: the first designates which of the privileged control registers is to be modified, and the second designates a general purpose register that contains a bit mask. The bit set instructions set bits in the designated control register according to bits set in the bit mask. The bit clear instructions clear bits in the designated control register according to bits set in the bit mask. By atomically modifying privileged control registers, a requirement for strict nesting of interrupt routines is eliminated.
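To illustrate the effect of the instructions (not their hardware implementation), the C sketch below uses an atomic object as a stand-in for a privileged control register and applies a bit mask in a single atomic step; all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: applying a bit mask to a control register in one
 * atomic step means an interrupt handler can never observe, or destroy, a
 * partially updated value, which is what removes the need for strictly
 * nested interrupt routines. */

static _Atomic uint32_t control_reg;      /* stand-in for a privileged control register */

static void bit_set(uint32_t mask)   { atomic_fetch_or(&control_reg, mask);   }
static void bit_clear(uint32_t mask) { atomic_fetch_and(&control_reg, ~mask); }

int main(void)
{
    bit_set(0x0005);                       /* set bits 0 and 2 */
    bit_clear(0x0001);                     /* clear bit 0      */
    printf("control register = 0x%x\n", (unsigned)atomic_load(&control_reg));
    return 0;
}
```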
Abstract:
A processor having an arithmetic extension of an instruction set architecture which incorporates a set of high performance floating point operations. The instruction set architecture incorporates a variety of data formats including single precision and double precision data formats, as well as the paired-single data format that allows two simultaneous operations on a pair of operands. The extension includes instructions directed to reduction add, reduction multiply, reciprocal, and reciprocal square root.
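A plain C model of the paired-single layout and of a reduction add is sketched below; it only mirrors the arithmetic, not the instruction encoding, and all names are invented.

```c
#include <stdio.h>

/* Illustrative sketch: a paired-single value packs two single-precision
 * operands so one instruction can operate on both at once; a reduction add
 * then combines the two halves into a single scalar result. */

struct paired_single { float hi, lo; };

/* Element-wise add of two paired-single values (one paired operation). */
static struct paired_single ps_add(struct paired_single a, struct paired_single b)
{
    return (struct paired_single){ a.hi + b.hi, a.lo + b.lo };
}

/* Reduction add: collapse the two halves of one paired-single value. */
static float ps_reduce_add(struct paired_single a)
{
    return a.hi + a.lo;
}

int main(void)
{
    struct paired_single a = { 1.0f, 2.0f }, b = { 3.0f, 4.0f };
    struct paired_single s = ps_add(a, b);            /* {4.0, 6.0} */
    printf("reduction add = %f\n", ps_reduce_add(s)); /* 10.0 */
    return 0;
}
```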
Abstract:
An apparatus is presented for improving the efficiency of data transfers between devices interconnected over an on-chip system bus in a multi-master computer system configuration. Bus efficiency is improved by providing an apparatus for controlling a read-modify-write transaction to an address in a bus slave device without suspending essential features of the system bus during the transaction, namely pipelining and transaction splitting. The apparatus includes transaction control logic in a bus master device and transaction response logic in a bus slave device. The transaction control logic provides a write barrier command from the bus master device over the on-chip system bus to the bus slave device. The transaction response logic receives the write barrier command and precludes execution of future transactions to the address within the bus slave device until completion of the read-modify-write transaction, while allowing transactions to other addresses within the bus slave device to complete.
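The slave-side behavior can be pictured with the hypothetical C sketch below, which locks a single address when a write barrier arrives, holds back transactions to that address until the read-modify-write completes, and lets transactions to other addresses proceed; the single-lock simplification and all names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch of the transaction response logic: one address is
 * locked by a write barrier command; transactions to that address wait until
 * the read-modify-write finishes, while all other traffic continues. */

static bool     barrier_active;
static uint32_t barrier_addr;

static void on_write_barrier(uint32_t addr)   /* bus master issues the barrier */
{
    barrier_active = true;
    barrier_addr   = addr;
}

static void on_rmw_complete(void)             /* final write of the RMW sequence */
{
    barrier_active = false;
}

static bool may_execute(uint32_t addr)        /* can this transaction run now? */
{
    return !(barrier_active && addr == barrier_addr);
}

int main(void)
{
    on_write_barrier(0x1000);
    printf("0x1000 allowed? %d\n", may_execute(0x1000));  /* 0: held back */
    printf("0x2000 allowed? %d\n", may_execute(0x2000));  /* 1: proceeds  */
    on_rmw_complete();
    printf("0x1000 allowed? %d\n", may_execute(0x1000));  /* 1 again      */
    return 0;
}
```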
Abstract:
A pipelined CPU executes instructions of variable length and references memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queuing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access, fetching 64-bit data blocks on each cycle. Internal processor registers are accessed with short (byte width) addresses instead of the full physical addresses used for memory and I/O references, but off-chip processor registers are memory-mapped and accessed over the same buses using the same controls as the memory and I/O.
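As a rough, assumed model (not the actual CPU), the C sketch below contrasts the short byte-width register-number path for on-chip processor registers with the full-physical-address path shared by memory, I/O, and memory-mapped off-chip registers; the names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: on-chip processor registers are selected by a short
 * byte-wide register number, while memory, I/O, and memory-mapped off-chip
 * processor registers all go through the same full-address bus path. */

#define NUM_INTERNAL_REGS 256
static uint64_t internal_reg[NUM_INTERNAL_REGS];

static uint64_t read_internal_reg(uint8_t regnum)      /* short byte-width address */
{
    return internal_reg[regnum];
}

static uint64_t read_physical(uint64_t phys_addr)      /* memory, I/O, off-chip regs */
{
    /* Stand-in for a 64-bit-wide bus access using a full physical address. */
    printf("bus read of 64-bit block at 0x%llx\n", (unsigned long long)phys_addr);
    return 0;
}

int main(void)
{
    internal_reg[0x12] = 42;
    printf("internal reg 0x12 = %llu\n",
           (unsigned long long)read_internal_reg(0x12));
    read_physical(0x20001000);              /* same path as memory and I/O */
    return 0;
}
```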
Abstract:
An apparatus is provided that filters the number of invalidates to be propagated onto a private processor bus. This is desirable so that the processor bus is not overloaded with invalidate requests. The present invention describes a method of filtering the number of invalidates to be propagated to each processor. A memory interface filters the invalidates by using a second private bus, the invalidate bus, which communicates with the cache controller. The cache controller can tell the memory interface whether data corresponding to the address on the invalidate bus is resident in the private cache memory of that processor. In this way, the memory interface has to request the private processor bus only when necessary to perform the invalidate.
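The filtering decision can be sketched in C as below, with a trivial direct-mapped lookup standing in for the cache controller's residency check; the cache geometry and all names are assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: the memory interface first asks the cache controller
 * (over the private invalidate bus) whether the address is resident in this
 * processor's cache, and requests the processor bus for the invalidate only
 * when it is; otherwise the invalidate is filtered out. */

#define CACHE_LINES 128
static bool     line_valid[CACHE_LINES];
static uint32_t line_tag[CACHE_LINES];

static bool cache_has_address(uint32_t addr)          /* cache controller's answer */
{
    unsigned idx = (addr >> 5) % CACHE_LINES;         /* 32-byte lines, direct-mapped */
    return line_valid[idx] && line_tag[idx] == (addr >> 5);
}

static void memory_interface_invalidate(uint32_t addr)
{
    if (!cache_has_address(addr))                     /* filtered: no bus request */
        return;
    printf("request processor bus: invalidate 0x%x\n", (unsigned)addr);
    line_valid[(addr >> 5) % CACHE_LINES] = false;
}

int main(void)
{
    line_valid[1] = true;                             /* pretend address 0x20 is cached */
    line_tag[1]   = 0x20 >> 5;
    memory_interface_invalidate(0x20);                /* propagated to the processor bus */
    memory_interface_invalidate(0x4000);              /* filtered out */
    return 0;
}
```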