Abstract:
A pipelined CPU executes instructions of variable length and references memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queuing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access, with a 64-bit data block fetched on each cycle. Internal processor registers are accessed with short (byte-width) addresses instead of the full physical addresses used for memory and I/O references, but off-chip processor registers are memory-mapped and accessed over the same busses, using the same controls, as memory and I/O.
Abstract:
An apparatus which filters the number of invalidates to be propagated onto a private processor bus is provided. This is desirable so that the processor bus is not overloaded with invalidate requests. The present invention describes a method of filtering the number of invalidates to be propagated to each processor. A memory interface filters the invalidates by using a second private bus, the invalidate bus, which communicates with the cache controller. The cache controller can tell the memory interface whether data corresponding to the address on the invalidate bus is resident in the private cache memory of that processor. In this way, the memory interface only has to request the private processor bus when necessary, in order to perform the invalidate.
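The filtering step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented design: the class and method names (`CacheController`, `MemoryInterface`, `is_resident`, `on_invalidate`) are hypothetical, and the invalidate bus is modeled as a plain method call.

```python
class CacheController:
    """Hypothetical stand-in for the cache controller's residency lookup."""

    def __init__(self, resident_addresses):
        self.resident = set(resident_addresses)

    def is_resident(self, address):
        # Answer given to the memory interface over the invalidate bus.
        return address in self.resident


class MemoryInterface:
    """Forwards an invalidate to the processor bus only when it is needed."""

    def __init__(self, cache_controller):
        self.cache_controller = cache_controller
        self.forwarded = []  # invalidates that used the private processor bus
        self.filtered = 0    # invalidates dropped without using that bus

    def on_invalidate(self, address):
        if self.cache_controller.is_resident(address):
            # Data is cached: request the processor bus and invalidate.
            self.forwarded.append(address)
            self.cache_controller.resident.discard(address)
            return True
        # Data is not cached: the processor bus is never requested.
        self.filtered += 1
        return False
```

In this sketch an invalidate for an address that is not resident only increments `filtered`, which is the case the abstract aims to keep off the private processor bus.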
Abstract:
A processor and method for delaying the processing of cache coherency transactions during outstanding cache fills in a multi-processor system using a shared memory. A first processor fetches data having a specified address by addressing a cache memory, and when the specified address is not in the cache, saving the specified address in a fill address memory, and sending a fill request to the shared memory. Before return of fill data, the first processor receives a cache coherency request including the specified address from a second processor requesting invalidation of an addressed block of data. The first processor responds by checking whether the fill address memory includes the specified address, and upon finding the specified address in the fill address memory, delaying execution of the cache coherency request until the fill data is returned, and when the fill data is returned, using the fill data without retaining a validated block of the fill data in the cache. In a preferred embodiment, the fill memory is a content-addressable memory including a plurality of entries, and each entry has a fill address, an ownership fill bit (OREAD), an ownership-read invalidate pending bit (OIP), and a read invalidate pending bit (RIP). The OIP or RIP bit is set when execution of a cache coherency request is delayed, and these bits are read upon completion of a fill to execute the delayed request.
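A rough sketch of the fill address memory may help make the sequence concrete. It assumes a Python dict stands in for the content-addressable match, and the names `FillEntry`, `FillCam`, `start_fill`, `coherency_request`, and `complete_fill` are hypothetical. Which kind of request sets OIP and which sets RIP is also an assumption here; the abstract only says one of the two bits is set when a request is delayed.

```python
from dataclasses import dataclass


@dataclass
class FillEntry:
    address: int
    oread: bool = False  # fill was requested with ownership (OREAD)
    oip: bool = False    # ownership-read invalidate pending (OIP)
    rip: bool = False    # read invalidate pending (RIP)


class FillCam:
    """Toy model of the fill address memory; a dict models the CAM lookup."""

    def __init__(self):
        self.entries = {}

    def start_fill(self, address, ownership=False):
        # Cache miss: remember the address until the fill data returns.
        self.entries[address] = FillEntry(address, oread=ownership)

    def coherency_request(self, address, kind):
        """kind is 'oread' or 'read' (assumed labels for the two request types)."""
        entry = self.entries.get(address)
        if entry is None:
            return "execute_now"  # no outstanding fill for this block
        if kind == "oread":
            entry.oip = True      # assumption: ownership reads set OIP
        else:
            entry.rip = True      # assumption: plain reads set RIP
        return "delayed"

    def complete_fill(self, address):
        # Fill data has returned: read the pending bits, then free the entry.
        entry = self.entries.pop(address)
        keep_valid = not (entry.oip or entry.rip)
        return keep_valid, entry
```

When `complete_fill` returns `keep_valid` as `False`, the caller would use the fill data once but not retain a valid copy of the block in the cache, matching the behavior the abstract describes for a delayed request.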
Abstract:
Execution of a program's instructions in a simultaneous multithreaded processor is halted while the program is waiting for one or more events to occur. An event monitor is first armed by an arm instruction, that is, one or more events to be monitored are identified to the event monitor, such as a modification to a value or state of an identified memory location or group of locations, and a watch flag is set to enable the event monitor. Upon execution of a quiesce request instruction, the program quiesces if the watch flag is set, and a timer is started. Upon observation by the event monitor of an identified event, or upon expiration of the timer, the watch flag is cleared and execution of the program resumes.
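The arm, quiesce, and wake sequence can be modeled in a few lines. This is only a software sketch under stated assumptions: the timer counts abstract ticks, and the names `EventMonitor`, `Thread`, `observe_write`, and `tick` are hypothetical rather than taken from the source.

```python
class EventMonitor:
    def __init__(self):
        self.watched = set()
        self.watch_flag = False

    def arm(self, locations):
        # Arm instruction: identify the locations to watch and set the flag.
        self.watched = set(locations)
        self.watch_flag = True

    def observe_write(self, location):
        # A write to a watched location counts as the monitored event.
        if self.watch_flag and location in self.watched:
            self.watch_flag = False


class Thread:
    def __init__(self, monitor, timeout):
        self.monitor = monitor
        self.timeout = timeout  # timer length in ticks (assumed unit)
        self.remaining = 0
        self.quiesced = False

    def quiesce(self):
        # Quiesce request: only takes effect if the watch flag is set.
        if self.monitor.watch_flag:
            self.quiesced = True
            self.remaining = self.timeout

    def tick(self):
        if not self.quiesced:
            return
        if not self.monitor.watch_flag:
            self.quiesced = False  # event was observed, resume execution
            return
        self.remaining -= 1
        if self.remaining <= 0:
            # Timer expired: clear the flag and resume execution.
            self.monitor.watch_flag = False
            self.quiesced = False
```

A quiesce request issued while the watch flag is clear is a no-op in this model, so a thread cannot stall on an event that already fired between the arm and the quiesce.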
Abstract:
A method and apparatus for controlling memory access operations of a pipelined processor using a "write queue" are described. The write queue temporarily stores addresses of writes not yet made in memory. Each write queue entry includes a write-read conflict bit. When an entry is first put into the write queue, the write-read conflict bit is cleared. When a subsequent memory read request occurs, the address of the read request is compared to the addresses stored in the write queue. If there is a match, the write-read conflict bit in the matching entry is set. If after this comparison no conflict bits are set, the read is allowed to proceed to memory before the queued writes. On the other hand, if any conflict bits are set, the read is prevented from proceeding. The conflict bits are cleared as the queued writes are performed in memory. Also, the write queue is able to accept additional entries while a read request is stalled. In a preferred arrangement, data-stream reads (D-reads) are given priority over instruction-stream reads (I-reads), and separate conflict bits are used to indicate D-read conflicts and I-read conflicts. In this fashion, the fetching of data and the fetching of instructions are stalled and resumed independently when conflicts arise.
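The conflict-bit bookkeeping can be illustrated with a small model. It is a sketch under simplifying assumptions: there is a single conflict bit per entry (the separate D-read and I-read bits of the preferred arrangement are omitted), writes retire strictly oldest first, and the names `WriteQueue`, `enqueue_write`, `try_read`, and `retire_oldest_write` are hypothetical.

```python
class WriteQueue:
    def __init__(self):
        # Each entry holds a pending write address and its conflict bit.
        self.entries = []

    def enqueue_write(self, addr):
        # New entries start with the write-read conflict bit cleared.
        self.entries.append({"addr": addr, "conflict": False})

    def try_read(self, addr):
        """Return True if the read may bypass the queued writes."""
        for entry in self.entries:
            if entry["addr"] == addr:
                entry["conflict"] = True
        # The read proceeds only if no conflict bit is set.
        return not any(entry["conflict"] for entry in self.entries)

    def retire_oldest_write(self):
        # Performing the write in memory removes the entry and its bit.
        if self.entries:
            self.entries.pop(0)
```

A stalled read is retried by calling `try_read` again after writes retire; it succeeds once every conflicting entry has been written to memory. The queue keeps accepting `enqueue_write` calls while a read is stalled, as the abstract requires.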
Abstract:
Writeback transactions from a processor and cache are fed to a main memory through a writeback queue, and non-writeback transactions from the processor and cache are fed to the main memory through a non-writeback queue. When a cache error is detected, an error transition mode (ETM) is entered that provides limited use of the data in the cache; a read or write request for data not owned in the cache is made to the main memory instead of the cache, even when the data is valid in the cache, although owned data is read from the cache. In ETM, when the processor makes a first write request to data not owned in the cache followed by a second write request to data owned in the cache, write data of the first write request is prevented from being received by the main memory after write data of the second request while permitting writeback of the data owned by the cache. Preferably this is done by sending the write requests from the processor through the non-writeback queue, and when a write request accesses data in a block of data owned by the cache, disowning the block of data in the cache and writing the disowned block of data back to the main memory.
Abstract:
A pipelined CPU executes instructions of variable length and references memory using various data widths. A writeback cache is used (instead of writethrough) in a hierarchical cache arrangement, and writeback is allowed to proceed even though other accesses are suppressed due to queues being full. Separate queues are provided for the return data from memory and for cache invalidates, yet the order of bus transactions is maintained by a pointer arrangement. The bus protocol used by the CPU to communicate with the system bus is of the pended type, with transactions on the bus identified by an ID field specifying the originator, and arbitration for bus grant goes on simultaneously with address/data transactions on the bus.
Abstract:
A processor and method for preventing access to a locked memory block in a multiprocessor computer system. The processor has a cache memory and records a memory lock in a content-addressable memory separate from the cache memory. Preferably, outstanding cache fills are recorded in the same content addressable memory as memory locks, and a memory lock or an outstanding cache fill delays the execution of a cache coherency request upon the same memory block. When a cache coherency request is received from another processor, the address of the cache coherency request is compared to addresses stored in the content addressable memory, and when there is a match, a bit in the matching entry is set to indicate a delayed request that is executed after the lock is unlocked or the cache is refilled. In a specific embodiment, a memory lock or an outstanding cache fill also stalls a processor read or write to the same memory block.
Abstract:
A load/store pipeline in a computer processor, for loading data to registers and storing data from the registers, has a cache memory within the pipeline for storing data. The pipeline includes buffers which support multiple outstanding read request misses. Data corresponding to the request misses is obtained from outside the pipeline, independently of the operation of the pipeline. The cache memory can then be filled with the requested data. The provision of a cache memory within the pipeline, and of the buffers supporting the cache memory, speeds up loading operations for the computer processor.