Abstract:
A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record-keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data that is up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving the latest value of the requested data from the local last accessor to the requester.
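A minimal C sketch of the decision described above: if the local last accessor in the requester's own domain holds an up-to-date copy, the record-keeping structure probes it and the latest value is retrieved cache-to-cache. The directory layout (dir_entry_t), the probe primitive (send_cache_probe), and the sizes are illustrative assumptions, not the claimed hardware.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_CACHES  8
    #define NUM_DOMAINS 2

    /* Record-keeping (directory) entry for one memory address; field layout
     * is a guess for illustration only. */
    typedef struct {
        uint64_t addr;
        bool     up_to_date[NUM_CACHES];     /* cache i holds a copy consistent with memory */
        int      last_accessor[NUM_DOMAINS]; /* last accessor per domain, -1 if none */
    } dir_entry_t;

    typedef struct {
        int id;     /* which cache the requester is */
        int domain; /* local domain the requester belongs to */
    } requester_t;

    /* Hypothetical probe primitive: returns the latest value held by the target cache. */
    extern uint64_t send_cache_probe(int target_cache, uint64_t addr);

    /* The requester's memory request has reached the record-keeping structure.
     * If the local last accessor in the requester's domain holds a copy that is
     * up to date with memory, probe it and retrieve the latest value instead of
     * going to memory or to a remote domain. */
    bool read_via_local_last_accessor(const dir_entry_t *e, const requester_t *req,
                                      uint64_t *value_out)
    {
        int lla = e->last_accessor[req->domain];
        if (lla >= 0 && e->up_to_date[lla]) {
            *value_out = send_cache_probe(lla, e->addr);  /* cache-to-cache transfer */
            return true;
        }
        return false;  /* fall back to memory or a non-local accessor */
    }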
Abstract:
A micro-architecture may provide hardware and software support for a high-bandwidth write command. The micro-architecture may invoke a method to perform the high-bandwidth write command. The method may comprise sending a write request from a requester to a record-keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further comprise determining that copies of the requested data are present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to indicate the presence of copies of the requested data, and sending a write response message after the latest value of the requested data and all invalidation acknowledgements have been received.
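A C sketch of the message ordering in the high-bandwidth write. The messaging primitives (send_invalidation, notify_requester_of_copies, send_write_response, latest_value) and the sharer-list layout are assumptions for illustration; the real completion tracking would not spin.

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_SHARERS 16

    /* Hypothetical messaging primitives of the distributed cache system. */
    extern void     send_invalidation(int cache_id, uint64_t addr);
    extern bool     invalidation_ack_received(int cache_id, uint64_t addr);
    extern void     notify_requester_of_copies(int requester, int num_copies);
    extern uint64_t latest_value(uint64_t addr);
    extern void     send_write_response(int requester, uint64_t addr, uint64_t value);

    /* Record-keeping view of one address: which caches hold copies outside memory. */
    typedef struct {
        uint64_t addr;
        int      sharers[MAX_SHARERS];
        int      num_sharers;
    } sharer_list_t;

    void high_bandwidth_write(int requester, const sharer_list_t *s)
    {
        /* Invalidate every copy held in the distributed cache system. */
        for (int i = 0; i < s->num_sharers; i++)
            send_invalidation(s->sharers[i], s->addr);

        /* Tell the requester that copies exist and are being invalidated. */
        notify_requester_of_copies(requester, s->num_sharers);

        /* Complete only after the latest value and every acknowledgement have arrived. */
        for (int i = 0; i < s->num_sharers; i++)
            while (!invalidation_ack_received(s->sharers[i], s->addr))
                ;  /* spin: placeholder for the real completion tracking */

        send_write_response(requester, s->addr, latest_value(s->addr));
    }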
Abstract:
A method and apparatus to reduce unnecessary write-backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be reduced by eliminating write-backs of cache memory lines that hold information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit.
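A C sketch contrasting a conventional clear, which dirties the line and later forces a write-back, with the write-zero command described above, which marks the line without moving zero data values through the data path. The cache_line_t layout and the zero flag are illustrative assumptions.

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define LINE_BYTES 64

    typedef struct {
        uint64_t tag;
        bool     valid;
        bool     dirty;   /* dirty lines are written back to main memory on eviction */
        bool     zero;    /* hypothetical "known zero / end-of-life" marker */
        uint8_t  data[LINE_BYTES];
    } cache_line_t;

    /* Conventional clearing: zeros are moved through the data path and the
     * line is marked dirty, so it is later written back to main memory. */
    void clear_line_with_stores(cache_line_t *line)
    {
        memset(line->data, 0, LINE_BYTES);
        line->dirty = true;
    }

    /* Write-zero command: the line is simply marked, no data values of zero
     * are moved into it and no write-back is scheduled, saving power. */
    void write_zero_command(cache_line_t *line)
    {
        line->zero  = true;
        line->dirty = false;  /* nothing useful to write back */
    }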
Abstract:
Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.
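A C sketch of how a domain state field in a tag entry could gate reads and writes within a domain. The three-state encoding (DOM_INVALID, DOM_SHARED, DOM_EXCLUSIVE) is an assumed example; the abstract does not specify the encoding.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical encodings of the domain state field. */
    typedef enum {
        DOM_INVALID,    /* no readable copy in the domain */
        DOM_SHARED,     /* the cache line may be read but not changed */
        DOM_EXCLUSIVE   /* the cache line may be read and changed */
    } domain_state_t;

    typedef struct {
        uint64_t       tag;
        domain_state_t domain_state;  /* per-tag-entry field in the cache tag directory */
    } tag_entry_t;

    bool can_read(const tag_entry_t *e)  { return e->domain_state != DOM_INVALID; }
    bool can_write(const tag_entry_t *e) { return e->domain_state == DOM_EXCLUSIVE; }

    /* Assign a value to the domain state field, e.g. on a coherence transition. */
    void set_domain_state(tag_entry_t *e, domain_state_t s) { e->domain_state = s; }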
Abstract:
A computer system including an instruction cache (I-cache) having a plurality of banks for storing a subset of data from memory is shown to include a prediction mechanism for predicting which bank of the I-cache contains the required data. A prediction value, including a sequential prediction hint and a branch prediction hint, is associated with each instruction stored in the I-cache. The prediction value may either be stored with the I-cache data, or in a separate memory located before the I-cache. If the prediction is incorrect, the hint is "trained" to provide a higher degree of accuracy for repetitive instruction stream operation. Processor performance is additionally improved by providing a branch hint that allows for a smoother transition between changing instruction streams.
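A C sketch of the prediction and training steps, assuming a two-bank I-cache. The pred_value_t fields, predict_bank, and train_hint are illustrative names, and the field widths are guesses.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_BANKS 2

    /* Prediction value stored with each I-cache instruction: a sequential hint
     * (bank for fall-through fetch) and a branch hint (bank of the taken target). */
    typedef struct {
        uint8_t seq_bank_hint;
        uint8_t branch_bank_hint;
    } pred_value_t;

    /* Pick the bank to fetch from next, based on whether a branch is predicted taken. */
    uint8_t predict_bank(const pred_value_t *p, bool branch_predicted_taken)
    {
        return branch_predicted_taken ? p->branch_bank_hint : p->seq_bank_hint;
    }

    /* If the prediction was wrong, "train" the hint so a repetitive instruction
     * stream predicts the correct bank on the next pass. */
    void train_hint(pred_value_t *p, bool branch_was_taken, uint8_t actual_bank)
    {
        if (branch_was_taken)
            p->branch_bank_hint = actual_bank;
        else
            p->seq_bank_hint = actual_bank;
    }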
Abstract:
A multi-threaded processor provides for efficient flow control from a pool of un-executed stores in an instruction queue to a store queue. The processor also includes similar capabilities with respect to load instructions. The processor includes logic organized into a plurality of thread processing units (“TPUs”) and allocation logic that monitors each TPU's demand for entries in the store queue. Demand is determined by subtracting an adjustable threshold value from the most recently assigned store identifier value. If the difference between the most recently assigned store identifier for a TPU and the TPU's threshold is non-zero, then it is determined that the TPU has demand for at least one entry in the store queue. The allocation logic includes arbitration logic that determines which one of a plurality of TPUs with store queue demand should be allocated a free entry in the store queue.
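A C sketch of the demand test and one possible arbitration pass. The round-robin policy is an assumption; the abstract only says the arbitration logic picks one of the TPUs that have store queue demand.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_TPUS 4

    typedef struct {
        uint32_t last_store_id;  /* most recently assigned store identifier */
        uint32_t threshold;      /* adjustable threshold value */
    } tpu_t;

    /* A TPU has demand for a store-queue entry when the difference between its
     * most recently assigned store identifier and its threshold is non-zero. */
    bool has_store_queue_demand(const tpu_t *t)
    {
        return (t->last_store_id - t->threshold) != 0;
    }

    /* Round-robin arbitration (assumed policy) among TPUs with demand. */
    int arbitrate_free_entry(const tpu_t tpus[NUM_TPUS], int last_winner)
    {
        for (int i = 1; i <= NUM_TPUS; i++) {
            int cand = (last_winner + i) % NUM_TPUS;
            if (has_store_queue_demand(&tpus[cand]))
                return cand;   /* allocate the free store-queue entry to this TPU */
        }
        return -1;             /* no TPU currently has demand */
    }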
Abstract:
Execution of a program's instructions in a simultaneous multithreaded processor is halted while the program is waiting for one or more events to occur. The program first arms an event monitor with an arm instruction, that is, it identifies to the event monitor one or more events to be monitored, such as a modification to a value or state of an identified memory location or group of locations, and sets a watch flag to enable the event monitor. Upon execution of a quiesce request instruction, the program quiesces if the watch flag is set, and a timer is started. Upon observation by the event monitor of an identified event, or upon expiration of the timer, the watch flag is cleared and execution of the program resumes.
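A C sketch of the arm/quiesce sequence. The event_monitor_t fields, read_timer, and the busy-wait model of quiescing are illustrative stand-ins for the hardware mechanism, which would halt instruction issue rather than loop.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical per-thread monitor state. */
    typedef struct {
        volatile uint64_t *watched_addr;  /* identified memory location */
        uint64_t           armed_value;   /* value observed when armed */
        bool               watch_flag;    /* set by arm, cleared on event or timeout */
    } event_monitor_t;

    extern uint64_t read_timer(void);

    /* Arm instruction: identify the location to monitor and set the watch flag. */
    void arm(event_monitor_t *m, volatile uint64_t *addr)
    {
        m->watched_addr = addr;
        m->armed_value  = *addr;
        m->watch_flag   = true;
    }

    /* Quiesce request instruction: if the watch flag is set, halt this thread
     * (modeled as a loop) until the monitored location changes or the timer
     * expires, then clear the flag and resume. */
    void quiesce(event_monitor_t *m, uint64_t timeout_ticks)
    {
        if (!m->watch_flag)
            return;                        /* nothing armed: fall through */
        uint64_t deadline = read_timer() + timeout_ticks;
        while (*m->watched_addr == m->armed_value && read_timer() < deadline)
            ;                              /* thread is quiesced */
        m->watch_flag = false;             /* event observed or timer expired */
    }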
Abstract:
A hierarchical cache memory includes a high-speed primary cache memory and a lower speed secondary cache memory of greater storage capacity than the primary cache memory. To manage a huge number of data lines interconnecting the primary and secondary cache memories, the hierarchical cache memory is integrated on a plurality of integrated circuits which include all of the interconnecting data lines. Each integrated circuit includes a primary memory and a secondary memory for storing and retrieving data transferred over a first data input line and a first data output line that link the primary memory to a central processing unit. At any given time, a multi-bit word is addressed in the secondary memory, and a corresponding multi-bit word is addressed in the primary memory. The primary and secondary memories are interconnected by a first multi-line bus for transferring a multi-bit word read from the secondary memory to the primary memory, and by a second multi-line bus for transferring a multi-bit word read from the primary memory to the secondary memory. The secondary memory is linked to a main memory by a second data output line and a second data input line for sequential transmission of bits to exchange multi-bit words during a writeback and refill operation. In a preferred embodiment, data inputs of the primary memory and the secondary memory are wired in parallel to a serial-parallel shift register that is used as a common write buffer.
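A behavioral C model, under an assumed 64-bit word width, of the two data movements described: the word exchange between the primary and secondary memories over the two multi-line buses, and the bit-serial writeback/refill exchange with main memory. The types and function names are illustrative only.

    #include <stdint.h>

    /* One integrated-circuit slice: the word currently addressed in the primary
     * memory and the corresponding word addressed in the secondary memory. */
    typedef struct {
        uint64_t primary_word;
        uint64_t secondary_word;
    } cache_slice_t;

    /* Words move in both directions at once: the secondary word refills the
     * primary memory over the first multi-line bus while the displaced primary
     * word returns to the secondary memory over the second multi-line bus. */
    void exchange_primary_secondary(cache_slice_t *s)
    {
        uint64_t displaced = s->primary_word;
        s->primary_word    = s->secondary_word;
        s->secondary_word  = displaced;
    }

    /* Writeback-and-refill with main memory is bit-serial: one bit is driven on
     * the second data output line and one bit sampled from the second data input
     * line per step, assembling the incoming word as in a shift register. */
    uint64_t serial_exchange(uint64_t outgoing, const uint8_t incoming_bit[64],
                             uint8_t outgoing_bit[64])
    {
        uint64_t incoming = 0;
        for (int i = 0; i < 64; i++) {
            outgoing_bit[i] = (uint8_t)((outgoing >> i) & 1u);  /* bit onto output line */
            incoming |= (uint64_t)(incoming_bit[i] & 1u) << i;  /* bit from input line */
        }
        return incoming;
    }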
Abstract:
A multithreaded architecture is disclosed for managing external memory updates for fault detection in redundant multithreading systems using speculative memory support. In particular, a method provides input replication of load values on an SRT processor by using speculative memory support to isolate redundant threads from external updates. This method thus avoids the need for dedicated structures to provide input replication.
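A behavioral C model of the effect described: both redundant threads observe identical load values because loads within an epoch are isolated from external updates. The spec_buffer_t here only stands in for the processor's existing speculative memory support; the point of the abstract is that no new dedicated structure is required, so this is an illustration of the observable behavior, not of the hardware.

    #include <stdint.h>

    #define SPEC_BUF_ENTRIES 64

    /* Minimal model of speculative memory support: loads inside the sphere of
     * replication are serviced from values captured at first access, which
     * external writers cannot modify until the epoch commits. */
    typedef struct {
        uint64_t addr[SPEC_BUF_ENTRIES];
        uint64_t data[SPEC_BUF_ENTRIES];
        int      count;
    } spec_buffer_t;

    extern uint64_t read_main_memory(uint64_t addr);

    /* A load by either redundant thread: the first access to an address captures
     * its value; every later redundant access returns that same captured value,
     * so both threads see identical inputs even if memory is updated externally. */
    uint64_t replicated_load(spec_buffer_t *b, uint64_t addr)
    {
        for (int i = 0; i < b->count; i++)
            if (b->addr[i] == addr)
                return b->data[i];            /* input replication: reuse captured value */
        uint64_t v = read_main_memory(addr);  /* first access in this epoch */
        if (b->count < SPEC_BUF_ENTRIES) {
            b->addr[b->count] = addr;
            b->data[b->count] = v;
            b->count++;
        }
        return v;
    }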
Abstract:
A multithreaded architecture having one or more checker circuits that operate on store operations that send data outside of a sphere of replication. Fault detection mechanisms used to check outputs from the sphere of replication are reused for checkpointing at the conclusion of an execution epoch.
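A C sketch of the checker behavior: outgoing stores from the two redundant threads are compared before data leaves the sphere of replication, and the same comparison is reused at the end of an execution epoch to validate a checkpoint. The store_op_t layout and the word-by-word state comparison are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint64_t addr;
        uint64_t data;
    } store_op_t;

    /* Checker circuit: a store may leave the sphere of replication only if both
     * redundant threads produced identical address and data; a mismatch signals
     * a detected fault. */
    bool check_store(const store_op_t *leading, const store_op_t *trailing)
    {
        return leading->addr == trailing->addr && leading->data == trailing->data;
    }

    /* At the conclusion of an execution epoch the same comparator is reused for
     * checkpointing: the architectural state of both threads is streamed through
     * the checker as if it were outgoing store data. */
    bool check_epoch_checkpoint(const uint64_t *state_a, const uint64_t *state_b, int words)
    {
        for (int i = 0; i < words; i++) {
            store_op_t a = { (uint64_t)i, state_a[i] };
            store_op_t b = { (uint64_t)i, state_b[i] };
            if (!check_store(&a, &b))
                return false;  /* fault detected before the checkpoint is committed */
        }
        return true;           /* checkpoint is safe to commit */
    }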