Abstract:
A data processing system (20) is able to perform parameter-selectable prefetch instructions to prefetch data for a cache (38). When the system attempts to remain backward compatible with previously written code, executing such an instruction can sometimes prefetch redundant data, i.e., the same data twice. To prevent this, the parameters of the instruction are analyzed to determine whether redundant data would be prefetched; if so, the parameters are altered to avoid the redundancy. For some combinations of instruction parameters, altering the parameters would require significant circuitry, so an alternative approach is used instead. This alternative but slower approach, which can coexist with the first approach in the same system, detects whether the cache line currently being requested is the same as that of the previous request. If so, the current request is not executed.
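A minimal C sketch of the slower, line-comparison approach described above: it suppresses any prefetch request whose cache line matches the previous request. The line size, the stride/count parameters, and the printf stand-in for an actual prefetch are illustrative assumptions, not details from the patent.

/* Sketch: suppress redundant prefetches to the same cache line. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 32u  /* assumed cache line size in bytes */

/* Issue one prefetch per distinct cache line; skip a request whose
   line matches the previously requested line (the slower check). */
static void prefetch_stream(uintptr_t start, uint32_t stride, uint32_t count)
{
    uintptr_t prev_line = (uintptr_t)-1;
    for (uint32_t i = 0; i < count; i++) {
        uintptr_t line = (start + (uintptr_t)i * stride) & ~(uintptr_t)(LINE_SIZE - 1);
        if (line == prev_line)
            continue;           /* same line as previous request: redundant */
        printf("prefetch line 0x%lx\n", (unsigned long)line);
        prev_line = line;
    }
}

int main(void)
{
    /* a stride smaller than the line size would otherwise touch
       the same line more than once */
    prefetch_stream(0x1000, 8, 8);
    return 0;
}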
Abstract:
Data stream touch instructions are software-directed asynchronous prefetch instructions that can improve the performance of a system. Ideally, such instructions are used in perfect synchronization with the actual memory fetches they are trying to speed up. In practice, it is difficult to predict ahead of time all side effects of these instructions and the memory access latency/throughput during execution of any large program. Incorrect usage of such instructions can degrade the performance of the system. Thus, it is advantageous to measure the performance of such instructions.
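As a rough illustration of measuring prefetch effectiveness, the following C sketch counts issued prefetches, prefetched lines that a later demand load actually used, and demand misses. The tiny direct-mapped model and the counter names are assumptions for illustration only.

/* Sketch: counters for how often software prefetches actually help. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINES 64u

static uintptr_t tag[LINES];
static bool      prefetched[LINES];  /* line was brought in by a prefetch */
static unsigned  n_prefetch, n_useful, n_miss;

static unsigned idx(uintptr_t line) { return (unsigned)(line % LINES); }

static void prefetch(uintptr_t line)
{
    n_prefetch++;
    tag[idx(line)] = line;
    prefetched[idx(line)] = true;
}

static void demand_load(uintptr_t line)
{
    unsigned i = idx(line);
    if (tag[i] == line) {
        if (prefetched[i]) { n_useful++; prefetched[i] = false; }
    } else {
        n_miss++;               /* prefetch was absent or too late */
        tag[i] = line;
        prefetched[i] = false;
    }
}

int main(void)
{
    for (uintptr_t l = 1; l <= 8; l++) prefetch(l);
    for (uintptr_t l = 1; l <= 8; l++) demand_load(l);
    demand_load(100);           /* unprefetched access: a demand miss */
    printf("issued=%u useful=%u demand-misses=%u\n",
           n_prefetch, n_useful, n_miss);
    return 0;
}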
Abstract:
A cache memory system including a cache memory suitable for coupling to a load/store unit of a CPU and a buffer unit comprising a plurality of entries, each including a data buffer and a corresponding address tag. The system is configured to initiate a data fetch transaction in response to a first store operation that misses in both the cache memory and the buffer unit, to allocate a first entry in the buffer unit, and to write the first store operation's data into the first entry's data buffer. The system is adapted to write data from at least one subsequent store operation into the first entry's data buffer if the subsequent store operation misses in the cache but hits in the first entry of the buffer unit prior to completion of the data fetch transaction. In this manner, the first entry's data buffer contains a composite of the first and subsequent store operations' data. Preferably, the cache system is further configured to merge, upon completion of the data fetch, the fetched data with the store operation data in the first entry's data buffer and to reload the cache memory from the first entry's data buffer. In the preferred embodiment, each buffer unit entry further includes data valid bits that indicate the validity of corresponding portions of the entry's data buffer. In this embodiment, the buffer unit is preferably configured to reload the cache memory from the first buffer unit entry if all of the first entry's data valid bits are set prior to completion of the data fetch transaction, thereby effecting a “silent” reload of the cache memory in which no data is ultimately required from memory.
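The following C sketch models one such buffer entry under assumed parameters (a 32-byte line, one valid bit per byte, invented field names): stores merge into the entry, fetched data fills only the bytes the stores did not write, and a fully written entry permits the “silent” reload.

/* Sketch: a store-miss merge buffer entry with byte-valid bits. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define LINE 32u

struct miss_entry {
    uintptr_t line_addr;
    uint8_t   data[LINE];
    uint32_t  valid;          /* one bit per byte of the line */
};

/* Merge a store's bytes into the entry; later stores that hit the
   entry land in the same buffer before the fetch completes. */
static void merge_store(struct miss_entry *e, unsigned off,
                        const uint8_t *src, unsigned len)
{
    for (unsigned i = 0; i < len; i++) {
        e->data[off + i] = src[i];
        e->valid |= 1u << (off + i);
    }
}

/* When the fetch returns, fill only the bytes the stores did not
   already write, yielding a composite line ready to reload the cache. */
static void fetch_complete(struct miss_entry *e, const uint8_t *mem)
{
    for (unsigned i = 0; i < LINE; i++)
        if (!(e->valid & (1u << i)))
            e->data[i] = mem[i];
}

/* All bytes written by stores: the line can be reloaded "silently",
   without needing any data from memory. */
static bool silent_reload_possible(const struct miss_entry *e)
{
    return e->valid == 0xFFFFFFFFu;
}

int main(void)
{
    struct miss_entry e = { .line_addr = 0x1000 };
    uint8_t mem[LINE]; memset(mem, 0xAA, LINE);
    merge_store(&e, 0, (const uint8_t *)"abcd", 4);   /* first store  */
    merge_store(&e, 4, (const uint8_t *)"efgh", 4);   /* merged store */
    fetch_complete(&e, mem);
    printf("silent=%d byte0=%c byte8=0x%02X\n",
           silent_reload_possible(&e), e.data[0], e.data[8]);
    return 0;
}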
Abstract:
A computer and its corresponding cache system include a cache memory, a buffer unit, and a bus transaction queue. The buffer unit includes a plurality of entries suitable for temporarily storing data, address, and attribute information of operations generated by the CPU. A first operation initiated by the load/store unit is buffered in a first entry of the buffer unit, which initiates a first transaction to be queued in a first entry of the bus transaction queue, where the first transaction in the bus transaction queue points to the first entry in the buffer unit. Preferably, the buffer unit is configured to modify the first transaction from a first transaction type to a second transaction type prior to execution in response to an event that alters the data requirements of the queued transaction. Additional utility is achieved by merging multiple store operations that miss to a common cache line into a single entry. Further benefit is achieved by allowing multiple load misses to the same cache line to be completed from a buffer, which reduces cache pipeline stalls.
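A hedged C sketch of the transaction-retyping idea: a queued transaction points at a buffer entry and is downgraded before execution once merged stores have made the data fetch unnecessary. The transaction type names (LINE_FILL, ADDRESS_ONLY) are illustrative, not the patent's.

/* Sketch: a queued bus transaction retyped before it executes. */
#include <stdio.h>

enum txn_type { LINE_FILL, ADDRESS_ONLY };

struct buf_entry { unsigned all_bytes_valid; };

struct bus_txn {
    enum txn_type type;
    struct buf_entry *entry;   /* points to the buffer unit entry */
};

/* If merged stores have filled the whole line before the bus grant,
   the data fetch is no longer needed: downgrade the transaction. */
static void retype_if_needed(struct bus_txn *t)
{
    if (t->type == LINE_FILL && t->entry->all_bytes_valid)
        t->type = ADDRESS_ONLY;   /* claim ownership, skip the data */
}

int main(void)
{
    struct buf_entry e = { 0 };
    struct bus_txn t = { LINE_FILL, &e };
    e.all_bytes_valid = 1;        /* event: stores covered the line */
    retype_if_needed(&t);
    printf("type=%s\n", t.type == ADDRESS_ONLY ? "ADDRESS_ONLY" : "LINE_FILL");
    return 0;
}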
Abstract:
A data processing system (10) provides a mechanism for choosing when the data stream touch (DST) controller (300) is allowed access to the data cache and MMU (50). The mechanism uses a count value to determine at what point in program execution the DST controller (300) is allowed to interrupt normal load and store accesses. This allows DST prefetches to be optimized for maximum performance of the data processing system (10).
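A minimal C sketch of count-based arbitration of this kind, assuming an invented reload value (dst_interval): normal load/store accesses proceed until the count expires, at which point a pending DST access is granted the slot.

/* Sketch: count value gating DST access to the cache port. */
#include <stdbool.h>
#include <stdio.h>

static unsigned dst_interval = 4;  /* let a DST access in every N slots */
static unsigned count;

/* Called once per cache access slot; returns true when the DST
   controller, rather than the load/store unit, gets the slot. */
static bool grant_slot_to_dst(bool dst_pending)
{
    if (!dst_pending)
        return false;
    if (++count >= dst_interval) {
        count = 0;
        return true;           /* interrupt normal accesses this slot */
    }
    return false;
}

int main(void)
{
    for (int slot = 0; slot < 10; slot++)
        printf("slot %d -> %s\n", slot,
               grant_slot_to_dst(true) ? "DST" : "load/store");
    return 0;
}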
Abstract:
A cache memory system including a cache memory configured for coupling to a load/store unit of a CPU, a buffer unit coupled to said cache memory, and an operation queue comprising a plurality of entries, wherein each valid operation queue entry points to an entry in the buffer unit. The buffer unit includes a plurality of data buffers, and each of the data buffers is associated with a corresponding address tag. The system is configured to initiate a data fetch transaction and allocate an entry in the buffer unit in response to a CPU load operation that misses in both the cache memory and the buffer unit. The cache system is further configured to allocate entries in the operation queue in response to subsequent CPU load operations that miss in the cache memory but hit in the buffer unit prior to completion of the data fetch. Preferably, the system is configured to store the fetched data in the buffer unit entry upon satisfaction of said data fetch and still further configured to satisfy pending load operations in the operation queue from the buffer unit entry. In the preferred embodiment, the system is configured to reload the cache memory from the buffer unit entry upon satisfying all operation queue entries pointing to the buffer unit entry and, thereafter, to invalidate the buffer unit entry and the operation queue entries. The buffer unit entries preferably each include data valid bits indicative of which portions of data stored in a buffer unit entry are valid.
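The C sketch below models the pointer relationship under assumed sizes and names: a load that misses the cache but hits the pending buffer entry is queued against that entry, and fetch completion satisfies the queued loads before the entry is freed.

/* Sketch: operation queue entries pointing at a miss buffer entry. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OPQ 4

struct buf_entry { uintptr_t line; bool fetch_done; bool valid; };

struct op { bool valid; struct buf_entry *src; unsigned offset; };

static struct buf_entry buf;
static struct op opq[OPQ];

/* A load that misses the cache but hits the pending buffer entry is
   queued against it instead of launching another fetch. */
static bool queue_load(uintptr_t line, unsigned off)
{
    if (!buf.valid || buf.line != line)
        return false;                   /* would need a new fetch */
    for (unsigned i = 0; i < OPQ; i++)
        if (!opq[i].valid) {
            opq[i] = (struct op){ true, &buf, off };
            return true;
        }
    return false;                       /* queue full */
}

/* Fetch completion: satisfy every queued load from the buffer entry,
   then the entry can reload the cache and be invalidated. */
static void fetch_complete(void)
{
    buf.fetch_done = true;
    for (unsigned i = 0; i < OPQ; i++)
        if (opq[i].valid && opq[i].src == &buf) {
            printf("satisfy load at offset %u from buffer\n", opq[i].offset);
            opq[i].valid = false;
        }
    buf.valid = false;                  /* reload cache, free the entry */
}

int main(void)
{
    buf = (struct buf_entry){ 0x2000, false, true }; /* first load miss */
    queue_load(0x2000, 0);
    queue_load(0x2000, 8);
    fetch_complete();
    return 0;
}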
Abstract:
A method of transferring data between a slave device (20) and a processor interface bus (34) that is in communication with a master device (12), including receiving an address from the processor interface bus (34), where the address was provided by the master device (block 302). A first signal is asserted (blocks 318 and 324) on the processor interface bus (34) to indicate that the slave device (20) is servicing a data transfer transaction. A second signal is asserted (block 320) on the processor interface bus (34) to indicate whether data to be transferred using the processor interface bus (34) is to be stored in main memory (36) by a main memory controller (32) in communication with the processor interface bus (34). The data is transferred (block 326) between the slave device (20) and the processor interface bus (34).
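A hedged C sketch of the two signals the abstract describes; the signal names (svc, mem_wr) and the slave's address window are invented for illustration.

/* Sketch: slave decode and the two handshake signals. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct pib_signals {
    bool svc;      /* slave is servicing this data transfer   */
    bool mem_wr;   /* memory controller should store the data */
};

/* Slave decode: claim the transaction and say whether main memory
   must also capture the transferred data. */
static struct pib_signals slave_respond(uintptr_t addr, bool update_memory)
{
    struct pib_signals s = { false, false };
    if (addr >= 0x8000 && addr < 0x9000) {   /* assumed slave window */
        s.svc = true;                        /* first signal  */
        s.mem_wr = update_memory;            /* second signal */
    }
    return s;
}

int main(void)
{
    struct pib_signals s = slave_respond(0x8010, true);
    printf("svc=%d mem_wr=%d -> %s\n", s.svc, s.mem_wr,
           s.svc ? "slave transfers data on the bus" : "not ours");
    return 0;
}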
Abstract:
A data processing system (10) includes a mechanism for preventing DST line fetches from occupying the last available entries in a cache miss queue (50) of the data cache and MMU (16). This is done by setting a threshold on the number of available cache miss queue (50) buffers; when the number of available buffers falls to the threshold or below, a DST access is not allowed. This keeps the cache miss queue (50) from filling up, which would prevent normal load and store accesses from using the cache miss queue (50).
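A minimal C sketch of the threshold check, with an assumed queue depth and threshold value: a DST line fetch may allocate a miss-queue entry only while enough entries remain free for normal accesses.

/* Sketch: reserve miss-queue entries for demand misses. */
#include <stdbool.h>
#include <stdio.h>

#define MISSQ_DEPTH 8u
static unsigned used_entries;
static unsigned dst_threshold = 2;   /* entries reserved for demand misses */

/* A DST line fetch may allocate only if it would leave at least
   dst_threshold entries free for normal loads and stores. */
static bool dst_may_allocate(void)
{
    unsigned free_entries = MISSQ_DEPTH - used_entries;
    return free_entries > dst_threshold;
}

int main(void)
{
    used_entries = 5;
    printf("free=3 -> DST %s\n", dst_may_allocate() ? "allowed" : "blocked");
    used_entries = 6;
    printf("free=2 -> DST %s\n", dst_may_allocate() ? "allowed" : "blocked");
    return 0;
}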
Abstract:
A bus protocol for a split bus (50, 60) in which each device (10, 20, 30) coupled to the bus has an age-based queue (12, 24, 34) of pending transactions. Queues are updated as transactions are executed. A central arbiter (40) has a copy of each device's queue (44). A priority transaction is determined from among all the queues in the arbiter. A data transaction index (DTI) is broadcast to all devices during the data tenure, indicating the position in the queue of the next transaction. The index allows out-of-order data transfers without the provision of a static tag during the address tenure. Queues maintain a history of pending transactions. In one embodiment, each device receives a separate data bus grant (DBG), allowing a single provision of the index to serve both a source and a sink device.
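The following C sketch illustrates the DTI idea under an assumed queue model: the arbiter picks a pending transaction from its copy of a device's age-ordered queue and broadcasts its queue position, which every device resolves against its own copy.

/* Sketch: broadcasting a queue position instead of a static tag. */
#include <stdio.h>

#define QDEPTH 4

struct txn { int valid; int id; };

/* Arbiter's copy of one device's age-based queue: index 0 is oldest. */
static struct txn queue[QDEPTH] = {
    { 1, 7 }, { 1, 3 }, { 1, 9 }, { 0, 0 }
};

/* Pick the queue position of the transaction whose data tenure runs
   next; broadcasting the position, not a static tag assigned during
   the address tenure, lets data transfers complete out of order. */
static int broadcast_dti(void)
{
    for (int i = 0; i < QDEPTH; i++)
        if (queue[i].valid)
            return i;          /* simplest policy: oldest pending */
    return -1;
}

/* Each device looks the DTI up in its own copy of the queue. */
int main(void)
{
    int dti = broadcast_dti();
    if (dti < 0)
        return 0;
    printf("DTI=%d selects transaction id %d\n", dti, queue[dti].id);
    queue[dti].valid = 0;      /* queues are updated as data completes */
    return 0;
}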
Abstract:
A data processing system includes a data processor (10) coupled to a memory system having a first memory, such as an L1 data cache (16), arranged with a second memory (such as an L2 cache) at a lower hierarchical level. The data processor (10) prefetches data elements of a vector into the first memory prior to processing such data elements. If a requested data element is not present in the first memory, a load request is issued to the second memory and to lower levels of the memory hierarchy until the requested data element is finally retrieved and stored in the first memory. The data processor (10) continues to prefetch subsequent data elements of the vector by considering the length of the data element and the stride of the vector. In one embodiment, the data processor (10) prefetches the vector into the first memory in response to a single data stream touch load (DST) instruction (100).
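As a rough C sketch of the prefetch walk, assuming a 32-byte line and DST-style (size, count, stride) parameters: each element of the vector is visited at its strided address, and every cache line the element touches is fetched, skipping a line that was just fetched.

/* Sketch: strided vector prefetch considering element length. */
#include <stdint.h>
#include <stdio.h>

#define LINE 32u

static void dst_prefetch(uintptr_t base, unsigned elem_size,
                         unsigned count, int stride)
{
    uintptr_t last = (uintptr_t)-1;
    for (unsigned i = 0; i < count; i++) {
        uintptr_t lo = base + (intptr_t)i * stride;
        uintptr_t hi = lo + elem_size - 1;      /* element may span lines */
        for (uintptr_t a = lo & ~(uintptr_t)(LINE - 1); a <= hi; a += LINE)
            if (a != last) {
                printf("fetch line 0x%lx\n", (unsigned long)a);
                last = a;
            }
    }
}

int main(void)
{
    /* 16-byte elements, stride 48: each element lands on a new line */
    dst_prefetch(0x4000, 16, 4, 48);
    return 0;
}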