Abstract:
The inventive cache uses a queuing structure that provides out-of-order cache memory access support for multiple accesses, as well as support for managing bank conflicts and address conflicts. The inventive cache can support four data accesses that hit per clock, one access that misses the L1 cache every clock, and one instruction access every clock. The responses are interspersed in the pipeline, so that conflicts in the queue are minimized. Non-conflicting accesses are not inhibited; conflicting accesses, however, are held up until the conflict clears. The inventive cache provides out-of-order support after the retirement stage of a pipeline.
Abstract:
The inventive cache manages address conflicts and maintains program order without using a store buffer. The cache utilizes an issue algorithm to ensure that accesses issued in the same clock are actually issued in an order that is consistent with program order. This is enabled by performing address comparisons prior to insertion of the accesses into the queue. Additionally, when accesses are separated by one or more clocks, address comparisons are performed, and any access that would read data from the cache memory array before a prior update has actually been written to the array is canceled. This guarantees that program order is maintained, as an access is not allowed to complete until it is assured that the most recent data will be received upon access of the array.
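The cancellation rule described above can be illustrated with a minimal sketch. The function name, the tuple layout of the pending updates, and the clock-based timing model are assumptions made for illustration only; the abstract does not specify how the comparison is implemented.

```python
def must_cancel(load_addr, load_clock, pending_updates):
    """Return True if a read must be canceled to preserve program order.

    pending_updates is a hypothetical list of (address, done_clock) pairs for
    earlier updates, where done_clock is the clock at which the update is
    assumed to be written into the cache memory array.
    """
    return any(
        addr == load_addr and done_clock > load_clock
        for addr, done_clock in pending_updates
    )


# A read of 0x40 at clock 5 would see stale data if an earlier update to 0x40
# only reaches the array at clock 7, so it is canceled.
stale = must_cancel(0x40, 5, [(0x40, 7)])
# Once the update has reached the array (clock 4), the read may proceed.
fresh = must_cancel(0x40, 5, [(0x40, 4)])
```

In this model a canceled access would simply be retried later, which matches the stated goal that an access completes only when the most recent data is assured.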
Abstract:
The inventive control logic provides the selection signals for a bi-endian rotator MUX. The logic determines the starting point for the data transfer by determining which input register byte is going to Byte 0 of the output register. The control logic passes the starting point to a single decoder. The decoded value is then sent to a plurality of MUXs, one for each of the output register bytes. Each of the MUXs is prewired to receive a portion of the bits of the decoded value, and the portion is arranged in a particular order. The MUXs then send their respective outputs to the rotator MUX as selection control signals.
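The idea of decoding the starting point once and fanning out prewired orderings of the decoded bits can be sketched as follows. This is a simplified software model, not the patented circuit: the 8-byte register width, the one-hot decoding, the rotation direction, and all function names are assumptions, and the endian-dependent ordering is not modeled.

```python
N = 8  # assumed register width in bytes


def decode(start):
    """One-hot decode of the starting point (the input byte that goes to Byte 0)."""
    return [1 if i == start else 0 for i in range(N)]


def select_lines(decoded, k):
    """Select signals for output byte k.

    Models the prewiring: output byte k sees the same decoded bits,
    rotated by k positions, so no per-byte decoder is needed.
    """
    return [decoded[(j - k) % N] for j in range(N)]


def rotate(input_bytes, start):
    """Drive each output byte from the input byte its select lines pick."""
    decoded = decode(start)
    return [input_bytes[select_lines(decoded, k).index(1)] for k in range(N)]


rotated = rotate(list(range(N)), 3)
```

With a starting point of 3, output Byte 0 takes input byte 3, output Byte 1 takes input byte 4, and so on, wrapping around at the end of the register.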
Abstract:
In order to multiply operands of different binary lengths using a common combined array, for example to do both 8 bit by 8 bit and 16 bit by 16 bit multiplications, 2^(m-1) multiplications are performed, where m is equal to the number of different bit lengths it is desired to multiply. For example, where 8 × 8 bit and 16 × 16 bit multiplications are done, 2 different multiplications are done. Each multiplication is an n × n/2^(m-1) multiplication, e.g., a 16 × 8 bit multiplication. Sign correction is performed by adding a correction vector or by modifying one of the partial products. The results of the multiplications are added together to obtain a 2n-bit result. Groups of bits from said 2n-bit result are selected depending on the length of the operands being multiplied.
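The decomposition for n = 16 and m = 2 can be checked with a short arithmetic sketch. It covers unsigned operands only, so the sign-correction step is not modeled, and the bit selection shown for 8-bit operands (zero-extended inputs, low 16 bits of the result) is an assumed simplification rather than the selection scheme of the invention. The function names are hypothetical.

```python
def mul16x8(a, b8):
    """Unsigned 16 x 8 bit multiply, standing in for one n x n/2 multiplication."""
    assert 0 <= a < (1 << 16) and 0 <= b8 < (1 << 8)
    return a * b8


def mul_combined(a, b, width=16):
    """Multiply via two 16 x 8 multiplications whose results are added.

    width selects which group of bits of the 32-bit (2n-bit) result is returned.
    """
    b_lo = b & 0xFF          # low 8 bits of the second operand
    b_hi = (b >> 8) & 0xFF   # high 8 bits of the second operand
    full = (mul16x8(a, b_lo) + (mul16x8(a, b_hi) << 8)) & 0xFFFFFFFF
    if width == 16:
        return full           # all 32 bits for 16 x 16 operands
    if width == 8:
        return full & 0xFFFF  # low 16 bits for zero-extended 8 x 8 operands
    raise ValueError("width must be 8 or 16")


wide = mul_combined(0xFFFF, 0xFFFF)        # 16 x 16 case
narrow = mul_combined(200, 100, width=8)   # 8 x 8 case
```

The identity behind the sketch is a × b = a × b_lo + (a × b_hi) × 2^8, which is why two 16 × 8 products suffice for a 16 × 16 result.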