摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the data storage and the execution resources, that supplies instructions within the data storage to the execution resources. The execution resources include a plurality of load-store units that each process only instructions that access data having associated addresses within a respective one of a plurality of subsets of an address space. The load-store units can have diverse hardware such that a maximum number of instructions that can be concurrently executed is different for different load-store units or such that some of the load-store units are restricted to executing certain classes of instructions.
摘要:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, have diverse cache hardware and each preferably store only data having associated addresses within a respective one of a plurality of subsets of an address space. The diverse cache hardware can include, for example, differing cache sizes, differing associativities, differing sectoring, and differing inclusivities.
摘要:
A processor having a hashed and partitioned storage subsystem includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a cache subsystem including a plurality of caches that store data utilized by the execution unit. Each cache among the plurality of caches stores only data having associated addresses within a respective one of a plurality of subsets of an address space. In one preferred embodiment, the execution units of the processor include a number of load-store units (LSUs) that each process only instructions that access data having associated addresses within a respective one of the plurality of address subsets. The processor may further be incorporated within a data processing system having a number of interconnects and a number of sets of system memory hardware that each have affinity to a respective one of the plurality of address subsets.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the execution resources and the data storage, that supplies instructions within the data storage to the execution resources. At least one of the execution resources, the data storage, and the instruction sequencing unit is implemented with a plurality of hardware partitions of like function for processing a respective one of a plurality of data streams. If an error is detected in a particular hardware partition, the data stream assigned to that hardware partition is reassigned to another of the plurality of hardware partitions, thus preventing an error in one of the hardware partitions from resulting in a catastrophic failure.
摘要:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, each store only data having associated addresses within a respective one of a plurality of subsets of an address space and implement diverse caching behaviors. The diverse caching behaviors can include differing memory update policies, differing coherence protocols, differing prefetch behaviors, and differing cache line replacement policies.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the execution resources and the data storage, that supplies instructions within the data storage to the execution resources. At least one of the execution resources, the data storage, and the instruction sequencing unit is implemented with a plurality of hardware partitions of like function for processing data. The data processed by each hardware partition is assigned according to a selectable hash of addresses associated as with the data. In a preferred embodiment, the selectable hash can be altered dynamically during the operation of the processor, for example, in response to detection of an error or a load imbalance between the hardware partitions.
摘要:
A processor having a hashed and partitioned register file includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of registers coupled to the execution unit. The plurality of registers are partitioned into a plurality of groups, such that registers within each group can store only data having associated addresses within a respective one of a plurality of subsets of an address space.
摘要:
A method of operating a multi-level memory hierarchy of a computer system and apparatus embodying the method, wherein instructions issue having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. Further branch predictions or stream associations that were made based on an earlier speculative choice are linked by using a tag pool which assigns a bit fields in the tag pool entries to the level of speculation depth. Each entry shares in common the bit field values associated with earlier branches or stream associations. When a branch or stream predicted entry is no longer needed, that entry can be cancelled and all entries that were to be loaded dependent on that entry can likewise be cancelled by walking through all entries sharing the bit fields corresponding to the speculation depth of the cancelled entry and tagging those entries as invalid.
摘要:
A method of operating a multi-level memory hierarchy of a computer system and apparatus embodying the method, wherein instructions issue having an explicit prefetch request directly from an instruction sequence unit to a prefetch unit of the processing unit. The invention applies to values that are either operand data or instructions. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. These prefetch requests can be demand load requests, where the processing unit will need the operand data or instructions, or speculative load requests, where the processing unit may or may not need the operand data or instructions, but a branch prediction or stream association predicts that they might be needed. After a predetermined number of cycles has elapsed, the speculative load request is cancelled if the request has not already been completed.
摘要:
A method of maintaining cache coherency, by designating one cache that owns a line as a highest point of coherency (HPC) for a particular memory block, and sending a snoop response from the cache indicating that it is currently the HPC for the memory block and can service a request. The designation may be performed in response to a particular coherency state assigned to the cache line, or based on the setting of a coherency token bit for the cache line. The processing units may be grouped into clusters, while the memory is distributed using memory arrays associated with respective clusters. One memory array is designated as the lowest point of coherency (LPC) for the memory block (i.e., a fixed assignment) while the cache designated as the HPC is dynamic (i.e., changes as different caches gain ownership of the line). An acknowledgement snoop response is sent from the LPC memory array, and a combined response is returned to the requesting device which gives priority to the HPC snoop response over the LPC snoop response.