摘要:
A Fully Interconnected System Architecture (FISA) for an improved data processing system. The data processing system topology has a processor chip and external components to the processor chip, such as memory and input/output (I/O) and other processor chips. The processor chip is interconnected to the external components via a point-to-point bus topology controlled by an intra-chip integrated, distributed switch (IDS) controller. The IDS controller provides the chip with the functionality to provide a single bus to each external component and provides an overall total bandwidth greater than traditional topologies while reducing latencies between the processor and the external components. The design of the processor chip with the intra-chip IDS controller provides a pseudo “distributed switch” which may separately access distributed external components, such as memory and I/Os, etc.
摘要:
A data processing system having a modified processor chip and external components to the processor chip. The processor chip is interconnected to the external components via point-to-point bus connections controlled by an integrated distributed switch (IDS) controller. The IDS controller is placed, during chip design, in the upper layer metals of the processor chip. When the data processing system is a multi-chip multiprocessor data processing system, the IDS controller operates to provide a pseudo switching effect whereby the processor is directly connected to each external component. The IDS controller permits the processor to have greater communication bandwidth and reduced latencies with the external components. It also allows for a connection to distributed external components such as memory and I/O, etc. with overall reduced system components.
摘要:
A processor having a hashed and partitioned storage subsystem includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a cache subsystem including a plurality of caches that store data utilized by the execution unit. Each cache among the plurality of caches stores only data having associated addresses within a respective one of a plurality of subsets of an address space. In one preferred embodiment, the execution units of the processor include a number of load-store units (LSUs) that each process only instructions that access data having associated addresses within a respective one of the plurality of address subsets. The processor may further be incorporated within a data processing system having a number of interconnects and a number of sets of system memory hardware that each have affinity to a respective one of the plurality of address subsets.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the execution resources and the data storage, that supplies instructions within the data storage to the execution resources. At least one of the execution resources, the data storage, and the instruction sequencing unit is implemented with a plurality of hardware partitions of like function for processing a respective one of a plurality of data streams. If an error is detected in a particular hardware partition, the data stream assigned to that hardware partition is reassigned to another of the plurality of hardware partitions, thus preventing an error in one of the hardware partitions from resulting in a catastrophic failure.
摘要:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, each store only data having associated addresses within a respective one of a plurality of subsets of an address space and implement diverse caching behaviors. The diverse caching behaviors can include differing memory update policies, differing coherence protocols, differing prefetch behaviors, and differing cache line replacement policies.
摘要:
A method of providing programmable congruence classes in a cache used by a processor of a computer system is disclosed. Program instructions are loaded in the processor for modifying original addresses of memory blocks in a memory device to produce encoded addresses. A plurality of cache congruence classes is then defined using a mapping function which operates on the encoded addresses, such that the program instructions may be used to arbitrarily assign a given one of the original addresses to a particular one of the cache congruence classes. The program instructions can modify the original addresses by setting a plurality of programmable fields. Application software may provide the program instructions, wherein congruence classes are programmed based on a particular procedure of the application software which is running on the processor, that might otherwise run with excessive "striding" of the cache. Alternatively, operating-system software may monitor allocation of memory blocks in the cache and provides the program instructions to modify the original addresses based on the allocation of the memory blocks, to lessen striding.
摘要:
A method of providing programmable congruence classes in a cache used by a processor of a computer system is disclosed. A logic unit is connected to the cache for modifying original addresses of memory blocks in a memory device to produce encoded addresses. A plurality of cache congruence classes are then defined using a mapping function which operates on the encoded addresses, such that the logic unit may be used to arbitrarily assign a given one of the original addresses to a particular one of the cache congruence classes. The logic unit can modify the original addresses by setting a plurality of programmable fields. The logic unit also can collect information on cache misses, and modify the original addresses in response to the cache miss information. In this manner, a procedure running on the processor and allocating memory blocks to the cache such that the original addresses, if applied to the mapping function, would result in striding of the cache, runs more efficiently by using the encoded addresses to result in less striding of the cache.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the data storage and the execution resources, that supplies instructions within the data storage to the execution resources. The execution resources include a plurality of load-store units that each process only instructions that access data having associated addresses within a respective one of a plurality of subsets of an address space. The load-store units can have diverse hardware such that a maximum number of instructions that can be concurrently executed is different for different load-store units or such that some of the load-store units are restricted to executing certain classes of instructions.
摘要:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, have diverse cache hardware and each preferably store only data having associated addresses within a respective one of a plurality of subsets of an address space. The diverse cache hardware can include, for example, differing cache sizes, differing associativities, differing sectoring, and differing inclusivities.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the execution resources and the data storage, that supplies instructions within the data storage to the execution resources. At least one of the execution resources, the data storage, and the instruction sequencing unit is implemented with a plurality of hardware partitions of like function for processing data. The data processed by each hardware partition is assigned according to a selectable hash of addresses associated as with the data. In a preferred embodiment, the selectable hash can be altered dynamically during the operation of the processor, for example, in response to detection of an error or a load imbalance between the hardware partitions.