Abstract:
A Fully Interconnected System Architecture (FISA) for an improved data processing system. The data processing system topology has a processor chip and components external to the processor chip, such as memory, input/output (I/O), and other processor chips. The processor chip is interconnected to the external components via a point-to-point bus topology controlled by an intra-chip integrated, distributed switch (IDS) controller. The IDS controller gives the chip the ability to dedicate a single bus to each external component and yields an overall bandwidth greater than that of traditional topologies while reducing latencies between the processor and the external components. The design of the processor chip with the intra-chip IDS controller provides a pseudo “distributed switch” which may separately access distributed external components, such as memory and I/O.
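A minimal sketch of this topology, with component names of my own choosing, is shown below; it models the IDS controller as owning one dedicated point-to-point link per external component, so traffic to memory, I/O, and other processor chips never contends for a shared bus.

```python
# Illustrative model only (names and structure assumed, not taken from the patent).

class IDSController:
    """Toy stand-in for the intra-chip integrated, distributed switch (IDS) controller."""

    def __init__(self, external_components):
        # One dedicated point-to-point bus (modeled as a simple queue) per component.
        self.links = {name: [] for name in external_components}

    def send(self, component, payload):
        # Requests to different components travel on different buses, so they
        # can proceed concurrently rather than arbitrating for one shared bus.
        if component not in self.links:
            raise ValueError(f"no point-to-point link to {component!r}")
        self.links[component].append(payload)


ids = IDSController(["memory0", "memory1", "io_hub", "cpu1"])
ids.send("memory0", "read 0x1000")
ids.send("cpu1", "coherence probe")
print({name: len(queue) for name, queue in ids.links.items()})
```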
Abstract:
A data processing system having a modified processor chip and components external to the processor chip. The processor chip is interconnected to the external components via point-to-point bus connections controlled by an integrated distributed switch (IDS) controller. The IDS controller is placed, during chip design, in the upper metal layers of the processor chip. When the data processing system is a multi-chip multiprocessor data processing system, the IDS controller operates to provide a pseudo switching effect whereby the processor is directly connected to each external component. The IDS controller permits the processor to have greater communication bandwidth and reduced latencies with the external components. It also allows for connections to distributed external components, such as memory and I/O, with fewer overall system components.
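As a rough illustration of the pseudo distributed switch in a multi-chip system, the sketch below simply counts the direct links such a topology implies; the per-chip memory and I/O counts are assumptions, not figures from the patent.

```python
# Illustrative arithmetic only; per-chip memory and I/O counts are assumed.
from itertools import combinations

def pseudo_switch_links(n_chips, memories_per_chip=2, io_per_chip=1):
    chip_to_chip = len(list(combinations(range(n_chips), 2)))  # every pair of chips
    chip_to_memory = n_chips * memories_per_chip               # each chip to its memories
    chip_to_io = n_chips * io_per_chip                         # each chip to its I/O
    return chip_to_chip + chip_to_memory + chip_to_io

for n in (2, 4, 8):
    print(n, "chips ->", pseudo_switch_links(n), "point-to-point links")
```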
Abstract:
A method of providing programmable associativity in a cache used by a processor of a computer system is disclosed. A congruence class of a memory block is defined using a first mapping function, providing a first associativity level of the cache. Program instructions in the processor select a second associativity level that is a known appropriate level, and implement the second associativity level in the cache using a second mapping function. Application software may provide the program instructions, wherein the application software has procedures that may result in cache "strides" at particular associativity levels, and the known appropriate level is chosen to lessen memory latencies due to strides. Alternatively, the program instructions may be part of an operating system which monitors memory address requests, determines how efficiently a procedure will operate at different associativity levels, and selects a most efficient level for the known appropriate level. The program instructions may select the associativity level by setting a value in a bit facility corresponding to the desired mapping function.
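A minimal sketch of the bit-facility idea follows; the cache geometry (512 sets of 8 ways, 128-byte blocks) and the particular second mapping function are assumptions chosen only to make the example concrete.

```python
# Illustrative sketch; sizes and the second mapping function are assumed.
BLOCK_OFFSET_BITS = 7    # assumed 128-byte cache blocks
NUM_SETS = 512
WAYS = 8

def congruence_class(addr, assoc_mode_bit):
    """assoc_mode_bit plays the role of the bit facility selecting the mapping function."""
    set_index = (addr >> BLOCK_OFFSET_BITS) % NUM_SETS
    if assoc_mode_bit == 0:
        # First mapping function: the block may occupy any of the 8 ways.
        return set_index, range(WAYS)
    # Second mapping function: an extra address bit splits each set into two 4-way
    # halves, trading associativity for more congruence classes to break up strides.
    half = (addr >> (BLOCK_OFFSET_BITS + 9)) & 1   # 9 = log2(NUM_SETS)
    return set_index, range(half * (WAYS // 2), (half + 1) * (WAYS // 2))

print(congruence_class(0x0004_2380, assoc_mode_bit=0))
print(congruence_class(0x0004_2380, assoc_mode_bit=1))
```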
Abstract:
A method of providing programmable associativity in a cache used by a processor of a computer system is disclosed. A congruence class of a memory block is defined using a first mapping function, providing a first associativity level of the cache. A logic unit connected to the cache monitors cache misses as the cache uses the first associativity level, and selects other associativity levels based on the cache misses, using other mapping functions. The logic unit has incorporated therein means for selecting the other associativity levels based on a rate of the cache misses in a particular congruence class. The congruence class may be defined by associating the memory block with a particular set of cache blocks in the cache, based on a first portion of an address of the memory block, and the other mapping functions may be implemented by dividing the particular set into subsets and selecting a subset for the memory block based on a second portion of the address.
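The sketch below illustrates one way such a logic unit could be modeled in software; the miss threshold, cache geometry, and choice of the second address portion are assumptions for illustration only.

```python
# Illustrative model; threshold, geometry, and the split rule are assumed.
from collections import Counter

BLOCK_OFFSET_BITS = 7
NUM_SETS = 512
MISS_THRESHOLD = 1000

class AssociativityMonitor:
    def __init__(self):
        self.miss_counts = Counter()
        self.split_classes = set()   # classes switched to the second mapping function

    def record_miss(self, addr):
        set_index = (addr >> BLOCK_OFFSET_BITS) % NUM_SETS
        self.miss_counts[set_index] += 1
        if self.miss_counts[set_index] > MISS_THRESHOLD:
            # Second mapping function: the set is divided into subsets and a second
            # portion of the address selects the subset for the memory block.
            self.split_classes.add(set_index)

    def subset_for(self, addr):
        set_index = (addr >> BLOCK_OFFSET_BITS) % NUM_SETS
        if set_index in self.split_classes:
            return (addr >> (BLOCK_OFFSET_BITS + 9)) & 1   # second address portion
        return None   # whole set acts as one subset under the first mapping

monitor = AssociativityMonitor()
for i in range(2000):
    monitor.record_miss(i * 128 * NUM_SETS)   # a stride that keeps missing in class 0
print(sorted(monitor.split_classes))          # -> [0]
print(monitor.subset_for(0), monitor.subset_for(1 << 16))
```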
Abstract:
A method of improving operation of a cache used by a processor of a computer system by introducing a level of randomness into a replacement algorithm used by the cache in order to lessen "strides" within the cache is disclosed. Different levels of randomness may be introduced into the replacement algorithm at different times to optimize the cache for different procedures running on the processor. The level of randomness can be selectively introduced by using a basic replacement algorithm to select a subset of a congruence class, and one or more random bits are then used to select a specific cache block within the subset for eviction. The basic replacement algorithm can be a least recently used algorithm. There may be three levels of randomness for a 4-way set associative cache, and there may be four levels of randomness for an 8-way set associative cache.
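A compact sketch of that victim-selection scheme follows; the data structures and the mapping from "level of randomness" to subset size are my reading of the abstract, not text from it.

```python
# Illustrative sketch; the level-to-subset-size mapping is assumed.
import random

def choose_victim(lru_order, randomness_level):
    """lru_order lists way numbers from least to most recently used.

    Level 0 is pure LRU; level k picks at random among the 2**k least recently
    used ways. Counting level 0, a 4-way set offers three levels (subsets of
    1, 2, or 4 ways) and an 8-way set offers four (1, 2, 4, or 8 ways).
    """
    subset_size = min(2 ** randomness_level, len(lru_order))
    subset = lru_order[:subset_size]   # the basic (LRU) algorithm narrows the choice
    return random.choice(subset)       # random bits pick the victim within the subset

lru_order_4way = [2, 0, 3, 1]          # way 2 is least recently used
for level in range(3):
    print("level", level, "-> evict way", choose_victim(lru_order_4way, level))
```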
Abstract:
A signal is transmitted from a sending chip to a first receiving chip in a communications ring via a first I/O set of the sending chip. A signal from the sending chip to a second receiving chip in the communications ring is transmitted via a second I/O set of the sending chip. The first I/O set corresponds to a first direction for the sending chip transmitting around the ring, and the second I/O set corresponds to a second direction for the sending chip transmitting around the ring. The transmitting via the first I/O set is for a circumstance where the number of chips interposed in the ring between the sending and receiving chips in the first direction is not greater than the number of chips interposed in the second direction. The transmitting via the second I/O set is for a circumstance where that number is greater. For at least one chip interposed between the sending and receiving chips, the transmitting includes traversing from one of the first and second I/O sets of the interposed chip to the other one of its I/O sets. The signal traversing the interposed chip is regenerated by the interposed chip.
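The direction choice can be phrased as simple modular arithmetic; the sketch below assumes chips are numbered consecutively around the ring, which is an illustrative convention rather than anything stated in the abstract.

```python
# Illustrative sketch; consecutive numbering around the ring is assumed.

def choose_io_set(sender, receiver, ring_size):
    hops_forward = (receiver - sender) % ring_size    # hops in the first direction
    hops_backward = (sender - receiver) % ring_size   # hops in the second direction
    interposed_forward = hops_forward - 1             # chips strictly between them
    interposed_backward = hops_backward - 1
    # First I/O set unless the first direction interposes more chips than the second.
    return "first" if interposed_forward <= interposed_backward else "second"

print(choose_io_set(sender=0, receiver=2, ring_size=8))   # first  (1 vs. 5 interposed)
print(choose_io_set(sender=0, receiver=6, ring_size=8))   # second (5 vs. 1 interposed)
```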
Abstract:
A method of operating a multi-level memory hierarchy of a computer system and an apparatus embodying the method, wherein multiple levels of storage subsystems are used to improve the performance of the computer system, each next higher level generally having a faster access time but a smaller amount of storage. Values within a level are indexed by a directory that relates the values in that level to the next lower level. In a preferred embodiment of the invention, the directories for the various levels of storage are contained within the next higher level, providing faster access to the directory information. Cache memories are used as the highest levels of storage, and one or more sets are allocated out of that cache memory for containing a directory of the next lower level of storage. An address comparator, which is used to compare entries in a directory to address values, is directly coupled to the set or sets used for the directory, reducing the time needed to compare addresses in determining whether an address is present in the cache.
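A drastically simplified sketch of the reserved-set idea appears below; the sizes, the choice of exactly one reserved way, and a single directory entry per set are all assumptions made to keep the example short.

```python
# Illustrative, heavily simplified model; sizes and layout are assumed.
NUM_SETS = 256
WAYS = 8
DATA_WAYS = WAYS - 1   # one way per set is reserved for the lower level's directory

class UpperLevelCache:
    def __init__(self):
        self.data_tags = [[None] * DATA_WAYS for _ in range(NUM_SETS)]
        self.lower_dir_tags = [None] * NUM_SETS   # directory entry held in the reserved way

    def lookup(self, addr):
        set_index = addr % NUM_SETS
        tag = addr // NUM_SETS
        hit_here = tag in self.data_tags[set_index]         # ordinary tag comparators
        hit_below = self.lower_dir_tags[set_index] == tag   # comparator coupled to the
                                                            # reserved directory set
        return hit_here, hit_below

cache = UpperLevelCache()
cache.data_tags[5][0] = 42       # a line resident at this level
cache.lower_dir_tags[5] = 99     # a line the directory records as resident below
print(cache.lookup(5 + 42 * NUM_SETS), cache.lookup(5 + 99 * NUM_SETS))
```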
Abstract:
A data processing system with configurable processor chip buses. The processor chip is designed with a bus allocation unit and has a plurality of extended buses, a number of which are configurable buses (i.e., buses that may be dynamically allocated to any one of several external components, particularly memory and other SMPs). A priority determination of the bandwidth requirements of the external components is made during system processing. The configurable buses are then dynamically allocated to the external components based on their bandwidth requirements and/or the configuration that provides the best overall system efficiency.
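One possible allocation policy, assumed here rather than specified by the abstract, is sketched below: components are ranked by observed bandwidth demand and the configurable buses are dealt out in that priority order.

```python
# Illustrative policy; component names and bandwidth figures are assumed.

def allocate_buses(bandwidth_demand, num_configurable_buses):
    """bandwidth_demand maps component name -> observed demand (e.g., in GB/s)."""
    ranked = sorted(bandwidth_demand, key=bandwidth_demand.get, reverse=True)
    allocation = {name: 0 for name in bandwidth_demand}
    for i in range(num_configurable_buses):
        # Deal buses out in priority order; the hungriest components get the extras.
        allocation[ranked[i % len(ranked)]] += 1
    return allocation

demand = {"memory": 12.0, "smp_link": 8.0, "io": 2.0}   # hypothetical GB/s figures
print(allocate_buses(demand, num_configurable_buses=4))
# -> {'memory': 2, 'smp_link': 1, 'io': 1}
```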
Abstract:
A data processing system with configurable processor chip buses. The processor chip is designed with a plurality of extended buses of which a number are configurable buses (i.e., capable of being allocated to one of several external components, particularly memory and other SMPs). The processor chip allows for the static allocation of these configurable buses to these external components, based primarily on vendor system design preferences.
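By contrast with the dynamic scheme above, a static assignment amounts to a fixed table chosen at design time; the bus and component names below are purely illustrative.

```python
# Illustrative table only; bus and component names are assumed.
STATIC_BUS_ASSIGNMENT = {
    "cfg_bus0": "memory0",     # vendor favors extra memory bandwidth
    "cfg_bus1": "memory1",
    "cfg_bus2": "smp_link0",   # one configurable bus dedicated to another SMP chip
    "cfg_bus3": "io_hub",
}

def component_for(bus):
    # Fixed at system design time; never reallocated during operation.
    return STATIC_BUS_ASSIGNMENT[bus]

print(component_for("cfg_bus2"))
```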
Abstract:
A cache memory logically associates a cache line with at least two cache sectors of a cache array wherein different sectors have different output latencies and, for a load hit, selectively enables the cache sectors based on their latency to output the cache line over successive clock cycles. Larger wires having a higher transmission speed are preferably used to output the cache line corresponding to the requested memory block. In the illustrative embodiment the cache is arranged with rows and columns of the cache sectors, and a given cache line is spread across sectors in different columns, with at least one portion of the given cache line being located in a first column having a first latency, and another portion of the given cache line being located in a second column having a second latency greater than the first latency. One set of wires oriented along a horizontal direction may be used to output the cache line, while another set of wires oriented along a vertical direction may be used for maintenance of the cache sectors. A given cache line is further preferably spread across sectors in different rows or cache ways. For example, a cache line can be 128 bytes and spread across four sectors in four different columns, each sector containing 32 bytes of the cache line, and the cache line is output over four successive clock cycles with one sector being transmitted during each of the four cycles.
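The worked example at the end of the abstract (a 128-byte line spread as four 32-byte sectors over four columns, streamed over four cycles) can be written out directly; the starting cycle and the byte-to-column assignment below are illustrative assumptions.

```python
# Illustrative schedule; byte-to-column assignment and start cycle are assumed.
LINE_BYTES = 128
SECTOR_BYTES = 32
NUM_COLUMNS = LINE_BYTES // SECTOR_BYTES   # 4 columns, latency grows with column index

def output_schedule(first_cycle=0):
    """Return (cycle, column, byte_range) tuples for streaming out one cache line."""
    schedule = []
    for col in range(NUM_COLUMNS):
        lo = col * SECTOR_BYTES
        # The lowest-latency column drives the bus first; each later column follows
        # one clock cycle behind, so the whole line emerges over successive cycles.
        schedule.append((first_cycle + col, col, (lo, lo + SECTOR_BYTES)))
    return schedule

for cycle, column, (lo, hi) in output_schedule():
    print(f"cycle {cycle}: column {column} drives bytes {lo}-{hi - 1}")
```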