摘要:
A method of providing programmable congruence classes in a cache used by a processor of a computer system is disclosed. Program instructions are loaded in the processor for modifying original addresses of memory blocks in a memory device to produce encoded addresses. A plurality of cache congruence classes is then defined using a mapping function which operates on the encoded addresses, such that the program instructions may be used to arbitrarily assign a given one of the original addresses to a particular one of the cache congruence classes. The program instructions can modify the original addresses by setting a plurality of programmable fields. Application software may provide the program instructions, wherein congruence classes are programmed based on a particular procedure of the application software which is running on the processor, that might otherwise run with excessive "striding" of the cache. Alternatively, operating-system software may monitor allocation of memory blocks in the cache and provides the program instructions to modify the original addresses based on the allocation of the memory blocks, to lessen striding.
摘要:
A method of providing instructions and data values to a processing unit in a multi-processor computer system, by expanding the prior-art MESI cache-coherency protocol to include an additional cache-entry state corresponding to a most recently accessed state. Each cache of the processing units has at least one cache line with a block for storing the instruction or data value, and an indication is provided that a cache line having a block which contains the instruction or data value is in a "recently read" state. Each cache entry has three bits to indicate the current state of the cache entry (one of five possible states). A processing unit which desires to access a shared instruction or data value detects transmission of the indication from the cache having the most recently accessed copy, and the instruction or data value is sourced from this cache. Upon sourcing the instruction or data value, the cache that originally contained the most recently accessed copy thereof changes its indication to indicate that its copy is now shared, and the processing unit which accessed the instruction or data value is thereafter indicated as having the cache containing the copy thereof that was most recently accessed. This protocol allows instructions and data values which are shared among several caches to be sourced directly (intervened) by the cache having the most recently accessed copy, without retrieval from system memory (RAM), significantly improving the processing speed of the computer system.
摘要:
A method of providing programmable congruence classes in a cache used by a processor of a computer system is disclosed. A logic unit is connected to the cache for modifying original addresses of memory blocks in a memory device to produce encoded addresses. A plurality of cache congruence classes are then defined using a mapping function which operates on the encoded addresses, such that the logic unit may be used to arbitrarily assign a given one of the original addresses to a particular one of the cache congruence classes. The logic unit can modify the original addresses by setting a plurality of programmable fields. The logic unit also can collect information on cache misses, and modify the original addresses in response to the cache miss information. In this manner, a procedure running on the processor and allocating memory blocks to the cache such that the original addresses, if applied to the mapping function, would result in striding of the cache, runs more efficiently by using the encoded addresses to result in less striding of the cache.
摘要:
A data processing system and method of communicating data in a data processing system are described. The data processing system includes a communication network to which a plurality of devices are coupled. At least one device among the plurality of devices coupled to the communication network includes mastering circuitry and snooping circuitry. According to the method, a first timing signal having a first frequency and a second timing signal having a second frequency different from the first frequency are generated. Communication transactions on the communication network are initiated utilizing the mastering circuitry, which operates in response to the first timing signal, and are monitored utilizing the snooping circuitry, which operates in response to the second timing signal.
摘要:
A method and system for front-end gathering of store instructions within a processor is disclosed. In accordance with the method and system of the present invention, a store queue within a data-processing system includes a front-end queue and a back-end queue. Multiple entries are provided in the back-end queue, and each entry includes an address field, a byte-count field, and a data field. A determination is first made as to whether or not a data field of a first entry of the front-end queue is filled completely. In response to a determination that the data field of the first entry of the front-end queue is not filled completely, another determination is made as to whether or not an address for a store instruction in a subsequent second entry is equal to an address for the store instruction in the first entry plus a byte count in the first entry. In response to a determination that the address for the store instruction in the subsequent second entry is equal to the address for the store instruction in the first entry plus the byte count in the first entry, the store instruction in the subsequent second entry is collapsed into the store instruction in the first entry.
摘要:
A method and system for controlling access to a shared resource in a data processing system are described. According to the method, a number of requests for access to the resource are generated by a number of requesters that share the resource. Each of the requesters is associated with a priority weight that indicates a probability that the associated requester will be assigned a highest current priority. Each requester is then assigned a current priority that is determined substantially randomly with respect to previous priorities of the requesters. In response to the current priorities of the requesters, a request for access to the resource is granted. In one embodiment, a requester corresponding to a granted request is signaled that its request has been granted, and a requester corresponding to a rejected request is signaled that its request was not granted.
摘要:
A method and system for allocating data among cache memories within a symmetric multiprocessor data-processing system are disclosed. The symmetric multiprocessor data-processing system includes a system memory and multiple processing units, wherein each of the processing units has a cache memory. The system memory is divided into a number of segments, wherein the number of segments is equal to the total number of cache memories. Each of these segments is represented by one of the cache memories such that a cache memory is responsible to cache data from its associated segment within the system memory.
摘要:
A data processing system includes a processor, a system memory, one or more input/output channel controllers (IOCC), and a system bus connecting the processor, the memory and the IOCCs together for communicating instructions, address and data between the various elements of a system. The IOCC includes a paged cache storage having a number of lines wherein each line of the page may be, for example, 32 bytes. Each page in the cache also has several attribute bits for that page including the so called WIM and attribute bits. The W bit is for controlling write through operations; the I bit controls cache inhibit; and the M bit controls memory coherency. Since the IOCC is unaware of these page table attribute bits for the cache lines being DMAed to system memory, IOCC must maintain memory consistency and cache coherency without sacrificing performance. For DMA write data to system memory, new cache attributes called global, cachable and demand based write through are created. Individual writes within a cache line are gathered by the IOCC and only written to system memory when the I/O bus master accesses a different cache line or relinquishes the I/O bus.
摘要:
According to a first aspect of the present invention, a data processing system is provided that includes a communication network to which multiple devices are coupled. A first of the multiple devices includes a number of requestors (or queues), which are each permanently assigned a respective one of a number of unique tags. In response to a communication request by a requestor within the first device, a tag assigned to the requestor is transmitted on the communication network in conjunction with the requested communication transaction. According to a second aspect of the present invention, a data processing system includes a cache having a cache directory. A status indication indicative of the status of at least one of a plurality of data entries in the cache is stored in the cache directory. In response to receipt of a cache operation request, a determination is made whether to update the status indication. In response to the determination that the status indication is to be updated, the status indication is copied into a shadow register and updated. The status indication is then written back into the cache directory at a later time. The shadow register thus serves as a virtual cache controller queue that dynamically mimics a cache directory entry without functional latency.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the data storage and the execution resources, that supplies instructions within the data storage to the execution resources. The execution resources include a plurality of load-store units that each process only instructions that access data having associated addresses within a respective one of a plurality of subsets of an address space. The load-store units can have diverse hardware such that a maximum number of instructions that can be concurrently executed is different for different load-store units or such that some of the load-store units are restricted to executing certain classes of instructions.