摘要:
A cache memory logically partitions a cache array into at least two slices each having a plurality of cache lines, with a given cache line spread across two or more cache ways of contiguous bytes and a given cache way shared between the two cache slices, and if one a cache way is defective that is part of a first cache line in the first cache slice and part of a second cache line in the second cache slice, it is disabled while continuing to use at least one other cache way which is also part of the first cache line and part of the second cache line. In the illustrative embodiment the cache array is set associative and at least two different cache ways for a given cache line contain different congruence classes for that cache line. The defective cache way can be disabled by preventing an eviction mechanism from allocating any congruence class in the defective way. For example, half of the cache line can be disabled (i.e., half of the congruence classes). The cache array may be arranged with rows and columns of cache sectors (rows corresponding to the cache ways) wherein a given cache line is further spread across sectors in different rows and columns, with at least one portion of the given cache line being located in a first column having a first latency and another portion of the given cache line being located in a second column having a second latency greater than the first latency. The cache array can also output different sectors of the given cache line in successive clock cycles based on the latency of a given sector.
摘要:
A cache memory logically associates a cache line with at least two cache sectors of a cache array wherein different sectors have different output latencies and, for a load hit, selectively enables the cache sectors based on their latency to output the cache line over successive clock cycles. Larger wires having a higher transmission speed are preferably used to output the cache line corresponding to the requested memory block. In the illustrative embodiment the cache is arranged with rows and columns of the cache sectors, and a given cache line is spread across sectors in different columns, with at least one portion of the given cache line being located in a first column having a first latency, and another portion of the given cache line being located in a second column having a second latency greater than the first latency. One set of wires oriented along a horizontal direction may be used to output the cache line, while another set of wires oriented along a vertical direction may be used for maintenance of the cache sectors. A given cache line is further preferably spread across sectors in different rows or cache ways. For example, a cache line can be 128 bytes and spread across four sectors in four different columns, each sector containing 32 bytes of the cache line, and the cache line is output over four successive clock cycles with one sector being transmitted during each of the four cycles.
摘要:
A cache memory logically associates a cache line with at least two cache sectors of a cache array wherein different sectors have different output latencies and, for a load hit, selectively enables the cache sectors based on their latency to output the cache line over successive clock cycles. Larger wires having a higher transmission speed are preferably used to output the cache line corresponding to the requested memory block. In the illustrative embodiment the cache is arranged with rows and columns of the cache sectors, and a given cache line is spread across sectors in different columns, with at least one portion of the given cache line being located in a first column having a first latency, and another portion of the given cache line being located in a second column having a second latency greater than the first latency. One set of wires oriented along a horizontal direction may be used to output the cache line, while another set of wires oriented along a vertical direction may be used for maintenance of the cache sectors. A given cache line is further preferably spread across sectors in different rows or cache ways. For example, a cache line can be 128 bytes and spread across four sectors in four different columns, each sector containing 32 bytes of the cache line, and the cache line is output over four successive clock cycles with one sector being transmitted during each of the four cycles.
摘要:
In a cache coherent data processing system including at least first and second coherency domains, a memory block is stored in a system memory in association with a domain indicator indicating whether or not the memory block is cached, if at all, only within the first coherency domain. A master in the first coherency domain determines whether or not a scope of broadcast transmission of an operation should extend beyond the first coherency domain by reference to the domain indicator stored in the cache and then performs a broadcast of the operation within the cache coherent data processing system in accordance with the determination.
摘要:
A data processing system includes at least a first processing node having an input/output (I/O) controller and a second processing including a memory controller for a memory. The memory controller receives, in order, pipelined first and second DMA write operations from the I/O controller, where the first and second DMA write operations target first and second addresses, respectively. In response to the second DMA write operation, the memory controller establishes a state of a domain indicator associated with the second address to indicate an operation scope including the first processing node. In response to the memory controller receiving a data access request specifying the second address and having a scope excluding the first processing node, the memory controller forces the data access request to be reissued with a scope including the first processing node based upon the state of the domain indicator associated with the second address.
摘要:
In response to execution of program code, a control register within scrubbing logic in a local coherency domain is initialized with at least a target address of a target memory block. In response to the initialization, the scrubbing logic issues to at least one cache hierarchy in a remote coherency domain a domain indication scrubbing request targeting a target memory block that may be cached by the at least one cache hierarchy. In response to receipt of a coherency response indicating that the target memory block is not cached in the remote coherency domain, a domain indication in the local coherency domain is updated to indicate that the target memory block is cached, if at all, only within the local coherency domain.
摘要:
A cache coherent data processing system includes a memory and at least first and second coherency domains that each include a respective one of first and second cache memories. A master in the first coherency domain selects a scope of an initial broadcast of an operation targeting a request address allocated to the memory from among a first scope including only the first coherency domain and a second scope including both the first and second coherency domains. The master selects the scope based, at least in part, upon whether the memory belongs to the first coherency domain and performs an initial broadcast of the operation within the cache coherent data processing system utilizing the selected scope.
摘要:
A data processing system includes a plurality of processing units each having a respective point-to-point communication link with each of multiple others of the plurality of processing units but fewer than all of the plurality of processing units. Each of the plurality of processing units includes interconnect logic, coupled to each point-to-point communication link of that processing unit, that broadcasts operations received from one of the multiple others of the plurality of processing units to one or more of the plurality of processing units.
摘要:
Scrubbing logic in a local coherency domain issues to at least one cache hierarchy in a remote coherency domain a domain reset request that forces invalidation of any cached copy of a target memory block then held in said remote coherency domain. A coherency response to said domain reset request is received. In response to said coherency response indicating that said target memory block is not cached in said remote coherency domain, a domain indication of said local coherency domain is updated to indicate that said target memory block is cached, if at all, only within said local coherency domain.
摘要:
In a cache coherent data processing system including at least first and second coherency domains, a memory block is stored in a system memory in association with a domain indicator indicating whether or not the memory block is cached, if at all, only within the first coherency domain. A master in the first coherency domain determines whether or not a scope of broadcast transmission of an operation should extend beyond the first coherency domain by reference to the domain indicator stored in the cache and then performs a broadcast of the operation within the cache coherent data processing system in accordance with the determination.