摘要:
A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane.
摘要:
A data processing system includes a plurality of local hubs each coupled to a remote hub by a respective one a plurality of point-to-point communication links. Each of the plurality of local hubs queues requests for access to memory blocks for transmission on a respective one of the point-to-point communication links to a shared resource in the remote hub. Each of the plurality of local hubs transmits requests to the remote hub utilizing only a fractional portion of a bandwidth of its respective point-to-point communication link. The fractional portion that is utilized is determined by an allocation policy based at least in part upon a number of the plurality of local hubs and a number of processing units represented by each of the plurality of local hubs. The allocation policy prevents overruns of the shared resource.
摘要:
A cache coherent data processing system includes at least a first cache memory supporting a first processing unit and a second cache memory supporting a second processing unit. The first cache memory includes a cache array and a cache directory of contents of the cache array. In response to the first cache memory detecting on an interconnect a broadcast operation that specifies a request address, the first cache memory determines from the operation a type of the operation and a coherency state associated with the request address. In response to determining the type and the coherency state, the first cache memory filters out the broadcast operation without accessing the cache directory.
摘要:
A cache memory logically associates a cache line with at least two cache sectors of a cache array wherein different sectors have different output latencies and, for a load hit, selectively enables the cache sectors based on their latency to output the cache line over successive clock cycles. Larger wires having a higher transmission speed are preferably used to output the cache line corresponding to the requested memory block. In the illustrative embodiment the cache is arranged with rows and columns of the cache sectors, and a given cache line is spread across sectors in different columns, with at least one portion of the given cache line being located in a first column having a first latency, and another portion of the given cache line being located in a second column having a second latency greater than the first latency. One set of wires oriented along a horizontal direction may be used to output the cache line, while another set of wires oriented along a vertical direction may be used for maintenance of the cache sectors. A given cache line is further preferably spread across sectors in different rows or cache ways. For example, a cache line can be 128 bytes and spread across four sectors in four different columns, each sector containing 32 bytes of the cache line, and the cache line is output over four successive clock cycles with one sector being transmitted during each of the four cycles.
摘要:
A processing unit for a multiprocessor data processing system includes a processor core including a store-through upper level cache, an instruction sequencing unit that fetches instructions for execution, a data register, and at least one instruction execution unit. The instruction execution unit, responsive to receipt of a load-reserve instruction from the instruction sequencing unit, executes the load-reserve instruction to determine a load target address. The processor core, responsive to the execution of the load-reserve instruction, performs a corresponding load-reserve operation by accessing the store-through upper level cache utilizing the load target address to cause data associated with the load target address to be loaded from the store-through upper level cache into the data register and by establishing a reservation for a reservation granule including the load target address.
摘要:
A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.
摘要:
A method, apparatus, and computer program product are disclosed in a processor for dynamically, during runtime, allocating memory for in-memory hardware tracing. The processor is included within a data processing system. The processor includes multiple processing units that are coupled together utilizing a system bus. The processing units include a memory controller that controls a system memory. A particular size of the system memory is determined that is needed for storing trace data. A hardware trace facility requests, dynamically after the data processing system has completed booting, the particular size of the system memory to be allocated to the hardware trace facility for storing trace data that is captured by the hardware trace facility. The firmware selects particular locations within the system memory. All of the particular locations together are the particular size. The firmware allocates the particular locations for use exclusively by the hardware trace facility.
摘要:
A data processing system includes a first processing node and a second processing node. The first processing node includes a plurality of first processing units coupled to each other for communication, and the second processing node includes a plurality of second processing units coupled to each other for communication. Each of the plurality of first processing units is coupled to a respective one of the plurality of second processing units in the second processing node by a respective one of a plurality of point-to-point links.
摘要:
A method, apparatus, and computer program product are disclosed for, in a processor, concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers. A hardware trace facility captures hardware trace data in a processor. The hardware trace facility is included within the processor. The hardware trace data is transmitted to a system memory utilizing a system bus. The system memory is included within the system. The system bus is capable of being utilized by processing units included in the processing node while the hardware trace data is being transmitted to the system bus. Part of system memory is utilized to store the trace data. The system memory is capable of being accessed by processing units in the processing node other than the hardware trace facility while part of the system memory is being utilized to store the trace data.
摘要:
A method, apparatus, and computer program product are disclosed for performing in-memory hardware tracing in a processor using an existing system bus. The processor includes multiple processing units that are coupled together utilizing the system bus. The processing units include a memory controller that controls a system memory. Information is transmitted among the processing units utilizing the system bus. The information is formatted according to a standard system bus protocol. Hardware trace data is captured utilizing a hardware trace facility that is coupled directly to the system bus. The system bus is utilized for transmitting the hardware trace data to the memory controller for storage in the system memory. The memory controller is coupled directly to the system bus. The hardware trace data is formatted according to the standard system bus protocol for transmission via the system bus.