Abstract:
A symmetric multi-processing (SMP) processor includes a primary interconnect trunk for communication of information between multiple compute elements situated along the primary interconnect trunk. The processor also includes a secondary interconnect trunk that may be oriented perpendicular to the primary interconnect trunk. The secondary interconnect trunk communicates information off-chip via a number of I/O interfaces at the perimeter of the processor chip. The I/O interfaces may be distributed uniformly along portions of the perimeter.
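As a rough illustration of the layout this abstract describes, the C sketch below models I/O interfaces spaced uniformly along the two perimeter edges reached by the vertical secondary trunk. All counts, names, and units are illustrative assumptions, not details from the patent.

#include <stdio.h>

#define N_IO_EDGE 4     /* I/O interfaces per perimeter edge (assumed) */
#define EDGE_LEN  1000  /* abstract edge length, arbitrary units       */

struct io_interface { int edge; int offset; };

int main(void)
{
    struct io_interface io[2 * N_IO_EDGE];

    /* Place I/O interfaces at the centers of equal segments along the two
       perimeter edges met by the secondary trunk, giving the uniform
       distribution the abstract describes. */
    for (int e = 0; e < 2; e++)
        for (int i = 0; i < N_IO_EDGE; i++) {
            io[e * N_IO_EDGE + i].edge   = e;
            io[e * N_IO_EDGE + i].offset =
                i * (EDGE_LEN / N_IO_EDGE) + EDGE_LEN / (2 * N_IO_EDGE);
        }

    for (int k = 0; k < 2 * N_IO_EDGE; k++)
        printf("I/O interface %d: edge %d, offset %d\n",
               k, io[k].edge, io[k].offset);
    return 0;
}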
Abstract:
An information handling system (IHS) includes a processor with a cache memory system. The processor includes a processor core with an L1 cache memory that couples to an L2 cache memory. The processor includes an arbitration mechanism that controls load and store requests to the L2 cache memory. The arbitration mechanism includes control logic that enables a load request to interrupt a store request that the L2 cache memory is currently servicing. The L2 cache memory includes dual data banks so that one bank may perform a load operation while the other bank performs a store operation. The cache system provides dual dispatch points into the data flow to the dual cache banks of the L2 cache memory.
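The arbitration described above can be sketched as follows. This is a minimal model, assuming two banks and a simple preemption rule; the names and the replay of interrupted stores are illustrative, not the patent's actual control logic.

#include <stdio.h>

enum op_type { OP_NONE, OP_LOAD, OP_STORE };

struct cache_bank { enum op_type active; };

/* Dispatch a load to one of the two banks: prefer an idle bank, and
   otherwise interrupt a bank that is currently servicing a store, as
   the abstract describes. Returns the bank index, or -1 on failure. */
static int dispatch_load(struct cache_bank bank[2])
{
    for (int i = 0; i < 2; i++)
        if (bank[i].active == OP_NONE) { bank[i].active = OP_LOAD; return i; }
    for (int i = 0; i < 2; i++)
        if (bank[i].active == OP_STORE) {
            /* The interrupted store would be replayed later (not modeled). */
            bank[i].active = OP_LOAD;
            return i;
        }
    return -1;  /* both banks already servicing loads; retry later */
}

int main(void)
{
    /* One bank servicing a store, the other a load: the incoming load
       interrupts the store on bank 0. */
    struct cache_bank banks[2] = { { OP_STORE }, { OP_LOAD } };
    printf("load dispatched to bank %d\n", dispatch_load(banks));
    return 0;
}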
Abstract:
A system and method for tracking core load requests and providing arbitration and ordering of requests. When a core interface unit (CIU) receives a load request from the processor core, a new entry is allocated in a queue of the CIU. In response to allocating the new entry in the queue, the CIU detects contention between the load request and another memory access request. In response to detecting contention, the load request may be suspended until the contention is resolved. Received load requests may be stored in the queue and tracked using a least recently used (LRU) mechanism. A load request may then be processed when it resides in the least recently used entry of the load request queue. The CIU may also suspend issuing an instruction unless a read claim (RC) machine is available. In another embodiment, the CIU may issue stored load requests in a specific priority order.
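A minimal C sketch of the queueing and issue rules above. The queue depth, field names, and aging scheme are assumptions; only the LRU selection, contention suspension, and RC-machine gating mirror the abstract.

#include <stdbool.h>
#include <stdint.h>

#define QDEPTH 8  /* queue depth is an assumption */

struct ciu_entry {
    bool     valid;
    bool     contended;  /* suspended until contention resolves */
    unsigned age;        /* higher = less recently allocated    */
    uint64_t addr;
};

struct ciu {
    struct ciu_entry q[QDEPTH];
    bool rc_machine_free;  /* a read claim (RC) machine is available */
};

/* Allocate a queue entry for an incoming load; existing entries age. */
static int ciu_allocate(struct ciu *c, uint64_t addr, bool contended)
{
    for (int i = 0; i < QDEPTH; i++)
        if (c->q[i].valid)
            c->q[i].age++;
    for (int i = 0; i < QDEPTH; i++)
        if (!c->q[i].valid) {
            c->q[i] = (struct ciu_entry){ true, contended, 0, addr };
            return i;
        }
    return -1;  /* queue full: the core must be stalled */
}

/* Issue the least recently used, non-contended load, and only when an
   RC machine is free, mirroring the suspension rules in the abstract. */
static int ciu_issue(struct ciu *c)
{
    int lru = -1;
    if (!c->rc_machine_free)
        return -1;
    for (int i = 0; i < QDEPTH; i++)
        if (c->q[i].valid && !c->q[i].contended &&
            (lru < 0 || c->q[i].age > c->q[lru].age))
            lru = i;
    if (lru >= 0)
        c->q[lru].valid = false;
    return lru;
}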
Abstract:
A cache memory logically associates a cache line with at least two cache sectors of a cache array, wherein different sectors have different output latencies and, for a load hit, selectively enables the cache sectors based on their latency to output the cache line over successive clock cycles. Larger wires having a higher transmission speed are preferably used to output the cache line corresponding to the requested memory block. In the illustrative embodiment, the cache is arranged in rows and columns of cache sectors, and a given cache line is spread across sectors in different columns, with at least one portion of the given cache line located in a first column having a first latency and another portion located in a second column having a second latency greater than the first. One set of wires oriented along a horizontal direction may be used to output the cache line, while another set of wires oriented along a vertical direction may be used for maintenance of the cache sectors. A given cache line is further preferably spread across sectors in different rows or cache ways. For example, a cache line can be 128 bytes and spread across four sectors in four different columns, each sector containing 32 bytes of the cache line; the cache line is then output over four successive clock cycles, with one sector transmitted during each cycle.
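The worked example at the end of the abstract can be expressed as a short output schedule. The sketch below assumes exactly the stated geometry (a 128-byte line in four 32-byte sectors, one per column, with output latency growing by one cycle per column); real designs would differ.

#include <stdio.h>

#define LINE_BYTES   128
#define SECTORS      4
#define SECTOR_BYTES (LINE_BYTES / SECTORS)

int main(void)
{
    /* On a load hit, enable each column's sector when its data is ready:
       the lowest-latency column first, the highest-latency column last,
       so the full line streams out over four successive cycles. */
    for (int cycle = 0; cycle < SECTORS; cycle++)
        printf("cycle %d: column %d drives bytes %d..%d of the line\n",
               cycle, cycle,
               cycle * SECTOR_BYTES, (cycle + 1) * SECTOR_BYTES - 1);
    return 0;
}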
Abstract:
An integrated circuit (IC) processor chip apparatus includes multiple processor chips on a substrate. At least one of the multiple processor chips includes a die with a primary interconnect trunk for communication of information between multiple compute elements situated along the primary interconnect trunk. That processor chip includes a secondary interconnect trunk that may be oriented perpendicular to the primary interconnect trunk. The secondary interconnect trunk communicates information off-chip via a number of I/O interfaces at the perimeter of that processor chip. The I/O interfaces may be distributed uniformly along portions of the perimeter of that processor chip.
Abstract:
A system and method for cache management in a data processing system having a memory hierarchy of upper memory caches and a lower memory cache. A lower memory cache controller accesses a coherency state table to determine the replacement policy for coherency states of cache lines present in the lower memory cache when receiving a cast-in request from one of the upper memory caches. The coherency state table implements a replacement policy that retains the more valuable cache coherency state information between the upper and lower memory caches for a particular cache line contained in both levels of memory at the time of cast-out from the upper memory cache.
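One way to picture the coherency state table is as a two-dimensional lookup keyed by the cast-in state and the resident state. The sketch below assumes simple MESI-style states and a "keep the more valuable state" policy; the patent's actual states and table contents are not given in the abstract.

/* Simplified MESI-style states ordered by increasing "value"; a real
   design may use more states (e.g. T, O) and a different ordering. */
enum coh_state { COH_I, COH_S, COH_E, COH_M };

/* Coherency state table indexed by [cast-in state][resident state];
   here each entry simply retains the more valuable of the two states. */
static const enum coh_state merge_table[4][4] = {
    /* resident:   I      S      E      M   */
    /* I */      { COH_I, COH_S, COH_E, COH_M },
    /* S */      { COH_S, COH_S, COH_E, COH_M },
    /* E */      { COH_E, COH_E, COH_E, COH_M },
    /* M */      { COH_M, COH_M, COH_M, COH_M },
};

/* On a cast-in from an upper-level cache, look up the state to keep. */
static enum coh_state on_cast_in(enum coh_state cast_in,
                                 enum coh_state resident)
{
    return merge_table[cast_in][resident];
}

With only these four states the table reduces to taking the maximum of the two, but keeping it as a table leaves room for asymmetric entries when richer state sets are involved.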
Abstract:
An apparatus and computer program product are disclosed for performing in-memory hardware tracing in a processor using an existing system bus. The processor includes multiple processing units that are coupled together utilizing the system bus. The processing units include a memory controller that controls a system memory. Information is transmitted among the processing units utilizing the system bus. The information is formatted according to a standard system bus protocol. Hardware trace data is captured utilizing a hardware trace facility that is coupled directly to the system bus. The system bus is utilized for transmitting the hardware trace data to the memory controller for storage in the system memory. The memory controller is coupled directly to the system bus. The hardware trace data is formatted according to the standard system bus protocol for transmission via the system bus.
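A sketch of the central idea: trace data rides the existing bus as ordinary, protocol-conformant writes addressed to the memory controller. The packet fields and command encoding below are hypothetical, since the abstract does not specify the bus protocol.

#include <stdint.h>
#include <string.h>

/* Hypothetical standard bus packet layout (fields are assumptions). */
struct bus_packet {
    uint8_t  src;       /* bus agent id of the trace facility        */
    uint8_t  dst;       /* bus agent id of the memory controller     */
    uint16_t cmd;       /* an ordinary memory-write command          */
    uint64_t addr;      /* target address in the trace region        */
    uint8_t  data[32];  /* one beat of captured trace data           */
};

#define CMD_MEM_WRITE 0x01  /* assumed command encoding */

/* Wrap raw trace data in an ordinary bus write addressed to the memory
   controller, so tracing reuses the existing system bus unchanged. */
static struct bus_packet make_trace_packet(uint8_t trace_id, uint8_t mc_id,
                                           uint64_t buf_addr,
                                           const uint8_t trace[32])
{
    struct bus_packet p = { trace_id, mc_id, CMD_MEM_WRITE, buf_addr, {0} };
    memcpy(p.data, trace, sizeof p.data);
    return p;
}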
Abstract:
A method, apparatus, and computer program product are disclosed for, in a processor, concurrently sharing a memory controller between a tracing process and non-tracing processes using a programmable, variable number of shared memory write buffers. A hardware trace facility, included within the processor, captures hardware trace data. The hardware trace data is transmitted to a system memory included within the system, utilizing a system bus. The system bus remains usable by the processing units included in the processing node while the hardware trace data is being transmitted. Part of the system memory is utilized to store the trace data, and the system memory remains accessible to processing units in the processing node other than the hardware trace facility while that part is being used to store the trace data.
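The programmable sharing of write buffers might look like the following allocation rule, where trace writes are capped at a programmed limit so non-tracing writes always keep the remainder. Buffer counts and names are assumptions.

#include <stdbool.h>

#define TOTAL_WRITE_BUFFERS 8  /* assumed memory-controller buffer count */

struct mc_state {
    unsigned trace_limit;   /* programmable: max buffers trace may hold */
    unsigned trace_in_use;  /* buffers currently holding trace writes   */
    unsigned total_in_use;  /* all buffers currently in use             */
};

/* Grant a write buffer; trace writes are throttled at the programmed
   limit so ordinary processing-unit writes retain the remaining buffers. */
static bool grant_write_buffer(struct mc_state *mc, bool is_trace)
{
    if (mc->total_in_use >= TOTAL_WRITE_BUFFERS)
        return false;
    if (is_trace && mc->trace_in_use >= mc->trace_limit)
        return false;  /* the trace stream stalls, not the cores */
    mc->total_in_use++;
    if (is_trace)
        mc->trace_in_use++;
    return true;
}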
Abstract:
A processing unit for a multiprocessor data processing system includes a processor core having a store-through upper level cache, an instruction sequencing unit that fetches instructions for execution, a data register, and at least one instruction execution unit, coupled to the instruction sequencing unit, that concurrently executes multiple threads of instructions. When the at least one instruction execution unit executes a load-reserve instruction in a first thread that binds to a load target address in the store-through upper level cache during a reservation hazard window associated with a conflicting store-conditional operation of a second thread, the processor core causes a subsequent store-conditional operation of the first thread to a store target address matching the load target address to fail if the store-conditional operation of the second thread succeeds.
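The reservation hazard above can be modeled with a per-thread reservation record, as in the C sketch below. The hazard-window detection is reduced to boolean inputs; the actual hardware mechanism is not described at this level in the abstract, and all names are illustrative.

#include <stdbool.h>
#include <stdint.h>

#define NTHREADS 2

/* Per-thread reservation, plus a flag recording that the reservation was
   established inside another thread's store-conditional hazard window. */
struct reservation {
    bool     valid;
    bool     hazard;  /* a conflicting store-conditional may still succeed */
    uint64_t addr;
};

static struct reservation resv[NTHREADS];

/* load-reserve: bind a reservation to the load target address; if the
   other thread has a conflicting store-conditional in flight to the same
   address, record the hazard. */
static void larx(int tid, uint64_t addr, bool other_stcx_in_flight)
{
    resv[tid] = (struct reservation){ true, other_stcx_in_flight, addr };
}

/* store-conditional: fails if there is no matching reservation, or if the
   hazard resolved in the other thread's favor (its stcx succeeded). */
static bool stcx(int tid, uint64_t addr, bool other_stcx_succeeded)
{
    bool ok = resv[tid].valid && resv[tid].addr == addr &&
              !(resv[tid].hazard && other_stcx_succeeded);
    resv[tid].valid = false;  /* a stcx always consumes the reservation */
    return ok;
}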