摘要:
A technique to perform a fast compare-exchange operation is disclosed. More specifically, a machine-readable medium, processor, and system are described that implement a fast compare-exchange operation as well as a cache line mark operation that enables the fast compare-exchange operation.
摘要:
In some embodiments a light transceiver is associated with a computing rack and is adapted to transmit and/or receive one or more light beams via air to and/or from a second light transceiver associated with a second computing rack to communicate information between the computing rack and the second computing rack. Other embodiments are described and claimed.
摘要:
Various different embodiments of the invention are described including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data-side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic OOO processor; (6) handling transactional and atomic memory in an out-of-order binary translation based processor; and (7) speculative memory management in a binary translation based out of order processor.
摘要:
A processor includes at least one power domain, each power domain including at least one core that switchably receives power supply from a voltage regulator and switchably receives a clock signal from a clock source, a cache, and at least one control registers having stored thereon data indicating power management states of the at least one power domain and the cache.
摘要:
In one embodiment, a multi-core processor having cores each associated with a cache memory, can operate such that when a first core is to access data owned by a second core present in a cache line associated with the second core, responsive to a request from the first core, cache coherency state information associated with the cache line is not updated. A coherence engine associated with the processor may receive the data access request and determine that the data is of a memory page owned by the first core and convert the data access request to a non-cache coherent request. Other embodiments are described and claimed.
摘要:
In one embodiment, a multi-core processor having cores each associated with a cache memory, can operate such that when a first core is to access data owned by a second core present in a cache line associated with the second core, responsive to a request from the first core, cache coherency state information associated with the cache line is not updated. A coherence engine associated with the processor may receive the data access request and determine that the data is of a memory page owned by the first core and convert the data access request to a non-cache coherent request. Other embodiments are described and claimed.
摘要:
In one embodiment, a multi-core processor having cores each associated with a cache memory, can operate such that when a first core is to access data owned by a second core present in a cache line associated with the second core, responsive to a request from the first core, cache coherency state information associated with the cache line is not updated. A coherence engine associated with the processor may receive the data access request and determine that the data is of a memory page owned by the first core and convert the data access request to a non-cache coherent request. Other embodiments are described and claimed.
摘要:
In one embodiment, a multi-core processor having cores each associated with a cache memory, can operate such that when a first core is to access data owned by a second core present in a cache line associated with the second core, responsive to a request from the first core, cache coherency state information associated with the cache line is not updated. A coherence engine associated with the processor may receive the data access request and determine that the data is of a memory page owned by the first core and convert the data access request to a non-cache coherent request. Other embodiments are described and claimed.
摘要:
In one embodiment, a processor includes decode circuitry and memory offload circuitry. The decode circuitry decodes an instruction to perform a direct memory access (DMA) operation, which includes an opcode and one or more fields. The opcode indicates a type of DMA operation to be performed. The one or more fields indicate a destination memory region and one or more data operands. The memory offload circuitry offloads the instruction from an execution pipeline and performs the DMA operation.
摘要:
A collective communication apparatus and method for parallel computing systems. For example, one embodiment of an apparatus comprises a plurality of processor elements (PEs); collective interconnect logic to dynamically form a virtual collective interconnect (VCI) between the PEs at runtime without global communication among all of the PEs, the VCI defining a logical topology between the PEs in which each PE is directly communicatively coupled to a only a subset of the remaining PEs; and execution logic to execute collective operations across the PEs, wherein one or more of the PEs receive first results from a first portion of the subset of the remaining PEs, perform a portion of the collective operations, and provide second results to a second portion of the subset of the remaining PEs.