摘要:
An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
摘要:
In an embodiment, an apparatus includes: a first downstream port to couple to a first peer device; a second downstream port to couple to a second peer device; and a peer-to-peer (PTP) circuit to receive a memory access request from the first peer device, the memory access request having a target associated with the second peer device, where the PTP circuit is to convert the memory access request from a coherent protocol to a memory protocol and send the converted memory access request to the second peer device. Other embodiments are described and claimed.
摘要:
Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVlink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.
摘要:
In one embodiment, a method includes: receiving, in a root tile of an accelerator device having a plurality of tiles, a message from a processor, the message comprising a register write request to a register of a first remote tile of the plurality of remote tiles; decoding, in an endpoint controller of the root tile, a system address of the message to identify a destination tile for the message, based at least in part on a base address register decode of the system address; and in response to identifying the first remote tile as the destination tile, updating a first portion of an address offset field of the system address to a predetermined value and directing the message to the first remote tile coupled to the root tile via a sideband interconnect. Other embodiments are described and claimed.
摘要:
In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
摘要:
In many cases, processors may change frequency sufficiently often to result in significant performance and power consumption losses. These performance and power consumption losses may be mitigated by changing the frequency using a squashing technique rather than using a phase locked loop technique. The squashing technique involves simply eliminated clock pulses to reduce the frequency. This can be done more quickly, resulting in less overhead in some cases.
摘要:
A processor may operate at a first frequency level for a first time interval. The processor automatically may transition to a sleep state from the first frequency level after the first time interval. Then the processor automatically transitions from the sleep state to the first frequency level after a second time interval. As a result the processor may operate at a reduced power consumption and higher performance.
摘要:
A memory controller hub includes a graphics subsystem adapted to perform graphics operations on data, and interface circuitry adapted selectively to couple the graphics subsystem to a local memory through electrical connectors and to couple the memory controller hub to a graphics controller through the electrical connectors.
摘要:
A number of symbols are received in a first integrated circuit (IC) device, where these symbols have been transmitted by a second IC device and are received over a serial point to point link. These symbols include a non-data sequence that has been inserted into a data sequence by the second device. The symbols are loaded into a buffer. The data sequence and some of the non-data sequence is unloaded from the buffer, according to a changing unload pointer. To prevent overflow of the buffer, and in response to detecting the non-data sequence, the unload pointer is changed by more than one entry so that a non-data symbol of the non-data sequence as loaded in the buffer is skipped while unloading from the buffer. In another embodiment, to prevent underflow of the buffer, the unload pointer is stalled at an entry of the buffer that contains a non-data symbol while unloading. Other embodiments are also described and claimed.
摘要:
An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.