PRODUCER-TO-CONSUMER ACTIVE DIRECT CACHE TRANSFERS

    公开(公告)号:US20210064529A1

    公开(公告)日:2021-03-04

    申请号:US16560217

    申请日:2019-09-04

    Applicant: XILINX, INC.

    Abstract: The embodiments herein creates DCT mechanisms that initiate a DCT at the time the updated data is being evicted from the producer cache. These DCT mechanisms are applied when the producer is replacing the updated contents in its cache because the producer has either moved on to working on a different data set (e.g., a different task) or moved on to working on a different function, or when the producer-consumer task manager (e.g., a management unit) enforces software coherency by sending Cache Maintenance Operations (CMO). One advantage of the DCT mechanism is that because the direct cache transfer takes place at the time the updated data is being evicted, by the time the consumer begins its task, the updated contents have already been placed in its own cache or another cache within the cache hierarchy.

    DISAGGREGATED SWITCH CONTROL PATH WITH DIRECT-ATTACHED DISPATCH

    公开(公告)号:US20210382838A1

    公开(公告)日:2021-12-09

    申请号:US16894446

    申请日:2020-06-05

    Applicant: XILINX, INC.

    Abstract: Embodiments herein describe techniques for separating data transmitted between I/O functions in an integrated component and a host into separate data paths. In one embodiment, data packets are transmitted using a direct data path that bypasses a switch in the integrated component. In contrast, configuration packets (e.g., hot-swap, hot-add, hot-remove data, some types of descriptors, etc.) are transmitted to the switch which then forwards the configuration packets to their destination. The direct path for the data packets does not rely on switch connectivity (and its accompanying latency) to transport bandwidth sensitive traffic between the host and the I/O functions, and instead avoids (e.g., bypasses) the bandwidth, resource, store/forward, and latency properties of the switch. Meanwhile, the software compatibility attributes, such as hot plug attributes (which are not latency or bandwidth sensitive), continue to be supported by using the switch to provide a configuration data path.

    MACHINE LEARNING MODEL UPDATES TO ML ACCELERATORS

    公开(公告)号:US20230195684A1

    公开(公告)日:2023-06-22

    申请号:US18112362

    申请日:2023-02-21

    Applicant: XILINX, INC.

    Abstract: Examples herein describe a peripheral I/O device with a hybrid gateway that permits the device to have both I/O and coherent domains. As a result, the compute resources in the coherent domain of the peripheral I/O device can communicate with the host in a similar manner as CPU-to-CPU communication in the host. The dual domains in the peripheral I/O device can be leveraged for machine learning (ML) applications. While an I/O device can be used as an ML accelerator, these accelerators previously only used an I/O domain. In the embodiments herein, compute resources can be split between the I/O domain and the coherent domain where a ML engine is in the I/O domain and a ML model is in the coherent domain. An advantage of doing so is that the ML model can be coherently updated using a reference ML model stored in the host.

    HARDWARE COHERENT COMPUTATIONAL EXPANSION MEMORY

    公开(公告)号:US20220100523A1

    公开(公告)日:2022-03-31

    申请号:US17035484

    申请日:2020-09-28

    Applicant: XILINX, INC.

    Abstract: Embodiments herein describe transferring ownership of data (e.g., cachelines or blocks of data comprising multiple cachelines) from a host to hardware in an I/O device. In one embodiment, the host and I/O device (e.g., an accelerator) are part of a cache-coherent system where ownership of data can be transferred from a home agent (HA) in the host to a local HA in the I/O device—e.g., a computational slave agent (CSA). That way, a function on the I/O device (e.g., an accelerator function) can request data from the local HA without these requests having to be sent to the host HA. Further, the accelerator function can indicate whether the local HA tracks the data on a cacheline-basis or by a data block (e.g., multiple cachelines). This provides flexibility that can reduce overhead from tracking the data, depending on the function's desired use of the data.

    LOW-LATENCY ALIGNED MODULES FOR DATA STREAMS

    公开(公告)号:US20250111119A1

    公开(公告)日:2025-04-03

    申请号:US18375342

    申请日:2023-09-29

    Abstract: A multi-chiplet system includes a first chiplet comprising a first transceiver and a first chiplet-to-chiplet (C2C) interface module, and a second chiplet comprising programmable logic circuitry and a second C2C interface module. The first transceiver is configured to generate a clock, which is transmitted from the first C2C interface module to the second C2C interface module, through a clock transmission wire, for data transfer between the first chiplet and the second chiplet.

    SYSTEM-LEVEL TECHNIQUES FOR ERROR CORRECTION IN CHIP-TO-CHIP INTERFACES

    公开(公告)号:US20250030500A1

    公开(公告)日:2025-01-23

    申请号:US18223517

    申请日:2023-07-18

    Applicant: XILINX, INC.

    Abstract: Some examples described herein provide for interconnect in chiplet systems, for example system-level techniques for error correction in chip-to-chip interfaces. In an example, a method of error correction includes receiving, at a first chiplet, a data message via a set of interconnect, and transmitting a first control message that requests retransmission of the data message based on detecting an error associated with receiving the data message. The method also includes transmitting one or more instances of a second control message that indicates an idle operation at the first chiplet until the first chiplet receives a third control message that triggers an end of a retransmission mode. The method also includes transmitting a fourth control message frame indicating the end of the retransmission mode, and receiving a retransmission of the data message from the second chiplet.

Patent Agency Ranking