-
Publication Number: US11093394B1
Publication Date: 2021-08-17
Application Number: US16560435
Application Date: 2019-09-04
Applicant: XILINX, INC.
Inventor: Millind Mittal , Jaideep Dastidar
IPC: G06F12/0815
Abstract: An example Cache-Coherent Non-Uniform Memory Access (CC-NUMA) system includes: one or more fabric switches; a home agent coupled to the one or more fabric switches; first and second response agents coupled to the fabric switches; wherein the home agent is configured to send a delegated snoop message to the first response agent, the delegated snoop message instructing the first response agent to snoop the second response agent; wherein the first response agent is configured to snoop the second response agent in response to the delegated snoop message; and wherein the first and second response agents are configured to perform a cache-to-cache transfer during the snoop.
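Below is a minimal Python sketch of the delegated-snoop flow described in this abstract: the home agent hands the snoop off to the first response agent, which snoops the second response agent and receives the line via a cache-to-cache transfer. The class and method names (ResponseAgent, HomeAgent, delegate_snoop) are illustrative assumptions, not the patent's terminology.

```python
class ResponseAgent:
    def __init__(self):
        self.cache = {}                      # address -> cached data

    def snoop(self, peer, addr):
        # Snoop the peer directly; if the peer holds the line, perform a
        # cache-to-cache transfer without routing the data through the home agent.
        if addr in peer.cache:
            self.cache[addr] = peer.cache[addr]
            return True
        return False


class HomeAgent:
    def delegate_snoop(self, requester, holder, addr):
        # Instead of snooping 'holder' itself, the home agent delegates the
        # snoop to 'requester', which then transfers the line peer-to-peer.
        return requester.snoop(holder, addr)


ra1, ra2 = ResponseAgent(), ResponseAgent()
ra2.cache[0x1000] = b"dirty line"
home = HomeAgent()
assert home.delegate_snoop(ra1, ra2, 0x1000)
print(ra1.cache[0x1000])                     # line arrived via cache-to-cache transfer
```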
-
Publication Number: US10817462B1
Publication Date: 2020-10-27
Application Number: US16396540
Application Date: 2019-04-26
Applicant: Xilinx, Inc.
Inventor: Jaideep Dastidar , Millind Mittal
Abstract: Examples herein describe a peripheral I/O device with a hybrid gateway that permits the device to have both I/O and coherent domains. As a result, the compute resources in the coherent domain of the peripheral I/O device can communicate with the host in a similar manner as CPU-to-CPU communication in the host. The dual domains in the peripheral I/O device can be leveraged for machine learning (ML) applications. While an I/O device can be used as an ML accelerator, these accelerators previously used only an I/O domain. In the embodiments herein, compute resources can be split between the I/O domain and the coherent domain, where an ML engine is in the I/O domain and an ML model is in the coherent domain. An advantage of doing so is that the ML model can be coherently updated using a reference ML model stored in the host.
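The split between domains can be pictured with the toy Python sketch below: the ML engine in the I/O domain reads weights that live in the coherent domain, so host-side updates to the reference model become visible without explicit copies. All names and the dict-based "coherent memory" are assumptions for illustration only.

```python
# Model weights held in the coherent domain (modeled here as a shared dict).
coherent_model = {"w0": 0.5, "w1": -1.2}

def host_update_reference(new_weights):
    # The host updates the reference model; coherence keeps the accelerator's
    # copy in sync (represented here by mutating the shared structure).
    coherent_model.update(new_weights)

def io_domain_inference(x):
    # The ML engine in the I/O domain consumes the coherently maintained model.
    return coherent_model["w0"] * x + coherent_model["w1"]

print(io_domain_inference(2.0))   # uses the initial weights
host_update_reference({"w0": 0.7})
print(io_domain_inference(2.0))   # sees the coherent update
```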
-
Publication Number: US11271860B1
Publication Date: 2022-03-08
Application Number: US16686067
Application Date: 2019-11-15
Applicant: XILINX, INC.
Inventor: Millind Mittal , Jaideep Dastidar
IPC: G06F15/16 , H04L47/2425 , H04L49/9057
Abstract: An example cache-coherent packetized network system includes: a home agent; a snooped agent; and a request agent configured to send, to the home agent, a request message for a first address, the request message having a first transaction identifier of the request agent; where the home agent is configured to send, to the snooped agent, a snoop request message for the first address, the snoop request message having a second transaction identifier of the home agent; and where the snooped agent is configured to send a data message to the request agent, the data message including a first compressed tag generated using a function based on the first address.
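The compressed tag lets the requester match forwarded data to its outstanding transaction without the snooped agent knowing the requester's transaction identifier. The Python sketch below illustrates one way such a scheme could work; the CRC-based tag function, tag width, and helper names are assumptions, not the patent's definition.

```python
import zlib

def compressed_tag(addr, bits=12):
    # Deterministic function of the address, truncated to a small tag.
    return zlib.crc32(addr.to_bytes(8, "little")) & ((1 << bits) - 1)

outstanding = {}                           # compressed tag -> requester transaction ID

def request_agent_send(addr, txn_id):
    outstanding[compressed_tag(addr)] = txn_id

def snooped_agent_data_message(addr, data):
    # The data message carries only the compressed tag, not the requester's ID.
    return {"tag": compressed_tag(addr), "data": data}

request_agent_send(0xDEAD_BEEF_000, txn_id=7)
msg = snooped_agent_data_message(0xDEAD_BEEF_000, b"cache line")
print(outstanding[msg["tag"]])             # requester recovers transaction ID 7
```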
-
Publication Number: US10664422B1
Publication Date: 2020-05-26
Application Number: US16527504
Application Date: 2019-07-31
Applicant: XILINX, INC.
Inventor: Millind Mittal , Jaideep Dastidar
Abstract: Various implementations of a multi-chip system operable according to a predefined transport protocol are disclosed. In one embodiment, a system comprises a first IC comprising a processing element communicatively coupled with first physical ports. The system further comprises a second IC comprising second physical ports communicatively coupled with a first set of the first physical ports via first physical links, and one or more memory devices that are communicatively coupled with the second physical ports and accessible by the processing element via the first physical links. The first IC further comprises a data structure describing a first level of port aggregation to be applied across the first set. The second IC comprises a first distribution function configured to provide ordering to data communicated using the second physical ports. The first distribution function is based on the first level of port aggregation.
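The ordering role of the distribution function can be sketched as below: because traffic for a given cache line always maps to the same physical port within the aggregated set, transfers for that line cannot be reordered across links. The port count, line size, and modulo hash are illustrative assumptions.

```python
AGGREGATED_PORTS = ["port0", "port1", "port2", "port3"]   # first level: 4-wide aggregation

def distribution_function(addr, ports=AGGREGATED_PORTS):
    # Deterministic, per-cache-line selection keeps all transfers for one line
    # on one physical link, preserving their order.
    return ports[(addr >> 6) % len(ports)]

for addr in (0x100, 0x140, 0x100):
    print(hex(addr), "->", distribution_function(addr))
# 0x100 maps to the same port both times, so its transfers stay ordered.
```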
-
Publication Number: US10409743B1
Publication Date: 2019-09-10
Application Number: US16024500
Application Date: 2018-06-29
Applicant: Xilinx, Inc.
Inventor: Millind Mittal , Jaideep Dastidar
Abstract: Various implementations of a multi-chip system operable according to a predefined transport protocol are disclosed. In one embodiment, a system comprises a first IC comprising a memory controller communicatively coupled with first physical ports. The system further comprises a second IC comprising second physical ports communicatively coupled with a first set of the first physical ports via first physical links, and one or more memory devices that are communicatively coupled with the second physical ports and accessible by the memory controller via the first physical links. The first IC further comprises an identification map table describing a first level of port aggregation to be applied across the first set. The second IC comprises a first distribution function configured to provide ordering to data communicated using the second physical ports. The first distribution function is based on the first level of port aggregation.
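A possible shape for the identification map table is sketched below: each remote memory device maps to the set of local physical ports aggregated toward it, and the memory controller spreads accesses across that set per cache line. The table layout and names are assumptions for illustration only.

```python
id_map_table = {
    "mem_dev_A": ["p0", "p1"],        # two-wide aggregation toward device A
    "mem_dev_B": ["p2"],              # single link toward device B
}

def route_access(device_id, addr):
    # Per-cache-line spreading across the aggregated set; a given line always
    # uses the same port, so its accesses remain ordered.
    ports = id_map_table[device_id]
    return ports[(addr >> 6) % len(ports)]

print(route_access("mem_dev_A", 0x0000))   # p0
print(route_access("mem_dev_A", 0x0040))   # p1
print(route_access("mem_dev_B", 0x0040))   # p2
```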
-
Publication Number: US11563639B1
Publication Date: 2023-01-24
Application Number: US16025762
Application Date: 2018-07-02
Applicant: Xilinx, Inc.
Inventor: Millind Mittal , Jaideep Dastidar
IPC: H04L41/12 , H04L41/0803
Abstract: In an example, a system specifies a first configuration of the physical transport network that models a plurality of devices as a corresponding first plurality of nodes having a tree topology. Each node of the first plurality of nodes has at least one first device identifier and at least one first connection identifier to other nodes in the tree topology. The system specifies a second configuration of the logical transport network that models the plurality of devices as the first plurality of nodes having a non-tree topology. Each node of the first plurality of nodes has at least one second device identifier and at least one second connection identifier to other nodes in the non-tree topology, in addition to the at least one first device identifier and the at least one first connection identifier of the tree topology. The system folds the logical transport network over the physical transport network using the first and second device identifiers and the first and second connection identifiers.
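One way to picture the "folding" step is sketched below in Python: each non-tree logical connection is resolved onto the path of physical (tree) connections between the same devices. The tiny topology and helper functions are assumptions for illustration, not the patent's data structures.

```python
physical_tree = {                    # node -> physical parent (tree topology)
    "root": None, "A": "root", "B": "root", "C": "A",
}

logical_links = [("C", "B")]         # non-tree, peer-to-peer logical connection

def tree_path(node, tree):
    # Walk a node up to the root of the physical tree.
    path = []
    while node is not None:
        path.append(node)
        node = tree[node]
    return path

def fold(src, dst, tree):
    # Splice the two root-ward paths at their common ancestor, yielding the
    # physical route that carries the logical link.
    up_src, up_dst = tree_path(src, tree), tree_path(dst, tree)
    common = next(n for n in up_src if n in up_dst)
    return up_src[:up_src.index(common) + 1] + list(reversed(up_dst[:up_dst.index(common)]))

for s, d in logical_links:
    print(f"logical {s}->{d} folded onto physical path {fold(s, d, physical_tree)}")
```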
-
Publication Number: US11474871B1
Publication Date: 2022-10-18
Application Number: US16582958
Application Date: 2019-09-25
Applicant: XILINX, INC.
Inventor: Millind Mittal , Jaideep Dastidar
IPC: G06F9/50 , G06F12/0815 , G06F9/455 , G06F9/38
Abstract: The embodiments herein describe a virtualization framework for cache coherent accelerators where the framework incorporates a layered approach for accelerators in their interactions between a cache coherent protocol layer and the functions performed by the accelerator. In one embodiment, the virtualization framework includes a first layer containing the different instances of accelerator functions (AFs), a second layer containing accelerator function engines (AFEs) in each of the AFs, and a third layer containing accelerator function threads (AFTs) in each of the AFEs. Partitioning the hardware circuitry using multiple layers in the virtualization framework allows the accelerator to be quickly re-provisioned in response to requests made by guest operating systems or virtual machines executing in a host. Further, using the layers to partition the hardware permits the host to re-provision sub-portions of the accelerator while the remaining portions of the accelerator continue to operate as normal.
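The scoping benefit of the three layers can be illustrated with the Python sketch below: re-provisioning is confined to one AFE while sibling AFEs and other AFs keep running. The data layout and function names are illustrative assumptions.

```python
accelerator = {
    "AF0": {"AFE0": ["AFT0", "AFT1"], "AFE1": ["AFT0"]},
    "AF1": {"AFE0": ["AFT0"]},
}

def reprovision_afe(af, afe, num_threads):
    # Only the targeted AFE is quiesced and rebuilt; the rest of the
    # accelerator is untouched and continues operating.
    accelerator[af][afe] = [f"AFT{i}" for i in range(num_threads)]

reprovision_afe("AF0", "AFE1", num_threads=4)
print(accelerator["AF0"]["AFE1"])   # resized to four threads
print(accelerator["AF1"])           # unaffected sub-portion
```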
-
Publication Number: US11375050B1
Publication Date: 2022-06-28
Application Number: US17019039
Application Date: 2020-09-11
Applicant: XILINX, INC.
Inventor: Millind Mittal , Jaideep Dastidar , Kiran Puranik
IPC: H04L69/18
Abstract: Embodiments herein describe a layer converter that includes a proxy legacy interface that permits the layers of a legacy interconnect protocol to be recycled without any modifications, thus achieving legacy functionality alongside the new protocol's layer implementation. Put differently, the layer converter allows the legacy interconnect protocol's layers to be reused so that data can be transmitted on a link shared with data transmitted using a new interconnect protocol.
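The multiplexing idea can be pictured with the sketch below: unmodified legacy-layer output and new-protocol output are tagged and interleaved on one shared link, then separated again at the receiver. The framing and names are assumptions for illustration only.

```python
def legacy_layer_output(payload):
    # Stands in for the unmodified legacy protocol stack behind the proxy interface.
    return {"proto": "legacy", "payload": payload}

def new_layer_output(payload):
    return {"proto": "new", "payload": payload}

def layer_converter(packets):
    # Interleave both protocols' traffic onto the shared link.
    return list(packets)

def receiver(shared_link):
    # Demultiplex by protocol tag on the far side of the link.
    legacy, new = [], []
    for unit in shared_link:
        (legacy if unit["proto"] == "legacy" else new).append(unit["payload"])
    return legacy, new

link = layer_converter([legacy_layer_output(b"cfg read"), new_layer_output(b"coherent write")])
print(receiver(link))   # ([b'cfg read'], [b'coherent write'])
```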
-
Publication Number: US11113194B2
Publication Date: 2021-09-07
Application Number: US16560217
Application Date: 2019-09-04
Applicant: XILINX, INC.
Inventor: Jaideep Dastidar , Millind Mittal
IPC: G06F12/0811 , G06F12/0804 , G06F12/121
Abstract: The embodiments herein create direct cache transfer (DCT) mechanisms that initiate a DCT at the time the updated data is being evicted from the producer cache. These DCT mechanisms are applied when the producer is replacing the updated contents in its cache because the producer has moved on to working on a different data set (e.g., a different task) or a different function, or when the producer-consumer task manager (e.g., a management unit) enforces software coherency by sending Cache Maintenance Operations (CMOs). One advantage of the DCT mechanism is that, because the direct cache transfer takes place at the time the updated data is being evicted, the updated contents have already been placed in the consumer's own cache, or another cache within the cache hierarchy, by the time the consumer begins its task.
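A minimal sketch of the eviction-triggered transfer is shown below: when the producer evicts an updated line, the line is pushed into the consumer's cache so it is already resident when the consumer starts. The Cache class and method names are assumptions, not the patent's design.

```python
class Cache:
    def __init__(self):
        self.lines = {}                      # address -> data

    def evict_with_dct(self, addr, consumer):
        # Direct cache transfer performed at eviction time; a writeback to
        # memory could happen here as well if the line is dirty.
        data = self.lines.pop(addr)
        consumer.lines[addr] = data
        return data

producer, consumer = Cache(), Cache()
producer.lines[0x2000] = b"updated payload"
producer.evict_with_dct(0x2000, consumer)
assert 0x2000 in consumer.lines              # consumer finds the data already cached
```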
-
Publication Number: US20200042446A1
Publication Date: 2020-02-06
Application Number: US16053488
Application Date: 2018-08-02
Applicant: Xilinx, Inc.
Inventor: Millind Mittal , Jaideep Dastidar
IPC: G06F12/0815
Abstract: Circuits and methods for combined precise and imprecise snoop filtering. A memory and a plurality of processor circuits are coupled to interconnect circuitry. A plurality of cache circuits are coupled to the plurality of processor circuits, respectively. A first snoop filter is coupled to the interconnect circuitry and is configured to filter snoop requests by individual cache lines of a first subset of addresses of the memory. A second snoop filter is coupled to the interconnect circuitry and is configured to filter snoop requests by groups of cache lines of a second subset of addresses of the memory. Each group encompasses a plurality of cache lines.
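The combination of the two filters can be sketched as below: a precise filter tracks individual lines for one address range, while an imprecise filter tracks groups of lines for another, and a lookup in the appropriate filter decides whether a snoop is needed. The range boundary, line size, and group size are illustrative assumptions.

```python
LINE, GROUP = 64, 64 * 16        # 64-byte lines, 16-line groups
PRECISE_LIMIT = 0x1_0000         # addresses below this use the precise filter

precise = set()                  # exact line indices known to be cached
imprecise = set()                # group indices that may contain cached lines

def track(addr):
    if addr < PRECISE_LIMIT:
        precise.add(addr // LINE)
    else:
        imprecise.add(addr // GROUP)

def needs_snoop(addr):
    # Precise range: snoop only if that exact line is tracked.
    # Imprecise range: snoop if any line in the group might be cached.
    if addr < PRECISE_LIMIT:
        return addr // LINE in precise
    return addr // GROUP in imprecise

track(0x0040); track(0x2_0000)
print(needs_snoop(0x0040), needs_snoop(0x0080))   # True False (precise, per line)
print(needs_snoop(0x2_0040))                      # True (same group, imprecise)
```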