-
Publication No.: US10795819B1
Publication date: 2020-10-06
Application No.: US16453670
Filing date: 2019-06-26
Applicant: Intel Corporation
Inventor: Robert Pawlowski , Bharadwaj Krishnamurthy , Vincent Cave , Jason M. Howard , Ankit More , Joshua B. Fryman
IPC: G06F12/00 , G06F12/0817 , G06F12/0811 , G06F9/38 , G06F9/30 , G06F12/0891
Abstract: Disclosed embodiments relate to a system with configurable cache sub-domains and cross-die memory coherency. In one example, a system includes R racks, each rack housing N nodes, each node incorporating D dies, each die containing C cores and a die shadow tag, each core including P pipelines and a core shadow tag, and each pipeline being associated with a data cache and data cache tags, being either non-coherent or coherent, and belonging to one of X coherency domains. When a pipeline needs to read a cache line, it issues a read request to its associated data cache; if need be, it then issues a read request to its associated core-level cache; if need be, it then issues a read request to its associated die-level cache; and, if need be, it finally issues a no-cache remote read request to the target die mapped to hold the cache line.
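The read path in this abstract is a fall-through lookup. The Python sketch below models it under stated assumptions: `CacheLevel`, `read_line`, and the address-to-die mapping are hypothetical names, tags are reduced to dictionary keys, and shadow tags and coherency domains are not modeled. It only shows the order of the lookups and that the final remote read does not install the line in any local level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CacheLevel:
    """Toy cache level mapping line addresses to line data (tag arrays not modeled)."""
    name: str
    lines: Dict[int, bytes] = field(default_factory=dict)

    def lookup(self, addr: int) -> Optional[bytes]:
        return self.lines.get(addr)


def read_line(
    addr: int,
    levels: List[CacheLevel],
    home_die_of: Callable[[int], int],
    remote_read: Callable[[int, int], bytes],
) -> Tuple[str, bytes]:
    """Try the pipeline data cache, then the core-level cache, then the die-level
    cache; on a miss everywhere, issue a no-cache remote read to the home die."""
    for level in levels:  # ordered nearest first
        data = level.lookup(addr)
        if data is not None:
            return level.name, data
    target = home_die_of(addr)
    # "No-cache" read: the returned line is NOT installed in any local level.
    return f"remote:die{target}", remote_read(target, addr)


if __name__ == "__main__":
    levels = [
        CacheLevel("pipeline-dcache"),
        CacheLevel("core-cache", {0x40: b"core"}),
        CacheLevel("die-cache", {0x80: b"die"}),
    ]
    backing = {0xC0: b"remote"}
    # Hypothetical mapping: 64-byte lines interleaved across 4 dies.
    home = lambda a: (a // 64) % 4
    print(read_line(0x40, levels, home, lambda die, a: backing[a]))  # ('core-cache', b'core')
    print(read_line(0xC0, levels, home, lambda die, a: backing[a]))  # ('remote:die3', b'remote')
```

In this model, repeated misses on the same remote line go back to the home die every time; the abstract does not say how the actual design handles reuse of such lines.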
-
Publication No.: US10452717B2
Publication date: 2019-10-22
Application No.: US15272976
Filing date: 2016-09-22
Applicant: Intel Corporation
Inventor: Ahmet Can Sitik , Ankit More
IPC: G06F16/901 , G06N20/00 , G06N5/02
Abstract: Technologies for node-degree-based clustering include a computing device to construct a graph that includes multiple vertices corresponding to the data points of a data set. The computing device inserts an edge between each pair of vertices that has a corresponding similarity metric that meets a predetermined threshold similarity metric. The computing device determines a node degree for each vertex in the graph and initializes a cutoff node degree as the lowest node degree of the vertices. The computing device selects a test subset of the graph that includes vertices having a node degree less than or equal to the cutoff node degree. The computing device determines whether the test subset covers the graph and, if not, increases the cutoff node degree. If the test subset covers the graph, the data points corresponding to the vertices of the test subset are the representative cluster. Other embodiments are described and claimed.
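The clustering steps can be sketched in Python as follows. Two details are assumptions not stated in the abstract: "covers the graph" is read as every vertex being either in the test subset or adjacent to a vertex in it (a dominating-set test), and the cutoff is raised to the next distinct degree present in the graph. The function name `representative_cluster` is hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, TypeVar

T = TypeVar("T")


def representative_cluster(
    points: Sequence[T],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> List[T]:
    n = len(points)
    # 1. Build the graph: one vertex per point, an edge where similarity meets the threshold.
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if similarity(points[i], points[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    # 2. Node degree of every vertex.
    degree = {i: len(adj[i]) for i in range(n)}
    # 3. Start the cutoff at the lowest degree; raise it to the next distinct degree on failure.
    for cutoff in sorted(set(degree.values())):
        subset = {i for i in range(n) if degree[i] <= cutoff}
        # Assumed coverage test: each vertex is in the subset or adjacent to a subset vertex.
        covered = set(subset)
        for i in subset:
            covered |= adj[i]
        if len(covered) == n:
            return [points[i] for i in sorted(subset)]
    return []  # only reached when points is empty
```

For the points `[0, 1, 2, 10, 11]` with similarity `-abs(a - b)` and threshold `-1.5`, the lowest cutoff (degree 1) already covers the graph, and the sketch returns `[0, 2, 10, 11]`. The loop always terminates, since at the highest cutoff the subset contains every vertex.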
-
Publication No.: US09992135B2
Publication date: 2018-06-05
Application No.: US14967166
Filing date: 2015-12-11
Applicant: INTEL CORPORATION
Inventor: Surhud Khare , Dinesh Somasekhar , Ankit More , David S. Dunning , Nitin Y. Borkar , Shekhar Y. Borkar
IPC: H04L12/28 , H04L12/935 , H04L12/933 , H04L12/721
CPC classification number: H04L49/30 , H04L49/101 , H04L49/109 , H04L49/253
Abstract: Described is an apparatus which comprises: a Network-On-Chip fabric using crossbar switches, having distributed ingress and egress ports; and a dual-mode network interface coupled to at least one crossbar switch, the dual-mode network interface including: dual-mode circuitry; and a controller operable to: configure the dual-mode circuitry to transmit and receive differential signals via the egress and ingress ports, respectively, and configure the dual-mode circuitry to transmit and receive single-ended signals via the egress and ingress ports, respectively.
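A minimal Python model of the controller's role, assuming a fixed number of physical wires per port (a hypothetical parameter). Differential signaling carries each signal on a complementary pair of wires, while single-ended signaling uses one wire, so the same wires carry half as many differential signals. The abstract does not say what drives the choice of mode; `DualModeInterface` and `signals_per_port` are illustrative names only.

```python
from dataclasses import dataclass
from enum import Enum


class SignalingMode(Enum):
    DIFFERENTIAL = "differential"  # each logical signal uses a complementary wire pair
    SINGLE_ENDED = "single-ended"  # each logical signal uses one wire


WIRES_PER_SIGNAL = {SignalingMode.DIFFERENTIAL: 2, SignalingMode.SINGLE_ENDED: 1}


@dataclass
class DualModeInterface:
    wires_per_port: int  # hypothetical number of physical wires per egress or ingress port
    mode: SignalingMode = SignalingMode.DIFFERENTIAL

    def configure(self, mode: SignalingMode) -> None:
        """Controller action: one setting applies to both transmit (egress) and receive (ingress)."""
        self.mode = mode

    def signals_per_port(self) -> int:
        """Logical signals a port can carry in the current mode."""
        return self.wires_per_port // WIRES_PER_SIGNAL[self.mode]
```

With 64 wires per port, the model reports 32 signals in differential mode and 64 after `configure(SignalingMode.SINGLE_ENDED)`.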
-
Publication No.: US20180081986A1
Publication date: 2018-03-22
Application No.: US15272976
Filing date: 2016-09-22
Applicant: Intel Corporation
Inventor: Ahmet Can Sitik , Ankit More
CPC classification number: G06F16/9024 , G06N5/022 , G06N20/00
Abstract: Technologies for node-degree-based clustering include a computing device to construct a graph that includes multiple vertices corresponding to the data points of a data set. The computing device inserts an edge between each pair of vertices that has a corresponding similarity metric that meets a predetermined threshold similarity metric. The computing device determines a node degree for each vertex in the graph and initializes a cutoff node degree as the lowest node degree of the vertices. The computing device selects a test subset of the graph that includes vertices having a node degree less than or equal to the cutoff node degree. The computing device determines whether the test subset covers the graph and, if not, increases the cutoff node degree. If the test subset covers the graph, the data points corresponding to the vertices of the test subset are the representative cluster. Other embodiments are described and claimed.
-
Publication No.: US10983793B2
Publication date: 2021-04-20
Application No.: US16369846
Filing date: 2019-03-29
Applicant: INTEL CORPORATION
Inventor: Joshua Fryman , Ankit More , Jason Howard , Robert Pawlowski , Yigit Demir , Nick Pepperling , Fabrizio Petrini , Sriram Aananthakrishnan , Shaden Smith
Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and the performance of one or more operations on the retrieved data. Because the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are improved.
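The broadcast and reduction semantics can be illustrated with a toy memory model in Python. `ToyDMA`, `broadcast`, and `reduce_to` are hypothetical names, not the ISA instructions from the patent, and memory is modeled as a flat list of words. The point is that a single request names many destination (or source) addresses, and the DMA engine, not the processor, walks them.

```python
import functools
from typing import Callable, List, Sequence


class ToyDMA:
    """Hypothetical model: DMA control circuitry acting directly on word-addressed memory."""

    def __init__(self, size: int) -> None:
        self.mem: List[int] = [0] * size

    def _check(self, base: int, length: int) -> None:
        # Reject accesses that would fall outside the modeled memory.
        if base < 0 or base + length > len(self.mem):
            raise IndexError(f"access [{base}, {base + length}) out of range")

    def broadcast(self, src: Sequence[int], dests: Sequence[int]) -> None:
        """Write the same data (a single value if len(src) == 1, else an array)
        to every destination address."""
        for base in dests:
            self._check(base, len(src))
            self.mem[base:base + len(src)] = list(src)

    def reduce_to(self, srcs: Sequence[int], op: Callable[[int, int], int], dest: int) -> int:
        """Read one word from each source address, combine the words with op, store the result."""
        if not srcs:
            raise ValueError("reduction needs at least one source address")
        self._check(dest, 1)
        for a in srcs:
            self._check(a, 1)
        result = functools.reduce(op, (self.mem[a] for a in srcs))
        self.mem[dest] = result
        return result
```

For example, broadcasting `[7]` to addresses 0, 4, and 8 and then reducing those three addresses with addition stores 21 at the destination address.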
-
Publication No.: US09998401B2
Publication date: 2018-06-12
Application No.: US15042402
Filing date: 2016-02-12
Applicant: Intel Corporation
Inventor: Surhud Khare , Ankit More , Dinesh Somasekhar , David S. Dunning
IPC: H04L25/00 , H04L12/933 , H01L23/522 , H01L23/528
CPC classification number: H04L49/109 , H01L23/5221 , H01L23/528 , H01L2924/0002 , H01L2924/00
Abstract: In an embodiment, an apparatus includes: a plurality of islands configured on a semiconductor die, each of the plurality of islands having a plurality of cores; and a plurality of network switches configured on the semiconductor die and each associated with one of the plurality of islands, where each network switch includes a plurality of output ports: each port in a first set of the output ports couples to the network switch associated with one island via a point-to-point interconnect, and each port in a second set of the output ports couples to the network switches associated with a plurality of islands via a point-to-multipoint interconnect. Other embodiments are described and claimed.
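One way to picture the two kinds of output ports is the Python model below. The wiring rule in `build_die` (point-to-point ports to the two ring neighbours, plus one point-to-multipoint port per other group of islands) is a hypothetical choice for illustration; the abstract only says that a first set of ports is point-to-point and a second set is point-to-multipoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IslandSwitch:
    island: int
    p2p_ports: List[int] = field(default_factory=list)         # each port reaches one island's switch
    p2mp_ports: List[List[int]] = field(default_factory=list)  # each port reaches several islands' switches

    def one_hop(self) -> Set[int]:
        """Islands whose switches this switch can reach through a single output port."""
        reach = set(self.p2p_ports)
        for group in self.p2mp_ports:
            reach.update(group)
        return reach


def build_die(n_islands: int, group_size: int) -> Dict[int, IslandSwitch]:
    """Hypothetical wiring: point-to-point ports to the two ring neighbours, plus one
    point-to-multipoint port per group of islands the switch does not belong to."""
    switches = {i: IslandSwitch(i) for i in range(n_islands)}
    groups = [list(range(s, min(s + group_size, n_islands))) for s in range(0, n_islands, group_size)]
    for i, sw in switches.items():
        sw.p2p_ports = [(i - 1) % n_islands, (i + 1) % n_islands]
        sw.p2mp_ports = [g for g in groups if i not in g]
    return switches
```

With 8 islands in groups of 4, the switch of island 0 reaches islands 7 and 1 point-to-point and islands 4 to 7 over one multipoint port, so `build_die(8, 4)[0].one_hop()` is `{1, 4, 5, 6, 7}`.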
-
Publication No.: US20170171111A1
Publication date: 2017-06-15
Application No.: US14967166
Filing date: 2015-12-11
Applicant: INTEL CORPORATION
Inventor: Surhud Khare , Dinesh Somasekhar , Ankit More , David S. Dunning , Nitin Y. Borkar , Shekhar Y. Borkar
IPC: H04L12/935 , H04L12/721 , H04L12/933
CPC classification number: H04L49/30 , H04L49/101 , H04L49/109 , H04L49/253
Abstract: Described is an apparatus which comprises: a Network-On-Chip fabric using crossbar switches, having distributed ingress and egress ports; and a dual-mode network interface coupled to at least one crossbar switch, the dual-mode network interface including: dual-mode circuitry; and a controller operable to: configure the dual-mode circuitry to transmit and receive differential signals via the egress and ingress ports, respectively, and configure the dual-mode circuitry to transmit and receive single-ended signals via the egress and ingress ports, respectively.
-
Publication No.: US09287208B1
Publication date: 2016-03-15
Application No.: US14524622
Filing date: 2014-10-27
Applicant: Intel Corporation
Inventor: Surhud Khare , Ankit More , Dinesh Somasekhar , David S. Dunning
IPC: H01L25/00 , H01L23/522 , H01L23/528
CPC classification number: H04L49/109 , H01L23/5221 , H01L23/528 , H01L2924/0002 , H01L2924/00
Abstract: In an embodiment, an apparatus includes: a plurality of islands configured on a semiconductor die, each of the plurality of islands having a plurality of cores; and a plurality of network switches configured on the semiconductor die and each associated with one of the plurality of islands, where each network switch includes a plurality of output ports: each port in a first set of the output ports couples to the network switch associated with one island via a point-to-point interconnect, and each port in a second set of the output ports couples to the network switches associated with a plurality of islands via a point-to-multipoint interconnect. Other embodiments are described and claimed.
-
Publication No.: US12153932B2
Publication date: 2024-11-26
Application No.: US17129555
Filing date: 2020-12-21
Applicant: Intel Corporation
Inventor: Ankit More , Fabrizio Petrini , Robert Pawlowski , Shruti Sharma , Sowmya Pitchaimoorthy
IPC: G06F9/4401 , G06F13/40
Abstract: Examples include techniques for in-network acceleration of a parallel prefix-scan operation. Examples include configuring registers of a node included in a plurality of nodes on the same semiconductor package. The registers are configured in response to receiving an instruction that indicates a logical tree to map onto a network topology that includes the node. The instruction is associated with a prefix-scan operation to be executed by at least a portion of the plurality of nodes.
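The abstract does not describe the scan algorithm or the tree shape. As an illustration only, the Python function below runs the classic two-phase tree scan (an up-sweep that computes subtree totals, then a down-sweep that hands each subtree the combined value of everything to its left) over node ranks 0 to n-1, with each recursion level standing in for a level of the logical tree. Register configuration and the mapping onto a physical network topology are not modeled; `tree_prefix_scan` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_prefix_scan(values: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix scan of values[0..n-1] (one value per node rank) over a
    logical binary tree. op must be associative; it need not be commutative."""
    n = len(values)
    out: List[T] = list(values)
    totals: Dict[Tuple[int, int], T] = {}

    def up(lo: int, hi: int) -> T:
        # Up-sweep: each tree vertex records the combined value of ranks [lo, hi).
        if hi - lo == 1:
            total = values[lo]
        else:
            mid = (lo + hi) // 2
            total = op(up(lo, mid), up(mid, hi))
        totals[(lo, hi)] = total
        return total

    def down(lo: int, hi: int, left: Optional[T]) -> None:
        # Down-sweep: `left` is the combined value of all ranks < lo (None when lo == 0).
        if hi - lo == 1:
            out[lo] = values[lo] if left is None else op(left, values[lo])
            return
        mid = (lo + hi) // 2
        down(lo, mid, left)
        left_half = totals[(lo, mid)]
        down(mid, hi, left_half if left is None else op(left, left_half))

    if n:
        up(0, n)
        down(0, n, None)
    return out
```

`tree_prefix_scan([3, 1, 4, 1, 5], lambda a, b: a + b)` gives `[3, 4, 8, 9, 14]`. Because left operands always precede right operands, an associative but non-commutative operation such as string concatenation also works.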
-
Publication No.: US11630691B2
Publication date: 2023-04-18
Application No.: US17410818
Filing date: 2021-08-24
Applicant: Intel Corporation
Inventor: Robert Pawlowski , Ankit More , Jason M. Howard , Joshua B. Fryman , Tina C. Zhong , Shaden Smith , Sowmya Pitchaimoorthy , Samkit Jain , Vincent Cave , Sriram Aananthakrishnan , Bharadwaj Krishnamurthy
Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system comprises a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulating the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
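Two of the listed ideas are easy to show in software: a reduction in which several threads each produce a partial result of a commutative operation that is then accumulated, and striping memory requests across channels. The Python sketch below is a structural model only (CPython threads do not speed up CPU-bound work), and `threaded_reduce`, `channel_for`, and the 64-byte stripe size are assumptions, not details taken from the patent.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def threaded_reduce(data: Sequence[int], op: Callable[[int, int], int], n_threads: int) -> int:
    """Each worker thread reduces a strided slice of the input to a partial result;
    the partials are then accumulated. Correct only if op is commutative and
    associative, because the strided split changes the order of combination."""
    if not data:
        raise ValueError("cannot reduce an empty input")
    if n_threads < 1:
        raise ValueError("need at least one thread")
    chunks: List[Sequence[int]] = [data[t::n_threads] for t in range(n_threads)]
    chunks = [c for c in chunks if len(c) > 0]  # drop empty slices when data is short
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(lambda c: functools.reduce(op, c), chunks))
    return functools.reduce(op, partials)  # accumulation step


def channel_for(addr: int, n_channels: int, stripe_bytes: int = 64) -> int:
    """Hypothetical striping rule: consecutive stripe_bytes-sized blocks rotate across channels."""
    return (addr // stripe_bytes) % n_channels
```

For instance, summing 1 to 100 with four threads returns 5050, and with four channels the addresses 0, 64, 128, 192, and 256 map to channels 0, 1, 2, 3, and 0.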