-
Publication No.: US20210110269A1
Publication Date: 2021-04-15
Application No.: US17129590
Filing Date: 2020-12-21
Applicant: Intel Corporation
Inventor: Sameh Gobriel, Jesmin Jahari Tithi, Tsung-Yuan Tai
Abstract: Neural network dense layer sparsification and matrix compression are disclosed. An example of an apparatus includes one or more processors; a memory to store data for processing, including data for processing of a deep neural network (DNN) including one or more layers, each layer including a plurality of neurons, the one or more processors to perform one or both of sparsification of one or more layers of the DNN, including selecting a subset of the plurality of neurons of a first layer of the DNN for activation based at least in part on locality sensitive hashing of inputs to the first layer; or compression of a weight or activation matrix of one or more layers of the DNN, including detection of sparsity patterns in a matrix of the first layer of the DNN based at least in part on locality sensitive hashing of patterns in the matrix.
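Illustration (not part of the patent text): a minimal Python sketch of selecting neurons for activation by locality sensitive hashing of the layer input. The abstract does not name a hash family, so the sign-random-projection hash, the sizes `DIM`, `NEURONS`, `BITS`, and every function name here are assumptions made for the example.

```python
import random

def lsh_code(vec, planes):
    """Sign-random-projection hash: one bit per hyperplane, packed into an int."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) > 0:
            code |= 1 << i
    return code

def build_buckets(weights, planes):
    """Hash each neuron's weight row into a bucket keyed by its LSH code."""
    buckets = {}
    for neuron, row in enumerate(weights):
        buckets.setdefault(lsh_code(row, planes), []).append(neuron)
    return buckets

def sparse_forward(x, weights, biases, planes, buckets):
    """Compute ReLU outputs only for neurons whose bucket matches the input's hash."""
    active = buckets.get(lsh_code(x, planes), [])
    out = [0.0] * len(weights)
    for n in active:
        pre = sum(w * v for w, v in zip(weights[n], x)) + biases[n]
        out[n] = max(pre, 0.0)
    return out, active

# Hypothetical layer: 64 neurons, 16 inputs, 4 hash bits (all sizes assumed).
rng = random.Random(0)
DIM, NEURONS, BITS = 16, 64, 4
weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NEURONS)]
biases = [0.0] * NEURONS
planes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]
buckets = build_buckets(weights, planes)

# An input identical to neuron 3's weight row lands in neuron 3's bucket.
x = list(weights[3])
y, active = sparse_forward(x, weights, biases, planes, buckets)
```

Only the neurons in the matching bucket are evaluated; all others are left at zero, which is where the saving over a dense layer would come from in this sketch.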
-
Publication No.: US10394784B2
Publication Date: 2019-08-27
Application No.: US15389218
Filing Date: 2016-12-22
Applicant: INTEL CORPORATION
Inventor: Byron Marohn, Christian Maciocco, Sameh Gobriel, Ren Wang, Wei Shen, Tsung-Yuan Charlie Tai, Saikrishna Edupuganti
IPC: G06F16/22, G06F3/06, G06F16/00, H04L12/701
Abstract: Technologies for managing lookup tables are described. The lookup tables may be used for a two-level lookup scheme for packet processing. When the tables need to be updated with a new key for packet processing, information about the new key may be added to a first-level lookup table and a second-level lookup table. The first-level lookup table may be used to identify a handling node for an obtained packet, and the handling node may perform a second-level table lookup to obtain information for further packet processing. The first lookup table may be replicated on all the nodes in a cluster, and the second-level lookup table may be unique to each node in the cluster. Other embodiments are described herein and claimed.
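Illustration (not part of the patent text): a minimal Python sketch of the two-level lookup described in the abstract. The class name `Cluster`, the CRC32-based placement of keys on nodes, and the use of a single dictionary to stand in for the replicated first-level table are assumptions made for the example.

```python
import zlib

class Cluster:
    """Toy model: one shared dict stands in for the replicated first-level table."""

    def __init__(self, num_nodes):
        self.first_level = {}                                   # key -> handling node id
        self.second_level = [dict() for _ in range(num_nodes)]  # one private table per node

    def add_key(self, key, info):
        """Register a new key in both levels; the placement policy is assumed."""
        node = zlib.crc32(key.encode()) % len(self.second_level)
        self.first_level[key] = node
        self.second_level[node][key] = info
        return node

    def lookup(self, key):
        """First level picks the handling node; that node's table holds the details."""
        node = self.first_level.get(key)
        if node is None:
            return None
        return node, self.second_level[node].get(key)

cluster = Cluster(num_nodes=4)
owner = cluster.add_key("flow-42", {"next_hop": "10.0.0.7"})
result = cluster.lookup("flow-42")
```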
-
Publication No.: US10237171B2
Publication Date: 2019-03-19
Application No.: US15270377
Filing Date: 2016-09-20
Applicant: INTEL CORPORATION
Inventor: Sameh Gobriel, Ren Wang, Eric K. Mann, Christian Maciocco, Tsung-Yuan C. Tai
IPC: H04L1/00, H04L12/721, H04L12/851, H04L12/863, H04L12/751, H04L12/715
Abstract: Methods and apparatus for facilitating efficient Quality of Service (QoS) support for software-based packet processing by offloading QoS rate-limiting to NIC hardware. Software-based packet processing is performed on packet flows received at a compute platform, such as a general purpose server, and/or packet flows generated by local applications running on the compute platform. The packet processing includes packet classification that associates packets with packet flows using flow IDs, and identifying a QoS class for the packet and packet flow. NIC Tx queues are dynamically configured or pre-configured to effect rate limiting for forwarding packets enqueued in the NIC Tx queues. New packet flows are detected, and mapping data is created to map flow IDs associated with flows to the NIC Tx queues used to forward the packets associated with the flows.
-
Publication No.: US09866479B2
Publication Date: 2018-01-09
Application No.: US14750921
Filing Date: 2015-06-25
Applicant: Intel Corporation
Inventor: Ren Wang, Dong Zhou, Bruce Richardson, George W. Kennedy, Christian Maciocco, Sameh Gobriel, Tsung-Yuan C. Tai
IPC: H04L12/743, H04L12/851, H04L12/819
CPC classification number: H04L45/7453, H04L47/21, H04L47/2483
Abstract: Technologies for supporting concurrency of a flow lookup table at a network device. The flow lookup table includes a plurality of candidate buckets that each includes one or more entries. The network device includes a flow lookup table write module configured to perform a displacement operation of a key/value pair to move the key/value pair from one bucket to another bucket via an atomic instruction and increment a version counter associated with the buckets affected by the displacement operation. The network device additionally includes a flow lookup table read module to check the version counters during a lookup operation on the flow lookup table to determine whether a displacement operation is affecting the presently read value of the buckets. Other embodiments are described herein and claimed.
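Illustration (not part of the patent text): a single-threaded Python model of the version-counter check described in the abstract. It shows only the protocol (the writer bumps the counters of the buckets it touches, the reader retries if the counters changed during its read). A real implementation would rely on atomic instructions and memory ordering, which this sketch does not model. The class name `FlowTable` and the `interfere` hook, which stands in for a concurrent writer, are assumptions made for the example.

```python
class FlowTable:
    def __init__(self, num_buckets):
        self.buckets = [dict() for _ in range(num_buckets)]
        self.versions = [0] * num_buckets   # one version counter per bucket

    def insert(self, bucket, key, value):
        self.buckets[bucket][key] = value

    def displace(self, key, src, dst):
        """Writer: move a key/value pair and bump the counters of both buckets."""
        value = self.buckets[src][key]
        self.buckets[dst][key] = value      # copy first so the key is never absent
        del self.buckets[src][key]
        self.versions[src] += 1
        self.versions[dst] += 1

    def lookup(self, key, candidates, interfere=None, max_retries=8):
        """Reader: snapshot counters, read, then retry if any counter moved."""
        for attempt in range(max_retries):
            before = [self.versions[b] for b in candidates]
            if interfere is not None and attempt == 0:
                interfere()                 # hook simulating a concurrent displacement
            value = None
            for b in candidates:
                if key in self.buckets[b]:
                    value = self.buckets[b][key]
                    break
            after = [self.versions[b] for b in candidates]
            if before == after:
                return value, attempt       # attempt = number of retries needed
        raise RuntimeError("lookup kept racing with displacements")

table = FlowTable(num_buckets=2)
table.insert(0, "flow-a", "port-7")
quiet = table.lookup("flow-a", [0, 1])
raced = table.lookup("flow-a", [0, 1],
                     interfere=lambda: table.displace("flow-a", 0, 1))
```

In the second lookup the simulated displacement changes both counters between the two snapshots, so the reader discards its first read and succeeds on the retry.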
-
Publication No.: US20170039144A1
Publication Date: 2017-02-09
Application No.: US14820802
Filing Date: 2015-08-07
Applicant: Intel Corporation
Inventor: Ren Wang, Kevin B. Theobald, Sameh Gobriel, Tsung-Yuan C. Tai
CPC classification number: G06F12/123, G06F12/0875, G06F12/126, G06F2212/1021, G06F2212/30, G06F2212/452, G06F2212/602
Abstract: In one embodiment, a processor includes a core to execute instructions, a cache memory coupled to the core, and a cache controller coupled to the cache memory. The cache controller, responsive to a first load request having a first priority level, is to insert data of the first load request into a first entry of the cache memory and set an age indicator of a metadata field of the first entry to a first age level, the first age level greater than a default age level of a cache insertion policy for load requests, and responsive to a second load request having a second priority level to insert data of the second load request into a second entry of the cache memory and to set an age indicator of a metadata field of the second entry to the default age level, the first and second load requests of a first thread. Other embodiments are described and claimed.
-
Publication No.: US20250094712A1
Publication Date: 2025-03-20
Application No.: US18965267
Filing Date: 2024-12-02
Applicant: Intel Corporation
Inventor: Gopi Krishna Jha, Sameh Gobriel, Nilesh Jain
IPC: G06F40/284, G06F16/28
Abstract: Key-value (KV) caching accelerates inference in large language models (LLMs) by allowing the attention operation to scale linearly rather than quadratically with the total sequence length. Due to large context lengths in modern LLMs, KV cache size can exceed the model size, which can negatively impact throughput. To address this issue, a multi-granular clustering-based solution for KV cache compression can be implemented. Key tensors and value tensors corresponding to unimportant tokens can be approximated using clusters created at different clustering levels with varying accuracy. Accuracy loss can be mitigated by using proxies produced at a finer-granularity clustering level for a subset of attention heads that are more significant. More significant attention heads can have a higher impact on model accuracy than less significant attention heads. Latency is improved by retrieving proxies from a faster memory for a subset of attention heads that are less significant, when impact on accuracy is lower.
-
Publication No.: US11811660B2
Publication Date: 2023-11-07
Application No.: US17396553
Filing Date: 2021-08-06
Applicant: Intel Corporation
Inventor: Ren Wang, Tsung-Yuan C. Tai, Yipeng Wang, Sameh Gobriel
IPC: H04L45/7453, H04L47/2441, H04L45/745, H04L61/10, H04L61/5046, H04L45/02
CPC classification number: H04L45/7453, H04L45/745, H04L47/2441, H04L61/10, H04L45/02, H04L61/5046
Abstract: Apparatus, methods, and systems for tuple space search-based flow classification using cuckoo hash tables and unmasked packet headers are described herein. A device can communicate with one or more hardware switches. The device can include memory to store hash table entries of a hash table. The device can include processing circuitry to perform a hash lookup in the hash table. The lookup can be based on an unmasked key included in a packet header corresponding to a received data packet. The processing circuitry can retrieve an index pointing to a sub-table, the sub-table including a set of rules for handling the data packet. Other embodiments are also described.
-
Publication No.: US11201940B2
Publication Date: 2021-12-14
Application No.: US15862311
Filing Date: 2018-01-04
Applicant: Intel Corporation
Inventor: Yipeng Wang, Ren Wang, Antonio Fischetti, Sameh Gobriel, Tsung-Yuan C. Tai
Abstract: Technologies for flow rule aware exact match cache compression include multiple computing devices in communication over a network. A computing device reads a network packet from a network port and extracts one or more key fields from the packet to generate a lookup key. The key fields are identified by a key field specification of an exact match flow cache. The computing device may dynamically configure the key field specification based on an active flow rule set. The computing device may compress the key field specification to match a union of non-wildcard fields of the active flow rule set. The computing device may expand the key field specification in response to insertion of a new flow rule. The computing device looks up the lookup key in the exact match flow cache and, if a match is found, applies the corresponding action. Other embodiments are described and claimed.
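Illustration (not part of the patent text): a minimal Python sketch of compressing an exact match cache key to the union of non-wildcard fields of the active rule set, and expanding it when a new rule is inserted. The field names in `FIELDS`, the rule layout, the first-match slow path, and the decision to flush the cache on expansion are assumptions made for the example.

```python
FIELDS = ("src_ip", "dst_ip", "proto", "src_port", "dst_port")  # hypothetical header fields

def key_spec(rules):
    """Union of fields that at least one rule matches on (absent or None = wildcard)."""
    return tuple(f for f in FIELDS
                 if any(r["match"].get(f) is not None for r in rules))

class ExactMatchCache:
    def __init__(self, rules):
        self.rules = list(rules)
        self.spec = key_spec(self.rules)    # compressed key field specification
        self.entries = {}                   # lookup key -> cached action

    def add_rule(self, rule):
        """Expand the specification if the new rule matches on an extra field."""
        self.rules.append(rule)
        new_spec = key_spec(self.rules)
        if new_spec != self.spec:
            self.spec = new_spec
            self.entries.clear()            # assumed: old keys are dropped on expansion

    def classify(self, packet):
        """Return (action, cache_hit); on a miss, fall back to a linear rule scan."""
        key = tuple(packet[f] for f in self.spec)
        if key in self.entries:
            return self.entries[key], True
        action = None
        for rule in self.rules:
            if all(packet[f] == v for f, v in rule["match"].items() if v is not None):
                action = rule["action"]
                break
        self.entries[key] = action
        return action, False

cache = ExactMatchCache([{"match": {"dst_ip": "10.0.0.1"}, "action": "forward"}])
spec_before = cache.spec
pkt_a = {"src_ip": "1.1.1.1", "dst_ip": "10.0.0.1", "proto": 6,
         "src_port": 1000, "dst_port": 80}
pkt_b = dict(pkt_a, src_port=2000)          # differs only in a wildcarded field
first = cache.classify(pkt_a)
second = cache.classify(pkt_b)
cache.add_rule({"match": {"dst_ip": "10.0.0.1", "dst_port": 22}, "action": "drop"})
```

Because the key covers only `dst_ip` before the second rule is added, the two packets share one cache entry even though their source ports differ.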
-
Publication No.: US11088951B2
Publication Date: 2021-08-10
Application No.: US15638102
Filing Date: 2017-06-29
Applicant: Intel Corporation
Inventor: Ren Wang, Tsung-Yuan C. Tai, Yipeng Wang, Sameh Gobriel
IPC: H04L12/743, H04L12/851, H04L12/741, H04L29/12, H04L12/751
Abstract: Apparatus, methods, and systems for tuple space search-based flow classification using cuckoo hash tables and unmasked packet headers are described herein. A device can communicate with one or more hardware switches. The device can include memory to store hash table entries of a hash table. The device can include processing circuitry to perform a hash lookup in the hash table. The lookup can be based on an unmasked key included in a packet header corresponding to a received data packet. The processing circuitry can retrieve an index pointing to a sub-table, the sub-table including a set of rules for handling the data packet. Other embodiments are also described.
-
Publication No.: US10938712B2
Publication Date: 2021-03-02
Application No.: US15433758
Filing Date: 2017-02-15
Applicant: Intel Corporation
Inventor: Ken Schumm, Sameh Gobriel, Asif H. Haswarey, Tsung-Yuan Charlie Tai
IPC: H04L12/703, H04L12/715, H04L12/741
Abstract: Apparatus and method to facilitate networked compute node cluster routing are disclosed herein. In some embodiments, a compute node for cluster compute may include one or more input ports to receive data packets from first selected ones of a cluster of compute nodes; one or more output ports to route data packets to second selected ones of the cluster of compute nodes; and one or more processors, wherein the one or more processors includes logic to determine a particular output port, of the one or more output ports, to which a data packet received at the one or more input ports is to be routed, and wherein the logic is to exclude output ports associated with links indicated in fault status information as having a fault status from being the particular output port to which the data packet is to be routed.
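Illustration (not part of the patent text): a minimal Python sketch of choosing an output port while excluding ports whose links are marked faulty. The abstract only states that faulty ports are excluded; picking among the remaining ports by a flow hash modulo, and the function and parameter names, are assumptions made for the example.

```python
def select_output_port(candidate_ports, fault_status, flow_hash):
    """Pick an output port, skipping any port whose link is flagged as faulty.

    fault_status maps port -> True when the attached link has a fault;
    ports missing from the map are treated as healthy (an assumption).
    """
    healthy = [p for p in candidate_ports if not fault_status.get(p, False)]
    if not healthy:
        return None                              # no usable port for this packet
    return healthy[flow_hash % len(healthy)]     # assumed spreading policy

ports = [1, 2, 3, 4]
faults = {2: True, 3: False}
chosen = [select_output_port(ports, faults, h) for h in range(6)]
all_down = select_output_port(ports, {p: True for p in ports}, 0)
```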
-