NEURAL NETWORK DENSE LAYER SPARSIFICATION AND MATRIX COMPRESSION

    Publication No.: US20210110269A1

    Publication Date: 2021-04-15

    Application No.: US17129590

    Filing Date: 2020-12-21

    Abstract: Neural network dense layer sparsification and matrix compression is disclosed. An example of an apparatus includes one or more processors; a memory to store data for processing, including data for processing of a deep neural network (DNN) including one or more layers, each layer including a plurality of neurons, the one or more processors to perform one or both of sparsification of one or more layers of the DNN, including selecting a subset of the plurality of neurons of a first layer of the DNN for activation based at least in part on locality sensitive hashing of inputs to the first layer; or compression of a weight or activation matrix of one or more layers of the DNN, including detection of sparsity patterns in a matrix of the first layer of the DNN based at least in part on locality sensitive hashing of patterns in the matrix.
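A minimal sketch of the locality-sensitive-hashing idea described above, using SimHash (random hyperplanes) to pick which neurons of a dense layer to activate for a given input. All names, the bit width, and the Hamming threshold are illustrative assumptions, not details from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def simhash(v, planes):
    # Sign of the projection onto each random hyperplane -> binary fingerprint.
    return (planes @ v > 0).astype(np.uint8)

def select_active_neurons(x, W, planes, max_hamming=2):
    """Return indices of neurons whose weight rows hash near the input."""
    hx = simhash(x, planes)
    active = []
    for i, w in enumerate(W):
        if np.count_nonzero(simhash(w, planes) != hx) <= max_hamming:
            active.append(i)
    return active

d, n, bits = 16, 64, 8
planes = rng.normal(size=(bits, d))   # shared random hyperplanes
W = rng.normal(size=(n, d))           # dense layer weights (one row per neuron)
x = rng.normal(size=d)                # layer input

active = select_active_neurons(x, W, planes)
# Only the selected subset of neurons is evaluated for this input.
y = W[active] @ x
```

Neurons whose weight vectors point in a similar direction to the input (and hence produce large dot products) tend to collide under random-hyperplane hashing, which is why hash proximity is a cheap stand-in for computing every activation.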

    Technologies for management of lookup tables

    Publication No.: US10394784B2

    Publication Date: 2019-08-27

    Application No.: US15389218

    Filing Date: 2016-12-22

    Abstract: Technologies for managing lookup tables are described. The lookup tables may be used for a two-level lookup scheme for packet processing. When the tables need to be updated with a new key for packet processing, information about the new key may be added to a first-level lookup table and a second-level lookup table. The first-level lookup table may be used to identify a handling node for an obtained packet, and the handling node may perform a second-level table lookup to obtain information for further packet processing. The first lookup table may be replicated on all the nodes in a cluster, and the second-level lookup table may be unique to each node in the cluster. Other embodiments are described herein and claimed.
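The two-level scheme in the abstract can be sketched with plain dictionaries: a first-level table (replicated on every node) maps a key to its handling node, and each node owns a private second-level table with the actual packet-processing information. The key names and metadata fields below are illustrative:

```python
# Level 1: replicated on every node in the cluster (key -> handling node id).
first_level = {}
# Level 2: one private table per node (key -> flow-processing metadata).
second_level = {0: {}, 1: {}}

def add_key(key, node, info):
    first_level[key] = node          # update the replicated table
    second_level[node][key] = info   # update only the handling node's table

def lookup(key):
    node = first_level.get(key)      # level 1: identify the handling node
    if node is None:
        return None
    return second_level[node].get(key)  # level 2: node-local lookup

add_key("flow-a", 0, {"action": "forward", "port": 3})
result = lookup("flow-a")
```

Replicating only the small first-level table keeps every node able to steer a packet, while the bulkier per-flow state stays sharded across the cluster.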

    Efficient QoS support for software packet processing on general purpose servers

    Publication No.: US10237171B2

    Publication Date: 2019-03-19

    Application No.: US15270377

    Filing Date: 2016-09-20

    Abstract: Methods and apparatus for facilitating efficient Quality of Service (QoS) support for software-based packet processing by offloading QoS rate-limiting to NIC hardware. Software-based packet processing is performed on packet flows received at a compute platform, such as a general purpose server, and/or packet flows generated by local applications running on the compute platform. The packet processing includes packet classification that associates packets with packet flows using flow IDs, and identifying a QoS class for the packet and packet flow. NIC Tx queues are dynamically configured or pre-configured to effect rate limiting for forwarding packets enqueued in the NIC Tx queues. New packet flows are detected, and mapping data is created to map flow IDs associated with flows to the NIC Tx queues used to forward the packets associated with the flows.
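The classification-and-mapping step can be sketched as follows: new flows are detected on first sight, classified into a QoS class, and mapped to a pre-configured rate-limited NIC Tx queue. The classifier, queue rates, and header fields here are illustrative assumptions, not the patented design:

```python
# Hypothetical pre-configured NIC Tx queues (queue id -> rate limit in Mbps);
# the NIC hardware enforces the limit, software only picks the queue.
TX_QUEUE_RATE_MBPS = {0: 10_000, 1: 1_000, 2: 100}

flow_to_queue = {}  # mapping data: flow id -> NIC Tx queue id

def classify(packet):
    # Stand-in classifier: derive a flow id and QoS class from header fields.
    flow_id = (packet["src"], packet["dst"], packet["proto"])
    qos_class = packet.get("dscp", 2) % len(TX_QUEUE_RATE_MBPS)
    return flow_id, qos_class

def enqueue(packet):
    flow_id, qos_class = classify(packet)
    if flow_id not in flow_to_queue:          # new flow detected
        flow_to_queue[flow_id] = qos_class    # map flow to a rate-limited queue
    return flow_to_queue[flow_id]

q = enqueue({"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6, "dscp": 1})
```

The point of the offload is visible in the structure: software does classification and mapping once per flow, and per-packet rate enforcement is left to the hardware queue.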

    LOADING DATA USING SUB-THREAD INFORMATION IN A PROCESSOR

    Publication No.: US20170039144A1

    Publication Date: 2017-02-09

    Application No.: US14820802

    Filing Date: 2015-08-07

    Abstract: In one embodiment, a processor includes a core to execute instructions, a cache memory coupled to the core, and a cache controller coupled to the cache memory. The cache controller, responsive to a first load request having a first priority level, is to insert data of the first load request into a first entry of the cache memory and set an age indicator of a metadata field of the first entry to a first age level, the first age level greater than a default age level of a cache insertion policy for load requests, and responsive to a second load request having a second priority level to insert data of the second load request into a second entry of the cache memory and to set an age indicator of a metadata field of the second entry to the default age level, the first and second load requests of a first thread. Other embodiments are described and claimed.
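The priority-dependent insertion policy can be illustrated with a toy age-based cache: a high-priority load is inserted with an age above the policy default, so under age-based victim selection it outlives default-age entries. The age values and eviction rule are assumptions for illustration:

```python
DEFAULT_AGE = 1   # default age level of the cache insertion policy
BOOSTED_AGE = 3   # assumed higher age for high-priority loads

cache = {}  # address -> {"data": ..., "age": ...} (age plays the metadata role)

def insert_load(addr, data, priority):
    # Set the entry's age indicator according to the load's priority level.
    age = BOOSTED_AGE if priority == "high" else DEFAULT_AGE
    cache[addr] = {"data": data, "age": age}

def evict_one():
    # Age-based victim selection: evict the entry with the lowest age.
    victim = min(cache, key=lambda a: cache[a]["age"])
    del cache[victim]

insert_load(0x100, "A", "high")  # first load request, first priority level
insert_load(0x200, "B", "low")   # second load request, default age
evict_one()                      # the default-age entry is evicted first
```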


    MULTI-GRANULAR CLUSTERING-BASED SOLUTION FOR KEY-VALUE CACHE COMPRESSION

    Publication No.: US20250094712A1

    Publication Date: 2025-03-20

    Application No.: US18965267

    Filing Date: 2024-12-02

    Abstract: Key-value (KV) caching accelerates inference in large language models (LLMs) by allowing the attention operation to scale linearly rather than quadratically with the total sequence length. Because of the large context lengths in modern LLMs, the KV cache can exceed the model itself in size, which can negatively impact throughput. To address this issue, a multi-granular clustering-based solution for KV cache compression can be implemented. Key tensors and value tensors corresponding to unimportant tokens can be approximated using clusters created at different clustering levels with varying accuracy. Accuracy loss can be mitigated by using proxies produced at a finer-granularity clustering level for the subset of attention heads that are more significant. More significant attention heads have a higher impact on model accuracy than less significant attention heads. Latency is improved by retrieving proxies from a faster memory for the subset of attention heads that are less significant, where the impact on accuracy is lower.
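The multi-granular idea can be sketched by building centroids at two granularities and serving each attention head a proxy from the granularity matching its significance. The clustering here is a deliberately trivial round-robin stand-in (real systems would use k-means or similar), and the cluster counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def centroids(tensors, k):
    # Toy clustering: group rows round-robin and average each group
    # (a stand-in for a real clustering algorithm such as k-means).
    groups = [tensors[i::k] for i in range(k)]
    return [g.mean(axis=0) for g in groups], [i % k for i in range(len(tensors))]

keys = rng.normal(size=(32, 8))  # key vectors of "unimportant" tokens

coarse, coarse_assign = centroids(keys, k=2)  # coarse level: few clusters
fine, fine_assign = centroids(keys, k=8)      # fine level: more clusters

def proxy(i, significant_head):
    # More significant heads get finer-granularity (more accurate) proxies;
    # less significant heads get coarse proxies, e.g. from faster memory.
    cents, assign = (fine, fine_assign) if significant_head else (coarse, coarse_assign)
    return cents[assign[i]]
```

Storing only centroids instead of per-token tensors is what shrinks the KV cache; the per-head granularity choice trades that compression against accuracy and retrieval latency.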

    Technologies for flow rule aware exact match cache compression

    Publication No.: US11201940B2

    Publication Date: 2021-12-14

    Application No.: US15862311

    Filing Date: 2018-01-04

    Abstract: Technologies for flow rule aware exact match cache compression include multiple computing devices in communication over a network. A computing device reads a network packet from a network port and extracts one or more key fields from the packet to generate a lookup key. The key fields are identified by a key field specification of an exact match flow cache. The computing device may dynamically configure the key field specification based on an active flow rule set. The computing device may compress the key field specification to match a union of non-wildcard fields of the active flow rule set. The computing device may expand the key field specification in response to insertion of a new flow rule. The computing device looks up the lookup key in the exact match flow cache and, if a match is found, applies the corresponding action. Other embodiments are described and claimed.
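The compression step can be sketched directly: the key field specification is shrunk to the union of non-wildcard fields across the active rules, so lookup keys only carry fields some rule actually matches on. Field names and rule encoding below are illustrative:

```python
# Hypothetical active rule set: each rule lists only its non-wildcard fields.
active_rules = [
    {"ip_dst": "10.0.0.0/24", "tcp_dst": 80},  # wildcards all other fields
    {"ip_dst": "10.0.1.0/24"},
]

def compressed_key_spec(rules):
    # Union of non-wildcard match fields across the active flow rules.
    spec = set()
    for rule in rules:
        spec |= rule.keys()
    return spec

def lookup_key(packet, spec):
    # Extract only the fields named in the spec to build the cache key.
    return tuple(sorted((f, packet.get(f)) for f in spec))

spec = compressed_key_spec(active_rules)
key = lookup_key({"ip_dst": "10.0.0.5", "tcp_dst": 80, "ip_src": "1.2.3.4"}, spec)
```

Inserting a rule that matches on a new field (say `ip_src`) would grow the union, which corresponds to the abstract's expansion of the key field specification on rule insertion.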

    Flow classification apparatus, methods, and systems

    Publication No.: US11088951B2

    Publication Date: 2021-08-10

    Application No.: US15638102

    Filing Date: 2017-06-29

    Abstract: Apparatus, methods, and systems for tuple space search-based flow classification using cuckoo hash tables and unmasked packet headers are described herein. A device can communicate with one or more hardware switches. The device can include memory to store hash table entries of a hash table. The device can include processing circuitry to perform a hash lookup in the hash table. The lookup can be based on an unmasked key included in a packet header corresponding to a received data packet. The processing circuitry can retrieve an index pointing to a sub-table, the sub-table including a set of rules for handling the data packet. Other embodiments are also described.
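A cuckoo-style lookup gives each key two candidate buckets, so a miss costs at most two probes. The toy table below maps an unmasked header key to a sub-table index, as in the abstract; the hash functions, table size, and the omission of displacement on insert are simplifying assumptions:

```python
NBUCKETS = 8
table = [None] * NBUCKETS  # bucket -> (key, subtable_index) or None

def h1(key):
    return hash(key) % NBUCKETS

def h2(key):
    return (hash(key) // NBUCKETS) % NBUCKETS

def insert(key, subtable_index):
    # Place the entry in either candidate bucket if one is free.
    for b in (h1(key), h2(key)):
        if table[b] is None or table[b][0] == key:
            table[b] = (key, subtable_index)
            return True
    return False  # a full cuckoo table would relocate (kick out) entries here

def lookup(key):
    # Probe both candidate buckets; verify the stored key before returning.
    for b in (h1(key), h2(key)):
        if table[b] is not None and table[b][0] == key:
            return table[b][1]  # index into the sub-table holding the rules
    return None

insert("unmasked-header-key", 4)
idx = lookup("unmasked-header-key")
```

The bounded probe count is what makes cuckoo hashing attractive for the first stage of classification: worst-case lookup time is constant regardless of load.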

    Compute node cluster based routing method and apparatus

    Publication No.: US10938712B2

    Publication Date: 2021-03-02

    Application No.: US15433758

    Filing Date: 2017-02-15

    Abstract: Apparatus and methods to facilitate routing in a networked compute node cluster are disclosed herein. In some embodiments, a compute node for cluster compute may include one or more input ports to receive data packets from first selected ones of a cluster of compute nodes; one or more output ports to route data packets to second selected ones of the cluster of compute nodes; and one or more processors, wherein the one or more processors include logic to determine a particular output port, of the one or more output ports, to which a data packet received at the one or more input ports is to be routed, and wherein the logic is to exclude, from selection as the particular output port, any output port associated with a link indicated in fault status information as having a fault status.
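The fault-exclusion logic reduces to filtering the candidate output ports against the fault status information before choosing one. Port identifiers and the modulo-based spreading of flows across healthy ports are illustrative assumptions:

```python
output_ports = [0, 1, 2, 3]
fault_status = {2: True}  # the link behind port 2 has a fault status

def route(packet_flow_id):
    # Exclude ports whose links are marked faulty, then pick among the rest.
    healthy = [p for p in output_ports if not fault_status.get(p, False)]
    if not healthy:
        raise RuntimeError("no healthy output port available")
    # Spread flows deterministically across the remaining healthy ports.
    return healthy[packet_flow_id % len(healthy)]

ports = {route(f) for f in range(8)}  # ports actually used by 8 sample flows
```

Clearing a port's entry in `fault_status` when the link recovers returns it to the candidate set on the next routing decision, with no other state to rebuild.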
