MECHANISMS TO IMPROVE DATA LOCALITY FOR DISTRIBUTED GPUS

    Publication number: US20180115496A1

    Publication date: 2018-04-26

    Application number: US15331002

    Application date: 2016-10-21

    Abstract: Systems, apparatuses, and methods for implementing mechanisms to improve data locality for distributed processing units are disclosed. A system includes a plurality of distributed processing units (e.g., GPUs) and memory devices. Each processing unit is coupled to one or more local memory devices. The system determines how to partition a workload into a plurality of workgroups based on maximizing data locality and data sharing. The system determines which subset of the plurality of workgroups to dispatch to each processing unit of the plurality of processing units based on maximizing local memory accesses and minimizing remote memory accesses. The system also determines how to partition data buffer(s) based on data sharing patterns of the workgroups. The system maps to each processing unit a separate portion of the data buffer(s) so as to maximize local memory accesses and minimize remote memory accesses.
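
The dispatch decision described in the abstract can be sketched as follows. This is a minimal illustrative model, not the patented implementation: it assumes each workgroup's accesses per buffer partition are known up front, and that partition p is local to processing unit p.

```python
# A minimal sketch of locality-aware workgroup dispatch. The access-count
# model and the one-partition-per-GPU mapping are illustrative assumptions.

def dispatch_workgroups(workgroups, num_gpus):
    """Assign each workgroup to the GPU whose local memory partition it
    accesses most, maximizing local and minimizing remote accesses."""
    assignment = {}
    for wg_id, access_counts in workgroups.items():
        # access_counts[p] counts accesses to buffer partition p, which is
        # assumed local to processing unit p in this simplified model.
        assignment[wg_id] = max(range(num_gpus), key=lambda p: access_counts[p])
    return assignment

workgroups = {
    0: [90, 10],  # mostly touches partition 0 -> dispatch to GPU 0
    1: [5, 95],   # mostly touches partition 1 -> dispatch to GPU 1
    2: [60, 40],  # leans toward partition 0 -> GPU 0
}
print(dispatch_workgroups(workgroups, 2))  # {0: 0, 1: 1, 2: 0}
```

A real scheduler would derive the access counts from the kernel's index expressions rather than receive them explicitly.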

    Source-Side Resource Request Network Admission Control
    15.
    Invention application
    Source-Side Resource Request Network Admission Control (Pending - Published)

    Publication number: US20160359973A1

    Publication date: 2016-12-08

    Application number: US14730837

    Application date: 2015-06-04

    CPC classification number: H04L67/1097 G06F9/50

    Abstract: A technique for source-side memory request network admission control includes adjusting, by a first node, the rate at which the first node injects memory requests into a network coupled to a memory system. The adjustment is made according to an injection policy for the first node and memory request efficiency indicators. The injection policy may be based on an injection rate limit for the first node, or on an injection rate limit per memory channel for the first node. The technique may include determining the memory request efficiency indicators based on comparisons of target addresses of the memory requests to addresses of recent memory requests of the first node.
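
A hedged sketch of the throttling loop: the row-granularity efficiency indicator and the additive-increase/halving policy below are invented details for illustration; the abstract only requires some efficiency indicator derived from recent target addresses and a per-node injection rate limit.

```python
from collections import deque

class InjectionThrottle:
    """Toy source-side admission control for one node (assumed details)."""

    def __init__(self, rate_limit, history=8, row_bits=12, threshold=0.5):
        self.rate = 1                 # current requests-per-interval budget
        self.rate_limit = rate_limit  # injection policy: per-node cap
        self.recent_rows = deque(maxlen=history)
        self.row_bits = row_bits
        self.threshold = threshold
        self.hits = 0
        self.total = 0

    def observe(self, target_addr):
        # Efficiency indicator: does this request target a memory row touched
        # by a recent request from this node (a cheap, friendly access)?
        row = target_addr >> self.row_bits
        self.hits += row in self.recent_rows
        self.total += 1
        self.recent_rows.append(row)

    def adjust(self):
        efficiency = self.hits / self.total if self.total else 1.0
        if efficiency >= self.threshold:
            self.rate = min(self.rate + 1, self.rate_limit)  # ramp up
        else:
            self.rate = max(self.rate // 2, 1)               # back off
        self.hits = self.total = 0
        return self.rate

throttle = InjectionThrottle(rate_limit=4)
for addr in (0x1000, 0x1040, 0x1080, 0x9000):  # three share row 0x1
    throttle.observe(addr)
print(throttle.adjust())  # efficiency 0.5 -> rate rises from 1 to 2
```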

    Memory hierarchy using row-based compression
    16.
    Invention grant
    Memory hierarchy using row-based compression (Granted)

    Publication number: US09477605B2

    Publication date: 2016-10-25

    Application number: US13939377

    Application date: 2013-07-11

    Abstract: A system includes a first memory and a device coupleable to the first memory. The device includes a second memory to cache data from the first memory. The second memory includes a plurality of rows, each row including a corresponding set of compressed data blocks of non-uniform sizes and a corresponding set of tag blocks. Each tag block represents a corresponding compressed data block of the row. The device further includes decompression logic to decompress data blocks accessed from the second memory. The device further includes compression logic to compress data blocks to be stored in the second memory.
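
The row layout described above can be modeled in a few lines. This is a simplified software analogue: zlib stands in for whatever compression scheme the device implements, and the 4 KiB row capacity is an assumption.

```python
import zlib

class CompressedRow:
    """One cache row holding variably-sized compressed blocks plus tags."""

    def __init__(self, capacity=4096):
        self.capacity = capacity
        self.payload = b""
        self.tags = {}  # block address -> (offset, compressed length)

    def fill(self, block_addr, raw_block):
        """Compression logic: compress a block and append it if the row has room."""
        comp = zlib.compress(raw_block)
        if len(self.payload) + len(comp) > self.capacity:
            return False  # row full; a real controller would evict or spill
        self.tags[block_addr] = (len(self.payload), len(comp))
        self.payload += comp
        return True

    def read(self, block_addr):
        """Decompression logic: locate a block via its tag, then decompress."""
        offset, length = self.tags[block_addr]
        return zlib.decompress(self.payload[offset:offset + length])

row = CompressedRow()
row.fill(0x100, b"A" * 512)        # highly compressible block
row.fill(0x140, bytes(range(64)))  # less compressible block
print(row.read(0x100) == b"A" * 512)  # True
```

The tag blocks are what make the non-uniform block sizes workable: each tag records where its block starts and how long it is, so reads never need to scan the row.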

    TRAFFIC RATE CONTROL FOR INTER-CLASS DATA MIGRATION IN A MULTICLASS MEMORY SYSTEM
    18.
    Invention application
    TRAFFIC RATE CONTROL FOR INTER-CLASS DATA MIGRATION IN A MULTICLASS MEMORY SYSTEM (Granted)

    Publication number: US20160170919A1

    Publication date: 2016-06-16

    Application number: US14569825

    Application date: 2014-12-15

    Abstract: A system includes a plurality of memory classes and a set of one or more processing units coupled to the plurality of memory classes. The system further includes a data migration controller to select a traffic rate as a maximum traffic rate for transferring data between the plurality of memory classes based on a net benefit metric associated with the traffic rate, and to enforce the maximum traffic rate for transferring data between the plurality of memory classes.
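
The net-benefit selection can be sketched as a simple argmax over candidate rates. The benefit and cost curves below are invented for illustration: benefit shows diminishing returns as more hot data already sits in fast memory, while migration cost (bandwidth, energy) grows linearly with the rate.

```python
def select_max_traffic_rate(candidate_rates, benefit, cost):
    """Pick the migration rate with the highest net benefit; the data
    migration controller then enforces it as a cap on inter-class transfers."""
    return max(candidate_rates, key=lambda r: benefit(r) - cost(r))

benefit = lambda r: 100.0 * (1 - 0.5 ** r)  # diminishing performance gain
cost = lambda r: 10.0 * r                   # linear migration overhead
print(select_max_traffic_rate(range(9), benefit, cost))  # 3
```

With these example curves the net benefit peaks at rate 3 (57.5) and falls thereafter, so the controller would cap inter-class migration traffic at that rate.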

    SYSTEM AND METHOD FOR REPURPOSING DEAD CACHE BLOCKS
    19.
    Invention application
    SYSTEM AND METHOD FOR REPURPOSING DEAD CACHE BLOCKS (Granted)

    Publication number: US20160085677A1

    Publication date: 2016-03-24

    Application number: US14491296

    Application date: 2014-09-19

    CPC classification number: G06F12/0815 G06F12/0864 G06F12/0891 Y02D10/13

    Abstract: A processing system having a multilevel cache hierarchy employs techniques for repurposing dead cache blocks so as to use otherwise wasted space in a cache hierarchy employing a write-back scheme. For a cache line containing invalid data with a valid tag, the valid tag is maintained for cache coherence purposes or otherwise, resulting in a valid tag for a dead cache block. A cache controller repurposes the dead cache block by storing any of a variety of new data at the dead cache block, while storing the new tag in a tag entry of a dead block tag way with an identifier indicating the location of the new data.
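
A toy model of one cache set illustrates the repurposing step. The "dead block tag way" side structure and the way-index identifier are modeled here as a plain dict; the class and field names are illustrative, not from the patent.

```python
class CacheLine:
    def __init__(self, tag):
        self.tag = tag            # kept valid, e.g. for coherence tracking
        self.data_valid = False   # data invalidated -> the block is "dead"
        self.data = None

class CacheSet:
    def __init__(self, tags):
        self.ways = [CacheLine(t) for t in tags]
        self.dead_block_tags = {}  # new tag -> way index of repurposed data

    def repurpose_dead_block(self, new_tag, new_data):
        """Store new data in a dead block, recording the new tag plus an
        identifier (here, the way index) locating that data."""
        for way_idx, line in enumerate(self.ways):
            if not line.data_valid:  # valid tag but invalid data: dead block
                line.data = new_data
                line.data_valid = True
                self.dead_block_tags[new_tag] = way_idx
                return way_idx
        return None  # no dead block available in this set

    def lookup_repurposed(self, tag):
        way_idx = self.dead_block_tags.get(tag)
        return None if way_idx is None else self.ways[way_idx].data

cset = CacheSet(tags=[0xA, 0xB])
cset.repurpose_dead_block(0xC, b"victim data")
print(cset.lookup_repurposed(0xC))  # b'victim data'
```

Note the original tags (0xA, 0xB) stay in place throughout, which is the point: coherence state survives while the otherwise-wasted data storage is reused.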

    Predicting outcomes for memory requests in a cache memory
    20.
    Invention grant
    Predicting outcomes for memory requests in a cache memory (Granted)

    Publication number: US09235514B2

    Publication date: 2016-01-12

    Application number: US13736254

    Application date: 2013-01-08

    CPC classification number: G06F12/0802 G06F12/0804 G06F12/0862 G06F12/0888

    Abstract: The described embodiments include a cache controller with a prediction mechanism in a cache. The prediction mechanism is configured to perform a lookup in each table of a hierarchy of lookup tables in parallel to determine whether a memory request is predicted to be a hit in the cache. Each table in the hierarchy holds predictions of whether memory requests to corresponding regions of a main memory will hit the cache, with tables lower in the hierarchy covering smaller regions of the main memory.
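
The hierarchy can be sketched with dictionaries standing in for the lookup tables. The three region granularities (4 MiB, 64 KiB, 4 KiB) are example choices; the abstract only requires that tables lower in the hierarchy cover smaller regions, and that the finer tables refine the coarser ones.

```python
REGION_SHIFTS = [22, 16, 12]  # 4 MiB, 64 KiB, 4 KiB regions, coarse to fine

class HitPredictor:
    def __init__(self):
        self.tables = [{} for _ in REGION_SHIFTS]

    def predict(self, addr):
        """Look up every table (in hardware, in parallel); the prediction
        from the finest-grained table holding an entry wins."""
        prediction = None
        for table, shift in zip(self.tables, REGION_SHIFTS):
            entry = table.get(addr >> shift)
            if entry is not None:
                prediction = entry
        return prediction

    def train(self, addr, hit):
        # Simplistic training: record the outcome at every granularity.
        for table, shift in zip(self.tables, REGION_SHIFTS):
            table[addr >> shift] = hit

pred = HitPredictor()
pred.train(0x40_0000, True)               # 4 MiB region predicted to hit...
pred.tables[2][0x40_1000 >> 12] = False   # ...but one 4 KiB page overridden to miss
print(pred.predict(0x40_0000), pred.predict(0x40_1000))  # True False
```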
