Tracking Non-Native Content in Caches
    41.
    Invention application; status: pending (published)

    Publication No.: US20140156941A1

    Publication date: 2014-06-05

    Application No.: US13691375

    Filing date: 2012-11-30

    Abstract: The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode.

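    The mechanism summarized in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the per-bank counters, the "home is a different bank" test, the eviction placeholder, and the threshold-based power-down policy are illustrative choices, not details taken from the patent.

        # Sketch: per-bank counts of non-native blocks drive the choice of
        # banks to transition to a lower power mode.

        from dataclasses import dataclass

        @dataclass
        class CacheBlock:
            tag: int
            home_bank: int  # bank where this block is considered "native"

        class CacheController:
            def __init__(self, num_banks, blocks_per_bank):
                self.banks = [[] for _ in range(num_banks)]
                self.blocks_per_bank = blocks_per_bank

            def insert(self, bank, block):
                # Evict the oldest block if the bank is full (placeholder policy).
                if len(self.banks[bank]) >= self.blocks_per_bank:
                    self.banks[bank].pop(0)
                self.banks[bank].append(block)

            def non_native_count(self, bank):
                # Assumed test: a block is non-native when its home is another bank.
                return sum(1 for b in self.banks[bank] if b.home_bank != bank)

            def banks_to_power_down(self, threshold):
                # Banks holding mostly non-native blocks are candidates for a
                # transition from the first power mode to the second.
                return [i for i in range(len(self.banks))
                        if self.non_native_count(i) >= threshold]

        if __name__ == "__main__":
            ctrl = CacheController(num_banks=4, blocks_per_bank=4)
            ctrl.insert(0, CacheBlock(tag=0x10, home_bank=0))
            ctrl.insert(0, CacheBlock(tag=0x20, home_bank=2))
            ctrl.insert(2, CacheBlock(tag=0x40, home_bank=3))
            ctrl.insert(2, CacheBlock(tag=0x50, home_bank=1))
            print(ctrl.banks_to_power_down(threshold=2))  # [2]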

    Enhanced atomics for workgroup synchronization

    Publication No.: US11288095B2

    Publication date: 2022-03-29

    Application No.: US16588872

    Filing date: 2019-09-30

    Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
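
    The queueing policy described in the abstract can be illustrated with a small sketch. The two-queue split and the assumed priority rule (workgroups that are blocking a barrier drain first) are illustrative assumptions, not details from the patent.

        # Sketch: ready workgroups are placed into ready queues by their
        # synchronization status; higher-priority queues are drained first.

        from collections import deque

        class WorkgroupScheduler:
            def __init__(self):
                # Assumed split: queue 0 holds workgroups other workgroups are
                # waiting on at a barrier; queue 1 holds the rest.
                self.ready_queues = [deque(), deque()]

            def mark_ready(self, workgroup_id, blocking_a_barrier):
                self.ready_queues[0 if blocking_a_barrier else 1].append(workgroup_id)

            def schedule(self, available_slots):
                # Pull from queues in priority order while resources remain.
                scheduled = []
                for queue in self.ready_queues:
                    while queue and len(scheduled) < available_slots:
                        scheduled.append(queue.popleft())
                return scheduled

        if __name__ == "__main__":
            sched = WorkgroupScheduler()
            sched.mark_ready("wg0", blocking_a_barrier=False)
            sched.mark_ready("wg1", blocking_a_barrier=True)
            sched.mark_ready("wg2", blocking_a_barrier=True)
            print(sched.schedule(available_slots=2))  # ['wg1', 'wg2']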

    MESSAGE AGGREGATION, COMBINING AND COMPRESSION FOR EFFICIENT DATA COMMUNICATIONS IN GPU-BASED CLUSTERS
    44.
    Invention application; status: pending (published)

    Publication No.: US20160352598A1

    Publication date: 2016-12-01

    Application No.: US15165953

    Filing date: 2016-05-26

    CPC classification number: H04L47/365

    Abstract: A system and method for efficient management of network traffic in highly data-parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of the number or the storage size of the original network messages, producing one or more new network messages. The new network messages are sent to the network interface to be sent across the network.

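    The aggregation step can be illustrated with a short sketch. The per-destination batching, the byte threshold, and the callback interface are illustrative assumptions, not details from the patent; compression and combining of semantically related messages are omitted.

        # Sketch: small messages headed for the same node are aggregated into
        # one network message before being handed to the network interface.

        from collections import defaultdict

        class MessageAggregator:
            def __init__(self, max_batch_bytes, send_fn):
                self.max_batch_bytes = max_batch_bytes
                self.send_fn = send_fn            # stand-in for the network interface
                self.pending = defaultdict(list)  # destination -> payloads

            def submit(self, destination, payload):
                self.pending[destination].append(payload)
                if sum(len(p) for p in self.pending[destination]) >= self.max_batch_bytes:
                    self.flush(destination)

            def flush(self, destination):
                if not self.pending[destination]:
                    return
                # One combined message replaces many small ones, reducing the
                # message count and the per-message header overhead.
                self.send_fn(destination, b"".join(self.pending[destination]))
                self.pending[destination].clear()

        if __name__ == "__main__":
            sent = []
            agg = MessageAggregator(max_batch_bytes=8,
                                    send_fn=lambda dst, msg: sent.append((dst, msg)))
            agg.submit("node1", b"abcd")
            agg.submit("node1", b"efgh")   # reaches the threshold, triggers a flush
            agg.submit("node2", b"xy")
            agg.flush("node2")
            print(sent)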

    Write combining cache microarchitecture for synchronization events
    45.
    Granted invention patent; status: in force

    Publication No.: US09477599B2

    Publication date: 2016-10-25

    Application No.: US13961561

    Filing date: 2013-08-07

    CPC classification number: G06F12/0815 G06F12/0811 G06F12/128 Y02D10/13

    Abstract: A method, computer program product, and system are described that enforce a release consistency with special accesses sequentially consistent (RCsc) memory model and execute release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. Also described is the decoupling of a store event from the ordering of the store event with respect to an RCsc memory model. The description also includes a set of hierarchical read/write combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains a partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events since a store event may not need to reach main memory to complete.

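    The write-combining behavior and the release step can be illustrated with a minimal sketch. The line size, the flat dictionary of pending lines, and the "drain everything on release" policy are illustrative assumptions; the patent's hierarchical buffers and pool component are not modeled.

        # Sketch: stores to the same cache line coalesce in a write-combining
        # buffer; a StRel-style release drains the buffer in one step rather
        # than tracking each outstanding store through the memory hierarchy.

        LINE_BYTES = 64

        class WriteCombiningBuffer:
            def __init__(self, write_back_fn):
                self.lines = {}   # line base address -> {offset: value}
                self.write_back_fn = write_back_fn

            def store(self, address, value):
                line, offset = address // LINE_BYTES, address % LINE_BYTES
                # Stores to the same line coalesce into a single pending entry.
                self.lines.setdefault(line, {})[offset] = value

            def release(self):
                # Completion only requires the buffer to be empty, not a
                # per-store acknowledgement from main memory.
                for line, bytes_written in sorted(self.lines.items()):
                    self.write_back_fn(line * LINE_BYTES, bytes_written)
                self.lines.clear()

        if __name__ == "__main__":
            written = []
            wcb = WriteCombiningBuffer(lambda base, data: written.append((base, data)))
            wcb.store(0x100, 1)
            wcb.store(0x101, 2)   # coalesces with the previous store (same line)
            wcb.store(0x400, 3)
            wcb.release()         # two write-backs instead of three separate stores
            print(written)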

    Hierarchical write-combining cache coherence
    46.
    Granted invention patent; status: in force

    Publication No.: US09396112B2

    Publication date: 2016-07-19

    Application No.: US14010096

    Filing date: 2013-08-26

    CPC classification number: G06F12/0811 G06F12/0804 Y02D10/13

    Abstract: A method, computer program product, and system are described that enforce a release consistency with special accesses sequentially consistent (RCsc) memory model and execute release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. Also described is the decoupling of a store event from the ordering of the store event with respect to an RCsc memory model. The description also includes a set of hierarchical read-only caches and write-only combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains a partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events and reduces the overhead of maintaining write-only combining buffers.

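    The read/write split described above can be illustrated with a small sketch: loads are served by a read-only cache, stores collect in a write-only combining buffer, and a release drains the buffer to the next level. The structure sizes, the last-value coalescing, and the invalidate-on-release step are illustrative assumptions.

        # Sketch: read-only cache for loads plus a write-only combining buffer
        # for stores; a release publishes coalesced stores to the next level.

        class ReadOnlyCache:
            def __init__(self, backing):
                self.backing = backing   # next level: address -> value
                self.lines = {}

            def load(self, address):
                if address not in self.lines:
                    self.lines[address] = self.backing.get(address, 0)
                return self.lines[address]

            def invalidate_all(self):
                self.lines.clear()

        class WriteOnlyCombiningBuffer:
            def __init__(self, backing):
                self.backing = backing
                self.pending = {}   # address -> latest value (coalesced)

            def store(self, address, value):
                self.pending[address] = value

            def release(self, read_cache):
                # Drain coalesced stores, then drop stale read-only copies so
                # later loads observe the released values.
                self.backing.update(self.pending)
                self.pending.clear()
                read_cache.invalidate_all()

        if __name__ == "__main__":
            next_level = {0x10: 7}
            rc = ReadOnlyCache(next_level)
            wb = WriteOnlyCombiningBuffer(next_level)
            print(rc.load(0x10))   # 7, filled from the next level
            wb.store(0x10, 42)
            wb.store(0x10, 43)     # coalesces; only the last value is drained
            wb.release(rc)
            print(rc.load(0x10))   # 43 after the release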

    PROCESSOR AND METHODS FOR REMOTE SCOPED SYNCHRONIZATION
    47.
    Invention application; status: granted (in force)

    Publication No.: US20160139624A1

    Publication date: 2016-05-19

    Application No.: US14542042

    Filing date: 2014-11-14

    Abstract: Described herein is an apparatus and method for remote scoped synchronization, a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope instances encompassed by that scope. The remote scoped synchronization operation allows smaller scopes to be used more frequently and defers the added cost to the cases where larger scoped synchronization is required. This enables programmers to optimize the scope at which memory operations are performed for important communication patterns such as work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from, and push updates to, scopes that do not (hierarchically) contain them.

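    The scope-visibility idea can be modeled very loosely in software. In the sketch below, each scope instance keeps a private buffer of writes that an ordinary release publishes only within that instance, while a remote acquire pulls data from a scope instance that does not contain the caller (as in work stealing). The ScopeInstance class, its methods, and the two-level model are illustrative assumptions and deliberately ignore real GPU memory scopes.

        # Sketch: a remote-scoped acquire lets a work-item synchronize with a
        # scope instance outside of its own scope hierarchy.

        class ScopeInstance:
            """One scope instance, e.g. one workgroup's slice of the hierarchy."""
            def __init__(self, name):
                self.name = name
                self.local_buffer = {}   # writes visible only inside this instance
                self.visible = {}        # writes published by a release

            def store(self, key, value):
                self.local_buffer[key] = value

            def release_local(self):
                # Ordinary (cheap) release: publish within this scope instance only.
                self.visible.update(self.local_buffer)
                self.local_buffer.clear()

        def remote_acquire(caller, victim):
            # Remote-scoped acquire: the caller pulls valid data from a scope
            # instance that does not (hierarchically) contain it.
            victim.release_local()
            caller.visible.update(victim.visible)

        if __name__ == "__main__":
            wg_a = ScopeInstance("workgroup-A")   # victim in a work-stealing pattern
            wg_b = ScopeInstance("workgroup-B")   # thief
            wg_a.store("task-queue-tail", 5)      # victim synchronizes at its small scope
            remote_acquire(wg_b, wg_a)            # thief reaches across the hierarchy
            print(wg_b.visible["task-queue-tail"])  # 5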

    Conditional notification mechanism
    48.
    Granted invention patent; status: in force

    Publication No.: US09256535B2

    Publication date: 2016-02-09

    Application No.: US13856728

    Filing date: 2013-04-04

    Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.

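    The notification flow can be illustrated with a short sketch: one core registers a watched location, a condition, and a flag; when an update to that location makes the condition evaluate to the predetermined result, the flag is set and the other core is notified. The predicate form, the callback, and the dictionary "memory" are illustrative assumptions; the separate cache lines in the abstract are only mirrored by keeping the flag and the watched address in separate objects.

        # Sketch: a condition on a watched memory location sets a flag and
        # triggers an operation on the other core when it first holds.

        class ConditionalNotifier:
            def __init__(self):
                self.watches = []   # (address, predicate, notify_fn, flag)

            def register(self, address, predicate, notify_fn):
                flag = {"set": False}
                self.watches.append((address, predicate, notify_fn, flag))
                return flag

            def on_memory_update(self, memory, address):
                # Re-evaluate only the conditions watching the updated location.
                for watched, predicate, notify_fn, flag in self.watches:
                    if watched == address and not flag["set"] and predicate(memory[address]):
                        flag["set"] = True   # update the flag
                        notify_fn()          # cause the other core to perform its operation

        if __name__ == "__main__":
            memory = {0x80: 0}
            notifier = ConditionalNotifier()
            flag = notifier.register(0x80, lambda v: v >= 3,
                                     lambda: print("second core resumes work"))
            memory[0x80] = 1
            notifier.on_memory_update(memory, 0x80)   # condition not met yet
            memory[0x80] = 3
            notifier.on_memory_update(memory, 0x80)   # condition met, flag set
            print(flag["set"])                        # True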

    Wavefront Resource Virtualization
    49.
    Invention application; status: pending (published)

    Publication No.: US20150363903A1

    Publication date: 2015-12-17

    Application No.: US14304483

    Filing date: 2014-06-13

    CPC classification number: G06T1/20

    Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and to stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

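    The stop-and-reschedule behavior can be illustrated with a minimal sketch: a wavefront occupying a hardware slot is preempted before it completes, its context is saved, and another wavefront is scheduled into the same slot. The context contents and the save/restore interface are illustrative assumptions, not details from the patent.

        # Sketch: one hardware slot is time-shared between wavefronts by
        # saving and restoring per-wavefront contexts.

        class WavefrontSlot:
            def __init__(self):
                self.current = None        # wavefront id occupying the slot
                self.saved_contexts = {}   # wavefront id -> saved state

            def preempt(self, program_counter, registers):
                # Stop the running wavefront before completion and save its state.
                if self.current is not None:
                    self.saved_contexts[self.current] = (program_counter, registers)
                evicted, self.current = self.current, None
                return evicted

            def schedule(self, wavefront_id):
                # Place another wavefront into the freed hardware resource,
                # restoring its context if it ran here before.
                self.current = wavefront_id
                return self.saved_contexts.pop(wavefront_id, (0, {}))

        if __name__ == "__main__":
            slot = WavefrontSlot()
            slot.schedule("wf0")
            slot.preempt(program_counter=128, registers={"v0": 7})  # stop wf0 early
            slot.schedule("wf1")                                    # wf1 runs instead
            slot.preempt(program_counter=32, registers={})
            pc, regs = slot.schedule("wf0")                         # wf0 resumes later
            print(pc, regs)   # 128 {'v0': 7}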

    MULTI-LEVEL MEMORY HIERARCHY
    50.
    Invention application; status: pending (published)

    Publication No.: US20150293845A1

    Publication date: 2015-10-15

    Application No.: US14250474

    Filing date: 2014-04-11

    CPC classification number: G06F12/0811 G06F12/1009 G06F2212/283 G06F2212/651

    Abstract: Described is a system and method for a multi-level memory hierarchy. Each level is based on different attributes including, for example, power, capacity, bandwidth, reliability, and volatility. In some embodiments, the different levels of the memory hierarchy may use an on-chip stacked dynamic random access memory (providing fast, high-bandwidth, low-energy access to data) and an off-chip non-volatile random access memory (providing low-power, high-capacity storage), in order to provide higher-capacity, lower-power, and higher-bandwidth performance. The multi-level memory may present a unified interface to a processor so that specific memory hardware and software implementation details are hidden. The multi-level memory enables the illusion of a single-level memory that satisfies multiple conflicting constraints. A comparator receives a memory address from the processor, processes the address, and reads from or writes to the appropriate memory level. In some embodiments, the memory architecture is visible to the software stack to optimize memory utilization.

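    The comparator's role can be illustrated with a small sketch: each request's address is compared against a boundary and routed to the fast on-chip level or the high-capacity off-chip level, behind a single flat interface. The boundary value, the dictionary-backed levels, and the address split are illustrative assumptions.

        # Sketch: an address comparator routes reads and writes to the
        # appropriate memory level behind one unified interface.

        class MultiLevelMemory:
            def __init__(self, stacked_dram_bytes):
                self.boundary = stacked_dram_bytes
                self.stacked_dram = {}   # fast, high-bandwidth, low-energy level
                self.nvram = {}          # high-capacity, low-power level

            def _level(self, address):
                # The comparator: addresses below the boundary map to stacked DRAM.
                return self.stacked_dram if address < self.boundary else self.nvram

            def read(self, address):
                return self._level(address).get(address, 0)

            def write(self, address, value):
                self._level(address)[address] = value

        if __name__ == "__main__":
            mem = MultiLevelMemory(stacked_dram_bytes=1 << 20)   # 1 MiB fast level
            mem.write(0x100, 11)        # lands in stacked DRAM
            mem.write(0x200000, 22)     # lands in NVRAM (beyond the boundary)
            print(mem.read(0x100), mem.read(0x200000))   # 11 22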
