REDUCING COLD TLB MISSES IN A HETEROGENEOUS COMPUTING SYSTEM
    11.
    Invention application, under examination (published)

    Publication number: US20140101405A1

    Publication date: 2014-04-10

    Application number: US13645685

    Filing date: 2012-10-05

    Abstract: Methods and apparatuses are provided for avoiding cold translation lookaside buffer (TLB) misses in a computer system. A typical system is configured as a heterogeneous computing system having at least one central processing unit (CPU) and one or more graphics processing units (GPUs) that share a common memory address space. Each processing unit (CPU and GPU) has an independent TLB. When offloading a task from a particular CPU to a particular GPU, translation information is sent along with the task assignment. This information allows the GPU to load the address translation data into its TLB before executing the task. Preloading the GPU's TLB reduces or avoids the cold TLB misses that would otherwise occur.
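The mechanism described in the abstract can be pictured with a small sketch in C. All type and field names here are invented for illustration, not taken from the patent: the task descriptor carries translation entries alongside the work item, and the receiving GPU primes a (here, simulated) TLB before running the kernel, so the first accesses hit instead of missing cold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation entry: virtual page number -> physical frame number. */
typedef struct {
    uint64_t vpn;
    uint64_t pfn;
} tlb_entry_t;

/* Hypothetical task descriptor: the offloaded work item plus the
 * translation information the CPU sends along with the assignment. */
typedef struct {
    void (*kernel)(void);
    tlb_entry_t translations[8];
    size_t n_translations;
} task_t;

#define TLB_SIZE 8

/* Simulated per-GPU TLB, filled before the task executes. */
static tlb_entry_t gpu_tlb[TLB_SIZE];
static size_t gpu_tlb_fill;

static void preload_tlb(const task_t *task) {
    gpu_tlb_fill = 0;
    for (size_t i = 0; i < task->n_translations && i < TLB_SIZE; i++)
        gpu_tlb[gpu_tlb_fill++] = task->translations[i];
}

/* Returns 1 on a TLB hit, 0 on a (cold) miss. */
static int tlb_lookup(uint64_t vpn, uint64_t *pfn) {
    for (size_t i = 0; i < gpu_tlb_fill; i++) {
        if (gpu_tlb[i].vpn == vpn) {
            *pfn = gpu_tlb[i].pfn;
            return 1;
        }
    }
    return 0;
}
```

In a real system the preload would target the GPU's hardware TLB via its memory-management unit; the array above only models the hit/miss behavior.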

    NETWORK INTERFACE CONTROLLER-BASED SCHEDULING OF PROCESSING TASKS IN A DISTRIBUTED COMPUTING SYSTEM

    Publication number: US20180081715A1

    Publication date: 2018-03-22

    Application number: US15267936

    Filing date: 2016-09-16

    CPC classification number: G06F9/505

    Abstract: Techniques for scheduling processing tasks in a device having multiple computing elements are disclosed. A network interface controller of the device receives processing tasks, for execution on the computing elements, from a network that is external to the device. The network interface controller schedules the tasks for execution on the computing elements based on policy data available to the network interface controller. A scheduler within the network interface controller, which can be implemented as a standalone processing unit (such as a microcontroller, a programmable processing core, or an application-specific integrated circuit), performs such scheduling, thereby freeing the central processing unit of the device from the burden of performing scheduling operations. The scheduler schedules the tasks according to any technically feasible scheduling technique.

CONDITIONAL ATOMIC OPERATIONS AT A PROCESSOR
    14.
    Invention application, under examination (published)

    Publication number: US20160357551A1

    Publication date: 2016-12-08

    Application number: US14728643

    Filing date: 2015-06-02

    Abstract: A conditional fetch-and-phi operation tests a memory location to determine whether it stores a specified value and, if so, modifies the value at that location. The conditional fetch-and-phi operation can be implemented so that it can be executed concurrently by a plurality of threads, such as the threads of a wavefront on a GPU. To execute the operation, one of the concurrently executing threads is selected to perform a compare-and-swap (CAS) operation at the memory location, while the other threads await the result. The CAS operation tests the value at the memory location and, if it succeeds, the value is passed to each of the concurrently executing threads.
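The test-then-modify step described above can be sketched with C11 atomics. This is a minimal single-threaded illustration, not the patent's implementation: the function names and the `increment` modifier are hypothetical, and a real GPU version would elect one lane of the wavefront to run the CAS and broadcast the observed value to its peers.

```c
#include <stdatomic.h>

/* Conditional fetch-and-phi, sketched with a single CAS:
 * if *addr == expected, replace it with phi(expected) and report success.
 * `observed` receives the value that would be broadcast to waiting threads. */
static int conditional_fetch_and_phi(atomic_int *addr, int expected,
                                     int (*phi)(int), int *observed) {
    int old = expected;
    if (atomic_compare_exchange_strong(addr, &old, phi(expected))) {
        *observed = expected;   /* location held the specified value */
        return 1;
    }
    *observed = old;            /* CAS failed: location held another value */
    return 0;
}

/* Example modifier (the "phi" function); purely illustrative. */
static int increment(int v) { return v + 1; }
```

Electing a single thread to issue the CAS avoids having every lane of the wavefront contend on the same memory location.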

METHOD FOR MEMORY CONSISTENCY AMONG HETEROGENEOUS COMPUTER COMPONENTS
    15.
    Invention application, granted

    Publication number: US20140337587A1

    Publication date: 2014-11-13

    Application number: US14275271

    Filing date: 2014-05-12

    Abstract: A method, computer program product, and system are described that determine the correctness of memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer, based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model, that analyzes a program and determines the correctness of the ordering of its events. HRF models combine the properties of scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model. For example, the optimizer can analyze a portion of program code, respecting the properties of the SC for HRF model, and determine whether a value produced by a store memory event is a candidate for the value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.

    METHOD AND APPARATUS FOR TIME-BASED SCHEDULING OF TASKS

    Publication number: US20170161114A1

    Publication date: 2017-06-08

    Application number: US14962784

    Filing date: 2015-12-08

    CPC classification number: G06F9/4881 G06F2209/483

    Abstract: A computing device is disclosed. The computing device includes an Accelerated Processing Unit (APU) including at least a first Heterogeneous System Architecture (HSA) computing device and at least a second HSA computing device, the second computing device being of a different type than the first, and an HSA Memory Management Unit (HMMU) allowing the APU to communicate with at least one memory. The computing task is enqueued on an HSA-managed queue that is set to run on the at least first HSA computing device or the at least second HSA computing device. The computing task is re-enqueued on the HSA-managed queue based on a repetition field that specifies the number of times the computing task is re-enqueued. The repetition field is decremented each time the computing task is re-enqueued, and may hold a special value (e.g., −1) to allow re-enqueuing of the computing task indefinitely.
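The repetition-field logic described above reduces to a small decision function. This is a sketch under the stated semantics only (decrement per re-enqueue, −1 means repeat indefinitely); the function name is invented and the real mechanism operates on HSA queue packets.

```c
/* Decide whether a completed task should be re-enqueued.
 * A repetition value of -1 is the special "repeat forever" marker;
 * a positive value is decremented once per re-enqueue;
 * zero means the task is done. Returns 1 to re-enqueue, 0 otherwise. */
static int should_reenqueue(int *repetition) {
    if (*repetition == -1)
        return 1;               /* special value: indefinite repetition */
    if (*repetition > 0) {
        (*repetition)--;        /* consume one repetition */
        return 1;
    }
    return 0;
}
```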

MESSAGE AGGREGATION, COMBINING AND COMPRESSION FOR EFFICIENT DATA COMMUNICATIONS IN GPU-BASED CLUSTERS
    17.
    Invention application, under examination (published)

    Publication number: US20160352598A1

    Publication date: 2016-12-01

    Application number: US15165953

    Filing date: 2016-05-26

    CPC classification number: H04L47/365

    Abstract: A system and method for efficient management of network traffic in highly data-parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of the number or the storage size of the original network messages by aggregating them into one or more new network messages. The new network messages are then sent to the network interface for transmission across the network.
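The aggregation step can be sketched as a batching buffer that folds several small messages into one before handing it to the network interface. The types, names, and batch size below are illustrative assumptions, not from the patent, which also covers combining and compression.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BATCH 1024  /* illustrative aggregate-message capacity */

/* One aggregated network message under construction. */
typedef struct {
    char data[MAX_BATCH];
    size_t len;     /* bytes accumulated so far */
    size_t count;   /* number of original messages folded in */
} batch_t;

/* Append a small message to the batch. Returns 1 if it fit, or 0 if
 * the batch is full and must first be flushed to the network interface. */
static int batch_append(batch_t *b, const char *msg, size_t len) {
    if (b->len + len > MAX_BATCH)
        return 0;
    memcpy(b->data + b->len, msg, len);
    b->len += len;
    b->count++;
    return 1;
}
```

Sending one aggregate instead of `count` separate messages amortizes per-message header and interrupt overhead, which is the efficiency the abstract targets.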

Conditional notification mechanism
    18.
    Granted invention patent

    Publication number: US09256535B2

    Publication date: 2016-02-09

    Application number: US13856728

    Filing date: 2013-04-04

    Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.
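The notifying side of the mechanism can be sketched with C11 atomics. This is a minimal illustration under assumptions the abstract does not spell out: the names are invented, the flag here is a plain atomic rather than a dedicated cache line, and the "operation" triggered on the second core is reduced to observing the flag.

```c
#include <stdatomic.h>

/* First core's state: the flag it stores in one cache line and the
 * indicated memory location it monitors (tracked via a second line). */
static atomic_int notify_flag;  /* in practice, isolated in its own cache line */
static const int *monitored;    /* the indicated memory location */

/* Evaluate the condition for the monitored location; on the
 * predetermined result, update the flag, which causes the second
 * core (waiting on the flag) to perform its operation. */
static void check_and_notify(int predetermined) {
    if (*monitored == predetermined)
        atomic_store(&notify_flag, 1);
}
```

Keeping the flag and the monitored address in separate cache lines, as the abstract describes, lets the waiting core poll the flag without contending on the line being modified.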

Wavefront Resource Virtualization
    19.
    Invention application, under examination (published)

    Publication number: US20150363903A1

    Publication date: 2015-12-17

    Application number: US14304483

    Filing date: 2014-06-13

    CPC classification number: G06T1/20

    Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and to stop execution of the first wavefront before it completes. The processor then schedules a second wavefront for execution in the hardware resource.

Conditional Notification Mechanism
    20.
    Invention application, granted

    Publication number: US20140304474A1

    Publication date: 2014-10-09

    Application number: US13856728

    Filing date: 2013-04-04

    Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.
