Automatic kernel migration for heterogeneous cores
    2.
    Granted invention patent
    Automatic kernel migration for heterogeneous cores (In force)

    Publication No.: US08683468B2

    Publication Date: 2014-03-25

    Application No.: US13108438

    Filing Date: 2011-05-16

    IPC Class: G06F9/46

    CPC Class: G06F9/4856 G06F9/5066

    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate to a different processor core at a given location. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least the code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules the code after the given location to the second processor core.

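The migration flow the abstract describes — run the code before the tagged location on one core, then hand live values across when a migration condition fires — can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the `LiveValues` structure, the "parallelism below a threshold" condition, and the core labels are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class LiveValues:
    """Stand-in for the compiler-generated structure that holds the
    register/variable state that must survive a migration at the
    tagged program location."""
    values: dict = field(default_factory=dict)

def run_kernel(records, migrate_after, threshold):
    """Run the data-parallel phase on the SIMD core; if too few
    records remain active past the tagged location, hand off to the
    general-purpose core."""
    live = LiveValues()
    # Phase 1: code before the given location, scheduled to the SIMD core.
    active = [r for r in records if r > migrate_after]   # toy "work"
    live.values["active"] = active
    # Migration condition (assumed): parallelism dropped below a threshold.
    if len(active) < threshold:
        return finish_on_cpu(live)          # live values handed to the CPU core
    return [r * 2 for r in active], "simd"  # otherwise stay on the SIMD core

def finish_on_cpu(live):
    # Phase 2: code after the given location, scheduled to the CPU core,
    # reading the live values out of the hand-off structure.
    return [r * 2 for r in live.values["active"]], "cpu"
```

With many active records the work stays on the SIMD core; once most records filter out, the remainder migrates.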

    BRANCH REMOVAL BY DATA SHUFFLING
    3.
    Invention patent application
    BRANCH REMOVAL BY DATA SHUFFLING (Pending, published)

    Publication No.: US20120331278A1

    Publication Date: 2012-12-27

    Application No.: US13167517

    Filing Date: 2011-06-23

    IPC Class: G06F9/38

    Abstract: A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing the number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls, along with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with that outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.

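The transformation can be illustrated with a toy sketch: shuffle the records by branch outcome, then run one branch-free sub-kernel per outcome. This is purely illustrative — the patent operates on compiler IR, and the even/odd predicate and sub-kernel bodies here are invented for the example.

```python
def branchy_kernel(record):
    # Original kernel: a data-dependent branch that causes lane
    # divergence when executed in lock-step on a SIMD core.
    if record % 2 == 0:
        return record // 2       # outcome A
    else:
        return 3 * record + 1    # outcome B

def shuffled_execution(records):
    """Shuffle records by branch outcome, then run a branch-free
    sub-kernel per outcome -- the branch itself disappears."""
    even = [r for r in records if r % 2 == 0]   # records for outcome A
    odd  = [r for r in records if r % 2 != 0]   # records for outcome B
    sub_kernel_a = lambda r: r // 2             # no branch inside
    sub_kernel_b = lambda r: 3 * r + 1          # no branch inside
    # Each sub-kernel is now a uniform work unit a scheduler can place
    # on either the general-purpose or the SIMD core.
    return [sub_kernel_a(r) for r in even] + [sub_kernel_b(r) for r in odd]
```

The shuffled version produces the same multiset of results as the branchy original, just grouped by outcome.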

    System and method for NUMA-aware heap memory management
    4.
    Granted invention patent
    System and method for NUMA-aware heap memory management (In force)

    Publication No.: US08245008B2

    Publication Date: 2012-08-14

    Application No.: US12372839

    Filing Date: 2009-02-18

    IPC Class: G06F13/14

    Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node as, the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free-block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.

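The locality policy in the abstract — per-node free lists, local-first allocation, remote fallback, and returning blocks to their home node — can be sketched in a few lines. This is a simplified model under assumed names (`NumaHeapManager`, integer node ids), not the patent's actual allocator.

```python
class NumaHeapManager:
    """Toy NUMA-aware allocator: one free list per node; allocations
    are satisfied from the requesting thread's node when possible."""

    def __init__(self, num_nodes, blocks_per_node):
        # Per-node free lists; each block remembers its home node,
        # standing in for the patent's per-node block listings.
        self.free = {n: [(n, i) for i in range(blocks_per_node)]
                     for n in range(num_nodes)}

    def alloc(self, thread_node):
        # Prefer the requesting thread's own node to maximize locality...
        if self.free[thread_node]:
            return self.free[thread_node].pop()
        # ...and fall back to any node that still has free blocks.
        for blocks in self.free.values():
            if blocks:
                return blocks.pop()
        raise MemoryError("out of blocks")

    def free_block(self, block):
        # Returned blocks go back on their home node's free list, so
        # later local requests can reuse them.
        home_node, _ = block
        self.free[home_node].append(block)
```

A local request is served from the local list; once that list is exhausted the allocator falls back to a remote node rather than failing.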

    AUTOMATIC LOAD BALANCING FOR HETEROGENEOUS CORES
    5.
    Invention patent application
    AUTOMATIC LOAD BALANCING FOR HETEROGENEOUS CORES (In force)

    Publication No.: US20120291040A1

    Publication Date: 2012-11-15

    Application No.: US13105250

    Filing Date: 2011-05-11

    IPC Class: G06F9/46

    CPC Class: G06F9/5083

    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information for a given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on the dynamic runtime behavior of other work units corresponding to the same kernel as the waiting work unit.

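The two-stage decision the abstract describes — a static assignment from compiler-computed pre-runtime hints, corrected by the observed runtime behavior of sibling work units of the same kernel — might look roughly like this. Everything here (the `"simd"`/`"cpu"` labels, the 0.5 threshold, the shape of `runtime_stats`) is an invented stand-in, not the claimed mechanism.

```python
def schedule(work_units, pre_runtime_info, runtime_stats):
    """Assign each (work_unit, kernel) pair to a core from the
    compiler's pre-runtime hint, then flip waiting units whose
    sibling work units (same kernel) performed badly there."""
    assignment = {}
    for wu, kernel in work_units:
        hint = pre_runtime_info[kernel]              # static choice
        # Dynamic correction: fraction of this kernel's sibling work
        # units that underperformed on the hinted core (assumed metric).
        bad = runtime_stats.get(kernel, {}).get(hint, 0.0)
        if bad > 0.5:
            hint = "cpu" if hint == "simd" else "simd"
        assignment[wu] = hint
    return assignment
```

Waiting units of a kernel whose siblings ran poorly on the hinted core get re-steered to the other core; kernels with no contrary evidence keep their static assignment.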

    Automatic load balancing for heterogeneous cores
    6.
    Granted invention patent
    Automatic load balancing for heterogeneous cores (In force)

    Publication No.: US08782645B2

    Publication Date: 2014-07-15

    Application No.: US13105250

    Filing Date: 2011-05-11

    IPC Class: G06F9/46 G06F9/50

    CPC Class: G06F9/5083

    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information for a given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on the dynamic runtime behavior of other work units corresponding to the same kernel as the waiting work unit.


    AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES
    7.
    Invention patent application
    AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES (In force)

    Publication No.: US20120297163A1

    Publication Date: 2012-11-22

    Application No.: US13108438

    Filing Date: 2011-05-16

    IPC Class: G06F9/315 G06F15/80

    CPC Class: G06F9/4856 G06F9/5066

    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate to a different processor core at a given location. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least the code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules the code after the given location to the second processor core.


    System and Method for NUMA-Aware Heap Memory Management
    8.
    Invention patent application
    System and Method for NUMA-Aware Heap Memory Management (In force)

    Publication No.: US20100211756A1

    Publication Date: 2010-08-19

    Application No.: US12372839

    Filing Date: 2009-02-18

    IPC Class: G06F12/00

    Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node as, the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free-block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.


    Heterogeneous enqueuing and dequeuing mechanism for task scheduling
    10.
    Granted invention patent
    Heterogeneous enqueuing and dequeuing mechanism for task scheduling (In force)

    Publication No.: US09430281B2

    Publication Date: 2016-08-30

    Application No.: US13292740

    Filing Date: 2011-11-09

    IPC Class: G06F9/46 G06F9/48

    CPC Class: G06F9/4881

    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module, based on the APD, using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.

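The producer/consumer split in the abstract — software enqueues task descriptors into a memory-resident queue, a hardware command processor drains them and forwards them to the shader core — can be modeled with an in-memory queue. This is a behavioral sketch only; the class and method names are assumptions, and a Python deque merely stands in for the APD-visible buffer and the hardware consumer.

```python
from collections import deque

class TaskQueue:
    """Toy model of the heterogeneous enqueue/dequeue split: software
    produces into a memory queue, hardware consumes from it."""

    def __init__(self):
        self.memory_queue = deque()   # stands in for the APD-visible buffer

    def sw_enqueue(self, task):
        # Software-based enqueuing module: user-mode code appends a
        # task descriptor to the memory storage module.
        self.memory_queue.append(task)

    def hw_dequeue(self, shader_core):
        # Hardware-based command processor: drains the queue in FIFO
        # order and forwards each task to the shader core.
        while self.memory_queue:
            shader_core.append(self.memory_queue.popleft())

q = TaskQueue()
for t in ("draw", "compute", "copy"):
    q.sw_enqueue(t)
core = []            # stands in for the shader core's work list
q.hw_dequeue(core)   # tasks arrive at the shader core in FIFO order
```

The point of the split is that enqueuing stays cheap (no kernel-mode transition), while dispatch order is preserved by the hardware consumer.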