Automatic kernel migration for heterogeneous cores
    2.
    Granted invention patent
    Automatic kernel migration for heterogeneous cores (In force)

    Publication No.: US08683468B2

    Publication Date: 2014-03-25

    Application No.: US13108438

    Filing Date: 2011-05-16

    IPC Class: G06F9/46

    CPC Class: G06F9/4856 G06F9/5066

    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate to a different processor core at a given location. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least the code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules the code after the given location to the second processor core.

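The migration flow the abstract describes — run the code before the tagged location on one core, then hand live values across when a migration condition fires — can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the `LiveValues` structure, the "parallelism below a threshold" condition, and the core labels are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class LiveValues:
    """Stand-in for the compiler-generated structure that holds the
    register/variable state that must survive a migration at the
    tagged program location."""
    values: dict = field(default_factory=dict)

def run_kernel(records, migrate_after, threshold):
    """Run the data-parallel phase on the SIMD core; if too few
    records remain active past the tagged location, hand off to the
    general-purpose core."""
    live = LiveValues()
    # Phase 1: code before the given location, scheduled to the SIMD core.
    active = [r for r in records if r > migrate_after]   # toy "work"
    live.values["active"] = active
    # Migration condition (assumed): parallelism dropped below a threshold.
    if len(active) < threshold:
        return finish_on_cpu(live)          # live values handed to the CPU core
    return [r * 2 for r in active], "simd"  # otherwise stay on the SIMD core

def finish_on_cpu(live):
    # Phase 2: code after the given location, scheduled to the CPU core,
    # reading the live values out of the hand-off structure.
    return [r * 2 for r in live.values["active"]], "cpu"
```

With many active records the work stays on the SIMD core; once most records filter out, the remainder migrates.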

    BRANCH REMOVAL BY DATA SHUFFLING
    3.
    Invention patent application
    BRANCH REMOVAL BY DATA SHUFFLING (Pending, published)

    Publication No.: US20120331278A1

    Publication Date: 2012-12-27

    Application No.: US13167517

    Filing Date: 2011-06-23

    IPC Class: G06F9/38

    Abstract: A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing the number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls, along with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with that outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.

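The transformation can be illustrated with a toy sketch: shuffle the records by branch outcome, then run one branch-free sub-kernel per outcome. This is purely illustrative — the patent operates on compiler IR, and the even/odd predicate and sub-kernel bodies here are invented for the example.

```python
def branchy_kernel(record):
    # Original kernel: a data-dependent branch that causes lane
    # divergence when executed in lock-step on a SIMD core.
    if record % 2 == 0:
        return record // 2       # outcome A
    else:
        return 3 * record + 1    # outcome B

def shuffled_execution(records):
    """Shuffle records by branch outcome, then run a branch-free
    sub-kernel per outcome -- the branch itself disappears."""
    even = [r for r in records if r % 2 == 0]   # records for outcome A
    odd  = [r for r in records if r % 2 != 0]   # records for outcome B
    sub_kernel_a = lambda r: r // 2             # no branch inside
    sub_kernel_b = lambda r: 3 * r + 1          # no branch inside
    # Each sub-kernel is now a uniform work unit a scheduler can place
    # on either the general-purpose or the SIMD core.
    return [sub_kernel_a(r) for r in even] + [sub_kernel_b(r) for r in odd]
```

The shuffled version produces the same multiset of results as the branchy original, just grouped by outcome.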

    System and method for NUMA-aware heap memory management
    4.
    Granted invention patent
    System and method for NUMA-aware heap memory management (In force)

    Publication No.: US08245008B2

    Publication Date: 2012-08-14

    Application No.: US12372839

    Filing Date: 2009-02-18

    IPC Class: G06F13/14

    Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node as, the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free-block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.

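The locality policy in the abstract — per-node free lists, local-first allocation, remote fallback, and returning blocks to their home node — can be sketched in a few lines. This is a simplified model under assumed names (`NumaHeapManager`, integer node ids), not the patent's actual allocator.

```python
class NumaHeapManager:
    """Toy NUMA-aware allocator: one free list per node; allocations
    are satisfied from the requesting thread's node when possible."""

    def __init__(self, num_nodes, blocks_per_node):
        # Per-node free lists; each block remembers its home node,
        # standing in for the patent's per-node block listings.
        self.free = {n: [(n, i) for i in range(blocks_per_node)]
                     for n in range(num_nodes)}

    def alloc(self, thread_node):
        # Prefer the requesting thread's own node to maximize locality...
        if self.free[thread_node]:
            return self.free[thread_node].pop()
        # ...and fall back to any node that still has free blocks.
        for blocks in self.free.values():
            if blocks:
                return blocks.pop()
        raise MemoryError("out of blocks")

    def free_block(self, block):
        # Returned blocks go back on their home node's free list, so
        # later local requests can reuse them.
        home_node, _ = block
        self.free[home_node].append(block)
```

A local request is served from the local list; once that list is exhausted the allocator falls back to a remote node rather than failing.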

    AUTOMATIC LOAD BALANCING FOR HETEROGENEOUS CORES
    5.
    Invention patent application
    AUTOMATIC LOAD BALANCING FOR HETEROGENEOUS CORES (In force)

    Publication No.: US20120291040A1

    Publication Date: 2012-11-15

    Application No.: US13105250

    Filing Date: 2011-05-11

    IPC Class: G06F9/46

    CPC Class: G06F9/5083

    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information for a given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on the dynamic runtime behavior of other work units corresponding to the same kernel as the waiting work unit.

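The two-stage decision the abstract describes — a static assignment from compiler-computed pre-runtime hints, corrected by the observed runtime behavior of sibling work units of the same kernel — might look roughly like this. Everything here (the `"simd"`/`"cpu"` labels, the 0.5 threshold, the shape of `runtime_stats`) is an invented stand-in, not the claimed mechanism.

```python
def schedule(work_units, pre_runtime_info, runtime_stats):
    """Assign each (work_unit, kernel) pair to a core from the
    compiler's pre-runtime hint, then flip waiting units whose
    sibling work units (same kernel) performed badly there."""
    assignment = {}
    for wu, kernel in work_units:
        hint = pre_runtime_info[kernel]              # static choice
        # Dynamic correction: fraction of this kernel's sibling work
        # units that underperformed on the hinted core (assumed metric).
        bad = runtime_stats.get(kernel, {}).get(hint, 0.0)
        if bad > 0.5:
            hint = "cpu" if hint == "simd" else "simd"
        assignment[wu] = hint
    return assignment
```

Waiting units of a kernel whose siblings ran poorly on the hinted core get re-steered to the other core; kernels with no contrary evidence keep their static assignment.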

    Automatic load balancing for heterogeneous cores
    6.
    Granted invention patent
    Automatic load balancing for heterogeneous cores (In force)

    Publication No.: US08782645B2

    Publication Date: 2014-07-15

    Application No.: US13105250

    Filing Date: 2011-05-11

    IPC Class: G06F9/46 G06F9/50

    CPC Class: G06F9/5083

    Abstract: A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information for a given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on the dynamic runtime behavior of other work units corresponding to the same kernel as the waiting work unit.


    AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES
    7.
    Invention patent application
    AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES (In force)

    Publication No.: US20120297163A1

    Publication Date: 2012-11-22

    Application No.: US13108438

    Filing Date: 2011-05-16

    IPC Class: G06F9/315 G06F15/80

    CPC Class: G06F9/4856 G06F9/5066

    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate to a different processor core at a given location. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least the code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules the code after the given location to the second processor core.


    System and Method for NUMA-Aware Heap Memory Management
    8.
    Invention patent application
    System and Method for NUMA-Aware Heap Memory Management (In force)

    Publication No.: US20100211756A1

    Publication Date: 2010-08-19

    Application No.: US12372839

    Filing Date: 2009-02-18

    IPC Class: G06F12/00

    Abstract: A system and method for allocating memory to multi-threaded programs on a Non-Uniform Memory Access (NUMA) computer system using a NUMA-aware memory heap manager is disclosed. In embodiments, a NUMA-aware memory heap manager may attempt to maximize the locality of memory allocations in a NUMA system by allocating memory blocks that are near, or on the same node as, the thread that requested the memory allocation. A heap manager may keep track of each memory block's location and satisfy allocation requests by determining an allocation node dependent, at least in part, on its locality to that of the requesting thread. When possible, a heap manager may attempt to allocate memory on the same node as the requesting thread. The heap manager may be non-application-specific, may employ multiple levels of free-block caching, and/or may employ various listings that associate given memory blocks with each NUMA node.


    Heterogeneous enqueuing and dequeuing mechanism for task scheduling
    10.
    Granted invention patent
    Heterogeneous enqueuing and dequeuing mechanism for task scheduling (In force)

    Publication No.: US09430281B2

    Publication Date: 2016-08-30

    Application No.: US13292740

    Filing Date: 2011-11-09

    IPC Class: G06F9/46 G06F9/48

    CPC Class: G06F9/4881

    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module, based on the APD, using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.

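The producer/consumer split in the abstract — software enqueues task descriptors into a memory-resident queue, a hardware command processor drains them and forwards them to the shader core — can be modeled with an in-memory queue. This is a behavioral sketch only; the class and method names are assumptions, and a Python deque merely stands in for the APD-visible buffer and the hardware consumer.

```python
from collections import deque

class TaskQueue:
    """Toy model of the heterogeneous enqueue/dequeue split: software
    produces into a memory queue, hardware consumes from it."""

    def __init__(self):
        self.memory_queue = deque()   # stands in for the APD-visible buffer

    def sw_enqueue(self, task):
        # Software-based enqueuing module: user-mode code appends a
        # task descriptor to the memory storage module.
        self.memory_queue.append(task)

    def hw_dequeue(self, shader_core):
        # Hardware-based command processor: drains the queue in FIFO
        # order and forwards each task to the shader core.
        while self.memory_queue:
            shader_core.append(self.memory_queue.popleft())

q = TaskQueue()
for t in ("draw", "compute", "copy"):
    q.sw_enqueue(t)
core = []            # stands in for the shader core's work list
q.hw_dequeue(core)   # tasks arrive at the shader core in FIFO order
```

The point of the split is that enqueuing stays cheap (no kernel-mode transition), while dispatch order is preserved by the hardware consumer.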