1. Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units
    Granted Patent (Status: Active)

    Publication No.: US09152601B2

    Publication Date: 2015-10-06

    Application No.: US13890828

    Filing Date: 2013-05-09

    Abstract: An approach and a method are described for efficient execution of nested map-reduce framework workloads that take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and the lower latency of data access in accelerated processing units (APUs). In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on the ratio of the number of branch instructions to the number of non-branch instructions, and a second metric is based on a comparison of execution times on each of the CPU and the GPU. Selecting where to execute map and reduce functions based on the first and second metrics results in accelerated computations. Some embodiments schedule pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map-reduce framework execution.
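
    The device-selection logic this abstract describes can be pictured with a short sketch. The following is an illustrative outline only, not the patented implementation: FunctionProfile, select_device, the profile fields, and the threshold value are hypothetical placeholders for the two metrics (branch-to-non-branch instruction ratio, and CPU versus GPU execution time).

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    branch_count: int       # branch instructions observed for the map/reduce function
    non_branch_count: int   # non-branch instructions observed
    cpu_time_ms: float      # measured or estimated execution time on the CPU
    gpu_time_ms: float      # measured or estimated execution time on the GPU


def select_device(profile: FunctionProfile, branch_ratio_threshold: float = 0.2) -> str:
    """Pick 'cpu' or 'gpu' for a map or reduce function using the two metrics."""
    # First metric: ratio of branch to non-branch instructions. Branch-heavy code
    # tends to diverge on a GPU, so route it to the CPU.
    branch_ratio = profile.branch_count / max(profile.non_branch_count, 1)
    if branch_ratio > branch_ratio_threshold:
        return "cpu"
    # Second metric: direct comparison of execution times on each device.
    return "cpu" if profile.cpu_time_ms <= profile.gpu_time_ms else "gpu"


# Example: a data-parallel reduce with few branches and a faster GPU timing.
print(select_device(FunctionProfile(10, 1000, cpu_time_ms=8.0, gpu_time_ms=2.5)))  # -> gpu
```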

2. POWER-EFFICIENT NESTED MAP-REDUCE EXECUTION ON A CLOUD OF HETEROGENEOUS ACCELERATED PROCESSING UNITS
    Patent Application (Status: Active)

    Publication No.: US20140333638A1

    Publication Date: 2014-11-13

    Application No.: US13890828

    Filing Date: 2013-05-09

    Abstract: An approach and a method are described for efficient execution of nested map-reduce framework workloads that take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and the lower latency of data access in accelerated processing units (APUs). In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on the ratio of the number of branch instructions to the number of non-branch instructions, and a second metric is based on a comparison of execution times on each of the CPU and the GPU. Selecting where to execute map and reduce functions based on the first and second metrics results in accelerated computations. Some embodiments schedule pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map-reduce framework execution.
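
    This abstract also mentions scheduling pipelined executions of CPU-assigned and GPU-assigned functions concurrently. The sketch below illustrates that idea generically; it is not the patented scheduler, and map_on_cpu / reduce_on_gpu are hypothetical stand-ins, with a worker thread simulating the GPU stage so the two stages overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def map_on_cpu(chunk):
    # Stand-in for a map function the selector routed to the CPU.
    return [x * x for x in chunk]


def reduce_on_gpu(mapped_chunk):
    # Stand-in for a reduce function routed to the GPU; a real system would
    # launch a kernel here instead of summing on a worker thread.
    return sum(mapped_chunk)


def pipelined_map_reduce(chunks):
    """Map chunk i on the CPU while chunk i-1 is still being reduced on the 'GPU'."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as gpu_worker:
        pending = None
        for chunk in chunks:
            mapped = map_on_cpu(chunk)            # CPU stage for chunk i runs ...
            if pending is not None:
                results.append(pending.result())  # ... while the reduce of chunk i-1 finishes
            pending = gpu_worker.submit(reduce_on_gpu, mapped)
        if pending is not None:
            results.append(pending.result())
    return results


print(pipelined_map_reduce([[1, 2, 3], [4, 5], [6]]))  # -> [14, 41, 36]
```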

3. GPU assisted garbage collection
    Granted Patent (Status: Active)

    Publication No.: US08639730B2

    Publication Date: 2014-01-28

    Application No.: US13625362

    Filing Date: 2012-09-24

    CPC classification number: G06F12/0269

    Abstract: A system and method for efficient garbage collection are described. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). Owing to its architecture, the SPU efficiently performs the operations of a garbage collection algorithm on a local representation of the data objects stored in the memory. The SPU records a list of the changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.
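
    The protocol this abstract describes (the CPU sends a log of allocated objects, the SPU plans compaction and returns a list of changes) can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented design: LogEntry, MoveRecord, and plan_compaction are hypothetical names, liveness is passed in as a plain set, and the SPU's SIMD parallelism is not modeled.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class LogEntry:
    address: int   # heap address of an allocated data object
    size: int      # object size in bytes


@dataclass
class MoveRecord:
    old_address: int
    new_address: int
    size: int


def plan_compaction(log: List[LogEntry], live: Set[int], heap_base: int) -> List[MoveRecord]:
    """Build the list of changes that drops dead objects and slides live ones together."""
    changes: List[MoveRecord] = []
    next_free = heap_base
    for entry in sorted(log, key=lambda e: e.address):
        if entry.address not in live:
            continue                      # dead object: simply not copied
        if entry.address != next_free:
            changes.append(MoveRecord(entry.address, next_free, entry.size))
        next_free += entry.size           # live object now occupies the next free slot
    return changes


# Example: objects at 0x100 (live), 0x140 (dead), 0x180 (live); the live object at
# 0x180 slides down to 0x140.
log = [LogEntry(0x100, 0x40), LogEntry(0x140, 0x40), LogEntry(0x180, 0x20)]
print(plan_compaction(log, live={0x100, 0x180}, heap_base=0x100))
```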

4. GPU ASSISTED GARBAGE COLLECTION
    Patent Application (Status: Active)

    Publication No.: US20130036295A1

    Publication Date: 2013-02-07

    Application No.: US13625362

    Filing Date: 2012-09-24

    CPC classification number: G06F12/0269

    Abstract: A system and method for efficient garbage collection are described. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). Owing to its architecture, the SPU efficiently performs the operations of a garbage collection algorithm on a local representation of the data objects stored in the memory. The SPU records a list of the changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.
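
    Continuing the previous sketch, the final step the abstract describes (the CPU performing the operations in the returned list) might look like the following. Again, this is illustrative only: the bytearray heap model, MoveRecord, and apply_changes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MoveRecord:          # same shape as the record produced in the sketch above
    old_address: int
    new_address: int
    size: int


def apply_changes(heap: bytearray, heap_base: int, changes) -> None:
    """Replay the SPU's move records against the CPU-side heap."""
    for move in changes:
        src = move.old_address - heap_base
        dst = move.new_address - heap_base
        # Records come back in ascending address order, so each destination lies at or
        # below any source still to be read; Python slice assignment also copies the
        # source slice first, so overlap within a single move is safe.
        heap[dst:dst + move.size] = heap[src:src + move.size]


# Example: slide a 4-byte live object from offset 8 down to offset 4 in a toy heap.
heap = bytearray(b"AAAA....LIVE")
apply_changes(heap, heap_base=0, changes=[MoveRecord(8, 4, 4)])
print(heap)  # -> bytearray(b'AAAALIVELIVE')
```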
