SYSTEM AND METHOD FOR COMPILER SUPPORT FOR COMPILE TIME CUSTOMIZATION OF CODE

    Publication No.: US20190196797A1

    Publication date: 2019-06-27

    Application No.: US16287392

    Filing date: 2019-02-27

    CPC classification number: G06F8/41 G06F8/51

    Abstract: A system and method for processing source code for compilation. The method includes accessing a portion of host source code and determining whether the portion of the host source code comprises a device lambda expression. The method further includes in response to the portion of host code comprising the device lambda expression, determining a unique placeholder type instantiation based on the device lambda expression and modifying the device lambda expression based on the unique placeholder type instantiation to produce modified host source code. The method further includes sending the modified host source code to a host compiler.
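
    The rewrite described above can be sketched as follows. The sketch assumes device lambdas are written as `[captures] __device__ (params) { body }`. Each one is cut out of the host source and replaced by an instantiation of a distinct placeholder type, so the host compiler never sees the device body. The `Placeholder<N>` template name and the regex-based detection are hypothetical. Captured variables are not preserved here, which a real implementation would have to handle.

```python
import re
from typing import List, Tuple

# Matches the start of a lambda annotated as device code, e.g. "[=] __device__".
DEVICE_LAMBDA = re.compile(r"\[[^\]]*\]\s*__device__")

def rewrite_device_lambdas(src: str) -> Tuple[str, List[str]]:
    """Replace each device lambda with a uniquely tagged placeholder instantiation.

    Returns the modified host source and the extracted lambdas (list index == tag).
    """
    pieces: List[str] = []
    lambdas: List[str] = []
    pos = 0
    while (m := DEVICE_LAMBDA.search(src, pos)) is not None:
        body_start = src.index("{", m.end())
        depth, end = 0, body_start
        for end in range(body_start, len(src)):   # find the matching closing brace
            if src[end] == "{":
                depth += 1
            elif src[end] == "}":
                depth -= 1
                if depth == 0:
                    break
        tag = len(lambdas)                          # unique per lambda in this source
        lambdas.append(src[m.start():end + 1])      # kept for the device-side compile
        pieces.append(src[pos:m.start()])
        pieces.append(f"Placeholder<{tag}>()")      # hypothetical placeholder template
        pos = end + 1
    pieces.append(src[pos:])
    return "".join(pieces), lambdas
```

    For example, `launch(n, [=] __device__ (int i) { y[i] = a * x[i]; });` becomes `launch(n, Placeholder<0>());`. The lambda text is kept aside under tag 0.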

    Confluence analysis and loop fast-forwarding for improving SIMD execution efficiency

    Publication No.: US09612811B2

    Publication date: 2017-04-04

    Application No.: US14160426

    Filing date: 2014-01-21

    CPC classification number: G06F8/456 G06F8/443 G06F8/4452

    Abstract: One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
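
    The structural test in the abstract can be sketched with the classic iterative dominator computation. The test asks whether the divergent node dominates the candidate node and whether the candidate post-dominates the divergent node. The computation runs once on the control flow graph, and once on the reversed graph to get post-dominators. The sketch assumes a single entry and a single exit node. The graph encoding and the name `needs_restructuring` are hypothetical, and the predicate-insertion step is not shown.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # node -> successor nodes

def dominators(graph: Graph, entry: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over predecessors p."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in graph.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}      # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def reverse(graph: Graph) -> Graph:
    rev: Graph = {n: [] for n in graph}
    for n, succs in graph.items():
        for s in succs:
            rev.setdefault(s, []).append(n)
    return rev

def needs_restructuring(graph: Graph, entry: str, exit_: str,
                        divergent: str, candidate: str) -> bool:
    """True if divergent does not dominate candidate, or candidate does not post-dominate divergent."""
    dom = dominators(graph, entry)
    pdom = dominators(reverse(graph), exit_)  # post-dominators = dominators of reversed CFG
    return not (divergent in dom[candidate] and candidate in pdom[divergent])
```

    Consider a simple if/else diamond `A -> {B, C} -> D`. Here `A` dominates `D` and `D` post-dominates `A`, so no restructuring is needed. If a side entry `X -> C` is added, `A` no longer dominates `D`, and the check reports the region as a candidate.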

    Methods for reducing memory space in sequential operations using directed acyclic graphs
    Type: Granted invention patent (status: in force)

    Publication No.: US09563933B2

    Publication date: 2017-02-07

    Application No.: US14165789

    Filing date: 2014-01-28

    CPC classification number: G06T1/60 G06F8/34 G06F8/4434

    Abstract: Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging one of the predecessor vertices with one of the successor vertices by combining the operations of the predecessor vertex and the successor vertex if the predecessor and successor vertices are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and the successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.
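
    Below is a minimal sketch of the merge rule. It assumes the DAG is given as a map from each vertex to its list of fused operations, plus a set of (predecessor, successor) edges. The function name `fuse_dag` and the example image-processing operation names are hypothetical. Each merge removes one edge, and with it one intermediate buffer.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (predecessor, successor); each edge is one intermediate buffer

def fuse_dag(ops: Dict[str, List[str]],
             edges: Set[Edge]) -> Tuple[Dict[str, List[str]], Set[Edge]]:
    """Repeatedly merge a predecessor into its successor when that edge is its only out-edge."""
    ops = {v: list(seq) for v, seq in ops.items()}
    edges = set(edges)
    while True:
        out_degree = {v: 0 for v in ops}
        for pred, _ in edges:
            out_degree[pred] += 1
        merge = next((e for e in sorted(edges) if out_degree[e[0]] == 1), None)
        if merge is None:
            return ops, edges
        pred, succ = merge
        ops[succ] = ops[pred] + ops[succ]     # fused vertex runs pred's work, then succ's
        del ops[pred]
        edges.discard(merge)                  # the buffer between them disappears
        # Edges that fed `pred` now feed the fused vertex (pred had no other out-edges).
        edges = {(a, succ if b == pred else b) for a, b in edges}
```

    Fusing a predecessor whose only out-edge leads to the successor cannot create a cycle. Any new path through the fused vertex already existed through the predecessor.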

    Technique for live analysis-based rematerialization to reduce register pressures and enhance parallelism
    Type: Granted invention patent (status: in force)

    Publication No.: US09436447B2

    Publication date: 2016-09-06

    Application No.: US13669401

    Filing date: 2012-11-05

    Abstract: A device compiler and linker within a parallel processing unit (PPU) is configured to optimize program code of a co-processor enabled application by rematerializing a subset of live-in variables for a particular block in a control flow graph generated for that program code. The device compiler and linker identifies the block of the control flow graph that has the greatest number of live-in variables, then selects a subset of the live-in variables associated with the identified block for which rematerializing confers the greatest estimated profitability. The profitability of rematerializing a given subset of live-in variables is determined based on the number of live-in variables reduced, the cost of rematerialization, and the potential risk of rematerialization.
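
    The selection step can be sketched as below. Everything about the scoring is an assumption. The abstract names the three factors but not how they combine, so this sketch uses profit = (live-ins removed) − (recompute cost) − (risk). Rematerializing `v` is modeled as replacing it with its operands at block entry, one level only, with no chains. All names and numbers are hypothetical, and the subset search is brute force, suitable only for small blocks.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

def pick_block(live_in: Dict[str, Set[str]]) -> str:
    """Block with the greatest number of live-in variables."""
    return max(live_in, key=lambda b: len(live_in[b]))

def profit(live: Set[str], subset: FrozenSet[str], operands: Dict[str, Set[str]],
           cost: Dict[str, float], risk: Dict[str, float]) -> float:
    # Live-ins after rematerializing `subset`: the subset leaves, its operands enter.
    after = (live - subset) | set().union(*(operands[v] for v in subset))
    reduced = len(live) - len(after)
    return reduced - sum(cost[v] + risk[v] for v in subset)

def best_subset(live: Set[str], operands: Dict[str, Set[str]],
                cost: Dict[str, float], risk: Dict[str, float]) -> Tuple[FrozenSet[str], float]:
    best, best_score = frozenset(), 0.0       # rematerializing nothing scores zero
    for k in range(1, len(live) + 1):
        for combo in combinations(sorted(live), k):
            subset = frozenset(combo)
            score = profit(live, subset, operands, cost, risk)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score
```

    The operand term is what makes this more than a per-variable filter. Recomputing `x = p * 2` only helps if `p` is already live, so variables that share operands are worth more together than apart.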

    SYSTEM AND METHOD FOR INSERTING SYNCHRONIZATION STATEMENTS INTO A PROGRAM FILE TO MITIGATE RACE CONDITIONS
    Type: Invention patent application (status: pending, published)

    Publication No.: US20140143755A1

    Publication date: 2014-05-22

    Application No.: US13681554

    Filing date: 2012-11-20

    CPC classification number: G06F8/458

    Abstract: A system and method are provided for inserting synchronization statements into a program file to mitigate race conditions. The method includes reading a program file and determining one or more convergent statements in the program file. The method also includes inserting one or more synchronization statements in the program file between the determined convergent statements. The method further includes removing one or more of the inserted synchronization statements and writing the modified program file. The method may include, after removing the inserted synchronization statements, identifying to a user any remaining inserted synchronization statements.
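
    The insert-then-prune flow can be sketched as follows. The sketch assumes straight-line code in which every statement is annotated with the shared variables it reads and writes. The pruning rule is an assumption, not taken from the abstract. A barrier is kept only if the next statement has a read-after-write, write-after-read, or write-after-write conflict with shared accesses made since the last kept barrier. The CUDA-style `__syncthreads();` barrier text is likewise an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Union

BARRIER = "__syncthreads();"  # assumed CUDA-style block-level barrier

@dataclass(frozen=True)
class Stmt:
    text: str
    convergent: bool = True                   # executed by all threads of the block
    reads: FrozenSet[str] = frozenset()       # shared variables read
    writes: FrozenSet[str] = frozenset()      # shared variables written

Item = Union[Stmt, str]

def insert_barriers(stmts: List[Stmt]) -> List[Item]:
    """Step 1: place a barrier between every pair of adjacent convergent statements."""
    out: List[Item] = []
    for i, s in enumerate(stmts):
        if i > 0 and stmts[i - 1].convergent and s.convergent:
            out.append(BARRIER)
        out.append(s)
    return out

def prune_barriers(prog: List[Item]) -> List[Item]:
    """Step 2: drop inserted barriers that separate no conflicting shared accesses."""
    kept: List[Item] = []
    reads: Set[str] = set()
    writes: Set[str] = set()
    barrier_pending = False
    for item in prog:
        if isinstance(item, str):             # an inserted barrier
            barrier_pending = True
            continue
        if barrier_pending:
            raw = item.reads & writes                 # read-after-write
            war_waw = item.writes & (reads | writes)  # write-after-read / write-after-write
            if raw or war_waw:
                kept.append(BARRIER)
                reads, writes = set(), set()          # new synchronization epoch
            barrier_pending = False
        kept.append(item)
        reads |= item.reads
        writes |= item.writes
    return kept
```

    The abstract's final step reports any inserted barriers that survive pruning to the user. In this sketch, those are the string items left in the output of `prune_barriers`.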

    SYSTEM AND METHOD FOR COMPILER SUPPORT FOR KERNEL LAUNCHES IN DEVICE CODE
    Type: Invention patent application (status: in force)

    Publication No.: US20130300752A1

    Publication date: 2013-11-14

    Application No.: US13735981

    Filing date: 2013-01-07

    CPC classification number: G06F9/54 G06F8/41 G06F9/4843

    Abstract: A system and method for compiling source code (e.g., with a compiler). The method includes accessing a portion of device source code and determining whether the portion of the device source code comprises a piece of work to be launched on a device from the device. The method further includes determining a plurality of application programming interface (API) calls based on the piece of work to be launched on the device and generating compiled code based on the plurality of API calls. The compiled code comprises a first portion operable to execute on a central processing unit (CPU) and a second portion operable to execute on the device (e.g., GPU).
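
    Lowering a device-side launch into a sequence of runtime calls can be sketched as a source-to-source rewrite. The call names `get_param_buffer`, `store_param`, and `launch_device` are hypothetical stand-ins, not a real API. The parameter buffer also ignores argument sizes and alignment. The regex handles only the simple `kernel<<<grid, block>>>(args);` form, without commas inside the grid or block expressions.

```python
import re
from typing import List

# kernel<<<grid, block>>>(args);  (the only launch form handled by this sketch)
LAUNCH = re.compile(r"(\w+)\s*<<<\s*([^,>]+?)\s*,\s*([^,>]+?)\s*>>>\s*\(([^)]*)\)\s*;")

def lower_device_launch(stmt: str) -> List[str]:
    """Rewrite one device-side launch into a sequence of (hypothetical) runtime calls."""
    m = LAUNCH.fullmatch(stmt.strip())
    if m is None:
        return [stmt]                                   # not a launch: leave unchanged
    kernel, grid, block, args = m.groups()
    arg_list = [a.strip() for a in args.split(",") if a.strip()]
    calls = [f"void *buf = get_param_buffer({len(arg_list)});"]   # reserve argument storage
    calls += [f"store_param(buf, {i}, {a});" for i, a in enumerate(arg_list)]
    calls.append(f"launch_device((void *){kernel}, buf, {grid}, {block});")
    return calls
```

    For example, `child<<<blocks, 128>>>(data, n);` becomes four calls. One reserves the buffer, two store the arguments, and one performs the launch.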

    TECHNIQUES FOR INFERRING INFORMATION

    Publication No.: US20230123811A1

    Publication date: 2023-04-20

    Application No.: US17503210

    Filing date: 2021-10-15

    Abstract: Apparatuses, systems, and techniques to infer information from one or more sets of data. In at least one embodiment, a processor uses one or more neural networks to infer information from one or more sets of data based, at least in part, on one or more dynamically configurable dimensions of the one or more sets of data.

    CODE COVERAGE GENERATION IN GPU BY USING HOST-DEVICE COORDINATION

    Publication No.: US20190108006A1

    Publication date: 2019-04-11

    Application No.: US16154542

    Filing date: 2018-10-08

    Abstract: System and method of compiling a program having a mixture of host code and device code to enable code coverage data collection for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate an instrumented executable code which includes: coverage instrumentation counters for the device functions; mapping information that maps the counters with the instrumented source points; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected code coverage information from the device memory to the host memory. Execution of the instrumented executable can yield a coverage report on the device code functions.
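
    Below is a host-side simulation of the coverage flow, assuming one counter per instrumented source point. `FakeDeviceMemory`, `COVERAGE_MAP`, the `saxpy` device function, and its line numbers are all hypothetical. The loop over `tid` stands in for a kernel launch. On a real GPU the counter increments would need to be atomic.

```python
from typing import Dict, List, Tuple

SourcePoint = Tuple[str, int]  # (device function, source line)

# Hypothetical mapping information emitted by the compiler: counter index -> source point.
COVERAGE_MAP: List[SourcePoint] = [("saxpy", 3), ("saxpy", 4)]

class FakeDeviceMemory:
    """Stand-in for GPU global memory; a real tool would allocate through the CUDA runtime."""
    def __init__(self) -> None:
        self._buffers: Dict[int, List[int]] = {}

    def alloc_zeroed(self, count: int) -> int:
        handle = len(self._buffers)
        self._buffers[handle] = [0] * count
        return handle

    def view(self, handle: int) -> List[int]:
        return self._buffers[handle]          # what device code writes through

    def copy_to_host(self, handle: int) -> List[int]:
        return list(self._buffers[handle])    # snapshot, like a device-to-host copy

def saxpy_instrumented(counters: List[int], tid: int, n: int,
                       a: float, x: List[float], y: List[float]) -> None:
    counters[0] += 1                           # point ("saxpy", 3): function entry
    if tid < n:
        counters[1] += 1                       # point ("saxpy", 4): bounds-check body
        y[tid] = a * x[tid] + y[tid]

def run_with_coverage(num_threads: int, n: int) -> Dict[SourcePoint, int]:
    dev = FakeDeviceMemory()
    handle = dev.alloc_zeroed(len(COVERAGE_MAP))  # host allocates and zeroes counters
    x, y = [1.0] * n, [2.0] * n
    for tid in range(num_threads):                # serial stand-in for a kernel launch
        saxpy_instrumented(dev.view(handle), tid, n, 3.0, x, y)
    counts = dev.copy_to_host(handle)             # host retrieves collected counters
    return dict(zip(COVERAGE_MAP, counts))        # coverage report keyed by source point
```

    Launching 8 threads over 5 elements records 8 hits at the entry point and 5 inside the bounds check. The gap is what makes the report useful for spotting partially covered branches.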