MULTI-GPU DEEP LEARNING USING CPUS
    Patent application

    Publication No.: US20190188560A1

    Publication date: 2019-06-20

    Application No.: US15843244

    Filing date: 2017-12-15

    IPC classification: G06N3/08 G06N3/04 G06T1/20

    CPC classification: G06N3/08 G06N3/04 G06T1/20

    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for accelerating neural network data parallel training in multiple graphics processing units (GPUs) using at least one central processing unit (CPU). The method includes forming a set of chunks. Each of the chunks includes a respective group of neural network layers other than a last layer. The method further includes performing one or more chunk-wise synchronization operations during a backward phase of the neural network data parallel training, by each of the multiple GPUs and the at least one CPU.
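The chunking idea in this abstract can be illustrated with a small Python sketch. It is not the patented implementation: the abstract does not say what the synchronization operation is, so plain gradient averaging across workers is assumed here, and the names `form_chunks`, `average_chunk`, `backward_with_chunk_sync`, and `chunk_size` are hypothetical. Gradients are modeled as plain lists rather than GPU buffers.

```python
from typing import Dict, List, Tuple

Grads = Dict[str, List[float]]


def form_chunks(layer_names: List[str], chunk_size: int) -> List[List[str]]:
    """Group every layer except the last one into consecutive chunks."""
    body = layer_names[:-1]  # the last layer is left out of the chunks
    return [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]


def average_chunk(chunk: List[str], per_gpu_grads: List[Grads]) -> Grads:
    """Average one chunk's gradients across workers (assumed sync operation)."""
    n = len(per_gpu_grads)
    return {
        name: [sum(vals) / n for vals in zip(*(g[name] for g in per_gpu_grads))]
        for name in chunk
    }


def backward_with_chunk_sync(
    layer_names: List[str], per_gpu_grads: List[Grads], chunk_size: int
) -> Tuple[Grads, List[List[str]]]:
    """Visit chunks in backward order and synchronize each one as a unit."""
    synced: Grads = {}
    order: List[List[str]] = []
    for chunk in reversed(form_chunks(layer_names, chunk_size)):
        synced.update(average_chunk(chunk, per_gpu_grads))
        order.append(chunk)
    return synced, order


layers = ["conv1", "conv2", "fc1", "fc2", "softmax"]
grads_gpu0 = {name: [1.0, 2.0] for name in layers}
grads_gpu1 = {name: [3.0, 4.0] for name in layers}
synced, order = backward_with_chunk_sync(layers, [grads_gpu0, grads_gpu1], chunk_size=2)
```

Synchronizing per chunk rather than per layer reduces the number of synchronization calls, while still letting chunks nearer the output be synchronized before earlier layers have finished their backward pass.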

    Methodology for fast detection of false sharing in threaded scientific codes
    Granted patent (status: in force)

    Publication No.: US08898648B2

    Publication date: 2014-11-25

    Application No.: US13689927

    Filing date: 2012-11-30

    CPC classification: G06F11/3624

    Abstract: A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
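The final check in this abstract, whether two different portions of one cache line are accessed, can be sketched in Python over a recorded access trace. This is an illustration under assumptions, not the patented libraries: the 64-byte line size, the `Access` record, and `find_false_sharing` are all hypothetical. The rule used here (different threads, disjoint bytes, at least one write) is the common definition of false sharing rather than something the abstract spells out.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set

CACHE_LINE_BYTES = 64  # assumed line size; real hardware may differ


class Access(NamedTuple):
    thread_id: int
    address: int    # byte address of the access
    size: int       # number of bytes touched
    is_write: bool


def find_false_sharing(trace: List[Access]) -> Set[int]:
    """Return cache-line indices where two threads touch disjoint bytes of the
    same line and at least one of them writes."""
    # line index -> thread id -> [set of byte offsets touched, wrote anything?]
    per_line = defaultdict(lambda: defaultdict(lambda: [set(), False]))
    for acc in trace:
        for byte in range(acc.address, acc.address + acc.size):
            entry = per_line[byte // CACHE_LINE_BYTES][acc.thread_id]
            entry[0].add(byte % CACHE_LINE_BYTES)
            entry[1] = entry[1] or acc.is_write

    flagged: Set[int] = set()
    for line, threads in per_line.items():
        items = list(threads.values())
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                bytes_a, wrote_a = items[i]
                bytes_b, wrote_b = items[j]
                if bytes_a.isdisjoint(bytes_b) and (wrote_a or wrote_b):
                    flagged.add(line)
    return flagged


trace = [
    Access(0, 0, 8, True),     # thread 0 writes bytes 0..7 of line 0
    Access(1, 8, 8, True),     # thread 1 writes bytes 8..15 of line 0
    Access(0, 128, 8, True),   # both threads touch the same bytes of line 2
    Access(1, 128, 8, False),  # (true sharing, not false sharing)
    Access(2, 256, 8, False),  # read-only neighbours on line 4
    Access(3, 264, 8, False),
]
suspects = find_false_sharing(trace)
```

Tracking byte offsets per thread is what separates false sharing from true sharing: threads that touch the same bytes are really sharing data, whereas threads that touch disjoint bytes are only colliding on the cache line.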


    ReLU compression to reduce GPU memory

    Publication No.: US11362670B2

    Publication date: 2022-06-14

    Application No.: US17085196

    Filing date: 2020-10-30

    Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphical processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region, generating a nonzero data number table region by employing a parallel reduction algorithm, calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm, allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.
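The steps listed in this abstract (bitmap, per-block counts, prefix-sum offsets, buffer allocation, copy) can be walked through in a sequential Python sketch. It only mirrors the data flow: on a GPU the counting, scan, and copy steps would run in parallel, whereas here they are ordinary loops. The block size of 4 and the names `compress` and `decompress` are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Tuple

BLOCK = 4  # assumed block size, chosen small so the example is easy to trace


def compress(data: List[float]) -> Tuple[List[int], List[int], List[int], List[float]]:
    # 1. Nonzero bitmap: one flag per element of the ReLU output.
    bitmap = [1 if x != 0 else 0 for x in data]
    # 2. Nonzero count per block (a parallel reduction on a GPU).
    counts = [sum(bitmap[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]
    # 3. Exclusive prefix sum: where each block starts in the compressed array.
    offsets = [0] + list(accumulate(counts))[:-1]
    # 4. Allocate the output buffer, then copy nonzeros block by block.
    #    Blocks write to disjoint ranges, so they could run independently.
    values = [0.0] * sum(counts)
    for b, start in enumerate(offsets):
        k = start
        for x in data[b * BLOCK:(b + 1) * BLOCK]:
            if x != 0:
                values[k] = x
                k += 1
    return bitmap, counts, offsets, values


def decompress(bitmap: List[int], values: List[float]) -> List[float]:
    """Rebuild the original array from the bitmap and the packed nonzeros."""
    it = iter(values)
    return [next(it) if bit else 0.0 for bit in bitmap]


relu_out = [0.0, 1.5, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0]
bitmap, counts, offsets, values = compress(relu_out)
restored = decompress(bitmap, values)
```

The exclusive prefix sum is what makes the parallel copy possible: each block knows its output offset in advance, so no block has to wait for another to finish writing.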

    FILE SYSTEM FOR GENOMIC DATA
    Patent application (status: under examination, published)

    Publication No.: US20170060896A1

    Publication date: 2017-03-02

    Application No.: US14833960

    Filing date: 2015-08-24

    IPC classification: G06F17/30

    CPC classification: G06F16/1744 G06F16/1794

    Abstract: Methods and systems for managing data redundancy include registering certified commands, input files, output files, and arguments in an execution history list after execution of said certified commands. An existing output file is provided in response to execution of a first certified command that matches an entry in the execution history list. A file is deleted if the file is reproducible from another file using a second certified command. The deleted file is registered in a reproducible file list. The deleted file is reproduced upon request using the second certified command.
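The bookkeeping described in this abstract can be modeled with a small in-memory Python sketch. Everything here is a simplifying assumption: `RedundancyManager` and its methods are hypothetical names, files are strings in a dictionary, a "certified command" is a deterministic function of one input file, and command arguments are left out.

```python
from typing import Callable, Dict, Tuple


class RedundancyManager:
    def __init__(self, commands: Dict[str, Callable[[str], str]]):
        self.commands = commands                             # certified commands
        self.files: Dict[str, str] = {}                      # name -> contents
        self.history: Dict[Tuple[str, str], str] = {}        # (command, input) -> output
        self.reproducible: Dict[str, Tuple[str, str]] = {}   # deleted file -> recipe
        self.executions = 0

    def run(self, command: str, input_file: str, output_file: str) -> str:
        """Execute a command, or hand back an existing output on a history match."""
        key = (command, input_file)
        if key in self.history and self.history[key] in self.files:
            return self.history[key]
        self.files[output_file] = self.commands[command](self.files[input_file])
        self.executions += 1
        self.history[key] = output_file  # registered after execution
        return output_file

    def delete_if_reproducible(self, name: str) -> bool:
        """Delete a file only if the history shows how to regenerate it."""
        for (command, input_file), output in self.history.items():
            if output == name and input_file in self.files:
                del self.files[name]
                self.reproducible[name] = (command, input_file)
                return True
        return False

    def read(self, name: str) -> str:
        """Return file contents, regenerating a deleted file on request."""
        if name not in self.files and name in self.reproducible:
            command, input_file = self.reproducible.pop(name)
            self.files[name] = self.commands[command](self.files[input_file])
        return self.files[name]


mgr = RedundancyManager({"upper": str.upper})
mgr.files["reads.txt"] = "acgt"
first = mgr.run("upper", "reads.txt", "out1.txt")
second = mgr.run("upper", "reads.txt", "out2.txt")  # history match, no re-run
deleted = mgr.delete_if_reproducible("out1.txt")
gone = "out1.txt" not in mgr.files
restored = mgr.read("out1.txt")
```

The scheme is only safe when commands are deterministic, which is presumably why the abstract restricts it to certified commands. A source file with no history entry, such as `reads.txt` above, is never deleted.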


    ReLU COMPRESSION TO REDUCE GPU MEMORY

    Publication No.: US20220140841A1

    Publication date: 2022-05-05

    Application No.: US17085196

    Filing date: 2020-10-30

    Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphical processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region, generating a nonzero data number table region by employing a parallel reduction algorithm, calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm, allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.