GRAPHICS PROCESSING UNIT OPERATION
    Invention application

    Publication number: US20200302568A1

    Publication date: 2020-09-24

    Application number: US15779368

    Filing date: 2016-12-06

    Abstract: A system and method for distributed computing including a compute node having a graphics processing unit (GPU) to execute tasks of a distributed computing job. A distributed-computing programming framework executes the tasks on the compute node. A GPU-daemon process shares GPU resources between the tasks executing on the GPU of the compute node.
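The abstract does not say how the GPU-daemon process shares the device, so the following is only a minimal sketch of one possible arrangement: a single long-lived worker that serializes requests from several tasks through a queue. The class name `GpuDaemon`, the `submit` method, and the use of plain Python callables as stand-ins for GPU kernels are all hypothetical and not taken from the patent.

```python
import queue
import threading
from concurrent.futures import Future


class GpuDaemon:
    """Toy stand-in for a daemon that owns the GPU and serves many tasks."""

    def __init__(self):
        self._requests = queue.Queue()
        # One worker thread models the single process that talks to the GPU.
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def submit(self, kernel, *args):
        """Queue a kernel call from a task; returns a Future for its result."""
        done = Future()
        self._requests.put((kernel, args, done))
        return done

    def _serve(self):
        # Requests run one at a time, so tasks share the device by turn
        # (an assumed policy; the abstract does not specify one).
        while True:
            kernel, args, done = self._requests.get()
            try:
                done.set_result(kernel(*args))
            except Exception as exc:
                done.set_exception(exc)


def scale(values, factor):
    # Placeholder for work that would really run on the GPU.
    return [v * factor for v in values]
```

Two tasks of the same job could then call `daemon.submit(scale, [1, 2, 3], 2)` independently and wait on the returned futures, without either task opening the device itself.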

    Data stored or free space based FIFO buffer

    Publication number: US12271319B2

    Publication date: 2025-04-08

    Application number: US17054762

    Filing date: 2018-09-27

    Abstract: Systems, methods, and computer-readable media are provided for variable precision first in, first out (FIFO) buffers (VPFBs) that dynamically change the amount of data to be stored in the VPFB based on a current amount of data stored in the VPFB and/or based on a current amount of available memory space of the VPFB. The currently unavailable memory space (or the currently available memory space) is used to select the size of a next data block to be stored in the VPFB. Other embodiments are disclosed and/or claimed.
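To make the idea of choosing the next block size from the fill level concrete, here is a small sketch. The thresholds (50% and 75%), the block widths (16, 8, and 4 bits), the truncation scheme, and the class name `VariablePrecisionFifo` are illustrative assumptions; the abstract states only that the occupied or free space selects the size of the next block.

```python
from collections import deque


class VariablePrecisionFifo:
    """Toy FIFO whose next block size depends on how full it is."""

    def __init__(self, capacity_bits):
        self.capacity_bits = capacity_bits
        self.used_bits = 0
        self._blocks = deque()  # each entry: (payload, size_bits)

    def next_block_bits(self):
        # Hypothetical policy: store fewer bits per sample as space runs out.
        fill = self.used_bits / self.capacity_bits
        if fill < 0.5:
            return 16
        if fill < 0.75:
            return 8
        return 4

    def push(self, sample):
        """Store a 16-bit unsigned sample, truncated to the selected width."""
        bits = self.next_block_bits()
        if self.used_bits + bits > self.capacity_bits:
            return False  # no room even at the reduced precision
        payload = sample >> (16 - bits)  # keep the most significant bits
        self._blocks.append((payload, bits))
        self.used_bits += bits
        return True

    def pop(self):
        """Return the oldest sample rescaled to 16 bits, plus its stored width."""
        payload, bits = self._blocks.popleft()
        self.used_bits -= bits
        return payload << (16 - bits), bits
```

With a 64-bit capacity, the first two samples are stored at 16 bits each; the third arrives at a 50% fill level and is stored at 8 bits, so precision degrades gradually instead of the buffer rejecting data outright.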

    METHOD AND APPARATUS TO IMPROVE SHARED MEMORY EFFICIENCY

    Publication number: US20190042412A1

    Publication date: 2019-02-07

    Application number: US15757727

    Filing date: 2015-09-25

    Abstract: Methods and apparatus to improve shared memory efficiency are described. In an embodiment, a first version of a code to access one or more registers as shared local memory is compiled. A second version of the same code is also compiled to access a cache as the shared local memory. The first version of the code is executed in response to a comparison of a work group size of the code with a threshold value. Other embodiments are also disclosed and claimed.
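The selection step can be sketched as a simple dispatch between two precompiled variants. The abstract does not give the direction of the comparison or the threshold, so the rule below (register variant when the work group size is at or below the threshold) and the value 16 are assumptions, and both functions are hypothetical placeholders for the two compiled versions.

```python
def run_register_variant(data):
    # Placeholder for the version compiled to use registers as shared local memory.
    return sum(data)


def run_cache_variant(data):
    # Placeholder for the version compiled to use a cache as shared local memory.
    return sum(data)


def select_kernel(work_group_size, threshold=16):
    """Pick a compiled variant by comparing the work group size to a threshold.

    Assumed rule: small work groups fit in registers, larger ones use the cache.
    """
    if work_group_size <= threshold:
        return run_register_variant
    return run_cache_variant


def launch(data, work_group_size, threshold=16):
    kernel = select_kernel(work_group_size, threshold)
    return kernel.__name__, kernel(data)
```

Because both variants compute the same result, the dispatch changes only where the shared data lives, which is the point of compiling the same code twice.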

    METHODS AND APPARATUS TO ACCELERATE CONVOLUTION

    Publication number: US20240265232A1

    Publication date: 2024-08-08

    Application number: US18565972

    Filing date: 2021-09-24

    CPC classification number: G06N3/04

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed. An example apparatus includes at least one memory, instructions in the apparatus, and processor circuitry to execute the instructions to detect a pattern of an upsampled input submatrix, generate a transformed input submatrix by selecting four elements of the upsampled input submatrix, select a transformed weight submatrix based on the pattern, and convolve the transformed input submatrix and the transformed weight submatrix.
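One way to read these steps, offered only as a sketch under stated assumptions: if the input was upsampled 2x by nearest-neighbour repetition and the kernel is 3x3, every 3x3 window of the upsampled matrix contains just four distinct source values, arranged in a pattern fixed by the parity of the window position. The nine weights can then be pre-summed into a 2x2 matrix per pattern. The upsampling method, the kernel size, and all function names below are assumptions, not details from the abstract.

```python
def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a list-of-lists matrix."""
    return [[x[i // 2][j // 2] for j in range(2 * len(x[0]))]
            for i in range(2 * len(x))]


def axis_pattern(offset):
    # Which of the two source rows (or columns) each of the three window
    # rows (or columns) repeats; it depends only on the offset's parity.
    return (0, 0, 1) if offset % 2 == 0 else (0, 1, 1)


def transformed_weights(w, row_pat, col_pat):
    """Sum the 3x3 weights that multiply the same source element into 2x2."""
    tw = [[0, 0], [0, 0]]
    for i in range(3):
        for j in range(3):
            tw[row_pat[i]][col_pat[j]] += w[i][j]
    return tw


def fast_window(x, w, r, c):
    """Output for the window at (r, c) of the upsampled matrix, using 4 MACs."""
    tw = transformed_weights(w, axis_pattern(r), axis_pattern(c))
    r0, c0 = r // 2, c // 2  # top-left of the four selected source elements
    return sum(tw[a][b] * x[r0 + a][c0 + b]
               for a in range(2) for b in range(2))


def direct_window(u, w, r, c):
    """Reference: plain 3x3 multiply-accumulate on the upsampled matrix."""
    return sum(w[i][j] * u[r + i][c + j]
               for i in range(3) for j in range(3))
```

Since there are only four parity combinations, the four transformed weight matrices could be computed once and looked up per window, leaving four multiplications per output instead of nine.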
