DEEP LEARNING THREAD COMMUNICATION
    Invention application

    Publication number: US20200334076A1

    Publication date: 2020-10-22

    Application number: US16389548

    Filing date: 2019-04-19

    Abstract: An application binary interface (ABI) can be exposed in a processor to enable blocks of threads, which may correspond to separately compiled operators, to communicate without storing data to global memory external to the processor. The ABI can define how the results of one computation, corresponding to a first thread block, will be organized in the registers and shared memory of the processor at the end of one operator (i.e., kernel). The start of the next operator (i.e., kernel), corresponding to a second thread block, can then consume those results directly from the registers and shared memory. Data can be stored to processor-local storage for individual threads as they exit the block. Once the ABI is published, libraries can be compiled, optimized, and tested separately, as long as they adhere to it.
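The hand-off described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patent's implementation and not any real GPU API: the names `HandoffABI`, `OnChipState`, `check_layout`, `producer`, and `consumer` are hypothetical, and plain Python lists stand in for per-thread registers and block shared memory. The point it shows is that the two operators share nothing except the published layout contract, which each side checks independently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HandoffABI:
    """Hypothetical ABI that both operators are compiled against.

    It fixes how many values each thread leaves in registers and how many
    words the block leaves in shared memory when the first operator ends.
    """
    threads_per_block: int
    regs_per_thread: int
    smem_words: int


@dataclass
class OnChipState:
    """Simulated on-chip storage handed from one thread block to the next."""
    registers: List[List[float]]  # registers[tid][r], stand-in for register files
    shared: List[float]           # stand-in for block-wide shared memory


def check_layout(abi: HandoffABI, state: OnChipState) -> None:
    """Reject any state that does not match the published layout."""
    if len(state.registers) != abi.threads_per_block:
        raise ValueError("wrong number of threads")
    if any(len(r) != abi.regs_per_thread for r in state.registers):
        raise ValueError("wrong number of registers per thread")
    if len(state.shared) != abi.smem_words:
        raise ValueError("wrong shared-memory size")


def producer(abi: HandoffABI, inputs: List[float]) -> OnChipState:
    """First operator: each thread leaves its results on chip as it exits."""
    block_inputs = inputs[:abi.threads_per_block]
    # Assumed layout: r0 = x * x and r1 = x + 1 for this thread's input x.
    registers = [[x * x, x + 1.0] for x in block_inputs]
    # Assumed layout: shared[0] holds the block-wide sum of the inputs.
    shared = [sum(block_inputs)]
    state = OnChipState(registers, shared)
    check_layout(abi, state)
    return state


def consumer(abi: HandoffABI, state: OnChipState) -> List[float]:
    """Second operator: reads the producer's results from on-chip state only."""
    check_layout(abi, state)
    block_sum = state.shared[0]
    return [r0 + r1 + block_sum for r0, r1 in state.registers]


if __name__ == "__main__":
    abi = HandoffABI(threads_per_block=4, regs_per_thread=2, smem_words=1)
    state = producer(abi, [1.0, 2.0, 3.0, 4.0])
    print(consumer(abi, state))  # prints [13.0, 17.0, 23.0, 31.0]
```

Because `consumer` validates the state against the same `HandoffABI` rather than against `producer` itself, either side could be rewritten and recompiled separately; a consumer built against a different layout (for example, three registers per thread) fails the check instead of silently misreading the data.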
