EFFICIENTLY PERFORMING INFERENCE COMPUTATIONS OF A FULLY CONVOLUTIONAL NETWORK FOR INPUTS WITH DIFFERENT SIZES

    Publication No.: WO2023075742A1

    Publication Date: 2023-05-04

    Application No.: PCT/US2021/056418

    Application Date: 2021-10-25

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing inference computations of a fully convolutional neural network receiving inputs with different sizes. One of the methods includes receiving a new input to be processed by a fully convolutional neural network, the new input having a first size different from a fixed size that the fully convolutional neural network is configured to process; determining one or more fixed-size inputs from the new input, each fixed-size input having the fixed size; obtaining a respective fixed-size output generated by the fully convolutional neural network performing inference computations for each of the one or more fixed-size inputs; and generating, from the respective fixed-size outputs comprising one or more invalid pixel values, a final output that is equivalent to an output that would be generated by processing the new input using the fully convolutional neural network.
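
    The tiling idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the "network" here is assumed to be a single zero-padded 3x3 convolution, and the names `conv_same`, `fixed_size_network`, and `tiled_inference`, as well as the tile size of 8, are hypothetical. Each tile is read with a halo of neighboring pixels, the border pixels of each tile output are treated as invalid and discarded, and the remaining centers are stitched together.

```python
import numpy as np


def conv_same(x, kernel):
    """Zero-padded ('same') 2D cross-correlation; stands in for a whole FCN."""
    k = kernel.shape[0]
    r = k // 2
    padded = np.pad(x, r)
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += kernel[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out


def fixed_size_network(x, kernel, size):
    """Hypothetical compiled network that only accepts size x size inputs."""
    assert x.shape == (size, size), "network only processes the fixed size"
    return conv_same(x, kernel)


def tiled_inference(x, kernel, tile=8):
    """Process an input of arbitrary size using only fixed-size inferences."""
    r = kernel.shape[0] // 2      # halo width = receptive-field radius
    core = tile - 2 * r           # valid pixels per tile output, per axis
    h, w = x.shape
    n_rows = -(-h // core)        # ceiling division
    n_cols = -(-w // core)

    # Zero-pad so every tile is full-sized; the zeros at the image border
    # match the zero padding the network would apply to the whole input.
    padded = np.zeros((n_rows * core + 2 * r, n_cols * core + 2 * r))
    padded[r:r + h, r:r + w] = x

    out = np.zeros((n_rows * core, n_cols * core))
    for i in range(n_rows):
        for j in range(n_cols):
            y0, x0 = i * core, j * core
            tile_out = fixed_size_network(
                padded[y0:y0 + tile, x0:x0 + tile], kernel, tile)
            # The outer r pixels of each tile output are invalid: they were
            # computed against the tile's own padding, not real neighbors.
            out[y0:y0 + core, x0:x0 + core] = tile_out[r:tile - r, r:tile - r]
    return out[:h, :w]


rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))
large = rng.standard_normal((13, 21))   # larger than the fixed size
small = rng.standard_normal((5, 4))     # smaller than the fixed size
large_matches = np.allclose(tiled_inference(large, kernel), conv_same(large, kernel))
small_matches = np.allclose(tiled_inference(small, kernel), conv_same(small, kernel))
```

    In this sketch both comparisons hold, because every retained output pixel sees exactly the neighbors it would see in the full input. A deeper network would need a wider halo, matching its total receptive field.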

    PARAMETER CACHING FOR NEURAL NETWORK ACCELERATORS

    Publication No.: WO2021126194A1

    Publication Date: 2021-06-24

    Application No.: PCT/US2019/067289

    Application Date: 2019-12-18

    Applicant: GOOGLE LLC

    Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes obtaining data specifying one or more neural networks to be deployed on a neural network hardware accelerator, each of the one or more neural networks having a respective set of parameters, and the neural network hardware accelerator having one or more memories having a memory capacity; determining a maximum amount of the memory capacity that will be in use at any one time during a processing of any of the one or more neural networks by the neural network hardware accelerator; identifying a subset of the parameters of the one or more neural networks that consumes an amount of memory that is less than a difference between the memory capacity and the determined maximum amount of the memory capacity; and storing the identified subset of the parameters.
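
    The selection step in this abstract can be sketched as follows. The abstract does not say how the subset is chosen, so the greedy, in-order policy below is an assumption, and the function name `select_cached_parameters`, the tensor names, and the byte counts are all hypothetical.

```python
def select_cached_parameters(param_sizes, memory_capacity, peak_usage_per_network):
    """Pick parameters to keep resident in accelerator memory.

    param_sizes: dict mapping parameter name -> size in bytes.
    memory_capacity: total accelerator memory in bytes.
    peak_usage_per_network: dict mapping network name -> peak bytes in use
        at any one time while that network is processed.
    """
    # Maximum amount of memory in use at any one time across all networks.
    peak_usage = max(peak_usage_per_network.values())
    # Memory that is never needed for processing and can hold cached parameters.
    budget = memory_capacity - peak_usage

    selected, used = [], 0
    for name, size in param_sizes.items():
        # Keep the cached total strictly below the spare memory,
        # following the "less than a difference" wording of the abstract.
        if used + size < budget:
            selected.append(name)
            used += size
    return selected, used


# Illustrative numbers only.
cached, cached_bytes = select_cached_parameters(
    param_sizes={"conv1": 40, "conv2": 70, "fc": 30},
    memory_capacity=200,
    peak_usage_per_network={"net_a": 100, "net_b": 80},
)
```

    With these illustrative numbers the spare memory is 100 bytes, so `conv1` and `fc` (70 bytes in total) are selected and `conv2` is skipped. A different policy, such as preferring the most frequently used parameters, would fit the same constraint.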
