Convolutional Neural Network On Programmable Two Dimensional Image Processor

    公开(公告)号:US20180005074A1

    公开(公告)日:2018-01-04

    申请号:US15201204

    申请日:2016-07-01

    Applicant: Google Inc.

    Abstract: A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The two-dimensional shift register provides local respective register space for the execution lanes. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.

    Compiler Techniques for Mapping Program Code to a High Performance, Power Efficient, Programmable Image Processing Hardware Platform

    公开(公告)号:US20170249716A1

    公开(公告)日:2017-08-31

    申请号:US15389113

    申请日:2016-12-22

    Applicant: Google Inc.

    CPC classification number: G06T1/20 G06F8/447 G06F9/5077

    Abstract: A method is described. The method includes compiling program code targeted for an image processor having programmable stencil processors composed of respective two-dimensional execution lane and shift register circuit structures. The program code is to implement a directed acyclic graph and is composed of multiple kernels that are to execute on respective ones of the stencil processors, wherein the compiling includes any of: recognizing there are a different number of kernels in the program code than stencil processors in the image processor; recognizing that at least one of the kernels is more computationally intensive than another one of the kernels; and, recognizing that the program code has resource requirements that exceed the image processor's memory capacity. The compiling further includes in response to any of the recognizing above performing any of: horizontal fusion of kernels; vertical fusion of kernels; fission of one of the kernels into multiple kernels; spatial partitioning of a kernel into multiple spatially partitioned kernels; splitting the directed acyclic graph into smaller graphs.

Patent Agency Ranking