-
公开(公告)号:US10719905B2
公开(公告)日:2020-07-21
申请号:US16547801
申请日:2019-08-22
Applicant: Google LLC
Inventor: Qiuling Zhu , Ofer Shacham , Albert Meixner , Jason Rupert Redgrave , Daniel Frederic Finchelstein , David Patterson , Neeti Desai , Donald Stark , Edward Chang , William R. Mark
Abstract: An apparatus is described. The apparatus includes an image processing unit. The image processing unit includes a plurality of stencil processor circuits each comprising an array of execution unit lanes coupled to a two-dimensional shift register array structure to simultaneously process multiple overlapping stencils through execution of program code. The image processing unit includes a plurality of sheet generators respectively coupled between the plurality of stencil processors and the network. The sheet generators are to parse input line groups of image data into input sheets of image data for processing by the stencil processors, and, to form output line groups of image data from output sheets of image data received from the stencil processors. The image processing unit includes a plurality of line buffer units coupled to the network to pass line groups in a direction from producing stencil processors to consuming stencil processors to implement an overall program flow.
-
公开(公告)号:US10417732B2
公开(公告)日:2019-09-17
申请号:US15599348
申请日:2017-05-18
Applicant: Google LLC
Inventor: Qiuling Zhu , Ofer Shacham , Albert Meixner , Jason Rupert Redgrave , Daniel Frederic Finchelstein , David Patterson , Neeti Desai , Donald Stark , Edward Chang , William Mark
Abstract: An apparatus is described. The apparatus includes an image processing unit. The image processing unit includes a plurality of stencil processor circuits each comprising an array of execution unit lanes coupled to a two-dimensional shift register array structure to simultaneously process multiple overlapping stencils through execution of program code. The image processing unit includes a plurality of sheet generators respectively coupled between the plurality of stencil processors and the network. The sheet generators are to parse input line groups of image data into input sheets of image data for processing by the stencil processors, and, to form output line groups of image data from output sheets of image data received from the stencil processors. The image processing unit includes a plurality of line buffer units coupled to the network to pass line groups in a direction from producing stencil processors to consuming stencil processors to implement an overall program flow.
-
公开(公告)号:US09978116B2
公开(公告)日:2018-05-22
申请号:US15598082
申请日:2017-05-17
Applicant: Google LLC
Inventor: Albert Meixner , Daniel Frederic Finchelstein , David Patterson , William Mark , Jason Rupert Redgrave , Ofer Shacham
CPC classification number: G06T1/20 , G06K9/00986 , G11C19/28
Abstract: A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, doubling a simultaneous shift amount of multiple rows or columns of the two dimensional shift register array with each next iteration. The method also includes executing one or more instructions within respective lanes of the two dimensional execution lane array in between shifts of iterations. Another method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly executing one or more instructions within respective lanes of the execution lane array that select between content in different registers of a same array location in between repeated simultaneous shifts of multiple rows or columns of data in the two dimensional shift register array.
-
公开(公告)号:US10789505B2
公开(公告)日:2020-09-29
申请号:US15631906
申请日:2017-06-23
Applicant: Google LLC
Inventor: Ofer Shacham , David Patterson , William R. Mark , Albert Meixner , Daniel Frederic Finchelstein , Jason Rupert Redgrave
Abstract: A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
-
公开(公告)号:US10546211B2
公开(公告)日:2020-01-28
申请号:US15201204
申请日:2016-07-01
Applicant: Google LLC
Inventor: Ofer Shacham , David Patterson , William R. Mark , Albert Meixner , Daniel Frederic Finchelstein , Jason Rupert Redgrave
Abstract: A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The two-dimensional shift register provides local respective register space for the execution lanes. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
-
公开(公告)号:US10531030B2
公开(公告)日:2020-01-07
申请号:US16448931
申请日:2019-06-21
Applicant: Google LLC
Inventor: Albert Meixner , Daniel Frederic Finchelstein , David Patterson , William R. Mark , Jason Rupert Redgrave , Ofer Shacham
Abstract: A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resident in respective locations of the two dimensional shift register array that the shifted first content has been shifted into.
-
17.
公开(公告)号:US20190327437A1
公开(公告)日:2019-10-24
申请号:US16448931
申请日:2019-06-21
Applicant: Google LLC
Inventor: Albert Meixner , Daniel Frederic Finchelstein , David Patterson , William R. Mark , Jason Rupert Redgrave , Ofer Shacham
Abstract: A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resident in respective locations of the two dimensional shift register array that the shifted first content has been shifted into.
-
公开(公告)号:US10397450B2
公开(公告)日:2019-08-27
申请号:US15590849
申请日:2017-05-09
Applicant: Google LLC
Inventor: Ofer Shacham , Jason Rupert Redgrave , Albert Meixner , Qiuling Zhu , Daniel Frederic Finchelstein , David Patterson , Donald Stark
Abstract: An apparatus is described. The apparatus includes an execution lane array coupled to a two dimensional shift register array structure. Locations in the execution lane array are coupled to same locations in the two-dimensional shift register array structure such that different execution lanes have different dedicated registers.
-
公开(公告)号:US10216487B2
公开(公告)日:2019-02-26
申请号:US15591984
申请日:2017-05-10
Applicant: Google LLC
Inventor: Albert Meixner , Ofer Shacham , David Patterson , Daniel Frederic Finchelstein , Qiuling Zhu , Jason Rupert Redgrave
Abstract: A method is described that includes instantiating, within an application software development environment, a virtual processor having an instruction set architecture and memory model that contemplate first and second regions of reserved memory. The first reserved region is to keep data of an input image array. The second reserved region is to keep data of an output image array. The method also includes simulating execution of a memory load instruction of the instruction set architecture by automatically targeting the first reserved region and identifying desired input data with first and second coordinates relative to the virtual processor's position within an orthogonal coordinate system and expressed in the instruction format of the memory load instruction.
-
公开(公告)号:US10095479B2
公开(公告)日:2018-10-09
申请号:US14694890
申请日:2015-04-23
Applicant: Google LLC
Inventor: Albert Meixner , Ofer Shacham , David Patterson , Daniel Frederic Finchelstein , Qiuling Zhu , Jason Rupert Redgrave
Abstract: A method is described that includes instantiating, within an application software development environment, a virtual processor having an instruction set architecture and memory model that contemplate first and second regions of reserved memory. The first reserved region is to keep data of an input image array. The second reserved region is to keep data of an output image array. The method also includes simulating execution of a memory load instruction of the instruction set architecture by automatically targeting the first reserved region and identifying desired input data with first and second coordinates relative to the virtual processor's position within an orthogonal coordinate system and expressed in the instruction format of the memory load instruction.
-
-
-
-
-
-
-
-
-