-
Publication No.: WO2021173201A1
Publication Date: 2021-09-02
Application No.: PCT/US2020/062612
Filing Date: 2020-11-30
Applicant: GOOGLE LLC
Inventor: NORRIE, Thomas , RAJAMANI, Gurushankar , PHELPS, Andrew Everett , HEDLUND, Matthew Leever , JOUPPI, Norman Paul
IPC: G06N3/063 , G06N3/04 , G06F7/544 , G06F15/167
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
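The following is a minimal, illustrative Python sketch of the accumulation this abstract describes: each processor core contributes a vector, and an operator unit folds the arriving vectors into a single result vector. The function and variable names are assumptions for illustration, not taken from the patent.
```python
# Illustrative sketch only; names and structure are hypothetical, not from the patent.
from typing import Callable, List

def shared_memory_reduce(
    core_vectors: List[List[float]],
    operator_fn: Callable[[float, float], float] = lambda a, b: a + b,
) -> List[float]:
    """Accumulate per-core vectors into one result vector.

    Models the operator unit coupled to the shared memory: each vector
    arriving over the (conceptual) DMA path is folded into an accumulator
    using the arithmetic operation encoded at the operator unit.
    """
    accumulator = list(core_vectors[0])      # first vector initializes the accumulator
    for vector in core_vectors[1:]:          # remaining vectors arrive from the other cores
        accumulator = [operator_fn(acc, v) for acc, v in zip(accumulator, vector)]
    return accumulator

# Example: three cores each contribute a 4-element vector; the result is their elementwise sum.
result = shared_memory_reduce([[1.0, 2.0, 3.0, 4.0],
                               [0.5, 0.5, 0.5, 0.5],
                               [1.0, 1.0, 1.0, 1.0]])
```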
-
Publication No.: WO2018213635A1
Publication Date: 2018-11-22
Application No.: PCT/US2018/033270
Filing Date: 2018-05-17
Applicant: GOOGLE LLC
Inventor: PHELPS, Andrew Everett , JOUPPI, Norman Paul
IPC: G06F17/16
Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
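As a rough illustration of the weight-loading scheme described above, the sketch below models one column of the systolic array with two weight shift chains, each feeding a disjoint subset of cells (even and odd rows here, which is an assumption); each cell then multiplies its stored weight by a vector data input. Names and structure are hypothetical.
```python
# Hypothetical sketch of one systolic-array column with two weight shift chains.
class Cell:
    """One cell: holds a weight in its weight matrix register and multiplies it by a data input."""
    def __init__(self):
        self.weight = 0.0

    def multiply(self, data_in: float) -> float:
        return self.weight * data_in

def load_column_weights(cells, weights):
    """Load a column's weights over two chains (assumed split: even rows on one chain, odd rows on the other).
    Each cell is fed by exactly one chain, so the column can be loaded in roughly half the shift steps."""
    chain_a = weights[0::2]   # weights destined for even-row cells
    chain_b = weights[1::2]   # weights destined for odd-row cells
    for i, w in enumerate(chain_a):
        cells[2 * i].weight = w
    for i, w in enumerate(chain_b):
        cells[2 * i + 1].weight = w

cells = [Cell() for _ in range(4)]
load_column_weights(cells, [0.1, 0.2, 0.3, 0.4])
partial_products = [c.multiply(x) for c, x in zip(cells, [1.0, 2.0, 3.0, 4.0])]
column_sum = sum(partial_products)   # the accumulated result a systolic column would produce
```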
-
Publication No.: WO2022015390A1
Publication Date: 2022-01-20
Application No.: PCT/US2021/029619
Filing Date: 2021-04-28
Applicant: GOOGLE LLC
Inventor: LI, Sheng , JOUPPI, Norman Paul , LE, Quoc V. , TAN, Mingxing , PANG, Ruoming , CHENG, Liqun , LI, Andrew
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
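A minimal sketch of what hardware-aware architecture selection could look like is given below, assuming candidates are scored by task accuracy and by measured latency on the target accelerators; the scoring formula, function names, and parameters are illustrative assumptions, not the patented method.
```python
# Minimal, hypothetical sketch of hardware-aware architecture selection.
from typing import Callable, Dict, List

def select_architecture(
    candidates: List[Dict],
    evaluate_accuracy: Callable[[Dict], float],
    measure_latency_ms: Callable[[Dict], float],
    latency_target_ms: float,
    latency_weight: float = 0.5,
) -> Dict:
    """Score each candidate by task accuracy and by how well it meets the latency
    target on the target hardware, then return the best-scoring architecture."""
    def score(candidate: Dict) -> float:
        accuracy = evaluate_accuracy(candidate)
        latency = measure_latency_ms(candidate)
        latency_factor = min(1.0, latency_target_ms / latency)   # penalize candidates slower than the target
        return accuracy * (latency_factor ** latency_weight)
    return max(candidates, key=score)
```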
-
Publication No.: WO2020242522A1
Publication Date: 2020-12-03
Application No.: PCT/US2019/063527
Filing Date: 2019-11-27
Applicant: GOOGLE LLC
Inventor: IYENGAR, Madhusudan Krishnan , JOUPPI, Norman Paul , PADILLA, Jorge , MALONE, Christopher Gregory
IPC: H01L23/367 , H01L23/473 , H01L27/02 , H05K7/20
Abstract: Methods, systems, and apparatus, including an integrated circuit (IC) with a ring-shaped hot spot area are described. In one aspect, an IC includes a first area along an outside perimeter of the IC. The first area defines a first inner perimeter. The IC includes a second area that includes a center of the IC and that includes a first set of components. The second area defines a first outer perimeter. The IC includes a ring-shaped hot spot area between the first area and the second area. The ring-shaped hot spot area defines a ring outer perimeter that is juxtaposed with the first inner perimeter. The ring-shaped hot spot area defines a ring inner perimeter that is juxtaposed with the first outer perimeter. The ring-shaped hot spot area includes a second set of components that produce more heat than the first set of components.
-
Publication No.: WO2019204472A1
Publication Date: 2019-10-24
Application No.: PCT/US2019/027909
Filing Date: 2019-04-17
Applicant: GOOGLE LLC
Inventor: IYENGAR, Madhusudan Krishnan , MALONE, Christopher Gregory , LI, Yuan , PADILLA, Jorge , KWON, Woon-Seong , KANG, Teckgyu , JOUPPI, Norman Paul
IPC: H05K7/20
Abstract: A server tray package includes a motherboard assembly that includes a plurality of data center electronic devices, the plurality of data center electronic devices including at least one heat-generating processor device; and a liquid cold plate assembly. The liquid cold plate assembly includes a base portion mounted to the motherboard assembly, the base portion and motherboard assembly defining a volume that at least partially encloses the plurality of data center electronic devices; and a top portion mounted to the base portion and including a heat transfer member shaped to thermally contact the heat-generating processor device, the heat transfer member including an inlet port and an outlet port that are in fluid communication with a cooling liquid flow path defined through the heat transfer member.
-
Publication No.: WO2018213636A1
Publication Date: 2018-11-22
Application No.: PCT/US2018/033271
Filing Date: 2018-05-17
Applicant: GOOGLE LLC
Inventor: PHELPS, Andrew Everett , JOUPPI, Norman Paul
Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
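The sketch below illustrates the precision-widening multiply-accumulate described in the abstract, using float16 and float32 as stand-ins for the lower- and higher-precision floating point formats; the actual formats and hardware behavior in the patent may differ.
```python
# Illustrative sketch: the specific dtypes are stand-ins, not the formats named in the patent.
import numpy as np

def mixed_precision_mac(activation_f16, weight_f16, partial_sum_f32):
    """One multiply-accumulate step with precision widening.

    The activation and weight arrive in a low-precision format (here float16),
    the product is formed in a higher-precision format (float32), and it is
    combined with a partial sum kept in a higher-precision format (float32).
    """
    product = np.float32(activation_f16) * np.float32(weight_f16)   # widen before multiplying
    return partial_sum_f32 + product                                # accumulate in high precision

a = np.float16(0.1)
w = np.float16(3.0)
acc = np.float32(0.0)
acc = mixed_precision_mac(a, w, acc)   # updated partial sum in the high-precision format
```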
-
Publication No.: WO2023059439A1
Publication Date: 2023-04-13
Application No.: PCT/US2022/044201
Filing Date: 2022-09-21
Applicant: GOOGLE LLC
Inventor: LI, Sheng , TAN, Mingxing , JOUPPI, Norman Paul , LE, Quoc V. , CHENG, Liqun , PANG, Ruoming , RANGANATHAN, Parthasarathy
Abstract: Aspects of the disclosure provide for hardware-aware progressive training of machine learning models. A training system trains a model in accordance with a training process and different values specified in a training schedule for both hardware-level and model-level performance settings. Hardware-level performance settings can cause hardware features of computing resources used to train the model to be enabled, disabled, or modified at various points during training. Model-level performance settings can take on a variety of values to adjust characteristics of the machine learning model being trained or of the training process, during different stages of training. The training system can identify and apply complementary values of hardware- and model-level performance settings to generate training schedules that improve model training speed at earlier stages of training, while improving model quality at later stages of training.
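Below is a hypothetical sketch of a training schedule pairing hardware-level settings with model-level settings across stages; the specific setting names (matmul precision, image resolution, dropout rate) are illustrative assumptions rather than the settings named in the disclosure.
```python
# Hypothetical sketch of a progressive training schedule; setting names are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingStage:
    steps: int
    hardware_settings: dict   # e.g. {"matmul_precision": "bfloat16"}; enabled/disabled/modified per stage
    model_settings: dict      # e.g. {"image_resolution": 128, "dropout_rate": 0.0}

def run_schedule(schedule: List[TrainingStage], train_steps):
    """Apply each stage's complementary hardware- and model-level settings, then train for its steps.
    Early stages favor training speed; later stages favor model quality."""
    for stage in schedule:
        train_steps(stage.steps, stage.hardware_settings, stage.model_settings)

schedule = [
    TrainingStage(10_000, {"matmul_precision": "bfloat16"}, {"image_resolution": 128, "dropout_rate": 0.0}),
    TrainingStage(10_000, {"matmul_precision": "float32"},  {"image_resolution": 224, "dropout_rate": 0.2}),
]
```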
-
Publication No.: WO2021154732A1
Publication Date: 2021-08-05
Application No.: PCT/US2021/015097
Filing Date: 2021-01-26
Applicant: GOOGLE LLC
Inventor: NORRIE, Thomas , PHELPS, Andrew Everett , JOUPPI, Norman Paul , HEDLUND, Matthew Leever
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes a first memory, respective first and second processor cores, and a shared memory. The first memory provides data for performing computations to generate an output for a neural network layer. Each of the first and second cores include a vector memory for storing vector values derived from the data provided by the first memory. The shared memory is disposed generally intermediate the first memory and at least one core and includes: i) a direct memory access (DMA) data path configured to route data between the shared memory and the respective vector memories of the first and second cores and ii) a load-store data path configured to route data between the shared memory and respective vector registers of the first and second cores.
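A toy software model of the two data paths described above is sketched below: a DMA path that moves blocks between the shared memory and a core's vector memory, and a load-store path that moves values into a core's vector registers. Class and method names are hypothetical.
```python
# Toy model of the two data paths; class and method names are hypothetical.
class Core:
    def __init__(self):
        self.vector_memory = {}     # bulk vector values derived from the first memory
        self.vector_registers = {}  # small, register-resident vectors

class SharedMemory:
    def __init__(self):
        self.data = {}

    def dma_to_core(self, core: Core, key: str):
        """DMA data path: route a block between shared memory and a core's vector memory."""
        core.vector_memory[key] = self.data[key]

    def load_to_registers(self, core: Core, key: str):
        """Load-store data path: route a value between shared memory and a core's vector registers."""
        core.vector_registers[key] = self.data[key]

shared = SharedMemory()
shared.data["weights"] = [0.1, 0.2, 0.3]
core0 = Core()
shared.dma_to_core(core0, "weights")          # bulk transfer
shared.load_to_registers(core0, "weights")    # fine-grained access
```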
-
Publication No.: WO2021034475A1
Publication Date: 2021-02-25
Application No.: PCT/US2020/044336
Filing Date: 2020-07-30
Applicant: GOOGLE LLC
Inventor: ROUNE, Bjarke Hammersholt , KUMAR, Sameer , JOUPPI, Norman Paul
IPC: G06N3/08 , G06F15/173 , G06N20/00
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, and combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group; updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector; performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group; and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.
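The following greatly simplified sketch illustrates one dimension of the reduction described above, assuming forwarding groups hand their gradients to the next critical group and each critical group reduces what it received into a partial final gradient on a representative node; the grouping and update details are simplifying assumptions, not the patented procedure.
```python
# Greatly simplified, hypothetical sketch of one dimension of the reduction.
import numpy as np

def reduce_along_dimension(groups):
    """groups: list of dicts like {"kind": "critical" or "forwarding", "gradients": [np.ndarray, ...]}.

    Forwarding groups pass their members' gradients to the next critical group;
    each critical group reduces everything it received into a partial final gradient
    stored on a representative node of that group."""
    pending = []                                   # gradients handed off by forwarding groups
    partial_finals = []
    for group in groups:
        if group["kind"] == "forwarding":
            pending.extend(group["gradients"])     # forward, do not reduce here
        else:                                      # critical group
            contributions = group["gradients"] + pending
            partial_finals.append(np.sum(contributions, axis=0))  # representative node's partial final
            pending = []
    return partial_finals

groups = [
    {"kind": "forwarding", "gradients": [np.ones(3)]},
    {"kind": "critical",   "gradients": [np.ones(3) * 2, np.ones(3) * 3]},
]
partials = reduce_along_dimension(groups)   # one partial final gradient per critical group
```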
-
Publication No.: WO2020146033A1
Publication Date: 2020-07-16
Application No.: PCT/US2019/058010
Filing Date: 2019-10-25
Applicant: GOOGLE LLC
Inventor: MALONE, Christopher Gregory , IYENGAR, Madhusudan Krishnan , LI, Yuan , PADILLA, Jorge , KWON, Woon Seong , KANG, Teckgyu , JOUPPI, Norman Paul
IPC: H05K7/20
Abstract: A server tray package includes a motherboard assembly that includes a plurality of data center electronic devices; and a liquid cold plate assembly. The liquid cold plate assembly includes a base portion mounted to the motherboard assembly, the base portion and motherboard assembly defining a volume that at least partially encloses the plurality of data center electronic devices; and a top portion mounted to the base portion and including a heat transfer member that includes a first number of inlet ports and a second number of outlet ports that are in fluid communication with a cooling liquid flow path defined through the heat transfer member, the first number of inlet ports being different from the second number of outlet ports.