-
Publication No.: WO2021173201A1
Publication Date: 2021-09-02
Application No.: PCT/US2020/062612
Filing Date: 2020-11-30
Applicant: GOOGLE LLC
Inventor: NORRIE, Thomas , RAJAMANI, Gurushankar , PHELPS, Andrew Everett , HEDLUND, Matthew Leever , JOUPPI, Norman Paul
IPC: G06N3/063 , G06N3/04 , G06F7/544 , G06F15/167
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
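The following is a minimal, illustrative Python sketch of the accumulation this abstract describes: each processor core contributes a vector, and an operator unit folds the arriving vectors into a single result vector. The function and variable names are assumptions for illustration, not taken from the patent.
```python
# Illustrative sketch only; names and structure are hypothetical, not from the patent.
from typing import Callable, List

def shared_memory_reduce(
    core_vectors: List[List[float]],
    operator_fn: Callable[[float, float], float] = lambda a, b: a + b,
) -> List[float]:
    """Accumulate per-core vectors into one result vector.

    Models the operator unit coupled to the shared memory: each vector
    arriving over the (conceptual) DMA path is folded into an accumulator
    using the arithmetic operation encoded at the operator unit.
    """
    accumulator = list(core_vectors[0])      # first vector initializes the accumulator
    for vector in core_vectors[1:]:          # remaining vectors arrive from the other cores
        accumulator = [operator_fn(acc, v) for acc, v in zip(accumulator, vector)]
    return accumulator

# Example: three cores each contribute a 4-element vector; the result is their elementwise sum.
result = shared_memory_reduce([[1.0, 2.0, 3.0, 4.0],
                               [0.5, 0.5, 0.5, 0.5],
                               [1.0, 1.0, 1.0, 1.0]])
```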
-
Publication No.: WO2018213635A1
Publication Date: 2018-11-22
Application No.: PCT/US2018/033270
Filing Date: 2018-05-17
Applicant: GOOGLE LLC
Inventor: PHELPS, Andrew Everett , JOUPPI, Norman Paul
IPC: G06F17/16
Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
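As a rough illustration of the weight-loading scheme described above, the sketch below models one column of the systolic array with two weight shift chains, each feeding a disjoint subset of cells (even and odd rows here, which is an assumption); each cell then multiplies its stored weight by a vector data input. Names and structure are hypothetical.
```python
# Hypothetical sketch of one systolic-array column with two weight shift chains.
class Cell:
    """One cell: holds a weight in its weight matrix register and multiplies it by a data input."""
    def __init__(self):
        self.weight = 0.0

    def multiply(self, data_in: float) -> float:
        return self.weight * data_in

def load_column_weights(cells, weights):
    """Load a column's weights over two chains (assumed split: even rows on one chain, odd rows on the other).
    Each cell is fed by exactly one chain, so the column can be loaded in roughly half the shift steps."""
    chain_a = weights[0::2]   # weights destined for even-row cells
    chain_b = weights[1::2]   # weights destined for odd-row cells
    for i, w in enumerate(chain_a):
        cells[2 * i].weight = w
    for i, w in enumerate(chain_b):
        cells[2 * i + 1].weight = w

cells = [Cell() for _ in range(4)]
load_column_weights(cells, [0.1, 0.2, 0.3, 0.4])
partial_products = [c.multiply(x) for c, x in zip(cells, [1.0, 2.0, 3.0, 4.0])]
column_sum = sum(partial_products)   # the accumulated result a systolic column would produce
```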
-
Publication No.: WO2022015390A1
Publication Date: 2022-01-20
Application No.: PCT/US2021/029619
Filing Date: 2021-04-28
Applicant: GOOGLE LLC
Inventor: LI, Sheng , JOUPPI, Norman Paul , LE, Quoc V. , TAN, Mingxing , PANG, Ruoming , CHENG, Liqun , LI, Andrew
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
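A minimal sketch of what hardware-aware architecture selection could look like is given below, assuming candidates are scored by task accuracy and by measured latency on the target accelerators; the scoring formula, function names, and parameters are illustrative assumptions, not the patented method.
```python
# Minimal, hypothetical sketch of hardware-aware architecture selection.
from typing import Callable, Dict, List

def select_architecture(
    candidates: List[Dict],
    evaluate_accuracy: Callable[[Dict], float],
    measure_latency_ms: Callable[[Dict], float],
    latency_target_ms: float,
    latency_weight: float = 0.5,
) -> Dict:
    """Score each candidate by task accuracy and by how well it meets the latency
    target on the target hardware, then return the best-scoring architecture."""
    def score(candidate: Dict) -> float:
        accuracy = evaluate_accuracy(candidate)
        latency = measure_latency_ms(candidate)
        latency_factor = min(1.0, latency_target_ms / latency)   # penalize candidates slower than the target
        return accuracy * (latency_factor ** latency_weight)
    return max(candidates, key=score)
```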
-
Publication No.: WO2020242522A1
Publication Date: 2020-12-03
Application No.: PCT/US2019/063527
Filing Date: 2019-11-27
Applicant: GOOGLE LLC
Inventor: IYENGAR, Madhusudan Krishnan , JOUPPI, Norman Paul , PADILLA, Jorge , MALONE, Christopher Gregory
IPC: H01L23/367 , H01L23/473 , H01L27/02 , H05K7/20
Abstract: Methods, systems, and apparatus, including an integrated circuit (IC) with a ring-shaped hot spot area are described. In one aspect, an IC includes a first area along an outside perimeter of the IC. The first area defines a first inner perimeter. The IC includes a second area that includes a center of the IC and that includes a first set of components. The second area defines a first outer perimeter. The IC includes a ring-shaped hot spot area between the first area and the second area. The ring-shaped hot spot area defines a ring outer perimeter that is juxtaposed with the first inner perimeter. The ring-shaped hot spot area defines a ring inner perimeter that is juxtaposed with the first outer perimeter. The ring-shaped hot spot area includes a second set of components that produce more heat than the first set of components.
-
Publication No.: WO2019204472A1
Publication Date: 2019-10-24
Application No.: PCT/US2019/027909
Filing Date: 2019-04-17
Applicant: GOOGLE LLC
Inventor: IYENGAR, Madhusudan Krishnan , MALONE, Christopher Gregory , LI, Yuan , PADILLA, Jorge , KWON, Woon-Seong , KANG, Teckgyu , JOUPPI, Norman Paul
IPC: H05K7/20
Abstract: A server tray package includes a motherboard assembly that includes a plurality of data center electronic devices, the plurality of data center electronic devices including at least one heat-generating processor device; and a liquid cold plate assembly. The liquid cold plate assembly includes a base portion mounted to the motherboard assembly, the base portion and motherboard assembly defining a volume that at least partially encloses the plurality of data center electronic devices; and a top portion mounted to the base portion and including a heat transfer member shaped to thermally contact the heat-generating processor device, the heat transfer member including an inlet port and an outlet port that are in fluid communication with a cooling liquid flow path defined through the heat transfer member.
-
Publication No.: WO2018213636A1
Publication Date: 2018-11-22
Application No.: PCT/US2018/033271
Filing Date: 2018-05-17
Applicant: GOOGLE LLC
Inventor: PHELPS, Andrew Everett , JOUPPI, Norman Paul
Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
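The sketch below illustrates the precision-widening multiply-accumulate described in the abstract, using float16 and float32 as stand-ins for the lower- and higher-precision floating point formats; the actual formats and hardware behavior in the patent may differ.
```python
# Illustrative sketch: the specific dtypes are stand-ins, not the formats named in the patent.
import numpy as np

def mixed_precision_mac(activation_f16, weight_f16, partial_sum_f32):
    """One multiply-accumulate step with precision widening.

    The activation and weight arrive in a low-precision format (here float16),
    the product is formed in a higher-precision format (float32), and it is
    combined with a partial sum kept in a higher-precision format (float32).
    """
    product = np.float32(activation_f16) * np.float32(weight_f16)   # widen before multiplying
    return partial_sum_f32 + product                                # accumulate in high precision

a = np.float16(0.1)
w = np.float16(3.0)
acc = np.float32(0.0)
acc = mixed_precision_mac(a, w, acc)   # updated partial sum in the high-precision format
```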
-
Publication No.: WO2023059439A1
Publication Date: 2023-04-13
Application No.: PCT/US2022/044201
Filing Date: 2022-09-21
Applicant: GOOGLE LLC
Inventor: LI, Sheng , TAN, Mingxing , JOUPPI, Norman Paul , LE, Quoc V. , CHENG, Liqun , PANG, Ruoming , RANGANATHAN, Parthasarathy
Abstract: Aspects of the disclosure provide for hardware-aware progressive training of machine learning models. A training system trains a model in accordance with a training process and different values specified in a training schedule for both hardware-level and model-level performance settings. Hardware-level performance settings can cause hardware features of computing resources used to train the model to be enabled, disabled, or modified at various points during training. Model-level performance settings can take on a variety of values to adjust characteristics of the machine learning model being trained or of the training process, during different stages of training. The training system can identify and apply complementary values of hardware- and model-level performance settings to generate training schedules that improve model training speed at earlier stages of training, while improving model quality at later stages of training.
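Below is a hypothetical sketch of a training schedule pairing hardware-level settings with model-level settings across stages; the specific setting names (matmul precision, image resolution, dropout rate) are illustrative assumptions rather than the settings named in the disclosure.
```python
# Hypothetical sketch of a progressive training schedule; setting names are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingStage:
    steps: int
    hardware_settings: dict   # e.g. {"matmul_precision": "bfloat16"}; enabled/disabled/modified per stage
    model_settings: dict      # e.g. {"image_resolution": 128, "dropout_rate": 0.0}

def run_schedule(schedule: List[TrainingStage], train_steps):
    """Apply each stage's complementary hardware- and model-level settings, then train for its steps.
    Early stages favor training speed; later stages favor model quality."""
    for stage in schedule:
        train_steps(stage.steps, stage.hardware_settings, stage.model_settings)

schedule = [
    TrainingStage(10_000, {"matmul_precision": "bfloat16"}, {"image_resolution": 128, "dropout_rate": 0.0}),
    TrainingStage(10_000, {"matmul_precision": "float32"},  {"image_resolution": 224, "dropout_rate": 0.2}),
]
```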
-
Publication No.: WO2021154732A1
Publication Date: 2021-08-05
Application No.: PCT/US2021/015097
Filing Date: 2021-01-26
Applicant: GOOGLE LLC
Inventor: NORRIE, Thomas , PHELPS, Andrew Everett , JOUPPI, Norman Paul , HEDLUND, Matthew Leever
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for a hardware circuit configured to implement a neural network. The circuit includes a first memory, respective first and second processor cores, and a shared memory. The first memory provides data for performing computations to generate an output for a neural network layer. Each of the first and second cores include a vector memory for storing vector values derived from the data provided by the first memory. The shared memory is disposed generally intermediate the first memory and at least one core and includes: i) a direct memory access (DMA) data path configured to route data between the shared memory and the respective vector memories of the first and second cores and ii) a load-store data path configured to route data between the shared memory and respective vector registers of the first and second cores.
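A toy software model of the two data paths described above is sketched below: a DMA path that moves blocks between the shared memory and a core's vector memory, and a load-store path that moves values into a core's vector registers. Class and method names are hypothetical.
```python
# Toy model of the two data paths; class and method names are hypothetical.
class Core:
    def __init__(self):
        self.vector_memory = {}     # bulk vector values derived from the first memory
        self.vector_registers = {}  # small, register-resident vectors

class SharedMemory:
    def __init__(self):
        self.data = {}

    def dma_to_core(self, core: Core, key: str):
        """DMA data path: route a block between shared memory and a core's vector memory."""
        core.vector_memory[key] = self.data[key]

    def load_to_registers(self, core: Core, key: str):
        """Load-store data path: route a value between shared memory and a core's vector registers."""
        core.vector_registers[key] = self.data[key]

shared = SharedMemory()
shared.data["weights"] = [0.1, 0.2, 0.3]
core0 = Core()
shared.dma_to_core(core0, "weights")          # bulk transfer
shared.load_to_registers(core0, "weights")    # fine-grained access
```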
-
Publication No.: WO2021034475A1
Publication Date: 2021-02-25
Application No.: PCT/US2020/044336
Filing Date: 2020-07-30
Applicant: GOOGLE LLC
Inventor: ROUNE, Bjarke Hammersholt , KUMAR, Sameer , JOUPPI, Norman Paul
IPC: G06N3/08 , G06F15/173 , G06N20/00
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, and combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group; updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector; performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group; and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.
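The following greatly simplified sketch illustrates one dimension of the reduction described above, assuming forwarding groups hand their gradients to the next critical group and each critical group reduces what it received into a partial final gradient on a representative node; the grouping and update details are simplifying assumptions, not the patented procedure.
```python
# Greatly simplified, hypothetical sketch of one dimension of the reduction.
import numpy as np

def reduce_along_dimension(groups):
    """groups: list of dicts like {"kind": "critical" or "forwarding", "gradients": [np.ndarray, ...]}.

    Forwarding groups pass their members' gradients to the next critical group;
    each critical group reduces everything it received into a partial final gradient
    stored on a representative node of that group."""
    pending = []                                   # gradients handed off by forwarding groups
    partial_finals = []
    for group in groups:
        if group["kind"] == "forwarding":
            pending.extend(group["gradients"])     # forward, do not reduce here
        else:                                      # critical group
            contributions = group["gradients"] + pending
            partial_finals.append(np.sum(contributions, axis=0))  # representative node's partial final
            pending = []
    return partial_finals

groups = [
    {"kind": "forwarding", "gradients": [np.ones(3)]},
    {"kind": "critical",   "gradients": [np.ones(3) * 2, np.ones(3) * 3]},
]
partials = reduce_along_dimension(groups)   # one partial final gradient per critical group
```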
-
Publication No.: WO2020146033A1
Publication Date: 2020-07-16
Application No.: PCT/US2019/058010
Filing Date: 2019-10-25
Applicant: GOOGLE LLC
Inventor: MALONE, Christopher Gregory , IYENGAR, Madhusudan Krishnan , LI, Yuan , PADILLA, Jorge , KWON, Woon Seong , KANG, Teckgyu , JOUPPI, Norman Paul
IPC: H05K7/20
Abstract: A server tray package includes a motherboard assembly that includes a plurality of data center electronic devices; and a liquid cold plate assembly. The liquid cold plate assembly includes a base portion mounted to the motherboard assembly, the base portion and motherboard assembly defining a volume that at least partially encloses the plurality of data center electronic devices; and a top portion mounted to the base portion and including a heat transfer member that includes a first number of inlet ports and a second number of outlet ports that are in fluid communication with a cooling liquid flow path defined through the heat transfer member, the first number of inlet ports being different from the second number of outlet ports.