VECTOR REDUCTIONS USING SHARED SCRATCHPAD MEMORY

    Publication No.: WO2021173201A1

    Publication Date: 2021-09-02

    Application No.: PCT/US2020/062612

    Filing Date: 2020-11-30

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
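The accumulation step in this abstract can be illustrated in software. The sketch below is only an analogy, not the patented circuit: it assumes each core's output is a plain Python list, and the hypothetical `accumulate_vectors` function stands in for the operator unit, with the `op` argument playing the role of the encoded arithmetic operation.

```python
import operator


def accumulate_vectors(vectors, op=operator.add):
    """Element-wise accumulate equally sized vectors with a binary operator.

    `vectors` models the per-core vectors arriving at the shared memory;
    `op` models the arithmetic operation encoded at the operator unit.
    Both names are illustrative assumptions, not terms from the patent.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")

    result = list(vectors[0])
    for vec in vectors[1:]:
        # Fold the next core's vector into the running result.
        result = [op(a, b) for a, b in zip(result, vec)]
    return result


# Three hypothetical cores, each contributing one vector.
core_vectors = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
total = accumulate_vectors(core_vectors)          # element-wise sum
maxes = accumulate_vectors(core_vectors, op=max)  # element-wise maximum
```

Passing a different `op` (sum, max, and so on) changes the reduction without changing how the vectors are delivered, which mirrors the separation the abstract draws between the DMA data path and the operator unit.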

    LOW LATENCY MATRIX MULTIPLY UNIT
    Type: Invention application

    Publication No.: WO2018213635A1

    Publication Date: 2018-11-22

    Application No.: PCT/US2018/033270

    Filing Date: 2018-05-17

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus for a matrix multiply unit implemented as a systolic array of cells are disclosed. The matrix multiply unit may include cells arranged in columns of the systolic array. Two chains of weight shift registers per column of the systolic array are in the matrix multiply unit. Each weight shift register is connected to only one chain and each cell is connected to only one weight shift register. A weight matrix register per cell is configured to store a weight input received from a weight shift register. A multiply unit is coupled to the weight matrix register and configured to multiply the weight input of the weight matrix register with a vector data input in order to obtain a multiplication result.
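A rough software model can show why two weight shift chains per column may shorten weight loading. The abstract does not say how cells are assigned to chains, so the sketch below assumes, purely for illustration, that even-numbered cells sit on one chain and odd-numbered cells on the other. The `Cell` class and `load_column` function are hypothetical names.

```python
class Cell:
    """One systolic-array cell with its two weight-related registers."""

    def __init__(self):
        self.shift_register = None   # weight in transit along a chain
        self.weight_register = None  # weight used by the multiply unit

    def latch(self):
        # Copy the shifted-in weight into the weight matrix register.
        self.weight_register = self.shift_register

    def multiply(self, x):
        return self.weight_register * x


def load_column(weights):
    """Shift weights into one column and return (cells, shift_steps)."""
    cells = [Cell() for _ in weights]
    # ASSUMPTION: cells alternate between the two chains.
    chains = [cells[0::2], cells[1::2]]
    chain_weights = [weights[0::2], weights[1::2]]

    steps = 0
    for chain, ws in zip(chains, chain_weights):
        chain_steps = 0
        for w in reversed(ws):  # the deepest cell's weight enters first
            for i in range(len(chain) - 1, 0, -1):
                chain[i].shift_register = chain[i - 1].shift_register
            chain[0].shift_register = w
            chain_steps += 1
        # In hardware both chains would shift at the same time, so the
        # column is loaded after the longer chain finishes.
        steps = max(steps, chain_steps)

    for cell in cells:
        cell.latch()
    return cells, steps


cells, steps = load_column([1, 2, 3, 4])
products = [cell.multiply(10) for cell in cells]
```

Under this assumed layout, four weights are in place after two shift steps rather than four, because each chain only has to carry half of the column's weights.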

    INTEGRATED CIRCUIT WITH A RING-SHAPED HOT SPOT AREA AND MULTIDIRECTIONAL COOLING

    Publication No.: WO2020242522A1

    Publication Date: 2020-12-03

    Application No.: PCT/US2019/063527

    Filing Date: 2019-11-27

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including an integrated circuit (IC) with a ring-shaped hot spot area are described. In one aspect, an IC includes a first area along an outside perimeter of the IC. The first area defines a first inner perimeter. The IC includes a second area that includes a center of the IC and that includes a first set of components. The second area defines a first outer perimeter. The IC includes a ring-shaped hot spot area between the first area and the second area. The ring-shaped hot spot area defines a ring outer perimeter that is juxtaposed with the first inner perimeter. The ring-shaped hot spot area defines a ring inner perimeter that is juxtaposed with the first outer perimeter. The ring-shaped hot spot area includes a second set of components that produce more heat than the first set of components.

    COOLING ELECTRONIC DEVICES IN A DATA CENTER
    Type: Invention application

    Publication No.: WO2019204472A1

    Publication Date: 2019-10-24

    Application No.: PCT/US2019/027909

    Filing Date: 2019-04-17

    Applicant: GOOGLE LLC

    Abstract: A server tray package includes a motherboard assembly that includes a plurality of data center electronic devices, the plurality of data center electronic devices including at least one heat generating processor device, and a liquid cold plate assembly. The liquid cold plate assembly includes a base portion mounted to the motherboard assembly, the base portion and motherboard assembly defining a volume that at least partially encloses the plurality of data center electronic devices; and a top portion mounted to the base portion and including a heat transfer member shaped to thermally contact the heat generating processor device, the heat transfer member including an inlet port and an outlet port that are in fluid communication with a cooling liquid flow path defined through the heat transfer member.

    PERFORMING MATRIX MULTIPLICATION IN HARDWARE

    Publication No.: WO2018213636A1

    Publication Date: 2018-11-22

    Application No.: PCT/US2018/033271

    Filing Date: 2018-05-17

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
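The benefit of keeping the product and partial sum in a wider format than the inputs can be seen in a small numeric sketch. The abstract does not name the formats, so this example assumes IEEE 754 half precision for the inputs and Python's double-precision `float` for both the product and the partial sum. The helper names `to_half` and `multiply_accumulate` are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def multiply_accumulate(activation, weight, partial_sum):
    """Multiply two low-precision inputs and add to a wider partial sum.

    ASSUMPTION: half precision stands in for the first format and
    double precision for the second and third formats.
    """
    a = to_half(activation)  # input activation in the narrow format
    w = to_half(weight)      # weight input in the narrow format
    product = a * w          # product kept in the wider format
    return partial_sum + product  # updated partial sum, also wide


# A small product added to a partial sum of 1.0.
wide_sum = multiply_accumulate(1.0, 2.0 ** -11, 1.0)

# Rounding the same result back to the narrow format loses the update.
narrow_sum = to_half(wide_sum)
```

Here `wide_sum` keeps the contribution of the small product, while `narrow_sum` rounds back to 1.0 because half precision cannot represent the difference. Accumulating many such small products in the narrow format would silently discard them.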

    HARDWARE-AWARE PROGRESSIVE TRAINING OF MACHINE LEARNING MODELS

    Publication No.: WO2023059439A1

    Publication Date: 2023-04-13

    Application No.: PCT/US2022/044201

    Filing Date: 2022-09-21

    Applicant: GOOGLE LLC

    Abstract: Aspects of the disclosure provide for hardware-aware progressive training of machine learning models. A training system trains a model in accordance with a training process and different values specified in a training schedule for both hardware-level and model-level performance settings. Hardware-level performance settings can cause hardware features of computing resources used to train the model to be enabled, disabled, or modified at various points during training. Model-level performance settings can take on a variety of values to adjust characteristics of the machine learning model being trained or of the training process, during different stages of training. The training system can identify and apply complementary values of hardware- and model-level performance settings to generate training schedules that improve model training speed at earlier stages of training, while improving model quality at later stages of training.
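One way to picture a training schedule that carries both kinds of settings is a list of stages keyed by training step. The sketch below is a minimal illustration. Every setting name and value in it (`reduced_precision_matmul`, `image_size`, `dropout`, the step boundaries) is invented for the example and does not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    start_step: int  # first training step at which this stage applies
    hardware: dict   # hardware-level performance settings (hypothetical)
    model: dict      # model-level performance settings (hypothetical)


# Earlier stages favor speed; later stages favor model quality.
SCHEDULE = [
    Stage(0, {"reduced_precision_matmul": True},
          {"image_size": 128, "dropout": 0.0}),
    Stage(1000, {"reduced_precision_matmul": True},
          {"image_size": 192, "dropout": 0.1}),
    Stage(2000, {"reduced_precision_matmul": False},
          {"image_size": 256, "dropout": 0.2}),
]


def settings_for_step(schedule, step):
    """Return the latest stage whose start_step is at or before `step`."""
    current = schedule[0]
    for stage in schedule:
        if stage.start_step <= step:
            current = stage
    return current


early = settings_for_step(SCHEDULE, 0)
middle = settings_for_step(SCHEDULE, 1500)
late = settings_for_step(SCHEDULE, 5000)
```

A training loop could call `settings_for_step` at each step (or at stage boundaries) and apply the returned hardware and model settings together, so that complementary values change in lockstep.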

    CROSS REPLICA REDUCTION ON NETWORKS HAVING DEGRADED NODES

    Publication No.: WO2021034475A1

    Publication Date: 2021-02-25

    Application No.: PCT/US2020/044336

    Filing Date: 2020-07-30

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.

    COOLING ELECTRONIC DEVICES IN A DATA CENTER
    Type: Invention application

    Publication No.: WO2020146033A1

    Publication Date: 2020-07-16

    Application No.: PCT/US2019/058010

    Filing Date: 2019-10-25

    Applicant: GOOGLE LLC

    Abstract: A server tray package includes a motherboard assembly that includes a plurality of data center electronic devices; and a liquid cold plate assembly. The liquid cold plate assembly includes a base portion mounted to the motherboard assembly, the base portion and motherboard assembly defining a volume that at least partially encloses the plurality of data center electronic devices; and a top portion mounted to the base portion and including a heat transfer member that includes a first number of inlet ports and a second number of outlet ports that are in fluid communication with a cooling liquid flow path defined through the heat transfer member, the first number of inlet ports being different than the second number of outlet ports.
