-
Publication No.: US20240062062A1
Publication Date: 2024-02-22
Application No.: US18376362
Filing Date: 2023-10-03
Applicant: Google LLC
Inventor: Samuel Bengio , Mohammad Norouzi , Benoit Steiner , Jeffrey Adgate Dean , Hieu Hy Pham , Azalia Mirhoseini , Quoc V. Le , Naveen Kumar , Yuefeng Zhou , Rasmus Munk Larsen
Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices is described. The method includes receiving data specifying a machine learning model to be placed for distributed processing on multiple hardware devices; generating, from the data, a sequence of operation embeddings, each operation embedding in the sequence characterizing respective operations necessary to perform the processing of the machine learning model; processing the sequence of operation embeddings using a placement recurrent neural network in accordance with first values of a plurality of network parameters of the placement recurrent neural network to generate a network output that defines a placement of the operations characterized by the operation embeddings in the sequence across the plurality of devices; and scheduling the machine learning model for processing by the multiple hardware devices by placing the operations on the multiple devices according to the placement defined by the network output.
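The pipeline shape described in this abstract (operation embeddings in, one device index per operation out) can be illustrated with a minimal sketch. This is a toy stand-in only: a pure-Python tanh recurrence with random weights standing in for trained parameter values, not the patented placement network.

```python
import math
import random

def place_operations(op_embeddings, num_devices, hidden_size=4, seed=0):
    """Toy sketch of a placement recurrent network: processes a sequence
    of operation embeddings and emits one device index per operation.
    Random weights stand in for trained "first values" of the parameters."""
    rng = random.Random(seed)
    dim = len(op_embeddings[0])
    W_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden_size)]
    W_h = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
    W_out = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(num_devices)]

    h = [0.0] * hidden_size
    placement = []
    for emb in op_embeddings:
        # Simple tanh recurrence over the embedding sequence.
        h = [math.tanh(sum(w * x for w, x in zip(W_in[i], emb)) +
                       sum(w * s for w, s in zip(W_h[i], h)))
             for i in range(hidden_size)]
        # Linear readout; argmax picks the device for this operation.
        scores = [sum(w * s for w, s in zip(row, h)) for row in W_out]
        placement.append(max(range(num_devices), key=scores.__getitem__))
    return placement
```

Because the readout is conditioned on the recurrent state, the device chosen for each operation can depend on the operations placed before it, which is the point of using a sequence model here.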
-
Publication No.: US20220188636A1
Publication Date: 2022-06-16
Application No.: US17551065
Filing Date: 2021-12-14
Applicant: Google LLC
Inventor: Hieu Hy Pham , Zihang Dai , Qizhe Xie , Quoc V. Le
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
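The teacher-student loop in this abstract can be sketched in a drastically simplified scalar form. Everything here is an illustrative assumption: one-parameter logistic models, a hard pseudo-label, and a teacher feedback signal approximated by the change in the student's labeled-set loss (a REINFORCE-style surrogate), rather than the meta-gradient machinery a real implementation would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def meta_pseudo_label_step(teacher_w, student_w, x_unlab, x_lab, y_lab, lr=0.1):
    """One joint update: the student learns from the teacher's pseudo-label,
    and the teacher is nudged toward pseudo-labels that made the student
    better on labeled data."""
    def labeled_loss(w):
        # Binary cross-entropy of a 1-parameter logistic model.
        return -sum(y * math.log(sigmoid(w * x)) +
                    (1 - y) * math.log(1 - sigmoid(w * x))
                    for x, y in zip(x_lab, y_lab)) / len(x_lab)

    # 1) Teacher produces a hard pseudo-label for the unlabeled point.
    pseudo = 1.0 if sigmoid(teacher_w * x_unlab) >= 0.5 else 0.0
    # 2) Student takes a gradient step on the pseudo-labeled example.
    grad_s = (sigmoid(student_w * x_unlab) - pseudo) * x_unlab
    new_student = student_w - lr * grad_s
    # 3) Teacher reward = improvement of the student on labeled data.
    reward = labeled_loss(student_w) - labeled_loss(new_student)
    # Gradient of -log p(pseudo-label) w.r.t. the teacher parameter.
    grad_t = (sigmoid(teacher_w * x_unlab) - pseudo) * x_unlab
    new_teacher = teacher_w - lr * reward * grad_t
    return new_teacher, new_student
```

The key structural point survives the simplification: the teacher's update depends on how its pseudo-label affected the student, so the two networks are trained jointly rather than in a fixed teacher-then-student order.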
-
Publication No.: US11803747B2
Publication Date: 2023-10-31
Application No.: US16878720
Filing Date: 2020-05-20
Applicant: Google LLC
Inventor: Samuel Bengio , Mohammad Norouzi , Benoit Steiner , Jeffrey Adgate Dean , Hieu Hy Pham , Azalia Mirhoseini , Quoc V. Le , Naveen Kumar , Yuefeng Zhou , Rasmus Munk Larsen
Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices is described. The method includes receiving data specifying a machine learning model to be placed for distributed processing on multiple hardware devices; generating, from the data, a sequence of operation embeddings, each operation embedding in the sequence characterizing respective operations necessary to perform the processing of the machine learning model; processing the sequence of operation embeddings using a placement recurrent neural network in accordance with first values of a plurality of network parameters of the placement recurrent neural network to generate a network output that defines a placement of the operations characterized by the operation embeddings in the sequence across the plurality of devices; and scheduling the machine learning model for processing by the multiple hardware devices by placing the operations on the multiple devices according to the placement defined by the network output.
-
Publication No.: US10438113B2
Publication Date: 2019-10-08
Application No.: US16040186
Filing Date: 2018-07-19
Applicant: Google LLC
Inventor: Benoit Steiner , Anna Darling Goldie , Jeffrey Adgate Dean , Hieu Hy Pham , Azalia Mirhoseini , Quoc V. Le
Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices includes receiving data specifying machine learning operations, and determining a placement that assigns each of the operations specified by the data to a respective device from the multiple hardware devices. Determining the placement includes: generating, from the data, a respective operation embedding for each of the operations; grouping the operations into multiple operation groups, comprising processing each of the respective operation embeddings using a grouper neural network having multiple grouper parameters, in which the grouper neural network is configured to, for each of the operations, process the operation embedding for the operation in accordance with first values of the grouper parameters to generate a grouper output that assigns the operation to an operation group from the multiple operation groups; and assigning each of the operation groups to a respective device from the multiple hardware devices.
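The two-stage structure here (grouper assigns operations to groups, groups are assigned to devices) can be sketched as follows. This is a toy illustration under stated assumptions: fixed nearest-centroid grouping stands in for the trained grouper neural network, and a round-robin map stands in for the learned group-to-device assignment.

```python
def group_and_place(op_embeddings, num_groups, num_devices):
    """Toy two-stage placement: a stand-in 'grouper' assigns each operation
    to a group by nearest fixed centroid, then each group is mapped to a
    device round-robin."""
    dim = len(op_embeddings[0])
    # Fixed centroids stand in for the grouper network's learned behavior.
    centroids = [[(g + 1) / num_groups] * dim for g in range(num_groups)]

    def nearest(emb):
        return min(range(num_groups),
                   key=lambda g: sum((e - c) ** 2
                                     for e, c in zip(emb, centroids[g])))

    # Stage 1: every operation gets a group from its embedding.
    group_of_op = [nearest(e) for e in op_embeddings]
    # Stage 2: every group gets a device.
    device_of_group = {g: g % num_devices for g in range(num_groups)}
    return [device_of_group[g] for g in group_of_op]
```

Grouping first shrinks the placement search space: the device assignment is made per group rather than per operation, which matters when models have thousands of operations but only a handful of devices.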
-
Publication No.: US20250131321A1
Publication Date: 2025-04-24
Application No.: US18489503
Filing Date: 2023-10-18
Applicant: Google LLC
Inventor: Wei Yu , Sang Xie , Hieu Hy Pham , Quoc V. Le
IPC: G06N20/00
Abstract: Systems and methods are provided for efficiently calibrating a data mixture for training machine-learned models (e.g., machine-learned sequence processing models, such as transformer-based models). For example, machine-learned models can be trained over a broad dataset that can include multiple different categories of data. The mixture of data categories within the dataset can influence model performance. To improve the performance of machine-learned models, example implementations of the present disclosure can learn a distribution of data categories using a lightweight proxy model before initiating training of a large primary model. In this manner, for instance, example implementations can obtain an improved training data distribution with less computational expense and can leverage the learned training data distribution to better train a large primary model.
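The reweighting idea in this abstract can be sketched with a minimal multiplicative-weights update. This is an illustrative assumption, not the patented method: per-category losses from a proxy model are compared against baseline losses, and categories with higher excess loss receive more weight in the mixture for the large primary model.

```python
import math

def calibrate_mixture(category_losses, baseline_losses, step_size=1.0):
    """Sketch of calibrating a training-data mixture from proxy-model
    signals: categories where the proxy model's loss exceeds a baseline
    get exponentially more weight, then weights are renormalized into a
    distribution. The loss values are illustrative inputs."""
    excess = {c: max(category_losses[c] - baseline_losses[c], 0.0)
              for c in category_losses}
    weights = {c: math.exp(step_size * e) for c, e in excess.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```

The point of the proxy model is cost: the mixture is tuned with a small, cheap model, and only the resulting distribution is carried over to the expensive primary training run.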
-
Publication No.: US11455514B2
Publication Date: 2022-09-27
Application No.: US16554217
Filing Date: 2019-08-28
Applicant: Google LLC
Inventor: Benoit Steiner , Anna Darling Goldie , Jeffrey Adgate Dean , Hieu Hy Pham , Azalia Mirhoseini , Quoc V. Le
Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices includes receiving data specifying machine learning operations, and determining a placement that assigns each of the operations specified by the data to a respective device from the multiple hardware devices. Determining the placement includes: generating, from the data, a respective operation embedding for each of the operations; grouping the operations into multiple operation groups, comprising processing each of the respective operation embeddings using a grouper neural network having multiple grouper parameters, in which the grouper neural network is configured to, for each of the operations, process the operation embedding for the operation in accordance with first values of the grouper parameters to generate a grouper output that assigns the operation to an operation group from the multiple operation groups; and assigning each of the operation groups to a respective device from the multiple hardware devices.
-
Publication No.: US20210232929A1
Publication Date: 2021-07-29
Application No.: US17232803
Filing Date: 2021-04-16
Applicant: Google LLC
Inventor: Barret Zoph , Yun Jia Guan , Hieu Hy Pham , Quoc V. Le
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes generating, using a controller neural network, a batch of output sequences, each output sequence in the batch specifying a respective subset of a plurality of components of a large neural network that should be active during the processing of inputs by the large neural network; for each output sequence in the batch: determining a performance metric of the large neural network on the particular neural network task (i) in accordance with current values of the large network parameters and (ii) with only the subset of components specified by the output sequences active; and using the performance metrics for the output sequences in the batch to adjust the current values of the controller parameters of the controller neural network.
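The sample-score-update loop in this abstract can be sketched with a toy controller. The assumptions are loud: the controller is reduced to independent per-component activation probabilities (not a neural network), and `performance_fn` is a caller-supplied stand-in for evaluating the large network with only a subset of components active.

```python
import random

def update_controller(probs, performance_fn, batch_size=4, lr=0.2, seed=0):
    """Toy controller update: sample a batch of active-component subsets,
    score each with a performance metric, and nudge each component's
    activation probability toward subsets that scored above average."""
    rng = random.Random(seed)
    # Each sampled mask says which components of the large network are active.
    batch = [[rng.random() < p for p in probs] for _ in range(batch_size)]
    scores = [performance_fn(mask) for mask in batch]
    baseline = sum(scores) / len(scores)  # batch mean as a variance-reducing baseline
    new_probs = list(probs)
    for mask, score in zip(batch, scores):
        adv = score - baseline
        for i, active in enumerate(mask):
            direction = 1.0 if active else -1.0
            # Clamp so every component keeps some chance of being sampled.
            new_probs[i] = min(max(new_probs[i] + lr * adv * direction, 0.01), 0.99)
    return new_probs
```

Subtracting the batch-mean baseline means only *relative* quality of a sampled subset moves the probabilities, which is the standard way to keep this kind of score-based update stable.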
-
Publication No.: US10692003B2
Publication Date: 2020-06-23
Application No.: US16445330
Filing Date: 2019-06-19
Applicant: Google LLC
Inventor: Samuel Bengio , Mohammad Norouzi , Benoit Steiner , Jeffrey Adgate Dean , Hieu Hy Pham , Azalia Mirhoseini , Quoc V. Le , Naveen Kumar , Yuefeng Zhou , Rasmus Munk Larsen
Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices is described. The method includes receiving data specifying a machine learning model to be placed for distributed processing on multiple hardware devices; generating, from the data, a sequence of operation embeddings, each operation embedding in the sequence characterizing respective operations necessary to perform the processing of the machine learning model; processing the sequence of operation embeddings using a placement recurrent neural network in accordance with first values of a plurality of network parameters of the placement recurrent neural network to generate a network output that defines a placement of the operations characterized by the operation embeddings in the sequence across the plurality of devices; and scheduling the machine learning model for processing by the multiple hardware devices by placing the operations on the multiple devices according to the placement defined by the network output.
-
Publication No.: US20230154161A1
Publication Date: 2023-05-18
Application No.: US17988655
Filing Date: 2022-11-16
Applicant: Google LLC
Inventor: Hieu Hy Pham , Zihang Dai , Golnaz Ghiasi , Hanxiao Liu , Wei Yu , Mingxing Tan , Quoc V. Le
IPC: G06V10/774 , G06V10/776 , G06F40/126 , G06V10/82 , G06T9/00 , G06V10/764
CPC classification number: G06V10/774 , G06V10/776 , G06F40/126 , G06V10/82 , G06T9/002 , G06V10/764
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using memory-optimized contrastive learning to train image encoder and text encoder neural networks.
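The contrastive objective used to train paired image and text encoders can be sketched as a symmetric InfoNCE-style loss over toy embedding vectors. This illustrates only the objective; the memory optimizations this publication is about are not modeled here, and the encoders themselves are assumed away (the inputs are already embeddings).

```python
import math

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Toy symmetric contrastive loss over paired image/text embeddings:
    matching pairs (the diagonal of the similarity matrix) are pulled
    together, all other pairings are pushed apart."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    # Cosine-similarity logits scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]

    def row_nll(row, target):
        # Numerically stable negative log-softmax at `target`.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    img_to_text = sum(row_nll(logits[i], i) for i in range(len(imgs))) / len(imgs)
    cols = [list(col) for col in zip(*logits)]
    text_to_img = sum(row_nll(cols[j], j) for j in range(len(txts))) / len(txts)
    return (img_to_text + text_to_img) / 2
```

Averaging the image-to-text and text-to-image directions is what makes the loss symmetric: each image must identify its own caption among all captions in the batch, and vice versa.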
-
Publication No.: US10984319B2
Publication Date: 2021-04-20
Application No.: US16859781
Filing Date: 2020-04-27
Applicant: Google LLC
Inventor: Barret Zoph , Yun Jia Guan , Hieu Hy Pham , Quoc V. Le
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes generating, using a controller neural network, a batch of output sequences, each output sequence in the batch specifying a respective subset of a plurality of components of a large neural network that should be active during the processing of inputs by the large neural network; for each output sequence in the batch: determining a performance metric of the large neural network on the particular neural network task (i) in accordance with current values of the large network parameters and (ii) with only the subset of components specified by the output sequences active; and using the performance metrics for the output sequences in the batch to adjust the current values of the controller parameters of the controller neural network.