-
Publication No.: US12020063B2
Publication date: 2024-06-25
Application No.: US17540123
Application date: 2021-12-01
Applicant: Google LLC
Inventor: Jiafan Zhu, Jianqiao Liu, Xiangyu Dong, Xiao Zhang, Jikai Tang, Kexin Yang, Yong Zhao, Alireza Ghaffarkhah, Arash Rezaei, Dayou Du, Yazhou Zu, Xiangling Kong, Hoang-Vu Dang, Alexander Vadimovich Kolbasov
CPC classification number: G06F9/4843, G06F9/5027, G06F11/3024, G06F11/3433
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.
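Illustration: the flow in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The names `TaskAction`, `preflight_ok`, and `assign_with_preflight` are hypothetical, machines are represented by plain strings, and the installation of the program code package is not modeled; the task action is simply passed in as a sequence of callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical model: an operation takes a machine name and returns True on success.
Operation = Callable[[str], bool]


@dataclass
class TaskAction:
    """A task action modeled as an ordered sequence of operations (assumption)."""
    operations: Sequence[Operation]

    def run(self, machine: str) -> bool:
        # The action fails as soon as one operation returns False or raises.
        for op in self.operations:
            try:
                if not op(machine):
                    return False
            except Exception:
                return False
        return True


def preflight_ok(subset: Sequence[str], action: TaskAction) -> bool:
    """Run the task action on every machine in the subset before the workload."""
    return all(action.run(machine) for machine in subset)


def assign_with_preflight(
    candidate_subsets: Sequence[Sequence[str]], action: TaskAction
) -> Optional[Sequence[str]]:
    """Return the first subset that passes preflight, or None if none does.

    Moving on to the next candidate models re-assigning the workload
    to a different subset after a failed task action.
    """
    for subset in candidate_subsets:
        if preflight_ok(subset, action):
            return subset
    return None
```

For example, with a task action whose only operation rejects a known-bad machine, a workload first assigned to a subset containing that machine would be moved to the next candidate subset.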
-
Publication No.: US11966273B2
Publication date: 2024-04-23
Application No.: US18173293
Application date: 2023-02-23
Applicant: Google LLC
Inventor: Vasileios Kontorinis, Shaohong Li, Xiao Zhang, Sreekumar Vadakke Kodakara, Kunqi Ye
CPC classification number: G06F1/329, G06F9/4893, G06F9/5094
Abstract: This disclosure describes a method to minimize disruption for throughput-oriented jobs in power oversubscription services with dynamic control. The mechanism controls power in a hardware-agnostic way, and the policy employs a multi-threshold approach that balances power safety with workload impact. Moreover, an alternative control mechanism ensures proper system operation while power measurements are unavailable.
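Illustration: one way to picture a multi-threshold policy with a fallback for missing measurements is the Python sketch below. It is an assumption-laden sketch, not the method claimed in the patent. The function name `throttle_level`, the two thresholds, and every numeric value are hypothetical; the abstract does not state how many thresholds are used or what action each one triggers.

```python
from typing import Optional


def throttle_level(
    power_w: Optional[float],
    limit_w: float,
    soft_frac: float = 0.95,      # hypothetical lower threshold
    hard_frac: float = 0.98,      # hypothetical upper threshold
    soft_throttle: float = 0.25,  # hypothetical mild throttle
    hard_throttle: float = 0.9,   # hypothetical strong throttle
    fallback_throttle: float = 0.5,  # hypothetical level when power is unknown
) -> float:
    """Return the fraction of capacity to withhold from throughput-oriented jobs.

    0.0 means no throttling and 1.0 means fully paused.
    """
    if power_w is None:
        # Alternative control path: no power measurement is available,
        # so apply a fixed conservative level instead of a threshold test.
        return fallback_throttle
    if power_w >= hard_frac * limit_w:
        # Close to the limit: favor power safety over workload impact.
        return hard_throttle
    if power_w >= soft_frac * limit_w:
        # Approaching the limit: throttle mildly to limit workload impact.
        return soft_throttle
    return 0.0
```

With a 1000 W limit and the default values, a 960 W reading falls between the two thresholds and yields the mild throttle, while a missing reading yields the fallback level.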
-
Publication No.: US11599184B2
Publication date: 2023-03-07
Application No.: US17243853
Application date: 2021-04-29
Applicant: Google LLC
Inventor: Vasileios Kontorinis, Shaohong Li, Xiao Zhang, Sreekumar Vadakke Kodakara, Kunqi Ye
Abstract: This disclosure describes a method to minimize disruption for throughput-oriented jobs in power oversubscription services with dynamic control. The mechanism controls power in a hardware-agnostic way, and the policy employs a multi-threshold approach that balances power safety with workload impact. Moreover, an alternative control mechanism ensures proper system operation while power measurements are unavailable.
-
Publication No.: US20200090039A1
Publication date: 2020-03-19
Application No.: US16494842
Application date: 2017-11-17
Applicant: Google LLC
Inventor: Yang Song, Yuan Li, Bo Wu, Chao-Yeh Chen, Xiao Zhang, Hartwig Adam
Abstract: A computer-implemented method for generating a unified machine learning model using a neural network on a data processing apparatus is described. The method includes the data processing apparatus determining respective learning targets for each of a plurality of object verticals. The data processing apparatus determines the respective learning targets based on two or more embedding outputs of the neural network. The method also includes the data processing apparatus training the neural network to identify data associated with each of the plurality of object verticals. The data processing apparatus trains the neural network using the respective learning targets and based on a first loss function. The data processing apparatus uses the neural network trained to generate a unified machine learning model, where the model is configured to identify particular data items associated with each of the plurality of object verticals.
-
Publication No.: US20230305618A1
Publication date: 2023-09-28
Application No.: US18173293
Application date: 2023-02-23
Applicant: Google LLC
Inventor: Vasileios Kontorinis, Shaohong Li, Xiao Zhang, Sreekumar Vadakke Kodakara, Kunqi Ye
CPC classification number: G06F1/329, G06F9/4893, G06F9/5094
Abstract: This disclosure describes a method to minimize disruption for throughput-oriented jobs in power oversubscription services with dynamic control. The mechanism controls power in a hardware-agnostic way, and the policy employs a multi-threshold approach that balances power safety with workload impact. Moreover, an alternative control mechanism ensures proper system operation while power measurements are unavailable.
-
Publication No.: US20230168919A1
Publication date: 2023-06-01
Application No.: US17540123
Application date: 2021-12-01
Applicant: Google LLC
Inventor: Jiafan Zhu, Jianqiao Liu, Xiangyu Dong, Xiao Zhang, Jikai Tang, Kexin Yang, Yong Zhao, Alireza Ghaffarkhah, Arash Rezaei, Dayou Du, Yazhou Zu, Xiangling Kong, Hoang-Vu Dang, Alexander Vadimovich Kolbasov
CPC classification number: G06F9/4843, G06F9/5027, G06F11/3433, G06F11/3024
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.
-
Publication No.: US20210373639A1
Publication date: 2021-12-02
Application No.: US17243853
Application date: 2021-04-29
Applicant: Google LLC
Inventor: Vasileios Kontorinis, Shaohong Li, Xiao Zhang, Sreekumar Vadakke Kodakara, Kunqi Ye
Abstract: This disclosure describes a method to minimize disruption for throughput-oriented jobs in power oversubscription services with dynamic control. The mechanism controls power in a hardware-agnostic way, and the policy employs a multi-threshold approach that balances power safety with workload impact. Moreover, an alternative control mechanism ensures proper system operation while power measurements are unavailable.
-
Publication No.: US20240385873A1
Publication date: 2024-11-21
Application No.: US18667501
Application date: 2024-05-17
Applicant: Google LLC
Inventor: Jiafan Zhu, Jianqiao Liu, Xiangyu Dong, Xiao Zhang, Jikai Tang, Kexin Yang, Yong Zhao, Alireza Ghaffarkhah, Arash Rezaei, Dayou Du, Yazhou Zu, Xiangling Kong, Hoang-Vu Dang, Alexander Vadimovich Kolbasov
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.