-
Publication number: US20240385873A1
Publication date: 2024-11-21
Application number: US18667501
Filing date: 2024-05-17
Applicant: Google LLC
Inventor: Jiafan Zhu, Jianqiao Liu, Xiangyu Dong, Xiao Zhang, Jikai Tang, Kexin Yang, Yong Zhao, Alireza Ghaffarkhah, Arash Rezaei, Dayou Du, Yazhou Zu, Xiangling Kong, Hoang-Vu Dang, Alexander Vadimovich Kolbasov
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.
-
Publication number: US12020063B2
Publication date: 2024-06-25
Application number: US17540123
Filing date: 2021-12-01
Applicant: Google LLC
Inventor: Jiafan Zhu, Jianqiao Liu, Xiangyu Dong, Xiao Zhang, Jikai Tang, Kexin Yang, Yong Zhao, Alireza Ghaffarkhah, Arash Rezaei, Dayou Du, Yazhou Zu, Xiangling Kong, Hoang-Vu Dang, Alexander Vadimovich Kolbasov
CPC classification number: G06F9/4843, G06F9/5027, G06F11/3024, G06F11/3433
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.
-
Publication number: US20230168919A1
Publication date: 2023-06-01
Application number: US17540123
Filing date: 2021-12-01
Applicant: Google LLC
Inventor: Jiafan Zhu, Jianqiao Liu, Xiangyu Dong, Xiao Zhang, Jikai Tang, Kexin Yang, Yong Zhao, Alireza Ghaffarkhah, Arash Rezaei, Dayou Du, Yazhou Zu, Xiangling Kong, Hoang-Vu Dang, Alexander Vadimovich Kolbasov
CPC classification number: G06F9/4843, G06F9/5027, G06F11/3433, G06F11/3024
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing preflight checks of a distributed computing system, are described. In one aspect, a method includes assigning a computing workload to a first subset of hardware accelerator machines each having one or more hardware accelerators. A preflight check on the first subset is performed before performing the computing workload to verify the functionality of each machine in the first subset. For each hardware accelerator machine of the first subset, a program code package is installed, including a task action based at least in part on characteristics of the computing workload. The task action including a sequence of operations is performed on the hardware accelerator machine to determine whether the task action fails. Whenever the task action fails, the computing workload is re-assigned to a second subset of hardware accelerator machines different from the first subset.
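Illustrative sketch (not part of the patent record): the control flow described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `Machine`, `Operation`, `run_task_action`, `preflight_check`, and `assign_workload` are hypothetical, the `healthy` flag stands in for real accelerator state, and installation of the program code package is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Machine:
    """Hypothetical stand-in for a hardware accelerator machine."""
    name: str
    healthy: bool = True  # assumption: a single flag models machine state


# Assumption: a task action is a sequence of operations,
# each returning True when it succeeds on the given machine.
Operation = Callable[[Machine], bool]


def run_task_action(machine: Machine, operations: Sequence[Operation]) -> bool:
    """Run the operations in order; the task action fails on the first failure."""
    return all(op(machine) for op in operations)


def preflight_check(subset: Sequence[Machine],
                    operations: Sequence[Operation]) -> bool:
    """The check passes only if the task action succeeds on every machine."""
    return all(run_task_action(m, operations) for m in subset)


def assign_workload(subsets: Sequence[Sequence[Machine]],
                    operations: Sequence[Operation]) -> Optional[Sequence[Machine]]:
    """Try candidate subsets in order and return the first one that passes.

    Moving on to the next subset models re-assigning the workload
    after a failed preflight check. Returns None if every subset fails.
    """
    for subset in subsets:
        if preflight_check(subset, operations):
            return subset
    return None


if __name__ == "__main__":
    first = [Machine("a0"), Machine("a1", healthy=False)]
    second = [Machine("b0"), Machine("b1")]
    chosen = assign_workload([first, second], [lambda m: m.healthy])
    print([m.name for m in chosen] if chosen else "no subset passed")
```

In this sketch the first subset contains an unhealthy machine, so the workload falls through to the second subset. In a real system the operations would presumably be chosen from the characteristics of the workload, as the abstract states.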
-
-