ASYNCHRONOUS DISTRIBUTED DATA FLOW FOR MACHINE LEARNING WORKLOADS

Invention Application

US20230118303A1 ASYNCHRONOUS DISTRIBUTED DATA FLOW FOR MACHINE LEARNING WORKLOADS 有权

Please log in to see more content

Patent Title: ASYNCHRONOUS DISTRIBUTED DATA FLOW FOR MACHINE LEARNING WORKLOADS
Application No.: US18082415

Application Date: 2022-12-15
Publication No.: US20230118303A1

Publication Date: 2023-04-20
Inventor: Jeffrey Adgate Dean , Sudip Roy , Michael Acheson Isard , Aakanksha Chowdhery , Brennan Saeta , Chandramohan Amyangot Thekkath , Daniel William Hurt , Hyeontaek Lim , Laurent El Shafey , Parker Edward Schuh , Paul Ronald Barham , Ruoming Pang , Ryan Sepassi , Sanjay Ghemawat , Yonghui Wu
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Main IPC: G06F9/48
IPC: G06F9/48 ; G06N3/063 ; G06N3/08

ASYNCHRONOUS DISTRIBUTED DATA FLOW FOR MACHINE LEARNING WORKLOADS

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.

Public/Granted literature

US12112198B2 Asynchronous distributed data flow for machine learning workloads Public/Granted day:2024-10-08

Information query

Global Dossier Espacenet

IPC分类:

G	物理
G06	计算；推算或计数
G06F	电数字数据处理（基于特定计算模型的计算机系统入G06N）
G06F9/00	程序控制装置，例如，控制单元（用于外部设备的程序控制入G06F13/10）
G06F9/06	.应用存入的程序的，即应用处理设备的内部存储来接收程序并保持程序的
G06F9/46	..多道程序装置
G06F9/48	...程序启动；程序切换，例如通过中断