-
Publication No.: US20250077833A1
Publication Date: 2025-03-06
Application No.: US18821971
Filing Date: 2024-08-30
Applicant: Google LLC
Inventor: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
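The abstract describes searching for an architecture that is fast and accurate on a specific target hardware set, but does not spell out how latency enters the search. A common way to fold measured device latency into a candidate's score is a soft-exponential reward; the form below, the weight `w`, and all candidate numbers are illustrative assumptions, not taken from the patent:

```python
def nas_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Hardware-aware reward: accuracy discounted when a candidate
    runs slower than the target latency on the accelerator."""
    return accuracy * (latency_ms / target_ms) ** w

# Rank candidate architectures by their hardware-aware reward.
candidates = [
    {"name": "cand_a", "accuracy": 0.78, "latency_ms": 9.0},   # fast, slightly less accurate
    {"name": "cand_b", "accuracy": 0.80, "latency_ms": 15.0},  # accurate, but over budget
]
best = max(
    candidates,
    key=lambda c: nas_reward(c["accuracy"], c["latency_ms"], target_ms=10.0),
)
```

With these assumed numbers the faster candidate wins despite its lower raw accuracy, which is the trade-off such a reward is meant to encode.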
-
Publication No.: US12131244B2
Publication Date: 2024-10-29
Application No.: US17039178
Filing Date: 2020-09-30
Applicant: Google LLC
Inventor: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
-
Publication No.: US20240037373A1
Publication Date: 2024-02-01
Application No.: US17875594
Filing Date: 2022-07-28
Applicant: Google LLC
Inventor: Sheng Li, Norman Paul Jouppi, Garrett Axel Andersen, Quoc V. Le, Liqun Cheng, Parthasarathy Ranganathan
CPC classification number: G06N3/0454, G06N3/063
Abstract: Aspects of the disclosure are directed to jointly searching machine learning model architectures and hardware architectures in a combined space of models, hardware, and mapping strategies. A search strategy is utilized where all models, hardware, and mappings are evaluated together at once via weight sharing and a supernetwork. A multi-objective reward function is utilized with objectives for quality, performance, power, and area.
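The abstract names a multi-objective reward over quality, performance, power, and area without giving its form. One plausible sketch multiplies model quality by soft penalties on each hardware objective; the multiplicative form, targets, and weights below are assumptions for illustration only:

```python
def joint_reward(quality, latency_ms, power_w, area_mm2,
                 targets=(10.0, 5.0, 100.0), weights=(-0.07, -0.05, -0.05)):
    """Combine model quality with penalties for missing latency,
    power, and area targets (all constants are illustrative)."""
    t_lat, t_pow, t_area = targets
    w_lat, w_pow, w_area = weights
    return (quality
            * (latency_ms / t_lat) ** w_lat    # performance objective
            * (power_w / t_pow) ** w_pow       # power objective
            * (area_mm2 / t_area) ** w_area)   # area objective
```

A design that exactly meets every target keeps its raw quality score; exceeding any budget shrinks the reward, steering the joint search toward co-designed model/hardware/mapping points.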
-
Publication No.: US20220230048A1
Publication Date: 2022-07-21
Application No.: US17175029
Filing Date: 2021-02-12
Applicant: Google LLC
Inventor: Andrew Li, Sheng Li, Mingxing Tan, Ruoming Pang, Liqun Cheng, Quoc V. Le, Norman Paul Jouppi
Abstract: Methods, systems, and apparatus, including computer-readable media, for scaling neural network architectures on hardware accelerators. A method includes receiving training data and information specifying target computing resources, and performing, using the training data, a neural architecture search over a search space to identify an architecture for a base neural network. A plurality of scaling parameter values for scaling the base neural network can be identified, which can include repeatedly selecting a plurality of candidate scaling parameter values, and determining a measure of performance for the base neural network scaled according to the plurality of candidate scaling parameter values, in accordance with a plurality of second objectives including a latency objective. An architecture for a scaled neural network can be determined using the architecture of the base neural network scaled according to the plurality of scaling parameter values.
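The repeated select-candidates/measure-performance loop over scaling parameters can be pictured as a small grid search over depth, width, and resolution multipliers under a latency budget. The toy cost and quality models below are assumptions standing in for real measurements on the accelerator; only the loop structure mirrors the abstract:

```python
import itertools
import math

def scaled_latency(base_latency_ms, depth, width, resolution):
    # Toy cost model (assumption): latency grows with depth and
    # quadratically with width and input resolution.
    return base_latency_ms * depth * width**2 * resolution**2

def scaled_accuracy(base_acc, depth, width, resolution):
    # Toy quality model (assumption): diminishing returns from scaling up.
    return base_acc + 0.02 * math.log(depth * width * resolution)

def search_scaling(base_acc=0.75, base_latency_ms=2.0, budget_ms=20.0):
    """Repeatedly select candidate scaling values and keep the best
    performer that satisfies the latency objective."""
    grid = [1.0, 1.2, 1.4, 1.6]
    best, best_acc = None, -1.0
    for d, w, r in itertools.product(grid, repeat=3):
        if scaled_latency(base_latency_ms, d, w, r) > budget_ms:
            continue  # violates the latency objective
        acc = scaled_accuracy(base_acc, d, w, r)
        if acc > best_acc:
            best, best_acc = (d, w, r), acc
    return best, best_acc
```

In practice the base architecture comes from the preceding neural architecture search, and the performance measure would be evaluated on the target hardware rather than with a closed-form model.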
-
Publication No.: US20240231667A1
Publication Date: 2024-07-11
Application No.: US18152428
Filing Date: 2023-01-10
Applicant: Google LLC
Inventor: Sheng Li, Sridhar Lakshmanamurthy, Norman Paul Jouppi, Martin Guy Dixon, Daniel Stodolsky, Quoc V. Le, Liqun Cheng, Erik Karl Norden, Parthasarathy Ranganathan
IPC: G06F3/06
CPC classification number: G06F3/0647, G06F3/0611, G06F3/067
Abstract: Aspects of the disclosure are directed to a heterogeneous machine learning accelerator system with compute and memory nodes connected by high speed chip-to-chip interconnects. While existing remote/disaggregated memory may require memory expansion via remote processing units, aspects of the disclosure add memory nodes into machine learning accelerator clusters via the chip-to-chip interconnects without needing assistance from remote processing units to achieve higher performance, a simpler software stack, and/or lower cost. The memory nodes may support prefetch and intelligent compression to enable the use of low-cost memory without performance degradation.
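The "intelligent compression" in such a memory node can be pictured as a store that transparently compresses pages on write and decompresses on read, so cheaper capacity holds more data. The sketch below uses `zlib` purely as a stand-in for whatever compression the patent contemplates, and it omits prefetching entirely:

```python
import zlib

class MemoryNode:
    """Toy memory node: pages are compressed on write and
    decompressed on read, transparently to the caller."""

    def __init__(self):
        self._pages = {}

    def write(self, addr, data: bytes):
        self._pages[addr] = zlib.compress(data)

    def read(self, addr) -> bytes:
        return zlib.decompress(self._pages[addr])

    def stored_bytes(self) -> int:
        # Physical footprint after compression.
        return sum(len(p) for p in self._pages.values())
```

For compressible data (e.g. sparse tensors or repeated values) the stored footprint is far below the logical page size, which is the economic argument for low-cost memory nodes.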
-
Publication No.: US11960936B2
Publication Date: 2024-04-16
Application No.: US17150285
Filing Date: 2021-01-15
Applicant: Google LLC
Inventor: David Lo, Liqun Cheng, Parthasarathy Ranganathan, Sundar Jayakumar Dev
CPC classification number: G06F9/5027, G06N20/00
Abstract: The subject matter described herein provides systems and techniques to address the challenges of growing hardware and workload heterogeneity using a Warehouse-Scale Computer (WSC) design that improves the efficiency and utilization of WSCs. The WSC design may include an abstraction layer and an efficiency layer in the software stack of the WSC. The abstraction layer and the efficiency layer may be designed to improve job scheduling, simplify resource management, and drive hardware-software co-optimization using machine learning techniques and automation in order to customize the WSC for applications at scale. The abstraction layer may embrace platform/hardware and workload diversity through greater coordination between hardware and higher layers of the WSC software stack in the WSC design. The efficiency layer may employ machine learning techniques at scale to realize hardware/software co-optimizations as a part of the autonomous WSC design.
-
Publication No.: US11544105B2
Publication Date: 2023-01-03
Application No.: US16600437
Filing Date: 2019-10-11
Applicant: Google LLC
Inventor: Sheng Li, Brian Zhang, Liqun Cheng, Norman Paul Jouppi, Yun Ni
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.
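The pipeline in this abstract (job features → predicted performance metric per accelerator type → scheduling recommendation) can be sketched with a roofline-style cost model. The accelerator profiles, the throughput-per-cost objective, and all numbers below are illustrative assumptions, not the patented predictor:

```python
def predict_metric(features, accel_profile):
    """Predicted performance metric for one accelerator type:
    throughput per unit cost under a roofline-style runtime bound."""
    t_compute = features["gflops"] / accel_profile["gflops_per_s"]
    t_memory = features["mem_gb"] / accel_profile["mem_bw_gb_s"]
    runtime_s = max(t_compute, t_memory)  # bound by compute or memory
    return 1.0 / (runtime_s * accel_profile["cost_per_s"])

# Hypothetical accelerator profiles (all values assumed).
ACCELERATORS = {
    "gpu_a": {"gflops_per_s": 100.0, "mem_bw_gb_s": 1.0, "cost_per_s": 3.0},
    "tpu_b": {"gflops_per_s": 250.0, "mem_bw_gb_s": 10.0, "cost_per_s": 10.0},
}

def recommend(features, k=1):
    """Return the k accelerator types with the best predicted metric."""
    ranked = sorted(
        ACCELERATORS,
        key=lambda a: predict_metric(features, ACCELERATORS[a]),
        reverse=True,
    )
    return ranked[:k]
```

Under these assumed profiles, a compute-bound job lands on the cheaper accelerator while a memory-bound job is steered to the one with higher memory bandwidth, which is the kind of differentiated recommendation the scheduler consumes.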
-
Publication No.: US20220019869A1
Publication Date: 2022-01-20
Application No.: US17039178
Filing Date: 2020-09-30
Applicant: Google LLC
Inventor: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
-
Publication No.: US11188494B2
Publication Date: 2021-11-30
Application No.: US16524964
Filing Date: 2019-07-29
Applicant: Google LLC
Inventor: Nishant Patil, Liqun Cheng
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are described for performing asymmetric data communication at a host-device interface of a system. The methods include identifying devices coupled to a host of the system and generating a system topology that identifies a connectivity of the devices and identifies bus lanes that enable data transfers at the system. The host determines that a first connection between the host and a first device of the multiple devices has an asymmetric bandwidth requirement. The host configures a set of bus lanes of a data bus connecting the first device and the host to allocate a different number of the bus lanes to data egress from the host than to data ingress to the host. The bus lanes are configured to allocate the differing number of bus lanes based on the asymmetric bandwidth requirement of the first connection.
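The core allocation step, assigning a different number of bus lanes to egress than to ingress based on the connection's bandwidth requirement, reduces to a proportional split. The function below is a minimal sketch of that arithmetic; the clamping policy and all parameter names are assumptions:

```python
def allocate_lanes(total_lanes, egress_gbps, ingress_gbps):
    """Split a device's bus lanes in proportion to its asymmetric
    bandwidth requirement, keeping at least one lane per direction."""
    if total_lanes < 2:
        raise ValueError("need at least one lane in each direction")
    share = egress_gbps / (egress_gbps + ingress_gbps)
    egress = round(total_lanes * share)
    egress = min(max(egress, 1), total_lanes - 1)  # keep both directions usable
    return {"egress": egress, "ingress": total_lanes - egress}
```

A device that mostly reads from the host (high egress, low ingress) ends up with most of its lanes pointed outward, while a symmetric requirement degenerates to the conventional even split.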
-
Publication No.: US20190163381A1
Publication Date: 2019-05-30
Application No.: US16242669
Filing Date: 2019-01-08
Applicant: Google LLC
Inventor: Rama Krishna Govindaraju, Liqun Cheng, Parthasarathy Ranganathan, Michael R. Marty, Andrew Gallatin
IPC: G06F3/06, G06F12/1081
Abstract: An example method includes during execution of a software application by a processor, receiving, by a copy processor separate from the processor, a request for an asynchronous data copy operation to copy data within a memory accessible by the copy processor, wherein the request is received from a copy manager accessible by the software application in a user space of an operating system managing execution of the software application; in response to the request, initiating, by the copy processor, the asynchronous data copy operation; continuing execution of the software application by the processor; determining, by the copy processor, that the asynchronous data copy operation has completed; and in response to determining that the asynchronous copy operation has completed, selectively notifying, by the copy processor, the software application that the asynchronous copy operation has completed.
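The control flow in this abstract (application requests a copy via a user-space copy manager, a separate copy processor performs it while the application keeps running, and the application is selectively notified on completion) can be simulated with a background thread standing in for the copy processor. Everything here is an illustrative analogy, not the hardware mechanism:

```python
import threading

class CopyManager:
    """User-space copy manager; a background thread stands in
    for the separate copy processor."""

    def start_copy(self, src, dst, on_done=None):
        """Initiate an asynchronous copy; return a completion handle."""
        done = threading.Event()

        def copy_task():
            dst[:] = src               # the copy itself, off the main thread
            if on_done is not None:
                on_done()              # selective completion notification
            done.set()

        threading.Thread(target=copy_task, daemon=True).start()
        return done

mgr = CopyManager()
src, dst = list(range(4)), [0] * 4
notified = []
handle = mgr.start_copy(src, dst, on_done=lambda: notified.append(True))
# ... the "application" keeps executing while the copy proceeds ...
handle.wait()  # block only when the result is actually needed
```

The point of the pattern is the overlap: the main processor does useful work between `start_copy` and `wait`, and the callback fires only for requests that asked to be notified.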