-
Publication Number: US12197959B2
Publication Date: 2025-01-14
Application Number: US18036506
Filing Date: 2020-12-21
Applicant: Google LLC
Inventor: Temitayo Fadelu , Ravi Narayanaswami , JiHong Min , Dongdong Li , Suyog Gupta , Jason Jong Kyu Park
Abstract: The present disclosure describes a system and method for preempting a long-running process with a higher priority process in a machine learning system, such as a hardware accelerator. The machine learning hardware accelerator can be a multi-chip system including semiconductor chips that can be application-specific integrated circuits (ASICs) designed to perform machine learning operations. An ASIC is an integrated circuit (IC) that is customized for a particular use.
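The preemption idea in the abstract above can be illustrated with a toy priority scheduler. This is a minimal sketch, not the patented mechanism: the task fields, the one-unit time step, and the "lower number = higher priority" convention are all illustrative assumptions.

```python
def schedule(tasks, horizon):
    """Simulate preemptive priority scheduling over discrete time steps.

    tasks: list of dicts with keys "name", "priority" (lower = higher),
           "arrival" (time step the task becomes ready), and "work"
           (remaining units of work).
    Returns the name of the task that ran at each time step (or None).
    """
    trace = []
    for t in range(horizon):
        # A long-running task is preempted whenever a higher-priority
        # task has arrived and still has work remaining.
        ready = [x for x in tasks if x["arrival"] <= t and x["work"] > 0]
        if not ready:
            trace.append(None)
            continue
        current = min(ready, key=lambda x: x["priority"])
        current["work"] -= 1
        trace.append(current["name"])
    return trace
```

A long-running task yields the accelerator as soon as the higher-priority task arrives, then resumes once it completes.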
-
Publication Number: US11816480B2
Publication Date: 2023-11-14
Application Number: US17892807
Filing Date: 2022-08-22
Applicant: Google LLC
Inventor: Olivier Temam , Ravi Narayanaswami , Harshit Khaitan , Dong Hyuk Woo
CPC classification number: G06F9/3001 , G06F9/30036 , G06F9/30065 , G06F9/3824 , G06F13/28 , G06N3/04 , G06N3/045 , G06N3/063
Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
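The data path described above can be sketched in software: a traversal unit drives each input activation from the first memory bank onto a shared data bus, and each cell's MAC operator multiplies it by a parameter from the second memory bank and accumulates. The class and function names are hypothetical; this models the dataflow, not the hardware.

```python
class MacCell:
    """One cell with a multiply-accumulate (MAC) operator.

    params: this cell's parameters, as read from the second memory bank.
    """
    def __init__(self, params):
        self.params = list(params)
        self.acc = 0
        self.i = 0

    def step(self, activation_on_bus):
        # Multiply the broadcast activation by the next parameter
        # and accumulate the partial sum.
        self.acc += activation_on_bus * self.params[self.i]
        self.i += 1


def run_cells(activation_bank, cells):
    """Traversal unit: place each activation from the first memory
    bank onto the data bus, visible to every cell at once."""
    for a in activation_bank:
        for c in cells:
            c.step(a)
    return [c.acc for c in cells]
```

With activations `[1, 2]` broadcast to two cells holding parameters `[3, 4]` and `[5, 6]`, each cell computes its own dot product from the same bus traffic.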
-
Publication Number: US11816045B2
Publication Date: 2023-11-14
Application Number: US17410071
Filing Date: 2021-08-24
Applicant: Google LLC
Inventor: Dong Hyuk Woo , Ravi Narayanaswami
IPC: G06F13/16 , G06N20/00 , G06N3/10 , G06F15/76 , G06F9/38 , G06N20/10 , G06N3/045 , G06F17/16 , G06N5/04 , G06N3/063 , G06N3/08
CPC classification number: G06F13/1668 , G06F9/38 , G06F15/76 , G06F17/16 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/10 , G06N5/04 , G06N20/00 , G06N20/10 , Y02D10/00
Abstract: A computer-implemented method includes receiving, by a computing device, input activations and determining, by a controller of the computing device, whether each of the input activations has either a zero value or a non-zero value. The method further includes storing, in a memory bank of the computing device, at least one of the input activations. Storing the at least one input activation includes generating an index comprising one or more memory address locations that have input activation values that are non-zero values. The method still further includes providing, by the controller and from the memory bank, at least one input activation onto a data bus that is accessible by one or more units of a computational array. The activations are provided, at least in part, from a memory address location associated with the index.
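The zero-skipping scheme in this abstract can be sketched directly: store the activations, index only the addresses holding non-zero values, and serve the data bus from that index so zero-valued activations never occupy the computational array. Function names are illustrative.

```python
def store_with_index(activations):
    """Store activations in a memory bank and build an index of the
    memory address locations whose values are non-zero."""
    bank = list(activations)
    index = [addr for addr, v in enumerate(bank) if v != 0]
    return bank, index


def provide(bank, index):
    """Controller: place only the indexed (non-zero) activations onto
    the data bus for the computational array's units."""
    return [bank[addr] for addr in index]
```

For a sparse input like `[0, 3, 0, 7, 0]`, only addresses 1 and 3 are indexed, so the bus carries two values instead of five.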
-
Publication Number: US11586701B2
Publication Date: 2023-02-21
Application Number: US17063813
Filing Date: 2020-10-06
Applicant: Google LLC
Inventor: Anand Suresh Kane , Ravi Narayanaswami
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a circuit configured to add multiple inputs. The circuit includes a first adder section that receives a first input and a second input and adds the inputs to generate a first sum. The circuit also includes a second adder section that receives the first and second inputs and adds the inputs to generate a second sum. An input processor of the circuit receives the first and second inputs, determines whether a relationship between the first and second inputs satisfies a set of conditions, and selects a high-power mode of the adder circuit or a low-power mode of the adder circuit using the determined relationship between the first and second inputs. The high-power mode is selected and the first and second inputs are routed to the second adder section when the relationship satisfies the set of conditions.
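The mode-selection logic above can be sketched as follows. The abstract does not state what the set of conditions is, so the sketch assumes a plausible one: route to the wide (high-power) adder section only when an operand exceeds a narrow bit width. The width, the condition, and all names here are assumptions for illustration.

```python
NARROW_BITS = 8
LIMIT = 1 << (NARROW_BITS - 1)

def fits_narrow(x):
    # True when x is representable in the narrow adder's signed width.
    return -LIMIT <= x < LIMIT

def narrow_add(a, b):   # first (low-power) adder section
    return a + b

def wide_add(a, b):     # second (high-power) adder section
    return a + b

def add_with_mode(a, b):
    """Input processor: check the relationship between the inputs and
    select the low-power or high-power mode accordingly."""
    if fits_narrow(a) and fits_narrow(b):
        return narrow_add(a, b), "low-power"
    # Condition satisfied: route the inputs to the second adder section.
    return wide_add(a, b), "high-power"
```

Small operands stay on the cheap section; only operands that need the extra width pay for the high-power path.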
-
Publication Number: US20220083480A1
Publication Date: 2022-03-17
Application Number: US17410071
Filing Date: 2021-08-24
Applicant: Google LLC
Inventor: Dong Hyuk Woo , Ravi Narayanaswami
IPC: G06F13/16 , G06N20/00 , G06N3/10 , G06F15/76 , G06F9/38 , G06N3/04 , G06N20/10 , G06F17/16 , G06N5/04 , G06N3/063 , G06N3/08
Abstract: A computer-implemented method includes receiving, by a computing device, input activations and determining, by a controller of the computing device, whether each of the input activations has either a zero value or a non-zero value. The method further includes storing, in a memory bank of the computing device, at least one of the input activations. Storing the at least one input activation includes generating an index comprising one or more memory address locations that have input activation values that are non-zero values. The method still further includes providing, by the controller and from the memory bank, at least one input activation onto a data bus that is accessible by one or more units of a computational array. The activations are provided, at least in part, from a memory address location associated with the index.
-
Publication Number: US11099772B2
Publication Date: 2021-08-24
Application Number: US16700385
Filing Date: 2019-12-02
Applicant: Google LLC
Inventor: Olivier Temam , Harshit Khaitan , Ravi Narayanaswami , Dong Hyuk Woo
Abstract: Methods, systems, and apparatus, including an apparatus for transferring data using multiple buffers, including multiple memories and one or more processing units configured to determine buffer memory addresses for a sequence of data elements stored in a first data storage location that are being transferred to a second data storage location. For each group of one or more of the data elements in the sequence, a value of a buffer assignment element that can be switched between multiple values each corresponding to a different one of the memories is identified. A buffer memory address for the group of one or more data elements is determined based on the value of the buffer assignment element. The value of the buffer assignment element is switched prior to determining the buffer memory address for a subsequent group of one or more data elements of the sequence of data elements.
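The buffer-assignment element described above behaves like a toggle in classic double buffering: each group of data elements gets an address in the memory the toggle currently points at, and the toggle flips before the next group. A minimal sketch for two memories, with the base addresses and per-buffer offsets as illustrative assumptions:

```python
def buffer_addresses(num_groups, base_addrs):
    """Compute a buffer memory address for each group of data elements.

    base_addrs: base address of each of the two buffer memories.
    The buffer-assignment element (here a 0/1 toggle) selects which
    memory the current group lands in, and is switched before the
    address for the next group is determined.
    """
    assignment = 0          # buffer assignment element
    counts = [0, 0]         # groups placed in each memory so far
    addrs = []
    for _ in range(num_groups):
        addrs.append(base_addrs[assignment] + counts[assignment])
        counts[assignment] += 1
        assignment ^= 1     # switch to the other memory for the next group
    return addrs
```

Consecutive groups alternate between the two memories, so one buffer can be filled while the other is drained.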
-
Publication Number: US20200342350A1
Publication Date: 2020-10-29
Application Number: US16397481
Filing Date: 2019-04-29
Applicant: Google LLC
Inventor: Lawrence J. Madar, III , Temitayo Fadelu , Harshit Khaitan , Ravi Narayanaswami
IPC: G06N20/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for virtualizing external memory as local to a machine learning accelerator. One ambient computing system comprises: an ambient machine learning engine; a low-power CPU; and an SRAM that is shared among at least the ambient machine learning engine and the low-power CPU; wherein the ambient machine learning engine comprises virtual address logic to translate from virtual addresses generated by the ambient machine learning engine to physical addresses within the SRAM.
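The virtual address logic mentioned above can be sketched as simple page-based translation into the shared SRAM. The page size and page-table contents below are illustrative assumptions; the abstract only states that virtual addresses from the ambient ML engine are translated to physical SRAM addresses.

```python
PAGE = 4096  # assumed page size

def make_translator(page_table):
    """Build the engine's virtual-address logic from a page table
    mapping virtual page numbers to physical page numbers in SRAM."""
    def translate(vaddr):
        vpage, offset = divmod(vaddr, PAGE)
        # Physical address = physical page base + offset within page.
        return page_table[vpage] * PAGE + offset
    return translate
```

This lets the engine treat external memory as if it were local: its contiguous virtual range can map to arbitrary physical pages in the shared SRAM.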
-
Publication Number: US20200012705A1
Publication Date: 2020-01-09
Application Number: US16571749
Filing Date: 2019-09-16
Applicant: Google LLC
Inventor: Ravi Narayanaswami , Rahul Nagarajan , Dong Hyuk Woo , Christopher Daniel Leary
Abstract: Methods, systems, and apparatus, including a system for transforming sparse elements to a dense matrix. The system is configured to receive a request for an output matrix based on sparse elements including sparse elements associated with a first dense matrix and sparse elements associated with a second dense matrix; obtain the sparse elements associated with the first dense matrix fetched by a first group of sparse element access units; obtain the sparse elements associated with the second dense matrix fetched by a second group of sparse element access units; and transform the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix to generate the output dense matrix that includes the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix.
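The transformation described above amounts to scattering two groups of fetched sparse elements into one dense output. A minimal sketch, assuming each sparse element is a `(row, col, value)` triple; the representation and function name are illustrative:

```python
def gather_dense(shape, sparse_a, sparse_b):
    """Combine sparse elements fetched by two groups of sparse element
    access units into a single dense output matrix.

    shape: (rows, cols) of the output matrix.
    sparse_a, sparse_b: iterables of (row, col, value) triples, one per
    group of access units.
    """
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    # Scatter both groups' elements into their dense positions.
    for r, c, v in list(sparse_a) + list(sparse_b):
        out[r][c] = v
    return out
```

Elements from the first dense matrix and the second dense matrix land in the same output, with unfetched positions left at zero.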
-
Publication Number: US10360163B2
Publication Date: 2019-07-23
Application Number: US15336066
Filing Date: 2016-10-27
Applicant: Google LLC
Inventor: Dong Hyuk Woo , Ravi Narayanaswami
IPC: G06F13/16 , G06F17/16 , G06N99/00 , G06N5/04 , G06N20/00 , G06N3/063 , G06N3/08 , G06N3/10 , G06F15/76
Abstract: A computer-implemented method includes receiving, by a computing device, input activations and determining, by a controller of the computing device, whether each of the input activations has either a zero value or a non-zero value. The method further includes storing, in a memory bank of the computing device, at least one of the input activations. Storing the at least one input activation includes generating an index comprising one or more memory address locations that have input activation values that are non-zero values. The method still further includes providing, by the controller and from the memory bank, at least one input activation onto a data bus that is accessible by one or more units of a computational array. The activations are provided, at least in part, from a memory address location associated with the index.
-
Publication Number: US20190156187A1
Publication Date: 2019-05-23
Application Number: US15819753
Filing Date: 2017-11-21
Applicant: Google LLC
Inventor: Uday Kumar Dasari , Olivier Temam , Ravi Narayanaswami , Dong Hyuk Woo
Abstract: Apparatus and methods for processing neural network models are provided. The apparatus can comprise a plurality of identical artificial intelligence processing dies. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies can include at least one inter-die input block and at least one inter-die output block. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies is communicatively coupled to another artificial intelligence processing die among the plurality of identical artificial intelligence processing dies by way of one or more communication paths from the at least one inter-die output block of the artificial intelligence processing die to the at least one inter-die input block of the artificial intelligence processing die. Each artificial intelligence processing die among the plurality of identical artificial intelligence processing dies corresponds to at least one layer of a neural network.
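The die-per-layer arrangement above can be sketched as a chain in which each die's inter-die output block feeds the next die's inter-die input block. The layer computation below (a scalar weight per die) is a placeholder assumption; the point is the dataflow between identical dies.

```python
class Die:
    """One AI processing die implementing a single neural network layer.

    Its forward() stands in for the layer computation; the activations
    it returns are what the inter-die output block would send onward.
    """
    def __init__(self, weight):
        self.weight = weight  # placeholder for the layer's parameters

    def forward(self, activations):
        return [v * self.weight for v in activations]


def pipeline(dies, activations):
    """Route activations through the dies over the inter-die
    communication paths: each output block feeds the next input block."""
    for die in dies:
        activations = die.forward(activations)
    return activations
```

Because the dies are identical, scaling the network to more layers is a matter of adding dies to the chain rather than redesigning the chip.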