-
Publication No.: US12132802B2
Publication Date: 2024-10-29
Application No.: US17553387
Filing Date: 2021-12-16
Applicant: Google LLC
Inventor: Weihuang Wang , Srinivas Vaduvatha , Xiaoming Wang , Gurushankar Rajamani , Abhishek Agarwal , Jiazhen Zheng , Prashant Chandra
IPC: H04L67/568 , G06F16/2455 , H04L49/00 , H04L69/326
CPC classification number: H04L67/568 , G06F16/24552 , H04L49/3063 , H04L69/326
Abstract: An application specific integrated circuit (ASIC) of a network interface card is provided for reliable transport of packets. The network interface card may include a reliable transport accelerator (RTA). The RTA may include a cache lookup database. The RTA may be configured to determine a connection identifier from a received data packet and to query the cache lookup database for a cache entry corresponding to a connection context having that connection identifier. In response to the query, the RTA may receive a cache hit or a cache miss.
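As a rough software analogue of the lookup flow this abstract describes, the sketch below keys a small cache by connection identifier and reports a hit or a miss, falling back to a slower context store on a miss. The class and method names (ConnectionContextCache, lookup, backing_store) and the LRU eviction policy are illustrative assumptions, not details taken from the patent.

```python
# Minimal sketch (assumed names) of the cache-hit / cache-miss flow described
# in the abstract: the accelerator keys a small cache by connection identifier
# and falls back to a slower connection-context store on a miss.
from collections import OrderedDict

class ConnectionContextCache:
    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store      # slower, full connection-context table
        self.entries = OrderedDict()            # connection_id -> context, in LRU order

    def lookup(self, connection_id):
        """Return (hit, context) for the connection identifier parsed from a packet."""
        if connection_id in self.entries:
            self.entries.move_to_end(connection_id)     # refresh LRU position
            return True, self.entries[connection_id]    # cache hit
        context = self.backing_store.get(connection_id) # cache miss: fetch and fill
        if context is not None:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)        # evict least recently used entry
            self.entries[connection_id] = context
        return False, context

# Usage: store = {42: {"seq": 7}}; cache = ConnectionContextCache(2, store)
# cache.lookup(42) returns (False, {...}) the first time and (True, {...}) afterwards.
```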
-
Publication No.: US20210263739A1
Publication Date: 2021-08-26
Application No.: US17007569
Filing Date: 2020-08-31
Applicant: Google LLC
Inventor: Thomas Norrie , Gurushankar Rajamani , Andrew Everett Phelps , Matthew Leever Hedlund , Norman Paul Jouppi
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
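As a rough software analogue of the reduction this abstract describes, the sketch below accumulates per-core vectors into a single result vector. Element-wise addition standing in for the operation encoded at the operator unit, a plain Python list standing in for the DMA data path, and all function names are assumptions made for illustration only.

```python
# Minimal sketch, assuming element-wise addition as the operation encoded at
# the operator unit: each processor core produces a vector, the shared memory
# receives them, and a single result vector accumulates the contributions.
from typing import List

def core_vector(core_id: int, width: int) -> List[float]:
    # Stand-in for the values a core computes locally.
    return [float(core_id + i) for i in range(width)]

def shared_memory_reduce(vectors: List[List[float]]) -> List[float]:
    """Accumulate per-core vectors into one result vector."""
    width = len(vectors[0])
    result = [0.0] * width
    for vec in vectors:                 # conceptually, one DMA delivery per core
        for i, value in enumerate(vec):
            result[i] += value          # operator unit: running accumulation
    return result

if __name__ == "__main__":
    per_core = [core_vector(cid, width=4) for cid in range(8)]
    print(shared_memory_reduce(per_core))   # element-wise sum across 8 cores
```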
-
Publication No.: US11928580B2
Publication Date: 2024-03-12
Application No.: US17713122
Filing Date: 2022-04-04
Applicant: Google LLC
Inventor: Gurushankar Rajamani , Alice Kuo
CPC classification number: G06N3/063 , G06F3/0611 , G06F3/0659 , G06F3/0673 , G06N3/04
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for interleaving memory requests to accelerate memory accesses at a hardware circuit configured to implement a neural network model. A system generates multiple requests that are processed against a memory of the system. Each request is used to retrieve data from the memory. For each request, the system generates multiple sub-requests based on a respective size of the data to be retrieved using the request. The system generates a sequence of interleaved sub-requests that includes respective sub-requests of a first request interleaved among respective sub-requests of a second request. Based on the sequence of interleaved sub-requests, a module of the system receives respective portions of data accessed from different address locations of the memory. The system processes each of the respective portions of data to generate a neural network inference using the neural network model implemented at the hardware circuit.
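The splitting and interleaving this abstract describes can be pictured as follows: each request is broken into sub-requests sized to the data it retrieves, and the sub-requests of two requests are issued in an alternating sequence. In the sketch below, the 64-byte chunk size, the round-robin interleave order, and the function names are illustrative assumptions, not details from the patent.

```python
# Minimal sketch (assumed chunk size and round-robin order) of splitting two
# memory requests into sub-requests sized to the data each request retrieves,
# then interleaving the sub-requests so their accesses alternate.
from itertools import zip_longest

def split_into_subrequests(base_addr: int, size: int, chunk: int = 64):
    """One sub-request per chunk of the data the request retrieves."""
    return [(base_addr + off, min(chunk, size - off))
            for off in range(0, size, chunk)]

def interleave(sub_a, sub_b):
    """Alternate sub-requests of request A and request B."""
    sequence = []
    for a, b in zip_longest(sub_a, sub_b):
        if a is not None:
            sequence.append(("A", a))
        if b is not None:
            sequence.append(("B", b))
    return sequence

if __name__ == "__main__":
    req_a = split_into_subrequests(base_addr=0x1000, size=256)   # 4 sub-requests
    req_b = split_into_subrequests(base_addr=0x8000, size=128)   # 2 sub-requests
    for tag, (addr, length) in interleave(req_a, req_b):
        print(f"{tag}: read {length} bytes at {hex(addr)}")
```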
-
Publication No.: US20230062889A1
Publication Date: 2023-03-02
Application No.: US17553387
Filing Date: 2021-12-16
Applicant: Google LLC
Inventor: Weihuang Wang , Srinivas Vaduvatha , Xiaoming Wang , Gurushankar Rajamani , Abhishek Agarwal , Jiazhen Zheng , Prashant Chandra
IPC: H04L67/568 , H04L49/00 , H04L69/326 , G06F16/2455
Abstract: An application specific integrated circuit (ASIC) of a network interface card is provided for reliable transport of packets. The network interface card may include a reliable transport accelerator (RTA). The RTA may include a cache lookup database. The RTA may be configured to determine a connection identifier from a received data packet and to query the cache lookup database for a cache entry corresponding to a connection context having that connection identifier. In response to the query, the RTA may receive a cache hit or a cache miss.
-
Publication No.: US11295206B2
Publication Date: 2022-04-05
Application No.: US16874894
Filing Date: 2020-05-15
Applicant: Google LLC
Inventor: Gurushankar Rajamani , Alice Kuo
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for interleaving memory requests to accelerate memory accesses at a hardware circuit configured to implement a neural network model. A system generates multiple requests that are processed against a memory of the system. Each request is used to retrieve data from the memory. For each request, the system generates multiple sub-requests based on a respective size of the data to be retrieved using the request. The system generates a sequence of interleaved sub-requests that includes respective sub-requests of a first request interleaved among respective sub-requests of a second request. Based on the sequence of interleaved sub-requests, a module of the system receives respective portions of data accessed from different address locations of the memory. The system processes each of the respective portions of data to generate a neural network inference using the neural network model implemented at the hardware circuit.
-
Publication No.: US20210223985A1
Publication Date: 2021-07-22
Application No.: US16930172
Filing Date: 2020-07-15
Applicant: Google LLC
Inventor: Amin Farmahini , Benjamin Steel Gelb , Gurushankar Rajamani , Sukalpa Biswas
IPC: G06F3/06
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data at a first memory device of a plurality of memory devices; initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device; determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication; and, in response, storing, in memory accessible to the memory controller, the data corresponding to the requests that have not yet been fully processed.
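A very reduced software model of the behavior this abstract describes might look like the sketch below: the controller interleaves processing of two requests against one memory device and, on an indication to switch devices, stores the requests that are not yet fully processed so they can resume later. The class and method names (MemoryController, switch_device, and so on) are hypothetical, and simple chunk counting stands in for the actual data movement.

```python
# Minimal sketch (hypothetical names) of a controller that interleaves two
# requests against one memory device and, on an indication to switch devices,
# parks the not-yet-finished requests so they can be resumed later.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Request:
    request_id: int
    remaining_chunks: int           # how much of the request is still unprocessed

@dataclass
class MemoryController:
    active_device: int = 0
    in_flight: List[Request] = field(default_factory=list)
    parked: Dict[int, List[Request]] = field(default_factory=dict)

    def submit(self, *requests: Request) -> None:
        self.in_flight.extend(requests)

    def step(self) -> None:
        """Process one chunk of each in-flight request, interleaved."""
        for req in self.in_flight:
            if req.remaining_chunks > 0:
                req.remaining_chunks -= 1
        self.in_flight = [r for r in self.in_flight if r.remaining_chunks > 0]

    def switch_device(self, new_device: int) -> None:
        """On the switch indication, park unfinished requests for the old device."""
        unfinished = [r for r in self.in_flight if r.remaining_chunks > 0]
        if unfinished:
            self.parked.setdefault(self.active_device, []).extend(unfinished)
        self.in_flight = self.parked.pop(new_device, [])   # resume any parked work
        self.active_device = new_device

# Usage: mc = MemoryController(); mc.submit(Request(1, 4), Request(2, 2))
# mc.step(); mc.switch_device(1)   # unfinished requests 1 and 2 saved for device 0
```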
-
Publication No.: US11513724B2
Publication Date: 2022-11-29
Application No.: US17348558
Filing Date: 2021-06-15
Applicant: Google LLC
Inventor: Amin Farmahini , Benjamin Steel Gelb , Gurushankar Rajamani , Sukalpa Biswas
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data at a first memory device of a plurality of memory devices; initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device; determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication; and, in response, storing, in memory accessible to the memory controller, the data corresponding to the requests that have not yet been fully processed.
-
Publication No.: US20220156071A1
Publication Date: 2022-05-19
Application No.: US17530869
Filing Date: 2021-11-19
Applicant: Google LLC
Inventor: Thomas Norrie , Gurushankar Rajamani , Andrew Everett Phelps , Matthew Leever Hedlund , Norman Paul Jouppi
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
-
Publication No.: US11137936B2
Publication Date: 2021-10-05
Application No.: US16930172
Filing Date: 2020-07-15
Applicant: Google LLC
Inventor: Amin Farmahini , Benjamin Steel Gelb , Gurushankar Rajamani , Sukalpa Biswas
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data at a first memory device of a plurality of memory devices; initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device; determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication; and, in response, storing, in memory accessible to the memory controller, the data corresponding to the requests that have not yet been fully processed.
-
Publication No.: US11748028B2
Publication Date: 2023-09-05
Application No.: US17993802
Filing Date: 2022-11-23
Applicant: Google LLC
Inventor: Amin Farmahini , Benjamin Steel Gelb , Gurushankar Rajamani , Sukalpa Biswas
CPC classification number: G06F3/0655 , G06F3/0605 , G06F3/0679
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data at a first memory device of a plurality of memory devices; initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device; determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication; and, in response, storing, in memory accessible to the memory controller, the data corresponding to the requests that have not yet been fully processed.