-
Publication No.: US20240211269A1
Publication date: 2024-06-27
Application No.: US18597005
Application date: 2024-03-06
Applicant: Google LLC
Inventor: Rahul Nagarajan, Suvinay Subramanian, Arpith Chacko Jacob
CPC classification number: G06F9/3887, G06F9/30036
Abstract: Aspects of the disclosure are directed to a cross-lane processing unit (XPU) for performing data-dependent operations across multiple data processing lanes of a processor. Rather than implementing operation-specific circuits for each data-dependent operation, the XPU can be configured to perform different operations in response to input signals configuring individual operations performed by processing cells and crossbars arranged as a stacked network in the XPU. Each processing cell can receive and process data across multiple data processing lanes. Aspects of the disclosure include configuring the XPU to use a vector sort network to perform a duplicate count, eliminating the need to configure the XPU separately for sorting and duplicate counting.
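Illustrative sketch (not from the patent text): the abstract's idea of deriving a duplicate count from a sort can be modeled in software as below. The odd-even transposition network is only a simple stand-in for a sorting network built from compare-exchange cells; the abstract does not say which network the XPU uses or what form the duplicate count takes. All function names and the running-count output format are assumptions made for illustration.

```python
def compare_exchange(values, i, j):
    """Hypothetical processing cell: order the pair (values[i], values[j])."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def odd_even_transposition_sort(lanes):
    """Sort lane values with a fixed network of compare-exchange stages."""
    values = list(lanes)
    n = len(values)
    for stage in range(n):
        # Even stages pair (0,1), (2,3), ...; odd stages pair (1,2), (3,4), ...
        for i in range(stage % 2, n - 1, 2):
            compare_exchange(values, i, i + 1)
    return values


def sort_with_duplicate_counts(lanes):
    """Return (sorted_values, counts).

    counts[i] is how many times sorted_values[i] has occurred up to and
    including position i (an assumed output format).
    """
    sorted_values = odd_even_transposition_sort(lanes)
    counts = []
    for i, value in enumerate(sorted_values):
        if i > 0 and value == sorted_values[i - 1]:
            counts.append(counts[-1] + 1)  # extend the current run of equal values
        else:
            counts.append(1)  # first occurrence starts a new run
    return sorted_values, counts
```

Because equal values become adjacent after sorting, the count needs only a comparison with the neighboring lane, which is why one sorted pass can serve both purposes in this model.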
-
Publication No.: US20230153116A1
Publication date: 2023-05-18
Application No.: US17981617
Application date: 2022-11-07
Applicant: Google LLC
Inventor: Rahul Nagarajan, Suvinay Subramanian, Arpith Chacko Jacob, Christopher Leary, Thomas James Norrie, Thejasvi Magudilu Vijayaraj, Hema Hariharan
CPC classification number: G06F9/3895, G06F9/3887, G06F9/30036, G06N3/02
Abstract: Aspects of the disclosure provide for an accelerator capable of accelerating data-dependent, irregular, and/or memory-bound operations. An accelerator as described herein includes a programmable engine for efficiently executing computations on-chip that are dynamic, irregular, and/or memory-bound, in conjunction with a co-processor configured to accelerate operations that are predictable in computational load and behavior on the co-processor during design and fabrication.
-
Publication No.: US20230305970A1
Publication date: 2023-09-28
Application No.: US17722782
Application date: 2022-04-18
Applicant: Google LLC
Inventor: Rahul Nagarajan, Arpith Chacko Jacob, Suvinay Subramanian, Hema Hariharan
CPC classification number: G06F9/30134, G06F9/35, G06F9/3869, G06F9/522
Abstract: Generally disclosed herein is a hardware/software interface for asynchronous data movement between an off-core memory and a core-local memory, referred to as “stream transfers”, and a stream ordering model. The stream transfers allow software to more efficiently express common data-movement patterns, specifically ones seen in sparse workloads. Direct stream instructions that belong to a stream are processed in order. For indirect stream instructions, offset elements in an offset list are processed in order. A sync flag is updated to indicate monotonic incremental progress for the stream.
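Illustrative sketch (not from the patent text): the ordering model in the abstract can be mimicked with a small synchronous simulation of an indirect stream gather. Offsets are consumed in list order, and a sync flag only ever increases as elements land in core-local memory. The class and function names, the one-increment-per-element granularity, and the use of Python lists for both memories are assumptions made for illustration; the real interface is asynchronous hardware.

```python
class SyncFlag:
    """Hypothetical progress counter that can only move forward."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        if amount < 0:
            raise ValueError("sync flag progress must be monotonic")
        self.value += amount


def indirect_stream_gather(off_core, offsets, core_local, flag):
    """Copy off_core[offsets[k]] into core_local[k], in offset-list order.

    Returns the flag value observed after each element, so a caller can see
    that progress is reported incrementally and never decreases.
    """
    progress = []
    for k, offset in enumerate(offsets):
        core_local[k] = off_core[offset]  # element k is transferred before k + 1
        flag.add(1)                       # one unit of progress per element (assumed)
        progress.append(flag.value)
    return progress
```

In this model, a consumer that observes a flag value of k can rely on the first k elements of the core-local buffer being valid, which is the kind of guarantee that in-order processing plus a monotonic flag is meant to give.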
-
Publication No.: US20230153115A1
Publication date: 2023-05-18
Application No.: US17972663
Application date: 2022-10-25
Applicant: Google LLC
Inventor: Rahul Nagarajan, Suvinay Subramanian, Arpith Chacko Jacob
CPC classification number: G06F9/3887, G06F9/30036
Abstract: Aspects of the disclosure are directed to a cross-lane processing unit (XPU) for performing data-dependent operations across multiple data processing lanes of a processor. Rather than implementing operation-specific circuits for each data-dependent operation, the XPU can be configured to perform different operations in response to input signals configuring individual operations performed by processing cells and crossbars arranged as a stacked network in the XPU. Each processing cell can receive and process data across multiple data processing lanes. Aspects of the disclosure include configuring the XPU to use a vector sort network to perform a duplicate count, eliminating the need to configure the XPU separately for sorting and duplicate counting.
-
Publication No.: US20240211413A1
Publication date: 2024-06-27
Application No.: US18596835
Application date: 2024-03-06
Applicant: Google LLC
Inventor: Rahul Nagarajan, Arpith Chacko Jacob, Suvinay Subramanian, Hema Hariharan
CPC classification number: G06F13/161, G06F9/35, G06F9/3869, G06F9/522
Abstract: Generally disclosed herein is a hardware/software interface for asynchronous data movement between an off-core memory and a core-local memory, referred to as “stream transfers”, and a stream ordering model. The stream transfers allow software to more efficiently express common data-movement patterns, specifically ones seen in sparse workloads. Direct stream instructions that belong to a stream are processed in order. For indirect stream instructions, offset elements in an offset list are processed in order. A sync flag is updated to indicate monotonic incremental progress for the stream.
-
Publication No.: US11977499B2
Publication date: 2024-05-07
Application No.: US17722782
Application date: 2022-04-18
Applicant: Google LLC
Inventor: Rahul Nagarajan, Arpith Chacko Jacob, Suvinay Subramanian, Hema Hariharan
CPC classification number: G06F13/161, G06F9/35, G06F9/3869, G06F9/522
Abstract: Generally disclosed herein is a hardware/software interface for asynchronous data movement between an off-core memory and a core-local memory, referred to as “stream transfers”, and a stream ordering model. The stream transfers allow software to more efficiently express common data-movement patterns, specifically ones seen in sparse workloads. Direct stream instructions that belong to a stream are processed in order. For indirect stream instructions, offset elements in an offset list are processed in order. A sync flag is updated to indicate monotonic incremental progress for the stream.
-
Publication No.: US11966745B2
Publication date: 2024-04-23
Application No.: US17972663
Application date: 2022-10-25
Applicant: Google LLC
Inventor: Rahul Nagarajan, Suvinay Subramanian, Arpith Chacko Jacob
CPC classification number: G06F9/3887, G06F9/30036
Abstract: Aspects of the disclosure are directed to a cross-lane processing unit (XPU) for performing data-dependent operations across multiple data processing lanes of a processor. Rather than implementing operation-specific circuits for each data-dependent operation, the XPU can be configured to perform different operations in response to input signals configuring individual operations performed by processing cells and crossbars arranged as a stacked network in the XPU. Each processing cell can receive and process data across multiple data processing lanes. Aspects of the disclosure include configuring the XPU to use a vector sort network to perform a duplicate count, eliminating the need to configure the XPU separately for sorting and duplicate counting.