-
Publication No.: US20250139027A1
Publication Date: 2025-05-01
Application No.: US18923025
Filing Date: 2024-10-22
Applicant: Google LLC
Inventor: Horia Alexandru Toma, Rahul Nagarajan, Yujeong Shim, Rammohan Padmanabhan
Abstract: Generally disclosed herein are electronic circuits with high bandwidth interfaces (HBI) for multi-directional die-to-die communications. The HBIs are designed to allow for sharing of data between all sides of the memory chiplets. By using all sides of the memory chiplets and multiplexing the data between the multiple connected chiplets, the total bandwidth of the memory available to the connected chiplets can increase. The sharing and multiplexing of the data can also be dynamically configured to accommodate various options for the allocation of performance levels and the associated cost.
-
Publication No.: US12282853B2
Publication Date: 2025-04-22
Application No.: US18582294
Filing Date: 2024-02-20
Applicant: Google LLC
Inventor: Rahul Nagarajan, Lifeng Nai, George Kurian, Hema Hariharan
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing neural network computations using a system configured to implement a neural network on a hardware circuit. The system includes a host that receives a batch of inputs to a neural network layer. Each of the inputs is stored in a memory location identified by an address. The system identifies one or more duplicate addresses in a listing of addresses for one or more inputs. For each duplicate address: the system generates a unique identifier that identifies the duplicate address in the listing of addresses. The system (i) obtains first inputs from memory locations identified by addresses corresponding to the unique identifiers and (ii) generates an output of the layer from the obtained first inputs.
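The duplicate-address handling described in the abstract can be illustrated in software. This is a hypothetical sketch (the function name and data layout are illustrative, not from the patent): duplicate addresses in a batch are mapped to unique identifiers, each unique address is fetched from memory only once, and the fetched values are expanded back to the original batch order.

```python
def gather_with_dedup(addresses, memory):
    """Fetch each unique address once, then map results back to the batch."""
    unique_ids = {}          # address -> unique identifier (first-seen index)
    order = []               # unique addresses in first-seen order
    for addr in addresses:
        if addr not in unique_ids:
            unique_ids[addr] = len(order)
            order.append(addr)
    fetched = [memory[addr] for addr in order]   # one memory access per unique address
    return [fetched[unique_ids[addr]] for addr in addresses]

memory = {0x10: 1.5, 0x20: 2.5, 0x30: 3.5}
batch = [0x10, 0x20, 0x10, 0x30, 0x20]
print(gather_with_dedup(batch, memory))  # [1.5, 2.5, 1.5, 3.5, 2.5]
```

The payoff is that a batch with heavy address reuse, common in embedding lookups, triggers far fewer memory accesses than inputs in the batch.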
-
Publication No.: US20250013432A1
Publication Date: 2025-01-09
Application No.: US18218448
Filing Date: 2023-07-05
Applicant: Google LLC
Inventor: Vinayak Anand Gokhale, Matthew Leever Hedlund, Rahul Nagarajan, Naveen Muralimanohar, Shriram Nagarajan
Abstract: Aspects of the disclosed technology include techniques and mechanisms for using a custom scratchpad memory for partial dot product reductions. The custom scratchpad memory may be a special-purpose memory dedicated to receiving and storing partial dot products determined by matrix multiplier units. Each partial dot product may correspond to a tile of a resultant matrix, where the resultant matrix is the product of a matrix multiplication that can use a first matrix, representing a user query, as the left-hand-side operand and a second matrix, representing a trained model containing data that may be used to respond to the user query, as the right-hand-side operand. The custom scratchpad memory may append the tiles determined by the matrix multiplication, and the appended tiles may form the resultant matrix. The custom scratchpad memory may then write the resultant matrix to general-purpose memory, where it may be used to respond to the user query.
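The partial-reduction idea can be mimicked in software. This is an illustrative sketch under assumed conventions (the tile size and function name are hypothetical): a tiled matrix multiply builds each element of the resultant matrix from partial dot products accumulated across passes, standing in for a scratchpad that collects partials before the result is written back to general-purpose memory.

```python
def tiled_matmul(lhs, rhs, tile=2):
    """Accumulate partial dot products tile-by-tile along the shared dimension."""
    n, k = len(lhs), len(lhs[0])
    m = len(rhs[0])
    scratchpad = [[0.0] * m for _ in range(n)]   # stand-in for the custom scratchpad
    for kk in range(0, k, tile):                 # each pass adds one partial dot product
        for i in range(n):
            for j in range(m):
                scratchpad[i][j] += sum(
                    lhs[i][p] * rhs[p][j] for p in range(kk, min(kk + tile, k))
                )
    return scratchpad                            # resultant matrix, "written back"

A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [1, 1], [2, 2]]
print(tiled_matmul(A, B))  # [[12.0, 13.0], [28.0, 29.0]]
```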
-
Publication No.: US20230153116A1
Publication Date: 2023-05-18
Application No.: US17981617
Filing Date: 2022-11-07
Applicant: Google LLC
Inventor: Rahul Nagarajan, Suvinay Subramanian, Arpith Chacko Jacob, Christopher Leary, Thomas James Norrie, Thejasvi Magudilu Vijayaraj, Hema Hariharan
CPC classification number: G06F9/3895 , G06F9/3887 , G06F9/30036 , G06N3/02
Abstract: Aspects of the disclosure provide for an accelerator capable of accelerating data dependent, irregular, and/or memory-bound operations. An accelerator as described herein includes a programmable engine for efficiently executing computations on-chip that are dynamic, irregular, and/or memory-bound, in conjunction with a co-processor configured to accelerate operations that are predictable in computational load and behavior on the co-processor during design and fabrication.
-
Publication No.: US11366877B2
Publication Date: 2022-06-21
Application No.: US16928242
Filing Date: 2020-07-14
Applicant: Google LLC
Inventor: Ravi Narayanaswami, Rahul Nagarajan, Dong Hyuk Woo, Christopher Daniel Leary
Abstract: Methods, systems, and apparatus, including a system for transforming sparse elements to a dense matrix. The system is configured to receive a request for an output matrix based on sparse elements including sparse elements associated with a first dense matrix and sparse elements associated with a second dense matrix; obtain the sparse elements associated with the first dense matrix fetched by a first group of sparse element access units; obtain the sparse elements associated with the second dense matrix fetched by a second group of sparse element access units; and transform the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix to generate the output dense matrix that includes the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix.
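The sparse-to-dense transform can be sketched in software. This is a hypothetical model (names and data layout are illustrative): each group of "sparse element access units" is represented as a dict mapping (row, col) coordinates to fetched values, and the two groups are merged into one output dense matrix.

```python
def sparse_to_dense(first_group, second_group, rows, cols):
    """Assemble an output dense matrix from two groups of fetched sparse elements."""
    dense = [[0.0] * cols for _ in range(rows)]
    for elements in (first_group, second_group):     # one dict per access-unit group
        for (r, c), value in elements.items():
            dense[r][c] = value
    return dense

first = {(0, 0): 1.0, (1, 2): 2.0}    # sparse elements of the first dense matrix
second = {(0, 3): 3.0, (2, 1): 4.0}   # sparse elements of the second dense matrix
print(sparse_to_dense(first, second, rows=3, cols=4))
```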
-
Publication No.: US20220121918A1
Publication Date: 2022-04-21
Application No.: US17563509
Filing Date: 2021-12-28
Applicant: Google LLC
Inventor: Rahul Nagarajan, Hema Hariharan
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing neural network computations using a system configured to implement a neural network on a hardware circuit. The system includes a process ID unit that receives requests to obtain data from a memory that includes memory locations that are each identified by an address. For each request, the process ID unit selects a channel controller to receive the request, provides the request to be processed by the selected channel controller, and obtains the data from memory in response to processing the request using the selected channel controller. The channel controller is one of multiple channel controllers that are configured to access any memory location of the memory. The system performs the neural network computations using the data obtained from memory and resources allocated from a shared memory of the hardware circuit.
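The dispatch pattern in the abstract can be modeled in a few lines. This is a minimal sketch assuming a simple round-robin selection policy, which the abstract does not specify; the class and method names are hypothetical. A process ID unit picks a channel controller for each request, and any controller can reach any memory location.

```python
from itertools import cycle

class ProcessIdUnit:
    """Dispatches memory requests across interchangeable channel controllers."""
    def __init__(self, memory, num_controllers=4):
        self.memory = memory
        self.controllers = cycle(range(num_controllers))  # round-robin selection

    def handle_request(self, address):
        controller = next(self.controllers)   # select a channel controller
        # Any controller can access any memory location, so the lookup
        # itself does not depend on which controller was chosen.
        return controller, self.memory[address]

memory = {a: a * 10 for a in range(8)}
unit = ProcessIdUnit(memory)
print([unit.handle_request(a) for a in [0, 1, 2, 3, 4]])
```

Because every controller can access every address, the selection policy only balances load; it never constrains which data a request can reach.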
-
Publication No.: US20210303978A1
Publication Date: 2021-09-30
Application No.: US16865539
Filing Date: 2020-05-04
Applicant: Google LLC
Inventor: Rahul Nagarajan, Hema Hariharan
Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing neural network computations using a system configured to implement a neural network on a hardware circuit. The system includes a process ID unit that receives requests to obtain data from a memory that includes memory locations that are each identified by an address. For each request, the process ID unit selects a channel controller to receive the request, provides the request to be processed by the selected channel controller, and obtains the data from memory in response to processing the request using the selected channel controller. The channel controller is one of multiple channel controllers that are configured to access any memory location of the memory. The system performs the neural network computations using the data obtained from memory and resources allocated from a shared memory of the hardware circuit.
-
Publication No.: US20230305970A1
Publication Date: 2023-09-28
Application No.: US17722782
Filing Date: 2022-04-18
Applicant: Google LLC
Inventor: Rahul Nagarajan, Arpith Chacko Jacob, Suvinay Subramanian, Hema Hariharan
CPC classification number: G06F9/30134 , G06F9/35 , G06F9/3869 , G06F9/522
Abstract: Generally disclosed herein is a hardware/software interface for asynchronous data movement between an off-core memory and a core-local memory, referred to as “stream transfers”, and a stream ordering model. The stream transfers allow software to more efficiently express common data-movement patterns, specifically ones seen in sparse workloads. Direct stream instructions that belong to a stream are processed in-order. For indirect stream instructions, offset elements in an offset list are processed in order. A sync flag is updated to indicate monotonic incremental progress for the stream.
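The stream-ordering model can be sketched in software. This is a hedged illustration (function and variable names are assumptions, not from the patent): an indirect stream transfer processes the elements of an offset list strictly in order, and a sync flag is incremented monotonically so software can observe incremental progress.

```python
def indirect_stream_transfer(off_core, offsets, core_local, progress):
    """Copy off_core[offset] into core_local, in offset-list order."""
    for i, offset in enumerate(offsets):   # offsets processed strictly in order
        core_local.append(off_core[offset])
        progress[0] = i + 1                # sync flag: monotonic progress count

off_core = [10, 20, 30, 40, 50]            # off-core memory
core_local, progress = [], [0]             # core-local memory and sync flag
indirect_stream_transfer(off_core, [4, 0, 2], core_local, progress)
print(core_local, progress[0])  # [50, 10, 30] 3
```

Because the flag only ever moves forward, a consumer polling it can safely read the first N transferred elements as soon as the flag reaches N, even while the rest of the stream is still in flight.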
-
Publication No.: US20230161592A1
Publication Date: 2023-05-25
Application No.: US17972681
Filing Date: 2022-10-25
Applicant: Google LLC
IPC: G06F9/38
CPC classification number: G06F9/3802 , G06F9/3887
Abstract: Aspects of the disclosure are directed to methods, systems, and apparatuses using an instruction prefetch pipeline architecture that provides good performance without the complexity of a full cache coherent solution deployed in conventional CPUs. The architecture can include components which can be used to construct an instruction prefetch pipeline, including instruction memory (TiMem), instruction buffer (iBuf), a prefetch unit, and an instruction router.
-
Publication No.: US20230153115A1
Publication Date: 2023-05-18
Application No.: US17972663
Filing Date: 2022-10-25
Applicant: Google LLC
Inventor: Rahul Nagarajan, Suvinay Subramanian, Arpith Chacko Jacob
CPC classification number: G06F9/3887 , G06F9/30036
Abstract: Aspects of the disclosure are directed to a cross-lane processing unit (XPU) for performing data-dependent operations across multiple data processing lanes of a processor. Rather than implementing operation-specific circuits for each data-dependent operation, the XPU can be configured to perform different operations in response to input signals configuring individual operations performed by processing cells and crossbars arranged as a stacked network in the XPU. Each processing cell can receive and process data across multiple data processing lanes. Aspects of the disclosure include configuring the XPU to use a vector sort network to perform a duplicate count eliminating the need to configure the XPU separately for sorting and duplicate counting.
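The sort-then-count idea can be mirrored in software. This is only an illustrative analogue (the XPU performs this with a hardware vector sort network; the function here is hypothetical): sorting the lane values first means every run of equal values is contiguous, so duplicate counting falls out of a single linear pass over the sorted output.

```python
def sort_and_count_duplicates(lane_values):
    """Sort the values, then count each run of equal values in one pass."""
    ordered = sorted(lane_values)             # stand-in for the vector sort network
    counts = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1]:
            counts[-1] = (v, counts[-1][1] + 1)   # same key: bump its count
        else:
            counts.append((v, 1))                 # new key in sorted order
    return counts

print(sort_and_count_duplicates([3, 1, 3, 2, 1, 3]))  # [(1, 2), (2, 1), (3, 3)]
```

Reusing the sort for counting is the point the abstract makes: one network configuration serves both operations, so the XPU need not be reconfigured separately for sorting and duplicate counting.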