Patent search ap:("SambaNova Systems Page Inc.") AND inv:"Matheen MUSADDIQ"

1.

Invention publication
LOSSLESS TILING IN CONVOLUTION NETWORKS - TILING CONFIGURATION BETWEEN TWO SECTIONS
Status: Under examination (published)

Publication number: US20240168913A1

Publication date: 2024-05-23

Application number: US18518695

Application date: 2023-11-24

Applicant: SambaNova Systems, Inc.

Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH

IPC: G06F15/78 , G06F16/901 , G06F17/16

CPC classification number: G06F15/7885 , G06F15/7839 , G06F16/9024 , G06F17/16

Abstract: Disclosed is a method that includes sectioning a graph into a sequence of sections, the sequence of sections including at least a first section followed by a second section. The first section is configured to generate a first output in a first target tiling configuration in response to processing a first input in a first input tiling configuration. The graph is configured to reconfigure the first output in the first target tiling configuration to a second input in a second input tiling configuration. The second section is configured to generate a second output in a second target tiling configuration in response to processing the second input in the second input tiling configuration.

2.

Invention application
HANDLING DYNAMIC TENSOR LENGTHS IN A RECONFIGURABLE PROCESSOR THAT INCLUDES MULTIPLE MEMORY UNITS
Status: Granted

Publication number: US20240427727A1

Publication date: 2024-12-26

Application number: US18213598

Application date: 2023-06-23

Applicant: SambaNova Systems, Inc.

Inventor: Abhishek SRIVASTAVA , Matthew VILIM , Raghu PRABHAKAR , Sankar RACHURU , Zhekun ZHANG , Matheen MUSADDIQ , Apurv VIVEK , Sitanshu GUPTA

IPC: G06F15/78 , G06F8/41

Abstract: In some aspects, a program is executed on a coarse-grained reconfigurable (CGR) processor. The CGR determines that the program produces an output that includes a variable length tensor, determines a maximum size of the variable length tensor and sets, based on the maximum size, a maximum of a counter associated with the program. The counter is set to an initial value of zero. The CGR initiates execution of the program, causing the program to receive an input tensor. Based on determining that the program is operating on a first portion of the input tensor, the CGR performs an update to the counter, to create an updated counter, and communicates the updated counter to one or more consumers within the program. After determining that the program has completed operating on the input tensor, a final size of the output is communicated to one or more downstream consumers external to the program.
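
The counter mechanism described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `BoundedCounter`, `run_variable_length_op`, `keep`, and `on_update` are hypothetical, and a simple filter stands in for whatever operation produces the variable-length tensor.

```python
from typing import Callable, Iterable, List, Tuple


class BoundedCounter:
    """Counter that starts at zero and may not exceed a preset maximum."""

    def __init__(self, maximum: int) -> None:
        self.maximum = maximum  # derived from the maximum output tensor size
        self.value = 0          # initial value of zero

    def add(self, n: int) -> int:
        if self.value + n > self.maximum:
            raise OverflowError("output would exceed the declared maximum size")
        self.value += n
        return self.value


def run_variable_length_op(
    portions: Iterable[List[int]],
    keep: Callable[[int], bool],
    max_output: int,
    on_update: Callable[[int], None],
) -> Tuple[List[int], int]:
    """Process the input portion by portion (assumed filter operation).

    After each portion the updated count is passed to `on_update`, which
    stands in for consumers inside the program. The final count is returned
    for consumers outside the program.
    """
    counter = BoundedCounter(max_output)
    output: List[int] = []
    for portion in portions:
        kept = [x for x in portion if keep(x)]
        output.extend(kept)
        on_update(counter.add(len(kept)))
    return output, counter.value


# Usage: keep even values from an input delivered in two portions.
updates: List[int] = []
result, final_size = run_variable_length_op(
    [[1, 2, 3], [4, 5, 6]],
    keep=lambda x: x % 2 == 0,
    max_output=6,
    on_update=updates.append,
)
print(result, final_size, updates)
```

The maximum is fixed before execution starts, so downstream buffers can be sized once, while the running count tells consumers how much of that space holds valid data.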

3.

Invention application
Lossless Tiling in Convolution Networks - Tiling Configuration for a Sequence of Sections of a Graph
Status: Granted

Publication number: US20220309316A1

Publication date: 2022-09-29

Application number: US17364110

Application date: 2021-06-30

Applicant: SambaNova Systems, Inc.

Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH

IPC: G06N3/04

Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections including a first section and a second section. The compile time logic is to configure the first section with a first topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the first section, and configure the second section with a second topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the second section. The data processing system further includes runtime logic configured with the compile time logic to execute the first section to generate the inputs, intermediate outputs, and final outputs of the first section in the first topology of tiling configurations, and execute the second section to generate the inputs, intermediate outputs, and final outputs of the second section in the second topology of tiling configurations.

4.

Invention publication
Skip Buffer Splitting
Status: Under examination (published)

Publication number: US20230385043A1

Publication date: 2023-11-30

Application number: US17944872

Application date: 2022-09-14

Applicant: SambaNova Systems, Inc.

Inventor: Nathan SHEELEY , Weihang FAN , Matheen MUSADDIQ , Ram SIVARAMAKRISHNAN

IPC: G06F8/41 , G06F3/06

CPC classification number: G06F8/452 , G06F3/0656 , G06F3/0604 , G06F3/0673

Abstract: A compiler transforms a high-level program into configuration data for a coarse-grained reconfigurable (CGR) data processor with an array of CGR units. The compiler includes a method that identifies a skip buffer in a dataflow graph, determines limitations associated with the array, and searches for a lowest cost implementation topology and stage depth. At least three topologies are considered, including a cascaded buffer topology, a hybrid buffer topology, and a striped buffer topology. The lowest cost implementation topology and stage depth are based on the size of the buffered data (usually, the size of a tensor), the depth of the skip buffer, and the array's limitations. The hybrid buffer topology includes multiple sections of parallel memory units. The data travels between memory units in one section to adjacent memory units in a next section without intervening reorder buffers.

5.

Invention application
Lossless Tiling in Convolution Networks - Tiling Configuration Between Two Sections
Status: Granted

Publication number: US20220309317A1

Publication date: 2022-09-29

Application number: US17364129

Application date: 2021-06-30

Applicant: SambaNova Systems, Inc.

Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH

IPC: G06N3/04

Abstract: Disclosed is a method that includes sectioning a graph into a sequence of sections, the sequence of sections including at least a first section followed by a second section. The first section is configured to generate a first output in a first target tiling configuration in response to processing a first input in a first input tiling configuration. The graph is configured to reconfigure the first output in the first target tiling configuration to a second input in a second input tiling configuration. The second section is configured to generate a second output in a second target tiling configuration in response to processing the second input in the second input tiling configuration.

6.

Invention application
LOSSLESS TILING IN CONVOLUTION NETWORKS - RESETTING OVERLAP FACTOR TO ZERO AT SECTION BOUNDARIES
Status: Granted

Publication number: US20220309325A1

Publication date: 2022-09-29

Application number: US17713157

Application date: 2022-04-04

Applicant: SambaNova Systems, Inc.

Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH

IPC: G06N3/04

Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configures the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.
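
The idea of overlapping input tiles producing non-overlapping output tiles can be sketched in Python with a 1-D "valid" convolution. This is an illustration under stated assumptions, not the method claimed in the patent: the functions `conv1d_valid` and `tiled_conv1d`, the `out_tile` parameter, and the choice of a 1-D convolution as the section's operation are all hypothetical. For a kernel of length k, adjacent input tiles share k - 1 elements, and the output tiles concatenate without overlap.

```python
from typing import List


def conv1d_valid(x: List[int], w: List[int]) -> List[int]:
    """Plain 1-D convolution (correlation form) with no padding."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def tiled_conv1d(x: List[int], w: List[int], out_tile: int) -> List[int]:
    """Compute the same result tile by tile.

    Each input tile overlaps its neighbour by len(w) - 1 elements, while
    the output tiles are disjoint and are simply concatenated.
    """
    k = len(w)
    n_out = len(x) - k + 1
    out: List[int] = []
    for start in range(0, n_out, out_tile):
        stop = min(start + out_tile, n_out)
        in_tile = x[start : stop + k - 1]      # overlapping input tile
        out.extend(conv1d_valid(in_tile, w))   # non-overlapping output tile
    return out


# Two "sections" in sequence: the non-overlapping output of the first is
# re-tiled with overlap as the input of the second.
x = list(range(12))
w = [1, 0, -1]
section1 = tiled_conv1d(x, w, out_tile=4)
section2 = tiled_conv1d(section1, w, out_tile=3)
print(section2 == conv1d_valid(conv1d_valid(x, w), w))
```

Because each section hands over a composed, non-overlapping output, the second section is free to choose its own tile size, as the differing `out_tile` values show.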

7.

Invention application
LOSSLESS TILING IN CONVOLUTION NETWORKS - GRAPH METADATA GENERATION
Status: Granted

Publication number: US20220309324A1

Publication date: 2022-09-29

Application number: US17700452

Application date: 2022-03-21

Applicant: SambaNova Systems, Inc.

Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH

IPC: G06N3/04

Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation, a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.

8.

Invention application
Tensor Partitioning and Partition Access Order
Status: Granted

Publication number: US20220309029A1

Publication date: 2022-09-29

Application number: US17476749

Application date: 2021-09-16

Applicant: SambaNova Systems, Inc.

Inventor: Raghu PRABHAKAR , Nathan Francis SHEELEY , Matheen MUSADDIQ , Scott Layson BURSON , Sitanshu GUPTA , Sumti JAIRATH , Pramod NATARAJA , Ajit PUNJ

IPC: G06F15/78 , G06F17/16 , G06F15/80

Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
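
A reorder step of this kind can be sketched in a few lines of Python. The sketch is only an illustration: the function `reorder`, the integer partition ids, and the choice to release each partition as soon as it becomes next in the target order are assumptions, not details taken from the patent.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def reorder(
    arrivals: Iterable[Tuple[int, str]],
    target_order: List[int],
) -> Iterator[Tuple[int, str]]:
    """Buffer partitions that arrive out of order and emit them in target order.

    `arrivals` yields (partition_id, data) pairs in the order the producers
    deliver them; `target_order` lists the partition ids in the order the
    consumers expect.
    """
    store: Dict[int, str] = {}   # partitions held in the reorder buffer
    next_index = 0               # position in target_order to emit next
    for partition_id, data in arrivals:
        store[partition_id] = data
        # Release every partition that is now next in the target order.
        while next_index < len(target_order) and target_order[next_index] in store:
            wanted = target_order[next_index]
            yield wanted, store.pop(wanted)
            next_index += 1


# Usage: two producers deliver partitions 2, 0, 1; consumers expect 0, 1, 2.
emitted = list(reorder([(2, "c"), (0, "a"), (1, "b")], target_order=[0, 1, 2]))
print(emitted)
```

Releasing partitions eagerly keeps the buffer small; a variant that waits for all partitions before emitting any would match the abstract equally well.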

9.

Invention application
Lossless Tiling in Convolution Networks - Materialization of Tensors
Status: Granted

Publication number: US20220309028A1

Publication date: 2022-09-29

Application number: US17384515

Application date: 2021-07-23

Applicant: SambaNova Systems, Inc.

Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH

IPC: G06F15/78 , G06F17/16 , G06F16/901

Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.

10.

Invention application
Coupling Operations on Dynamically-Sized Data Structures in Data Flow Architectures
Status: Granted

Publication number: US20250004972A1

Publication date: 2025-01-02

Application number: US18884707

Application date: 2024-09-13

Applicant: SambaNova Systems, Inc.

Inventor: Abhishek SRIVASTAVA , Matthew VILIM , Raghu PRABHAKAR , Sankar RACHURU , Zhekun ZHANG , Matheen MUSADDIQ , Apurv VIVEK , Sitanshu GUPTA , Ayesha SIDDIQUA

IPC: G06F13/40

Abstract: A data processing system for implementing operations that generate a dynamically-sized output comprises a reconfigurable processor and a compiler. The compiler generates configuration data for configuring the reconfigurable processor to implement first and second operations and first and second connections. The first operation generates an output, and the second operation receives the output of the first operation as an input. The size of the output is unknown when generating the configuration data, and the output comprises a number of elements that is smaller than or equal to a predetermined maximum number of elements. The first connection for the output and the second connection for the input are both suitable for a transmission of the predetermined maximum number of elements. The reconfigurable processor is configured with the configuration data such that the reconfigurable processor implements the first operation, the second operation, the first connection, and the second connection.
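
One way to picture a connection sized for a predetermined maximum is a fixed-capacity payload paired with a valid-element count. The Python sketch below is an assumption-laden illustration, not the patented design: `BoundedMessage`, `first_op`, `second_op`, the zero padding, and the nonzero filter are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundedMessage:
    payload: List[int]  # always exactly `capacity` slots
    length: int         # number of valid elements, known only at run time


def first_op(values: List[int], capacity: int) -> BoundedMessage:
    """Produce a dynamically sized output (here: the nonzero values)."""
    kept = [v for v in values if v != 0]
    if len(kept) > capacity:
        raise ValueError("output exceeds the predetermined maximum")
    padding = [0] * (capacity - len(kept))
    return BoundedMessage(payload=kept + padding, length=len(kept))


def second_op(message: BoundedMessage) -> int:
    """Consume only the valid prefix of the fixed-size payload."""
    return sum(message.payload[: message.length])


# Usage: the connection carries 8 slots regardless of the actual output size.
message = first_op([3, 0, 0, 5, 0, 1], capacity=8)
total = second_op(message)
print(len(message.payload), message.length, total)
```

Both ends agree on the capacity ahead of time, so the size of the transfer never changes even though the amount of valid data does.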
