-
1.
Publication No.: US20240020170A1
Publication date: 2024-01-18
Application No.: US18221678
Application date: 2023-07-13
Applicant: SambaNova Systems, Inc.
Inventor: Yue FU , Kin Hing LEUNG , Arvind Krishna SUJEETH , Sumti JAIRATH , Andrew DENG , Chris RÉ , Raghu PRABHAKAR
CPC classification number: G06F9/5044 , G06F13/4063
Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for estimating a cost of implementing an operation unit graph. The operation unit graph may include first and second logical units that perform first and second data operations and have first and second ports, respectively, coupled by a logical edge, on a reconfigurable processor. The method includes receiving the operation unit graph, determining first and second upper bandwidth limits of the first and second ports, respectively, determining a logical edge bandwidth of the logical edge based on the first and second upper bandwidth limits, determining a timing group for the logical edge, and providing the logical edge bandwidth and the timing group as a cost estimation of implementing the operation unit graph on the reconfigurable processor.
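A minimal sketch of the bandwidth step described in this abstract. The abstract says only that the logical edge bandwidth is "based on" the two upper bandwidth limits; taking the smaller of the two is an assumption made here, and the names `Port`, `edge_bandwidth`, and `estimate_edge_cost` are hypothetical. How the timing group is determined is not stated, so it is passed in as a plain value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A port on a logical unit (hypothetical type; bandwidth units are arbitrary)."""
    name: str
    upper_bandwidth_limit: float


def edge_bandwidth(src: Port, dst: Port) -> float:
    # Assumption: an edge cannot carry data faster than its slower endpoint.
    return min(src.upper_bandwidth_limit, dst.upper_bandwidth_limit)


def estimate_edge_cost(src: Port, dst: Port, timing_group: int) -> dict:
    # The timing group is supplied by the caller; the abstract does not say
    # how it is derived.
    return {
        "edge": (src.name, dst.name),
        "bandwidth": edge_bandwidth(src, dst),
        "timing_group": timing_group,
    }


cost = estimate_edge_cost(Port("u1.out", 64.0), Port("u2.in", 32.0), timing_group=0)
```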
-
2.
Publication No.: US20230289310A1
Publication date: 2023-09-14
Application No.: US18199361
Application date: 2023-05-18
Applicant: SambaNova Systems, Inc.
Inventor: Gregory Frederick GROHOSKI , Sumti JAIRATH , Mark LUTTRELL , Raghu PRABHAKAR , Ram SIVARAMAKRISHNAN , Manish K. SHAH
CPC classification number: G06F13/4027 , G06F9/45533 , G06F12/10 , G06F13/1668 , G06F15/7839 , G06F15/7882 , G06F2212/657
Abstract: A reconfigurable data processor comprises an array of configurable units and a bus system. The bus system is connected to the array of configurable units. The bus system includes a top level network and an array level network. The top level network is connected to an external data interface for communication with memory outside of the array of configurable units. The array level network is connected to configurable units in the array of configurable units.
-
3.
Publication No.: US20220309325A1
Publication date: 2022-09-29
Application No.: US17713157
Application date: 2022-04-04
Applicant: SambaNova Systems, Inc.
Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH
IPC: G06N3/04
Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configures the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.
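To illustrate why non-overlapping output tiles can require overlapping input tiles, here is a minimal sketch. It assumes a one-dimensional, unpadded sliding-window operation (such as a convolution) with a given kernel size and stride. The abstract names no specific operation, so this choice and the function name `input_ranges_for_output_tiles` are assumptions.

```python
def input_ranges_for_output_tiles(out_len, tile, kernel, stride=1):
    """Map each non-overlapping output tile to the input range it needs.

    Ranges are half-open [start, stop). Output element o is assumed to read
    inputs o*stride .. o*stride + kernel - 1 (no padding).
    """
    ranges = []
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        in_start = start * stride
        in_stop = (stop - 1) * stride + kernel
        ranges.append(((start, stop), (in_start, in_stop)))
    return ranges


# Two output tiles of 4 elements each, kernel size 3, stride 1.
tiles = input_ranges_for_output_tiles(out_len=8, tile=4, kernel=3)
```

With these parameters the output tiles [0, 4) and [4, 8) do not overlap, while the input ranges [0, 6) and [4, 10) share elements 4 and 5.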
-
4.
Publication No.: US20220309324A1
Publication date: 2022-09-29
Application No.: US17700452
Application date: 2022-03-21
Applicant: SambaNova Systems, Inc.
Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH
IPC: G06N3/04
Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation, a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.
-
5.
Publication No.: US20220309029A1
Publication date: 2022-09-29
Application No.: US17476749
Application date: 2021-09-16
Applicant: SambaNova Systems, Inc.
Inventor: Raghu PRABHAKAR , Nathan Francis SHEELEY , Matheen MUSADDIQ , Scott Layson BURSON , Sitanshu GUPTA , Sumti JAIRATH , Pramod NATARAJA , Ajit PUNJ
Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
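The reordering described in this abstract can be sketched in software as a small buffer that holds partitions until the next one in the target order has arrived. This is an illustration only: the class `ReorderUnit` and its `receive` method are hypothetical, and releasing partitions as soon as they become available (rather than after all have been stored) is an assumption.

```python
class ReorderUnit:
    """Buffers out-of-order partitions and releases them in a target order."""

    def __init__(self, target_order):
        self._target = list(target_order)
        self._buffer = {}
        self._next = 0  # index into the target order of the next partition to release

    def receive(self, partition_id, data):
        """Store one partition and return any partitions now releasable in order."""
        self._buffer[partition_id] = data
        released = []
        while self._next < len(self._target) and self._target[self._next] in self._buffer:
            pid = self._target[self._next]
            released.append((pid, self._buffer.pop(pid)))
            self._next += 1
        return released


# Partitions arrive from producers in the order 2, 0, 1, 3.
unit = ReorderUnit(target_order=[0, 1, 2, 3])
output = []
for pid, data in [(2, "c"), (0, "a"), (1, "b"), (3, "d")]:
    output.extend(unit.receive(pid, data))
```

After the loop, `output` holds the partitions in the target order 0, 1, 2, 3, which is what a consumer would see.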
-
6.
Publication No.: US20220309028A1
Publication date: 2022-09-29
Application No.: US17384515
Application date: 2021-07-23
Applicant: SambaNova Systems, Inc.
Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH
IPC: G06F15/78 , G06F17/16 , G06F16/901
Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.
-
7.
Publication No.: US20220058034A1
Publication date: 2022-02-24
Application No.: US16996666
Application date: 2020-08-18
Applicant: SambaNova Systems, Inc.
Inventor: Gregory Frederick GROHOSKI , Manish K. SHAH , Raghu PRABHAKAR , Mark LUTTRELL , Ravinder KUMAR , Kin Hing LEUNG , Ranen CHATTERJEE , Sumti JAIRATH , David Alan KOEPLINGER , Ram SIVARAMAKRISHNAN , Matthew Thomas GRIMM
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
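As a rough illustration of allocating physical resources to the virtual resources requested by an application, the sketch below uses a simple first-come allocation from a shared pool. The abstract does not describe an allocation policy, so the policy, the class `ResourcePool`, and its fields are all assumptions.

```python
class ResourcePool:
    """A pool of physical configurable units and memory (hypothetical model)."""

    def __init__(self, num_units, memory_bytes):
        self.free_units = list(range(num_units))
        self.free_memory = memory_bytes

    def allocate(self, units_needed, memory_needed):
        """Grant the request if the pool can satisfy it; otherwise return None."""
        if units_needed > len(self.free_units) or memory_needed > self.free_memory:
            return None
        granted = self.free_units[:units_needed]
        self.free_units = self.free_units[units_needed:]
        self.free_memory -= memory_needed
        return {"units": granted, "memory": memory_needed}


pool = ResourcePool(num_units=4, memory_bytes=1024)
app_a = pool.allocate(units_needed=3, memory_needed=512)  # fits in the pool
app_b = pool.allocate(units_needed=2, memory_needed=256)  # only one unit left
```

Here the first request is granted units 0 through 2, and the second is refused because only one unit remains.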
-
8.
Publication No.: US20240069880A1
Publication date: 2024-02-29
Application No.: US18387906
Application date: 2023-11-08
Applicant: SambaNova Systems, Inc.
Inventor: Blaine RISTER , Qingjian LI , Bowen YANG , Junjue WANG , Chen LIU , Zhuo CHEN , Arvind SUJEETH , Sumti JAIRATH
IPC: G06F8/41
CPC classification number: G06F8/433
Abstract: In a method, a computer-implemented efficiency analyzer selects operators from an intermediate representation of a dataflow program. The operators are included in a mapping of the operators to hardware of a computing system to execute the dataflow program. Based on the mapping and a description of the hardware, the efficiency analyzer computes an execution metric associated with executing the operators on the hardware. Based on the execution metric and hardware description, the efficiency analyzer determines an inefficiency metric, and based on the inefficiency metric, the efficiency analyzer determines an inefficiency associated with the dataflow program. The computing system to execute the dataflow program can comprise a coarse grain computing system and the hardware can include a reconfigurable processor of the computing system. A computer program product and a computing system can implement the method.
-
9.
Publication No.: US20230315802A1
Publication date: 2023-10-05
Application No.: US18128076
Application date: 2023-03-29
Applicant: SambaNova Systems, Inc.
Inventor: Junjue WANG , Blaine Burton RISTER , Zhichao MA , Zhuo CHEN , Andrew DENG , Sumti JAIRATH , Arvind Krishna SUJEETH
IPC: G06F17/11
CPC classification number: G06F17/11
Abstract: A method comprises a compiler generating an MI (mixed integer) model to determine mapping decisions to map a dataflow application to hardware of a computing system to execute the application. The MI model comprises MI equations to solve by an MI solver. The MI equations include equations of an objective function corresponding to an optimization objective. The MI equations can comprise decision variables and equations and constraint variables and equations. The compiler outputs the MI model to the MI solver and invokes the MI solver to compute an MI solution comprising solutions to equations among the equations included in the MI model. The compiler receives the MI solution and generates a globally optimized mapping decision based on the MI solution. The MI solver can comprise a commercial program to solve MI linear equations. A computer program product and a computing system can implement the method.
-
10.
Publication No.: US20220309316A1
Publication date: 2022-09-29
Application No.: US17364110
Application date: 2021-06-30
Applicant: SambaNova Systems, Inc.
Inventor: Tejas Nagendra Babu NAMA , Ruddhi CHAPHEKAR , Ram SIVARAMAKRISHNAN , Raghu PRABHAKAR , Sumti JAIRATH , Junjue WANG , Kaizhao LIANG , Adi FUCHS , Matheen MUSADDIQ , Arvind Krishna SUJEETH
IPC: G06N3/04
Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections including a first section and a second section. The compile time logic is to configure the first section with a first topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the first section, and configure the second section with a second topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the second section. The data processing system further includes runtime logic configured with the compile time logic to execute the first section to generate the inputs, intermediate outputs, and final outputs of the first section in the first topology of tiling configurations, and execute the second section to generate the inputs, intermediate outputs, and final outputs of the second section in the second topology of tiling configurations.