Patent search: ap:("SambaNova Systems, Inc.") AND inv:"Weiwei CHEN" (page 1)

1.

Invention publication
COMPILE TIME LOGIC FOR DETECTING AND RESOLVING MEMORY LAYOUT CONFLICTS (status: under examination, published)

Publication No.: US20230376292A1

Publication date: 2023-11-23

Application No.: US18136321

Filing date: 2023-04-18

Applicant: SambaNova Systems, Inc.

Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin BROWN, Xiaoming GU

IPC classification: G06F8/41, G06F12/0842

CPC classification: G06F8/443, G06F12/0842, G06F8/433

Abstract: The technology disclosed relates to automatically assigning and optimizing the physical memory layouts of all intermediate dense tensor data in a program. The technology disclosed is an implementation of a compiler analysis and transformation pass which automatically determines required physical layouts in light of kernel operation and performance requirements. The proposed solution also inserts physical layout conversion operations where necessary in cases of unresolvable incompatibilities. The pass takes as input a program acyclic dataflow graph and a set of physical layout constraints for every known operation.
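
Illustrative sketch (not taken from the patent): the conversion-insertion step in this abstract can be pictured as a walk over the edges of an acyclic dataflow graph that compares the layout each producer emits with the layout each consumer requires. The `Op` type, the layout names `row_major` and `col_major`, the operation names, and `find_layout_conversions` are all hypothetical. The sketch covers only the detection of incompatible edges, not the layout assignment or optimization the abstract also mentions.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    in_layout: str    # layout this op requires on its inputs (assumed constraint)
    out_layout: str   # layout this op produces (assumed constraint)
    inputs: list = field(default_factory=list)  # names of producer ops


def find_layout_conversions(ops):
    """Return (producer, consumer, from_layout, to_layout) for each mismatched edge."""
    by_name = {op.name: op for op in ops}
    conversions = []
    for op in ops:
        for src in op.inputs:
            produced = by_name[src].out_layout
            if produced != op.in_layout:
                # A conversion operation would be inserted on this edge.
                conversions.append((src, op.name, produced, op.in_layout))
    return conversions


# Hypothetical three-op graph: load -> matmul -> softmax.
graph = [
    Op("load", in_layout="row_major", out_layout="row_major"),
    Op("matmul", in_layout="row_major", out_layout="col_major", inputs=["load"]),
    Op("softmax", in_layout="row_major", out_layout="row_major", inputs=["matmul"]),
]
conversions = find_layout_conversions(graph)
```

In this toy graph only the `matmul` to `softmax` edge has mismatched layouts, so it is the only edge that would receive a conversion.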

2.

Invention publication
Merging Buffer Access Operations in a Coarse-grained Reconfigurable Computing System (status: under examination, published)

Publication No.: US20230325312A1

Publication date: 2023-10-12

Application No.: US17974910

Filing date: 2022-10-27

Applicant: SambaNova Systems, Inc.

Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN

IPC classification: G06F12/0802

CPC classification: G06F12/0802, G06F2212/1041

Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.

3.

Invention publication
FLOW CONTROL FOR RECONFIGURABLE PROCESSORS (status: under examination, published)

Publication No.: US20230325163A1

Publication date: 2023-10-12

Application No.: US18206829

Filing date: 2023-06-07

Applicant: SambaNova Systems, Inc.

Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH

IPC classification: G06F8/41, G06F15/78

CPC classification: G06F8/452, G06F15/7867, G06F8/41, G06F15/825

Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.

4.

Invention publication
COMPILER-BASED INPUT SYNCHRONIZATION FOR PROCESSOR WITH VARIANT STAGE LATENCIES (status: under examination, published)

Publication No.: US20230205501A1

Publication date: 2023-06-29

Application No.: US18089157

Filing date: 2022-12-27

Applicant: SambaNova Systems, Inc.

Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER

IPC classification: G06F8/41

CPC classification: G06F8/453, G06F8/458, G06F8/433, G06F8/441

Abstract: The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
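
Illustrative sketch (not taken from the patent): the delay compensation in this abstract amounts to holding the faster producer's output in registers for as many cycles as the two stage latencies differ. The function name `register_delays` and the cycle counts below are hypothetical.

```python
def register_delays(latency_a, latency_b):
    """Return (delay_a, delay_b): extra register cycles for each producer output
    so that both outputs reach the consumer stage in the same cycle."""
    target = max(latency_a, latency_b)
    return target - latency_a, target - latency_b


# Assumed latencies: producer A takes 4 cycles, producer B takes 6 cycles.
delay_a, delay_b = register_delays(4, 6)
arrival_a = 4 + delay_a
arrival_b = 6 + delay_b
```

Only the faster producer receives a non-zero delay; the slower one sets the arrival cycle for both.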

5.

Invention application
Compile Time Logic for Detecting Streaming Compatible and Broadcast Compatible Data Access Patterns (status: granted)

Publication No.: US20220092247A1

Publication date: 2022-03-24

Application No.: US17031679

Filing date: 2020-09-24

Applicant: SambaNova Systems, Inc.

Inventors: David Alan KOEPLINGER, Weiwei CHEN, Kevin James BROWN, Xiaoming GU

IPC classification: G06F30/392, G06F30/33, G06F30/337, G06F8/41

Abstract: A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.

6.

Invention application
Anti-Congestion Flow Control for Reconfigurable Processors (status: granted)

Publication No.: US20210373867A1

Publication date: 2021-12-02

Application No.: US16890841

Filing date: 2020-06-02

Applicant: SambaNova Systems, Inc.

Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi Arun CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH

IPC classification: G06F8/41, G06F15/78

Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
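
Illustrative sketch (not taken from the patent): the two counters in this abstract can be modelled as plain integers on the upstream node. The class and method names are hypothetical. Two simplifications are assumptions of this sketch: a write is allowed only when both counters are positive, and "begins writing" and "is written" are collapsed into one step.

```python
class UpstreamNode:
    def __init__(self, downstream_depth, write_credits=1):
        # One read credit per slot in the downstream buffer.
        self.read_credits = downstream_depth
        self.write_credits = write_credits

    def can_write(self):
        # Assumed gating rule: both kinds of credit must be available.
        return self.read_credits > 0 and self.write_credits > 0

    def write(self):
        if not self.can_write():
            raise RuntimeError("no credits available")
        self.write_credits -= 1  # decremented when the write begins
        self.read_credits -= 1   # decremented when the data unit is written

    def on_write_done_token(self):
        self.write_credits += 1

    def on_read_ready_token(self):
        self.read_credits += 1


node = UpstreamNode(downstream_depth=2, write_credits=1)
node.write()
blocked_on_write = not node.can_write()   # write still in flight
node.on_write_done_token()
node.write()
node.on_write_done_token()
blocked_on_depth = not node.can_write()   # downstream buffer is full
node.on_read_ready_token()                # downstream freed one slot
```

With a downstream depth of 2, the upstream node stalls after two writes until a read ready token returns a credit, which is how the scheme avoids overrunning the downstream buffer.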

7.

Invention publication
OPTIMIZING TENSOR TILING IN NEURAL NETWORKS BASED ON A TILING COST MODEL (status: under examination, published)

Publication No.: US20230315410A1

Publication date: 2023-10-05

Application No.: US18129714

Filing date: 2023-03-31

Applicant: SambaNova Systems, Inc.

Inventors: Bowen YANG, Zhuo CHEN, Chen LIU, Fei WANG, Ruobing WANG, Qinghua LI, Weiwei CHEN, Junjue WANG, Sumti JAIRATH

IPC classification: G06F8/41

CPC classification: G06F8/443

Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling costs. A computer program product and a computing system can implement the method.

8.

Invention application
COMPILE TIME LOGIC FOR INSERTING A BUFFER BETWEEN A PRODUCER OPERATION UNIT AND A CONSUMER OPERATION UNIT IN A DATAFLOW GRAPH (status: granted)

Publication No.: US20220147328A1

Publication date: 2022-05-12

Application No.: US17582421

Filing date: 2022-01-24

Applicant: SambaNova Systems, Inc.

Inventors: Kevin James BROWN, David Alan KOEPLINGER, Weiwei CHEN, Xiaoming GU

IPC classification: G06F8/41, G06F15/78, G06F16/90

Abstract: A dataflow graph for an application has operation units that are configured to be producers and consumers of tensors. A write access pattern of a particular producer specifies an order in which the particular producer generates elements of a tensor, and a read access pattern of a corresponding consumer specifies an order in which the corresponding consumer processes the elements of the tensor. The technology disclosed detects conflicts between the producers and the corresponding consumers that have mismatches between the write access patterns and the read access patterns. A conflict occurs when the order in which the particular producer generates the elements of the tensor is different from the order in which the corresponding consumer processes the elements of the tensor. The technology disclosed resolves the conflicts by inserting buffers between the producers and the corresponding consumers.
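
Illustrative sketch (not taken from the patent): the conflict test in this abstract compares the order in which a producer writes tensor elements with the order in which its consumer reads them, and a buffer is placed on any edge where the two orders differ. The helper names, the operation names, and the 2x2 tensor shape are hypothetical.

```python
def row_major(rows, cols):
    """Element order when iterating rows first."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def col_major(rows, cols):
    """Element order when iterating columns first."""
    return [(r, c) for c in range(cols) for r in range(rows)]


def insert_buffers(edges):
    """edges: list of (producer, consumer, write_order, read_order).
    Returns (src, dst) pairs, with a buffer node on every conflicting edge."""
    rewired = []
    for producer, consumer, write_order, read_order in edges:
        if write_order != read_order:
            buffer_name = f"buffer_{producer}_{consumer}"
            rewired.append((producer, buffer_name))
            rewired.append((buffer_name, consumer))
        else:
            # Orders match, so the consumer can take elements as they arrive.
            rewired.append((producer, consumer))
    return rewired


edges = [
    ("conv", "relu", row_major(2, 2), row_major(2, 2)),
    ("relu", "transpose", row_major(2, 2), col_major(2, 2)),
]
rewired = insert_buffers(edges)
```

Here the `conv` to `relu` edge stays direct, while the `relu` to `transpose` edge is split by a buffer because the consumer reads column by column what the producer wrote row by row.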

9.

Invention publication
Merging Skip-Buffers (status: under examination, published)

Publication No.: US20230305823A1

Publication date: 2023-09-28

Application No.: US18126610

Filing date: 2023-03-27

Applicant: SambaNova Systems, Inc.

Inventors: Fei WANG, David Alan KOEPLINGER, Kevin BROWN, Weiwei CHEN

IPC classification: G06F8/41

CPC classification: G06F8/45, G06F8/4434

Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, which generates a plurality of skip-buffers. The method includes determining that at least one skip-buffer of the plurality of skip-buffers corresponding to a first set of tensor consumers and at least one skip-buffer of the plurality of skip-buffers corresponding to a second set of tensor consumers, are compatible to wholly or partially merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.
