-
Publication No.: WO2022133047A1
Publication Date: 2022-06-23
Application No.: PCT/US2021/063733
Filing Date: 2021-12-16
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: RAUMANN, Martin Russell , ZHENG, Qi , SHAH, Bandish B. , KUMAR, Ravinder , LEUNG, Kin Hing , JAIRATH, Sumti , GROHOSKI, Gregory Frederick
Abstract: Roughly described, the invention involves a system including a plurality of functional units that execute different segments of a dataflow, and share intermediate results via a peer-to-peer messaging protocol. The functional units are reconfigurable, with different units being reconfigurable at different levels of granularity. The peer-to-peer messaging protocol includes control tokens or other mechanisms by which the consumer of the intermediate results learns that data has been transferred, and in response thereto triggers its next dataflow segment. A host or configuration controller configures the data units with their respective dataflow segments, but once execution of the configured dataflow begins, no host need be involved in orchestrating data synchronization, the transfer of intermediate results, or the triggering of processing after the data are received. Control overhead is therefore minimized.
-
Publication No.: WO2022040230A1
Publication Date: 2022-02-24
Application No.: PCT/US2021/046368
Filing Date: 2021-08-17
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: GROHOSKI, Gregory Frederick , SHAH, Manish K. , PRABHAKAR, Raghu , LUTTRELL, Mark , KUMAR, Ravinder , LEUNG, Kin Hing , CHATTERJEE, Ranen , JAIRATH, Sumti , KOEPLINGER, David Alan , SIVARAMAKRISHNAN, Ram , GRIMM, Matthew Thomas
Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
-
Publication No.: WO2022212062A1
Publication Date: 2022-10-06
Application No.: PCT/US2022/020641
Filing Date: 2022-03-16
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: PRABHAKAR, Raghu , SHEELEY, Nathan Francis , MENON, Amitabh , GUPTA, Sitanshu , JAIRATH, Sumti , MUSADDIQ, Matheen
IPC: G06F15/78 , G06N3/063 , G06F15/7817 , G06F15/7839 , G06F15/7871 , G06F15/7885 , G06F17/16 , G06F2015/763 , G06N3/04
Abstract: An integrated circuit includes a plurality of configurable units, each configurable unit having two or more corresponding sections. The plurality of configurable units is arranged in a serial arrangement to form a chain of sections of the configurable units. A data bus is connected to the plurality of configurable units which communicates data at a clock rate. The chain of sections is to receive and write a series of tensors at the clock rate at a first end section of the chain of sections, and sequentially propagate the series of tensors through individual sections within the chain of sections at the clock rate. The chain of sections is to output the series of tensors at a second end section of the chain of sections. The chain of sections is to also output the series of tensors at an intermediate section of the chain of sections.
-
Publication No.: WO2021007131A1
Publication Date: 2021-01-14
Application No.: PCT/US2020/040832
Filing Date: 2020-07-03
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: PRABHAKAR, Raghu , SHAH, Manish K. , NATARAJA, Pramod , JACKSON, David Brian , LEUNG, Kin Hing , SIVARAMAKRISHNAN, Ram , JAIRATH, Sumti , GROHOSKI, Gregory Frederick
Abstract: A reconfigurable data processor comprises an array of configurable units configurable to allocate a plurality of sets of configurable units in the array to implement respective execution fragments of the data processing operation. Quiesce logic is coupled to configurable units in the array, configurable to respond to a quiesce control signal to quiesce the sets of configurable units in the array on quiesce boundaries of the respective execution fragments, and to forward quiesce ready signals for the respective execution fragments when the corresponding sets of processing units are ready. An array quiesce controller distributes the quiesce control signal to configurable units in the array, and receives quiesce ready signals for the respective execution fragments from the quiesce logic.
-
Publication No.: WO2020106769A1
Publication Date: 2020-05-28
Application No.: PCT/US2019/062289
Filing Date: 2019-11-19
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: SHAH, Manish K. , SIVARAMAKRISHNAN, Ram , LUTTRELL, Mark , JACKSON, David Brian , PRABHAKAR, Raghu , JAIRATH, Sumti , GROHOSKI, Gregory Frederick , NATARAJA, Pramod
IPC: G06F15/78
Abstract: A reconfigurable data processor comprises a bus system and an array of configurable units connected to the bus system. Configurable units in the array include configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. Configurable units in the plurality of configurable units each include logic to execute a unit configuration load process, including receiving, via the bus system, sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit. A configuration load controller connected to the bus system includes logic to execute an array configuration load process, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array.
-
Publication No.: WO2021247614A1
Publication Date: 2021-12-09
Application No.: PCT/US2021/035305
Filing Date: 2021-06-01
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: CHEN, Weiwei , PRABHAKAR, Raghu , KOEPLINGER, David Alan , GUPTA, Sitanshu , CHAPHEKAR, Ruddhi Arun , PUNJ, Ajit , JAIRATH, Sumti
IPC: G06F15/78 , G06F15/82 , G06F15/7867 , G06F15/825 , G06F8/452
Abstract: A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of a corresponding downstream memory node. The ready-to-read credit counter is configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a read ready token from the corresponding downstream memory node. The write credit counter of the particular upstream memory node is initialized with one or more write credits and is configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives a write done token from the corresponding downstream memory node.
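
The two counters described in this abstract can be modeled in a few lines. The following is a minimal sketch, not the patented implementation: the class name `UpstreamNode`, its method names, and the rule that a write needs both kinds of credit are assumptions made for illustration. For brevity it also decrements both counters at the moment a write begins, whereas the abstract ties the read credit to the data unit being written.

```python
from dataclasses import dataclass


@dataclass
class UpstreamNode:
    """Hypothetical model of one upstream memory node's credit counters."""

    read_credits: int   # initialized to the downstream node's buffer depth
    write_credits: int  # initialized to one or more write credits

    def can_write(self) -> bool:
        # Assumption: a write may start only if both counters are positive.
        return self.read_credits > 0 and self.write_credits > 0

    def begin_write(self) -> None:
        if not self.can_write():
            raise RuntimeError("no credits available")
        self.write_credits -= 1  # decrement when the write begins
        self.read_credits -= 1   # simplification: decrement at the same time

    def on_write_done(self) -> None:
        # Downstream node returned a write done token.
        self.write_credits += 1

    def on_read_ready(self) -> None:
        # Downstream node returned a read ready token (a buffer slot is free).
        self.read_credits += 1
```

With `UpstreamNode(read_credits=2, write_credits=1)`, a second write cannot start until a write done token arrives, and a third cannot start until a read ready token frees a downstream buffer slot.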
-
Publication No.: WO2021055233A1
Publication Date: 2021-03-25
Application No.: PCT/US2020/050218
Filing Date: 2020-09-10
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: CHEN, Zhuo , JAIRATH, Sumti
Abstract: The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, determining a pipeline number of the PCUs and/or the PMUs required to process the operation unit graph, and iteratively, initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in a next iteration, a new intermediate stage compute processing time taking into account whether the pipeline number of the PCUs and/or the PMUs produced for a prior intermediate stage compute processing time in a previous iteration is lower or higher than the available PCUs and/or PMUs.
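
The iterative search in this abstract resembles a binary search over the stage compute time. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, times are integers, and the cost model (an operation taking `t` time units needs `ceil(t / stage_time)` parallel units) is invented for the example and is not taken from the patent.

```python
import math


def units_required(op_times, stage_time):
    """Hypothetical cost model: units needed so every stage fits stage_time."""
    return sum(math.ceil(t / stage_time) for t in op_times)


def smallest_stage_time(op_times, available_units):
    """Binary-search the smallest stage time whose unit count fits the budget."""
    lo, hi = 1, max(op_times)  # lower and upper search bounds
    if units_required(op_times, hi) > available_units:
        raise ValueError("graph does not fit even at the upper bound")
    while lo < hi:
        mid = (lo + hi) // 2  # intermediate stage time to evaluate
        if units_required(op_times, mid) <= available_units:
            hi = mid          # fits: try a shorter stage time
        else:
            lo = mid + 1      # needs too many units: allow a longer stage time
    return lo
```

For example, `smallest_stage_time([8, 4, 2], 5)` returns `4`: a stage time of 4 needs 2 + 1 + 1 = 4 units, while a stage time of 3 would need 6.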
-
Publication No.: WO2022212107A1
Publication Date: 2022-10-06
Application No.: PCT/US2022/021221
Filing Date: 2022-03-21
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: NAMA, Tejas Nagendra Babu , CHAPHEKAR, Ruddhi , SIVARAMAKRISHNAN, Ram , PRABHAKAR, Raghu , JAIRATH, Sumti , WANG, Junjue , LIANG, Kaizhao , FUCHS, Adi , MUSADDIQ, Matheen , SUJEETH, Arvind Krishna
Abstract: Disclosed is a data processing system that includes compile time logic configured to section a graph into a sequence of sections, and to configure each section of the sequence of sections such that an input layer of the section processes an input, one or more intermediate layers of the section process corresponding one or more intermediate outputs, and a final layer of the section generates a final output. The final output has a non-overlapping final tiling configuration, the one or more intermediate outputs have corresponding one or more overlapping intermediate tiling configurations, and the input has an overlapping input tiling configuration. The compile time logic is further configured to determine the various tiling configurations by starting from the final layer, reverse traversing through the one or more intermediate layers, and ending with the input layer.
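
The reverse traversal can be illustrated with a one-dimensional example. This sketch assumes every layer in a section is an unpadded convolution described by a (kernel, stride) pair, which the abstract does not state, and the function name `reverse_tile_sizes` is hypothetical. It uses the standard relation that producing `n` outputs requires `(n - 1) * stride + kernel` inputs.

```python
def reverse_tile_sizes(layers, final_tile):
    """Walk from the final layer back to the input layer.

    layers: list of (kernel, stride) pairs, ordered input layer first.
    final_tile: width of one non-overlapping tile of the final output.
    Returns tile widths ordered [input, intermediate outputs..., final output].
    """
    sizes = [final_tile]
    for kernel, stride in reversed(layers):
        # Inputs needed by this layer to produce the tile computed so far.
        sizes.append((sizes[-1] - 1) * stride + kernel)
    sizes.reverse()
    return sizes
```

For two layers with kernel 3 and stride 1 and a final tile width of 4, `reverse_tile_sizes([(3, 1), (3, 1)], 4)` gives `[8, 6, 4]`. Adjacent final tiles start 4 elements apart, so the 6-wide intermediate tiles overlap by 2 and the 8-wide input tiles overlap by 4, matching the non-overlapping final and overlapping intermediate and input configurations the abstract describes.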
-
Publication No.: WO2022173821A1
Publication Date: 2022-08-18
Application No.: PCT/US2022/015807
Filing Date: 2022-02-09
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: PRABHAKAR, Raghu , GRIMM, Matthew Thomas , JAIRATH, Sumti , LEUNG, Kin Hing , GUPTA, Sitanshu , LIN, Yuan , BOASSO, Luca
Abstract: A data processing system comprises compile time logic, runtime logic, a control bus, and instrumentation units operatively coupled to processing units of an array. The compile time logic is configured to generate configuration files for a dataflow graph. The runtime logic is configured to execute the configuration files on the array, and to trigger start and stop events, as defined by the configuration files, in response to implementation of compute and memory operations of the dataflow graph on the array. The control bus is configured to form event routes in the array. The instrumentation units have inputs and outputs connected to the control bus and to the processing units. The instrumentation units are configured to consume the start events on the inputs and start counting clock cycles, consume the stop events on the inputs and stop counting the clock cycles, and report the counted clock cycles on the outputs.
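
The behavior of a single instrumentation unit can be sketched as a small cycle counter. This is an illustrative software model only: the class name `InstrumentationCounter`, the `tick` method, and the choice to count the cycle in which the start event arrives but not the cycle in which the stop event arrives are all assumptions, not details from the patent.

```python
class InstrumentationCounter:
    """Hypothetical model of one instrumentation unit."""

    def __init__(self):
        self.counting = False
        self.cycles = 0

    def tick(self, start=False, stop=False):
        """Advance one clock cycle.

        Returns the counted cycles when a stop event is consumed while
        counting, otherwise None.
        """
        if start:
            # Consume the start event and begin a fresh count.
            self.counting = True
            self.cycles = 0
        if stop and self.counting:
            # Consume the stop event and report the count on the output.
            self.counting = False
            return self.cycles
        if self.counting:
            self.cycles += 1
        return None
```

If a start event arrives on one cycle and a stop event arrives three cycles later, the unit reports 3.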
-
Publication No.: WO2021026489A1
Publication Date: 2021-02-11
Application No.: PCT/US2020/045478
Filing Date: 2020-08-07
Applicant: SAMBANOVA SYSTEMS, INC.
Inventor: KOEPLINGER, David Alan , PRABHAKAR, Raghu , JAIRATH, Sumti
Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units. It then places the physical memory units and the physical compute units onto positions in the array of configurable units and routes data and control networks between the placed positions.