DATAFLOW FUNCTION OFFLOAD TO RECONFIGURABLE PROCESSORS

    Publication (Announcement) Number: WO2022133047A1

    Publication (Announcement) Date: 2022-06-23

    Application Number: PCT/US2021/063733

    Application Date: 2021-12-16

    Abstract: Roughly described, the invention involves a system including a plurality of functional units that execute different segments of a dataflow, and share intermediate results via a peer-to-peer messaging protocol. The functional units are reconfigurable, with different units being reconfigurable at different levels of granularity. The peer-to-peer messaging protocol includes control tokens or other mechanisms by which the consumer of the intermediate results learns that data has been transferred, and in response thereto triggers its next dataflow segment. A host or configuration controller configures the functional units with their respective dataflow segments, but once execution of the configured dataflow begins, no host need be involved in orchestrating data synchronization, the transfer of intermediate results, or the triggering of processing after the data are received. Control overhead is therefore minimized.
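
    The following sketch illustrates the kind of control-token handshake the abstract describes, with a Python queue standing in for the peer-to-peer messaging fabric; the unit names, segment contents, and token format are illustrative assumptions, not the claimed interface.

        import queue
        import threading

        link = queue.Queue()  # stands in for the peer-to-peer messaging channel

        def producer_unit():
            data = [x * x for x in range(8)]              # dataflow segment A
            link.put({"payload": data, "token": "DONE"})  # intermediate results + control token

        def consumer_unit():
            msg = link.get()            # no host involved: the token itself is the trigger
            if msg["token"] == "DONE":  # consumer learns the data has been transferred
                print("segment B result:", sum(msg["payload"]))  # dataflow segment B

        threading.Thread(target=producer_unit).start()
        threading.Thread(target=consumer_unit).start()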

    CONFIGURATION UNLOAD OF A RECONFIGURABLE DATA PROCESSOR

    Publication (Announcement) Number: WO2020106769A1

    Publication (Announcement) Date: 2020-05-28

    Application Number: PCT/US2019/062289

    Application Date: 2019-11-19

    Abstract: A reconfigurable data processor comprises a bus system and an array of configurable units connected to the bus system, the configurable units in the array including configuration data stores that store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. Each configurable unit in the array includes logic to execute a unit configuration load process, including receiving, via the bus system, sub-files of a unit file particular to that configurable unit and loading the received sub-files into its configuration data store. A configuration load controller connected to the bus system includes logic to execute an array configuration load process, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array.
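
    A minimal sketch of the two load processes follows, assuming a dictionary keyed by unit id stands in for the configuration file and a fixed sub-file size; the class and field names are hypothetical.

        SUBFILE_SIZE = 4  # assumed sub-file size, for illustration only

        class ConfigurableUnit:
            """Runs the unit configuration load process for one unit."""
            def __init__(self):
                self.config_store = []        # configuration data store

            def load_subfile(self, subfile):  # receive a sub-file via the bus, store it
                self.config_store.append(subfile)

        def array_config_load(config_file, units):
            """Array configuration load process: distribute unit files as sub-files."""
            for unit_id, unit_file in config_file.items():
                for i in range(0, len(unit_file), SUBFILE_SIZE):
                    units[unit_id].load_subfile(unit_file[i:i + SUBFILE_SIZE])

        units = {0: ConfigurableUnit(), 1: ConfigurableUnit()}
        array_config_load({0: b"ABCDEFGH", 1: b"12345678"}, units)
        print(units[0].config_store)  # [b'ABCD', b'EFGH']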

    ANTI-CONGESTION FLOW CONTROL FOR RECONFIGURABLE PROCESSORS

    Publication (Announcement) Number: WO2021247614A1

    Publication (Announcement) Date: 2021-12-09

    Application Number: PCT/US2021/035305

    Application Date: 2021-06-01

    Abstract: A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as the buffer depth of a corresponding downstream memory node. It is configured to decrement when a buffer data unit is written by the upstream memory node into the downstream memory node, and to increment when the upstream memory node receives a read-ready token from the downstream memory node. The write credit counter of the upstream memory node is initialized with one or more write credits, and is configured to decrement when the upstream memory node begins writing the buffer data unit into the downstream memory node, and to increment when the upstream memory node receives a write-done token from the downstream memory node.
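
    The counter behavior lends itself to a short sketch. The model below illustrates only the increment/decrement rules stated above; the class, method, and field names are invented for clarity.

        class UpstreamMemoryNode:
            def __init__(self, downstream_buffer_depth, write_credits=1):
                # initialized with as many read credits as the downstream buffer depth
                self.ready_to_read_credits = downstream_buffer_depth
                # initialized with one or more write credits
                self.write_credits = write_credits

            def begin_write(self):                # start writing one buffer data unit downstream
                assert self.ready_to_read_credits > 0 and self.write_credits > 0
                self.ready_to_read_credits -= 1   # decrements when a buffer data unit is written
                self.write_credits -= 1           # decrements when the write begins

            def on_read_ready_token(self):        # downstream node freed a buffer slot
                self.ready_to_read_credits += 1

            def on_write_done_token(self):        # downstream node finished absorbing the write
                self.write_credits += 1

        node = UpstreamMemoryNode(downstream_buffer_depth=2)
        node.begin_write()
        node.on_write_done_token()
        print(node.ready_to_read_credits, node.write_credits)  # 1 1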

    PERFORMANCE ESTIMATION-BASED RESOURCE ALLOCATION FOR RECONFIGURABLE ARCHITECTURES

    Publication (Announcement) Number: WO2021055233A1

    Publication (Announcement) Date: 2021-03-25

    Application Number: PCT/US2020/050218

    Application Date: 2020-09-10

    Abstract: The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, and determining the pipeline number of PCUs and/or PMUs required to process the operation unit graph at that time. The search then iterates: new lower and upper search bounds of the generic stage compute processing time are initialized, and a new intermediate stage compute processing time is selected for evaluation in the next iteration, taking into account whether the pipeline number of PCUs and/or PMUs produced for the prior intermediate stage compute processing time was lower or higher than the number of available PCUs and/or PMUs.
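
    The search can be read as a binary search over the generic stage compute processing time. The sketch below assumes a toy cost model (units_needed) in place of the disclosed performance estimator, so it shows only the shape of the iteration.

        def units_needed(op_cycles_list, stage_time):
            # Toy cost model: a shorter stage time requires more parallel units.
            return sum(-(-cycles // stage_time) for cycles in op_cycles_list)  # ceiling division

        def search_stage_time(op_cycles_list, available_units, lower, upper, iterations=32):
            best = None
            for _ in range(iterations):
                mid = (lower + upper) / 2                        # intermediate stage compute time
                if units_needed(op_cycles_list, mid) <= available_units:
                    best, upper = mid, mid                       # fits: try a faster pipeline
                else:
                    lower = mid                                  # over budget: relax the stage time
            return best

        print(search_stage_time([100, 250, 60], available_units=16, lower=1.0, upper=512.0))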

    INSTRUMENTATION PROFILING FOR RECONFIGURABLE PROCESSORS

    Publication (Announcement) Number: WO2022173821A1

    Publication (Announcement) Date: 2022-08-18

    Application Number: PCT/US2022/015807

    Application Date: 2022-02-09

    Abstract: A data processing system comprises compile time logic, runtime logic, a control bus, and instrumentation units operatively coupled to processing units of an array. The compile time logic is configured to generate configuration files for a dataflow graph. The runtime logic is configured to execute the configuration files on the array, and to trigger start and stop events, as defined by the configuration files, in response to implementation of compute and memory operations of the dataflow graph on the array. The control bus is configured to form event routes in the array. The instrumentation units have inputs and outputs connected to the control bus and to the processing units, and are configured to consume the start events on the inputs and start counting clock cycles, consume the stop events on the inputs and stop counting the clock cycles, and report the counted clock cycles on the outputs.
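
    A behavioral sketch of one instrumentation unit follows; the event encoding and the tick-driven counting loop are simplifying assumptions made for illustration.

        class InstrumentationUnit:
            def __init__(self):
                self.counting = False
                self.cycles = 0

            def on_event(self, event):      # start/stop events arrive over the control bus
                if event == "start":
                    self.counting, self.cycles = True, 0
                elif event == "stop":
                    self.counting = False

            def clock_tick(self):           # called once per clock cycle
                if self.counting:
                    self.cycles += 1

            def report(self):               # counted clock cycles, reported on the output
                return self.cycles

        iu = InstrumentationUnit()
        iu.on_event("start")
        for _ in range(5):
            iu.clock_tick()
        iu.on_event("stop")
        print(iu.report())  # 5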

    COMPILER FLOW LOGIC FOR RECONFIGURABLE ARCHITECTURES

    Publication (Announcement) Number: WO2021026489A1

    Publication (Announcement) Date: 2021-02-11

    Application Number: PCT/US2020/045478

    Application Date: 2020-08-07

    Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units. It then places the physical memory units and the physical compute units onto positions in the array of configurable units and routes data and control networks between the placed positions.
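
    The staged lowering reads naturally as a pipeline of passes. The sketch below only fixes the order of the stages named in the abstract; every data structure and placement rule in it is an invented placeholder, not the disclosed algorithm.

        def partition(dataflow_graph):
            memory_allocations = [n for n in dataflow_graph if n["kind"] == "buffer"]
            execution_fragments = [n for n in dataflow_graph if n["kind"] == "op"]
            return memory_allocations, execution_fragments

        def designate_virtual_units(memory_allocations, execution_fragments):
            vmus = {f"VMU{i}": m for i, m in enumerate(memory_allocations)}
            vcus = {f"VCU{i}": f for i, f in enumerate(execution_fragments)}
            return vmus, vcus

        def allocate_physical_units(vmus, vcus):
            pmus = {f"PMU{i}": v for i, v in enumerate(vmus)}
            pcus = {f"PCU{i}": v for i, v in enumerate(vcus)}
            return pmus, pcus

        def place_and_route(pmus, pcus):
            # Naive placement on a 1-D strip; routing of data/control networks is elided.
            return {unit: (i, 0) for i, unit in enumerate(list(pmus) + list(pcus))}

        graph = [{"kind": "buffer", "name": "weights"}, {"kind": "op", "name": "matmul"}]
        print(place_and_route(*allocate_physical_units(*designate_virtual_units(*partition(graph)))))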
