Data input/output operations during loop execution in a reconfigurable compute fabric

    Publication No.: US11709796B2

    Publication date: 2023-07-25

    Application No.: US17402840

    Filing date: 2021-08-16

    CPC classification number: G06F15/825 G06F9/30065 G06F15/7867

    Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the first iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
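
    The abstract's central idea is that the iteration index travels inside the synchronous message that starts each thread, and the compute element uses that index directly as the memory location for its input/output operation. The sketch below is a software model under stated assumptions: the class names SyncMessage, ComputeElement, and FlowController, the split of element memory into inputs and outputs, and the squaring loop body are all hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SyncMessage:
    # Hypothetical message format: carries only the iteration index named in the abstract.
    iteration_index: int


@dataclass
class ComputeElement:
    # Hypothetical element-local memory, modeled as an input region and an output region.
    inputs: List[int]
    outputs: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.outputs = [0] * len(self.inputs)

    def start_thread(self, msg: SyncMessage) -> None:
        i = msg.iteration_index
        # Both the read and the write use the location indicated by the iteration index.
        value = self.inputs[i]
        self.outputs[i] = value * value  # placeholder loop body: square the input


class FlowController:
    def __init__(self, element: ComputeElement) -> None:
        self.element = element

    def execute_loop(self, trip_count: int) -> None:
        for index in range(trip_count):
            # One synchronous-flow thread per iteration, tagged with its own index.
            self.element.start_thread(SyncMessage(iteration_index=index))


ce = ComputeElement(inputs=[1, 2, 3, 4])
FlowController(ce).execute_loop(trip_count=4)
print(ce.outputs)  # [1, 4, 9, 16]
```

    Because the index arrives with the start message, the compute element needs no separate address counter of its own in this model.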

    TILE-BASED RESULT BUFFERING IN MEMORY-COMPUTE SYSTEMS

    Publication No.: US20230067771A1

    Publication date: 2023-03-02

    Application No.: US17407502

    Filing date: 2021-08-20

    Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the other tiles in the first node. Based on the output instruction, the output signal can include either the information from the processor output or the information from one or more of the other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.
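
    One way to picture the register network in the abstract above is as a two-input selector (driven by the output instruction) followed by a shift register whose length is set by the delay instruction. This is a hypothetical cycle-level model, not the patented circuit: the class RegisterNetworkOutput, the "processor"/"neighbor" selector values, and the use of None for "no valid data yet" are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Optional


class RegisterNetworkOutput:
    """Hypothetical model of one tile's output port on a register network.

    `select` stands in for the output instruction (which source to forward);
    `delay` stands in for the delay instruction (how many cycles to hold it).
    """

    def __init__(self, select: str, delay: int) -> None:
        if select not in ("processor", "neighbor"):
            raise ValueError("select must be 'processor' or 'neighbor'")
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.select = select
        # Shift register of length `delay`, pre-filled with None (no valid data yet).
        self.pipeline: Deque[Optional[int]] = deque([None] * delay)

    def clock(self, processor_out: int, neighbor_in: int) -> Optional[int]:
        """Advance one cycle and return the value driven to the adjacent tile."""
        chosen = processor_out if self.select == "processor" else neighbor_in
        self.pipeline.append(chosen)
        return self.pipeline.popleft()


port = RegisterNetworkOutput(select="neighbor", delay=2)
outs = [port.clock(processor_out=p, neighbor_in=n) for p, n in [(1, 10), (2, 20), (3, 30)]]
print(outs)  # [None, None, 10]
```

    With a delay of 0 the selected value passes straight through in the same cycle; each extra unit of delay shifts its arrival by one cycle.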

    LOOP EXECUTION IN A RECONFIGURABLE COMPUTE FABRIC

    Publication No.: US20220206804A1

    Publication date: 2022-06-30

    Application No.: US17405371

    Filing date: 2021-08-18

    Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
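
    The hand-off described above (the first flow runs part of an iteration, then an asynchronous message starts the second part on another flow, which decides on its own whether this is the last iteration) can be sketched as follows. This is a sequential software model under assumptions: the classes AsyncMessage, FirstFlowController, and SecondFlowController, and the rule "last iteration means index == trip_count - 1", are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AsyncMessage:
    # Hypothetical asynchronous message from the first flow to the second flow.
    iteration: int


class SecondFlowController:
    def __init__(self, trip_count: int) -> None:
        self.trip_count = trip_count
        # Record of started threads as (iteration, last_iteration_flag).
        self.started: List[Tuple[int, bool]] = []

    def on_async_message(self, msg: AsyncMessage) -> None:
        # Determine locally whether this is the final iteration of the loop...
        last = msg.iteration == self.trip_count - 1
        # ...and start the thread for the second portion with the flag set accordingly.
        self.started.append((msg.iteration, last))


class FirstFlowController:
    def __init__(self, downstream: SecondFlowController) -> None:
        self.downstream = downstream
        self.started: List[int] = []

    def run_loop(self, trip_count: int) -> None:
        for i in range(trip_count):
            self.started.append(i)  # first portion of iteration i runs here
            self.downstream.on_async_message(AsyncMessage(iteration=i))


second = SecondFlowController(trip_count=3)
FirstFlowController(second).run_loop(3)
print(second.started)  # [(0, False), (1, False), (2, True)]
```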

    EFFICIENT COMPLEX MULTIPLY AND ACCUMULATE

    Publication No.: US20220413804A1

    Publication date: 2022-12-29

    Application No.: US17360407

    Filing date: 2021-06-28

    Abstract: Two commands each perform a partial complex multiply and accumulate. By using these two commands together, a full complex multiply and accumulate operation is performed. As compared to traditional implementations, this reduces the number of commands used from eight (four multiplies, a subtraction and three adds) to two. In some example embodiments, a single-instruction/multiple-data (SIMD) architecture is used to enable each command to perform multiple partial complex multiply and accumulate operations simultaneously, further increasing efficiency. One application of a complex multiply and accumulate is in generating images from pulse data of a radar or lidar. For example, an image may be generated from a synthetic aperture radar (SAR) on an autonomous vehicle (e.g., a drone). The image may be provided to a trained machine learning model that generates an output. Based on the output, inputs to control circuits of the autonomous vehicle are generated.
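
    The eight-operation count in the abstract comes from expanding acc + a*b as (acc_r + a_r*b_r - a_i*b_i) + i(acc_i + a_r*b_i + a_i*b_r). The abstract does not say how the work is divided between the two commands; the sketch below assumes one plausible split (the first command uses the real part of a, the second uses the imaginary part of a). The function names cmac_part1, cmac_part2, cmac, and simd_cmac are hypothetical, and the list-based "lanes" only imitate a SIMD register.

```python
from typing import List


def cmac_part1(acc: complex, a: complex, b: complex) -> complex:
    """Hypothetical first partial command: accumulate the terms that use Re(a)."""
    return complex(acc.real + a.real * b.real, acc.imag + a.real * b.imag)


def cmac_part2(acc: complex, a: complex, b: complex) -> complex:
    """Hypothetical second partial command: accumulate the terms that use Im(a)."""
    return complex(acc.real - a.imag * b.imag, acc.imag + a.imag * b.real)


def cmac(acc: complex, a: complex, b: complex) -> complex:
    # Issuing both partial commands yields the full acc + a * b.
    return cmac_part2(cmac_part1(acc, a, b), a, b)


def simd_cmac(acc: List[complex], a: List[complex], b: List[complex]) -> List[complex]:
    # Each "lane" applies the same two partial commands to its own operands.
    lanes = [cmac_part1(x, y, z) for x, y, z in zip(acc, a, b)]
    return [cmac_part2(x, y, z) for x, y, z in zip(lanes, a, b)]


print(cmac(0j, 1 + 2j, 3 + 4j))  # (-5+10j)
print(simd_cmac([0j, 1 + 1j], [1 + 2j, 2 - 1j], [3 + 4j, 0.5 + 3j]))  # [(-5+10j), (5+6.5j)]
```

    Splitting on the parts of a means each command performs two multiplies and two accumulations, so neither command needs the intermediate product of the other.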

    LOADING DATA FROM MEMORY DURING DISPATCH

    Publication No.: US20220413742A1

    Publication date: 2022-12-29

    Application No.: US17360455

    Filing date: 2021-06-28

    Abstract: A dispatch element interfaces with a host processor and dispatches threads to one or more tiles of a hybrid threading fabric. Data structures in memory to be used by a tile may be identified by a starting address and a size, included as parameters provided by the host. The dispatch element sends a command to a memory interface to transfer the identified data to the tile that will use the data. Thus, when the tile begins processing the thread, the data is already available in the tile's local memory and does not need to be fetched through the memory controller. The dispatch element may transfer data while the tile is performing operations for another thread, which increases the share of the tile's operations that perform useful work and reduces the share spent merely retrieving data.
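
    The core of the dispatch-time load is that the host-supplied (starting address, size) pair is enough to copy a data structure into tile-local memory before the thread starts. The sketch below is a sequential toy model, so it does not show the overlap with another thread that the abstract describes; the classes Tile and Dispatcher, the dict-based local memory, and the summing thread body are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tile:
    # Hypothetical tile-local memory, keyed by the host-supplied start address.
    local: Dict[int, List[int]] = field(default_factory=dict)

    def run_thread(self, start: int) -> int:
        # The data is already local, so the thread never reads main memory.
        return sum(self.local[start])


@dataclass
class Dispatcher:
    main_memory: List[int]
    transfers: int = 0

    def dispatch(self, tile: Tile, start: int, size: int) -> int:
        # Stand-in for the command sent to the memory interface: copy the
        # structure identified by (start, size) into the tile before it runs.
        tile.local[start] = self.main_memory[start:start + size]
        self.transfers += 1
        return tile.run_thread(start)


d = Dispatcher(main_memory=list(range(100)))
print(d.dispatch(Tile(), start=10, size=4))  # 46
```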

    HARDWARE FOR CONCURRENT SINE AND COSINE DETERMINATION

    Publication No.: US20220317972A1

    Publication date: 2022-10-06

    Application No.: US17405368

    Filing date: 2021-08-18

    Abstract: Devices and techniques for concurrent sine and cosine determination in hardware are described herein. A first sequence of bits representing an angle of a line from an origin to a unit circle can be obtained. The quadrant of the unit circle containing the line is determined, and the two least significant bits of the first sequence of bits are replaced with an encoding for the quadrant. The angle is translated to a base-quadrant angle, and sine and cosine operations are performed on a portion of a second sequence of bits (derived from the first sequence of bits) to create intermediate sine and cosine solutions in the base quadrant. The quadrant encoding in the first sequence of bits is then used to create the final sine and cosine solutions in that quadrant from the intermediate solutions.
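
    The general technique behind the abstract above is quadrant reduction: evaluate sine and cosine once for an angle in the base quadrant, then use the quadrant number to swap and negate the two results. The sketch below is the common software formulation of that idea, not the patented bit layout: it reads the quadrant from the two most significant bits of a fixed-point angle (the abstract instead describes an encoding placed in the two least significant bits), and it uses math.sin and math.cos in place of the hardware base-quadrant evaluation. BITS and sin_cos_fixed are hypothetical names.

```python
import math
from typing import Tuple

BITS = 16  # hypothetical angle width: a full turn is 2**BITS units


def sin_cos_fixed(angle: int) -> Tuple[float, float]:
    """Return (sin, cos) for `angle`, where angle / 2**BITS is a fraction of a full turn."""
    angle &= (1 << BITS) - 1
    quadrant = angle >> (BITS - 2)              # top two bits select the quadrant
    base = angle & ((1 << (BITS - 2)) - 1)      # remaining bits: offset within the quadrant
    theta = base * (2 * math.pi) / (1 << BITS)  # base-quadrant angle in [0, pi/2)
    s, c = math.sin(theta), math.cos(theta)     # intermediate results in the base quadrant
    # Rotate the base-quadrant results by quadrant * 90 degrees.
    if quadrant == 0:
        return s, c
    if quadrant == 1:
        return c, -s
    if quadrant == 2:
        return -s, -c
    return -c, s
```

    Only one quarter-circle of sine/cosine evaluation is needed; the quadrant mapping uses the identities sin(x + 90°) = cos x and cos(x + 90°) = -sin x applied repeatedly.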
