-
Publication No.: US11709796B2
Publication date: 2023-07-25
Application No.: US17402840
Application date: 2021-08-16
Applicant: Micron Technology, Inc.
Inventor: Bryan Hornung, Douglas Vanesko
CPC classification number: G06F15/825, G06F9/30065, G06F15/7867
Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
-
Publication No.: US20230067771A1
Publication date: 2023-03-02
Application No.: US17407502
Application date: 2021-08-20
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Tony M. Brewer, Gongyu Wang
Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the multiple other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the multiple other tiles in the first node. Based on the output instruction, the output signal can include one or the other of the information from the processor output and the information from one or more of the multiple other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.
-
Publication No.: US20220318162A1
Publication date: 2022-10-06
Application No.: US17240492
Application date: 2021-04-26
Applicant: Micron Technology, Inc.
Inventor: Bryan Hornung, Tony M. Brewer, Douglas Vanesko, Patrick Estep
Abstract: Linear interpolation is performed within a memory system. The memory system receives a floating-point index into an integer-indexed memory array. The memory system accesses the values at the two adjacent integer indices, performs the linear interpolation, and provides the resulting interpolated value. In many system architectures, the critical limitation on system performance is the data transfer rate between memory and processing elements. Accordingly, reducing the amount of data transferred improves overall system performance and reduces power consumption.
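The interpolation step described in this abstract can be sketched in software as follows. This is only an illustrative sketch, not the patented hardware: the function name `interpolate_lookup`, the use of a Python list as the "memory array", and the clamping at the last element are all assumptions made for the example.

```python
import math


def interpolate_lookup(table, index):
    """Linearly interpolate `table` at a fractional `index`.

    Hypothetical helper: `table` stands in for the integer-indexed
    memory array, and `index` for the floating-point index.
    """
    if not 0 <= index <= len(table) - 1:
        raise IndexError("index outside the table")
    lo = math.floor(index)             # lower adjacent integer index
    hi = min(lo + 1, len(table) - 1)   # upper adjacent index (clamped; an assumption)
    frac = index - lo                  # fractional distance from lo toward hi
    return table[lo] + frac * (table[hi] - table[lo])


if __name__ == "__main__":
    print(interpolate_lookup([10.0, 20.0, 40.0], 1.25))
```

Only the single interpolated value is returned to the caller, rather than both neighboring entries, which is the data-transfer saving the abstract points to.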
-
Publication No.: US20220206804A1
Publication date: 2022-06-30
Application No.: US17405371
Application date: 2021-08-18
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Bryan Hornung, Patrick Estep
IPC: G06F9/30
Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
-
Publication No.: US12293187B2
Publication date: 2025-05-06
Application No.: US18524942
Application date: 2023-11-30
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Tony M. Brewer
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable media that provide more efficient CGRA execution by assigning different initiation intervals to different PEs executing the same code base. The initiation intervals may be multiples of each other, and the PE with the lowest initiation interval may be used to execute instructions that are to be executed at a greater frequency than other instructions, which may be assigned to PEs with higher initiation intervals.
-
Publication No.: US20240192955A1
Publication date: 2024-06-13
Application No.: US18426237
Application date: 2024-01-29
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Bryan Hornung, Patrick Estep
CPC classification number: G06F9/30065, G06F9/30072, G06F9/30087, G06F9/3009, G06F15/7867, G06F15/825
Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
-
Publication No.: US11907718B2
Publication date: 2024-02-20
Application No.: US17405371
Application date: 2021-08-18
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Bryan Hornung, Patrick Estep
CPC classification number: G06F9/30065, G06F9/3009, G06F9/30072, G06F9/30087, G06F15/7867, G06F15/825
Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
-
Publication No.: US20220413804A1
Publication date: 2022-12-29
Application No.: US17360407
Application date: 2021-06-28
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Bryan Hornung
Abstract: Two commands each perform a partial complex multiply and accumulate. By using these two commands together, a full complex multiply and accumulate operation is performed. As compared to traditional implementations, this reduces the number of commands used from eight (four multiplies, a subtraction and three adds) to two. In some example embodiments, a single-instruction/multiple-data (SIMD) architecture is used to enable each command to perform multiple partial complex multiply and accumulate operations simultaneously, further increasing efficiency. One application of a complex multiply and accumulate is in generating images from pulse data of a radar or lidar. For example, an image may be generated from a synthetic aperture radar (SAR) on an autonomous vehicle (e.g., a drone). The image may be provided to a trained machine learning model that generates an output. Based on the output, inputs to control circuits of the autonomous vehicle are generated.
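One way to picture how two partial commands can compose a full complex multiply and accumulate is sketched below. The split shown here (one command handling the products that use the real part of the first operand, the other handling the products that use its imaginary part) is an assumption chosen for illustration; the abstract does not state how the patent divides the work. The names `cmac_part1`, `cmac_part2`, and `complex_mac` are hypothetical, and complex values are modeled as `(real, imag)` tuples.

```python
def cmac_part1(acc, a, b):
    """First partial command (assumed split): products using Re(a)."""
    acc_re, acc_im = acc
    a_re, _ = a
    b_re, b_im = b
    return (acc_re + a_re * b_re, acc_im + a_re * b_im)


def cmac_part2(acc, a, b):
    """Second partial command (assumed split): products using Im(a)."""
    acc_re, acc_im = acc
    _, a_im = a
    b_re, b_im = b
    return (acc_re - a_im * b_im, acc_im + a_im * b_re)


def complex_mac(acc, a, b):
    """Full acc + a * b, obtained by issuing both partial commands."""
    return cmac_part2(cmac_part1(acc, a, b), a, b)


if __name__ == "__main__":
    # Cross-check against Python's built-in complex arithmetic.
    result = complex_mac((1, 1), (2, 3), (4, 5))
    expected = complex(1, 1) + complex(2, 3) * complex(4, 5)
    print(result, expected)
```

Each partial command performs two multiplies and two add or subtract steps, so the pair covers the four multiplies, one subtraction, and three adds counted in the abstract.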
-
Publication No.: US20220413742A1
Publication date: 2022-12-29
Application No.: US17360455
Application date: 2021-06-28
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Bryan Hornung, Tony M. Brewer
IPC: G06F3/06
Abstract: A dispatch element interfaces with a host processor and dispatches threads to one or more tiles of a hybrid threading fabric. Data structures in memory to be used by a tile may be identified by a starting address and a size, included as parameters provided by the host. The dispatch element sends a command to a memory interface to transfer the identified data to the tile that will use the data. Thus, when the tile begins processing the thread, the data is already available in local memory of the tile and does not need to be accessed from the memory controller. Data may be transferred by the dispatch element while the tile is performing operations for another thread, increasing the percentage of operations performed by the tile that are performing useful work and reducing the percentage that are merely retrieving data.
-
Publication No.: US20220317972A1
Publication date: 2022-10-06
Application No.: US17405368
Application date: 2021-08-18
Applicant: Micron Technology, Inc.
Inventor: Douglas Vanesko, Tony M. Brewer, Bryan Hornung, Patrick Estep
IPC: G06F7/548
Abstract: Devices and techniques for hardware for concurrent sine and cosine determination are described herein. A first sequence of bits representing an angle of a line from an origin to a unit circle can be obtained. A quadrant of the unit circle for the line is determined, and the two least significant bits of the first sequence of bits are replaced with an encoding for the quadrant. The angle is translated to a base quadrant angle, and sine and cosine operations are performed on a portion of a second sequence of bits (derived from the first sequence of bits) to create intermediate sine and cosine solutions in the base quadrant. The quadrant encoding in the first sequence of bits is then used to create final sine and cosine solutions in the quadrant from the intermediate solutions.
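The general idea of reducing an angle to a base quadrant and then mapping the intermediate results back can be sketched as below. This is a simplified illustration under stated assumptions, not the bit layout the abstract describes: here the angle is assumed to be an unsigned `width`-bit fraction of a full turn, the quadrant is taken from the two most significant bits, and `math.sin` and `math.cos` stand in for the hardware evaluation. The name `sincos_by_quadrant` is hypothetical.

```python
import math


def sincos_by_quadrant(angle_bits, width=16):
    """Return (sin, cos) for an angle given as a fraction of a full turn.

    Assumed encoding: angle = 2*pi * angle_bits / 2**width.
    """
    if not 0 <= angle_bits < (1 << width):
        raise ValueError("angle_bits does not fit in the given width")
    quadrant = angle_bits >> (width - 2)                 # 0..3
    base_bits = angle_bits & ((1 << (width - 2)) - 1)    # offset within the quadrant
    theta = (base_bits / (1 << (width - 2))) * (math.pi / 2)
    s, c = math.sin(theta), math.cos(theta)              # intermediate base-quadrant results
    # Map the base-quadrant results into the actual quadrant.
    if quadrant == 0:
        return (s, c)
    if quadrant == 1:
        return (c, -s)
    if quadrant == 2:
        return (-s, -c)
    return (-c, s)


if __name__ == "__main__":
    print(sincos_by_quadrant(0x6000))  # 135 degrees under the assumed encoding
```

Both outputs come from the same pair of base-quadrant values, so the sine and cosine are produced together and only signs and a swap differ between quadrants.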