Patent search ap:("Micron Technology Page Inc.") AND inv:"Douglas Vanesko"

11.

Granted invention patent
Data input/output operations during loop execution in a reconfigurable compute fabric [Active]

Publication No.: US11709796B2

Publication date: 2023-07-25

Application No.: US17402840

Application date: 2021-08-16

Applicant: Micron Technology, Inc.

Inventor: Bryan Hornung, Douglas Vanesko

IPC: G06F15/78, G06F15/82, G06F9/30

CPC classification number: G06F15/825, G06F9/30065, G06F15/7867

Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
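The indexing scheme in this abstract can be pictured with a small software model. The following is only a sketch under stated assumptions: the class names `SyncMessage`, `ComputeElement`, and `FlowController` are hypothetical, the I/O operation is modeled as a plain read, and iterations run one after another, whereas the hardware described may overlap them.

```python
from dataclasses import dataclass


@dataclass
class SyncMessage:
    """Hypothetical synchronous message carrying the iteration index."""
    iteration_index: int


class ComputeElement:
    def __init__(self, memory):
        # A Python list stands in for the compute element memory (assumption).
        self.memory = memory

    def on_message(self, msg):
        # The I/O operation (modeled here as a read) targets the location
        # indicated by the iteration index carried in the message.
        return self.memory[msg.iteration_index]


class FlowController:
    def __init__(self, element):
        self.element = element

    def run_loop(self, iterations):
        results = []
        for index in range(iterations):
            # One synchronous message per iteration starts one thread.
            results.append(self.element.on_message(SyncMessage(index)))
        return results


results = FlowController(ComputeElement([5, 6, 7])).run_loop(3)
```

Because each thread receives its own index, no thread needs to compute or share a loop counter to find its data.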

12.

Invention application
TILE-BASED RESULT BUFFERING IN MEMORY-COMPUTE SYSTEMS [Active]

Publication No.: US20230067771A1

Publication date: 2023-03-02

Application No.: US17407502

Application date: 2021-08-20

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Tony M. Brewer, Gongyu Wang

IPC: G11C7/10, G11C7/22, G06F7/523

Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the multiple other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the multiple other tiles in the first node. Based on the output instruction, the output signal can include one or the other of the information from the processor output and the information from one or more of the multiple other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.
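A rough software model may help picture the select-and-delay behavior described in this abstract. It is a sketch only: the class `RegisterNetwork`, its `step` method, and the cycle-based scheduling are illustrative assumptions, not the patented circuit, and the model simply overwrites a pending value if two outputs land on the same cycle.

```python
class RegisterNetwork:
    """Toy model: choose a source per cycle, emit it after a given delay."""

    def __init__(self):
        self.cycle = 0
        self.pending = {}  # maps an output cycle to the value due then

    def step(self, processor_out, neighbor_in, select_processor, delay):
        # "Output instruction": pick the processor output or the
        # information arriving from another tile.
        chosen = processor_out if select_processor else neighbor_in
        # "Delay instruction": decide when the chosen value appears.
        self.pending[self.cycle + delay] = chosen
        out = self.pending.pop(self.cycle, None)
        self.cycle += 1
        return out


net = RegisterNetwork()
outputs = [net.step(f"p{i}", f"n{i}", i % 2 == 0, 2) for i in range(4)]
```

With a delay of two cycles, the first two outputs are empty and the values chosen in cycles 0 and 1 appear in cycles 2 and 3.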

13.

Invention application
INTERPOLATION ACCELERATION IN A PROCESSOR MEMORY INTERFACE [Active]

Publication No.: US20220318162A1

Publication date: 2022-10-06

Application No.: US17240492

Application date: 2021-04-26

Applicant: Micron Technology, Inc.

Inventor: Bryan Hornung, Tony M. Brewer, Douglas Vanesko, Patrick Estep

IPC: G06F13/16, G06F9/38, G06F9/30, G06N20/00, G01S13/90, G01S7/41, G01S13/933

Abstract: Linear interpolation is performed within a memory system. The memory system receives a floating-point index into an integer-indexed memory array. The memory system accesses the values at the two adjacent integer indices, performs the linear interpolation, and provides the resulting interpolated value. In many system architectures, the critical limitation on system performance is the data transfer rate between memory and processing elements. Accordingly, reducing the amount of data transferred improves overall system performance and reduces power consumption.
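The interpolation step can be written out in a few lines. This is a minimal sketch, assuming a Python list stands in for the integer-indexed memory array; the function name `interpolated_read` and the clamping at the last element are assumptions made for illustration, not details from the patent.

```python
import math


def interpolated_read(memory, index):
    """Return the linearly interpolated value at a fractional index."""
    lo = math.floor(index)
    if lo < 0 or lo > len(memory) - 1:
        raise IndexError("index outside the array")
    # Assumption: at the last element there is no upper neighbor, so clamp.
    hi = min(lo + 1, len(memory) - 1)
    frac = index - lo
    # Weighted blend of the two adjacent integer-indexed values.
    return memory[lo] + frac * (memory[hi] - memory[lo])


samples = [10.0, 20.0, 40.0]
value = interpolated_read(samples, 1.25)  # 20.0 + 0.25 * (40.0 - 20.0)
```

Only the single interpolated value crosses the memory interface, rather than both neighboring values, which is the data-transfer saving the abstract points to.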

14.

Invention application
LOOP EXECUTION IN A RECONFIGURABLE COMPUTE FABRIC [Active]

Publication No.: US20220206804A1

Publication date: 2022-06-30

Application No.: US17405371

Application date: 2021-08-18

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Bryan Hornung, Patrick Estep

IPC: G06F9/30

Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
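The hand-off between the two flow controllers can be sketched as a sequential simulation. Everything named here is hypothetical: the dictionary-based messages, the functions `second_flow_controller` and `run_loop`, and the assumption that the last iteration is detected by comparing the iteration index with a known trip count.

```python
def second_flow_controller(message, trip_count):
    """Handle the asynchronous message and start the second-flow thread."""
    iteration = message["iteration"]
    # Assumption: the last iteration is recognized from its index.
    last = iteration == trip_count - 1
    return {"flow": 2, "iteration": iteration, "last_iteration": last}


def run_loop(trip_count):
    threads = []
    for iteration in range(trip_count):
        # First flow controller starts the first portion of the iteration.
        threads.append({"flow": 1, "iteration": iteration})
        # An asynchronous message asks the second controller to continue it.
        message = {"iteration": iteration}
        threads.append(second_flow_controller(message, trip_count))
    return threads


threads = run_loop(3)
flags = [t["last_iteration"] for t in threads if t["flow"] == 2]
```

Only the thread for the final iteration carries the flag, so downstream elements can tell when the loop has finished.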

15.

Granted invention patent
Efficient processing of nested loops for computing device with multiple configurable processing elements using multiple spoke counts [Active]

Publication No.: US12293187B2

Publication date: 2025-05-06

Application No.: US18524942

Application date: 2023-11-30

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Tony M. Brewer

IPC: G06F9/30, G06F9/32, G06F9/38

Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that provide for more efficient CGRA execution by assigning different initiation intervals to different PEs executing a same code base. The initiation intervals may be multiples of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed at a greater frequency than other instructions, which may be assigned to PEs with higher initiation intervals.
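The effect of giving processing elements different initiation intervals can be shown with a toy schedule. This is a sketch only: `firing_cycles` is a hypothetical helper, and it assumes a PE with initiation interval k may start an instruction on every k-th cycle, beginning at cycle 0.

```python
def firing_cycles(initiation_interval, total_cycles):
    """Cycles on which a PE with the given interval may start an instruction."""
    return [c for c in range(total_cycles) if c % initiation_interval == 0]


# Assumption: inner-loop instructions go to the PE with the lowest interval,
# outer-loop instructions to a PE whose interval is a multiple of it.
inner_pe = firing_cycles(1, 8)
outer_pe = firing_cycles(4, 8)
```

Over eight cycles the inner-loop PE can issue eight times while the outer-loop PE issues twice, matching a loop nest in which the inner body runs four times per outer iteration.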

16.

Invention publication
LOOP EXECUTION IN A RECONFIGURABLE COMPUTE FABRIC USING FLOW CONTROLLERS FOR RESPECTIVE SYNCHRONOUS FLOWS [Under examination, published]

Publication No.: US20240192955A1

Publication date: 2024-06-13

Application No.: US18426237

Application date: 2024-01-29

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Bryan Hornung, Patrick Estep

IPC: G06F9/30, G06F15/78, G06F15/82

CPC classification number: G06F9/30065, G06F9/30072, G06F9/30087, G06F9/3009, G06F15/7867, G06F15/825

Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.

17.

Granted invention patent
Loop execution in a reconfigurable compute fabric using flow controllers for respective synchronous flows [Active]

Publication No.: US11907718B2

Publication date: 2024-02-20

Application No.: US17405371

Application date: 2021-08-18

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Bryan Hornung, Patrick Estep

IPC: G06F9/30, G06F15/78, G06F15/82

CPC classification number: G06F9/30065, G06F9/3009, G06F9/30072, G06F9/30087, G06F15/7867, G06F15/825

Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.

18.

Invention application
EFFICIENT COMPLEX MULTIPLY AND ACCUMULATE [Active]

Publication No.: US20220413804A1

Publication date: 2022-12-29

Application No.: US17360407

Application date: 2021-06-28

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Bryan Hornung

IPC: G06F7/544, G06F7/53, G06F9/38, G06K9/62, G06N20/00

Abstract: Two commands each perform a partial complex multiply and accumulate. By using these two commands together, a full complex multiply and accumulate operation is performed. As compared to traditional implementations, this reduces the number of commands used from eight (four multiplies, a subtraction and three adds) to two. In some example embodiments, a single-instruction/multiple-data (SIMD) architecture is used to enable each command to perform multiple partial complex multiply and accumulate operations simultaneously, further increasing efficiency. One application of a complex multiply and accumulate is in generating images from pulse data of a radar or lidar. For example, an image may be generated from a synthetic aperture radar (SAR) on an autonomous vehicle (e.g., a drone). The image may be provided to a trained machine learning model that generates an output. Based on the output, inputs to control circuits of the autonomous vehicle are generated.
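One way to picture two partial commands combining into a full complex multiply and accumulate is shown below. The abstract does not say how the work is divided, so the split used here (one command for the terms involving the real part of `a`, one for the terms involving its imaginary part) is an assumption, and the function names are hypothetical.

```python
def partial_cmac_real(acc, a, b):
    """First hypothetical command: accumulate the terms that use Re(a)."""
    return complex(acc.real + a.real * b.real, acc.imag + a.real * b.imag)


def partial_cmac_imag(acc, a, b):
    """Second hypothetical command: accumulate the terms that use Im(a)."""
    return complex(acc.real - a.imag * b.imag, acc.imag + a.imag * b.real)


def cmac(acc, a, b):
    # Issuing both partial commands yields acc + a * b.
    return partial_cmac_imag(partial_cmac_real(acc, a, b), a, b)


acc = cmac(complex(1, 1), complex(2, 3), complex(4, 5))
```

Here (2 + 3i)(4 + 5i) = -7 + 22i, so the accumulator moves from 1 + 1i to -6 + 23i after the two commands.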

19.

Invention application
LOADING DATA FROM MEMORY DURING DISPATCH [Active]

Publication No.: US20220413742A1

Publication date: 2022-12-29

Application No.: US17360455

Application date: 2021-06-28

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Bryan Hornung, Tony M. Brewer

IPC: G06F3/06

Abstract: A dispatch element interfaces with a host processor and dispatches threads to one or more tiles of a hybrid threading fabric. Data structures in memory to be used by a tile may be identified by a starting address and a size, included as parameters provided by the host. The dispatch element sends a command to a memory interface to transfer the identified data to the tile that will use the data. Thus, when the tile begins processing the thread, the data is already available in local memory of the tile and does not need to be accessed from the memory controller. Data may be transferred by the dispatch element while the tile is performing operations for another thread, increasing the percentage of operations performed by the tile that are performing useful work and reducing the percentage that are merely retrieving data.

20.

Invention application
HARDWARE FOR CONCURRENT SINE AND COSINE DETERMINATION [Active]

Publication No.: US20220317972A1

Publication date: 2022-10-06

Application No.: US17405368

Application date: 2021-08-18

Applicant: Micron Technology, Inc.

Inventor: Douglas Vanesko, Tony M. Brewer, Bryan Hornung, Patrick Estep

IPC: G06F7/548

Abstract: Devices and techniques for hardware for concurrent sine and cosine determination are described herein. A first sequence of bits representing an angle of a line from an origin to a unit circle can be obtained. A quadrant of the unit circle for the line is determined, and the two least significant bits of the first sequence of bits are replaced with an encoding for the quadrant. The angle is translated to a base quadrant angle, and sine and cosine operations are performed on a portion of a second sequence of bits (derived from the first sequence of bits) to create intermediate sine and cosine solutions in the base quadrant. The quadrant encoding in the first sequence of bits is then used to create final sine and cosine solutions in the quadrant from the intermediate solutions.
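The general idea of solving in a base quadrant and then mapping back can be sketched in software. This is not the patented bit layout: the sketch assumes a 16-bit unsigned angle expressed as a fraction of a full turn, takes the quadrant from the top two bits, and uses `math.sin` and `math.cos` as stand-ins for the hardware that produces the intermediate solutions.

```python
import math

ANGLE_BITS = 16  # assumption: angle is an unsigned fraction of a full turn


def sin_cos(angle_bits):
    """Return (sine, cosine) using a base-quadrant solve plus a quadrant fix-up."""
    angle_bits &= (1 << ANGLE_BITS) - 1
    quadrant = angle_bits >> (ANGLE_BITS - 2)          # 0..3
    base = angle_bits & ((1 << (ANGLE_BITS - 2)) - 1)  # offset inside the quadrant
    theta = 2.0 * math.pi * base / (1 << ANGLE_BITS)   # base-quadrant angle in [0, pi/2)
    s, c = math.sin(theta), math.cos(theta)            # intermediate solutions
    # Each quadrant adds a quarter turn, which swaps and/or negates s and c.
    if quadrant == 0:
        return s, c
    if quadrant == 1:
        return c, -s
    if quadrant == 2:
        return -s, -c
    return -c, s


sine, cosine = sin_cos(1 << (ANGLE_BITS - 2))  # a quarter turn
```

Both outputs come from a single base-quadrant evaluation, which is what allows sine and cosine to be produced together.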
