Patent search: cpc:"G06F9/30043", page 1

1.

Invention publication
Schedule Instructions of a Program of Data Flows for Execution in Tiles of a Coarse Grained Reconfigurable Array (status: under examination, published)

Publication No.: US20240362024A1

Publication date: 2024-10-31

Application No.: US18770560

Filing date: 2024-07-11

Applicant: Micron Technology, Inc.

Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous

IPC classification: G06F9/30

CPC classification: G06F9/30181, G06F9/30043

Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
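The idea of a memory map pruning a brute-force search can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the round-robin placement in `build_memory_map`, the cost function (load on the busiest tile), the one-instruction-per-cycle timing that ignores data dependencies, and all function and field names.

```python
from itertools import product


def build_memory_map(variables, num_tiles):
    """Hypothetical placement: spread memory variables round-robin over tiles."""
    return {var: i % num_tiles for i, var in enumerate(variables)}


def candidate_tiles(instr, memory_map, num_tiles):
    """Restrict an instruction to tiles that hold one of its memory variables.

    An instruction that touches no mapped variable may run on any tile.
    """
    tiles = {memory_map[v] for v in instr["vars"] if v in memory_map}
    return sorted(tiles) if tiles else list(range(num_tiles))


def schedule(instructions, memory_map, num_tiles):
    """Brute-force search over the reduced choices, then assign time slots."""
    choices = [candidate_tiles(i, memory_map, num_tiles) for i in instructions]
    best, best_cost = None, None
    for assignment in product(*choices):
        load = [0] * num_tiles
        for tile in assignment:
            load[tile] += 1
        cost = max(load)  # assumed cost: instruction count on the busiest tile
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost

    # Assumed timing model: each tile runs its instructions in program
    # order, one per cycle; data dependencies are ignored in this sketch.
    next_slot = [0] * num_tiles
    plan = []
    for instr, tile in zip(instructions, best):
        plan.append((instr["name"], tile, next_slot[tile]))
        next_slot[tile] += 1
    return plan
```

With three instructions on two tiles, an unconstrained search would visit 2^3 = 8 assignments; if two of the instructions are pinned to a single tile by the memory map, only 2 remain to be examined.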

2.

Invention grant
Mixed scalar and vector operations in multi-threaded computing (status: granted, in force)

Publication No.: US12131157B2

Publication date: 2024-10-29

Application No.: US17984336

Filing date: 2022-11-10

Applicant: AzurEngine Technologies Zhuhai Inc.

Inventors: Toshio Nagata, Yuan Li, Jianbin Zhu, Ryan Braidwood

IPC classification: G06F9/30, G06F9/32

CPC classification: G06F9/30145, G06F9/30036, G06F9/30043, G06F9/321

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may include a sequencer configured to: decode instructions that include scalar instructions and vector instructions, execute decoded scalar instructions, and package decoded vector instructions as configurations. The processor may further include a plurality of columns of vector processing units coupled to the sequencer. The plurality of columns of vector processing units may include a plurality of processing elements (PEs) and each of the PEs may include a plurality of Arithmetic Logic Units (ALUs). The sequencer may be configured to send the configurations to the plurality of columns of vector processing units.
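The split between scalar execution in the sequencer and packaged vector configurations can be sketched roughly as follows. This is a toy model under stated assumptions: the instruction dictionaries, the `kind`/`op`/`operands` field names, and the choice to broadcast every configuration to every column are all hypothetical and not described in the abstract.

```python
def run_sequencer(program, num_columns):
    """Toy sequencer: run scalar instructions locally, forward vector ones.

    Each instruction is a dict with a "kind" of "scalar" or "vector"
    (a made-up format for this sketch).
    """
    scalars = {}                                 # scalar register file
    columns = [[] for _ in range(num_columns)]   # per-column configuration queues

    for instr in program:
        if instr["kind"] == "scalar":
            # Scalar instructions are executed by the sequencer itself.
            scalars[instr["dst"]] = instr["value"]
        elif instr["kind"] == "vector":
            # Vector instructions are packaged as a configuration ...
            config = {"op": instr["op"], "operands": tuple(instr["operands"])}
            # ... and sent to the columns (broadcast is an assumption here).
            for queue in columns:
                queue.append(config)
        else:
            raise ValueError(f"unknown instruction kind: {instr['kind']!r}")
    return scalars, columns
```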

3.

Invention grant
Fine-grained multithreaded cores executing fused operations in multiple clock cycles (status: granted, in force)

Publication No.: US12130744B2

Publication date: 2024-10-29

Application No.: US17672116

Filing date: 2022-02-15

Applicant: Mobileye Vision Technologies Ltd.

Inventors: Yosef Kreinin, Yosi Arbeli, Gil Israel Dogon

IPC classification: G06F9/30, G06F7/00, G06F9/345, G06F9/38, G06F9/52, G06F11/10, G06F12/084, G06F12/0842, G06F12/0875, G06F15/78, G06F15/80, G06T1/20, G06F12/0811

CPC classification: G06F12/0875, G06F7/00, G06F9/3001, G06F9/30036, G06F9/30043, G06F9/3012, G06F9/30123, G06F9/3017, G06F9/30181, G06F9/345, G06F9/3824, G06F9/3826, G06F9/3834, G06F9/3851, G06F9/3865, G06F9/3891, G06F9/526, G06F11/1008, G06F12/084, G06F12/0842, G06F15/7867, G06F15/80, G06T1/20, G06F12/0811, G06F2212/452, G06F2212/62

Abstract: A multi-core processor configured to improve processing performance in certain computing contexts is provided. The multi-core processor includes multiple processing cores that implement barrel threading to execute multiple instruction threads in parallel while ensuring that the effects of an idle instruction or thread upon the performance of the processor are minimized. The multiple cores can also share a common data cache, thereby minimizing the need for expensive and complex mechanisms to mitigate inter-cache coherency issues. The barrel threading can minimize the latency impacts associated with a shared data cache. In some examples, the multi-core processor can also include a serial processor configured to execute single threaded programming code that may not yield satisfactory performance in a processing environment that employs barrel threading.
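Barrel threading in general means that a core rotates through its hardware threads, issuing from a different thread each cycle. A minimal Python sketch of that rotation is shown below; the policy of dropping a thread from the rotation once it has no work left is an assumption chosen for illustration and is not a description of the patented design.

```python
from collections import deque


def barrel_issue(threads):
    """Issue one instruction per cycle, rotating over threads with work left.

    `threads` maps a thread id to its list of instructions. Threads with
    no remaining instructions are skipped (an assumed policy), so an idle
    thread does not consume an issue slot.
    """
    ready = deque((tid, deque(instrs)) for tid, instrs in threads.items() if instrs)
    trace = []
    while ready:
        tid, queue = ready.popleft()
        trace.append((tid, queue.popleft()))   # one instruction this cycle
        if queue:
            ready.append((tid, queue))         # back of the rotation
    return trace
```

Because consecutive instructions of one thread are separated by slots given to other threads, a long-latency operation such as a shared-cache access has several cycles to complete before that thread issues again.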

4.

Invention publication
Multi-tile Memory Management for Detecting Cross Tile Access Providing Multi-Tile Inference Scaling and Providing Page Migration (status: under examination, published)

Publication No.: US20240345990A1

Publication date: 2024-10-17

Application No.: US18626775

Filing date: 2024-04-04

Applicant: Intel Corporation

Inventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri

IPC classification: G06F15/78, G06F7/544, G06F7/575, G06F7/58, G06F9/30, G06F9/38, G06F9/50, G06F12/02, G06F12/06, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/80, G06F17/16, G06F17/18, G06N3/08, G06T1/20, G06T1/60, G06T15/06, H03M7/46

CPC classification: G06F15/7839, G06F7/5443, G06F7/575, G06F7/588, G06F9/3001, G06F9/30014, G06F9/30036, G06F9/3004, G06F9/30043, G06F9/30047, G06F9/30065, G06F9/30079, G06F9/3887, G06F9/5011, G06F9/5077, G06F12/0215, G06F12/0238, G06F12/0246, G06F12/0607, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/8046, G06F17/16, G06F17/18, G06T1/20, G06T1/60, H03M7/46, G06F9/3802, G06F9/3818, G06F9/3867, G06F2212/1008, G06F2212/1021, G06F2212/1044, G06F2212/302, G06F2212/401, G06F2212/455, G06F2212/60, G06N3/08, G06T15/06

Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
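One way to picture the detection step is a per-page counter of remote accesses that emits a message once a threshold is reached. The sketch below is only an illustration: the abstract does not say how "frequent" is measured, so the fixed threshold, the page granularity, the class name `CrossTileMonitor`, and the tuple-shaped message are all assumptions.

```python
from collections import Counter


class CrossTileMonitor:
    """Counts accesses from the local tile to pages owned by other tiles."""

    def __init__(self, local_tile, threshold):
        self.local_tile = local_tile
        self.threshold = threshold      # assumed definition of "frequent"
        self.remote_hits = Counter()    # page -> number of cross-tile accesses
        self.messages = []              # stand-in for messages sent over the fabric

    def record_access(self, page, owner_tile):
        """Record one access; return True if it triggered a transfer request."""
        if owner_tile == self.local_tile:
            return False                # local access, nothing to track
        self.remote_hits[page] += 1
        if self.remote_hits[page] == self.threshold:
            # Request the data transfer once, when the threshold is first hit.
            self.messages.append(("migrate", page, owner_tile, self.local_tile))
            return True
        return False
```

For example, with `CrossTileMonitor(local_tile=0, threshold=3)`, the third access to a page owned by tile 1 appends a single `("migrate", page, 1, 0)` message, and later accesses to the same page do not repeat it.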

5.

Invention publication
SYSTEMS AND METHODS FOR STALLING HOST PROCESSOR (status: under examination, published)

Publication No.: US20240345869A1

Publication date: 2024-10-17

Application No.: US18541670

Filing date: 2023-12-15

Applicant: Dover Microsystems, Inc.

Inventors: Steven Milburn, Gregory T. Sullivan

IPC classification: G06F9/48, G06F9/30, G06F13/24, G06F21/75

CPC classification: G06F9/4812, G06F9/30043, G06F9/3013, G06F13/24, G06F21/75

Abstract: Systems and methods for stalling a host processor. In some embodiments, the host processor may be caused to initiate one or more selected transactions, wherein the one or more selected transactions comprise a bus transaction. The host processor may be prevented from completing the one or more selected transactions, to thereby stall the host processor.

6.

Invention grant
Cache coherence validation using delayed fulfillment of L2 requests (status: granted, in force)

Publication No.: US12118355B2

Publication date: 2024-10-15

Application No.: US17506122

Filing date: 2021-10-20

Applicant: International Business Machines Corporation

Inventors: Shakti Kapoor, Manoj Dusanapudi, Nelson Wu

IPC classification: G06F9/30, G06F9/38, G06F12/0811

CPC classification: G06F9/30043, G06F9/30047, G06F9/3834, G06F9/3836, G06F9/3861, G06F12/0811

Abstract: Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transfer the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.

7.

Invention grant
Distributed graphics processor unit architecture (status: granted, in force)

Publication No.: US12111789B2

Publication date: 2024-10-08

Application No.: US16855879

Filing date: 2020-04-22

Applicant: Micron Technology, Inc.

Inventors: Dmitri Yudanov

IPC classification: G06F9/38, G06F9/30, G06F9/50, G06F15/80, G06N3/063, G06T1/20, G06T1/60

CPC classification: G06F15/8092, G06F9/30043, G06F9/3877, G06F9/5083, G06N3/063, G06T1/20, G06T1/60

Abstract: The present disclosure is directed to a distributed graphics processor unit (GPU) architecture that includes an array of processing nodes. Each processing node may include a GPU node that is coupled to its own fast memory unit and its own storage unit. The fast memory unit and storage unit may be integrated into a single unit or may be separately coupled to the GPU node. The processing node may have its fast memory unit coupled to both the GPU node and the storage node. The various architectures provide a GPU-based system that may be treated as a storage unit, such as a solid state drive (SSD) that performs onboard processing to perform memory-oriented operations. In this respect, the system may be viewed as a “smart drive” for big-data near-storage processing.

8.

Invention publication
BASE PLUS OFFSET ADDRESSING FOR LOAD/STORE MESSAGES (status: under examination, published)

Publication No.: US20240330001A1

Publication date: 2024-10-03

Application No.: US18620217

Filing date: 2024-03-28

Applicant: Intel Corporation

Inventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio

IPC classification: G06F9/38, G06F9/30, G06F9/355, G06F15/78

CPC classification: G06F9/3887, G06F9/355, G06F15/7839, G06F9/30036, G06F9/30043

Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
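The decompose-and-reconstruct round trip can be sketched in a few lines of Python. The abstract does not state how the base is chosen or how wide the offsets are, so this sketch assumes the smallest lane address as the base and unsigned 32-bit offsets; the function names and the `offset_bits` parameter are hypothetical.

```python
MASK64 = (1 << 64) - 1


def decompose(lane_addresses, offset_bits=32):
    """Split 64-bit per-lane addresses into one base plus per-lane offsets.

    Assumptions for this sketch: the base is the smallest lane address,
    and each offset must fit in `offset_bits` unsigned bits.
    """
    base = min(lane_addresses)
    offsets = [addr - base for addr in lane_addresses]
    if any(off >> offset_bits for off in offsets):
        raise ValueError("a lane offset does not fit in the offset field")
    return base, offsets


def reconstruct(base, offsets):
    """Rebuild the per-lane addresses on the memory-access side."""
    return [(base + off) & MASK64 for off in offsets]
```

The appeal of such a scheme is message size: under these assumptions, N lanes need one 64-bit base plus N 32-bit offsets instead of N full 64-bit addresses.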

9.

Invention grant
System call management in a user-mode, multi-threaded, self-scheduling processor (status: granted, in force)

Publication No.: US12106142B2

Publication date: 2024-10-01

Application No.: US17337788

Filing date: 2021-06-03

Applicant: Micron Technology, Inc.

Inventors: Tony M. Brewer

IPC classification: G06F9/48, G06F9/30, G06F9/38, G06F9/54, G06F17/14

CPC classification: G06F9/4881, G06F9/30036, G06F9/30043, G06F9/30098, G06F9/30192, G06F9/3806, G06F9/542, G06F17/142, G06F2209/5011

Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.

10.

Invention publication
SYSTEMS, METHODS, AND APPARATUS FOR MATRIX MOVE (status: under examination, published)

Publication No.: US20240320001A1

Publication date: 2024-09-26

Application No.: US18663228

Filing date: 2024-05-14

Applicant: Intel Corporation

Inventors: Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Jesus CORBAL, Dan BAUM, Alexander HEINECKE, Elmoustapha OULD-AHMED-VALL

IPC classification: G06F9/30, G06F7/485, G06F7/487, G06F7/76, G06F9/38, G06F17/16

CPC classification: G06F9/30036, G06F7/485, G06F7/4876, G06F7/762, G06F9/3001, G06F9/30032, G06F9/30043, G06F9/30109, G06F9/30112, G06F9/30134, G06F9/30145, G06F9/30149, G06F9/3016, G06F9/30185, G06F9/30196, G06F9/3818, G06F9/3836, G06F17/16, G06F2212/454

Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to a corresponding data element position of the identified destination matrix operand is described.
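The decode-then-execute flow for a matrix move can be modelled with a small Python sketch. The 24-bit instruction layout, the opcode value `0x4D`, and the dictionary of matrices standing in for matrix registers are all invented for this illustration; none of them come from the patent or from any real instruction set.

```python
OPCODE_MATRIX_MOVE = 0x4D  # hypothetical opcode value for this sketch


def decode(word):
    """Decode a made-up 24-bit encoding: opcode[23:16], dst[15:8], src[7:0]."""
    return {
        "opcode": (word >> 16) & 0xFF,
        "dst": (word >> 8) & 0xFF,
        "src": word & 0xFF,
    }


def execute(fields, matrices):
    """Move each element of the source matrix to the same position in the destination.

    `matrices` maps an operand identifier to a list of rows and stands in
    for the matrix register storage. Both operands are assumed to have
    the same shape.
    """
    if fields["opcode"] != OPCODE_MATRIX_MOVE:
        raise ValueError("unsupported opcode")
    src = matrices[fields["src"]]
    dst = matrices[fields["dst"]]
    for r, row in enumerate(src):
        for c, value in enumerate(row):
            dst[r][c] = value
```

Copying element by element, rather than rebinding the destination to the source list, keeps the two operands independent after the move, which mirrors the per-element wording of the abstract.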
