-
Publication No.: US20240362024A1
Publication Date: 2024-10-31
Application No.: US18770560
Filing Date: 2024-07-11
IPC Classification: G06F9/30
CPC Classification: G06F9/30181, G06F9/30043
Abstract: Schedule instructions of a program for execution on a coarse-grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse-grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute-force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
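The way a memory map can shrink a brute-force placement search can be sketched roughly as follows. This is an illustrative Python sketch only, not the patented scheduler: the function names `candidate_tiles` and `enumerate_assignments`, the data layout, and the rule that an instruction may only be placed on a tile holding one of its memory variables are all assumptions made for the example.

```python
from itertools import product
from typing import Dict, Iterable, List


def candidate_tiles(instr_vars: List[str],
                    memory_map: Dict[str, int],
                    all_tiles: Iterable[int]) -> List[int]:
    """Tiles an instruction may run on (hypothetical rule).

    Assumption: an instruction is only placed on a tile that holds at
    least one of its memory variables; if none of its variables are
    mapped, every tile remains a candidate.
    """
    tiles = {memory_map[v] for v in instr_vars if v in memory_map}
    return sorted(tiles) if tiles else list(all_tiles)


def enumerate_assignments(instructions: Dict[str, List[str]],
                          memory_map: Dict[str, int],
                          all_tiles: Iterable[int]) -> List[Dict[str, int]]:
    """Brute-force enumeration over the reduced candidate sets."""
    tiles = list(all_tiles)
    names = list(instructions)
    choices = [candidate_tiles(instructions[n], memory_map, tiles)
               for n in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


# Hypothetical program: two instructions, two memory variables, four tiles.
instructions = {"add": ["x", "y"], "mul": ["y"]}
memory_map = {"x": 0, "y": 1}
assignments = enumerate_assignments(instructions, memory_map, range(4))
```

Without the memory map, two instructions on four tiles give 4 × 4 = 16 placements to examine; under the assumed rule, only the placements compatible with the map remain.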
-
Publication No.: US12131157B2
Publication Date: 2024-10-29
Application No.: US17984336
Filing Date: 2022-11-10
Inventors: Toshio Nagata, Yuan Li, Jianbin Zhu, Ryan Braidwood
CPC Classification: G06F9/30145, G06F9/30036, G06F9/30043, G06F9/321
Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may include a sequencer configured to: decode instructions that include scalar instructions and vector instructions, execute decoded scalar instructions, and package decoded vector instructions as configurations. The processor may further include a plurality of columns of vector processing units coupled to the sequencer. The plurality of columns of vector processing units may include a plurality of processing elements (PEs) and each of the PEs may include a plurality of Arithmetic Logic Units (ALUs). The sequencer may be configured to send the configurations to the plurality of columns of vector processing units.
-
Publication No.: US12130744B2
Publication Date: 2024-10-29
Application No.: US17672116
Filing Date: 2022-02-15
Inventors: Yosef Kreinin, Yosi Arbeli, Gil Israel Dogon
IPC Classification: G06F9/30, G06F7/00, G06F9/345, G06F9/38, G06F9/52, G06F11/10, G06F12/084, G06F12/0842, G06F12/0875, G06F15/78, G06F15/80, G06T1/20, G06F12/0811
CPC Classification: G06F12/0875, G06F7/00, G06F9/3001, G06F9/30036, G06F9/30043, G06F9/3012, G06F9/30123, G06F9/3017, G06F9/30181, G06F9/345, G06F9/3824, G06F9/3826, G06F9/3834, G06F9/3851, G06F9/3865, G06F9/3891, G06F9/526, G06F11/1008, G06F12/084, G06F12/0842, G06F15/7867, G06F15/80, G06T1/20, G06F12/0811, G06F2212/452, G06F2212/62
Abstract: A multi-core processor configured to improve processing performance in certain computing contexts is provided. The multi-core processor includes multiple processing cores that implement barrel threading to execute multiple instruction threads in parallel while ensuring that the effects of an idle instruction or thread upon the performance of the processor are minimized. The multiple cores can also share a common data cache, thereby minimizing the need for expensive and complex mechanisms to mitigate inter-cache coherency issues. The barrel threading can minimize the latency impacts associated with a shared data cache. In some examples, the multi-core processor can also include a serial processor configured to execute single-threaded programming code that may not yield satisfactory performance in a processing environment that employs barrel threading.
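For readers unfamiliar with barrel threading, the general idea (issue one instruction per cycle from each thread in rotation) can be sketched as below. This is a generic Python illustration under stated assumptions, not the processor described in the patent: the function name `barrel_schedule`, the representation of threads as instruction lists, and the policy of dropping a thread from the rotation once it has nothing left to issue are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def barrel_schedule(threads: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return the issue order for a simple round-robin barrel scheduler.

    Each cycle issues exactly one instruction from the thread at the
    head of the rotation. Assumption: a thread with no instructions
    left is removed, so it never consumes an issue slot.
    """
    rotation = deque((name, deque(instrs))
                     for name, instrs in threads.items() if instrs)
    issued: List[Tuple[str, str]] = []
    while rotation:
        name, instrs = rotation.popleft()
        issued.append((name, instrs.popleft()))
        if instrs:  # thread still has work: rejoin the back of the rotation
            rotation.append((name, instrs))
    return issued


# Hypothetical workload: three threads of unequal length.
order = barrel_schedule({"t0": ["a", "b"], "t1": ["c"], "t2": ["d", "e"]})
# order == [("t0", "a"), ("t1", "c"), ("t2", "d"), ("t0", "b"), ("t2", "e")]
```

Because consecutive issue slots belong to different threads, the latency of one thread's memory access can overlap with useful work from the others, which is the effect the abstract attributes to barrel threading with a shared data cache.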
-
Publication No.: US20240345990A1
Publication Date: 2024-10-17
Application No.: US18626775
Filing Date: 2024-04-04
Applicant: Intel Corporation
Inventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
IPC Classification: G06F15/78, G06F7/544, G06F7/575, G06F7/58, G06F9/30, G06F9/38, G06F9/50, G06F12/02, G06F12/06, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/80, G06F17/16, G06F17/18, G06N3/08, G06T1/20, G06T1/60, G06T15/06, H03M7/46
CPC Classification: G06F15/7839, G06F7/5443, G06F7/575, G06F7/588, G06F9/3001, G06F9/30014, G06F9/30036, G06F9/3004, G06F9/30043, G06F9/30047, G06F9/30065, G06F9/30079, G06F9/3887, G06F9/5011, G06F9/5077, G06F12/0215, G06F12/0238, G06F12/0246, G06F12/0607, G06F12/0802, G06F12/0804, G06F12/0811, G06F12/0862, G06F12/0866, G06F12/0871, G06F12/0875, G06F12/0882, G06F12/0888, G06F12/0891, G06F12/0893, G06F12/0895, G06F12/0897, G06F12/1009, G06F12/128, G06F15/8046, G06F17/16, G06F17/18, G06T1/20, G06T1/60, H03M7/46, G06F9/3802, G06F9/3818, G06F9/3867, G06F2212/1008, G06F2212/1021, G06F2212/1044, G06F2212/302, G06F2212/401, G06F2212/455, G06F2212/60, G06N3/08, G06T15/06
Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
-
Publication No.: US20240345869A1
Publication Date: 2024-10-17
Application No.: US18541670
Filing Date: 2023-12-15
CPC Classification: G06F9/4812, G06F9/30043, G06F9/3013, G06F13/24, G06F21/75
Abstract: Systems and methods for stalling a host processor. In some embodiments, the host processor may be caused to initiate one or more selected transactions, wherein the one or more selected transactions comprise a bus transaction. The host processor may be prevented from completing the one or more selected transactions, to thereby stall the host processor.
-
Publication No.: US12118355B2
Publication Date: 2024-10-15
Application No.: US17506122
Filing Date: 2021-10-20
Inventors: Shakti Kapoor, Manoj Dusanapudi, Nelson Wu
IPC Classification: G06F9/30, G06F9/38, G06F12/0811
CPC Classification: G06F9/30043, G06F9/30047, G06F9/3834, G06F9/3836, G06F9/3861, G06F12/0811
Abstract: Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transfer the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.
-
Publication No.: US12111789B2
Publication Date: 2024-10-08
Application No.: US16855879
Filing Date: 2020-04-22
Inventors: Dmitri Yudanov
CPC Classification: G06F15/8092, G06F9/30043, G06F9/3877, G06F9/5083, G06N3/063, G06T1/20, G06T1/60
Abstract: The present disclosure is directed to a distributed graphics processor unit (GPU) architecture that includes an array of processing nodes. Each processing node may include a GPU node that is coupled to its own fast memory unit and its own storage unit. The fast memory unit and storage unit may be integrated into a single unit or may be separately coupled to the GPU node. The processing node may have its fast memory unit coupled to both the GPU node and the storage node. The various architectures provide a GPU-based system that may be treated as a storage unit, such as a solid-state drive (SSD) that performs onboard processing to perform memory-oriented operations. In this respect, the system may be viewed as a “smart drive” for big-data near-storage processing.
-
Publication No.: US20240330001A1
Publication Date: 2024-10-03
Application No.: US18620217
Filing Date: 2024-03-28
Applicant: Intel Corporation
Inventors: John Wiegert, Joydeep Ray, Timothy Bauer, James Valerio
CPC Classification: G06F9/3887, G06F9/355, G06F15/7839, G06F9/30036, G06F9/30043
Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
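The decompose-and-reconstruct round trip described in this abstract can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the hardware scheme itself: the abstract does not say how the base is chosen or how wide the offsets are, so taking the lowest lane address as the base and using unsigned offsets are assumptions, and the names `decompose` and `reconstruct` are hypothetical.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1  # keep values within 64 bits


def decompose(addresses: List[int]) -> Tuple[int, List[int]]:
    """Split per-lane 64-bit addresses into one base plus per-lane offsets.

    Assumption: the base is the lowest address among the lanes, so all
    offsets are non-negative. The real offset width is not specified.
    """
    base = min(addresses)
    offsets = [addr - base for addr in addresses]
    return base, offsets


def reconstruct(base: int, offsets: List[int]) -> List[int]:
    """Recombine the base and offsets into full per-lane addresses."""
    return [(base + off) & MASK64 for off in offsets]


# Hypothetical 4-lane access: lanes touch nearby elements of one buffer.
lanes = [0x0000_7FFF_0000_1010, 0x0000_7FFF_0000_1000,
         0x0000_7FFF_0000_1008, 0x0000_7FFF_0000_1018]
base, offsets = decompose(lanes)
```

When the lanes touch nearby memory, each offset fits in far fewer bits than a full 64-bit address, which is presumably what makes sending one base plus small offsets cheaper than sending every address in full.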
-
Publication No.: US12106142B2
Publication Date: 2024-10-01
Application No.: US17337788
Filing Date: 2021-06-03
Inventors: Tony M. Brewer
CPC Classification: G06F9/4881, G06F9/30036, G06F9/30043, G06F9/30098, G06F9/30192, G06F9/3806, G06F9/542, G06F17/142, G06F2209/5011
Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
-
Publication No.: US20240320001A1
Publication Date: 2024-09-26
Application No.: US18663228
Filing Date: 2024-05-14
Applicant: Intel Corporation
Inventors: Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Jesus CORBAL, Dan BAUM, Alexander HEINECKE, Elmoustapha OULD-AHMED-VALL
CPC Classification: G06F9/30036, G06F7/485, G06F7/4876, G06F7/762, G06F9/3001, G06F9/30032, G06F9/30043, G06F9/30109, G06F9/30112, G06F9/30134, G06F9/30145, G06F9/30149, G06F9/3016, G06F9/30185, G06F9/30196, G06F9/3818, G06F9/3836, G06F17/16, G06F2212/454
Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to a corresponding data element position of the identified destination matrix operand is described.