-
Publication number: US11782725B2
Publication date: 2023-10-10
Application number: US17402997
Application date: 2021-08-16
Applicant: Micron Technology, Inc.
Inventor: Bryan Hornung , Skyler Arron Windh
CPC classification number: G06F9/3887 , G06F9/30018 , G06F9/30065 , G06F9/30087 , G06F15/7867 , G06F15/8007 , G06F15/825
Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.
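As a rough illustration of the mask-field idea in this abstract, the sketch below passes a data packet through a chain of tiles, and each tile applies its operation only when its bit in the packet's mask is set. The names Packet and run_tiles, the one-bit-per-tile encoding, and the example operations are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Packet:
    """Hypothetical data word propagated through the synchronous fabric."""
    value: int
    mask: int  # assumed encoding: bit i set => tile i is enabled


def run_tiles(packet: Packet, tile_ops: List[Callable[[int], int]]) -> int:
    """Pass the packet through each tile in order.

    A tile applies its operation only when its mask bit is set;
    otherwise the value is forwarded unchanged.
    """
    value = packet.value
    for i, op in enumerate(tile_ops):
        if (packet.mask >> i) & 1:
            value = op(value)
    return value


# Three illustrative tiles: add 1, double, subtract 3.
ops = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]

# Mask 0b101 enables tiles 0 and 2 and disables tile 1.
result = run_tiles(Packet(value=5, mask=0b101), ops)
```

With the mask 0b101, the value 5 is incremented by tile 0, skipped by tile 1, and reduced by tile 2.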
-
Publication number: US11748298B2
Publication date: 2023-09-05
Application number: US17826674
Application date: 2022-05-27
Applicant: Intel Corporation
Inventor: Altug Koker , Farshad Akhbari , Feng Chen , Dukhwan Kim , Narayan Srinivasa , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu
IPC: G06F15/80 , G06F13/40 , G06T1/20 , G06F9/30 , G06F13/00 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/048
CPC classification number: G06F15/8007 , G06F9/3004 , G06F13/00 , G06F13/4027 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/063 , G06N3/084 , G06T1/20
Abstract: An integrated circuit (IC) package apparatus is disclosed. The IC package includes one or more processing units and a bridge, mounted below the one or more processing units, including one or more arithmetic logic units (ALUs) to perform atomic operations.
-
Publication number: US20230245374A1
Publication date: 2023-08-03
Application number: US18133088
Application date: 2023-04-11
Applicant: Imagination Technologies Limited
Inventor: Luke T. Peterson , James A. McCombe , Steven J. Clohset , Jason R. Redgrave
CPC classification number: G06T15/005 , G06F9/5033 , G06F15/8007 , G06F9/52 , G06F9/505 , G06T1/20 , G06T1/60 , G06T15/06 , G06T2200/28
Abstract: In some aspects, systems and methods provide for forming groupings of a plurality of independently-specified computation workloads, such as graphics processing workloads, and in a specific example, ray tracing workloads. The workloads include a scheduling key, which is one basis on which the groupings can be formed. Workloads grouped together can all execute from the same source of instructions, on one or more different private data elements. Such workloads can recursively instantiate other workloads that reference the same private data elements. In some examples, the scheduling key can be used to identify a data element to be used by all the workloads of a grouping. Memory conflicts to private data elements are handled through scheduling of non-conflicted workloads or specific instructions and/or deferring conflicted workloads instead of locking memory locations.
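The grouping and deferral described in this abstract can be sketched loosely as follows. Workloads are modeled as (scheduling_key, private_data_id) pairs, grouped by key, and within a group any workload whose private data element is already claimed is deferred rather than locked. The function names and the tuple representation are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical workload representation: (scheduling_key, private_data_id).
Workload = Tuple[Hashable, int]


def group_by_key(workloads: Iterable[Workload]) -> Dict[Hashable, List[int]]:
    """Collect the private data ids of all workloads sharing a scheduling key."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, data_id in workloads:
        groups[key].append(data_id)
    return dict(groups)


def split_conflicts(data_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Split one group into workloads that can run now and workloads to defer.

    A workload is deferred when an earlier workload in the same group
    already references the same private data element.
    """
    seen = set()
    run_now: List[int] = []
    deferred: List[int] = []
    for data_id in data_ids:
        if data_id in seen:
            deferred.append(data_id)
        else:
            seen.add(data_id)
            run_now.append(data_id)
    return run_now, deferred


groups = group_by_key([("shaderA", 1), ("shaderB", 2), ("shaderA", 3), ("shaderA", 1)])
run_now, deferred = split_conflicts(groups["shaderA"])
```

Here the second "shaderA" workload on data element 1 is deferred to a later pass instead of waiting on a lock.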
-
Publication number: US20230214351A1
Publication date: 2023-07-06
Application number: US17566848
Application date: 2021-12-31
Applicant: CEREMORPHIC, INC.
Inventor: Heonchul PARK
CPC classification number: G06F15/8007 , G06F9/382
Abstract: An exemplary SIMD computing system comprises a SIMD processing element (SPE) configured to perform a selected operation on a portion of a processor input data word, with the operation selected by control signals read from a control memory location addressed by a decoded instruction. The SPE may comprise one or more adders, multipliers, or multiplexers coupled to the control signals. The control signals may comprise one or more bits read from the control memory. The control memory may be an M×N (M rows by N columns) memory having M possible SIMD operations and N control signals. Each instruction decoded may select an SPE operation from among the M rows. A plurality of SPEs may receive the same control signals. The control memory may be rewritable, advantageously permitting customizable SIMD operations that are reconfigurable by storing in the control memory locations control signals designed to cause the SPE to perform selected operations.
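A minimal sketch of the control-memory scheme in this abstract, under stated assumptions: each row of a small rewritable table holds the control signals for one operation, a decoded opcode selects a row, and every lane (SPE) receives the same control signals while working on its own portion of the input word. The two control signals used here (use_multiplier, negate_b), the 8-bit lane width, and all function names are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical control memory: one row per operation, N = 2 control signals per row.
# Signals are (use_multiplier, negate_b); the table is rewritable at run time.
CONTROL_MEMORY: List[Tuple[int, int]] = [
    (0, 0),  # row 0: a + b
    (0, 1),  # row 1: a - b
    (1, 0),  # row 2: a * b
]


def spe(a: int, b: int, controls: Tuple[int, int], width: int = 8) -> int:
    """One SIMD processing element: an adder/multiplier steered by control bits."""
    use_multiplier, negate_b = controls
    if negate_b:
        b = -b
    out = a * b if use_multiplier else a + b
    return out & ((1 << width) - 1)  # wrap to the lane width


def simd_execute(opcode: int, word_a: int, word_b: int,
                 lanes: int = 4, width: int = 8) -> int:
    """Apply the operation selected by `opcode` to every lane of the input words."""
    controls = CONTROL_MEMORY[opcode]  # the same control signals go to all SPEs
    lane_mask = (1 << width) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * width
        a = (word_a >> shift) & lane_mask
        b = (word_b >> shift) & lane_mask
        result |= spe(a, b, controls, width) << shift
    return result


added = simd_execute(0, 0x01020304, 0x10101010)
```

Rewriting a row, for example CONTROL_MEMORY[0] = (1, 0), changes what opcode 0 does without touching the SPE itself, which is the reconfigurability the abstract points to.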
-
Publication number: US20190250921A1
Publication date: 2019-08-15
Application number: US16398183
Application date: 2019-04-29
Applicant: Intel Corporation
Inventor: Andrew T. FORSYTH , Brian J. HICKMANN , Jonathan C. HALL , Christopher J. HUGHES
IPC: G06F9/38 , G06F9/30 , G06F12/1027 , G06F12/0875 , G06F13/42 , G06F15/80
CPC classification number: G06F9/3853 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30043 , G06F9/30098 , G06F9/30105 , G06F9/30145 , G06F9/30163 , G06F9/3804 , G06F9/3824 , G06F9/3836 , G06F9/3887 , G06F12/0875 , G06F12/1027 , G06F13/4282 , G06F15/8007 , G06F2212/1016 , G06F2212/452 , G06F2212/68
Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to contiguously read a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
-
Publication number: US20190205783A1
Publication date: 2019-07-04
Application number: US16234112
Application date: 2018-12-27
Applicant: University of Maryland, College Park , IonQ, Inc.
Inventor: Yunseong NAM , Dmitri MASLOV
CPC classification number: G06N10/00 , G06F15/8007 , G06F17/14
Abstract: The disclosure describes various aspects of techniques for using global interactions in efficient quantum circuit constructions. More specifically, this disclosure describes ways to use a global entangling operator to efficiently implement circuitry common to a selection of important quantum algorithms. The circuits may be constructed with global Ising entangling gates (e.g., global Mølmer-Sørensen gates or GMS gates) and arbitrary addressable single-qubit gates. Examples of the types of circuits that can be implemented include stabilizer circuits, Toffoli-4 gates, Toffoli-n gates, quantum Fourier transform (QFT) circuits, and quantum Fourier adder (QFA) circuits. In certain instances, the use of global operations can substantially improve the entangling gate count.
-
Publication number: US20180232239A1
Publication date: 2018-08-16
Application number: US15890548
Application date: 2018-02-07
Applicant: International Business Machines Corporation
Inventor: Gheorghe Almasi , Jose Moreira , Jessica H. Tseng , Peng Wu
CPC classification number: G06F9/3887 , G06F8/41 , G06F8/443 , G06F8/452 , G06F8/453 , G06F8/458 , G06F9/30058 , G06F9/30065 , G06F9/325 , G06F9/3851 , G06F9/3885 , G06F15/80 , G06F15/8007 , G06F15/8023
Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on a SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
-
Publication number: US20180225255A1
Publication date: 2018-08-09
Application number: US15870632
Application date: 2018-01-12
Applicant: INTEL CORPORATION
Inventor: Chang Yong Kang , Pierre Laurent , Hari K. Tadepalli , Prasad M. Ghatigar , T.J. O'Dwyer , Serge Zhilyaev
CPC classification number: G06F15/8061 , G06F9/30036 , G06F9/3814 , G06F9/3834 , G06F9/3836 , G06F9/3838 , G06F9/3853 , G06F9/3867 , G06F9/3877 , G06F13/16 , G06F13/4059 , G06F15/8007 , G06F15/8084
Abstract: Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.
-
Publication number: US10013258B2
Publication date: 2018-07-03
Application number: US14500171
Application date: 2014-09-29
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
CPC classification number: G06F9/30007 , G06F9/3001 , G06F9/30032 , G06F9/30134 , G06F9/345 , G06F9/355 , G06F9/3552 , G06F9/3555 , G06F9/3557 , G06F12/0223 , G06F15/8007
Abstract: Embodiments are directed to a method of adjusting an index, wherein the index identifies a location of an element within an array. The method includes executing, by a computer, a single instruction that adjusts a first parameter of the index to match a parameter of an array address. The single instruction further adjusts a second parameter of the index to match a parameter of the array element. The adjustment of the first parameter includes a sign extension.
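One possible reading of this abstract, sketched under assumptions: the "first parameter" is taken to be the bit width of the index, sign-extended to the address width, and the "second parameter" is taken to be a scale matching the element size. The abstract does not name these parameters, so the function name, the 32-bit and 64-bit widths, and the 8-byte element size are all illustrative choices.

```python
def adjust_index(index: int, index_bits: int = 32,
                 addr_bits: int = 64, elem_size: int = 8) -> int:
    """Turn a narrow signed element index into an address-width byte offset.

    Step 1 (assumed "first parameter"): sign-extend the index from
    index_bits to the address width.
    Step 2 (assumed "second parameter"): scale by the element size.
    The result is returned as an unsigned addr_bits-wide value.
    """
    index &= (1 << index_bits) - 1            # keep only the raw index bits
    sign_bit = 1 << (index_bits - 1)
    extended = (index ^ sign_bit) - sign_bit  # two's-complement sign extension
    return (extended * elem_size) & ((1 << addr_bits) - 1)


offset_pos = adjust_index(3)           # element 3 of an 8-byte array
offset_neg = adjust_index(0xFFFFFFFF)  # 32-bit encoding of index -1
```

In hardware the point of the patent is that both steps happen in a single instruction; the Python function only mirrors the arithmetic, not the instruction encoding.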
-
Publication number: US09996345B2
Publication date: 2018-06-12
Application number: US15385544
Application date: 2016-12-20
Applicant: Imagination Technologies Limited
Inventor: Kristie Veith , Leonard Rarick , Manouk Manoukian
CPC classification number: G06F9/3001 , G06F9/30145 , G06F9/3836 , G06F9/3867 , G06F9/3873 , G06F9/3887 , G06F15/8007
Abstract: In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state, before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, a total throughput of execution on the pipeline increases.