-
Publication No.: US20230060900A1
Publication Date: 2023-03-02
Application No.: US17959872
Filing Date: 2022-10-04
Applicant: INTEL CORPORATION
Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
IPC Classification: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing results of the one or more reduction operations in the output register.
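The index-keyed reduction described in this abstract can be illustrated in software. The following is a minimal sketch, not the claimed hardware behavior: the function name `indexed_reduce`, the choice of addition as the reduction, and the convention of writing each group's result to the position of its first member (with all other positions set to an identity value) are assumptions made for illustration. The abstract does not state where in the output register each result is placed.

```python
def indexed_reduce(values, indices, op=lambda a, b: a + b, identity=0):
    """Reduce elements of `values` that share the same entry in `indices`.

    Assumption (not from the abstract): each group's result is stored at the
    position of the group's first element; other positions hold `identity`.
    """
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same length")

    out = [identity] * len(values)
    first_pos = {}  # index value -> position of the first element in that group

    for pos, (value, idx) in enumerate(zip(values, indices)):
        if idx in first_pos:
            # Element belongs to an existing group: fold it into the result.
            target = first_pos[idx]
            out[target] = op(out[target], value)
        else:
            # First element seen for this index: start a new group here.
            first_pos[idx] = pos
            out[pos] = value
    return out


values = [1, 2, 3, 4, 5, 6, 7, 8]
indices = [0, 1, 0, 2, 1, 0, 3, 2]
result = indexed_reduce(values, indices)
```

With these inputs, the elements tagged with index 0 (1, 3, and 6) are summed into position 0, and the other groups are handled the same way.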
-
Publication No.: US20200210188A1
Publication Date: 2020-07-02
Application No.: US16233546
Filing Date: 2018-12-27
Applicant: Intel Corporation
Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to systems and methods for performing matrix row-wise and column-wise permute instructions. In one example, a processor includes fetch circuitry to fetch an instruction; decode circuitry to decode the fetched instruction, which has fields to specify an opcode and locations of a source matrix and a destination matrix, the opcode indicating that the processor is to perform a permutation by copying, into each of a plurality of equal-sized logical partitions of the destination matrix, a selected logical partition of a same size from the source matrix, the selection being indicated by a permute control; and execution circuitry to execute the decoded instruction as per the opcode.
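As a rough software analogy for the permutation described in this abstract, the sketch below treats each row (or each column) of a matrix as one logical partition. The function name `permute_partitions` and the encoding of the permute control as a list, where `control[i]` names the source partition copied into destination partition `i`, are assumptions for illustration; the abstract does not specify how the control is encoded.

```python
def permute_partitions(src, control, axis="row"):
    """Build a destination matrix whose i-th row (or column) is a copy of
    row (or column) control[i] of `src`.

    Assumption (not from the abstract): partitions are whole rows or whole
    columns, and `control` is a plain list of source partition numbers.
    """
    if axis not in ("row", "column"):
        raise ValueError("axis must be 'row' or 'column'")

    n_rows, n_cols = len(src), len(src[0])
    n_parts = n_rows if axis == "row" else n_cols
    if len(control) != n_parts or any(not 0 <= s < n_parts for s in control):
        raise ValueError("control must hold one valid partition number per partition")

    if axis == "row":
        # Destination row i is a copy of source row control[i].
        return [list(src[s]) for s in control]
    # Destination column i is a copy of source column control[i].
    return [[row[s] for s in control] for row in src]


src = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
by_row = permute_partitions(src, [2, 0, 0], axis="row")
by_col = permute_partitions(src, [1, 1, 0], axis="column")
```

Note that a source partition may be selected more than once, so the operation is a general selection rather than a strict one-to-one permutation in this sketch.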
-
Publication No.: US20200210173A1
Publication Date: 2020-07-02
Application No.: US16232599
Filing Date: 2018-12-26
Applicant: Intel Corporation
Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
IPC Classification: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction and decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
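The per-nibble behavior described in this abstract can be sketched in a few lines. This is an illustrative model only: the names `nibble_op` and `nibble_matrix_op`, the 8-bit element width, and the wrap-around (modulo 16) treatment of results that overflow a nibble are assumptions, since the abstract does not fix an element width or an overflow policy.

```python
import operator


def nibble_op(a, b, op, bits=8):
    """Apply `op` independently to each 4-bit partition of two elements.

    Assumption (not from the abstract): results wider than 4 bits wrap
    around, so nothing carries from one nibble into the next.
    """
    result = 0
    for shift in range(0, bits, 4):
        na = (a >> shift) & 0xF          # nibble of the first source element
        nb = (b >> shift) & 0xF          # matching nibble of the second source
        result |= (op(na, nb) & 0xF) << shift  # store into the same nibble slot
    return result


def nibble_matrix_op(first, second, op, bits=8):
    """Apply `nibble_op` to every pair of corresponding matrix elements."""
    if len(first) != len(second) or any(
        len(ra) != len(rb) for ra, rb in zip(first, second)
    ):
        raise ValueError("source matrices must have the same shape")
    return [
        [nibble_op(x, y, op, bits) for x, y in zip(ra, rb)]
        for ra, rb in zip(first, second)
    ]


dest = nibble_matrix_op([[0x1F, 0xA5]], [[0x21, 0x5B]], operator.add)
```

For the first element pair, an ordinary byte addition of 0x1F and 0x21 would give 0x40, whereas the nibble-wise version gives 0x30 because the low nibble wraps without carrying into the high nibble.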
-
Publication No.: US20220229661A1
Publication Date: 2022-07-21
Application No.: US17712966
Filing Date: 2022-04-04
Applicant: INTEL CORPORATION
Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
IPC Classification: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing results of the one or more reduction operations in the output register.
-
Publication No.: US20200089494A1
Publication Date: 2020-03-19
Application No.: US16579394
Filing Date: 2019-09-23
Applicant: Intel Corporation
Inventors: Asit K. MISHRA, Edward T. GROCHOWSKI, Jonathan D. PEARCE, Deborah T. MARR, Ehud COHEN, Elmoustapha OULD-AHMED-VALL, Jesus Corbal SAN ADRIAN, Robert VALENTINE, Mark J. CHARNEY, Christopher J. HUGHES, Milind B. GIRKAR
IPC Classification: G06F9/30
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. An execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
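The "equals any element of the other operand" comparison in this abstract can be modeled as follows. This is a sketch under assumptions: the function name `any_equal_masks`, the representation of packed operands as Python lists, and the choice to return one mask for each source operand are illustrative and not taken from the patent text.

```python
def any_equal_masks(first, second):
    """Return (mask_first, mask_second) for two packed operands.

    mask_first[i] is 1 when first[i] equals any element of `second`, and
    mask_second[j] is 1 when second[j] equals any element of `first`.
    Each mask element sits in the same relative position as the data
    element it describes.
    """
    # All-to-all comparison, written out to mirror the abstract's wording.
    mask_first = [int(any(x == y for y in second)) for x in first]
    mask_second = [int(any(y == x for x in first)) for y in second]
    return mask_first, mask_second


mask_a, mask_b = any_equal_masks([3, 7, 7, 1], [7, 2, 9, 3])
```

Here 3 and both copies of 7 in the first operand have a match in the second operand, while 1 does not; in the second operand only 7 and 3 have matches.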
-
Publication No.: US20230418655A1
Publication Date: 2023-12-28
Application No.: US18207870
Filing Date: 2023-06-09
Applicant: Intel Corporation
Inventors: Rajesh M. SANKARAN, Gilbert NEIGER, Narayan RANGANATHAN, Stephen R. VAN DOREN, Joseph NUZMAN, Niall D. MCDONNELL, Michael A. O'HANLON, Lokpraveen B. MOSUR, Tracy Garrett DRYSDALE, Eriko NURVITADHI, Asit K. MISHRA, Ganesh VENKATESH, Deborah T. MARR, Nicholas P. CARTER, Jonathan D. PEARCE, Edward T. GROCHOWSKI, Richard J. GRECO, Robert VALENTINE, Jesus CORBAL, Thomas D. FLETCHER, Dennis R. BRADFORD, Dwight P. MANLEY, Mark J. CHARNEY, Jeffrey J. COOK, Paul CAPRIOLI, Koichi YAMADA, Kent D. GLOSSOP, David B. SHEFFIELD
CPC Classification: G06F9/48, G06F9/3001, G06F9/383, G06F9/3004, G06F9/30036
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
Publication No.: US20220164218A1
Publication Date: 2022-05-26
Application No.: US17381521
Filing Date: 2021-07-21
Applicant: Intel Corporation
Inventors: Rajesh M. SANKARAN, Gilbert NEIGER, Narayan RANGANATHAN, Stephen R. VAN DOREN, Joseph NUZMAN, Niall D. MCDONNELL, Michael A. O'HANLON, Lokpraveen B. MOSUR, Tracy Garrett DRYSDALE, Eriko NURVITADHI, Asit K. MISHRA, Ganesh VENKATESH, Deborah T. MARR, Nicholas P. CARTER, Jonathan D. PEARCE, Edward T. GROCHOWSKI, Richard J. GRECO, Robert VALENTINE, Jesus CORBAL, Thomas D. FLETCHER, Dennis R. BRADFORD, Dwight P. MANLEY, Mark J. CHARNEY, Jeffrey J. COOK, Paul CAPRIOLI, Koichi YAMADA, Kent D. GLOSSOP, David B. SHEFFIELD
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
Publication No.: US20200310809A1
Publication Date: 2020-10-01
Application No.: US16366155
Filing Date: 2019-03-27
Applicant: Intel Corporation
Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
IPC Classification: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing results of the one or more reduction operations in the output register.