-
Publication No.: US20230060900A1
Publication date: 2023-03-02
Application No.: US17959872
Filing date: 2022-10-04
Applicant: INTEL CORPORATION
Inventor: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. Executing the instruction includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing the results of the one or more reduction operations in the output register.
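One way to read the index-driven reduction is as a segmented reduction keyed by the index register: lanes whose indices are equal are grouped and combined. The Python sketch below is a sequential reference model only (hardware would presumably group lanes in parallel); `indexed_reduce` is a hypothetical name, and the output layout, with the result for index value k written to output lane k and unused lanes left as `None`, is an assumption, since the abstract does not say where each result lands.

```python
from typing import Callable, List, Optional


def indexed_reduce(
    values: List[int],
    indices: List[int],
    op: Callable[[int, int], int] = lambda a, b: a + b,
) -> List[Optional[int]]:
    """Reduce all input lanes that carry the same index value.

    values:  lanes of the first input register
    indices: lanes of the first index register (lane i describes values[i])
    op:      the reduction operation (addition by default)

    Assumption: the result for index value k is stored in output lane k;
    lanes whose index value never occurs are left as None.
    """
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same lane count")
    out: List[Optional[int]] = [None] * len(values)
    for value, idx in zip(values, indices):
        if not 0 <= idx < len(values):
            raise ValueError(f"index {idx} is outside the register width")
        # First lane seen for this index initializes the group; later lanes fold in.
        out[idx] = value if out[idx] is None else op(out[idx], value)
    return out
```

For example, `indexed_reduce([1, 2, 3, 4], [0, 1, 0, 1])` returns `[4, 6, None, None]`, and passing `op=max` turns the same grouping into a per-index maximum. The same reading applies to US20220229661A1 and US20200310809A1, which carry an identical abstract.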
-
Publication No.: US20220229661A1
Publication date: 2022-07-21
Application No.: US17712966
Filing date: 2022-04-04
Applicant: INTEL CORPORATION
Inventor: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. Executing the instruction includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing the results of the one or more reduction operations in the output register.
-
Publication No.: US20220342747A1
Publication date: 2022-10-27
Application No.: US17845628
Filing date: 2022-06-21
Applicant: Intel Corporation
Inventor: ElMoustapha OULD-AHMED-VALL
Abstract: An apparatus and method for fault handling of an offload transaction. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, the work comprising a plurality of instructions; the second core comprising first fault management logic to determine an action to take responsive to a fault condition, wherein, responsive to detecting a first type of fault condition, the first fault management logic is to cause the first core to be notified of the fault condition, the first core comprising second fault management logic to attempt to resolve the fault condition.
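The two-level fault handling can be pictured as a small dispatch decision made on the core that received the offloaded work. This is a minimal Python sketch, not Intel's design: `FaultType`, `Core`, `handle_offload_fault`, and the `can_resolve` flag are hypothetical names, and because the abstract only describes what happens for the first fault type, handling every other fault type locally on the second core is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class FaultType(Enum):
    FIRST_TYPE = auto()  # the abstract's "first type of fault condition"
    OTHER = auto()       # hypothetical catch-all; not described in the abstract


@dataclass
class Core:
    name: str
    can_resolve: bool = True  # hypothetical: whether this core's fix-up succeeds
    log: List[str] = field(default_factory=list)

    def resolve(self, fault: FaultType) -> bool:
        # Stand-in for the "second fault management logic" on the first core.
        self.log.append(f"{self.name}: attempting to resolve {fault.name}")
        return self.can_resolve


def handle_offload_fault(first: Core, second: Core, fault: FaultType) -> str:
    """Stand-in for the "first fault management logic" on the second core."""
    if fault is FaultType.FIRST_TYPE:
        # First fault type: notify the core that offloaded the work.
        second.log.append(f"{second.name}: notifying {first.name} of {fault.name}")
        return "resolved by first core" if first.resolve(fault) else "unresolved"
    # Assumption: any other fault type is handled locally on the second core.
    second.log.append(f"{second.name}: handling {fault.name} locally")
    return "handled by second core"
```

The point the sketch makes explicit is that the second core only decides where the fault goes; the actual resolution attempt for the first fault type runs on the first core, with no OS involvement modeled.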
-
Publication No.: US20220129273A1
Publication date: 2022-04-28
Application No.: US17518235
Filing date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: ElMoustapha OULD-AHMED-VALL, Robert VALENTINE, Mark CHARNEY, Jesus CORBAL, Venkateswara MADDURI
Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one exemplary processor comprises three registers and execution circuitry. The execution circuitry is to multiply first and second packed signed doubleword data elements from the first register with third and fourth packed signed doubleword data elements from the second register, respectively, to generate first and second temporary products. It is also to select the first, second, third, and fourth signed doubleword data elements. It is also to combine the first temporary product with a first packed signed quadword value read from the third register to generate a first accumulated result, and to combine the second temporary product with a second packed signed quadword value read from the third register to generate a second accumulated result. The third register is to store the results.
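The multiply-accumulate can be checked against a lane-level reference model. In this Python sketch, `mul_dword_acc_qword` and `to_signed` are hypothetical names, and several details the abstract does not settle are assumptions: the registers are taken to be 128 bits (four doublewords or two quadwords), the doubleword selected in each 64-bit lane is the low one by default (`use_high=True` picks the high one), and accumulation wraps modulo 2^64 rather than saturating.

```python
from typing import List

MASK64 = (1 << 64) - 1


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_dword_acc_qword(
    src1: List[int], src2: List[int], acc: List[int], use_high: bool = False
) -> List[int]:
    """
    src1, src2: four signed doublewords each (lane 0 first).
    acc:        two signed quadwords from the third register (also the destination).
    Assumptions: 128-bit registers, low/high doubleword selection per 64-bit lane,
    and wrap-around (non-saturating) 64-bit accumulation.
    """
    offset = 1 if use_high else 0
    result = []
    for lane in range(2):
        a = to_signed(src1[2 * lane + offset], 32)
        b = to_signed(src2[2 * lane + offset], 32)
        product = a * b                          # exact signed 64-bit temporary product
        total = (acc[lane] + product) & MASK64   # accumulate, wrapping to 64 bits
        result.append(to_signed(total, 64))
    return result
```

For example, `mul_dword_acc_qword([3, 99, -4, 99], [5, 0, 7, 0], [10, -20])` returns `[25, -48]`: lane 0 computes 3 × 5 + 10 and lane 1 computes −4 × 7 − 20, while the odd doublewords (99 and 0) are ignored under the default low-doubleword selection.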
-
公开(公告)号:US20220129268A1
公开(公告)日:2022-04-28
申请号:US17518336
申请日:2021-11-03
Applicant: INTEL CORPORATION
Inventor: Venkateswara MADDURI , ElMoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause the 16 most significant bits of the first and second right-shifted quadwords to be selected and written to the 16 least significant bit positions of the first and second quadword data element locations of a destination register.
-
Publication No.: US20220129267A1
Publication date: 2022-04-28
Application No.: US17518291
Filing date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: Venkateswara MADDURI, ElMoustapha OULD-AHMED-VALL, Robert VALENTINE, Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one processor embodiment comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause selection of 32 most significant bits of the first and second right-shifted quadwords to be written to 32 least significant bit positions of first and second quadword data element locations of a destination register.
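The two right-shift abstracts (US20220129268A1 and US20220129267A1) differ only in how many of the shifted bits are kept, 16 or 32, so one reference model can cover both. This Python sketch assumes the shift count lies in 0 to 63 and that destination bits above the kept field are zeroed; `shift_select_msbs` is a hypothetical name. It relies on Python's `>>` being an arithmetic shift on negative integers, which matches the "shift in the sign bit" behavior the abstracts describe.

```python
from typing import List

MASK64 = (1 << 64) - 1


def shift_select_msbs(src: List[int], shift: int, width: int) -> List[int]:
    """
    src:   packed signed quadwords as Python ints in [-2**63, 2**63).
    shift: count from the immediate or the control register (assumed 0..63).
    width: bits kept, 16 (US20220129268A1) or 32 (US20220129267A1).
    Returns each destination quadword as an unsigned 64-bit pattern.
    Assumption: destination bits above `width` are zeroed.
    """
    if width not in (16, 32):
        raise ValueError("width must be 16 or 32")
    if not 0 <= shift <= 63:
        raise ValueError("shift count assumed to be in 0..63")
    dest = []
    for q in src:
        shifted = q >> shift              # arithmetic shift: the sign bit is shifted in
        pattern = shifted & MASK64        # back to a 64-bit two's-complement pattern
        msbs = pattern >> (64 - width)    # the `width` most significant bits
        dest.append(msbs)                 # written to the low `width` bits of the lane
    return dest
```

For example, `shift_select_msbs([0x123456789ABCDEF0, -(1 << 62)], 4, 16)` returns `[0x0123, 0xFC00]`; with `width=32` it returns `[0x01234567, 0xFC000000]`. In the negative lane the leading ones show the sign bit being replicated into the vacated positions.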
-
Publication No.: US20200310809A1
Publication date: 2020-10-01
Application No.: US16366155
Filing date: 2019-03-27
Applicant: Intel Corporation
Inventor: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. Executing the instruction includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing the results of the one or more reduction operations in the output register.
-
Publication No.: US20200310804A1
Publication date: 2020-10-01
Application No.: US16370922
Filing date: 2019-03-30
Applicant: Intel Corporation
Inventor: Christopher J. HUGHES, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding circuitry and execution circuitry. The decoding circuitry decodes an instruction whose operands specify an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. Execution includes mapping each data element value of the vector to one of consecutive rows of the matrix and, for each data element value of the vector, comparing it with the data element values in the respective row of the matrix to obtain data element match results. Execution further includes storing the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
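The abstract leaves open exactly how per-row match results become per-column outputs. One plausible reading, treated here as an assumption, is that the output for column j reports whether column j matches the whole vector. The Python sketch below computes the per-element matches first and then combines them per column; `vector_matrix_compare` is a hypothetical name, and "consecutive rows" is taken to mean rows 0, 1, 2, and so on, of a rectangular matrix.

```python
from typing import List, Sequence


def vector_matrix_compare(
    vector: Sequence[int], matrix: Sequence[Sequence[int]]
) -> List[bool]:
    """
    vector[i] is mapped to matrix row i and compared with every element of that row.
    Assumption: output[j] is True when, for every mapped row i,
    matrix[i][j] == vector[i], i.e. column j matches the vector.
    """
    if len(matrix) < len(vector):
        raise ValueError("matrix needs at least one row per vector element")
    columns = len(matrix[0]) if matrix else 0
    rows = range(len(vector))
    # Data element match results: match[i][j] = (vector[i] == matrix[i][j]).
    match = [[vector[i] == matrix[i][j] for j in range(columns)] for i in rows]
    # One output per column position: did every mapped row match in that column?
    return [all(match[i][j] for i in rows) for j in range(columns)]
```

For example, `vector_matrix_compare([1, 2, 3], [[1, 0, 1], [2, 2, 9], [3, 3, 3]])` returns `[True, False, False]`: only column 0 holds 1, 2, 3 top to bottom. This makes the instruction usable as a lookup that tests many candidate columns against one key vector at once.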