-
Publication No.: US20180189065A1
Publication date: 2018-07-05
Application No.: US15396345
Filing date: 2016-12-30
Applicant: Intel Corporation
Inventor: KARTHIK RAMAN, ARIEL SLONIM, ADY TAL
CPC classification number: G06F9/30014, G06F9/30065, G06F9/30072, G06F11/30
Abstract: An apparatus and method are described for floating point operation (FLOP) accounting. For example, one embodiment of a processor comprises: an instruction fetch unit to fetch instructions from system memory, the instructions including at least one masked vector floating point instruction to perform operations on a plurality of floating point data elements; a mask register to store a mask value associated with the masked vector floating point instruction; a decoder to decode the masked vector floating point instruction; and floating point operations (FLOP) accounting circuitry to read the mask register to determine a number of floating point operations to be performed during execution of the masked vector floating point instruction.
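The accounting step in this abstract can be illustrated in software: the number of floating point operations for a masked vector instruction follows from the number of set bits in its mask. The sketch below is only a minimal model under stated assumptions, not the claimed circuitry. The function name `count_masked_flops` and the `flops_per_element` parameter (for example, 2 if a fused multiply-add is counted as two operations) are hypothetical and do not appear in the patent.

```python
def count_masked_flops(mask: int, num_elements: int, flops_per_element: int = 1) -> int:
    """Estimate the FLOPs performed by one masked vector instruction.

    mask              -- value of the mask register; bit i enables element i
    num_elements      -- number of floating point elements in the vector
    flops_per_element -- hypothetical weight per active element (assumption)
    """
    if num_elements <= 0:
        raise ValueError("num_elements must be positive")
    if flops_per_element <= 0:
        raise ValueError("flops_per_element must be positive")

    # Ignore mask bits beyond the vector length.
    active_bits = mask & ((1 << num_elements) - 1)

    # Population count: each set bit is one active element.
    active_elements = bin(active_bits).count("1")
    return active_elements * flops_per_element


# Example: an 8-element vector with 5 enabled elements.
flops = count_masked_flops(0b10110101, num_elements=8)
```

Counting from the mask rather than from the vector width avoids over-reporting work for elements that the mask disables.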
-
Publication No.: US20220100509A1
Publication date: 2022-03-31
Application No.: US17493667
Filing date: 2021-10-04
Applicant: Intel Corporation
Inventor: WILLIAM M. BROWN, ROLAND SCHULZ, KARTHIK RAMAN
Abstract: An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions. For example, one embodiment of a processor comprises: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; a first source register associated with the first source operand to store a first plurality of packed data elements; a second source register associated with the second source operand to store a second plurality of packed data elements; execution circuitry to execute the operations of the decoded broadcast instruction, the execution circuitry to copy a first number of contiguous data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, the execution circuitry to further copy a second number of contiguous data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the execution circuitry is to determine the first number and the second number in accordance with the split value associated with the broadcast instruction.
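The copy behavior of the broadcast instruction can be modeled with plain lists. The abstract does not say how the split value maps to the two element counts or which source elements are taken, so the sketch below makes two assumptions: the split value is the number of elements taken from the first source, and the remaining destination lanes are filled from the start of the second source. The name `split_broadcast` is hypothetical.

```python
from typing import List, Sequence


def split_broadcast(src1: Sequence[float], src2: Sequence[float], split: int) -> List[float]:
    """Model a destination register filled from two sources at a split point.

    Assumption: `split` contiguous elements come from the start of src1,
    and the remaining lanes come from the start of src2.
    """
    width = len(src1)
    if len(src2) != width:
        raise ValueError("source registers must have the same number of elements")
    if not 0 <= split <= width:
        raise ValueError("split must be between 0 and the register width")

    first_count = split            # contiguous elements copied from src1
    second_count = width - split   # contiguous elements copied from src2

    # First set of contiguous destination lanes, then the second set.
    return list(src1[:first_count]) + list(src2[:second_count])


# Example: a 4-lane register with one lane from src1 and three from src2.
dest = split_broadcast([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], split=1)
```

Under these assumptions, a split of 0 yields a copy of the second source and a split equal to the register width yields a copy of the first.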
-
Publication No.: US20180088946A1
Publication date: 2018-03-29
Application No.: US15277963
Filing date: 2016-09-27
Applicant: Intel Corporation
IPC: G06F9/30
CPC classification number: G06F9/30018, G06F9/3001, G06F9/30021, G06F9/30032, G06F9/30036, G06F9/30185, G06F9/30192
Abstract: Systems, methods, and apparatuses relating to mixing vector operations are described. In one embodiment, a processor includes a decoder to decode an instruction; and an execution unit to execute the decoded instruction to: receive a first input operand of a first data vector, a second input operand of a second data vector, and a third input operand of a control value vector, perform a first operation on data in a same element position of the first and second data vectors for each same element position of the control value vector having a first control value, perform a second, different operation on data in a same element position of the first and second data vectors for each same element position of the control value vector having a second, different control value, and output results from each first operation and each second operation into each corresponding element position in an output vector.
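The per-element selection described in this abstract can be sketched as a scalar loop over element positions. This is an illustrative model only. The function name `mixed_vector_op` and the choice of addition for control value 0 and subtraction for control value 1 are assumptions, since the abstract does not name the two operations.

```python
import operator
from typing import Callable, Dict, List, Sequence

BinaryOp = Callable[[float, float], float]


def mixed_vector_op(
    a: Sequence[float],
    b: Sequence[float],
    control: Sequence[int],
    ops: Dict[int, BinaryOp],
) -> List[float]:
    """Apply, at each element position, the operation chosen by the control vector."""
    if not (len(a) == len(b) == len(control)):
        raise ValueError("data and control vectors must have the same length")

    result = []
    for x, y, c in zip(a, b, control):
        if c not in ops:
            raise ValueError(f"no operation defined for control value {c}")
        # The control value at this position selects which operation runs.
        result.append(ops[c](x, y))
    return result


# Hypothetical mapping: 0 selects addition, 1 selects subtraction.
example_ops = {0: operator.add, 1: operator.sub}
out = mixed_vector_op([1, 2, 3, 4], [10, 20, 30, 40], [0, 1, 0, 1], example_ops)
```

Each output element lands in the same position as its inputs, so a single pass produces results that would otherwise need two separate vector instructions plus a blend.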
-