Instructions and logic to vectorize conditional loops

    Publication No.: US09696993B2

    Publication date: 2017-07-04

    Application No.: US15344836

    Filing date: 2016-11-07

    Applicant: Intel Corporation

    IPC classification: G06F15/76 G06F9/30 G06F15/80

    Abstract: A processing device to provide vectorization of conditional loops includes vector physical registers to store a source vector having a first plurality of n data fields, and a destination vector comprising a second plurality of data fields corresponding to the first plurality of data fields, wherein each of the second plurality of data fields corresponds to a mask value in a vector conditions mask. The processing device includes a decode stage to decode a first processor instruction specifying a vector expand operation and a data partition size, and execution units to set elements of the source vector to n count values, obtain a decisions vector, generate the vector conditions mask according to the decisions vector, and copy data from consecutive vector elements in the source vector into unmasked vector elements of the destination vector, without copying data from the source vector into masked vector elements of the destination vector.
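
The expand step described in this abstract can be illustrated with a minimal scalar sketch. It is not the patented hardware or any real instruction: the function name `vector_expand`, the convention that a true mask value means "unmasked", the `d > 0` condition, and the sample data are all assumptions made for illustration, and the data partition size is not modeled.

```python
def vector_expand(source, dest, mask):
    """Copy consecutive source elements into the unmasked slots of dest.

    Assumption: mask[i] == True means slot i is unmasked (written);
    masked slots keep whatever dest already held.
    """
    result = list(dest)
    next_src = 0  # index of the next unused source element
    for i, unmasked in enumerate(mask):
        if unmasked:
            result[i] = source[next_src]
            next_src += 1
    return result


n = 8
counts = list(range(n))                    # source vector set to n count values
decisions = [3, -1, 4, -1, -5, 9, 2, -6]   # hypothetical per-iteration data
mask = [d > 0 for d in decisions]          # conditions mask from the decisions vector
dest = [0] * n

expanded = vector_expand(counts, dest, mask)
print(expanded)
```

Each unmasked slot receives the running count that a scalar loop of the form `if cond: out[i] = count; count += 1` would have produced, which is the loop shape the abstract targets.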

Instruction execution that broadcasts and masks data values at different levels of granularity

    Publication No.: US11301581B2

    Publication date: 2022-04-12

    Application No.: US16730844

    Filing date: 2019-12-30

    Applicant: Intel Corporation

    Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and to replicate the second data structure when executing the second instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
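
The replicate-then-mask behavior in this abstract can be sketched as follows. This is a rough model under stated assumptions, not the claimed circuitry: the 256-bit register width, the 64-bit and 32-bit element sizes, the function name `broadcast_masked`, and the choice to let masked elements keep the destination's old value are all hypothetical.

```python
REGISTER_BITS = 256  # hypothetical destination register width


def broadcast_masked(structure, elem_bits, mask, dest):
    """Replicate a packed structure across the register, then mask per element.

    One mask bit covers one element, so halving elem_bits doubles the
    number of mask bits (a granularity twice as fine).
    Assumption: masked-out elements keep the value already in dest.
    """
    copies = REGISTER_BITS // (len(structure) * elem_bits)
    replicated = list(structure) * copies
    if not (len(mask) == len(dest) == len(replicated)):
        raise ValueError("mask, dest and replicated data must have equal length")
    return [r if m else d for r, m, d in zip(replicated, mask, dest)]


# First instruction: two 64-bit values, replicated twice, 4 mask bits.
wide = broadcast_masked([10, 20], 64, [1, 0, 1, 1], [0] * 4)

# Second instruction: two 32-bit values, replicated four times, 8 mask bits.
narrow = broadcast_masked([1, 2], 32, [1, 1, 0, 0, 1, 0, 1, 1], [0] * 8)

print(wide)
print(narrow)
```

For the same register width, the second call needs twice as many mask bits as the first, which mirrors the relationship between the two granularities that the abstract states.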

Packed data element predication processors, methods, systems, and instructions

    Publication No.: US10963257B2

    Publication date: 2021-03-30

    Application No.: US16586977

    Filing date: 2019-09-28

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit is to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, are to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, are to perform the masked version of the given packed data operation.
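
The two processor modes in this abstract can be sketched with a small software model. It is illustrative only: the mode strings, the function name `execute_packed`, the `mask_registers` table, the `mask_id` field standing in for the instruction's mask-specifying bits, and the choice to let masked-out elements keep the destination's old value are all assumptions, not details from the patent.

```python
import operator


def execute_packed(mode, op, a, b, dest, mask_registers=None, mask_id=0):
    """Perform a packed operation, with or without masking, depending on mode.

    Assumption: in "masked" mode, mask_id plays the role of the
    instruction bits that select a mask; masked-out elements keep dest.
    """
    full = [op(x, y) for x, y in zip(a, b)]
    if mode == "unmasked":
        return full
    if mode == "masked":
        mask = mask_registers[mask_id]
        return [f if m else d for f, m, d in zip(full, mask, dest)]
    raise ValueError(f"unknown mode: {mode}")


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
dest = [0, 0, 0, 0]
masks = {1: [1, 0, 0, 1]}  # hypothetical mask register file

unmasked = execute_packed("unmasked", operator.add, a, b, dest)
masked = execute_packed("masked", operator.add, a, b, dest, masks, mask_id=1)

print(unmasked)
print(masked)
```

The same operation and operands yield a full result in the first mode and a selectively written result in the second, which is the distinction the abstract draws between the unmasked and masked instruction forms.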