METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT SHUFFLE
    32.
    Invention application
    Status: Under examination (published)

    Publication No.: US20160188532A1

    Publication date: 2016-06-30

    Application No.: US14583636

    Filing date: 2014-12-27

    Applicant: INTEL CORPORATION

    IPC classification: G06F15/80 G06F9/30

    Abstract: An apparatus and method for performing a vector bit shuffle. For example, one embodiment of a processor comprises: a first vector register to store a plurality of source data elements; a second vector register to store a plurality of control elements, each of the control elements comprising a plurality of bit fields, each bit field to be associated with a corresponding bit position in a destination mask register and to identify a bit from each of the source data elements to be copied to each of the particular bit positions; and vector bit shuffle logic to read each bit field from the second vector register to identify a bit from each of the source data elements and to responsively copy the bit from each of the source data elements to each of the corresponding bit positions in the destination mask register.
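The selection rule in the abstract can be sketched in Python. This is a minimal illustration, not the patented implementation: the function name `vector_bit_shuffle`, the 64-bit element width, the 8-bit field width, and the use of only the low 6 bits of each field as a bit index are all assumptions made for the example; the abstract does not fix any of them.

```python
def vector_bit_shuffle(src, ctrl, elem_bits=64, field_bits=8):
    """Model of a vector bit shuffle into a mask register.

    src  -- list of source data elements (ints), one per vector lane
    ctrl -- list of control elements (ints), one per vector lane
    Widths are assumptions; elem_bits must be a power of two.
    """
    fields_per_elem = elem_bits // field_bits
    field_mask = (1 << field_bits) - 1
    index_mask = elem_bits - 1  # keep only enough bits to index one element
    mask = 0
    for lane, (s, c) in enumerate(zip(src, ctrl)):
        for j in range(fields_per_elem):
            # Bit field j of this control element names a bit of the source element.
            field = (c >> (j * field_bits)) & field_mask
            bit = (s >> (field & index_mask)) & 1
            # Each field owns exactly one bit position of the destination mask.
            mask |= bit << (lane * fields_per_elem + j)
    return mask


# Lane 0 picks bits 1, 0, 3, 2 (then bit 0 four times) of 0b1010.
# Lane 1 picks bit 63 eight times.
result = vector_bit_shuffle(
    [0b1010, 1 << 63],
    [0x02030001, 0x3F3F3F3F3F3F3F3F],
)
```

With two 64-bit lanes the result is a 16-bit mask: one mask bit per control bit field.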

    METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT GATHER
    33.
    Invention application
    Status: Under examination (published)

    Publication No.: US20160188335A1

    Publication date: 2016-06-30

    Application No.: US14583639

    Filing date: 2014-12-27

    Applicant: INTEL CORPORATION

    IPC classification: G06F9/30

    Abstract: An apparatus and method for performing a vector bit gather. For example, one embodiment of a processor comprises: a first vector register to store one or more source data elements; a second vector register to store one or more control elements, each of the control elements comprising a plurality of bit fields, each bit field to be associated with a corresponding bit position in a destination vector register and to identify a bit from the one or more source data elements to be copied to each of the particular bit positions; and vector bit gather logic to read each bit field from the second vector register to identify a bit from the one or more source data elements and to responsively copy the bit from each of the one or more source data elements to each of the corresponding bit positions in the destination vector register.
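A rough Python model of the gather described above follows. It is a sketch under stated assumptions: the name `vector_bit_gather` is hypothetical, each control element is represented as an already-unpacked list of bit indices (one per destination bit position) rather than as packed bit fields, and each field is assumed to index into the source element of the same lane.

```python
def vector_bit_gather(src, ctrl_fields, elem_bits=64):
    """Model of a vector bit gather into a destination vector.

    src         -- list of source data elements (ints), one per lane
    ctrl_fields -- per lane, a list of bit indices; entry `pos` names the
                   source bit copied to destination bit `pos` (assumed layout)
    """
    dst = []
    for s, fields in zip(src, ctrl_fields):
        d = 0
        for pos, idx in enumerate(fields):
            # Wrap the index so it always addresses a bit inside the element.
            bit = (s >> (idx % elem_bits)) & 1
            d |= bit << pos
        dst.append(d)
    return dst


# Reverse the low nibble of lane 0; broadcast bit 0 of lane 1 to four positions.
gathered = vector_bit_gather([0b1100, 0b0001], [[3, 2, 1, 0], [0, 0, 0, 0]])
```

Unlike the bit shuffle, which produces a mask register, this operation writes its selected bits back into vector elements.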

    INSTRUCTION AND LOGIC TO PERFORM AN INVERSE CENTRIFUGE OPERATION
    34.
    Invention application
    Status: Under examination (published)

    Publication No.: US20160179548A1

    Publication date: 2016-06-23

    Application No.: US14580055

    Filing date: 2014-12-22

    Applicant: Intel Corporation

    IPC classification: G06F9/38 G06F9/30

    Abstract: In one embodiment, a processing device implements a set of instructions to perform an inverse centrifuge operation using vector or general purpose registers. The inverse centrifuge operation interleaves bits from opposite regions of a source and writes the interleaved bits to a destination. The instructions use a control mask: each bit with a mask value of one is obtained from one side of the source register or vector element, and each bit with a mask value of zero is obtained from the opposing side.
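The interleaving can be illustrated with a short Python sketch. The abstract does not say which side of the source feeds which mask value, so this example assumes that mask bits equal to one draw consecutive bits from the low-order side and mask bits equal to zero draw from the high-order side, starting just above the region used by the ones. The function name and the `width` parameter are likewise hypothetical.

```python
def inverse_centrifuge(src, mask, width=64):
    """Interleave bits from the two ends of `src` under control of `mask`.

    Assumed convention: a mask bit of 1 takes the next unused bit from the
    low side of src; a mask bit of 0 takes the next unused bit from the
    high side, which begins at bit position popcount(mask).
    """
    mask &= (1 << width) - 1
    lo = 0                        # next source bit for mask == 1
    hi = bin(mask).count("1")     # next source bit for mask == 0
    dst = 0
    for pos in range(width):
        if (mask >> pos) & 1:
            bit = (src >> lo) & 1
            lo += 1
        else:
            bit = (src >> hi) & 1
            hi += 1
        dst |= bit << pos
    return dst


# Low nibble 1111 lands on the odd positions selected by mask 0b10101010.
interleaved = inverse_centrifuge(0b00001111, 0b10101010, width=8)
```

Under this convention the operation undoes a centrifuge that packs the mask-selected bits at the low end and the remaining bits at the high end.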

    METHOD AND APPARATUS FOR PERFORMING CONFLICT DETECTION
    35.
    Invention application
    Status: Granted

    Publication No.: US20160179528A1

    Publication date: 2016-06-23

    Application No.: US14581607

    Filing date: 2014-12-23

    Applicant: INTEL CORPORATION

    IPC classification: G06F9/30

    Abstract: An apparatus and method are described for performing conflict detection operations. For example, one embodiment of a processor comprises: a first source vector register to store a first set of data elements; a second source vector register to store a second set of data elements; and conflict detection logic to perform a specified comparison operation comparing each of the first set of data elements with specified data elements from the second set and generating a set of comparison results, the comparison operation to be selected from a group consisting of a greater than comparison, a less than comparison, a greater than or equal to comparison, a less than or equal to comparison, and a not equal to comparison.
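A minimal Python sketch of such a comparison-based conflict check is given below. The abstract says only that each element is compared with "specified" elements of the second set; this example assumes the convention of comparing element i against all lower-indexed elements of the second source, which is borrowed from equality-based conflict detection and is not confirmed by the abstract. The names `conflict_detect` and `COMPARISONS` are hypothetical.

```python
import operator

# The five comparison operations listed in the abstract.
COMPARISONS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "ne": operator.ne,
}


def conflict_detect(a, b, op):
    """Return one result bitmask per element of `a`.

    Bit j of results[i] is set when compare(a[i], b[j]) holds, for j < i only
    (assumed choice of "specified data elements").
    """
    compare = COMPARISONS[op]
    results = []
    for i, x in enumerate(a):
        bits = 0
        for j in range(i):
            if compare(x, b[j]):
                bits |= 1 << j
        results.append(bits)
    return results


values = [3, 1, 4, 1]
greater = conflict_detect(values, values, "gt")
not_equal = conflict_detect(values, values, "ne")
```

Passing the same list as both sources checks a vector against itself, which is the usual way a conflict check is applied before a scatter.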

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING A TILE OF A MATRIX OPERATIONS ACCELERATOR

    Publication No.: US20220206989A1

    Publication date: 2022-06-30

    Application No.: US17134129

    Filing date: 2020-12-24

    Applicant: Intel Corporation

    IPC classification: G06F15/80 G06F9/38 G06F9/30

    Abstract: Systems, methods, and apparatuses relating to one or more instructions for loading a tile of a matrix operations accelerator are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a plurality of registers that represents a two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a coupling to a cache; and a hardware processor core coupled to the matrix operations accelerator circuit and comprising a vector register, a decoder circuit to decode a single instruction into a decoded instruction, the single instruction including a first field that identifies the two-dimensional matrix, a second field that identifies a location in the cache, and a third field that identifies the vector register, and an opcode that indicates an execution circuit of the hardware processor core is to load elements into the plurality of registers that represents the two-dimensional matrix from the location in the cache by the coupling to the cache, and load one or more elements from the vector register into the plurality of registers that represents the two-dimensional matrix by a coupling of the hardware processor core to the matrix operations accelerator circuit that is separate from the coupling to the cache, and the execution circuit of the hardware processor core to execute the decoded instruction according to the opcode.

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR ALIGNING TILES OF A MATRIX OPERATIONS ACCELERATOR

    Publication No.: US20220206800A1

    Publication date: 2022-06-30

    Application No.: US17134136

    Filing date: 2020-12-24

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/38 G06F9/50

    Abstract: Systems, methods, and apparatuses relating to one or more instructions for row or column aligning of a tile of a matrix operations accelerator are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a first plurality of registers that represents a first two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a second plurality of registers that represents a second two-dimensional matrix coupled to the two-dimensional grid of processing elements; and a hardware processor core coupled to the matrix operations accelerator circuit and comprising a decoder circuit to decode a single instruction into a decoded instruction, the single instruction including a first field that identifies the first two-dimensional matrix, a second field that identifies the second two-dimensional matrix, and an opcode that indicates an execution circuit of the hardware processor core is to cause a third two-dimensional matrix to be logically formed for input into the two-dimensional grid of processing elements from the first two-dimensional matrix and the second two-dimensional matrix without moving data elements within the first plurality of registers and the second plurality of registers, and the execution circuit of the hardware processor core to execute the decoded instruction according to the opcode.

    APPARATUS AND METHOD FOR PERFORMING DUAL SIGNED AND UNSIGNED MULTIPLICATION OF PACKED DATA ELEMENTS

    Publication No.: US20210294604A1

    Publication date: 2021-09-23

    Application No.: US17226986

    Filing date: 2021-04-09

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply a first doubleword data element from the first source register with a second doubleword data element from the second source register to generate a first quadword product and to concurrently multiply a third doubleword data element from the first source register with a fourth doubleword data element from the second source register to generate a second quadword product; and a destination register to store the first quadword product and the second quadword product as first and second packed quadword data elements.
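The arithmetic can be modelled in a few lines of Python. This is only a sketch: the abstract does not say which doubleword positions are multiplied, so the default of lanes 0 and 2 of a four-doubleword register is an assumption, as are the names `dual_multiply` and `to_signed32`. The two products are computed one after the other here, whereas the abstract describes them as concurrent in hardware.

```python
def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def dual_multiply(src1, src2, lanes=(0, 2), signed=True):
    """Multiply two pairs of packed doublewords into two packed quadwords.

    src1, src2 -- lists of 32-bit values (packed doublewords)
    lanes      -- which doubleword positions to multiply (assumed: 0 and 2)
    signed     -- treat inputs as signed or unsigned 32-bit integers
    """
    products = []
    for lane in lanes:
        a = src1[lane] & 0xFFFFFFFF
        b = src2[lane] & 0xFFFFFFFF
        if signed:
            a, b = to_signed32(a), to_signed32(b)
        # Store each product as a 64-bit (quadword) bit pattern.
        products.append((a * b) & 0xFFFFFFFFFFFFFFFF)
    return products


src1 = [0xFFFFFFFF, 0, 2, 0]
src2 = [2, 0, 3, 0]
signed_products = dual_multiply(src1, src2, signed=True)
unsigned_products = dual_multiply(src1, src2, signed=False)
```

The same input bit pattern 0xFFFFFFFF yields -2 (stored as 0xFFFFFFFFFFFFFFFE) in the signed case and 0x1FFFFFFFE in the unsigned case, which is why the two interpretations need distinct handling.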