Processors, methods, systems, and instructions to consolidate unmasked elements of operation masks
    3.
    Granted invention patent
    Processors, methods, systems, and instructions to consolidate unmasked elements of operation masks (status: active)

    Publication number: US09411593B2

    Publication date: 2016-08-09

    Application number: US13842730

    Filing date: 2013-03-15

    Inventor: Ashish Jha

    CPC classification number: G06F9/30145 G06F9/30018 G06F9/30036

    Abstract: An instruction processing apparatus of an aspect includes a plurality of operation mask registers. The apparatus also includes a decode unit to receive an operation mask consolidation instruction. The operation mask consolidation instruction is to indicate a source operation mask register, of the plurality of operation mask registers, and a destination storage location. The source operation mask register is to include a source operation mask that is to include a plurality of masked elements that are to be disposed within a plurality of unmasked elements. An execution unit is coupled with the decode unit. The execution unit, in response to the operation mask consolidation instruction, is to store a consolidated operation mask in the destination storage location. The consolidated operation mask is to include the unmasked elements from the source operation mask consolidated together. Other apparatus, methods, systems, and instructions are also disclosed.
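
    The consolidation described in this abstract can be illustrated with a minimal sketch. It assumes that unmasked elements are the 1 bits of an integer mask and that they are gathered toward the least significant end; the abstract does not fix either choice, and the function name and `width` parameter are hypothetical.

```python
def consolidate_mask(mask: int, width: int = 8) -> int:
    """Pack the unmasked (1) bits of `mask` into the least significant positions.

    Assumption: 1 = unmasked, 0 = masked, consolidation toward bit 0.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    mask &= (1 << width) - 1         # keep only bits inside the mask register width
    unmasked = bin(mask).count("1")  # number of unmasked elements in the source mask
    return (1 << unmasked) - 1       # the same number of 1 bits, adjacent, from bit 0


src = 0b10100110                     # masked bits interspersed among unmasked bits
print(format(consolidate_mask(src), "08b"))  # 00001111
```

    The number of unmasked elements is preserved; only their positions change.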

    Architectural register replacement for instructions that use multiple architectural registers

    Publication number: US10255072B2

    Publication date: 2019-04-09

    Application number: US15201310

    Filing date: 2016-07-01

    Abstract: A processor of an aspect includes a decode unit to decode an instruction. The instruction is to explicitly specify a first architectural register and is to implicitly indicate at least a second architectural register. The second architectural register is implicitly to be at a higher register number than the first architectural register. The processor also includes an architectural register replacement unit coupled with the decode unit. The architectural register replacement unit is to replace the first architectural register with a third architectural register, and is to replace the second architectural register with a fourth architectural register. The third architectural register is to be at a lower register number than the first architectural register. The fourth architectural register is to be at a lower register number than the second architectural register. Other processors are also disclosed, as are methods and systems.
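
    As a rough illustration of the register replacement described here, the sketch below assumes that the implicit second register is the next higher register number and that replacement subtracts a fixed offset. Neither detail is stated in the abstract; `replace_register_pair` and `offset` are hypothetical names.

```python
def replace_register_pair(first: int, offset: int) -> tuple:
    """Return (third, fourth), lower-numbered replacements for (first, first + 1).

    Assumptions: the implicit register is first + 1, and the replacement
    unit maps each register down by the same fixed offset.
    """
    if offset <= 0:
        raise ValueError("offset must be positive so replacements are lower-numbered")
    second = first + 1          # implicitly indicated register (assumed adjacent)
    third = first - offset      # replaces the explicitly specified register
    fourth = second - offset    # replaces the implicitly indicated register
    if third < 0:
        raise ValueError("replacement register number would be negative")
    return third, fourth


print(replace_register_pair(20, 16))  # (4, 5)
```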

    PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO PARTITION A SOURCE PACKED DATA INTO LANES

    Publication number: US20170286109A1

    Publication date: 2017-10-05

    Application number: US15087231

    Filing date: 2016-03-31

    Inventor: Ashish Jha

    Abstract: A processor includes a decode unit to decode an instruction that is to indicate a source packed data that is to include a plurality of adjoining data elements, a number of data elements, and a destination. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination. The result packed data is to have a plurality of lanes that are each to store a different non-overlapping set of the indicated number of adjoining data elements aligned with a least significant end of the respective lane. The different non-overlapping sets of the indicated number of the adjoining data elements in adjoining lanes of the result packed data are to be separated from one another by at least one most significant data element position of the less significant lane.
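
    The lane partitioning in this abstract can be sketched as follows. Lists are ordered from least to most significant element. The fill value for unused positions and the choice to create as many lanes as needed are assumptions, and the function and parameter names are hypothetical.

```python
def partition_into_lanes(source, per_lane, lane_width, fill=0):
    """Spread adjoining source elements across lanes, `per_lane` elements each.

    Each lane holds its elements at the least significant end; the remaining
    most significant positions of the lane are set to `fill` (an assumption).
    """
    if not 0 < per_lane < lane_width:
        # per_lane < lane_width leaves at least one separating position per lane
        raise ValueError("per_lane must be positive and smaller than lane_width")
    result = []
    for start in range(0, len(source), per_lane):
        chunk = list(source[start:start + per_lane])   # next non-overlapping set
        result.extend(chunk + [fill] * (lane_width - len(chunk)))
    return result


# Six adjoining elements, three per lane, lanes four elements wide.
print(partition_into_lanes([1, 2, 3, 4, 5, 6], 3, 4))  # [1, 2, 3, 0, 4, 5, 6, 0]
```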

    AGGREGATE SCATTER INSTRUCTIONS
    7.
    Invention application

    Publication number: US20170177543A1

    Publication date: 2017-06-22

    Application number: US14979047

    Filing date: 2015-12-22

    CPC classification number: G06F15/8007 G06F9/30 G06F9/30098 G06F9/3016

    Abstract: An Aggregate Scatter instruction is described. A processor may include a memory interface and a register to store data elements of a data structure. The data elements may be contiguously stored in a first location in a memory accessible via the memory interface. The processor may further include a decoder to decode an aggregate scatter instruction specifying a store operation for the data structure and an execution unit to contiguously store the data elements to a second storage location in the memory in response to the decoded aggregate scatter instruction. The second storage location may be identified by a starting memory address of the second storage location.
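
    A minimal software model of the contiguous store described here is shown below. It assumes unsigned, fixed-size, little-endian elements and models memory as a `bytearray`; none of these details come from the abstract, and all names are hypothetical.

```python
def aggregate_scatter(memory, elements, start, elem_size=4):
    """Store `elements` back to back in `memory`, beginning at address `start`.

    Assumptions: unsigned elements, `elem_size` bytes each, little-endian.
    """
    end = start + len(elements) * elem_size
    if start < 0 or end > len(memory):
        raise IndexError("destination range falls outside memory")
    for i, value in enumerate(elements):
        offset = start + i * elem_size
        memory[offset:offset + elem_size] = value.to_bytes(elem_size, "little")


memory = bytearray(16)
aggregate_scatter(memory, [1, 2, 3], start=4)   # only the starting address is given
print(memory.hex())
```

    A single starting address identifies the destination; the element addresses follow from the element size.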

    ADJOINING DATA ELEMENT PAIRWISE SWAP PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

    Publication number: US20170177362A1

    Publication date: 2017-06-22

    Application number: US14978736

    Filing date: 2015-12-22

    Inventor: Ashish Jha

    CPC classification number: G06F9/30036 G06F9/30032 G06F15/8007 G06F15/8053

    Abstract: A processor includes a decode unit to decode an adjoining data element pairwise swap instruction. The instruction is to indicate a source packed data that is to include pairs of adjoining data elements, and is to indicate a destination storage location. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination storage location, the result packed data to include pairs of adjoining data elements. Each pair of adjoining data elements of the result packed data is to correspond to a different pair of adjoining data elements of the source packed data. The adjoining data elements in each pair of the result packed data to have been swapped in position relative to the adjoining data elements in each corresponding pair of the source packed data.
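
    The pairwise swap can be modeled on a plain list of data elements, as in the sketch below. The function name is hypothetical, and treating the packed data as a Python list is a simplification.

```python
def pairwise_swap(elements):
    """Return a copy of `elements` with each adjoining pair swapped in position."""
    if len(elements) % 2:
        raise ValueError("source must contain whole pairs of elements")
    result = list(elements)
    for i in range(0, len(result), 2):
        # swap the two elements of the pair at positions i and i + 1
        result[i], result[i + 1] = result[i + 1], result[i]
    return result


print(pairwise_swap([0, 1, 2, 3, 4, 5, 6, 7]))  # [1, 0, 3, 2, 5, 4, 7, 6]
```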

    Processors, methods, systems, and instructions to Partition a source packed data into lanes

    Publication number: US11204764B2

    Publication date: 2021-12-21

    Application number: US15087231

    Filing date: 2016-03-31

    Inventor: Ashish Jha

    Abstract: A processor includes a decode unit to decode an instruction that is to indicate a source packed data that is to include a plurality of adjoining data elements, a number of adjoining data elements, and a destination. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination. The result packed data is to have a plurality of lanes that are each to store a different non-overlapping set of the indicated number of adjoining data elements aligned with a least significant end of the respective lane. The different non-overlapping sets of the indicated number of the adjoining data elements in adjoining lanes of the result packed data are to be separated from one another by at least one most significant data element position of the less significant lane of the adjoining lanes.

    Multi-register gather instruction
    10.
    Granted invention patent

    Publication number: US10180838B2

    Publication date: 2019-01-15

    Application number: US15709254

    Filing date: 2017-09-19

    Inventor: Ashish Jha

    Abstract: A processor fetches a multi-register gather instruction that includes a destination operand that specifies a destination vector register, and a source operand that identifies content that indicates multiple vector registers, a first set of indexes of each of the vector registers that each identifies a source data element, and a second set of indexes of the destination vector register for each identified source element. The instruction is decoded and executed, causing, for each of the first set of indexes of each of the vector registers, the source data element that corresponds to that index of that vector register to be stored in a set of destination data elements that correspond to the second set of identified indexes of the destination vector register for that source data element.
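
    The gather across several source registers can be sketched as below. The encoding of the source operand as a list of (register, source index, destination index) triples is an assumption made for illustration, as is zero-filling destination positions that are not written; the names are hypothetical.

```python
def multi_register_gather(registers, selections, dest_len):
    """Build a destination vector from elements of several source registers.

    `registers` maps a register name to its list of data elements.
    `selections` is a list of (register name, source index, destination index).
    """
    dest = [0] * dest_len                      # unwritten positions stay 0 (assumption)
    for reg, src_idx, dst_idx in selections:
        dest[dst_idx] = registers[reg][src_idx]
    return dest


registers = {"v1": [10, 11, 12, 13], "v2": [20, 21, 22, 23]}
selections = [("v1", 0, 0), ("v2", 0, 1), ("v1", 2, 2), ("v2", 2, 3)]
print(multi_register_gather(registers, selections, 4))  # [10, 20, 12, 22]
```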
