METHOD AND APPARATUS FOR DATA-READY MEMORY OPERATIONS

    Publication number: WO2019005169A1

    Publication date: 2019-01-03

    Application number: PCT/US2017/040512

    Filing date: 2017-06-30

    Abstract: Disclosed embodiments relate to a new instruction for performing data-ready memory access operations. In one example, a system includes circuits to fetch, decode, and execute an instruction that includes an opcode, at least one memory location identifier identifying at least one data element, a register identifier, a data readiness indicator identifying at least one data access condition, and a data readiness mask, wherein the execution circuit is to, for each data element of the at least one data element, determine whether a memory request for the data element satisfies the at least one data access condition identified by the data readiness indicator, and in response to determining that the memory request for the data element does not satisfy the at least one data access condition: generate a prefetch request for the data element, and set a value in a corresponding data element position of the data readiness mask to indicate that the memory request for the data element does not satisfy the at least one data access condition.
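
The per-element behavior described in this abstract can be sketched in software. This is a minimal model, not the claimed hardware: it assumes that the "data access condition" is a hit in a small cache (modeled as a dict), and the names `data_ready_load`, `cache`, and `not_ready_mask` are hypothetical.

```python
def data_ready_load(cache, addresses):
    """Model a data-ready load over a list of element addresses.

    Assumption: an element is "ready" when its address is present in `cache`.
    Returns (values, not_ready_mask, prefetches).
    """
    values = [None] * len(addresses)
    not_ready_mask = [0] * len(addresses)
    prefetches = []
    for pos, addr in enumerate(addresses):
        if addr in cache:
            # Condition satisfied: the element is delivered.
            values[pos] = cache[addr]
        else:
            # Condition not satisfied: request a prefetch and flag the position.
            prefetches.append(addr)
            not_ready_mask[pos] = 1
    return values, not_ready_mask, prefetches


cache = {0x10: 7, 0x30: 9}
values, mask, prefetches = data_ready_load(cache, [0x10, 0x20, 0x30])
print(values, mask, prefetches)
```

A caller could process the ready elements immediately and retry only the positions flagged in the mask once the prefetches have completed.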

    METHOD AND APPARATUS FOR VECTORIZING HISTOGRAM LOOPS

    Publication number: WO2019005166A1

    Publication date: 2019-01-03

    Application number: PCT/US2017/040509

    Filing date: 2017-06-30

    Abstract: Disclosed embodiments relate to a new instruction for detecting conflicts in a set of vector elements and determining a number of instances of each distinct data value within the vector. A system includes circuits to fetch, decode, and execute an instruction that includes an opcode, a destination vector identifier, a source vector identifier, and an immediate value, wherein the execution circuit is to, for each data element position of a source vector, determine a number of matching data element positions in the source vector storing a same data value as stored at the data element position, the matching data element positions located between the data element position and a least significant data element position of the source vector, and store in a corresponding data element position of a destination vector identified by the destination vector identifier, a value representing the number of matching data element positions.
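
The counting step in this abstract can be illustrated with a scalar sketch. It assumes index 0 is the least significant element position and ignores the immediate operand; `count_prior_matches` and `histogram_update` are hypothetical names, and the histogram update shown is one possible use of the counts rather than the method claimed.

```python
def count_prior_matches(src):
    """For each position, count lower positions holding the same value."""
    seen = {}
    dst = []
    for value in src:
        dst.append(seen.get(value, 0))
        seen[value] = seen.get(value, 0) + 1
    return dst


def histogram_update(hist, indices):
    """Update hist once per distinct index, using the counts to avoid conflicts."""
    counts = count_prior_matches(indices)
    # Position of the last occurrence of each distinct index.
    last = {idx: pos for pos, idx in enumerate(indices)}
    for idx, pos in last.items():
        # Prior matches plus the element itself gives the total occurrences.
        hist[idx] += counts[pos] + 1
    return hist


print(count_prior_matches([3, 1, 3, 3, 1, 2]))
print(histogram_update([0, 0, 0, 0], [3, 1, 3, 3, 1, 2]))
```

Because each distinct bin is written exactly once, the update no longer has a read-after-write dependence between elements of the same vector.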

    METHOD AND APPARATUS FOR CONVERTING SCATTER CONTROL ELEMENTS TO GATHER CONTROL ELEMENTS USED TO SORT VECTOR DATA ELEMENTS

    Publication number: WO2018182445A1

    Publication date: 2018-10-04

    Application number: PCT/RU2017/000195

    Filing date: 2017-03-31

    Abstract: Method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements is described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data elements includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.
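
One way to read the counting step is as an inverse permutation. The sketch below assumes n one-hot elements of n bits each, with element 0 as the least significant element, and assumes the set bits form a permutation; `scatter_to_gather` is a hypothetical name.

```python
def scatter_to_gather(one_hot_elements):
    """Convert one-hot scatter controls into gather indices.

    Element i has a single bit set at offset scatter[i]. For each bit offset,
    count the elements below the one that has that bit set; that count is
    the gather index for the offset.
    """
    n = len(one_hot_elements)
    gather = []
    for bit in range(n):
        unset_count = 0
        for elem in one_hot_elements:
            if (elem >> bit) & 1:
                break
            unset_count += 1
        gather.append(unset_count)
    return gather


# Scatter indices [2, 0, 3, 1] encoded as one-hot elements.
scatter = [2, 0, 3, 1]
one_hot = [1 << s for s in scatter]
print(scatter_to_gather(one_hot))
```

With these controls, `dst[scatter[i]] = src[i]` and `dst[j] = src[gather[j]]` produce the same destination vector.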

    PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO STORE CONSECUTIVE SOURCE ELEMENTS TO UNMASKED RESULT ELEMENTS WITH PROPAGATION TO MASKED RESULT ELEMENTS
    Invention application; status: under examination (published)

    Publication number: WO2015145190A1

    Publication date: 2015-10-01

    Application number: PCT/IB2014/000611

    Filing date: 2014-03-27

    CPC classification number: G06F9/30018 G06F9/30032 G06F9/30036 G06F9/3016

    Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
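
The expand-with-propagation behavior can be sketched as follows. The sketch assumes that a mask value of 1 means "unmasked" and that the "first end" is the lowest position. The abstract does not specify what happens to masked elements before the first unmasked element or after the last one; here the former stay `None` and the latter are also filled by propagation, which is an assumption. `expand_with_propagation` is a hypothetical name.

```python
def expand_with_propagation(src, mask):
    """Place consecutive source elements at unmasked positions, in order.

    Masked positions repeat the nearest unmasked result toward position 0.
    """
    result = [None] * len(mask)
    next_src = 0
    have_last = False
    last = None
    for pos, unmasked in enumerate(mask):
        if unmasked:
            result[pos] = src[next_src]
            next_src += 1
            last = result[pos]
            have_last = True
        elif have_last:
            # Propagate the closest unmasked value on the low side.
            result[pos] = last
    return result


print(expand_with_propagation([10, 20, 30, 40], [0, 1, 0, 0, 1, 1, 0, 1]))
```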

    VECTORIZATION OF COLLAPSED MULTI-NESTED LOOPS
    Invention application; status: under examination (published)

    Publication number: WO2014105208A1

    Publication date: 2014-07-03

    Application number: PCT/US2013/048794

    Filing date: 2013-06-29

    Abstract: In an embodiment a method of vectorizing a collapsed multi-nested loop includes executing, in a vector unit of a processor, the collapsed loop to obtain a vector of offsets, including for each of a plurality of iterations, calculating a scalar offset into a multi-dimensional data structure, storing the scalar offset in a data element of a first vector register, and updating a loop counter value of a multi-dimensional loop counter vector. In turn, a plurality of data elements are loaded from the multi-dimensional data structure using a base value and indexes from the vector of offsets, at least one computation is performed on the loaded plurality of data elements to obtain a plurality of results, and the plurality of results are stored into the multi-dimensional data structure using the base value and the indexes from the vector of offsets. Other embodiments are described and claimed.
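
The offset-vector scheme can be sketched in scalar Python. The sketch assumes a row-major array flattened into a list, strictly positive loop bounds, and a vector length `vlen` of 4; `collapsed_offsets` and `scale_subarray` are hypothetical names, and the multiplication stands in for "at least one computation".

```python
def collapsed_offsets(counter, bounds, strides, vlen):
    """Produce up to vlen scalar offsets and advance the multi-dimensional counter."""
    counter = list(counter)
    offsets = []
    done = False
    while len(offsets) < vlen and not done:
        offsets.append(sum(c * s for c, s in zip(counter, strides)))
        # Advance the counter, innermost dimension first, with carry.
        for dim in reversed(range(len(counter))):
            counter[dim] += 1
            if counter[dim] < bounds[dim]:
                break
            counter[dim] = 0
        else:
            done = True  # every dimension wrapped: the iteration space is exhausted
    return offsets, counter, done


def scale_subarray(data, base, bounds, strides, vlen=4, factor=2):
    """Gather, compute, and scatter over a collapsed loop nest."""
    counter = [0] * len(bounds)
    done = False
    while not done:
        offsets, counter, done = collapsed_offsets(counter, bounds, strides, vlen)
        loaded = [data[base + off] for off in offsets]   # gather
        results = [x * factor for x in loaded]           # computation
        for off, res in zip(offsets, results):           # scatter
            data[base + off] = res
    return data


# 3x4 row-major array; the loop nest covers rows 0-1 and columns 0-2.
print(scale_subarray(list(range(12)), base=0, bounds=[2, 3], strides=[4, 1]))
```

The first vector of offsets spans a row boundary (offsets 0, 1, 2, 4), which is the case that a simple inner-loop vectorization would leave partially filled.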

    READ AND WRITE MASKS UPDATE INSTRUCTION FOR VECTORIZATION OF RECURSIVE COMPUTATIONS OVER INDEPENDENT DATA
    Invention application; status: under examination (published)

    Publication number: WO2014051737A1

    Publication date: 2014-04-03

    Application number: PCT/US2013/045505

    Filing date: 2013-06-12

    CPC classification number: G06F9/30036 G06F9/30018 G06F9/30032 G06F9/3013

    Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
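
The abstract does not say which bits are inverted or how the "given number" is chosen. The sketch below is one plausible reading, stated as an assumption: the number n is the smaller of the count of 0 bits in the first mask and the count of 1 bits in the second, and the lowest n such bits in each mask are inverted. The name `mask_update` and the 8-bit width are hypothetical.

```python
def mask_update(k1, k2, width=8):
    """Invert the same number of bits in two masks.

    Assumed semantics: set the lowest n zero bits of k1 and clear the lowest
    n one bits of k2, where n = min(zeros in k1, ones in k2).
    """
    zero_positions = [i for i in range(width) if not (k1 >> i) & 1]
    one_positions = [i for i in range(width) if (k2 >> i) & 1]
    n = min(len(zero_positions), len(one_positions))
    for i in zero_positions[:n]:
        k1 |= 1 << i
    for i in one_positions[:n]:
        k2 &= ~(1 << i)
    return k1, k2, n


k1, k2, n = mask_update(0b00001011, 0b01100000)
print(f"{k1:08b} {k2:08b} {n}")
```

Under this reading, the first mask could mark vector lanes that hold useful work and the second could mark pending elements waiting to refill idle lanes.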

    TECHNOLOGIES FOR MANAGING DATA WAIT BARRIER OPERATIONS

    Publication number: WO2020201789A1

    Publication date: 2020-10-08

    Application number: PCT/IB2019/000362

    Filing date: 2019-03-29

    Abstract: Technologies for managing data wait barrier operations include starting a receive operation associated with a receive buffer of a compute node that includes a plurality of chunks of data received from a sender compute node. Each of the plurality of chunks of data may be received in an out-of-order sequence relative to an order in which they were transmitted from the sender compute node. The compute node may determine whether a chunk of data in the receive buffer satisfies a condition to be met prior to performing one or more data wait barrier operations to be performed by the compute node to process the chunk of data and, if so, perform a partial computation over the chunk of data.

    STRIDESHIFT INSTRUCTION FOR TRANSPOSING BITS INSIDE VECTOR REGISTER

    Publication number: WO2018158603A1

    Publication date: 2018-09-07

    Application number: PCT/IB2017/000333

    Filing date: 2017-02-28

    Abstract: A processor includes a decode circuit to decode an instruction into a decoded instruction and an execution circuit to execute the decoded instruction to access a first bit of a first input vector located at a bit position indicated by an element of a second input vector, stride over bits of the first input vector using a stride to access bits of the first input vector that are located at a strided bit position with respect to the first bit of the first input vector, and store the first bit of the first input vector and the bits of the first input vector that are located at a strided bit position with respect to the first bit of the first input vector as consecutive bits in a destination vector.
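
The strided bit extraction can be sketched with plain integers. The sketch assumes the starting bit position comes from a single element of the second input vector and that extraction stops at the end of the first input; the name `strideshift` and the `count` and `width` parameters are hypothetical.

```python
def strideshift(src, start, stride, count, width=64):
    """Pack bits start, start+stride, ... of src into consecutive low bits."""
    dst = 0
    for k in range(count):
        pos = start + k * stride
        if pos >= width:
            break  # ran past the end of the source vector
        dst |= ((src >> pos) & 1) << k
    return dst


# Bits 1, 3, 5, 7 of 0b10110110 are 1, 0, 1, 1.
print(bin(strideshift(0b10110110, start=1, stride=2, count=4, width=8)))
```

Repeating the call with different starting positions collects each "column" of bits in turn, which is how a bit-matrix transpose could be assembled from this primitive.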

    SYSTEMS, APPARATUSES, AND METHODS FOR STRIDED LOAD
    Invention application; status: under examination (published)

    Publication number: WO2018009319A1

    Publication date: 2018-01-11

    Application number: PCT/US2017/037553

    Filing date: 2017-06-14

    Abstract: Systems, methods, and apparatuses for strided loads are described. In an embodiment, an instruction that includes at least an opcode, a field for at least two packed data source operands, a field for a packed data destination operand, and an immediate is designated as a strided load instruction. This instruction is executed to load packed data elements from the at least two packed data source operands using a stride and to store the results of the strided loads in the packed data destination operand, starting from a defined position determined in part from the immediate.
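
The abstract leaves the encoding of the stride and the immediate open, so the following is only a sketch under stated assumptions: the two sources are treated as one concatenated sequence, and the stride, source offset, and destination start position are passed as separate arguments rather than decoded from an immediate. The name `strided_load` and all of its parameters are hypothetical.

```python
def strided_load(src1, src2, dst, stride, offset, dst_start):
    """Pick every stride-th element from src1+src2 and write them into dst.

    Writing begins at dst_start; other destination elements are preserved.
    """
    combined = list(src1) + list(src2)
    picked = combined[offset::stride]
    out = list(dst)
    for k, value in enumerate(picked):
        if dst_start + k >= len(out):
            break  # destination is full
        out[dst_start + k] = value
    return out


print(strided_load([0, 1, 2, 3], [4, 5, 6, 7], [-1, -1, -1, -1],
                   stride=3, offset=0, dst_start=1))
```

Successive calls with different offsets and destination start positions could fill one destination register from several source pairs, for example when de-interleaving an array of structures.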
