RAY TRACING APPARATUS AND METHOD FOR MEMORY ACCESS AND REGISTER OPERATIONS

    Publication No.: US20210150800A1

    Publication date: 2021-05-20

    Application No.: US17108774

    Filing date: 2020-12-01

    Applicant: Intel Corporation

    Abstract: An apparatus and method for performing BVH compression and decompression concurrently with stores and loads, respectively. For example, one embodiment comprises: bounding volume hierarchy (BVH) construction circuitry to build a BVH based on a set of input primitives, the BVH comprising a plurality of uncompressed coordinates; traversal/intersection circuitry to traverse one or more rays through the BVH and determine intersections with the set of input primitives using the uncompressed coordinates; store with compression circuitry to compress the BVH including the plurality of uncompressed coordinates to generate a compressed BVH with compressed coordinates and to store the compressed BVH to a memory subsystem; and load with decompression circuitry to decompress the BVH including the compressed coordinates to generate a decompressed BVH with the uncompressed coordinates and to load the decompressed BVH with uncompressed coordinates to a cache and/or a set of registers accessible by the traversal/intersection circuitry.
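
    Sketch: the abstract pairs compression with stores and decompression with loads but does not say how coordinates are encoded. The Python below is a minimal illustration under one assumed scheme: each child bounding box is quantized to 8-bit offsets relative to its parent box on store and expanded back to floats on load. The function names, the 8-bit width, and the conservative rounding are all hypothetical, and this scheme only reproduces the original coordinates exactly when they lie on the quantization grid.

```python
import math

def store_compressed(box, parent, bits=8):
    """Quantize a child box ((lo...), (hi...)) relative to its parent box.

    Hypothetical scheme: lower bounds round down and upper bounds round up,
    so the decompressed box always encloses the original one.
    """
    (plo, phi), levels = parent, (1 << bits) - 1
    qlo, qhi = [], []
    for lo, hi, a, b in zip(box[0], box[1], plo, phi):
        scale = levels / (b - a) if b > a else 0.0
        qlo.append(max(0, math.floor((lo - a) * scale)))
        qhi.append(min(levels, math.ceil((hi - a) * scale)))
    return tuple(qlo), tuple(qhi)

def load_decompressed(qbox, parent, bits=8):
    """Expand quantized offsets back to float coordinates for traversal."""
    (plo, phi), levels = parent, (1 << bits) - 1
    lo = tuple(a + (b - a) * q / levels for q, a, b in zip(qbox[0], plo, phi))
    hi = tuple(a + (b - a) * q / levels for q, a, b in zip(qbox[1], plo, phi))
    return lo, hi

parent = ((0.0, 0.0, 0.0), (255.0, 255.0, 255.0))
child = ((10.5, 20.25, 30.0), (40.5, 50.75, 60.0))
packed = store_compressed(child, parent)      # small integers go to memory
unpacked = load_decompressed(packed, parent)  # floats go to cache/registers
```

    In this sketch the traversal side only ever sees the output of load_decompressed, which mirrors the abstract's split between the memory-side and register-side representations.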

    COLLAPSING OF MULTIPLE NESTED LOOPS, METHODS, AND INSTRUCTIONS

    Publication No.: US20180373538A1

    Publication date: 2018-12-27

    Application No.: US16120983

    Filing date: 2018-09-04

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/32

    Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
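
    Sketch: a software analogue of a multi-dimensional loop counter update, shown in Python. The helper update_loop_counters and its carry-propagation semantics are assumptions made for illustration; the abstract only states that one instruction updates loop counter values held in one operand by some amount. The second function shows how such an update lets a nested loop be collapsed into a single flat loop.

```python
def update_loop_counters(counters, bounds, amount=1):
    """Advance a packed multi-dimensional counter by `amount` iterations.

    Counters are ordered outermost first. Overflow of an inner counter
    carries into the next outer one (assumed semantics). Returns the new
    counters and the carry out of the outermost dimension.
    """
    result = list(counters)
    carry = amount
    for dim in reversed(range(len(result))):
        total = result[dim] + carry
        result[dim] = total % bounds[dim]
        carry = total // bounds[dim]
    return tuple(result), carry

def collapsed_iteration(bounds):
    """Visit every index tuple of a loop nest using one flat loop."""
    total = 1
    for bound in bounds:
        total *= bound
    counters = tuple(0 for _ in bounds)
    visited = []
    for _ in range(total):
        visited.append(counters)
        counters, _ = update_loop_counters(counters, bounds)
    return visited
```

    For bounds (2, 3), collapsed_iteration visits the same index pairs, in the same order, as the two-level nest it replaces.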

    APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS

    Publication No.: US20170300332A1

    Publication date: 2017-10-19

    Application No.: US15476356

    Filing date: 2017-03-31

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements into one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements into one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.
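
    Sketch: a Python model of a masked insert, assuming vectors are lists of equally sized elements. The name masked_insert, the merging behavior for masked-off positions, and the elems_per_mask_bit parameter (used here to model the two masking granularities) are illustrative assumptions, not details taken from the abstract.

```python
def masked_insert(dest, group, section, mask, elems_per_mask_bit=1):
    """Insert `group` into one non-overlapping section of `dest` under a mask.

    Each mask bit covers `elems_per_mask_bit` consecutive elements, which
    models masking at a coarser or finer granularity. Positions whose mask
    bit is clear keep the old destination value (merging is assumed).
    """
    width = len(group)
    if width == 0 or len(dest) % width:
        raise ValueError("destination must hold a whole number of sections")
    if not 0 <= section < len(dest) // width:
        raise ValueError("section index out of range")
    inserted = list(dest)
    inserted[section * width:(section + 1) * width] = group
    result = []
    for i, (old, new) in enumerate(zip(dest, inserted)):
        bit = (mask >> (i // elems_per_mask_bit)) & 1
        result.append(new if bit else old)
    return result
```

    With an eight-element destination and a four-element group, section can be 0 or 1; widening the group to eight elements in a sixteen-element destination would correspond to the larger second bit width described above.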

    EFFICIENT ZERO-BASED DECOMPRESSION
    Type: Invention application

    Publication No.: US20170300326A1

    Publication date: 2017-10-19

    Application No.: US15438712

    Filing date: 2017-02-21

    Applicant: Intel Corporation

    IPC classification: G06F9/30 H03M7/46

    Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing the set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements, this is repeated until all valid data is processed.
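
    Sketch: a scalar Python rendering of the loop the abstract describes. The encoding is an assumption: nonzero elements are literals, and a zero element is a marker followed by one element holding the length of the zero run. The function name is hypothetical, and the vector masks and shifts of the hardware are modeled with list slices.

```python
def decompress_zero_rle(src):
    """Expand a zero-run-length-encoded list (assumed marker/count format)."""
    out = []
    pos = 0
    while pos < len(src):
        remaining = src[pos:]
        # First mask: which remaining elements compare equal to zero.
        zero_mask = [value == 0 for value in remaining]
        # Trailing-zero count of that mask = literals before the first marker.
        literals = zero_mask.index(True) if True in zero_mask else len(remaining)
        # Second mask selects exactly those literals; copy them out.
        out.extend(remaining[:literals])
        pos += literals
        if pos < len(src):
            if pos + 1 >= len(src):
                raise ValueError("zero marker without a run length")
            out.extend([0] * src[pos + 1])  # read the number of RLE zeros
            pos += 2                        # skip the marker and its count
    return out
```

    Advancing pos past the consumed elements plays the role of shifting the source to the right before the next pass.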

    INSTRUCTION AND LOGIC FOR PARTIAL REDUCTION OPERATIONS

    Publication No.: US20170168819A1

    Publication date: 2017-06-15

    Application No.: US14968990

    Filing date: 2015-12-15

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/38

    Abstract: In one embodiment, a processor includes: a fetch logic to fetch instructions, the instructions including a partial reduction instruction; a decode logic to decode the partial reduction instruction and provide the decoded partial reduction instruction to one or more execution units; and the one or more execution units to, responsive to the decoded partial reduction instruction, perform a plurality of N partial reduction operations to generate a result array including N output data elements, where an input array comprises N lanes, and where each of the N partial reduction operations is to reduce a set of input data elements included in a corresponding lane of the N lanes. Other embodiments are described and claimed.
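
    Sketch: a partial reduction in Python, assuming the N lanes are equal-width contiguous slices of the input array. The function name and the choice of addition as the default operator are illustrative; the abstract does not fix the lane layout or the reduction operator.

```python
from functools import reduce
import operator

def partial_reduce(values, num_lanes, op=operator.add):
    """Reduce each of `num_lanes` contiguous lanes to one output element."""
    if num_lanes <= 0 or len(values) % num_lanes or not values:
        raise ValueError("input must split evenly into non-empty lanes")
    lane_width = len(values) // num_lanes
    return [
        reduce(op, values[lane * lane_width:(lane + 1) * lane_width])
        for lane in range(num_lanes)
    ]
```

    A full reduction is the special case num_lanes=1, while num_lanes equal to the input length returns the input unchanged.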

    MULTI-ELEMENT INSTRUCTION WITH DIFFERENT READ AND WRITE MASKS
    Type: Invention application (under examination, published)

    Publication No.: US20170052783A1

    Publication date: 2017-02-23

    Application No.: US15346531

    Filing date: 2016-11-08

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation on the set of elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
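
    Sketch: the method steps above rendered as one Python function. Masks are modeled as integers with bit i governing element i, and positions whose write-mask bit is clear keep the old destination value; both choices, like the function name, are assumptions for illustration.

```python
def masked_op_broadcast(src, read_mask, write_mask, dest, op=sum):
    """Apply `op` to read-masked elements, broadcast, then write-mask."""
    # Apply the read mask to select the elements to operate on.
    selected = [v for i, v in enumerate(src) if (read_mask >> i) & 1]
    # Perform the operation on the selected set.
    result = op(selected)
    # Build the output vector from multiple instances of the result.
    output = [result] * len(dest)
    # Apply the (different) write mask to form the resultant vector.
    return [
        new if (write_mask >> i) & 1 else old
        for i, (new, old) in enumerate(zip(output, dest))
    ]
```

    Because the two masks are independent, the elements that feed the operation need not be the positions that receive its result.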

    HAND HELD DEVICE TO PERFORM A BIT RANGE ISOLATION INSTRUCTION
    Type: Invention application (under examination, published)

    Publication No.: US20150143084A1

    Publication date: 2015-05-21

    Application No.: US14568812

    Filing date: 2014-12-12

    Applicant: INTEL CORPORATION

    IPC classification: G06F9/30 G06F9/38

    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits having a first end explicitly specified by the instruction, in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable media storing such an instruction are also disclosed.
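
    Sketch: bit range isolation modeled in Python with a mask and no shift of the kept bits. The parameter names, the half-open range [start, end), the 64-bit default width, and the choice of zero as the common value for the second range are assumptions; the abstract only requires one explicitly specified end and a uniform value outside the range.

```python
def isolate_bit_range(src, end, start=0, width=64):
    """Keep bits start..end-1 of `src` in place and clear every other bit."""
    if not 0 <= start <= end <= width:
        raise ValueError("need 0 <= start <= end <= width")
    keep = ((1 << end) - 1) & ~((1 << start) - 1)
    return src & keep & ((1 << width) - 1)
```

    The kept bits stay at their original positions, so isolating bits 4 to 11 of 0xABCD yields 0x0BC0 rather than 0xBC.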

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR ALIGNING TILES OF A MATRIX OPERATIONS ACCELERATOR

    Publication No.: US20220206854A1

    Publication date: 2022-06-30

    Application No.: US17134142

    Filing date: 2020-12-24

    Applicant: Intel Corporation

    Abstract: Systems, methods, and apparatuses relating to one or more instructions for element aligning of a tile of a matrix operations accelerator are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a first plurality of registers that represents a first two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a second plurality of registers that represents a second two-dimensional matrix coupled to the two-dimensional grid of processing elements; and a hardware processor core coupled to the matrix operations accelerator circuit and comprising a decoder circuit to decode a single instruction into a decoded instruction, the single instruction including a first field that identifies the first two-dimensional matrix, a second field that identifies the second two-dimensional matrix, and an opcode that indicates an execution circuit of the hardware processor core is to cause the matrix operations accelerator circuit to generate a third two-dimensional matrix from a proper subset of elements of a row or a column of the first two-dimensional matrix and a proper subset of elements of a row or a column of the second two-dimensional matrix and store the third two-dimensional matrix at a destination in the matrix operations accelerator circuit, and the execution circuit of the hardware processor core to execute the decoded instruction according to the opcode.
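
    Sketch: one possible reading of row-wise tile alignment, in Python with tiles as lists of rows. Each output row takes the tail of the matching row of the first tile followed by the head of the matching row of the second tile. The function name, the shift parameter, and this particular choice of subsets are hypothetical; the abstract only requires proper subsets of a row or column from each source matrix.

```python
def align_tiles_rowwise(first, second, shift):
    """Build a third tile from row tails of `first` and row heads of `second`."""
    if len(first) != len(second) or not first:
        raise ValueError("tiles must have the same non-zero number of rows")
    cols = len(first[0])
    if any(len(row) != cols for row in first + second):
        raise ValueError("all rows must have the same length")
    # 0 < shift < cols keeps both contributions proper (non-empty, partial).
    if not 0 < shift < cols:
        raise ValueError("shift must select proper subsets of each row")
    return [a[shift:] + b[:shift] for a, b in zip(first, second)]
```

    The result has the same shape as the inputs, so it could be stored back into a tile register of the same dimensions.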