-
公开(公告)号:US20210349848A1
公开(公告)日:2021-11-11
申请号:US17321885
申请日:2021-05-17
申请人: Intel Corporation
发明人: JOYDEEP RAY , ARAVINDH ANANTARAMAN , ABHISHEK R. APPU , ALTUG KOKER , ELMOUSTAPHA OULD-AHMED-VALL , VALENTIN ANDREI , SUBRAMANIAM MAIYURAN , NICOLAS GALOPPO VON BORRIES , VARGHESE GEORGE , MIKE MACPHERSON , BEN ASHBAUGH , MURALI RAMADOSS , VIKRANTH VEMULAPALLI , WILLIAM SADLER , JONATHAN PEARCE , SUNGYE KIM
摘要: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20210150800A1
公开(公告)日:2021-05-20
申请号:US17108774
申请日:2020-12-01
申请人: Intel Corporation
摘要: An apparatus and method for performing BVH compression and decompression concurrently with stores and loads, respectively. For example, one embodiment comprises: bounding volume hierarchy (BVH) construction circuitry to build a BVH based on a set of input primitives, the BVH comprising a plurality of uncompressed coordinates; traversal/intersection circuitry to traverse one or more rays through the BVH and determine intersections with the set of input primitives using the uncompressed coordinates; store with compression circuitry to compress the BVH including the plurality of uncompressed coordinates to generate a compressed BVH with compressed coordinates and to store the compressed BVH to a memory subsystem; and load with decompression circuitry to decompress the BVH including the compressed coordinates to generate a decompressed BVH with the uncompressed coordinates and to load the decompressed BVH with uncompressed coordinates to a cache and/or a set of registers accessible by the traversal/intersection circuitry.
-
公开(公告)号:US20180373538A1
公开(公告)日:2018-12-27
申请号:US16120983
申请日:2018-09-04
申请人: Intel Corporation
摘要: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
-
公开(公告)号:US20170300332A1
公开(公告)日:2017-10-19
申请号:US15476356
申请日:2017-03-31
申请人: Intel Corporation
发明人: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL SAN ADRIAN , BRET L. TOLL , MARK J. CHARNEY , ZEEV SPERBER , AMIT GRADSTEIN
IPC分类号: G06F9/30
CPC分类号: G06F9/30181 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/3013 , G06F9/30167 , G06F9/3802 , G06F12/0615
摘要: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.
-
公开(公告)号:US20170300326A1
公开(公告)日:2017-10-19
申请号:US15438712
申请日:2017-02-21
申请人: Intel Corporation
发明人: ELMOUSTAPHA OULD-AHMED-VALL , SULEYMAN SAIR , KSHITIJ A. DOSHI , CHARLES R. YOUNT , BRET L. TOLL
CPC分类号: G06F9/30018 , G06F9/30036 , H03M7/46
摘要: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.
-
公开(公告)号:US20170177340A1
公开(公告)日:2017-06-22
申请号:US14977782
申请日:2015-12-22
申请人: Intel Corporation
发明人: ASHISH JHA , ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK J. CHARNEY , MILIND B. GIRKAR
IPC分类号: G06F9/30
CPC分类号: G06F9/30036 , G06F9/30043 , G06F9/30109 , G06F9/3455
摘要: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.
-
公开(公告)号:US20170168819A1
公开(公告)日:2017-06-15
申请号:US14968990
申请日:2015-12-15
申请人: Intel Corporation
CPC分类号: G06F9/3016 , G06F9/3001 , G06F9/30036 , G06F9/3802 , G06F9/3887 , G06F9/3893
摘要: In one embodiment, a processor includes: a fetch logic to fetch instructions, the instructions including a partial reduction instruction; a decode logic to decode the partial reduction instruction and provide the decoded partial reduction instruction to one or more execution units; and the one or more execution units to, responsive to the decoded partial reduction instruction, perform a plurality of N partial reduction operations to generate an result array including N output data elements, where an input array comprises N lanes, and where each of the N partial reduction operations is to reduce a set of input data elements included in a corresponding lane of the N lanes. Other embodiments are described and claimed.
-
公开(公告)号:US20170052783A1
公开(公告)日:2017-02-23
申请号:US15346531
申请日:2016-11-08
申请人: Intel Corporation
发明人: MIKHAIL PLOTNIKOV , ANDREY NARAIKIN , ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , BRET L. TOLL , JESUS CORBAL
IPC分类号: G06F9/30
CPC分类号: G06F9/3013 , G06F7/764 , G06F9/3001 , G06F9/30014 , G06F9/30018 , G06F9/30029 , G06F9/30036
摘要: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation of the set elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
摘要翻译: 描述了一种包括从第一寄存器读取第一读取掩码的方法。 该方法还包括从第二寄存器或存储器位置读取第一向量操作数。 该方法还包括对第一向量操作数应用读取掩码以产生用于操作的一组元素。 该方法还包括执行设定元件的操作。 该方法还包括通过产生操作结果的多个实例来创建输出向量。 该方法还包括从第三寄存器读取第一写掩码,第一写掩码不同于第一读掩码。 该方法还包括针对输出向量应用写掩码以产生合成矢量。 该方法还包括将结果矢量写入目的地寄存器。
-
公开(公告)号:US20150143084A1
公开(公告)日:2015-05-21
申请号:US14568812
申请日:2014-12-12
申请人: INTEL CORPORATION
发明人: Maxim Loktyukhin , Eric W. Mahurin , Bret L. Toll , Martin G. Dixon , Sean P. Mirkes , David L. Kreitzer , ELMOUSTAPHA OULD-AHMED-VALL , Vinodh Gopal
CPC分类号: G06F9/30145 , G06F9/30018 , G06F9/30029 , G06F9/30094 , G06F9/3802
摘要: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
摘要翻译: 接收指示源操作数和目标操作数的指令。 将结果存储在目标操作数中以响应指令。 结果操作数可以具有:(1)具有第一端的第一范围,其中每个位在相应位置中的每个位与源操作数的位相同的指令明确地指定; 和(2)与相应位置中的源操作数的位的值无关的所有位都具有相同值的第二范围。 不管移动第一范围的结果相对于源操作数的相应位置中相同值的位,执行指令都可以完成,而不考虑结果中第一个位的位置。 还公开了执行这些指令的执行单元,具有执行这种指令的处理器的计算机系统以及存储这种指令的机器可读介质。
-
10.
公开(公告)号:US20220206854A1
公开(公告)日:2022-06-30
申请号:US17134142
申请日:2020-12-24
申请人: Intel Corporation
摘要: Systems, methods, and apparatuses relating to one or more instructions for element aligning of a tile of a matrix operations accelerator are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a first plurality of registers that represents a first two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a second plurality of registers that represents a second two-dimensional matrix coupled to the two-dimensional grid of processing elements; and a hardware processor core coupled to the matrix operations accelerator circuit and comprising a decoder circuit to decode a single instruction into a decoded instruction, the single instruction including a first field that identifies the first two-dimensional matrix, a second field that identifies the second two-dimensional matrix, and an opcode that indicates an execution circuit of the hardware processor core is to cause the matrix operations accelerator circuit to generate a third two-dimensional matrix from a proper subset of elements of a row or a column of the first two-dimensional matrix and a proper subset of elements of a row or a column of the second two-dimensional matrix and store the third two-dimensional matrix at a destination in the matrix operations accelerator circuit, and the execution circuit of the hardware processor core to execute the decoded instruction according to the opcode.
-
-
-
-
-
-
-
-
-