Patent search: ap:("Zeev Sperber" OR "Robert Valentine" OR "Yuval Bustan" OR "Rafi Marom") AND inv:"Robert Valentine" (results page 9)

81.

Patent application
METHOD AND APPARATUS FOR EFFICIENT MATRIX ALIGNMENT IN A SYSTOLIC ARRAY (status: under examination, published)

Publication No.: US20190042262A1

Publication date: 2019-02-07

Application No.: US16147506

Filing date: 2018-09-28

Applicants: Michael Espig, Bret Toll, Raanan Sade, Robert Valentine, Alexander Heinecke

Inventors: Michael Espig, Bret Toll, Raanan Sade, Robert Valentine, Alexander Heinecke

IPC classification: G06F9/38, G06F15/80, G06F9/30

Abstract: An apparatus and method for efficient matrix alignment in a systolic array. For example, one embodiment of a processor comprises: a first set of physical tile registers to store first matrix data in rows or columns; a second set of physical tile registers to store second matrix data in rows or columns; a decoder to decode a matrix instruction identifying a first input matrix, a first offset, a second input matrix, and a second offset; and execution circuitry, responsive to the matrix instruction, to read a subset of rows or columns from the first set of physical tile registers in accordance with the first offset, spanning multiple physical tile registers from the first set if indicated by the first offset to generate a first input matrix and the execution circuitry to read a subset of rows or columns from the second set of physical tile registers in accordance with the second offset, spanning multiple physical tile registers from the second set if indicated by the second offset to generate a second input matrix; and the execution circuitry to perform an arithmetic operation with the first and second input matrices in accordance with an opcode of the matrix instruction.
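
The offset-based read described in this abstract can be pictured with a rough software model. This is only a sketch under stated assumptions: each tile register is modeled as a list of rows, the offset is taken to be a row index into the concatenation of the registers in a set, and a plain matrix product stands in for whatever arithmetic the opcode selects. The names `read_aligned` and `matmul` are hypothetical and do not come from the patent.

```python
def read_aligned(tiles, offset, rows):
    """Return `rows` consecutive rows starting at row `offset`.

    `tiles` is a list of tile registers, each modeled as a list of rows
    (an assumption). If the window does not start on a register boundary,
    it spans more than one register.
    """
    flat = [row for tile in tiles for row in tile]
    if offset < 0 or offset + rows > len(flat):
        raise ValueError("offset window falls outside the tile set")
    return flat[offset:offset + rows]


def matmul(a, b):
    """Plain matrix product, standing in for the opcode's arithmetic."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Two tile registers of two rows each per set (kept small for readability).
set_a = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
set_b = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

a = read_aligned(set_a, offset=1, rows=2)  # rows 1..2: spans both registers
b = read_aligned(set_b, offset=2, rows=2)  # rows 2..3: second register only
result = matmul(a, b)
print(a)       # [[0, 1], [2, 0]]
print(result)  # [[7, 8], [10, 12]]
```

With `offset=1` the first input matrix is assembled from the last row of one register and the first row of the next, which is the "spanning multiple physical tile registers" case the abstract mentions.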

82.

Granted patent
Instruction and logic to provide vector blend and permute functionality (status: in force)

Publication No.: US10037205B2

Publication date: 2018-07-31

Application No.: US13977734

Filing date: 2011-12-23

Applicants: Robert Valentine, Bret L. Toll, Jesus Corbal, Jeffrey G. Wiedemeier, Sridhar Samudrala

Inventors: Robert Valentine, Bret L. Toll, Jesus Corbal, Jeffrey G. Wiedemeier, Sridhar Samudrala

IPC classification: G06F15/00, G06F15/76, G06F9/30, G06F9/38

CPC classification: G06F9/30036, G06F9/3001, G06F9/30018, G06F9/30032, G06F9/3887

Abstract: Vector blend and permute functionality are provided, responsive to instructions specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, a second vector register, and a third operand. Indices are read from fields in the second register. Each index has a first selector portion and a second selector portion. Corresponding unmasked vector elements are stored to fields of the destination register, wherein each vector element, responsive to the respective first selector portion having a first value, is copied to an intermediate vector from a corresponding data field of the first register, and responsive to the respective first selector portion having a second value, is copied to the intermediate vector from a corresponding data field of the third operand. Then unmasked data fields of the destination are replaced by data fields in the intermediate vector indexed by the corresponding second selector portions.
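
The two-stage behavior in this abstract (blend into an intermediate vector, then permute out of it) can be sketched as follows. The bit layout is an assumption made for illustration: the low bits of each index are treated as the second selector portion (a position in the intermediate vector) and the next bit up as the first selector portion (the blend choice). The function name `blend_permute` is hypothetical, and masked-off destination fields are assumed to keep their old values.

```python
def blend_permute(dest, src1, indices, src3, mask):
    """Model of a blend followed by a permute, per the abstract's two stages.

    Assumes len(dest) is a power of two. For each index, the bit just above
    the position bits selects the blend source; the position bits select
    which intermediate element is written to the destination.
    """
    n = len(dest)
    pos_bits = (n - 1).bit_length()

    # Stage 1: blend element-wise from src1 or src3 into an intermediate vector.
    intermediate = [
        src3[i] if (indices[i] >> pos_bits) & 1 else src1[i]
        for i in range(n)
    ]

    # Stage 2: each unmasked destination field takes the intermediate element
    # named by its index; masked-off fields keep their previous contents.
    return [
        intermediate[indices[i] & (n - 1)] if mask[i] else dest[i]
        for i in range(n)
    ]


out = blend_permute(
    dest=[0, 0, 0, 0],
    src1=[10, 11, 12, 13],
    indices=[0b111, 0b010, 0b101, 0b000],  # blend bits 1,0,1,0; positions 3,2,1,0
    src3=[20, 21, 22, 23],
    mask=[1, 1, 0, 1],
)
print(out)  # [13, 22, 0, 20]
```

Here the intermediate vector is `[20, 11, 22, 13]`, and the permute stage reverses it into the unmasked destination fields.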

83.

Patent application
METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY (status: under examination, published)

Publication No.: US20180196672A1

Publication date: 2018-07-12

Application No.: US15912498

Filing date: 2018-03-05

Applicants: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine

Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine

IPC classification: G06F9/30, G06F9/38

CPC classification: G06F9/30145, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/3834

Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
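
The per-field count described in this abstract amounts to an element-wise leading-zero count. A minimal sketch, assuming 32-bit data fields (the width is a parameter here, and the function name `vector_lzcnt` is hypothetical):

```python
def vector_lzcnt(src, width=32):
    """Count the most significant contiguous zero bits in each data field.

    Each element is truncated to `width` bits first; a field of all zeros
    yields `width`.
    """
    field_mask = (1 << width) - 1
    return [width - (x & field_mask).bit_length() for x in src]


counts = vector_lzcnt([1, 0, 0x80000000, 0xFF])
print(counts)  # [31, 32, 0, 24]
```

The sketch covers only the counting step; how the counts are turned into permute controls and completion masks is not modeled here.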

84.

Granted patent
Systems, apparatuses, and methods for data speculation execution (status: in force)

Publication No.: US09785442B2

Publication date: 2017-10-10

Application No.: US14582897

Filing date: 2014-12-24

Applicants: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar

Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Robert Valentine, Milind B. Girkar

IPC classification: G06F9/30, G06F9/34, G06F9/46

CPC classification: G06F9/3016, G06F9/30043, G06F9/30087, G06F9/30098, G06F9/34, G06F9/3455, G06F9/3834, G06F9/3842, G06F9/3855, G06F9/3859, G06F9/3861, G06F9/467

Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address and an operand to store a stride value, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.

85.

Patent application
Systems, Apparatuses, and Methods for Aggregate Gather and Stride (status: under examination, published)

Publication No.: US20170192782A1

Publication date: 2017-07-06

Application No.: US14984132

Filing date: 2015-12-30

Applicants: Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Ashish Jha

Inventors: Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall, Ashish Jha

IPC classification: G06F9/30

CPC classification: G06F9/3016, G06F9/30043, G06F9/30098, G06F9/30109, G06F9/30112

Abstract: Embodiments of systems, apparatuses, and methods for aggregate gather and scatter are disclosed. In some embodiments, a decoder to decode an instruction, wherein the instruction to include fields for an index of memory address locations, an immediate, and a starting destination register operand and identifier of additional destination registers; and execution circuitry to execute the decoded instruction to gather, from memory at locations indicated by the index of memory locations, data elements and stores them in multiple destination registers in sizes dictated by the immediate are described.

86.

Granted patent
Compressed instruction format (status: in force)

Publication No.: US09569208B2

Publication date: 2017-02-14

Application No.: US14307468

Filing date: 2014-06-17

Applicants: Robert Valentine, Doron Orenstein, Brett L. Toll

Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll

IPC classification: G06F9/30, G06F9/38

CPC classification: G06F9/30, G06F9/30145, G06F9/30149, G06F9/3017, G06F9/30174, G06F9/30178, G06F9/30185, G06F9/3816, G06F9/382

Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.

87.

Patent application
EFFICIENT INSTRUCTION FUSION BY FUSING INSTRUCTIONS THAT FALL WITHIN A COUNTER-TRACKED AMOUNT OF CYCLES APART (status: under examination, published)

Publication No.: US20170003965A1

Publication date: 2017-01-05

Application No.: US15143520

Filing date: 2016-04-30

Applicants: Ido Ouziel, Lihu Rappoport, Robert Valentine, Ron Gabor, Pankaj Raghuvanshi

Inventors: Ido Ouziel, Lihu Rappoport, Robert Valentine, Ron Gabor, Pankaj Raghuvanshi

IPC classification: G06F9/30, G06F12/0875

CPC classification: G06F9/3853, G06F9/3016, G06F9/3017, G06F9/30196, G06F9/3836, G06F12/084, G06F12/0875, G06F13/4063, G06F2212/452, G06F2212/62, Y02D10/14, Y02D10/151

Abstract: A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction.

88.

Granted patent
Floating point round-off amount determination processors, methods, systems, and instructions (status: in force)

Publication No.: US09513871B2

Publication date: 2016-12-06

Application No.: US13977257

Filing date: 2011-12-30

Applicants: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein

Inventors: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein

IPC classification: G06F7/483, G06F9/30, G06F7/499

CPC classification: G06F9/3001, G06F7/483, G06F7/49947, G06F9/30014, G06F9/30025, G06F9/30036, G06F9/30109, G06F9/3013

Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
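
The difference described in this abstract can be illustrated numerically. This is a sketch only: rounding to a given number of fraction bits is modeled by scaling by a power of two, rounding to an integer, and scaling back, and round-half-to-even (the behavior of Python's `round`) is assumed because the abstract does not name a rounding mode. The function name `round_off_amount` is hypothetical.

```python
import math


def round_off_amount(values, fraction_bits):
    """For each element, return x minus x rounded to `fraction_bits` binary
    fraction bits. Rounding mode is assumed to be round-half-to-even."""
    result = []
    for x in values:
        scaled = math.ldexp(x, fraction_bits)               # x * 2**fraction_bits
        rounded = math.ldexp(round(scaled), -fraction_bits)  # back to x's scale
        result.append(x - rounded)
    return result


amounts = round_off_amount([1.375, 2.5, -0.3125], fraction_bits=2)
print(amounts)  # [-0.125, 0.0, -0.0625]
```

With two fraction bits, 1.375 rounds to 1.5 (difference -0.125), 2.5 is already representable (difference 0.0), and -0.3125 rounds to -0.25 (difference -0.0625).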

89.

Granted patent
Multi-element instruction with different read and write masks (status: in force)

Publication No.: US09489196B2

Publication date: 2016-11-08

Application No.: US13997998

Filing date: 2011-12-23

Applicants: Mikhail Plotnikov, Andrey Naraikan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Bret L. Toll, Jesus Corbal

Inventors: Mikhail Plotnikov, Andrey Naraikan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Bret L. Toll, Jesus Corbal

IPC classification: G06F7/76, G06F9/30

CPC classification: G06F9/3013, G06F7/764, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30029, G06F9/30036

Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation of the set elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
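
The sequence of steps in this abstract can be sketched as a masked reduce followed by a masked broadcast. The choice of a sum as the operation, the merging behavior for fields the write mask leaves out, and the name `masked_reduce_broadcast` are all assumptions made for illustration.

```python
def masked_reduce_broadcast(src, read_mask, write_mask, dest, op=sum):
    """Apply a read mask, operate on the selected elements, replicate the
    result, then apply a separate write mask.

    Destination fields not selected by the write mask keep their previous
    contents (an assumption; the abstract does not specify this).
    """
    # Read mask selects which source elements take part in the operation.
    selected = [x for x, m in zip(src, read_mask) if m]
    result = op(selected)

    # Output vector holds multiple instances of the operation's result.
    output = [result] * len(dest)

    # Write mask decides which instances reach the destination.
    return [o if m else d for o, m, d in zip(output, write_mask, dest)]


resultant = masked_reduce_broadcast(
    src=[1, 2, 3, 4],
    read_mask=[1, 0, 1, 1],
    write_mask=[0, 1, 1, 0],
    dest=[9, 9, 9, 9],
)
print(resultant)  # [9, 8, 8, 9]
```

The read mask picks 1, 3 and 4 (sum 8), and the write mask places that result only in the two middle destination fields.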

90.

Patent application
EFFICIENT INSTRUCTION FUSION BY FUSING INSTRUCTIONS THAT FALL WITHIN A COUNTER-TRACKED AMOUNT OF CYCLES APART (status: under examination, published)

Publication No.: US20160246600A1

Publication date: 2016-08-25

Application No.: US15143522

Filing date: 2016-04-30

Applicants: Ido Ouziel, Lihu Rappoport, Robert Valentine, Ron Gabor, Pankaj Raghuvanshi

Inventors: Ido Ouziel, Lihu Rappoport, Robert Valentine, Ron Gabor, Pankaj Raghuvanshi

IPC classification: G06F9/30, G06F13/40, G06F12/08

CPC classification: G06F9/3853, G06F9/3016, G06F9/3017, G06F9/30196, G06F9/3836, G06F12/084, G06F12/0875, G06F13/4063, G06F2212/452, G06F2212/62, Y02D10/14, Y02D10/151

Abstract: A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction.
