Patent search: ap:("Sridhar Samudrala" OR "John D. Clouser" OR "William R. Grundmann") AND inv:"Sridhar Samudrala" (page 1)

1.

Granted patent
Method and apparatus for accumulating partial quotients in a digital processor (Active)

Publication number: US06732135B1

Publication date: 2004-05-04

Application number: US09494593

Filing date: 2000-01-31

Applicants: Sridhar Samudrala, John D. Clouser, William R. Grundmann

Inventors: Sridhar Samudrala, John D. Clouser, William R. Grundmann

IPC classification: G06F7/52

CPC classification: G06F7/535, G06F2207/5355

Abstract: In a digital processor performing division, quotient accumulation apparatus is formed of a set of muxes and a single carry save adder. Partial quotients are accumulated in carry-save form with proper sign extension. Delay of partial quotient bit fragments from one iteration to a following iteration enables the apparatus to limit use to one carry save adder. By enlarging minimal logic, the quotient accumulation apparatus operates at a rate fast enough to support the rate of fast dividers.

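The carry-save accumulation named in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the function names and the digit-by-digit loop are assumptions made for illustration, the mux network and the delayed bit fragments are not modelled, and Python's unbounded integers stand in for explicit sign extension.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three addends into a (sum, carry) pair without propagating carries."""
    partial_sum = a ^ b ^ c                       # bitwise sum, carries ignored
    carry = ((a & b) | (a & c) | (b & c)) << 1    # majority bits, moved up one position
    return partial_sum, carry


def accumulate_quotient(digits: list[int], radix_bits: int) -> int:
    """Accumulate signed partial quotient digits, most significant digit first.

    The running quotient stays in carry-save form (two words) and is only
    resolved with a full carry-propagating addition at the very end.
    """
    acc_sum, acc_carry = 0, 0
    for digit in digits:
        # Shift both words left by one digit position, then fold in the new digit.
        acc_sum, acc_carry = carry_save_add(
            acc_sum << radix_bits, acc_carry << radix_bits, digit
        )
    return acc_sum + acc_carry


# Radix-4 digits 1, -2, 3, -1 encode ((1*4 - 2)*4 + 3)*4 - 1.
quotient = accumulate_quotient([1, -2, 3, -1], radix_bits=2)
```

Only one 3:2 compression happens per iteration, which mirrors the abstract's point that a single carry save adder is enough.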

2.

Granted patent
Vector logical reduction operation implemented using swizzling on a semiconductor chip (Active)

Publication number: US09141386B2

Publication date: 2015-09-22

Application number: US12890485

Filing date: 2010-09-24

Applicants: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver

Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver

IPC classification: G06F9/305, G06F9/30, G06F15/76, G06F9/06, G06F7/00

CPC classification: G06F9/30029, G06F7/00, G06F9/06, G06F9/30032, G06F9/30036, G06F15/76

Abstract: A semiconductor processor is described. The semiconductor processor includes logic circuitry to perform a logical reduction instruction. The logic circuitry has swizzle circuitry to swizzle a vector's elements so as to form a swizzle vector. The logic circuitry also has vector logic circuitry to perform a vector logic operation on said vector and said swizzle vector.

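One way to picture a swizzle-based logical reduction is the sketch below. It assumes a rotation as the swizzle pattern and a power-of-two vector length; the abstract does not specify either, and `swizzle` and `logical_reduce` are hypothetical names.

```python
from operator import and_, or_, xor
from typing import Callable


def swizzle(vec: list[int], distance: int) -> list[int]:
    """Rotate the vector so position i receives the element from position i + distance."""
    n = len(vec)
    return [vec[(i + distance) % n] for i in range(n)]


def logical_reduce(vec: list[int], op: Callable[[int, int], int]) -> int:
    """Reduce a vector with AND, OR, or XOR using log2(n) swizzle-and-combine steps."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a power of two")
    distance = n // 2
    while distance >= 1:
        # Combine the vector with its swizzled copy, element by element.
        vec = [op(x, y) for x, y in zip(vec, swizzle(vec, distance))]
        distance //= 2
    return vec[0]  # every position now holds the full reduction


elements = [0b1100, 0b1010, 0b1111, 0b1001]
all_and = logical_reduce(elements, and_)
all_or = logical_reduce(elements, or_)
all_xor = logical_reduce(elements, xor)
```

Each step applies one vector logic operation to the vector and its swizzled copy, so four elements need two steps rather than three sequential operations.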

3.

Patent application
DOUBLE ROUNDED COMBINED FLOATING-POINT MULTIPLY AND ADD (Active)

Publication number: US20140006467A1

Publication date: 2014-01-02

Application number: US13539198

Filing date: 2012-06-29

Applicants: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel

Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel

IPC classification: G06F7/44, G06F7/42

CPC classification: G06F7/4876, G06F7/483, G06F7/485, G06F7/4991, G06F7/49915, G06F7/5443, G06F2207/4802

Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.

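The distinction between a double-rounded multiply-add and a singly rounded (fused) one can be seen with a small numeric experiment. This sketch only illustrates why the two results can differ; it does not model the hardware in the abstract, and the function names and operand values are chosen here for illustration.

```python
from fractions import Fraction


def double_rounded_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to a double first, then round the sum (two roundings)."""
    product = a * b      # first rounding
    return product + c   # second rounding


def single_rounded_multiply_add(a: float, b: float, c: float) -> float:
    """Evaluate a*b + c exactly with rationals, then round once, as a fused operation would."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

separate = double_rounded_multiply_add(a, b, c)
fused = single_rounded_multiply_add(a, b, c)
```

Here `separate` is `0.0` because the product rounds to `1.0` before the addition, while `fused` is `-2**-60`. Code written for separate multiply and add instructions expects the first result, which is why a combined operation has to reproduce the intermediate rounding to stay bit-compatible.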

4.

Granted patent
Computer method and apparatus for division and square root operations using signed digit (Active)

Publication number: US06564239B2

Publication date: 2003-05-13

Application number: US10016902

Filing date: 2001-12-14

Applicants: Mark D. Matson, Robert J. Dupcak, Jonathan D. Krause, Sridhar Samudrala

Inventors: Mark D. Matson, Robert J. Dupcak, Jonathan D. Krause, Sridhar Samudrala

IPC classification: G06F7/38

CPC classification: G06F7/535, G06F7/4824, G06F7/508, G06F7/5525, G06F9/3814, G06F9/3838, G06F9/384, G06F2207/5352

Abstract: Computer method and apparatus for performing a square root or division operation generating a root or quotient is presented. A partial remainder is stored in radix-2 or radix-4 signed digit format. A decoder is provided for computing a root or quotient digit, and a correction term dependent on a number of the most significant digits of the partial remainder. An adder is provided for computing the sum of the signed digit partial remainder and the correction term in binary format, and providing the result in signed digit format. The adder computes a carry out independent of a carry in bit and a sum dependent on a Carry_in bit providing a fast adder independent of carry propagate delays. The scaler performs a multiplication by two of the result output from the adder in signed digit format to provide a signed digit next partial remainder.

5.

Granted patent
Mechanism for facilitating dynamic and efficient fusion of computing instructions in software programs (Active)

Publication number: US09329848B2

Publication date: 2016-05-03

Application number: US14129956

Filing date: 2013-03-27

Applicants: Marc Lupon, Raul Martinez, Enric Gibert Codina, Kyriakos A. Stavrou, Grigorios Magklis, Sridhar Samudrala

Inventors: Marc Lupon, Raul Martinez, Enric Gibert Codina, Kyriakos A. Stavrou, Grigorios Magklis, Sridhar Samudrala

IPC classification: G06F9/45

CPC classification: G06F8/443, G06F8/4432, G06F8/4434, G06F8/4441, Y02D10/41

Abstract: A mechanism is described for facilitating dynamic and efficient fusion of computing instructions according to one embodiment. A method of embodiments, as described herein, includes monitoring a software program for a program region having fusion candidate instructions for a fusion operation at a computing system; evaluating whether the macro operation of the candidate instructions is valuable to the software program; and performing the fusion operation if it is evaluated to be valuable.

6.

Granted patent
Functional unit for vector leading zeroes, vector trailing zeroes, vector operand 1s count and vector parity calculation (Active)

Publication number: US09092213B2

Publication date: 2015-07-28

Application number: US12890457

Filing date: 2010-09-24

Applicants: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin

Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin

IPC classification: G06F7/38, G06F9/00, G06F9/44, G06F15/00, G06F9/30

CPC classification: G06F9/30036, G06F9/30014, G06F9/30018

Abstract: A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction.

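For readers unfamiliar with the four operations named in this entry's title, the sketch below shows their per-element semantics. It assumes unsigned 32-bit elements and uses hypothetical function names; it says nothing about how the patented functional unit shares hardware with the multiply-add path.

```python
from typing import Callable

ELEMENT_BITS = 32  # assumed element width; the abstract does not fix one


def leading_zeros(x: int, width: int = ELEMENT_BITS) -> int:
    """Count zero bits above the most significant 1 (width if x is 0)."""
    return width - x.bit_length()


def trailing_zeros(x: int, width: int = ELEMENT_BITS) -> int:
    """Count zero bits below the least significant 1 (width if x is 0)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit


def ones_count(x: int) -> int:
    """Count the 1 bits in the element."""
    return bin(x).count("1")


def parity(x: int) -> int:
    """Return 1 if the element has an odd number of 1 bits, else 0."""
    return ones_count(x) & 1


def vector_apply(func: Callable[[int], int], vec: list[int]) -> list[int]:
    """Apply a scalar bit operation independently to every vector element."""
    return [func(x) for x in vec]


vec = [0, 1, 0x80000000, 0x00F0]
lz = vector_apply(leading_zeros, vec)
tz = vector_apply(trailing_zeros, vec)
ones = vector_apply(ones_count, vec)
par = vector_apply(parity, vec)
```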

7.

Patent application
VECTOR FRIENDLY INSTRUCTION FORMAT AND EXECUTION THEREOF (Pending, published)

Publication number: US20130305020A1

Publication date: 2013-11-14

Application number: US13976707

Filing date: 2011-09-30

Applicants: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount

Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount

IPC classification: G06F9/30

CPC classification: G06F9/30145, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30025, G06F9/30032, G06F9/30036, G06F9/30047, G06F9/30149, G06F9/30181, G06F9/30185, G06F9/30192, G06F9/34

Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.

8.

Patent application
SYSTEMS, APPARATUSES, AND METHODS FOR BLENDING TWO SOURCE OPERANDS INTO A SINGLE DESTINATION USING A WRITEMASK (Pending, published)

Publication number: US20120254588A1

Publication date: 2012-10-04

Application number: US13078864

Filing date: 2011-04-01

Applicants: Jesus Corbal San Adrian, Bret L. Toll, Robert C. Valentine, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Andrew Thomas Forsyth, Elmoustapha Ould-Ahmed-Vall, Dennis R. Bradford, Lisa K. Wu

Inventors: Jesus Corbal San Adrian, Bret L. Toll, Robert C. Valentine, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Andrew Thomas Forsyth, Elmoustapha Ould-Ahmed-Vall, Dennis R. Bradford, Lisa K. Wu

IPC classification: G06F9/30

CPC classification: G06F9/30192, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30043

Abstract: Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.

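The element-by-element selection described in this abstract can be sketched as follows. The polarity (a mask bit of 1 selecting the second source) and the name `blend` are assumptions for illustration; the abstract only states that each mask bit acts as the selector for its position.

```python
def blend(src1: list[int], src2: list[int], writemask: int) -> list[int]:
    """Build the destination by picking, per position, from src1 or src2.

    Bit i of the writemask selects the source for element i:
    0 keeps the element from src1, 1 takes the element from src2 (assumed polarity).
    """
    if len(src1) != len(src2):
        raise ValueError("source operands must have the same number of elements")
    return [
        second if (writemask >> i) & 1 else first
        for i, (first, second) in enumerate(zip(src1, src2))
    ]


destination = blend([10, 11, 12, 13], [20, 21, 22, 23], writemask=0b0101)
```

With mask `0b0101`, positions 0 and 2 come from the second source and positions 1 and 3 from the first, giving `[20, 11, 22, 13]`.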

9.

Granted patent
Method and system of a microprocessor subtraction-division floating point divider (Lapsed)

Publication number: US07127483B2

Publication date: 2006-10-24

Application number: US10036116

Filing date: 2001-12-26

Applicants: Andrew J. Beaumont-Smith, Sridhar Samudrala

Inventors: Andrew J. Beaumont-Smith, Sridhar Samudrala

IPC classification: G06F7/44

CPC classification: G06F7/535, G06F7/4873, G06F7/49936, G06F7/5375

Abstract: The specification discloses a structure of and a method of operating a subtractive division (SD) cell where a portion of the partial remainder or estimated partial remainder directly indicates the next quotient digit. More particularly, by sufficiently constraining the prescaled range for each possible divisor, only a few bits of the partial remainder (the exact number dependent upon the radix), along with their related carries (if any), directly indicate the value of the next quotient digit. Because fewer bits of the partial remainder are needed to make this determination than needed in related art devices, and further because no look-up table or hard-coded decision tree is required, calculation time within each SD cell is shorter than related art devices. Having a shorter calculation time within each SD cell allows for either completion of a greater number of SD cells within each clock cycle, or completion of the calculation to full precision in less time.

10.

Granted patent
Method and apparatus for controlling a rounding operation in a floating point multiplier circuit (Lapsed)

Publication number: US5341319A

Publication date: 1994-08-23

Application number: US016058

Filing date: 1993-02-10

Applicants: William C. Madden, Vidya Rajagopalan, Sridhar Samudrala

Inventors: William C. Madden, Vidya Rajagopalan, Sridhar Samudrala

IPC classification: G06F7/487, G06F7/52, G06F7/38

CPC classification: G06F7/4876, G06F7/49952

Abstract: A floating point multiply of two n-bit operands creates a 2n-bit result, but ordinarily only n-bit precision is needed, so rounding is performed. Some rounding algorithms require the knowledge of the presence of any "1" in the n-2 low-order bits of the 2n-bit result. The presence of such a "1" indicates the so-called "sticky bit" is set. The sticky bit is calculated in a path separate from the multiply operation, so the n-2 least significant sums need not be calculated. This saves time and circuitry in an array multiplier, for example. In an example method, the difference between n and the number of trailing zeros, "x", in one of the n-bit operands is detected, by transposing the operand and detecting the leading one. The other operand is right-shifted by a number of bit positions equal to this difference. A sticky bit is generated if any logic "1's" are in the low-order n-x-2 bits right-shifted out of the second operand.

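The sticky-bit shortcut in this abstract rests on a simple fact: the number of trailing zeros of a product equals the sum of the trailing zeros of its (nonzero) operands. The sketch below checks that idea in software. It assumes nonzero unsigned n-bit operands, uses a mask instead of the shifter described in the abstract, and all function names are hypothetical.

```python
def trailing_zeros(x: int) -> int:
    """Count zero bits below the least significant 1 of a nonzero integer."""
    return (x & -x).bit_length() - 1


def sticky_from_product(a: int, b: int, n: int) -> bool:
    """Reference: form the full product and test its n-2 low-order bits."""
    low_mask = (1 << (n - 2)) - 1
    return ((a * b) & low_mask) != 0


def sticky_from_operands(a: int, b: int, n: int) -> bool:
    """Derive the sticky bit without multiplying.

    With x trailing zeros in a, the low n-2 bits of a*b contain a 1
    exactly when the low n-x-2 bits of b contain a 1.
    """
    x = trailing_zeros(a)
    inspected_bits = n - x - 2
    if inspected_bits <= 0:
        return False  # a alone already contributes n-2 or more trailing zeros
    return (b & ((1 << inspected_bits) - 1)) != 0


# Exhaustive comparison for a small width (n = 6 is an arbitrary choice).
N = 6
agrees = all(
    sticky_from_operands(a, b, N) == sticky_from_product(a, b, N)
    for a in range(1, 1 << N)
    for b in range(1, 1 << N)
)
```

Because only one operand's trailing-zero count and a few bits of the other operand are examined, the check can run alongside the multiplier rather than waiting for its low-order sums.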
