Patent search: ap:("Prasoonkumar Surti" OR "Narayan Srinivasa" OR "Feng Chen" OR "Joydeep Ray" OR "Ben J. Ashbaugh" OR "Nicolas C. Galoppo Von Borries" OR "Eriko Nurvitadhi" OR "Balaji Vembu" OR "Tsung-Han Lin" OR "Kamal Sinha" OR "Rajkishore Barik" OR "Sara S. Baghsorkhi" OR "Justin E. Gottschlich" OR "Altug Koker" OR "Nadathur Rajagopalan Satish" OR "Farshad Akhbari" OR "Dukhwan Kim" OR "Wenyin Fu" OR "Travis T. Schluessler" OR "Josh B. Mastronarde" OR "Linda L. Hurd" OR "John H. Feit" OR "Jeffery S. Boles" OR "Adam T. Lake" OR "Karthik Vaidyanathan" OR "Devan Burke" OR "Subramaniam Maiyuran" OR "Abhishek R. Appu") AND inv:"Sara S. Baghsorkhi" (page 1)

1.

Patent application
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS [pending, published]

Publication No.: US20180308200A1

Publication date: 2018-10-25

Application No.: US15494886

Filing date: 2017-04-24

Applicants: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu

IPC classes: G06T1/20, G06F17/16, G06T1/60

CPC classes: G06T1/20, G06F8/41, G06F9/45533, G06F9/5061, G06F9/5094, G06F2009/45583, G06N3/0445, G06N3/0454, G06N3/063, G06N3/084

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.

2.

Patent application
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS [pending, published]

Publication No.: US20180308206A1

Publication date: 2018-10-25

Application No.: US15698217

Filing date: 2017-09-07

Applicants: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu

Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu

IPC classes: G06T1/20, G06T1/60, G09G5/36, G06F3/06, G06N3/08

CPC classes: G06T1/20, G06F3/0613, G06F3/0659, G06F3/0679, G06F3/1438, G06N3/0445, G06N3/0454, G06N3/063, G06N3/08, G06N3/084, G06T1/60, G09G5/001, G09G5/363, G09G2352/00, G09G2360/06, G09G2360/08, G09G2360/121, G09G2360/123, G09G2370/08

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.

3.

Patent application
FORWARD-LOOKING MACHINE LEARNING FOR DECISION SYSTEMS [pending, published]

Publication No.: US20180129970A1

Publication date: 2018-05-10

Application No.: US15348678

Filing date: 2016-11-10

Applicants: Justin E. Gottschlich, Thijs Metsch, Leonard Truong, Tatiana Shpeisman, Sara S. Baghsorkhi

Inventors: Justin E. Gottschlich, Thijs Metsch, Leonard Truong, Tatiana Shpeisman, Sara S. Baghsorkhi

IPC classes: G06N99/00, G06N5/04

CPC classes: G06N3/0427, G06N3/0454, G06N3/08

Abstract: A machine-learning decision system includes an online decision system and an offline decision system. The online decision system produces a first time slice-specific decision output corresponding to a first time slice based on one or more situational inputs received in the first time slice. The offline decision system produces a second time slice-specific decision output corresponding to the first time slice based on one or more situational inputs received in the first time slice and in a plurality of subsequent time slices occurring after the first time slice. The system further includes an online training system that conducts negative-reinforcement training of the online decision system in response to a nonconvergence between the first and the second time slice-specific decision outputs.

4.

Granted patent
Apparatus and method for propagating conditionally evaluated values in SIMD/vector execution using an input mask register [in force]

Publication No.: US09798541B2

Publication date: 2017-10-24

Application No.: US13997183

Filing date: 2011-12-23

Applicants: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi

IPC classes: G06F9/30, G06F9/38

CPC classes: G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30098, G06F9/30192, G06F9/3877

Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
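
The steps in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `propagate_positions`, the list-of-booleans model of the mask register, and the use of `None` for false values that come before the first true value (a case the abstract does not describe) are all assumptions made for illustration.

```python
def propagate_positions(mask):
    """Software model of the mask-propagation steps described in the abstract.

    mask: list of booleans standing in for the input mask register.
    Returns a list standing in for the output register.
    """
    vlen = len(mask)          # vector length of the input mask register
    out = [None] * vlen       # None = not written (assumption, see lead-in)
    last_true = None          # bit position of the last true value read

    for pos, bit in enumerate(mask):
        if bit:
            # True value: the result is its own bit position.
            last_true = pos
            out[pos] = pos
        elif last_true is not None:
            # False value after the first true value:
            # vector length + bit position of the last true value.
            out[pos] = vlen + last_true
    return out


# Example: a 6-element mask with true values at positions 1 and 4.
result = propagate_positions([False, True, False, False, True, False])
# result == [None, 1, 7, 7, 4, 10]
```

Adding the vector length keeps propagated entries distinguishable from directly generated ones, since every direct result is smaller than the vector length.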

5.

Patent application
APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION [in force]

Publication No.: US20140189323A1

Publication date: 2014-07-03

Application No.: US13997183

Filing date: 2011-12-23

Applicants: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi

IPC classes: G06F9/30

CPC classes: G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30098, G06F9/30192, G06F9/3877

Abstract: An apparatus and method for propagating conditionally evaluated values. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.

6.

Granted patent
Apparatus and method for vectorization with speculation support [in force]

Publication No.: US09268626B2

Publication date: 2016-02-23

Application No.: US13997664

Filing date: 2011-12-23

Applicants: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi

Inventors: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi

IPC classes: G06F11/00, G06F11/07, G06F9/30

CPC classes: G06F11/0751, G06F9/30018, G06F9/30036, G06F9/30043

Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.
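
A rough software model of the gather behavior in this abstract might look like the following Python sketch. Everything beyond the abstract is an assumption: memory is modeled as a dictionary, a missing address stands in for a fault condition, the name `gather_with_fault_mask` is hypothetical, and a fault on the first active element is simply raised, because the abstract only covers faults on elements other than the first active one.

```python
def gather_with_fault_mask(addresses, active, memory):
    """Gather memory[addresses[i]] for active elements, in element order.

    Returns (data, fault_mask): data holds the values loaded before any
    fault; fault_mask marks the faulting element and every later active one.
    """
    n = len(addresses)
    data = [None] * n            # stands in for the output vector register
    fault_mask = [False] * n     # stands in for the output mask register
    faulted = False
    seen_active = False

    for i in range(n):
        if not active[i]:
            continue
        if not faulted and addresses[i] not in memory:
            if not seen_active:
                # Assumption: a fault on the first active element is
                # delivered normally instead of being recorded in the mask.
                raise MemoryError(f"fault on first active element {i}")
            faulted = True
        if faulted:
            fault_mask[i] = True  # faulting or following active element
        else:
            data[i] = memory[addresses[i]]
        seen_active = True
    return data, fault_mask


memory = {0: "a", 4: "b", 8: "c"}
data, faults = gather_with_fault_mask([0, 4, 99, 8], [True] * 4, memory)
# data   == ["a", "b", None, None]
# faults == [False, False, True, True]
```

A caller could use the returned mask to retry only the flagged elements, for example in a scalar loop.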

7.

Patent application
INSTRUCTION TO REDUCE ELEMENTS IN A VECTOR REGISTER WITH STRIDED ACCESS PATTERN [in force]

Publication No.: US20140189288A1

Publication date: 2014-07-03

Application No.: US13993653

Filing date: 2012-12-28

Applicants: Albert Hartono, Jayashankar Bharadwaj, Nalini Vasudevan, Sara S. Baghsorkhi, Victor W. Lee, Daehyun Kim

Inventors: Albert Hartono, Jayashankar Bharadwaj, Nalini Vasudevan, Sara S. Baghsorkhi, Victor W. Lee, Daehyun Kim

IPC classes: G06F9/30

CPC classes: G06F9/30036, G06F9/3001, G06F9/30018, G06F9/30065, G06F9/3455

Abstract: A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data element of a third vector register. The previous data element is located more than one element position away from the current element position.
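
One way to picture the strided reduction in this abstract is the Python sketch below. It is an illustration under assumptions, not the patented instruction: the name `strided_reduce`, the `stride` parameter, addition as the default associative operation, and leaving masked-off elements and the first `stride` elements unchanged are all choices made here.

```python
import operator


def strided_reduce(v1, v3, mask, stride=2, op=operator.add):
    """Sequential masked reduction with a non-unit stride.

    For each enabled position i (in order), the element of the first vector
    is set to op(previous element `stride` positions back, v3[i]).
    """
    if stride < 2:
        raise ValueError("the abstract describes a stride greater than one")
    out = list(v1)                      # copy of the first vector register
    for i in range(stride, len(out)):
        if mask[i]:
            # out[i - stride] may itself have been updated earlier in the
            # loop, which is why the elements are processed sequentially.
            out[i] = op(out[i - stride], v3[i])
    return out


result = strided_reduce([1, 2, 0, 0, 0, 0], [0, 0, 3, 4, 5, 6], [True] * 6)
# result == [1, 2, 4, 6, 9, 12]
```

With a stride of 2 the even and odd positions form two independent running sums, which is the access pattern an interleaved reduction loop would produce.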

8.

Patent application
APPARATUS AND METHOD FOR SELECTING ELEMENTS OF A VECTOR COMPUTATION [pending, published]

Publication No.: US20130332701A1

Publication date: 2013-12-12

Application No.: US13996521

Filing date: 2011-12-23

Applicants: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi

IPC classes: G06F9/30

CPC classes: G06F9/30098, G06F9/30018, G06F9/30036

Abstract: An apparatus and method are described for selecting elements to be used in a vector computation. For example, a method according to one embodiment includes the following operations: specifying whether to identify the first, last or next after last active element of an input mask register using an immediate value; identifying the first, last or next after last active element in the input mask register according to the immediate value; reading a value from an input vector register corresponding to the identified first, last or next after last active element in the input mask register; and writing the value to an output vector register.
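
The selection described in this abstract could be modeled in Python roughly as follows. The immediate encoding (`FIRST = 0`, `LAST = 1`, `NEXT_AFTER_LAST = 2`), the function name `select_element`, and returning `None` when no element is active or the selected position falls outside the vector are assumptions; the abstract does not specify them.

```python
# Hypothetical immediate encodings; the abstract gives no concrete values.
FIRST, LAST, NEXT_AFTER_LAST = 0, 1, 2


def select_element(mask, vec, imm):
    """Return the element of vec chosen by the mask and the immediate."""
    active = [i for i, bit in enumerate(mask) if bit]
    if not active:
        return None                 # assumption: nothing to select
    if imm == FIRST:
        idx = active[0]
    elif imm == LAST:
        idx = active[-1]
    elif imm == NEXT_AFTER_LAST:
        idx = active[-1] + 1        # position right after the last active one
    else:
        raise ValueError(f"unknown immediate: {imm}")
    # Assumption: a position past the end of the vector selects nothing.
    return vec[idx] if idx < len(vec) else None


mask = [False, True, True, False]
vec = [10, 20, 30, 40]
first = select_element(mask, vec, FIRST)             # 20
last = select_element(mask, vec, LAST)               # 30
after = select_element(mask, vec, NEXT_AFTER_LAST)   # 40
```

The returned value stands in for what the method writes to the output vector register.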

9.

Granted patent
Speculative non-faulting loads and gathers [in force]

Publication No.: US09189236B2

Publication date: 2015-11-17

Application No.: US13725907

Filing date: 2012-12-21

Applicants: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Sara S. Baghsorkhi, Albert Hartono, Daehyun Kim

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Sara S. Baghsorkhi, Albert Hartono, Daehyun Kim

IPC classes: G06F11/00, G06F9/30, G06F11/07

CPC classes: G06F9/30145, G06F9/30018, G06F9/30036, G06F9/30043, G06F11/073, G06F11/0793

Abstract: According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand.

10.

Patent application
AUTOMATIC LOOP VECTORIZATION USING HARDWARE TRANSACTIONAL MEMORY [in force]

Publication No.: US20150268940A1

Publication date: 2015-09-24

Application No.: US14222040

Filing date: 2014-03-21

Applicants: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang

Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang

IPC classes: G06F9/45

CPC classes: G06F8/452

Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
