Patent search: ap:("Michael Espig" OR "Bret Toll" OR "Raanan Sade" OR "Robert Valentine" OR "Alexander Heinecke") AND inv:"Raanan Sade" (page 1)

1.

Patent application
METHOD AND APPARATUS FOR EFFICIENT MATRIX ALIGNMENT IN A SYSTOLIC ARRAY (pending, published)

Publication No.: US20190042262A1

Publication date: 2019-02-07

Application No.: US16147506

Filing date: 2018-09-28

Applicants: Michael Espig, Bret Toll, Raanan Sade, Robert Valentine, Alexander Heinecke

Inventors: Michael Espig, Bret Toll, Raanan Sade, Robert Valentine, Alexander Heinecke

IPC classification: G06F9/38, G06F15/80, G06F9/30

Abstract: An apparatus and method for efficient matrix alignment in a systolic array. For example, one embodiment of a processor comprises: a first set of physical tile registers to store first matrix data in rows or columns; a second set of physical tile registers to store second matrix data in rows or columns; a decoder to decode a matrix instruction identifying a first input matrix, a first offset, a second input matrix, and a second offset; and execution circuitry, responsive to the matrix instruction, to read a subset of rows or columns from the first set of physical tile registers in accordance with the first offset, spanning multiple physical tile registers from the first set if indicated by the first offset to generate a first input matrix and the execution circuitry to read a subset of rows or columns from the second set of physical tile registers in accordance with the second offset, spanning multiple physical tile registers from the second set if indicated by the second offset to generate a second input matrix; and the execution circuitry to perform an arithmetic operation with the first and second input matrices in accordance with an opcode of the matrix instruction.
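
The offset-based read in this abstract can be pictured with a small software sketch. This is only an illustration under stated assumptions, not the patented circuitry: tiles are modeled as lists of rows, the names `read_window` and `matmul` are hypothetical, and a plain matrix product stands in for whatever operation the opcode selects.

```python
from typing import List

Matrix = List[List[int]]


def read_window(tiles: List[Matrix], offset: int, rows: int) -> Matrix:
    """Return `rows` consecutive rows starting `offset` rows into a tile set.

    Assumption: the rows of all tiles in the set form one contiguous
    sequence, so a non-zero offset can make the window span two tiles.
    """
    flat = [row for tile in tiles for row in tile]
    if offset < 0 or offset + rows > len(flat):
        raise ValueError("window falls outside the tile set")
    return flat[offset:offset + rows]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, standing in for the opcode-selected operation."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Two tile sets, each holding two 2x2 tiles (sizes chosen only for the example).
tiles_a = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
tiles_b = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# An offset of 1 makes each 2-row window straddle both tiles of its set.
a = read_window(tiles_a, offset=1, rows=2)
b = read_window(tiles_b, offset=1, rows=2)
result = matmul(a, b)
```

With an offset of 0 the window would come entirely from the first tile; the sketch makes no claim about how the hardware selects or aligns the rows.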

2.

Granted patent
Apparatus and method for memory-hierarchy aware producer-consumer instruction (in force)

Publication No.: US09990287B2

Publication date: 2018-06-05

Application No.: US13994122

Filing date: 2011-12-21

Applicants: Shlomo Raikin, Raanan Sade, Robert Valentine, Julius Yuli Mandelblat, Ron Shalev, Larisa Novakovsky

Inventors: Shlomo Raikin, Raanan Sade, Robert Valentine, Julius Yuli Mandelblat, Ron Shalev, Larisa Novakovsky

IPC classification: G06F13/38, G06T1/20, G06F12/0811, G06F9/30, G06F9/38, G06F13/16, G06T1/60, G09G5/00, G06F12/0866

CPC classification: G06F12/0811, G06F9/30043, G06F9/30047, G06F9/30087, G06F9/3881, G06F12/0866, G06F13/1673, G06F13/38, G06T1/20, G06T1/60, G09G5/006

Abstract: An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
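
The sequence of steps in this abstract (fill a core-local buffer, evict to a shared cache, set an indication, serve the consumer's read) can be sketched in software. The class and attribute names below are hypothetical, and the model ignores real cache lines, coherency, and timing.

```python
class SharedCache:
    """Stand-in for a cache visible to both the CPU core and the GPU."""

    def __init__(self):
        self.lines = []
        self.data_ready = False  # the "indication" the consumer checks


class ProducerCore:
    def __init__(self, cache, chunk_size):
        self.cache = cache
        self.chunk_size = chunk_size  # the "designated amount of data"
        self.buffer = []

    def write(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.chunk_size:
            self._evict()

    def _evict(self):
        # Eviction cycle: move the whole buffer to the shared cache,
        # then flag that data is available.
        self.cache.lines.extend(self.buffer)
        self.buffer.clear()
        self.cache.data_ready = True


class Consumer:
    def __init__(self, cache):
        self.cache = cache

    def read(self):
        # Data is provided only once the indication has been set.
        if not self.cache.data_ready:
            return None
        data = list(self.cache.lines)
        self.cache.lines.clear()
        self.cache.data_ready = False
        return data


cache = SharedCache()
core = ProducerCore(cache, chunk_size=4)
gpu = Consumer(cache)

for v in (1, 2, 3):
    core.write(v)
early = gpu.read()   # buffer not yet full, nothing was evicted
core.write(4)        # fourth write fills the buffer and triggers eviction
late = gpu.read()
```

The point of the sketch is the ordering: the consumer never sees a partially written chunk, because data reaches the shared cache only through the eviction step.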

3.

Patent application
APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTION (in force)

Publication No.: US20140192069A1

Publication date: 2014-07-10

Application No.: US13994122

Filing date: 2011-12-21

Applicants: Shlomo Raikin, Raanan Sade, Robert Valentine, Julius Yuli Mandelblat, Ron Shalev, Larisa Novakovsky

Inventors: Shlomo Raikin, Raanan Sade, Robert Valentine, Julius Yuli Mandelblat, Ron Shalev, Larisa Novakovsky

IPC classification: G06F13/38, G06F13/16, G06T1/60, G06F12/08, G06T1/20

CPC classification: G06F12/0811, G06F9/30043, G06F9/30047, G06F9/30087, G06F9/3881, G06F12/0866, G06F13/1673, G06F13/38, G06T1/20, G06T1/60, G09G5/006

Abstract: An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.

4.

Patent application
APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTIONS (pending, published)

Publication No.: US20140208031A1

Publication date: 2014-07-24

Application No.: US13994724

Filing date: 2011-12-21

Applicants: Shlomo Raikin, Robert Valentine, Raanan Sade, Julius Yuli Mandelbalt, Ron Shalev, Larisa Novakovsky

Inventors: Shlomo Raikin, Robert Valentine, Raanan Sade, Julius Yuli Mandelbalt, Ron Shalev, Larisa Novakovsky

IPC classification: G06F12/08, G06T1/60

CPC classification: G06F12/0811, G06F9/3828, G06F9/3891, G06F12/0891, G06T1/60

Abstract: An apparatus and method are described for efficiently transferring data from a producer core to a consumer core within a central processing unit (CPU). For example, one embodiment of a method comprises: A method for transferring a chunk of data from a producer core of a central processing unit (CPU) to consumer core of the CPU, comprising: writing data to a buffer within the producer core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the fill buffer to a cache accessible by both the producer core and the consumer core; and upon the consumer core detecting that data is available in the cache, providing the data to the consumer core from the cache upon receipt of a read signal from the consumer core.

5.

Patent application
SUSPENDABLE LOAD ADDRESS TRACKING INSIDE TRANSACTIONS (pending, published)

Publication No.: US20180095759A1

Publication date: 2018-04-05

Application No.: US15282011

Filing date: 2016-09-30

Applicants: Raanan Sade, Roman Dementiev, Ravi Rajwar, Ady Tal, Alex Gerber

Inventors: Raanan Sade, Roman Dementiev, Ravi Rajwar, Ady Tal, Alex Gerber

IPC classification: G06F9/30, G06F12/0855, G06F12/123, G06F12/0875

CPC classification: G06F9/30043, G06F9/3838, G06F9/466, G06F9/467, G06F12/0857, G06F12/0862, G06F12/0875, G06F12/1027, G06F12/12, G06F12/123, G06F2212/1024, G06F2212/452, G06F2212/6022

Abstract: Suspendable load address tracking inside transactions is disclosed. An example processing device of implementations of the disclosure includes a transactional memory (TM) read set tracking component circuitry to identify a suspend read tracking instruction within a transaction executed by the processing device, mark load instructions occurring in the transaction subsequent to the identified suspend read tracking instruction with a suspend attribute, wherein the addresses corresponding to the marked load instructions are excluded from a read set maintained for the transaction, identify a resume read tracking instruction within the transaction, and stop marking the load instructions occurring subsequent to the identified resume read tracking instruction with the suspend attribute.
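
The effect of the suspend and resume instructions on the read set can be illustrated with a toy software transaction. The `Transaction` class and its method names are assumptions made for this sketch; the abstract describes hardware circuitry, not this API.

```python
class Transaction:
    """Toy model of a transaction whose read-set tracking can be paused."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()
        self.suspended = False

    def suspend_read_tracking(self):
        # Loads issued from here on carry the "suspend attribute".
        self.suspended = True

    def resume_read_tracking(self):
        self.suspended = False

    def load(self, addr):
        # A load marked with the suspend attribute is left out of the read set.
        if not self.suspended:
            self.read_set.add(addr)
        return self.memory[addr]

    def conflicts_with(self, written_addr):
        # Only tracked addresses can cause a conflict with a remote write.
        return written_addr in self.read_set


memory = {0x10: 1, 0x20: 2, 0x30: 3}
tx = Transaction(memory)

tx.load(0x10)
tx.suspend_read_tracking()
tx.load(0x20)              # untracked load
tx.resume_read_tracking()
tx.load(0x30)
```

In this model a write by another thread to `0x20` would not be reported as a conflict, while writes to `0x10` or `0x30` would.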

6.

Patent application
METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING (in force)

Publication No.: US20140223105A1

Publication date: 2014-08-07

Application No.: US13993508

Filing date: 2011-12-30

Applicants: Stanislav Shwartsman, Melih Ozgul, Sebastien Hily, Shlomo Raikin, Raanan Sade, Ron Shalev

Inventors: Stanislav Shwartsman, Melih Ozgul, Sebastien Hily, Shlomo Raikin, Raanan Sade, Ron Shalev

IPC classification: G06F9/38, G06F12/08

CPC classification: G06F9/3814, G06F9/383, G06F9/3834, G06F9/3861, G06F12/0808, G06F12/0862, G06F2212/6028, G06F2212/62

Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.

7.

Patent application
REGULATING ATOMIC MEMORY OPERATIONS TO PREVENT DENIAL OF SERVICE ATTACK (in force)

Publication No.: US20120072984A1

Publication date: 2012-03-22

Application No.: US12887898

Filing date: 2010-09-22

Applicants: MICHAEL S. BAIR, David W. Burns, Robert S. Chappell, Prakash Math, Leslie A. Ong, Pankaj Raghuvanshi, Shlomo Raikin, Raanan Sade, Michael D. Tucknott, Igor Yanover

Inventors: MICHAEL S. BAIR, David W. Burns, Robert S. Chappell, Prakash Math, Leslie A. Ong, Pankaj Raghuvanshi, Shlomo Raikin, Raanan Sade, Michael D. Tucknott, Igor Yanover

IPC classification: G06F21/00

CPC classification: G06F9/526

Abstract: In one embodiment, the present invention includes a method for identifying a termination sequence for an atomic memory operation executed by a first thread, associating a timer with the first thread, and preventing the first thread from execution of a memory cluster operation after completion of the atomic memory operation until a prevention window has passed. This method may be executed by regulation logic associated with a memory execution unit of a processor, in some embodiments. Other embodiments are described and claimed.
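
The per-thread timer and prevention window can be sketched as a small bookkeeping class. Everything here is hypothetical (the `Regulator` name, the integer cycle counts, the window length of 10); the abstract does not say how the window is sized or how the termination sequence is detected.

```python
class Regulator:
    """Toy regulation logic: one timer per thread, one fixed window length."""

    def __init__(self, window):
        self.window = window
        self.blocked_until = {}  # thread id -> first cycle it may issue again

    def atomic_completed(self, thread, now):
        # Called when the termination sequence of an atomic operation is seen.
        self.blocked_until[thread] = now + self.window

    def may_issue(self, thread, now):
        # A thread with no recorded atomic operation is never held back.
        return now >= self.blocked_until.get(thread, 0)


reg = Regulator(window=10)
reg.atomic_completed("t0", now=100)

inside_window = reg.may_issue("t0", now=105)   # still within the window
other_thread = reg.may_issue("t1", now=105)    # a different thread is unaffected
after_window = reg.may_issue("t0", now=110)    # window has passed
```

Throttling only the thread that just finished an atomic operation is what keeps it from monopolizing the memory cluster while other threads proceed normally.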

8.

Patent application
EXTENDING CACHE COHERENCY PROTOCOLS TO SUPPORT LOCALLY BUFFERED DATA (in force)

Publication No.: US20100169581A1

Publication date: 2010-07-01

Application No.: US12346543

Filing date: 2008-12-30

Applicants: Gad Sheaffer, Shlomo Raikin, Vadim Bassin, Raanan Sade, Ehud Cohen, Oleg Margulis

Inventors: Gad Sheaffer, Shlomo Raikin, Vadim Bassin, Raanan Sade, Ehud Cohen, Oleg Margulis

IPC classification: G06F12/08, G06F12/00

CPC classification: G06F9/3834, G06F9/467, G06F12/0831, G06F12/084

Abstract: A method and apparatus for extending cache coherency to hold buffered data to support transactional execution is herein described. A transactional store operation referencing an address associated with a data item is performed in a buffered manner. Here, the coherency state associated with cache lines to hold the data item are transitioned to a buffered state. In response to local requests for the buffered data item, the data item is provided to ensure internal transactional sequential ordering. However, in response to external access requests, a miss response is provided to ensure the transactionally updated data item is not made globally visible until commit. Upon commit, the buffered lines are transitioned to a modified state to make the data item globally visible.
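
The buffered state described in this abstract behaves like an extra coherency state that hits locally and misses externally until commit. The sketch below models that with a dictionary of lines; the `State` enum, the `Cache` class, and its method names are assumptions for illustration, and a `None` return stands in for a miss response.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    MODIFIED = "M"
    BUFFERED = "B"  # the added state for transactional stores


class Cache:
    def __init__(self):
        self.lines = {}  # address -> (state, data)

    def transactional_store(self, addr, value):
        # The store is held in the cache in the buffered state.
        self.lines[addr] = (State.BUFFERED, value)

    def local_load(self, addr):
        # Local requests see buffered data, preserving program order
        # inside the transaction.
        state, data = self.lines.get(addr, (State.INVALID, None))
        return None if state is State.INVALID else data

    def external_request(self, addr):
        # External requests get a miss for buffered lines, so the update
        # is not globally visible before commit.
        state, data = self.lines.get(addr, (State.INVALID, None))
        return data if state is State.MODIFIED else None

    def commit(self):
        # Buffered lines become modified and therefore visible.
        for addr, (state, data) in self.lines.items():
            if state is State.BUFFERED:
                self.lines[addr] = (State.MODIFIED, data)


cache = Cache()
cache.transactional_store(0x40, 7)
local_before = cache.local_load(0x40)
external_before = cache.external_request(0x40)
cache.commit()
external_after = cache.external_request(0x40)
```

An abort path, which the abstract does not describe, would presumably invalidate the buffered lines instead of promoting them.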

9.

Granted patent
Method and apparatus for cutting senior store latency using store prefetching (in force)

Publication No.: US09405545B2

Publication date: 2016-08-02

Application No.: US13993508

Filing date: 2011-12-30

Applicants: Stanislav Shwartsman, Melih Ozgul, Sebastien Hily, Shlomo Raikin, Raanan Sade, Ron Shalev

Inventors: Stanislav Shwartsman, Melih Ozgul, Sebastien Hily, Shlomo Raikin, Raanan Sade, Ron Shalev

IPC classification: G06F12/08, G06F9/38

CPC classification: G06F9/3814, G06F9/383, G06F9/3834, G06F9/3861, G06F12/0808, G06F12/0862, G06F2212/6028, G06F2212/62

Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.

10.

Granted patent
Method and system to reduce the power consumption of a memory device (in force)

Publication No.: US08352683B2

Publication date: 2013-01-08

Application No.: US12823047

Filing date: 2010-06-24

Applicants: Ehud Cohen, Oleg Margulis, Raanan Sade, Stanislav Shwartsman

Inventors: Ehud Cohen, Oleg Margulis, Raanan Sade, Stanislav Shwartsman

IPC classification: G06F12/00, G06F13/00, G06F13/28, G06F1/00

CPC classification: G06F12/0864, G06F12/0859, G06F2212/1028, G06F2212/6082, Y02D10/13

Abstract: A method and system to reduce the power consumption of a memory device. In one embodiment of the invention, the memory device is a N-way set-associative level one (L1) cache memory and there is logic coupled with the data cache memory to facilitate access to only part of the N-ways of the N-way set-associative L1 cache memory in response to a load instruction or a store instruction. By reducing the number of ways to access the N-way set-associative L1 cache memory for each load or store request, the power requirements of the N-way set-associative L1 cache memory is reduced in one embodiment of the invention. In one embodiment of the invention, when a prediction is made that the accesses to cache memory only requires the data arrays of the N-way set-associative L1 cache memory, the access to the fill buffers are deactivated or disabled.