Patent search ap:("Advanced Micro Devices Page Inc.") AND inv:"Amin Farmahini-Farahani"

1.

发明授权
Fast thread wake-up through early lock release 有权

公开(公告)号：US11055150B2

公开(公告)日：2021-07-06

申请号：US15952143

申请日：2018-04-12

Applicant: Advanced Micro Devices, Inc.

Inventor： Nuwan Jayasena , Amin Farmahini-Farahani , David A. Roberts

IPC: G06F9/52 , G06F9/48 , G06F9/50

Abstract: A thread holding a lock notifies a sleeping thread that is waiting on the lock that the lock holding thread is “about” to release the lock. In response to the notification, the waiting thread is woken up. While the waiting thread is woken up, the lock holding thread completes other operations prior to actually releasing the lock and then releases the lock. The notification to the waiting thread hides latency associated with waking up the waiting thread by allowing operations that wake up the waiting thread to occur while the lock holding thread is performing the other operations prior to releasing the thread.

2.

发明授权
Dynamic adaptation of memory page management policy 有权

公开(公告)号：US10705972B2

公开(公告)日：2020-07-07

申请号：US15264400

申请日：2016-09-13

Applicant: Advanced Micro Devices, Inc.

Inventor： Amin Farmahini-Farahani , Alexander D. Breslow , Nuwan S. Jayasena

IPC: G06F12/00 , G06F12/1009 , G06F13/16 , G06F13/00

Abstract: Systems, apparatuses, and methods for determining preferred memory page management policies by software are disclosed. Software executing on one or more processing units generates a memory request. Software determines the preferred page management policy for the memory request based at least in part on the data access size and data access pattern of the memory request. Software conveys an indication of a preferred page management policy to a memory controller. Then, the memory controller accesses memory for the memory request using the preferred page management policy specified by software.

3.

发明授权
Temperature-based adjustments for in-memory matrix multiplication 有权

公开(公告)号：US11507641B2

公开(公告)日：2022-11-22

申请号：US16428903

申请日：2019-05-31

Applicant: Advanced Micro Devices, Inc.

Inventor： Majed Valad Beigi , Amin Farmahini-Farahani , Sudhanva Gurumurthi

IPC: G06F17/16 , G01K3/14 , G01K3/10

Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.

4.

发明授权
Logical memory address regions 有权

公开(公告)号：US10255191B2

公开(公告)日：2019-04-09

申请号：US15133033

申请日：2016-04-19

Applicant: Advanced Micro Devices, Inc.

Inventor： Amin Farmahini-Farahani , David A. Roberts

IPC: G06F12/00 , G06F12/1009 , G06F12/14 , G06F3/06 , G06F12/02

Abstract: Systems, apparatuses, and methods for implementing logical memory address regions in a computing system. The physical memory address space of a computing system may be partitioned into a plurality of logical memory address regions. Each logical memory address region may be dynamically configured at run-time to meet changing application needs of the system. Each logical memory address region may also be configured separately from the other logical memory address regions. Each logical memory address region may have associated parameters that identify region start address, region size, cell-level mode, physical-to-device mapping scheme, address masks, access permissions, wear-leveling data, encryption settings, and compression settings. These parameters may be stored in a table which may be used when processing memory access requests.

5.

发明申请
DISTRIBUTED GATHER/SCATTER OPERATIONS ACROSS A NETWORK OF MEMORY NODES 审中-公开
Title translation: 分布式GATHER / SCATTER操作通过存储器网络

公开(公告)号：US20170048320A1

公开(公告)日：2017-02-16

申请号：US15221554

申请日：2016-07-27

Applicant: Advanced Micro Devices, Inc.

Inventor： Amin Farmahini-Farahani , David A. Roberts

IPC: H04L29/08 , G06F3/06

CPC classification number: H04L67/1097

Abstract: Devices, methods, and systems for distributed gather and scatter operations in a network of memory nodes. A responding memory node includes a memory; a communications interface having circuitry configured to communicate with at least one other memory node; and a controller. The controller includes circuitry configured to receive a request message from a requesting node via the communications interface. The request message indicates a gather or scatter operation, and instructs the responding node to retrieve data elements from a source memory data structure and store the data elements to a destination memory data structure. The controller further includes circuitry configured to transmit a response message to the requesting node via the communications interface. The response message indicates that the data elements have been stored into the destination memory data structure.

Abstract translation: 在内存节点网络中进行分布式收集和分散操作的设备，方法和系统。响应存储器节点包括存储器; 通信接口，其具有被配置为与至少一个其他存储器节点进行通信的电路; 和控制器。控制器包括经配置以经由通信接口从请求节点接收请求消息的电路。请求消息指示收集或散布操作，并指示响应节点从源存储器数据结构中检索数据元素，并将数据元素存储到目的地存储器数据结构。控制器还包括经配置以经由通信接口向请求节点发送响应消息的电路。响应消息指示数据元素已被存储到目的地存储器数据结构中。

6.

发明申请
TEMPERATURE-BASED ADJUSTMENTS FOR IN-MEMORY MATRIX MULTIPLICATION 审中-公开

公开(公告)号：US20200380063A1

公开(公告)日：2020-12-03

申请号：US16428903

申请日：2019-05-31

Applicant: Advanced Micro Devices, Inc.

Inventor： Majed Valad Beigi , Amin Farmahini-Farahani , Sudhanva Gurumurthi

IPC: G06F17/16 , G01K3/10 , G01K3/14

Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.

7.

发明申请
RUNTIME EXTENSION FOR NEURAL NETWORK TRAINING WITH HETEROGENEOUS MEMORY 审中-公开

公开(公告)号：US20200042859A1

公开(公告)日：2020-02-06

申请号：US16194958

申请日：2018-11-19

Applicant: Advanced Micro Devices, Inc.

Inventor： Georgios Mappouras , Amin Farmahini-Farahani , Sudhanva Gurumurthi , Abhinav Vishnu , Gabriel H. Loh

IPC: G06N3/04 , G06N3/08 , G06F9/445 , G06F9/54

Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.

8.

发明申请
COMPILER-ASSISTED INTER-SIMD-GROUP REGISTER SHARING 审中-公开

公开(公告)号：US20180275991A1

公开(公告)日：2018-09-27

申请号：US15935399

申请日：2018-03-26

Applicant: Advanced Micro Devices, Inc.

Inventor： Farzad Khorasani , Amin Farmahini-Farahani , Nuwan S. Jayasena

IPC: G06F9/30 , G06F15/80

CPC classification number: G06F9/30138 , G06F9/30076 , G06F9/3009 , G06F9/30123 , G06F9/3851 , G06F9/5011 , G06F15/8007

Abstract: Systems, apparatuses, and methods for efficiently sharing registers among threads are disclosed. A system includes at least a processor, control logic, and a register file with a plurality of registers. The processor assigns a base set of registers to each thread of a plurality of threads executing on the processor. When a given thread needs more than the base set of registers to execute a given phase of program code, the given thread executes an acquire instruction to acquire exclusive access to an extended set of registers from a shared resource pool. When the given thread no longer needs additional registers, the given thread executes a release instruction to release the extended set of registers back into the shared register pool for other threads to use. In one implementation, the compiler inserts acquire and release instructions into the program code based on a register liveness analysis performed during compilation.

9.

发明申请
INSTRUCTION SET AND MICRO-ARCHITECTURE SUPPORTING ASYNCHRONOUS MEMORY ACCESS 审中-公开

公开(公告)号：US20170212760A1

公开(公告)日：2017-07-27

申请号：US15353161

申请日：2016-11-16

Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC

Inventor： Meenakshi Sundaram Bhaskaran , Elliot H. Mednick , David A. Roberts , Anthony Asaro , Amin Farmahini-Farahani

IPC: G06F9/30 , G06F12/0817 , G06F12/0875 , G06F9/38

CPC classification number: G06F9/30043 , G06F9/3005 , G06F9/3016 , G06F9/3802 , G06F12/084 , G06F12/0862 , G06F12/0875 , G06F12/1027 , G06F2212/1024 , G06F2212/452 , G06F2212/6028

Abstract: A system and method for reducing latencies of main memory data accesses are described. A non-blocking load (NBLD) instruction identifies an address of requested data and a subroutine. The subroutine includes instructions dependent on the requested data. A processing unit verifies that address translations are available for both the address and the subroutine. The processing unit continues processing instructions with no stalls caused by younger-in-program-order instructions waiting for the requested data. The non-blocking load unit performs a cache coherent data read request on behalf of the NBLD instruction and requests that the processing unit perform an asynchronous jump to the subroutine upon return of the requested data from lower-level memory.

10.

发明授权
Runtime extension for neural network training with heterogeneous memory 有权

公开(公告)号：US11775799B2

公开(公告)日：2023-10-03

申请号：US16194958

申请日：2018-11-19

Applicant: Advanced Micro Devices, Inc.

Inventor： Georgios Mappouras , Amin Farmahini-Farahani , Sudhanva Gurumurthi , Abhinav Vishnu , Gabriel H. Loh

IPC: G06N20/10 , G06N3/04 , G06F9/54 , G06F9/445 , G06N3/084

CPC classification number: G06N3/04 , G06F9/44505 , G06F9/544 , G06N3/084

Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification