Fast thread wake-up through early lock release

    Publication Number: US11055150B2

    Publication Date: 2021-07-06

    Application Number: US15952143

    Application Date: 2018-04-12

    Abstract: A thread holding a lock notifies a sleeping thread that is waiting on the lock that the lock-holding thread is “about” to release the lock. In response to the notification, the waiting thread is woken up. While the waiting thread is being woken up, the lock-holding thread completes other operations prior to actually releasing the lock and then releases the lock. The notification to the waiting thread hides the latency associated with waking up the waiting thread by allowing the operations that wake up the waiting thread to occur while the lock-holding thread is performing the other operations prior to releasing the lock.
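
    The latency-hiding pattern the abstract describes can be sketched with standard C++ synchronization primitives. This is only an illustration of the idea, not the patented mechanism; the sleep standing in for the holder's remaining work is a placeholder.

    // Sketch: notify the waiter before the final unlock so its wake-up
    // latency overlaps with the work the holder still has to do.
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::condition_variable cv;
    bool ready = false;

    void lock_holder() {
        std::unique_lock<std::mutex> lock(m);
        ready = true;
        cv.notify_one();  // early notification: the waiter starts waking up now
        // Other operations performed while still holding the lock; the waiter's
        // scheduling latency is hidden behind this work (placeholder below).
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }   // lock released here; the already-awake waiter proceeds immediately

    void waiter() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return ready; });  // wakes on notify, reacquires on unlock
        std::cout << "waiter acquired the lock\n";
    }

    int main() {
        std::thread t1(waiter), t2(lock_holder);
        t1.join();
        t2.join();
    }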

    Dynamic adaptation of memory page management policy

    Publication Number: US10705972B2

    Publication Date: 2020-07-07

    Application Number: US15264400

    Application Date: 2016-09-13

    Abstract: Systems, apparatuses, and methods for determining preferred memory page management policies by software are disclosed. Software executing on one or more processing units generates a memory request. Software determines the preferred page management policy for the memory request based at least in part on the data access size and data access pattern of the memory request. Software conveys an indication of a preferred page management policy to a memory controller. Then, the memory controller accesses memory for the memory request using the preferred page management policy specified by software.
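
    As a rough sketch of how such a hint could be derived and attached to a request, the snippet below picks an open-page or close-page policy from the request's size and access pattern. The request layout, the threshold, and the hand-off to the memory controller are assumptions made for illustration, not the interface the patent defines.

    #include <cstddef>
    #include <cstdint>

    enum class PagePolicy : uint8_t { OpenPage, ClosePage };
    enum class AccessPattern { Sequential, Random };

    struct MemRequest {
        uint64_t      addr;
        std::size_t   size;         // bytes touched by this request
        AccessPattern pattern;
        PagePolicy    policy_hint;  // hint conveyed to the memory controller
    };

    // Large sequential accesses benefit from keeping the DRAM row open;
    // small random accesses are usually better served by closing it early.
    PagePolicy choose_policy(std::size_t size, AccessPattern pattern) {
        constexpr std::size_t kRowFriendlyBytes = 4096;  // hypothetical threshold
        if (pattern == AccessPattern::Sequential && size >= kRowFriendlyBytes)
            return PagePolicy::OpenPage;
        return PagePolicy::ClosePage;
    }

    int main() {
        MemRequest req{0x1000, 64 * 1024, AccessPattern::Sequential,
                       PagePolicy::ClosePage};
        req.policy_hint = choose_policy(req.size, req.pattern);
        // issue_to_memory_controller(req);  // hypothetical hand-off
        return 0;
    }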

    Temperature-based adjustments for in-memory matrix multiplication

    Publication Number: US11507641B2

    Publication Date: 2022-11-22

    Application Number: US16428903

    Application Date: 2019-05-31

    Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.
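
    One way to picture the effective/ineffective classification and the mapping to cooler regions is the sketch below, which ranks matrix rows by magnitude and assigns the highest-energy rows to the lowest-temperature regions first. The energy threshold, the row-granularity classification, and the Region model are illustrative assumptions, not the patent's criteria.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    struct Region { int id; double temperature_c; };

    // Rows whose values are mostly near zero contribute little to the dot
    // products, so analog error there matters less ("ineffective").
    bool is_effective(const std::vector<double>& row, double threshold = 0.1) {
        double energy = 0.0;
        for (double v : row) energy += std::abs(v);
        return energy / row.size() > threshold;
    }

    // Assign one region per row: effective rows get the coolest regions first.
    std::vector<int> map_rows_to_regions(const std::vector<std::vector<double>>& mat,
                                         std::vector<Region> regions) {
        std::sort(regions.begin(), regions.end(),
                  [](const Region& a, const Region& b) {
                      return a.temperature_c < b.temperature_c;
                  });
        std::vector<std::size_t> order(mat.size());
        std::iota(order.begin(), order.end(), 0);
        std::stable_sort(order.begin(), order.end(),
                         [&](std::size_t a, std::size_t b) {
                             return is_effective(mat[a]) && !is_effective(mat[b]);
                         });
        std::vector<int> assignment(mat.size());
        for (std::size_t i = 0; i < order.size(); ++i)
            assignment[order[i]] = regions[i % regions.size()].id;
        return assignment;
    }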

    Logical memory address regions

    Publication Number: US10255191B2

    Publication Date: 2019-04-09

    Application Number: US15133033

    Application Date: 2016-04-19

    Abstract: Systems, apparatuses, and methods for implementing logical memory address regions in a computing system. The physical memory address space of a computing system may be partitioned into a plurality of logical memory address regions. Each logical memory address region may be dynamically configured at run-time to meet changing application needs of the system. Each logical memory address region may also be configured separately from the other logical memory address regions. Each logical memory address region may have associated parameters that identify region start address, region size, cell-level mode, physical-to-device mapping scheme, address masks, access permissions, wear-leveling data, encryption settings, and compression settings. These parameters may be stored in a table which may be used when processing memory access requests.
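
    The parameter table the abstract enumerates maps naturally onto a per-region descriptor consulted when a memory request is processed; the sketch below is one plausible shape for it. Field widths, the mode/scheme encodings, and the linear lookup are assumptions for illustration, not the patented layout.

    #include <cstdint>
    #include <optional>
    #include <vector>

    struct LogicalRegion {
        uint64_t start;              // region start address
        uint64_t size;               // region size in bytes
        uint8_t  cell_level_mode;    // e.g. bits-per-cell mode selector
        uint8_t  mapping_scheme;     // physical-to-device mapping scheme id
        uint64_t address_mask;
        uint32_t access_permissions;
        uint32_t wear_level_counter; // wear-leveling data
        bool     encrypted;          // encryption setting
        bool     compressed;         // compression setting
    };

    // Find the region whose [start, start + size) range covers the address.
    std::optional<LogicalRegion> lookup(const std::vector<LogicalRegion>& table,
                                        uint64_t addr) {
        for (const auto& r : table)
            if (addr >= r.start && addr - r.start < r.size)
                return r;
        return std::nullopt;
    }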

    DISTRIBUTED GATHER/SCATTER OPERATIONS ACROSS A NETWORK OF MEMORY NODES

    Publication Number: US20170048320A1

    Publication Date: 2017-02-16

    Application Number: US15221554

    Application Date: 2016-07-27

    CPC classification number: H04L67/1097

    Abstract: Devices, methods, and systems for distributed gather and scatter operations in a network of memory nodes. A responding memory node includes a memory; a communications interface having circuitry configured to communicate with at least one other memory node; and a controller. The controller includes circuitry configured to receive a request message from a requesting node via the communications interface. The request message indicates a gather or scatter operation, and instructs the responding node to retrieve data elements from a source memory data structure and store the data elements to a destination memory data structure. The controller further includes circuitry configured to transmit a response message to the requesting node via the communications interface. The response message indicates that the data elements have been stored into the destination memory data structure.
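
    A rough sketch of the request/response exchange appears below: the requesting node names a gather or scatter operation plus source and destination structures, and the responding node moves the elements and acknowledges. The message layouts, flat-address model, and element-offset scheme are illustrative assumptions, not the patent's wire format.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    enum class Op : uint8_t { Gather, Scatter };

    struct GatherScatterRequest {
        Op op;
        uint64_t src_base;                      // source memory data structure
        uint64_t dst_base;                      // destination memory data structure
        std::vector<uint64_t> element_offsets;  // scattered element positions
        uint32_t element_size;                  // bytes per element
    };

    struct GatherScatterResponse {
        bool done;  // elements have been stored to the destination structure
    };

    // Responding node: gather packs scattered source elements into a contiguous
    // destination; scatter spreads contiguous source elements to scattered slots.
    GatherScatterResponse handle(const GatherScatterRequest& req, uint8_t* mem) {
        for (std::size_t i = 0; i < req.element_offsets.size(); ++i) {
            std::size_t scattered = req.element_offsets[i] * req.element_size;
            std::size_t packed    = i * req.element_size;
            const uint8_t* src = mem + req.src_base +
                                 (req.op == Op::Gather ? scattered : packed);
            uint8_t* dst = mem + req.dst_base +
                           (req.op == Op::Gather ? packed : scattered);
            std::memcpy(dst, src, req.element_size);
        }
        return {true};  // response message sent back to the requesting node
    }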

    TEMPERATURE-BASED ADJUSTMENTS FOR IN-MEMORY MATRIX MULTIPLICATION

    Publication Number: US20200380063A1

    Publication Date: 2020-12-03

    Application Number: US16428903

    Application Date: 2019-05-31

    Abstract: Techniques for performing in-memory matrix multiplication, taking into account temperature variations in the memory, are disclosed. In one example, the matrix multiplication memory uses ohmic multiplication and current summing to perform the dot products involved in matrix multiplication. One downside to this analog form of multiplication is that temperature affects the accuracy of the results. Thus techniques are provided herein to compensate for the effects of temperature increases on the accuracy of in-memory matrix multiplications. According to the techniques, portions of input matrices are classified as effective or ineffective. Effective portions are mapped to low temperature regions of the in-memory matrix multiplier and ineffective portions are mapped to high temperature regions of the in-memory matrix multiplier. The matrix multiplication is then performed.

    RUNTIME EXTENSION FOR NEURAL NETWORK TRAINING WITH HETEROGENEOUS MEMORY

    Publication Number: US20200042859A1

    Publication Date: 2020-02-06

    Application Number: US16194958

    Application Date: 2018-11-19

    Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
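
    The placement decision can be sketched as below: buffers recorded during the forward pass are ranked in the order the backward pass will need them (latest-produced first), and as many as fit are kept in the fast memory. The capacity model and the prefetch note are assumptions for illustration, not the run-time manager's actual policy.

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <vector>

    struct LayerBuffer {
        std::string name;
        std::size_t bytes;
        int         forward_step;    // recorded while monitoring the forward pass
        bool        in_fast_memory;  // high-bandwidth memory vs. capacity memory
    };

    // The backward pass visits layers in reverse order, so buffers produced last
    // are needed first: give those priority in the low-latency memory.
    void plan_backward_placement(std::vector<LayerBuffer>& buffers,
                                 std::size_t fast_capacity_bytes) {
        std::sort(buffers.begin(), buffers.end(),
                  [](const LayerBuffer& a, const LayerBuffer& b) {
                      return a.forward_step > b.forward_step;  // needed-first order
                  });
        std::size_t used = 0;
        for (auto& buf : buffers) {
            buf.in_fast_memory = (used + buf.bytes <= fast_capacity_bytes);
            if (buf.in_fast_memory)
                used += buf.bytes;
            // else: the buffer stays in capacity memory and would be prefetched
            // shortly before the backward pass reaches its layer (not shown).
        }
    }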

    COMPILER-ASSISTED INTER-SIMD-GROUP REGISTER SHARING

    Publication Number: US20180275991A1

    Publication Date: 2018-09-27

    Application Number: US15935399

    Application Date: 2018-03-26

    Abstract: Systems, apparatuses, and methods for efficiently sharing registers among threads are disclosed. A system includes at least a processor, control logic, and a register file with a plurality of registers. The processor assigns a base set of registers to each thread of a plurality of threads executing on the processor. When a given thread needs more than the base set of registers to execute a given phase of program code, the given thread executes an acquire instruction to acquire exclusive access to an extended set of registers from a shared resource pool. When the given thread no longer needs additional registers, the given thread executes a release instruction to release the extended set of registers back into the shared register pool for other threads to use. In one implementation, the compiler inserts acquire and release instructions into the program code based on a register liveness analysis performed during compilation.
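
    As a software model of the pool those instructions manage, the sketch below grants a thread an extended block of registers for a register-hungry phase and returns it afterwards. The pool size, grant granularity, and retry behavior are assumptions; the real mechanism is an ISA and hardware feature driven by compiler-inserted instructions, not library calls.

    #include <cstdint>
    #include <mutex>
    #include <optional>

    class SharedRegisterPool {
    public:
        explicit SharedRegisterPool(uint32_t total_regs) : free_(total_regs) {}

        // Models the "acquire" instruction: an exclusive grant, or failure, in
        // which case the thread would stall and retry until registers free up.
        std::optional<uint32_t> acquire(uint32_t count) {
            std::lock_guard<std::mutex> g(m_);
            if (count > free_) return std::nullopt;
            free_ -= count;
            return count;
        }

        // Models the "release" instruction the compiler inserts at the end of
        // the high-register-pressure phase, per its liveness analysis.
        void release(uint32_t count) {
            std::lock_guard<std::mutex> g(m_);
            free_ += count;
        }

    private:
        std::mutex m_;
        uint32_t   free_;
    };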
