Hardware assisted memory profiling aggregator

    Publication Number: US11860755B2

    Publication Date: 2024-01-02

    Application Number: US17861435

    Application Date: 2022-07-11

    CPC classification number: G06F11/3037 G06F11/076 G06F11/3471 G06F11/3476

    Abstract: An approach is provided for implementing memory profiling aggregation. A hardware aggregator provides memory profiling aggregation by controlling the execution of a plurality of hardware profilers that monitor memory performance in a system. For each hardware profiler of the plurality of hardware profilers, a hardware counter value is compared to a threshold value. When the threshold value is satisfied, execution of the respective hardware profiler is initiated to monitor memory performance. Multiple hardware profilers of the plurality of hardware profilers may execute concurrently, each generating a result counter value. The result counter values generated by each hardware profiler are aggregated to generate an aggregate result counter value. The aggregate result counter value is stored in memory that is accessible by software processes for use in optimizing memory-management policy decisions.
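
    A minimal software model can make the threshold-gated, concurrent profiling and aggregation flow concrete. The sketch below is illustrative only; the class and field names (HardwareProfiler, Aggregator, result_counter) are assumptions, not taken from the patent.

```python
# Software model of the aggregation flow described in the abstract.
# All names and structures are illustrative, not from the patent.
from dataclasses import dataclass, field

@dataclass
class HardwareProfiler:
    """One profiler gated by a counter/threshold pair."""
    threshold: int = 100     # trigger level for starting this profiler
    counter: int = 0         # hardware counter sampled by the aggregator
    result_counter: int = 0  # result produced while the profiler runs
    running: bool = False

@dataclass
class Aggregator:
    """Controls when each profiler runs and aggregates their results."""
    profilers: list = field(default_factory=list)
    aggregate_result: int = 0  # value exposed to software in shared memory

    def step(self, events_per_profiler):
        for prof, events in zip(self.profilers, events_per_profiler):
            prof.counter += events
            # Start a profiler once its counter satisfies its threshold.
            if not prof.running and prof.counter >= prof.threshold:
                prof.running = True
            if prof.running:
                prof.result_counter += events  # monitor memory performance
        # Aggregate result counters from all (possibly concurrent) profilers.
        self.aggregate_result = sum(p.result_counter for p in self.profilers)

agg = Aggregator(profilers=[HardwareProfiler(threshold=5),
                            HardwareProfiler(threshold=12)])
for _ in range(10):
    agg.step([2, 3])
# Software would read this value to tune memory-management policy decisions.
print(agg.aggregate_result)
```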

    Sense amplifier sleep state for leakage savings without bias mismatch

    Publication Number: US11854652B2

    Publication Date: 2023-12-26

    Application Number: US17984796

    Application Date: 2022-11-10

    CPC classification number: G11C7/065

    Abstract: A sense amplifier is biased to reduce leakage current and equalize matched transistor bias during an idle state. A first read select transistor couples a true bit line and a sense amplifier true (SAT) signal line, and a second read select transistor couples a complement bit line and a sense amplifier complement (SAC) signal line. The SAT and SAC signal lines are precharged during a precharge state. An equalization circuit shorts the SAT and SAC signal lines during the precharge state. A differential sense amplifier circuit for latching the memory cell value is coupled to the SAT signal line and the SAC signal line. The precharge circuit and the differential sense amplifier circuit are turned off during a sleep state to cause the SAT and SAC signal lines to float. A sleep circuit shorts the SAT and SAC signal lines during the sleep state.
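
    The abstract describes, in effect, a small state machine over the SAT/SAC signal lines. The behavioral sketch below models only which circuits drive the lines in each state; the state names and driver flags are illustrative assumptions, and the real design is analog circuitry, not software.

```python
# Behavioral sketch of the SAT/SAC control states from the abstract.
# Only tracks which circuits are active per state; names are illustrative.
from enum import Enum, auto

class State(Enum):
    PRECHARGE = auto()
    SENSE = auto()
    SLEEP = auto()

def line_drivers(state: State) -> dict:
    """Return which circuits drive the SAT/SAC signal lines in each state."""
    if state is State.PRECHARGE:
        # Precharge pulls both lines high; the equalization circuit shorts
        # them so the matched transistors see identical bias.
        return {"precharge": True, "equalize": True,
                "sense_amp": False, "sleep_short": False}
    if state is State.SENSE:
        # The differential sense amplifier latches the cell value.
        return {"precharge": False, "equalize": False,
                "sense_amp": True, "sleep_short": False}
    # SLEEP: precharge and sense amp are off, so SAT/SAC would float; the
    # sleep circuit shorts them to save leakage without a bias mismatch.
    return {"precharge": False, "equalize": False,
            "sense_amp": False, "sleep_short": True}

for s in State:
    print(s.name, line_drivers(s))
```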

    Neural Network Activation Scaled Clipping Layer

    Publication Number: US20230409868A1

    Publication Date: 2023-12-21

    Application Number: US17844204

    Application Date: 2022-06-20

    CPC classification number: G06N3/04 G06N3/08

    Abstract: Activation scaled clipping layers for neural networks are described. An activation scaled clipping layer processes an output of a neuron in a neural network using a scaling parameter and a clipping parameter. The scaling parameter defines how numerical values are amplified relative to zero. The clipping parameter specifies a numerical threshold that causes the neuron output to be expressed as a value defined by the numerical threshold if the neuron output satisfies the numerical threshold. In some implementations, the scaling parameter is linear and treats numbers within a numerical range as being equivalent, such that any number in the range is scaled by a defined magnitude, regardless of value. Alternatively, the scaling parameter is nonlinear, which causes the activation scaled clipping layer to amplify numbers within a range by different magnitudes. Each scaling and clipping parameter is learnable during training of a machine learning model implementing the neural network.
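
    A layer of this kind is straightforward to prototype. The sketch below uses PyTorch to implement a linear scaled-clipping activation with learnable scale and clip parameters; the exact parameterization and initial values are assumptions, since the abstract does not prescribe a formula.

```python
# Sketch of an activation scaled clipping layer with learnable parameters,
# in the spirit of the abstract. The formula below is an assumption.
import torch
import torch.nn as nn

class ActivationScaledClipping(nn.Module):
    def __init__(self, init_scale: float = 1.0, init_clip: float = 6.0):
        super().__init__()
        # Both parameters are learnable during training, as the abstract notes.
        self.scale = nn.Parameter(torch.tensor(init_scale))
        self.clip = nn.Parameter(torch.tensor(init_clip))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear scaling: every value in the range is amplified by the same
        # magnitude relative to zero. A nonlinear variant could instead use
        # something like self.scale * x.abs().sqrt() * x.sign().
        y = self.scale * x
        # Outputs satisfying the threshold are expressed as the threshold.
        return torch.clamp(y, max=self.clip)

layer = ActivationScaledClipping()
out = layer(torch.randn(4, 8))  # gradients flow to both scale and clip
```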

    Approach for reducing side effects of computation offload to memory

    Publication Number: US11847055B2

    Publication Date: 2023-12-19

    Application Number: US17364854

    Application Date: 2021-06-30

    CPC classification number: G06F12/084 G06F12/0238 G06F12/0811 G06F12/0862

    Abstract: A technical solution to the technical problem of how to reduce the undesirable side effects of offloading computations to memory uses read hints to preload results of memory-side processing into a processor-side cache. A cache controller, in response to identifying a read hint in a memory-side processing instruction, causes results of the memory-side processing to be preloaded into a processor-side cache. Implementations include, without limitation: enabling or disabling the preloading based upon cache thrashing levels; preloading results, or portions of results, of memory-side processing to particular destination caches; preloading results based upon priority and/or degree of confidence, and/or during periods of low data bus and/or command bus utilization; accounting for last-store considerations; and enforcing an ordering constraint to ensure that preloading occurs only after memory-side processing results are complete.
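
    The read-hint mechanism can be illustrated with a toy controller model. In the sketch below, the controller preloads a memory-side (PIM) result into its cache only when the instruction carries a read hint, thrashing is below a threshold, and the result is already complete; all class and field names are hypothetical.

```python
# Toy model of a cache controller reacting to a read hint on a memory-side
# processing instruction. All structures and names are illustrative.

class CacheController:
    def __init__(self, thrash_threshold: float = 0.5):
        self.cache = {}                      # processor-side cache: addr -> data
        self.thrash_level = 0.0              # measured thrashing, 0.0 .. 1.0
        self.thrash_threshold = thrash_threshold

    def on_pim_instruction(self, instr, memory):
        result_addr = memory.execute(instr)  # memory-side processing finishes first
        # Ordering constraint: preload only after the result is complete,
        # and only if thrashing is low enough that preloading helps.
        if instr.get("read_hint") and self.thrash_level < self.thrash_threshold:
            self.cache[result_addr] = memory.read(result_addr)
        return result_addr

class FakePIMMemory:
    def __init__(self):
        self.cells = {}

    def execute(self, instr):
        # Pretend the memory-side unit computed something and stored it.
        self.cells[instr["dst"]] = sum(instr["operands"])
        return instr["dst"]

    def read(self, addr):
        return self.cells[addr]

ctrl = CacheController()
mem = FakePIMMemory()
addr = ctrl.on_pim_instruction(
    {"dst": 0x100, "operands": [1, 2, 3], "read_hint": True}, mem)
print(ctrl.cache[addr])  # 6, already resident in the processor-side cache
```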

    Fast block-based parallel message passing interface transpose

    Publication Number: US11836549B2

    Publication Date: 2023-12-05

    Application Number: US17071876

    Application Date: 2020-10-15

    CPC classification number: G06F9/546 G06F9/52

    Abstract: Computer-implemented techniques for fast block-based parallel message passing interface (MPI) transpose are disclosed. The techniques achieve an in-place parallel matrix transpose of an input matrix in a distributed-memory multiprocessor environment with reduced consumption of computer processing time and storage media resources. An in-memory copy of the input matrix, or a submatrix thereof, to use as the send buffer for MPI send operations is not needed. Instead, the input matrix is divided in place into data blocks of at most a predetermined size, and the data block(s) for a given submatrix are sent using an MPI API before any data block(s) for that submatrix are received, via an MPI API, into the place of the sent data block(s). This avoids the in-memory copy for a send buffer while still transposing the input matrix in place.
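
    The send-before-receive-in-place idea maps naturally onto MPI_Sendrecv_replace, which transmits a buffer and then receives the partner's data into that same buffer. The sketch below uses mpi4py for a square matrix distributed as block rows; the distribution, tags, and block schedule are assumptions rather than the patent's exact algorithm.

```python
# In-place block transpose over MPI using Sendrecv_replace, which sends a
# block and receives the partner's block into the same buffer, avoiding a
# separate full-size send-buffer copy. Assumes a square N x N matrix stored
# as block rows over P ranks with N divisible by P; a sketch, not the patent.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, P = comm.Get_rank(), comm.Get_size()
N = 4 * P                      # global matrix size; each rank holds N/P rows
b = N // P                     # block edge length
rows = np.arange(rank * b * N, (rank + 1) * b * N,
                 dtype=np.float64).reshape(b, N)

for j in range(P):
    # Locally transpose block (rank, j) into a contiguous buffer.
    block = np.ascontiguousarray(rows[:, j * b:(j + 1) * b].T)
    if j != rank:
        # Send block (rank, j); receive block (j, rank) in its place.
        comm.Sendrecv_replace(block, dest=j, sendtag=rank,
                              source=j, recvtag=j)
    rows[:, j * b:(j + 1) * b] = block

print(rank, rows.shape)  # each rank now holds its block row of the transpose
```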

    Guided cache replacement
    Invention Grant

    Publication Number: US11836088B2

    Publication Date: 2023-12-05

    Application Number: US17557793

    Application Date: 2021-12-21

    CPC classification number: G06F12/0893 G06F2212/6042

    Abstract: Guided cache replacement is described. In accordance with the described techniques, a request to access a cache is received, and a cache replacement policy which controls loading data into the cache is accessed. The cache replacement policy includes a tree structure having nodes corresponding to cachelines of the cache and a traversal algorithm controlling traversal of the tree structure to select one of the cachelines. Traversal of the tree structure is guided using the traversal algorithm to select a cacheline to allocate to the request. The guided traversal modifies at least one decision of the traversal algorithm to avoid selection of a non-replaceable cacheline.
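
    A common concrete instance of such a tree structure is tree-PLRU, where each internal node holds a bit steering traversal left or right. The sketch below implements a guided tree-PLRU for one cache set, overriding a traversal decision whenever it would descend into a subtree containing only non-replaceable (here, locked) cachelines; the lock set and bit conventions are illustrative assumptions.

```python
# Guided tree-PLRU victim selection for one set, after the abstract's idea:
# traversal decisions are overridden to avoid non-replaceable cachelines.
# Bit conventions and the lock set are illustrative assumptions.

class GuidedTreePLRU:
    def __init__(self, ways: int = 4, locked=()):
        assert ways & (ways - 1) == 0, "perfect binary tree needs 2^k ways"
        self.ways = ways
        self.bits = [0] * (ways - 1)  # heap-ordered direction bits, root at 0
        self.locked = set(locked)     # non-replaceable cachelines

    def select_victim(self) -> int:
        node, lo, hi = 0, 0, self.ways   # current node covers ways [lo, hi)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            go_right = self.bits[node] == 1
            lo2, hi2 = (mid, hi) if go_right else (lo, mid)
            # Guided step: override the decision if the chosen subtree has
            # no replaceable line (assumes at least one way is unlocked).
            if not any(w not in self.locked for w in range(lo2, hi2)):
                go_right = not go_right
            # Flip the bit so the next traversal steers away from this victim.
            self.bits[node] = 0 if go_right else 1
            if go_right:
                node, lo = 2 * node + 2, mid
            else:
                node, hi = 2 * node + 1, mid
        return lo

plru = GuidedTreePLRU(ways=4, locked={0, 1})
print([plru.select_victim() for _ in range(4)])  # only ways 2 and 3 chosen
```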
