Patent search ap:("Amit Kumar" OR "Sreenivas Subramoney") AND inv:"Sreenivas Subramoney" Page 2

11.

Granted patent
Coordinated thread criticality-aware memory scheduling (status: in force)

Publication number: US09921839B1

Publication date: 2018-03-20

Application number: US15275066

Filing date: 2016-09-23

Applicant: Lavanya Subramanian, Sreenivas Subramoney, Nithiyanandan Bashyam, Anant Nori

Inventor: Lavanya Subramanian, Sreenivas Subramoney, Nithiyanandan Bashyam, Anant Nori

IPC: G06F12/12, G06F9/30, G06F12/0862, G06F9/50, G06F9/48

CPC classification number: G06F9/3009, G06F9/30047, G06F9/4881, G06F9/5016, G06F12/0862, G06F2212/1024, G06F2212/602, G06F2212/6026

Abstract: A multi-core processor includes a plurality of cores to execute a plurality of threads and to monitor metrics for each of the plurality of threads during an interval, the metrics including stall cycle values, prefetches of a first type, and prefetches of a second type. The multi-core processor further includes criticality-aware thread prioritization (CATP) logic to compute a stall fraction for each of the plurality of threads during the interval using the stall cycle values, identify a thread with a highest stall fraction of the plurality of threads, determine the highest stall fraction is greater than a stall threshold, prioritize demand requests of the identified thread, compute a prefetch accuracy of the identified thread during the interval using the prefetches of the first type and the prefetches of the second type, determine the prefetch accuracy is greater than a prefetch threshold, and prioritize prefetch requests of the identified thread.
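
The per-interval decision described in this abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented logic: the abstract does not define the two prefetch types, so the sketch assumes they are issued prefetches and useful prefetches, and it assumes the stall fraction is stall cycles divided by the interval length. The names `ThreadMetrics`, `Decision`, `catp_decide`, and both default thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadMetrics:
    thread_id: int
    stall_cycles: int       # cycles the thread spent stalled during the interval
    issued_prefetches: int  # ASSUMED meaning of "prefetches of a first type"
    useful_prefetches: int  # ASSUMED meaning of "prefetches of a second type"


@dataclass
class Decision:
    thread_id: Optional[int]   # thread chosen as critical, or None
    prioritize_demand: bool
    prioritize_prefetch: bool


def catp_decide(metrics: List[ThreadMetrics], interval_cycles: int,
                stall_threshold: float = 0.5,
                prefetch_threshold: float = 0.6) -> Decision:
    """Pick the thread with the highest stall fraction and decide what to prioritize."""
    if not metrics or interval_cycles <= 0:
        return Decision(None, False, False)

    # Identify the thread with the highest stall fraction in this interval.
    critical = max(metrics, key=lambda m: m.stall_cycles / interval_cycles)
    stall_fraction = critical.stall_cycles / interval_cycles

    # Only prioritize if the highest stall fraction exceeds the stall threshold.
    if stall_fraction <= stall_threshold:
        return Decision(None, False, False)

    # Prefetch accuracy under the assumed definition: useful / issued.
    if critical.issued_prefetches:
        accuracy = critical.useful_prefetches / critical.issued_prefetches
    else:
        accuracy = 0.0

    return Decision(critical.thread_id, True, accuracy > prefetch_threshold)
```

For example, with a 1000-cycle interval, a thread stalled for 700 cycles whose prefetches are only 20% accurate would get its demand requests prioritized but not its prefetch requests.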

12.

Granted patent
Online learning based algorithms to increase retention and reuse of GPU-generated dynamic surfaces in outer-level caches (status: in force)

Publication number: US09720829B2

Publication date: 2017-08-01

Application number: US13993811

Filing date: 2011-12-29

Applicant: Suresh Srinivasan, Rakesh Ramesh, Sreenivas Subramoney, Jayesh Gaur

Inventor: Suresh Srinivasan, Rakesh Ramesh, Sreenivas Subramoney, Jayesh Gaur

IPC: G06F12/08, G06F12/0802, G06F12/0888, G06T1/60

CPC classification number: G06F12/0802, G06F12/0888, G06T1/60, G06T2200/28

Abstract: Some implementations disclosed herein provide techniques for caching memory data and for managing cache retention. Different cache retention policies may be applied to different cached data streams such as those of a graphics processing unit. Actual performance of the cache with respect to the data streams may be observed, and the cache retention policies may be varied based on the observed actual performance.

13.

Granted patent
Dynamic performance monitoring-based approach to memory management (status: lapsed)

Publication number: US07490117B2

Publication date: 2009-02-10

Application number: US10749425

Filing date: 2003-12-31

Applicant: Sreenivas Subramoney, Richard Hudson, Mauricio Serrano, Ali-Reza Adl-Tabatabai

Inventor: Sreenivas Subramoney, Richard Hudson, Mauricio Serrano, Ali-Reza Adl-Tabatabai

IPC: G06F12/08

CPC classification number: G06F11/348, G06F12/0253, G06F12/0269, G06F2201/88, G06F2201/885, Y10S707/99957

Abstract: Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine if any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutators performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.
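
As a rough illustration of the "delinquent memory region" idea in this abstract, the sketch below buckets sampled cache-miss addresses into fixed-size heap regions and flags the regions whose miss count reaches a threshold. The function name `find_delinquent_regions`, the fixed region size, and the simple count threshold are assumptions made for the example; the abstract does not say how regions are delimited or scored.

```python
from collections import Counter
from typing import Iterable, List


def find_delinquent_regions(miss_addresses: Iterable[int], heap_base: int,
                            region_size: int, min_misses: int) -> List[int]:
    """Return indices of heap regions with at least `min_misses` sampled misses.

    `miss_addresses` stands in for miss samples read from a performance
    monitoring unit; addresses below `heap_base` are ignored.
    """
    counts = Counter()
    for addr in miss_addresses:
        if addr >= heap_base:
            counts[(addr - heap_base) // region_size] += 1
    return sorted(region for region, n in counts.items() if n >= min_misses)
```

A memory manager could then restrict object re-layout to the returned regions instead of processing the whole heap.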

14.

Patent application
Dynamic performance monitoring-based approach to memory management (status: lapsed)

Publication number: US20060143421A1

Publication date: 2006-06-29

Application number: US10749425

Filing date: 2003-12-31

Applicant: Sreenivas Subramoney, Richard Hudson, Mauricio Serrano, Ali-Reza Adl-Tabatabai

Inventor: Sreenivas Subramoney, Richard Hudson, Mauricio Serrano, Ali-Reza Adl-Tabatabai

IPC: G06F12/00

CPC classification number: G06F11/348, G06F12/0253, G06F12/0269, G06F2201/88, G06F2201/885, Y10S707/99957

Abstract: Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine if any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutators performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.

15.

Patent application
Methods and apparatus to dynamically insert prefetch instructions based on compiler and garbage collector analysis (status: lapsed)

Publication number: US20050138294A1

Publication date: 2005-06-23

Application number: US10742009

Filing date: 2003-12-19

Applicant: Mauricio Serrano, Sreenivas Subramoney, Richard Hudson, Ali-Reza Adl-Tabatabai

Inventor: Mauricio Serrano, Sreenivas Subramoney, Richard Hudson, Ali-Reza Adl-Tabatabai

IPC: G06F9/45, G06F12/02, G06F12/00

CPC classification number: G06F12/0253

Abstract: Methods and apparatus to insert prefetch instructions based on garbage collector analysis and compiler analysis are disclosed. In an example method, one or more batches of samples associated with cache misses from a performance monitoring unit in a processor system are received. One or more samples from the one or more batches of samples based on delinquent information are selected. A performance impact indicator associated with the one or more samples is generated. Based on the performance indicator, at least one of a garbage collector analysis and a compiler analysis is initiated to identify one or more delinquent paths. Based on the at least one of the garbage collector analysis and the compiler analysis, one or more prefetch points to insert prefetch instructions are identified.

16.

Patent application
APPARATUS, METHOD, AND COMPUTER-READABLE MEDIUM FOR ACTIVATION FUNCTION PREDICTION IN DEEP NEURAL NETWORKS (status: in force)

Publication number: US20220012571A1

Publication date: 2022-01-13

Application number: US17484423

Filing date: 2021-09-24

Applicant: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera

Inventor: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera

IPC: G06N3/04, G06N3/10

Abstract: Apparatuses and articles of manufacture are disclosed. An example apparatus includes an activation function control and decode circuitry to populate an input buffer circuitry with an input data element bit subset of less than a threshold number of bits of the input data element retrieved from the memory circuitry. The activation function and control circuitry also populate a kernel weight buffer circuitry with a weight data element bit subset of less than the threshold number of bits of the weight data element retrieved from the memory circuitry. The apparatus also including a preprocessor circuitry to calculate a partial convolution value of at least a portion of the input data element bit subset and the weight data element bit subset to determine a predicted sign of the partial convolution value.
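
The sign-prediction step in this abstract can be illustrated with a small Python sketch: keep only the most significant bits of each input and weight, compute the partial dot product, and use its sign as the prediction. The helper names `top_bits` and `predict_sign`, the 8-bit signed operand width, and the choice to keep the upper 4 bits are assumptions for illustration only; the abstract says just that a bit subset smaller than a threshold number of bits is used.

```python
from typing import Sequence


def top_bits(value: int, total_bits: int = 8, keep_bits: int = 4) -> int:
    """Zero the low-order bits of a signed integer, keeping `keep_bits` upper bits.

    Python's >> on negative ints is an arithmetic (floor) shift, so the sign
    is preserved.
    """
    shift = total_bits - keep_bits
    return (value >> shift) << shift


def predict_sign(inputs: Sequence[int], weights: Sequence[int],
                 total_bits: int = 8, keep_bits: int = 4) -> int:
    """Return -1, 0, or 1: the sign of the partial convolution value."""
    partial = sum(
        top_bits(x, total_bits, keep_bits) * top_bits(w, total_bits, keep_bits)
        for x, w in zip(inputs, weights)
    )
    return (partial > 0) - (partial < 0)
```

For inputs `[100, 50, 20]` and weights `[-90, 10, 5]`, the truncated operands give a partial sum of 96 * (-96) = -9216, so the predicted sign is negative, which matches the sign of the full-precision sum (-8400). The prediction is an approximation and can disagree with the full result when the low-order bits dominate.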

17.

Granted patent
Bypass and insertion algorithms for exclusive last-level caches (status: in force)

Publication number: US08667222B2

Publication date: 2014-03-04

Application number: US13078415

Filing date: 2011-04-01

Applicant: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney

Inventor: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney

IPC: G06F12/00

CPC classification number: G06F12/0888, G06F12/0897, G06F12/122, G06F12/123, G06F12/128

Abstract: An apparatus and method are described for implementing an exclusive lower level cache (LLC) policy within a computer processor. For example, one embodiment of a computer processor comprises: a mid-level cache circuit (MLC) for storing a first set of cache lines containing instructions and/or data; a lower level cache circuit (LLC) for storing a second set of cache lines of instructions and/or data; and an insertion circuit for implementing a policy for inserting or replacing cache lines within the LLC based on values of use recency and use frequency associated with the lines.

18.

Patent application
BYPASS AND INSERTION ALGORITHMS FOR EXCLUSIVE LAST-LEVEL CACHES (status: in force)

Publication number: US20120254550A1

Publication date: 2012-10-04

Application number: US13078415

Filing date: 2011-04-01

Applicant: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney

Inventor: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney

IPC: G06F12/08

CPC classification number: G06F12/0888, G06F12/0897, G06F12/122, G06F12/123, G06F12/128

Abstract: An apparatus and method are described for implementing an exclusive lower level cache (LLC) policy within a computer processor. For example, one embodiment of a computer processor comprises: a mid-level cache circuit (MLC) for storing a first set of cache lines containing instructions and/or data; a lower level cache circuit (LLC) for storing a second set of cache lines of instructions and/or data; and an insertion circuit for implementing a policy for inserting or replacing cache lines within the LLC based on values of use recency and use frequency associated with the lines.

19.

Granted patent
Methods and apparatus to dynamically insert prefetch instructions based on garbage collector analysis and layout of objects (status: lapsed)

Publication number: US07577947B2

Publication date: 2009-08-18

Application number: US10741897

Filing date: 2003-12-19

Applicant: Sreenivas Subramoney, Mauricio J. Serrano, Richard L. Hudson, Ali-Reza Adl-Tabatabai

Inventor: Sreenivas Subramoney, Mauricio J. Serrano, Richard L. Hudson, Ali-Reza Adl-Tabatabai

IPC: G06F9/45

CPC classification number: G06F12/0253, G06F12/0862, G06F2212/6026

Abstract: Methods and apparatus to dynamically insert prefetch instructions are disclosed. In an example method, one or more samples associated with cache misses are identified from a performance monitoring unit in a processor system. Based on sample information associated with the one or more samples, delinquent information is generated. To dynamically insert one or more prefetch instructions, a prefetch point is identified based on the delinquent information.

20.

Granted patent
Method for using non-temporal streaming to improve garbage collection algorithm (status: lapsed)

Publication number: US06950837B2

Publication date: 2005-09-27

Application number: US09885745

Filing date: 2001-06-19

Applicant: Sreenivas Subramoney, Richard L. Hudson

Inventor: Sreenivas Subramoney, Richard L. Hudson

IPC: G06F12/02, G06F12/08, G06F17/00

CPC classification number: G06F12/0888, G06F12/0253, Y10S707/99957

Abstract: An improved moving garbage collection algorithm is described. The algorithm allows efficient use of non-temporal stores to reduce the required time for garbage collection. Non-temporal stores (or copies) are a CPU feature that allows the copy of data objects within main memory with no interference or pollution of the cache memory. The live objects copied to new memory locations will not be accessed again in the near future and therefore need not be copied to cache. This avoids copy operations and avoids taxing the CPU with cache determinations. In a preferred embodiment, the algorithm of the present invention exploits the fact that live data objects will be stored to consecutive new memory locations in order to perform streaming copies. Since each copy procedure has an associated CPU overhead, the process of streaming the copies reduces the degradation of system performance and thus reduces the time for garbage collection.
