-
Publication No.: US11726837B2
Publication Date: 2023-08-15
Application No.: US17519290
Filing Date: 2021-11-04
Inventors: Karthik Rao, Shomit N. Das, Xudong An, Wei Huang
IPC Classes: G06F9/50, G06F9/48, G06F9/38, H04L67/12, G06F1/3206, G06F13/40, G06F3/06, H04N19/436
CPC Classes: G06F9/5094, G06F9/3867, G06F9/3877, G06F9/4893, G06F9/5011, G06F9/5027, G06F9/5055, H04L67/12, G06F1/3206, G06F3/0613, G06F9/5061, G06F13/409, H04N19/436
Abstract: In some examples, thermal-aware optimization logic determines a characteristic (e.g., a workload or type) of a wavefront (e.g., multiple threads). For example, the characteristic indicates whether the wavefront is compute intensive, memory intensive, mixed, or another type of wavefront. The thermal-aware optimization logic determines temperature information for one or more compute units (CUs) in one or more processing cores. The temperature information includes predictive thermal information indicating expected temperatures of the one or more CUs and historical thermal information indicating current or past temperatures of at least a portion of a graphics processing unit (GPU). The logic selects a subset of the CUs to process the threads based on the determined characteristic and the temperature information, and provides instructions to the selected CUs to execute the wavefront.
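As a rough illustration of this kind of scheduling decision, the Python sketch below picks the coolest eligible CUs for a wavefront. The class names, weighting, and thermal limit are assumptions made for the example, not details taken from the patent.

# Illustrative sketch only: WavefrontClass, CUState, and select_compute_units
# are hypothetical names, not taken from the patent.
from dataclasses import dataclass
from enum import Enum, auto


class WavefrontClass(Enum):
    COMPUTE_INTENSIVE = auto()
    MEMORY_INTENSIVE = auto()
    MIXED = auto()


@dataclass
class CUState:
    cu_id: int
    current_temp_c: float     # historical/current thermal reading
    predicted_temp_c: float   # expected temperature from a thermal model


def select_compute_units(wavefront_class, cu_states, needed, temp_limit_c=95.0):
    """Pick CUs to run a wavefront on, based on its characteristic and on
    predicted plus historical temperature information."""
    # Compute-intensive work is expected to heat a CU more, so weight the
    # predicted temperature more heavily for that class of wavefront.
    weight = 0.7 if wavefront_class is WavefrontClass.COMPUTE_INTENSIVE else 0.4

    def thermal_score(cu):
        return weight * cu.predicted_temp_c + (1.0 - weight) * cu.current_temp_c

    # Keep only CUs not expected to exceed the thermal limit, prefer the coolest.
    candidates = [cu for cu in cu_states if cu.predicted_temp_c < temp_limit_c]
    candidates.sort(key=thermal_score)
    return [cu.cu_id for cu in candidates[:needed]]


if __name__ == "__main__":
    cus = [CUState(0, 80.0, 92.0), CUState(1, 65.0, 70.0), CUState(2, 70.0, 96.0)]
    print(select_compute_units(WavefrontClass.COMPUTE_INTENSIVE, cus, needed=2))
    # -> [1, 0]: CU 2 is excluded because its predicted temperature exceeds the limit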
-
Publication No.: US11544196B2
Publication Date: 2023-01-03
Application No.: US16725971
Filing Date: 2019-12-23
IPC Classes: G06F12/08, G06F12/0871, G06F12/0897, G06F11/30, G06F12/02
Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, a light compressor, and a heavy compressor. The decision on which compressor to use for a given cache line is made based on resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex compression algorithms while limiting the adverse effect of high decompression latency on system performance. The design uses the heavy compressor to reduce memory bandwidth on high-bandwidth memory (HBM) interfaces as long as doing so does not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to reduce off-chip memory traffic without sacrificing system performance.
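The sketch below illustrates the kind of decision logic the abstract describes, choosing between a light and a heavy compressor based on resource availability. The thresholds and the use of two zlib levels as stand-ins for the two compressors are assumptions made for the example, not details from the patent.

# Illustrative sketch only: thresholds and names are hypothetical.
import zlib


def light_compress(line):
    # Stand-in for a fast, low-latency compressor.
    return zlib.compress(line, level=1)


def heavy_compress(line):
    # Stand-in for a slower, higher-ratio ("heavy") compressor.
    return zlib.compress(line, level=9)


def choose_compressor(cache_occupancy, bandwidth_utilization):
    """Pick a compressor based on resource availability: the heavy compressor
    is only worthwhile when the cache is nearly full or off-chip bandwidth is
    under pressure, so its higher decompression latency is paid only when it
    buys something back."""
    if cache_occupancy > 0.9 or bandwidth_utilization > 0.8:
        return heavy_compress
    return light_compress


if __name__ == "__main__":
    cache_line = bytes(64)  # a 64-byte cache line of zeros
    compressor = choose_compressor(cache_occupancy=0.95, bandwidth_utilization=0.5)
    print(compressor.__name__, len(compressor(cache_line)))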
-
Publication No.: US20210157485A1
Publication Date: 2021-05-27
Application No.: US17029158
Filing Date: 2020-09-23
Inventors: Matthew Tomei, Shomit N. Das, David A. Wood
IPC Classes: G06F3/06, G06F12/0802
Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.
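The sketch below illustrates one way such pattern-based compression could look: each pattern maps the byte positions of a toy 8-byte block to indices into a small byte dictionary. The pattern table and the encoding are assumptions made for the example, not the patent's actual format.

# Illustrative sketch only: the pattern representation (a tuple of dictionary
# indices, one per byte position) is an assumption for clarity.
PATTERNS = {
    "all_same":    (0, 0, 0, 0, 0, 0, 0, 0),
    "two_halves":  (0, 0, 0, 0, 1, 1, 1, 1),
    "alternating": (0, 1, 0, 1, 0, 1, 0, 1),
}


def compress_block(block):
    """Return (pattern_name, byte_dictionary) if some known pattern reproduces
    the block, otherwise None (the block stays uncompressed)."""
    for name, pattern in PATTERNS.items():
        dictionary = []
        ok = True
        for pos, dict_idx in enumerate(pattern):
            if dict_idx == len(dictionary):            # first use of this index
                dictionary.append(block[pos])
            elif block[pos] != dictionary[dict_idx]:   # byte contradicts the pattern
                ok = False
                break
        if ok:
            return name, bytes(dictionary)
    return None


if __name__ == "__main__":
    print(compress_block(bytes([7] * 8)))                   # ('all_same', b'\x07')
    print(compress_block(bytes([1, 2, 1, 2, 1, 2, 1, 2])))  # ('alternating', b'\x01\x02')
    print(compress_block(bytes(range(8))))                  # None: no matching pattern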
-
Publication No.: US20200153757A1
Publication Date: 2020-05-14
Application No.: US16188900
Filing Date: 2018-11-13
Inventors: Srikant Bharadwaj, Shomit N. Das
IPC Classes: H04L12/933, H04L12/775
Abstract: A system is described that includes an integrated circuit chip having a network-on-chip. The network-on-chip includes multiple routers arranged in a topology and a separate communication link coupled between each router and each of that router's one or more neighboring routers in the topology. The integrated circuit chip also includes multiple nodes, each coupled to one of the routers. When operating, a given router keeps a record of the operating states of some or all of the routers and their corresponding communication links. The given router then routes flits to destination nodes via one or more other routers based at least in part on those recorded operating states.
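As an illustration of state-aware routing of this kind, the sketch below computes a route that avoids congested routers and failed links using a recorded table of operating states. The breadth-first search and the data structures are assumptions for the example, not the routing algorithm claimed in the patent.

# Illustrative sketch only: RouterState and route_flit are hypothetical names.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RouterState:
    router_id: int
    congested: bool = False                      # operating state of the router itself
    link_up: dict = field(default_factory=dict)  # neighbor id -> link operating state


def route_flit(src, dst, states):
    """Breadth-first route from src to dst that avoids congested routers and
    failed links, using the recorded operating states."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:               # walk back to reconstruct the route
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nbr, link_ok in states[cur].link_up.items():
            if link_ok and not states[nbr].congested and nbr not in parent:
                parent[nbr] = cur
                queue.append(nbr)
    return []                                    # no usable route under current states


if __name__ == "__main__":
    # A 2x2 mesh (0-1, 0-2, 1-3, 2-3) in which router 1 is congested.
    states = {
        0: RouterState(0, link_up={1: True, 2: True}),
        1: RouterState(1, congested=True, link_up={0: True, 3: True}),
        2: RouterState(2, link_up={0: True, 3: True}),
        3: RouterState(3, link_up={1: True, 2: True}),
    }
    print(route_flit(0, 3, states))              # [0, 2, 3]: router 1 is bypassed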
-
Publication No.: US10318363B2
Publication Date: 2019-06-11
Application No.: US15338172
Filing Date: 2016-10-28
Inventors: Greg Sadowski, Steven E. Raasch, Shomit N. Das, Wayne Burleson
Abstract: A system and method for managing operating parameters within a system for optimal power and reliability are described. A device includes a functional unit and a corresponding reliability evaluator. The functional unit provides reliability information to one or more reliability monitors, which translate the information into reliability values. The reliability evaluator determines an overall reliability level for the system based on the reliability values. The reliability monitor compares the actual usage values with the expected usage values. When the system has maintained a relatively high level of reliability for a given time interval, the reliability evaluator sends an indication to update operating parameters to reduce the reliability of the system, which also reduces power consumption for the system.
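The sketch below illustrates the general idea of trading surplus reliability for power: an evaluator aggregates per-monitor reliability values and, after a sustained high-reliability window, signals that operating parameters may be relaxed. The aggregation rule, threshold, and window length are assumptions made for the example.

# Illustrative sketch only: names and thresholds are hypothetical.
class ReliabilityEvaluator:
    """Combines reliability values from the monitors into an overall level and
    decides when operating parameters can be relaxed."""

    def __init__(self, high_reliability_threshold=0.9, window=3):
        self.threshold = high_reliability_threshold
        self.window = window          # number of intervals that must stay reliable
        self.history = []

    def report(self, reliability_values):
        overall = min(reliability_values)        # the weakest monitor dominates
        self.history.append(overall)
        recent = self.history[-self.window:]
        if len(recent) == self.window and all(v >= self.threshold for v in recent):
            # The system has stayed highly reliable for the whole window, so
            # signal that operating parameters may be relaxed to save power.
            return "relax_operating_parameters"
        return None


if __name__ == "__main__":
    evaluator = ReliabilityEvaluator(window=3)
    decision = None
    for values in ([0.95, 0.97], [0.96, 0.99], [0.94, 0.98]):
        decision = evaluator.report(values)
    print(decision)   # 'relax_operating_parameters' after three reliable intervals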
-
Publication No.: US12001237B2
Publication Date: 2024-06-04
Application No.: US17029158
Filing Date: 2020-09-23
Inventors: Matthew Tomei, Shomit N. Das, David A. Wood
IPC Classes: G06F12/00, G06F3/06, G06F12/0802
CPC Classes: G06F3/0608, G06F3/0655, G06F3/0676, G06F3/0679, G06F12/0802
Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.
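This publication shares its disclosure with the earlier application above, so the sketch below shows the matching decompression side: a pattern identified from the compressed block's metadata is applied to its byte dictionary. It reuses the same hypothetical pattern table as the compressor sketch earlier; neither is the patent's actual encoding.

# Illustrative sketch only: complements the compressor sketch above.
PATTERNS = {
    "all_same":    (0, 0, 0, 0, 0, 0, 0, 0),
    "two_halves":  (0, 0, 0, 0, 1, 1, 1, 1),
    "alternating": (0, 1, 0, 1, 0, 1, 0, 1),
}


def decompress_block(pattern_name, byte_dictionary):
    """Rebuild the uncompressed block by applying the pattern identified by
    the compressed block's metadata to its byte dictionary."""
    pattern = PATTERNS[pattern_name]
    return bytes(byte_dictionary[idx] for idx in pattern)


if __name__ == "__main__":
    print(decompress_block("alternating", b"\x01\x02"))
    # -> b'\x01\x02\x01\x02\x01\x02\x01\x02'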
-
Publication No.: US11842199B2
Publication Date: 2023-12-12
Application No.: US16913146
Filing Date: 2020-06-26
Inventors: Greg Sadowski, John Kalamatianos, Shomit N. Das
IPC Classes: G06F9/38
CPC Classes: G06F9/3871, G06F9/3836, G06F9/3869
Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
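The sketch below illustrates the comparison the abstract describes: a controller compares one stage's completion time with those of the downstream stages and signals a speed adjustment. The 10% guard band and the signal names are assumptions for the example, not values from the patent.

# Illustrative sketch only: the "increase/decrease speed" strings stand in for
# the voltage and buffer drive-strength controls described in the abstract.
def control_signal(stage_completion_ns, downstream_completion_ns):
    """Compare one stage's completion status with the downstream stages' and
    return the adjustment the controller would signal to that stage."""
    slowest_downstream = max(downstream_completion_ns)
    if stage_completion_ns > slowest_downstream * 1.1:
        # The stage is the bottleneck: raise its voltage / drive strength.
        return "increase_speed"
    if stage_completion_ns < slowest_downstream * 0.9:
        # The stage finishes early and waits: lower voltage to save power.
        return "decrease_speed"
    return "hold"


if __name__ == "__main__":
    print(control_signal(1.4, [1.0, 1.1]))  # increase_speed: this stage is the bottleneck
    print(control_signal(0.8, [1.0, 1.1]))  # decrease_speed: this stage is idle-waiting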
-
Publication No.: US20230110376A1
Publication Date: 2023-04-13
Application No.: US18058534
Filing Date: 2022-11-23
IPC Classes: G06F12/0871, G06F11/30, G06F12/0897, G06F12/02
Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, a light compressor, and a heavy compressor. The decision on which compressor to use for a given cache line is made based on resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex compression algorithms while limiting the adverse effect of high decompression latency on system performance. The design uses the heavy compressor to reduce memory bandwidth on high-bandwidth memory (HBM) interfaces as long as doing so does not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to reduce off-chip memory traffic without sacrificing system performance.
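This publication continues the multi-tiered compression application above. As a complement to the compressor-selection sketch earlier, the sketch below shows a possible read path: each stored line carries a tag recording which compressor tier was used, so it can be decompressed correctly on a hit. The tagging scheme and the zlib stand-ins are assumptions made for the example.

# Illustrative sketch only: the per-line tier tag is an assumption.
import zlib


def store_line(cache, address, line, use_heavy):
    """Compress and store a cache line, tagging it with the compressor tier
    used so the read path knows how to handle it."""
    tier = "heavy" if use_heavy else "light"
    cache[address] = (tier, zlib.compress(line, level=9 if use_heavy else 1))


def load_line(cache, address):
    tier, payload = cache[address]
    # Both tiers decompress the same way in this toy example; a real heavy
    # compressor would pay a longer decompression latency, which is why the
    # decision unit only selects it when capacity or bandwidth is constrained.
    return zlib.decompress(payload)


if __name__ == "__main__":
    cache = {}
    store_line(cache, 0x40, bytes(64), use_heavy=True)
    assert load_line(cache, 0x40) == bytes(64)
    print(cache[0x40][0])   # 'heavy'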
-
Publication No.: US20200210343A1
Publication Date: 2020-07-02
Application No.: US16232314
Filing Date: 2018-12-26
IPC Classes: G06F12/0897, G06F12/0815
Abstract: An electronic device includes at least one compression-decompression functional block and a hierarchy of cache memories with a first cache memory and a second cache memory. The compression-decompression functional block receives data in an uncompressed state, compresses the data using either a first compression or a second compression, and then provides the data to the first cache memory for storage. When the data is retrieved from the first cache memory to be stored in the second cache memory and was compressed using the first compression, the compression-decompression functional block decompresses the data to reverse the effects of the first compression, restoring it to the uncompressed state, and then provides the data, either compressed using the second compression or in the uncompressed state, to the second cache memory for storage.
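The sketch below illustrates the data movement the abstract describes: data compressed with the first compression is decompressed back to the uncompressed state when it moves from the first cache to the second. Treating two zlib levels as the first and second compressions is an assumption made for the example.

# Illustrative sketch only: CompressionBlock and its method names are hypothetical.
import zlib


class CompressionBlock:
    """Toy compression-decompression functional block that sits between the
    first and second cache memories."""

    def compress_for_first_cache(self, data, use_first_compression):
        # Incoming uncompressed data is compressed with either the first or
        # the second compression before being stored in the first cache.
        scheme = "first" if use_first_compression else "second"
        level = 1 if use_first_compression else 6
        return scheme, zlib.compress(data, level=level)

    def move_to_second_cache(self, entry):
        scheme, payload = entry
        if scheme == "first":
            # Reverse the effects of the first compression so the data reaches
            # the second cache in the uncompressed state.
            return "uncompressed", zlib.decompress(payload)
        # Data compressed with the second compression is passed along as-is.
        return entry


if __name__ == "__main__":
    block = CompressionBlock()
    first_cache_entry = block.compress_for_first_cache(b"hello world" * 4, True)
    print(block.move_to_second_cache(first_cache_entry)[0])   # 'uncompressed'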
-
Publication No.: US10411731B1
Publication Date: 2019-09-10
Application No.: US16140025
Filing Date: 2018-09-24
Inventors: Shomit N. Das, Matthew Tomei
IPC Classes: H03M7/34, H03M7/40, H03M7/30, H03M13/00, H03M5/00, G06T9/00, H03M7/00, H04N19/134, H04N19/103
Abstract: A processing device is provided which includes a plurality of encoders, each configured to compress a portion of data using a different compression algorithm. The processing device also includes one or more processors configured to cause an encoder, of the plurality of encoders, to compress the portion of data when it is determined that a preceding encoder in an encoder hierarchy has not successfully compressed the portion of data according to a compression metric. The one or more processors are also configured to prevent the encoder from compressing the portion of data when the preceding encoder in the hierarchy has successfully compressed it according to the compression metric.
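The sketch below illustrates an encoder hierarchy of this kind: each encoder runs only if the previous one failed to meet a compression metric, and later encoders are skipped once the metric is met. The choice of zlib, bz2, and lzma as the encoders and the minimum-ratio metric are assumptions made for the example.

# Illustrative sketch only: the metric (a minimum compression ratio) and the
# encoder ordering are assumptions, not values from the patent.
import bz2
import lzma
import zlib


def compress_with_hierarchy(data, min_ratio=2.0):
    """Try encoders in hierarchy order; a later encoder runs only when the
    earlier one fails to meet the compression metric."""
    encoders = [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]
    for name, encode in encoders:
        compressed = encode(data)
        if len(data) / len(compressed) >= min_ratio:
            # Success: later encoders in the hierarchy are prevented from running.
            return name, compressed
    return "raw", data   # no encoder met the metric; keep the data uncompressed


if __name__ == "__main__":
    name, payload = compress_with_hierarchy(b"abc" * 200)
    print(name, len(payload))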