-
Publication No.: US20210191620A1
Publication Date: 2021-06-24
Application No.: US16724609
Filing Date: 2019-12-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: SeyedMohammad SEYEDZADEHDELCHEH, Xianwei ZHANG, Bradford BECKMANN, Shomit N. DAS
IPC: G06F3/06, G06F12/0875
Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the compressed data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache, and the delta values stored in memory.
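The base-plus-delta scheme in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `CompressedBlock`, `delta_width`, `compress`, and `decompress`, the 32-byte threshold, and the rule that a block is stored compressed only when it fits within the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CompressedBlock:
    base: int          # would live in the base value cache (assumed layout)
    delta_bytes: int   # per-delta width, would live in the metadata cache
    deltas: list       # the only part that would be written to memory


def delta_width(deltas):
    """Smallest signed byte width (1, 2, 4 or 8) that holds every delta."""
    for width in (1, 2, 4, 8):
        lo = -(1 << (8 * width - 1))
        hi = (1 << (8 * width - 1)) - 1
        if all(lo <= d <= hi for d in deltas):
            return width
    raise ValueError("delta does not fit in 8 bytes")


def compress(block, threshold_bytes=32):
    """Return a CompressedBlock, or None if the result exceeds the threshold."""
    base = block[0]                      # assumption: first word is the base
    deltas = [word - base for word in block]
    width = delta_width(deltas)
    compressed_size = width * len(deltas)
    if compressed_size > threshold_bytes:
        return None                      # caller keeps the block uncompressed
    return CompressedBlock(base, width, deltas)


def decompress(c):
    """Rebuild the original words from the base and the stored deltas."""
    return [c.base + d for d in c.deltas]
```

For example, `compress([1000, 1003, 998, 1010])` yields one-byte deltas `[0, 3, -2, 10]`, while a block whose values are spread widely enough to exceed the threshold returns `None`.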
-
Publication No.: US20200183485A1
Publication Date: 2020-06-11
Application No.: US16213126
Filing Date: 2018-12-07
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. DAS, Joseph L. GREATHOUSE
IPC: G06F1/3287, G06F1/324, G06F1/3296
Abstract: A processing system dynamically scales at least one of voltage and frequency at a subset of a plurality of compute units of a graphics processing unit (GPU) based on characteristics of a kernel or workload to be executed at the subset. A system management unit for the processing system receives a compute unit mask, designating the subset of a plurality of compute units of a GPU to execute the kernel or workload, and workload characteristics indicating the compute-boundedness or memory bandwidth-boundedness of the kernel or workload from a central processing unit of the processing system. The system management unit determines a dynamic voltage and frequency scaling policy for the subset of the plurality of compute units of the GPU based on the compute unit mask and the workload characteristics.
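As a rough illustration of how a compute unit mask and a workload characteristic could drive a per-subset policy, consider the following sketch. The function name `select_dvfs_policy`, the two operating points, and the rule that compute-bound kernels get the higher point are assumptions, not details taken from the abstract.

```python
# Hypothetical operating points as (voltage in volts, frequency in MHz).
HIGH_POINT = (1.10, 1800)
LOW_POINT = (0.85, 1000)


def select_dvfs_policy(cu_mask, compute_bound, num_cus=8):
    """Map each compute unit selected by cu_mask to an operating point.

    cu_mask: integer bitmask, bit i set means compute unit i runs the kernel.
    compute_bound: True for a compute-bound kernel, False for a
        memory-bandwidth-bound one.
    """
    # Assumption: a compute-bound kernel benefits from a higher clock, while
    # a memory-bound kernel can run at a lower point to save power.
    point = HIGH_POINT if compute_bound else LOW_POINT
    return {cu: point for cu in range(num_cus) if (cu_mask >> cu) & 1}
```

Calling `select_dvfs_policy(0b0101, compute_bound=False)` returns a policy only for compute units 0 and 2. Units outside the mask are left untouched.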
-
Publication No.: US20200183597A1
Publication Date: 2020-06-11
Application No.: US16212388
Filing Date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. DAS, Kishore PUNNIYAMURTHY
IPC: G06F3/06
Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access requests for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.
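The decision logic described in this abstract can be sketched as a small function. The name `adjust_channel_power`, the string labels, and the four-channel default are hypothetical; the abstract does not specify an interface.

```python
def adjust_channel_power(stall_cause, stalled_channels, num_channels=4):
    """Return a per-channel action list of 'boost' or 'throttle'.

    stall_cause: 'memory' when the wavefront waits on outstanding memory
        requests, any other string otherwise (labels are assumptions).
    stalled_channels: set of channel indices servicing the stalled requests.
    """
    if stall_cause != "memory":
        # The stall is not caused by memory, so all memory power is throttled.
        return ["throttle"] * num_channels
    # Boost only the channels the wavefront is waiting on, throttle the rest.
    return [
        "boost" if ch in stalled_channels else "throttle"
        for ch in range(num_channels)
    ]
```

With `stall_cause="memory"` and `stalled_channels={1, 3}`, channels 1 and 3 are boosted and channels 0 and 2 are throttled.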
-
Publication No.: US20200151573A1
Publication Date: 2020-05-14
Application No.: US16425403
Filing Date: 2019-05-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. DAS, Abhinav VISHNU
Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory is configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
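The grouping step can be sketched as follows. The abstract does not say which losses map to which precision, so this example assumes that low-loss samples go to the lowest precision and high-loss samples to the highest. The function name `assign_precisions` and the precision labels are also assumptions.

```python
def assign_precisions(losses, precisions=("float16", "float32")):
    """Group sample indices by ascending loss, one group per precision.

    Assumption: the lowest-loss group is assigned precisions[0] and the
    highest-loss group is assigned precisions[-1].
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    n_groups = len(precisions)
    size = -(-len(order) // n_groups)  # ceiling division
    return {
        precisions[g]: order[g * size:(g + 1) * size]
        for g in range(n_groups)
    }
```

For losses `[0.9, 0.1, 0.5, 0.05]`, samples 3 and 1 fall in the low-loss group and samples 2 and 0 in the high-loss group. Each group could then be cast to its assigned precision before the forward and backward passes.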
-
Publication No.: US20220083233A1
Publication Date: 2022-03-17
Application No.: US17497286
Filing Date: 2021-10-08
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Seyed Mohammad SEYEDZADEHDELCHEH, Xianwei ZHANG, Bradford BECKMANN, Shomit N. DAS
IPC: G06F3/06, G06F12/0875
Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the compressed data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache, and the delta values stored in memory.
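The split between what stays on chip and what goes to memory can be illustrated with a small sketch of the store and load paths. The three dictionaries standing in for the base value cache, the metadata cache, and memory, along with the functions `store` and `load`, are hypothetical. A fixed one-byte delta width is assumed for simplicity.

```python
base_value_cache = {}  # block address -> base value (kept on chip)
metadata_cache = {}    # block address -> (delta width in bytes, word count)
memory = {}            # block address -> packed delta bytes only


def store(addr, block, width=1):
    """Keep the base and metadata on chip and write only deltas to memory."""
    base = block[0]  # assumption: the first word serves as the base
    base_value_cache[addr] = base
    metadata_cache[addr] = (width, len(block))
    # int.to_bytes raises OverflowError if a delta does not fit in `width`.
    memory[addr] = b"".join(
        (word - base).to_bytes(width, "little", signed=True) for word in block
    )


def load(addr):
    """Decompress using the cached base, cached metadata, and stored deltas."""
    base = base_value_cache[addr]
    width, count = metadata_cache[addr]
    raw = memory[addr]
    return [
        base + int.from_bytes(raw[i * width:(i + 1) * width], "little", signed=True)
        for i in range(count)
    ]
```

Storing `[500, 497, 510, 500]` at an address writes four bytes to `memory`, and `load` at the same address reconstructs the original four words.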
-
Publication No.: US20210191770A1
Publication Date: 2021-06-24
Application No.: US16718896
Filing Date: 2019-12-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Karthik RAO, Shomit N. DAS, Manish ARORA
Abstract: A processing unit preemptively cools selected compute units prior to initiating execution of a wavefront at the selected compute units. A scheduler of the processing unit identifies that a wavefront is to be executed at a selected subset of compute units of the processing unit. In response, the processing unit's temperature control subsystem activates one or more cooling elements to reduce the temperature of the subset of compute units, prior to the scheduler initiating execution of the wavefront. By preemptively cooling the compute units, the temperature control subsystem increases the difference between the initial temperature of the compute units and a thermal throttling threshold that triggers performance-impacting temperature control measures, such as the reduction of a compute unit clock frequency.
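The effect of cooling before launch can be shown with a toy model. The function `precool_before_launch`, the 95 degree throttling threshold, and the fixed 10 degree cooling effect are assumptions for illustration. Real thermal behavior is not a constant offset.

```python
def precool_before_launch(cu_temps, selected, throttle_threshold=95.0,
                          cooling_delta=10.0):
    """Cool the selected compute units, then report headroom before launch.

    cu_temps: {compute unit index: temperature in degrees Celsius}.
    selected: compute units the scheduler has chosen for the wavefront.
    Returns (temperatures after cooling, headroom per selected unit).
    """
    cooled = dict(cu_temps)
    for cu in selected:
        # Assumed model: activating a cooling element lowers the temperature
        # of that unit by a fixed amount before execution starts.
        cooled[cu] = cu_temps[cu] - cooling_delta
    headroom = {cu: throttle_threshold - cooled[cu] for cu in selected}
    return cooled, headroom
```

With temperatures `{0: 80.0, 1: 85.0, 2: 70.0}` and units 0 and 1 selected, the headroom to the throttling threshold grows from 15 and 10 degrees to 25 and 20 degrees, while unit 2 is left unchanged.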
-