-
Publication number: US10409610B2
Publication date: 2019-09-10
Application number: US15010093
Filing date: 2016-01-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Bradford Beckmann, Sooraj Puthoor
Abstract: Briefly, methods and apparatus to migrate a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit whereby both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that when executed allow access to a dynamic thread migration swizzle buffer that allows for the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The method and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.
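The register-context movement described in this abstract can be pictured with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the register file is modeled as `register_file[wavefront][register][lane]`, and the function name `migrate_thread` and the `(wavefront, lane)` addressing are hypothetical.

```python
def migrate_thread(register_file, src, dst):
    """Move one thread's register context from src to dst.

    register_file[wavefront][register][lane] is an assumed layout;
    src and dst are (wavefront, lane) tuples.
    """
    src_wf, src_lane = src
    dst_wf, dst_lane = dst

    # Step 1: gather the thread's register context into the swizzle buffer.
    swizzle_buffer = [reg[src_lane] for reg in register_file[src_wf]]

    # Step 2: return the context to a different register file location.
    for reg_index, value in enumerate(swizzle_buffer):
        register_file[dst_wf][reg_index][dst_lane] = value

    return swizzle_buffer


# Two wavefronts, two registers each, four lanes per register.
rf = [[[w * 100 + r * 10 + lane for lane in range(4)] for r in range(2)]
      for w in range(2)]
context = migrate_thread(rf, src=(0, 3), dst=(1, 0))
```

Here the thread in lane 3 of wavefront 0 lands in lane 0 of wavefront 1, with its two register values carried through the buffer.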
-
Publication number: US20150106587A1
Publication date: 2015-04-16
Application number: US14055221
Filing date: 2013-10-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Shuai Che, Bradford Beckmann, Blake Hechtman
IPC: G06F12/10
CPC classification number: G06F12/1054, G06F12/0207, G06F12/0284, G06F12/10, G06F12/109, G06F2212/251
Abstract: A processor remaps stored data and the corresponding memory addresses of the data for different processing units of a heterogeneous processor. The processor includes a data remap engine that changes the format of the data (that is, how the data is physically arranged in segments of memory) in response to a transfer of the data from system memory to a local memory hierarchy of an accelerated processing module (APM) of the processor. The APM's local memory hierarchy includes an address remap engine that remaps the memory addresses of the data at the local memory hierarchy so that the data can be accessed by routines at the APM that are unaware of the data remapping. By remapping the data, and the corresponding memory addresses, the APM can perform operations on the data more efficiently.
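One way to picture the pairing of a data remap with an address remap is an array-of-structures to structure-of-arrays conversion. The abstract does not name a specific format, so that choice is an assumption, and the function names `remap_data` and `remap_address` are hypothetical. The sketch shows that a routine still using the original indices reads the same values after the layout change.

```python
def remap_data(aos, num_fields):
    """Rearrange interleaved records (x0, y0, x1, y1, ...) field by field."""
    num_records = len(aos) // num_fields
    return [aos[record * num_fields + field]
            for field in range(num_fields)
            for record in range(num_records)]


def remap_address(original_index, num_fields, num_records):
    """Translate an index into the original layout to the remapped layout."""
    record, field = divmod(original_index, num_fields)
    return field * num_records + record


aos = ["x0", "y0", "x1", "y1", "x2", "y2"]
soa = remap_data(aos, num_fields=2)

# A routine unaware of the remap keeps using original indices.
value = soa[remap_address(3, num_fields=2, num_records=3)]
```

The remapped layout keeps each field contiguous, which is the kind of arrangement that wide vector units tend to access more efficiently.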
-
Publication number: US20240329984A1
Publication date: 2024-10-03
Application number: US18128963
Filing date: 2023-03-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Yasuko Eckert, Vadim Vadimovich Nikiforov, Gabriel H. Loh, Bradford Beckmann
IPC: G06F9/30
CPC classification number: G06F9/30036, G06F9/3001, G06F9/30109
Abstract: An electronic device includes processing circuitry that executes a lookup table (LUT) vector instruction. Executing the lookup table vector instruction causes the processing circuitry to acquire a set of reference values by using each input value from an input vector as an index to acquire a reference value from a reference vector. The processing circuitry then provides the set of reference values for one or more subsequent operations. The processing circuitry can also use the set of reference values for controlling vector elements from among a set of vector elements for which a vector operation is performed.
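The lookup described in this abstract behaves like a vector gather. The following is a minimal software sketch, assuming plain Python lists stand in for vector registers; `lut_vector` and `masked_add` are hypothetical names, not instructions from the patent.

```python
def lut_vector(input_vec, reference_vec):
    """Use each input value as an index into the reference vector."""
    return [reference_vec[index] for index in input_vec]


def masked_add(a, b, mask):
    """Add b to a only in lanes whose mask entry is truthy."""
    return [x + y if m else x for x, y, m in zip(a, b, mask)]


reference = [0, 1, 1, 0]
inputs = [3, 1, 2, 0]

# The looked-up reference values double as a per-lane control mask.
mask = lut_vector(inputs, reference)
result = masked_add([10, 20, 30, 40], [1, 1, 1, 1], mask)
```

The second function illustrates the last sentence of the abstract: the looked-up values select which vector elements a later operation applies to.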
-
Publication number: US11875425B2
Publication date: 2024-01-16
Application number: US17134904
Filing date: 2020-12-28
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sooraj Puthoor, Bradford Beckmann, Nuwan Jayasena, Anthony Gutierrez
CPC classification number: G06T1/20, G06F9/30036, G06F9/3836, G06F9/3877, G06F9/3887, G06F9/545, G06T2210/52
Abstract: Implementing heterogeneous wavefronts on a graphics processing unit (GPU) is disclosed. A scheduler assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts and service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogeneous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogeneous wavefronts on the compute unit. Heterogeneous wavefronts are processed in the same pipeline of the processing device.
-
Publication number: US11487671B2
Publication date: 2022-11-01
Application number: US16446119
Filing date: 2019-06-19
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Xianwei Zhang, John Kalamatianos, Bradford Beckmann
IPC: G06F12/0891, G06F12/0888, G06F12/0895, G06F9/54, G06F9/38
Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.
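The monitor-then-tag flow in this abstract can be approximated in software as follows. This is a rough sketch under stated assumptions: the 64-byte line size, the 0.5 hit-rate threshold, the two reuse labels, and the function names `classify_reuse` and `access` are all hypothetical, and the cache is modeled as a set of protected lines that is never evicted.

```python
from collections import Counter


def classify_reuse(sampled_addresses, line_bytes=64, hit_rate_threshold=0.5):
    """Estimate a hit rate from one monitored wavefront's accesses."""
    lines = [addr // line_bytes for addr in sampled_addresses]
    if not lines:
        return "streaming", 0.0
    counts = Counter(lines)
    # Every access to a line after the first counts as a reuse.
    reused = sum(count - 1 for count in counts.values())
    hit_rate = reused / len(lines)
    kind = "reuse" if hit_rate >= hit_rate_threshold else "streaming"
    return kind, hit_rate


def access(cache, addr, reuse_type, line_bytes=64):
    """Serve a request from another wavefront, tagged with a reuse type."""
    if reuse_type == "streaming":
        return "bypass"  # streaming requests skip the cache
    line = addr // line_bytes
    if line in cache:
        return "hit"
    cache.add(line)  # reused lines stay protected in this sketch
    return "fill"


reuse_type, hit_rate = classify_reuse([0, 8, 64, 0, 72, 128])
cache = set()
outcomes = [access(cache, a, reuse_type) for a in (0, 8, 256)]
```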
-
Publication number: US11144208B2
Publication date: 2021-10-12
Application number: US16724609
Filing date: 2019-12-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: SeyedMohammad Seyedzadehdelcheh, Xianwei Zhang, Bradford Beckmann, Shomit N. Das
IPC: G06K9/36, G06F3/06, G06F12/0875, G06T1/20
Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
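Base-plus-delta compression of this kind can be sketched in a few lines. The version below is an illustration only: it assumes the first word of a block serves as the base, that deltas must fit in one signed byte, and that a block is stored as deltas only when its compressed size does not exceed the threshold. The dictionaries standing in for memory, the base value cache, and the metadata cache, and the names `store_block` and `load_block`, are hypothetical.

```python
def store_block(addr, words, memory, base_cache, meta_cache,
                threshold_bytes=16, delta_bytes=1):
    """Write a block, keeping only deltas in memory when they are small."""
    base = words[0]
    deltas = [w - base for w in words]
    limit = 1 << (8 * delta_bytes - 1)  # signed range of one delta
    fits = all(-limit <= d < limit for d in deltas)
    compressed_size = len(deltas) * delta_bytes

    if fits and compressed_size <= threshold_bytes:
        base_cache[addr] = base
        meta_cache[addr] = {"compressed": True, "delta_bytes": delta_bytes}
        memory[addr] = deltas  # only the deltas go to memory
    else:
        meta_cache[addr] = {"compressed": False}
        memory[addr] = list(words)  # fall back to the raw block


def load_block(addr, memory, base_cache, meta_cache):
    """Rebuild a block from the cached base, metadata, and stored deltas."""
    if meta_cache[addr]["compressed"]:
        base = base_cache[addr]
        return [base + d for d in memory[addr]]
    return list(memory[addr])


memory, base_cache, meta_cache = {}, {}, {}
store_block(0x100, [1000, 1003, 998, 1010], memory, base_cache, meta_cache)
restored = load_block(0x100, memory, base_cache, meta_cache)
```

Keeping the base and metadata on chip means the decompressor needs only the small deltas from memory to reconstruct the block.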
-
Publication number: US11042484B2
Publication date: 2021-06-22
Application number: US15192542
Filing date: 2016-06-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Johnathan R. Alsop, Bradford Beckmann
IPC: G06F12/0897, G06F12/0808, G06F12/0811, G06F12/0842, G06F12/0891
Abstract: A processing system includes one or more first caches and one or more first lock tables associated with the one or more first caches. The processing system also includes one or more processing units that each include a plurality of compute units for concurrently executing work-groups of work items, a plurality of second caches associated with the plurality of compute units and configured in a hierarchy with the one or more first caches, and a plurality of second lock tables associated with the plurality of second caches. The first and second lock tables indicate locking states of addresses of cache lines in the corresponding first and second caches on a per-line basis.
-
Publication number: US20170220346A1
Publication date: 2017-08-03
Application number: US15010093
Filing date: 2016-01-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Bradford Beckmann, Sooraj Puthoor
IPC: G06F9/30
CPC classification number: G06F9/3851, G06F9/3887, G06F9/4856
Abstract: Briefly, methods and apparatus to migrate a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit whereby both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that when executed allow access to a dynamic thread migration swizzle buffer that allows for the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The method and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.