Method and apparatus for inter-lane thread migration

    Publication number: US10409610B2

    Publication date: 2019-09-10

    Application number: US15010093

    Application date: 2016-01-29

    Abstract: Briefly, methods and apparatus are described for migrating a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit, where both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that, when executed, allow access to a dynamic thread migration swizzle buffer that allows for the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The method and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.
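    The migration flow the abstract describes can be sketched in miniature: a thread's register context is first staged into a swizzle buffer, then written into a different lane of another register file. This is an illustrative model only; the function and data-structure names are assumptions, not the patent's actual hardware interface.

```python
# Hypothetical sketch of register-context migration through a staging
# ("swizzle") buffer. A register file is modeled as lane -> register list.

def migrate_thread(src_regfile, src_lane, dst_regfile, dst_lane):
    """Move one thread's register context between register files."""
    # Step 1: copy the thread's registers into the swizzle buffer.
    swizzle_buffer = list(src_regfile[src_lane])
    # Step 2: drain the buffered context into a (possibly different)
    # lane slot of the destination register file.
    for r, value in enumerate(swizzle_buffer):
        dst_regfile[dst_lane][r] = value

rf_a = {0: [1, 2, 3], 1: [4, 5, 6]}   # execution unit A: lane -> registers
rf_b = {0: [0, 0, 0], 1: [0, 0, 0]}   # execution unit B
migrate_thread(rf_a, 1, rf_b, 0)      # lane 1 of A -> lane 0 of B
print(rf_b[0])  # [4, 5, 6]
```

    The buffer decouples the source and destination lane indices, which is what allows the context to land in "one or more different register file locations" on the way back.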

    DATA REMAPPING FOR HETEROGENEOUS PROCESSOR
    Invention application; status: pending, published

    Publication number: US20150106587A1

    Publication date: 2015-04-16

    Application number: US14055221

    Application date: 2013-10-16

    Abstract: A processor remaps stored data and the corresponding memory addresses of the data for different processing units of a heterogeneous processor. The processor includes a data remap engine that changes the format of the data (that is, how the data is physically arranged in segments of memory) in response to a transfer of the data from system memory to a local memory hierarchy of an accelerated processing module (APM) of the processor. The APM's local memory hierarchy includes an address remap engine that remaps the memory addresses of the data at the local memory hierarchy so that the data can be accessed by routines at the APM that are unaware of the data remapping. By remapping the data, and the corresponding memory addresses, the APM can perform operations on the data more efficiently.
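    One plausible instance of the "format change" the abstract describes is converting an array-of-structures layout into a structure-of-arrays layout for the accelerated processing module, with an address remap so routines can keep issuing their original-style accesses. The functions below are an illustrative sketch, not the patent's mechanism.

```python
# Assumed example: AoS -> SoA data remap plus an address remap that
# serves old-style (record_index, field) accesses from the new layout.

def remap_aos_to_soa(aos, fields):
    """Data remap engine: physically rearrange records into per-field arrays."""
    return {f: [rec[i] for rec in aos] for i, f in enumerate(fields)}

def remapped_read(soa, record_index, field):
    """Address remap engine: translate an AoS-style access to the SoA layout."""
    return soa[field][record_index]

records = [(1, 10.0), (2, 20.0), (3, 30.0)]   # (id, weight) records
soa = remap_aos_to_soa(records, ("id", "weight"))
print(remapped_read(soa, 1, "weight"))  # 20.0
```

    Because the address remap hides the new arrangement, code written against the old layout still reads the right values while the APM enjoys the contiguous per-field arrays.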

    Lookup Table (LUT) Vector Instruction
    Published invention application

    Publication number: US20240329984A1

    Publication date: 2024-10-03

    Application number: US18128963

    Application date: 2023-03-30

    CPC classification number: G06F9/30036 G06F9/3001 G06F9/30109

    Abstract: An electronic device includes processing circuitry that executes a lookup table (LUT) vector instruction. Executing the lookup table vector instruction causes the processing circuitry to acquire a set of reference values by using each input value from an input vector as an index to acquire a reference value from a reference vector. The processing circuitry then provides the set of reference values for one or more subsequent operations. The processing circuitry can also use the set of reference values for controlling vector elements from among a set of vector elements for which a vector operation is performed.
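    The core gather step of the instruction, as the abstract states it, is simple to model: each element of the input vector is used as an index into the reference vector. This is a minimal behavioral sketch of that semantics, not the device's actual implementation.

```python
# Behavioral model of the LUT vector instruction's lookup step.

def lut_vector(input_vec, reference_vec):
    """Use each input value as an index to acquire a reference value."""
    return [reference_vec[i] for i in input_vec]

ref = [100, 200, 300, 400]          # reference vector
print(lut_vector([3, 0, 2], ref))   # [400, 100, 300]
```

    The resulting set of reference values can then feed subsequent vector operations, e.g. as a per-element control mask or operand.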

    GPU cache management based on locality type detection

    Publication number: US11487671B2

    Publication date: 2022-11-01

    Application number: US16446119

    Application date: 2019-06-19

    Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.
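    The monitoring idea can be sketched as follows: count address reuse for one sampled wavefront, derive a hit-rate estimate, and classify later requests as reuse traffic (protect those lines) or streaming traffic (bypass the cache). The threshold and classification labels below are invented for illustration.

```python
# Assumed sketch of locality-type detection from one sampled wavefront's
# memory access trace. Threshold value is illustrative.
from collections import Counter

def classify_locality(addresses, hit_rate_threshold=0.5):
    """Return the locality type to tag on subsequent memory requests."""
    counts = Counter(addresses)
    # Every access to a line after its first is a potential cache hit.
    reuses = sum(c - 1 for c in counts.values())
    hit_rate = reuses / len(addresses)
    return "reuse" if hit_rate >= hit_rate_threshold else "streaming"

print(classify_locality([0x10, 0x20, 0x10, 0x10]))  # reuse
print(classify_locality([0x10, 0x20, 0x30, 0x40]))  # streaming
```

    A "reuse" tag would pin the line against replacement for the duration of the wavefront set, while "streaming" requests skip cache allocation entirely.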

    Data compression system using base values and methods thereof

    Publication number: US11144208B2

    Publication date: 2021-10-12

    Application number: US16724609

    Application date: 2019-12-23

    Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
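    Base-plus-delta compression of the kind the abstract outlines can be illustrated in a few lines: keep one base value, store each element as a small delta, and fall back to raw storage when the compressed form would exceed a size threshold. The delta width and threshold here are assumptions for the sketch.

```python
# Illustrative base+delta compression with a size-threshold fallback.

def compress(block, threshold_bits=64):
    """Compress a block to a base value plus 8-bit deltas, if it fits."""
    base = block[0]
    deltas = [v - base for v in block]
    if all(-128 <= d < 128 for d in deltas) and 8 * len(deltas) <= threshold_bits:
        return {"base": base, "deltas": deltas}   # compressed form
    return {"raw": block}                         # exceeds threshold: store raw

def decompress(compressed):
    """Rebuild the block from the base value and stored deltas."""
    if "raw" in compressed:
        return compressed["raw"]
    return [compressed["base"] + d for d in compressed["deltas"]]

block = [1000, 1003, 999, 1001]
print(decompress(compress(block)) == block)  # True
```

    In the patent's arrangement the base and metadata live in dedicated caches near the memory controller, so only the delta set travels to memory; the sketch keeps everything in one dictionary for clarity.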

    Targeted per-line operations for remote scope promotion

    Publication number: US11042484B2

    Publication date: 2021-06-22

    Application number: US15192542

    Application date: 2016-06-24

    Abstract: A processing system includes one or more first caches and one or more first lock tables associated with the one or more first caches. The processing system also includes one or more processing units that each include a plurality of compute units for concurrently executing work-groups of work items, a plurality of second caches associated with the plurality of compute units and configured in a hierarchy with the one or more first caches, and a plurality of second lock tables associated with the plurality of second caches. The first and second lock tables indicate locking states of addresses of cache lines in the corresponding first and second caches on a per-line basis.
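    A per-line lock table of the kind described, keeping a locking state per cache-line address at each level of the hierarchy, can be modeled minimally as below. The state names and two-state protocol are assumptions for illustration only.

```python
# Minimal model of a per-cache-line lock table at one hierarchy level.

class LockTable:
    def __init__(self):
        self.state = {}  # cache-line address -> "locked" / "unlocked"

    def try_lock(self, line_addr):
        """Acquire the line's lock; fail if another holder has it."""
        if self.state.get(line_addr) == "locked":
            return False
        self.state[line_addr] = "locked"
        return True

    def unlock(self, line_addr):
        self.state[line_addr] = "unlocked"

shared_locks = LockTable()              # first-level (shared) lock table
print(shared_locks.try_lock(0x80))      # True: line acquired
print(shared_locks.try_lock(0x80))      # False: already held
shared_locks.unlock(0x80)
print(shared_locks.try_lock(0x80))      # True: reacquired after release
```

    Tracking state per line, rather than per cache, is what lets remote scope promotion target only the addresses a work-group actually contends on.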

    METHOD AND APPARATUS FOR INTER-LANE THREAD MIGRATION

    Publication number: US20170220346A1

    Publication date: 2017-08-03

    Application number: US15010093

    Application date: 2016-01-29

    CPC classification number: G06F9/3851 G06F9/3887 G06F9/4856

    Abstract: Briefly, methods and apparatus are described for migrating a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit, where both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that, when executed, allow access to a dynamic thread migration swizzle buffer that allows for the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The method and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.
