METHOD AND APPARATUS FOR INTER-LANE THREAD MIGRATION

    Publication Number: US20170220346A1

    Publication Date: 2017-08-03

    Application Number: US15010093

    Application Date: 2016-01-29

    CPC classification number: G06F9/3851 G06F9/3887 G06F9/4856

    Abstract: Briefly, methods and apparatus are disclosed to migrate a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit, where both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that, when executed, allow access to a dynamic thread migration swizzle buffer that enables the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The methods and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.
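
    The staging mechanism the abstract describes can be modeled in a few lines. The following is a minimal, hypothetical Python sketch (all class and method names are assumptions, not from the patent): a thread's register context is copied out of one lane's register-file slice into a swizzle buffer, then written back into a different lane's slice, decoupling source and destination.

```python
# Hypothetical model of inter-lane thread migration via a swizzle buffer.

class RegisterFile:
    """Per-execution-unit register file: one slice of registers per SIMD lane."""
    def __init__(self, num_lanes, regs_per_lane):
        self.regs = [[0] * regs_per_lane for _ in range(num_lanes)]

class SwizzleBuffer:
    """Staging buffer that decouples the source lane from the destination lane."""
    def __init__(self):
        self._slots = {}

    def store(self, thread_id, reg_file, lane):
        # Stage the full register context of `lane` under `thread_id`.
        self._slots[thread_id] = list(reg_file.regs[lane])

    def load(self, thread_id, reg_file, lane):
        # Write the staged context into a (possibly different) lane.
        reg_file.regs[lane] = self._slots.pop(thread_id)

# Usage: migrate thread 7 from lane 0 of one execution unit to lane 3
# of another execution unit on the same compute unit.
src = RegisterFile(num_lanes=4, regs_per_lane=8)
dst = RegisterFile(num_lanes=4, regs_per_lane=8)
src.regs[0] = [10, 11, 12, 13, 14, 15, 16, 17]

buf = SwizzleBuffer()
buf.store(7, src, lane=0)
buf.load(7, dst, lane=3)
```

    In hardware this staging would be performed by the compiled swizzle buffer instructions; the model only shows the data movement they make possible.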

    DYNAMIC WAVEFRONT CREATION FOR PROCESSING UNITS USING A HYBRID COMPACTOR
    Invention Application (Granted)

    Publication Number: US20160239302A1

    Publication Date: 2016-08-18

    Application Number: US14682971

    Application Date: 2015-04-09

    Abstract: A method, a non-transitory computer-readable medium, and a processor are presented for repacking dynamic wavefronts during program code execution on a processing unit, where each dynamic wavefront includes multiple threads. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is the beginning of a program code segment to be executed by both a taken branch and a not-taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.

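
    The repacking step can be illustrated with a short, hedged Python sketch (the function and the wavefront width are simplifications, not the patent's design): once all wavefronts on a control path have reached the compaction or reconvergence point, their threads are regrouped so that each new wavefront contains only threads following the same path.

```python
# Illustrative model of dynamic wavefront repacking at a compaction point.

WAVEFRONT_SIZE = 4  # lanes per wavefront; real hardware typically uses 32 or 64

def repack(threads):
    """threads: list of (thread_id, control_path) pairs that have all
    reached the compaction or reconvergence point."""
    by_path = {}
    for tid, path in threads:
        by_path.setdefault(path, []).append(tid)
    # Slice each control path's thread pool into full-width wavefronts.
    wavefronts = []
    for path, tids in sorted(by_path.items()):
        for i in range(0, len(tids), WAVEFRONT_SIZE):
            wavefronts.append((path, tids[i:i + WAVEFRONT_SIZE]))
    return wavefronts

# Six threads diverge at a branch; repacking fills one full wavefront
# with the four "taken" threads instead of leaving lanes idle in three
# partially divergent wavefronts.
threads = [(0, "taken"), (1, "not_taken"), (2, "taken"),
           (3, "taken"), (4, "not_taken"), (5, "taken")]
packed = repack(threads)
```

    The hybrid compactor in the title would additionally decide *when* such repacking pays off; this sketch shows only the regrouping itself.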

    Dynamic kernel memory space allocation

    Publication Number: US11720993B2

    Publication Date: 2023-08-08

    Application Number: US16138708

    Application Date: 2018-09-21

    CPC classification number: G06T1/60 G06F9/30098 G06F12/02 G06F12/023 G06T1/20

    Abstract: A processing unit includes one or more processor cores and a set of registers to store configuration information for the processing unit. The processing unit also includes a coprocessor configured to receive a request to modify a memory allocation for a kernel concurrently with the kernel executing on at least one of the processor cores. The coprocessor is configured to modify the memory allocation by modifying the configuration information stored in the set of registers. In some cases, initial configuration information is provided to the set of registers by a different processing unit. The initial configuration information is stored in the set of registers prior to the coprocessor modifying the configuration information.
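
    A minimal Python model, under assumed names (the register fields and the request handler are illustrative, not the patent's interface), of the key idea: the allocation is resized while the kernel runs, by rewriting only the configuration registers the kernel reads.

```python
# Illustrative model of dynamic kernel memory allocation via config registers.

class ConfigRegisters:
    """Registers describing the kernel's memory window."""
    def __init__(self, base, limit):
        self.base = base      # start of the kernel's memory allocation
        self.limit = limit    # one past the last usable address

class Coprocessor:
    def __init__(self, regs):
        self.regs = regs

    def handle_resize_request(self, extra_bytes):
        # Grow the allocation concurrently with kernel execution by
        # updating only the register state, not the kernel itself.
        self.regs.limit += extra_bytes
        return self.regs.limit - self.regs.base   # new allocation size

# A different processing unit writes the initial configuration...
regs = ConfigRegisters(base=0x1000, limit=0x2000)
# ...then the coprocessor services a resize request from the running kernel.
cop = Coprocessor(regs)
new_size = cop.handle_resize_request(extra_bytes=0x1000)
```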

    APPROACH FOR PROVIDING INDIRECT ADDRESSING IN MEMORY MODULES

    Publication Number: US20230205705A1

    Publication Date: 2023-06-29

    Application Number: US17561406

    Application Date: 2021-12-23

    CPC classification number: G06F12/10

    Abstract: An approach provides indirect addressing support for processing-in-memory (PIM). Indirect PIM commands include address translation information that allows memory modules to perform indirect addressing. Processing logic in a memory module processes an indirect PIM command and retrieves, from a first memory location, a virtual address of a second memory location. The processing logic calculates a corresponding physical address for the virtual address using the address translation information included with the indirect PIM command and retrieves, from the second memory location, a virtual address of a third memory location. This process is repeated any number of times until one or more indirection stop criteria are satisfied. The indirection stop criteria stop the process when work has been completed normally or to prevent errors. Implementations include the processing logic in the memory module working in cooperation with a memory controller to perform indirect addressing.
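
    The pointer-chasing loop can be sketched in Python. This is a hedged simplification: the translation map, memory layout, and stop criteria below are stand-ins, not the patent's command format.

```python
# Illustrative model of indirect addressing inside a PIM module.

def pim_indirect_load(memory, page_table, start_vaddr, max_hops):
    """Follow a chain of virtual addresses stored in memory.

    memory:     maps physical address -> stored value (next virtual address)
    page_table: address translation info shipped with the indirect PIM command
    max_hops:   indirection stop criterion that prevents runaway chains
    """
    vaddr = start_vaddr
    for _ in range(max_hops):
        paddr = page_table[vaddr]     # translate inside the memory module
        value = memory[paddr]
        if value is None:             # normal completion: end of the chain
            return vaddr
        vaddr = value                 # one more level of indirection
    raise RuntimeError("indirection stop: hop limit reached")

# Three-level chain: virtual 0x10 -> 0x20 -> 0x30 (terminal).
page_table = {0x10: 0, 0x20: 1, 0x30: 2}
memory = {0: 0x20, 1: 0x30, 2: None}
final = pim_indirect_load(memory, page_table, start_vaddr=0x10, max_hops=8)
```

    Doing this translation and chasing inside the module avoids a round trip to the host for every level of indirection.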

    Continuation analysis tasks for GPU task scheduling

    Publication Number: US11544106B2

    Publication Date: 2023-01-03

    Application Number: US16846654

    Application Date: 2020-04-13

    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
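
    The enqueue/analyze/launch flow can be modeled with ordinary queues. The names below (queues q0/q1, task_A/task_B, the analysis rule) are purely illustrative assumptions:

```python
# Simplified model of continuation analysis tasks (CATs).
from collections import deque

queues = {"q0": deque(), "q1": deque()}
launched = []

def analysis_phase(completed):
    # Decide which dependent tasks are now ready (a fixed rule here;
    # in practice this encodes the application's dependency logic).
    return ["task_B"] if completed == "task_A" else []

def first_task():
    # On completion, the first task enqueues its continuation packet
    # on a queue of its choosing (here, q0).
    queues["q0"].append({"completed": "task_A", "analyze": analysis_phase})

def agent(queue_name):
    # The agent responsible for the queue dequeues the continuation
    # packet and runs its analysis phase before launching dependents.
    packet = queues[queue_name].popleft()
    for dependent in packet["analyze"](packet["completed"]):
        queues["q1"].append(dependent)   # ready task goes on a queue

first_task()
agent("q0")                              # analysis decides task_B is ready
launched.append(queues["q1"].popleft())  # q1's agent executes task_B
```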

    FPGA-BASED PROGRAMMABLE DATA ANALYSIS AND COMPRESSION FRONT END FOR GPU

    Publication Number: US20220188493A1

    Publication Date: 2022-06-16

    Application Number: US17118442

    Application Date: 2020-12-10

    Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
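
    The simplest case the abstract mentions, a zero data pattern, can be illustrated with a short Python sketch (the function and threshold are assumptions for illustration, not the FPGA's actual analysis logic):

```python
# Illustrative sketch of detecting a zero-data pattern in host-to-GPU packets.

def predict_pattern(packets, threshold=0.9):
    """Return 'zero_data' if most payload bytes are zero, else None."""
    total = sum(len(p) for p in packets)
    zeros = sum(b == 0 for p in packets for b in p)
    return "zero_data" if total and zeros / total >= threshold else None

# Two all-zero packets plus one mostly-zero packet: the predicted
# pattern would be reported to the host, which could then have the
# FPGA reprogrammed with matching decompression circuitry.
packets = [bytes(64), bytes(64), b"\x00" * 60 + b"\x01\x02\x03\x04"]
prediction = predict_pattern(packets)
```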

    MEMORY REQUEST PRIORITY ASSIGNMENT TECHNIQUES FOR PARALLEL PROCESSORS

    Publication Number: US20210173796A1

    Publication Date: 2021-06-10

    Application Number: US16706421

    Application Date: 2019-12-06

    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
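
    The indexing and promotion steps can be sketched as follows. This is a hedged model: the vector contents, the promotion rule, and the lane-info encoding are assumptions, not the patented format.

```python
# Illustrative model of priority-vector-based memory request priorities.

def assign_priority(priority_vector, lane_info):
    # Index into the priority vector with lane-specific information.
    return priority_vector[lane_info % len(priority_vector)]

def promote(priority_vector, promotion_vector):
    # Build the second priority vector by applying the promotion vector.
    return [p + boost for p, boost in zip(priority_vector, promotion_vector)]

first = [0, 1, 2, 3]             # first priority vector, one slot per index

# A work-item's memory request gets its priority per-lane:
pri_a = assign_priority(first, lane_info=6)    # first[6 % 4] == first[2]

# A given event (e.g. observed memory divergence) triggers promotion;
# subsequent wavefronts index into the second vector instead.
second = promote(first, promotion_vector=[2, 2, 0, 0])
pri_b = assign_priority(second, lane_info=1)
```

    Promoting the slowest lanes' priorities narrows the gap between the first and last work-item of a wavefront to complete, which is the memory divergence problem the abstract targets.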
