RECONFIGURABLE VIRTUAL GRAPHICS AND COMPUTE PROCESSOR PIPELINE

    Publication Number: US20180114290A1

    Publication Date: 2018-04-26

    Application Number: US15331278

    Filing Date: 2016-10-21

    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
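
    As a rough illustration of the reconfiguration described in the abstract, the sketch below models virtual pipelines whose stages can be swapped for a shader-core emulation when a fixed-function unit becomes a bottleneck. All names (Stage, VirtualPipeline, rebalance, the 0.9 utilization threshold) are assumptions for illustration, not the patented implementation.

```python
# Hypothetical model of virtual-pipeline reconfiguration (illustrative only).

class Stage:
    def __init__(self, name, is_fixed_function, utilization=0.0):
        self.name = name
        self.is_fixed_function = is_fixed_function
        self.utilization = utilization          # observed load, 0.0 .. 1.0

class VirtualPipeline:
    def __init__(self, context_id, stages):
        self.context_id = context_id            # per-pipeline operational state (context)
        self.stages = list(stages)              # configurable number of pipeline fragments

    def replace_stage(self, old_name, new_stage):
        self.stages = [new_stage if s.name == old_name else s for s in self.stages]

def emulate_on_core(fixed_stage):
    """Instantiate a shader-core emulation of a fixed-function stage."""
    return Stage(fixed_stage.name + "_emulated", is_fixed_function=False)

def rebalance(pipelines, bottleneck_threshold=0.9):
    """Route pipelines to an emulation when a fixed-function stage is a bottleneck."""
    for pipe in pipelines:
        for stage in pipe.stages:
            if stage.is_fixed_function and stage.utilization > bottleneck_threshold:
                pipe.replace_stage(stage.name, emulate_on_core(stage))

# Example: two virtual pipelines share a heavily loaded rasterizer.
raster = Stage("rasterizer", is_fixed_function=True, utilization=0.95)
pipes = [VirtualPipeline(ctx, [Stage("vs", False), raster, Stage("ps", False)])
         for ctx in (0, 1)]
rebalance(pipes)
print([s.name for s in pipes[0].stages])   # ['vs', 'rasterizer_emulated', 'ps']
```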

    Method and system for synchronization of workitems with divergent control flow

    Publication Number: US09424099B2

    Publication Date: 2016-08-23

    Application Number: US13672291

    Filing Date: 2012-11-08

    CPC classification number: G06F9/52 G06F9/522

    Abstract: Disclosed method, system, and computer program product embodiments include synchronizing a group of workitems on a processor by storing a respective program counter associated with each of the workitems, selecting at least one first workitem from the group for execution, and executing the selected at least one first workitem on the processor. The selection is based upon the respective stored program counter associated with the at least one first workitem.
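
    The abstract above describes selecting workitems for execution based on their saved program counters. A minimal sketch of one such selection policy is shown below, assuming (as many reconvergence schemes do) that the workitems with the lowest saved program counter run first; the function names and data layout are illustrative only.

```python
# Illustrative sketch (not the patented method): schedule divergent workitems by
# their saved program counters so that diverged paths eventually reconverge at a
# common counter value.

def step(workitems):
    """workitems: dict of id -> saved program counter (None = finished)."""
    active = {wid: pc for wid, pc in workitems.items() if pc is not None}
    if not active:
        return None, []
    target_pc = min(active.values())                 # selection based on stored PCs
    selected = [wid for wid, pc in active.items() if pc == target_pc]
    return target_pc, selected                       # execute these together

# Example: three workitems diverged to different points in the kernel.
pcs = {0: 0x40, 1: 0x40, 2: 0x58}
pc, group = step(pcs)
print(hex(pc), group)    # 0x40 [0, 1]  -> items 0 and 1 execute in lockstep first
```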

    Spatial partitioning in a multi-tenancy graphics processing unit

    Publication Number: US12205218B2

    Publication Date: 2025-01-21

    Application Number: US17706811

    Filing Date: 2022-03-29

    Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines, a first front end (FE) circuit, and one or more second FE circuits. In a first mode, the first FE circuit schedules geometry workloads for all of the shader engines. In a second mode, the first FE circuit schedules geometry workloads for a first subset of the shader engines and the one or more second FE circuits schedule geometry workloads for a second subset of the shader engines. In some cases, a partition switch selectively connects either the first FE circuit or the one or more second FE circuits to the second subset of the shader engines depending on whether the apparatus is in the first mode or the second mode.
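
    A small sketch of the mode-dependent scheduling described above, assuming a simple half/half spatial partition in the second mode; the FE and shader-engine identifiers are placeholders, not the actual hardware interface.

```python
# Hypothetical sketch of the partition switch (names are illustrative).

def connect_front_ends(mode, shader_engines, split=None):
    """Return a mapping of FE id -> list of shader engines it schedules."""
    if mode == "single":                      # first mode: one FE drives everything
        return {"FE0": list(shader_engines)}
    split = split if split is not None else len(shader_engines) // 2
    return {                                  # second mode: spatial partitions
        "FE0": list(shader_engines[:split]),
        "FE1": list(shader_engines[split:]),
    }

engines = ["SE0", "SE1", "SE2", "SE3"]
print(connect_front_ends("single", engines))       # {'FE0': ['SE0', 'SE1', 'SE2', 'SE3']}
print(connect_front_ends("partitioned", engines))  # {'FE0': ['SE0', 'SE1'], 'FE1': ['SE2', 'SE3']}
```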

    FUSED MULTIMODAL FRAMEWORK FOR NON-PLAYER CHARACTER GENERATION AND CONFIGURATION

    Publication Number: US20240424398A1

    Publication Date: 2024-12-26

    Application Number: US18748920

    Filing Date: 2024-06-20

    Abstract: Systems and techniques for generating and animating non-player characters (NPCs) within virtual digital environments are provided. Multimodal input data is received that comprises a plurality of input modalities for interaction with an NPC having a set of body features and a set of facial features. The multimodal input data is processed through one or more neural networks to generate animation sequences for both the body features and facial features of the NPC. Generating such animation sequences includes disentangling the multimodal input data to generate substantially disentangled latent representations, combining these representations with the multimodal input data, and using a large-language model (LLM) to generate speech data for the NPC. Further processing using reverse diffusion generates face vertex displacement data and joint trajectory data based on the combined representation and generated speech data. The face vertex displacement data, joint trajectory data, and speech data are used to produce an animated representation of the NPC, which is then provided to environment-specific adapters to animate the NPC within a virtual digital environment.
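
    The data flow in the abstract above can be summarized with the skeleton below, where every learned model (disentanglement, the LLM, the reverse-diffusion stage, the environment adapter) is replaced by a placeholder function; names, shapes, and return values are assumptions for illustration only.

```python
# Skeleton of the multimodal NPC pipeline; every function is a stand-in for a model.

def disentangle(multimodal_inputs):
    """Split multimodal input into approximately independent latent factors."""
    return {"style": [0.1], "emotion": [0.4], "intent": [0.2]}

def generate_speech(prompt, latents):
    """Stand-in for the LLM that produces the NPC's speech data."""
    return {"text": "Hello, traveler.", "audio_features": [0.0] * 8}

def reverse_diffusion(conditioning):
    """Stand-in for the diffusion model that produces motion targets."""
    return {"face_vertex_displacements": [[0.0] * 3], "joint_trajectories": [[0.0] * 3]}

def animate_npc(multimodal_inputs, environment_adapter):
    latents = disentangle(multimodal_inputs)
    combined = {**multimodal_inputs, **latents}          # fuse latents with the raw inputs
    speech = generate_speech(multimodal_inputs.get("text", ""), latents)
    motion = reverse_diffusion({"combined": combined, "speech": speech})
    return environment_adapter({"speech": speech, **motion})

# Example adapter for a hypothetical engine: simply repackages the outputs.
print(animate_npc({"text": "greet the player", "audio": None},
                  lambda payload: {"engine_clip": payload}))
```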

    Sparse matrix-vector multiplication

    Publication Number: US11995149B2

    Publication Date: 2024-05-28

    Application Number: US17125457

    Filing Date: 2020-12-17

    CPC classification number: G06F17/16 G06F9/30098 G06F15/80

    Abstract: A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of multipliers that concurrently perform multiplication of the nonzero values in slots of the first set of GPRs with the values of the vector in corresponding slots of the second set. Reduced sum circuitry accumulates results from the set of multipliers for rows of the sparse matrix.
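
    A CPU-side sketch of the scheme described above, using a conventional CSR layout: the nonzero values are packed contiguously (standing in for the first set of GPRs), the matching vector elements are gathered into an expanded array (the second set), the two are multiplied elementwise, and the products are reduced per row. The register and datapath behavior is only mimicked with Python lists.

```python
def spmv(values, col_idx, row_ptr, x):
    """CSR sparse matrix times dense vector."""
    expanded = [x[c] for c in col_idx]                     # expanded matrix: gathered vector values
    products = [v * e for v, e in zip(values, expanded)]   # concurrent multiplies on real hardware
    y = []
    for r in range(len(row_ptr) - 1):                      # reduced-sum circuitry per row
        y.append(sum(products[row_ptr[r]:row_ptr[r + 1]]))
    return y

# Example: [[5, 0, 0], [0, 0, 3], [2, 0, 1]] @ [1, 2, 3]
values  = [5, 3, 2, 1]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
print(spmv(values, col_idx, row_ptr, [1, 2, 3]))           # [5, 9, 5]
```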

    Dynamically adaptable arrays for vector and matrix operations

    Publication Number: US11409840B2

    Publication Date: 2022-08-09

    Application Number: US17032314

    Filing Date: 2020-09-25

    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
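
    A hypothetical sketch of mapping memory interfaces to mutually exclusive row and column subsets of a processor-element grid. The dimension-dependent policy shown (collapse to a single engine when one operand is a vector) is an assumption for illustration, not the patented mapping.

```python
def map_dma_engines(grid_rows, grid_cols, num_engines, m, n):
    """Give each DMA engine a disjoint slice of PE rows and a disjoint slice of PE columns.

    m x n is the operand tile shape; a vector operand (single row or column)
    collapses the mapping to a single engine (a simplifying assumption).
    """
    engines = 1 if min(m, n) == 1 else num_engines
    mapping = {}
    for e in range(engines):
        rows = range(e * grid_rows // engines, (e + 1) * grid_rows // engines)
        cols = range(e * grid_cols // engines, (e + 1) * grid_cols // engines)
        mapping[e] = {"rows": list(rows), "cols": list(cols)}
    return mapping

print(map_dma_engines(8, 8, 4, m=64, n=64))   # four disjoint row/column blocks
print(map_dma_engines(8, 8, 4, m=64, n=1))    # matrix-vector case: single engine
```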

    Wave creation control with dynamic resource allocation

    Publication Number: US10558499B2

    Publication Date: 2020-02-11

    Application Number: US15794593

    Filing Date: 2017-10-26

    Abstract: Footprints, or resource allocations, of waves within resources that are shared by processor cores in a multithreaded processor are measured concurrently with the waves executing on the processor cores. The footprints are averaged over a time interval. A number of waves are spawned and dispatched for execution in the multithreaded processor based on the average footprint. In some cases, the waves are spawned at a rate that is determined based on the average value of the footprints of waves within the resources. The rate of spawning waves is modified in response to a change in the average value of the footprints of the waves within the resources.
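
    One way to picture the feedback loop described above is the sketch below: per-wave footprints measured during execution are averaged over a window, and the spawn rate is scaled from that average. The controller class, the window length, and the headroom factor are assumptions, not the patented policy.

```python
from collections import deque

class WaveSpawnController:
    """Adjust the wave spawn rate from the average measured per-wave footprint."""

    def __init__(self, resource_capacity, window=16):
        self.capacity = resource_capacity            # e.g. bytes of a shared resource
        self.samples = deque(maxlen=window)          # recent per-wave footprints
        self.rate = 1.0                              # waves spawned per scheduling tick

    def record_footprint(self, footprint):
        self.samples.append(footprint)               # measured while the wave executes

    def update_rate(self):
        if not self.samples:
            return self.rate
        avg = sum(self.samples) / len(self.samples)  # average footprint over the interval
        self.rate = max(1.0, self.capacity / avg / 8)  # assumed headroom factor of 8
        return self.rate

ctrl = WaveSpawnController(resource_capacity=64 * 1024)
for fp in (2048, 4096, 3072):
    ctrl.record_footprint(fp)
print(ctrl.update_rate())                            # rate shrinks as footprints grow
```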

    Graphics processing hardware for using compute shaders as front end for vertex shaders

    Publication Number: US10134102B2

    Publication Date: 2018-11-20

    Application Number: US14297290

    Filing Date: 2014-06-05

    Abstract: A GPU is configured to read and process data produced by a compute shader via one or more ring buffers and to pass the resulting processed data to a vertex shader as input. The GPU is further configured to allow the compute shader and the vertex shader to write through a cache. Each ring buffer is configured to synchronize the compute shader and the vertex shader so that processed data written to a particular ring buffer by the compute shader is not overwritten before it is accessed by the vertex shader.
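
    The synchronization described above can be pictured with the minimal single-threaded ring buffer below: the compute shader (producer) stalls rather than overwrite a slot the vertex shader (consumer) has not yet read. The class and method names are illustrative; real hardware would use GPU-visible read/write pointers rather than Python fields.

```python
class RingBuffer:
    def __init__(self, slots):
        self.data = [None] * slots
        self.write = 0          # next slot the compute shader writes
        self.read = 0           # next slot the vertex shader reads

    def produce(self, item):
        if self.write - self.read == len(self.data):
            return False        # full: writing would overwrite unread data, so stall
        self.data[self.write % len(self.data)] = item
        self.write += 1
        return True

    def consume(self):
        if self.read == self.write:
            return None         # empty: vertex shader waits for the compute shader
        item = self.data[self.read % len(self.data)]
        self.read += 1
        return item

ring = RingBuffer(slots=2)
print(ring.produce("vertices_0"), ring.produce("vertices_1"), ring.produce("vertices_2"))
# True True False  -> third write stalls until the vertex shader consumes a slot
print(ring.consume(), ring.produce("vertices_2"))   # vertices_0 True
```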

    MEMORY PROTECTION IN HIGHLY PARALLEL COMPUTING HARDWARE

    Publication Number: US20180314579A1

    Publication Date: 2018-11-01

    Application Number: US15582443

    Filing Date: 2017-04-28

    CPC classification number: G06F11/1048 G06F9/3865

    Abstract: Techniques for handling memory errors are disclosed. Various memory units of an accelerated processing device (“APD”) include error units for detecting errors in data stored in the memory (e.g., using parity protection or error correcting code). Upon detecting an error considered to be an “initial uncorrectable error,” the error unit triggers transmission of an initial uncorrectable error interrupt (“IUE interrupt”) to a processor. This IUE interrupt includes information identifying the specific memory unit in which the error occurred (and possible other information about the error). A halt interrupt is generated and transmitted to the processor in response to the data having the error being consumed (i.e., used by an operation such as an instruction or command), which causes the APD to halt operations. If the data having the error is not consumed, then the halt interrupt is never generated (that the error occurred may remain logged, however).
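
    A sketch of the two-stage reporting described above: an initial uncorrectable error (IUE) interrupt is raised when the error is detected, and a halt interrupt is raised only if the poisoned data is later consumed. Interrupt delivery is reduced to a callback here, and all names are assumptions rather than the APD's actual interfaces.

```python
class MemoryUnit:
    def __init__(self, name, send_interrupt):
        self.name = name
        self.poisoned = set()              # addresses holding uncorrectable errors
        self.send_interrupt = send_interrupt

    def detect_error(self, address):
        """Error unit found an initial uncorrectable error: report it, keep running."""
        self.poisoned.add(address)
        self.send_interrupt({"type": "IUE", "unit": self.name, "address": address})

    def consume(self, address):
        """An instruction or command actually uses the data."""
        if address in self.poisoned:       # bad data consumed: request a halt
            self.send_interrupt({"type": "HALT", "unit": self.name, "address": address})
            return None
        return f"data@{address:#x}"

lds = MemoryUnit("LDS0", send_interrupt=print)
lds.detect_error(0x1000)                   # IUE interrupt; error stays logged
lds.consume(0x2000)                        # clean address: no halt
lds.consume(0x1000)                        # consuming the error triggers the halt interrupt
```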
