摘要:
A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
摘要:
A computer graphics method and apparatus allows designer control over the rendering of objects and scenes, in a rendering system using ray tracing for example. A modeling system is adapted to accept rules for controlling how certain objects affect the appearance of certain other objects. In a ray tracing implementation, rules are specified by ray type and can be specified as either “including” all but certain objects or “excluding” specific objects for any given object. A rendering system extracts these rules from a bytestream or other input including other graphics data and instructions, and populates lists for internal use by other components of the rendering system. A ray tracer in the rendering system is adapted to consult the list when performing ray tracing, so as to enforce the rendering control specified by the content creator when the objects and scene are rendered.
摘要:
A method and an apparatus that partition a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The total number of threads is based on a multi-dimensional value for a global thread number specified in the API. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to either a dimension for a data parallel task associated with the executable codes or a dimension for a multi-dimensional value for a local thread group number. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
摘要:
The present invention relates to computer graphics applications involving scene rendering using objects modeled at multiple levels of detail. In accordance with an aspect of the invention, a ray tracer implementation allows users to specify multiple versions of a particular object, categorized by LOD ID's. A scene server selects the version appropriate for the particular scene, based on the size of the object on the screen for example, and provides a smooth transition between multiple versions of an object model. In one example, the scene server will select two LOD representations associated with a given object and assign relative weights to each representation. The LOD weights are specified to indicate how to blend these representations together. A ray tracer computes the objects hit by camera rays associated with pixels in the camera window, as well as secondary rays in recursive implementations, and rays striking LOD objects are detected and shaded in accordance with the weights assigned to the different representations. Embodiments are disclosed for level of detail control using both forward ray tracing and backward ray tracing, including handling of camera rays, reflected rays, refracted rays and shadow rays.
摘要:
A method, system, computer program product, and protocol for digital rendering over a network is described. Rendering resources associated with a project are stored in a project resource pool at a rendering service site, and for each rendering request received from a client site the project resource pool is compared to current rendering resources at the client site. A given rendering resource is uploaded from the client site to the rendering service only if the project resource pool does not contain the current version, thereby conserving bandwidth. In accordance with a preferred embodiment, redundant generation of raw rendering resource files is avoided by only generating those raw rendering resource files not mated with generated rendering resource files. Methods for reducing redundant generation of raw resources are also described, as well as methods for statistically reducing the number of raw resource files required to be uploaded to the rendering service for multi-frame sessions. The preferred embodiments are particularly advantageous for remote rendering services at a different site from the client and connected across the Internet or a Wide Area Network, but may also be applied where the rendering service is co-located with the client site and connected thereto by a Local Area Network (LAN).
摘要:
The present invention includes a method and apparatus for graphics processing in a handheld device including a transform engine capable of receiving vertex information. The transform engine generates a plurality of vertices from the vertex information, wherein each of the vertices includes a corresponding bin identifier. The method and apparatus further includes view frame factors defining a clipping region such that when any of the plurality of vertices is within the clipping region, a clip identifier is generated for that vertex using the corresponding bin identifier. A vertex shader coupled to a clipping module, wherein the clipping module generates supplemental vertices and the vertex shader receives the supplemental vertices therefrom. The vertex shader combines the supplemental vertices with the bin identifiers and are provided to a vertex buffer.
摘要:
A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
摘要:
A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
摘要:
A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
摘要:
A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.