摘要:
A system and method for managing position independent code using a software framework is presented. A software framework provides the ability to cache multiple plug-in's which are loaded in a processor's local storage. A processor receives a command or data stream from another processor, which includes information corresponding to a particular plug-in. The processor uses the plug-in identifier to load the plug-in from shared memory into local memory before it is required in order to minimize latency. When the data stream requests the processor to use the plug-in, the processor retrieves a location offset corresponding to the plug-in and applies the plug-in to the data stream. A plug-in manager manages an entry point table that identifies memory locations corresponding to each plug-in and, therefore, plug-ins may be placed anywhere in a processor's local memory.
摘要:
A system and method for loading software on a plurality of processors is presented. A processing unit (PU) retrieves a file from system memory and loads it into its internal memory. The PU extracts a processor type from the file's header which identifies whether the file should execute on the PU or a synergistic processing unit (SPU). If an SPU should execute the file, the PU DMA's the file to the SPU for execution. In one embodiment, the file is a combined file which includes both PU and SPU code. In this embodiment, the PU identifies one or more section headers included in the file which indicates embedded SPU code within the combined file. In this embodiment, the PU extracts the SPU code from the combined file and DMA's the extracted code to an SPU for execution.
摘要:
The present invention renders a triangular mesh for employment in graphical displays. The triangular mesh comprises triangle-shaped graphics primitives. The triangle-shaped graphics primitives represent a subdivided triangular shape. Each triangle-shaped graphics primitive shares defined vertices with adjoining triangle-shaped graphics primitives. These shared vertices are transmitted and employed for the rendering of the triangle-shaped graphics primitives.
摘要:
A system and method for partitioning processor resources based on memory usage is provided. A compiler determines the extent to which a process is memory-bound and accordingly divides the process into a number of threads. When a first thread encounters a prolonged instruction, the compiler inserts a conditional branch to a second thread. When the second thread encounters a prolonged instruction, a conditional branch to a third thread is executed. This continues until the last thread conditionally branches back to the first thread. An indirect segmented register file is used so that the “return to” and “branch to” logical registers within each thread are the same (e.g., R1 and R2) for each thread. These logical registers are mapped to hardware registers that store actual addresses. The indirect mapping is altered to bypass completed threads. When the last thread completes it may signal an external process.
摘要:
A System and method for hiding memory latency in a multi-thread environment is presented. Branch Indirect and Set Link (BISL) and/or Branch Indirect and Set Link if External Data (BISLED) instructions are placed in thread code during compilation at instances that correspond to a prolonged instruction. A prolonged instruction is an instruction that instigates latency in a computer system, such as a DMA instruction. When a first thread encounters a BISL or a BISLED instruction, the first thread passes control to a second thread while the first thread's prolonged instruction executes. In turn, the computer system masks the latency of the first thread's prolonged instruction. The system can be optimized based on the memory latency by creating more threads and further dividing a register pool amongst the threads to further hide memory latency in operations that are highly memory bound.
摘要:
A system and method for compiling source code for multi-processor environments is presented. Source code is compiled which creates an object file whereby the object file includes multiple object code subtasks. Source code subtasks are compiled into object code subtasks using one of three approaches which are 1) a lowbrow approach, 2) a brute force approach, and 3) a program directive approach. Each object code subtask is formatted to run on a particular processor type with a particular architecture, such as a microprocessor-based architecture or a digital signal processor-based architecture. During runtime, each object code is loaded onto its corresponding processor type for execution.
摘要:
An apparatus and method for efficient communication of producer/consumer buffer status are provided. With the apparatus and method, devices in a data processing system notify each other of updates to head and tail pointers of a shared buffer region when the devices perform operations on the shared buffer region using signal notification channels of the devices. Thus, when a producer device that produces data to the shared buffer region writes data to the shared buffer region, an update to the head pointer is written to a signal notification channel of a consumer device. When a consumer device reads data from the shared buffer region, the consumer device writes a tail pointer update to a signal notification channel of the producer device. In addition, channels may operate in a blocking mode so that the corresponding device is kept in a low power state until an update is received over the channel.
摘要:
A system and method for terrain rendering using a limited memory footprint is presented. A system and method to perform vertical ray terrain rendering by using a terrain data subset for image point value calculations. Terrain data is segmented into terrain data subsets whereby the terrain data subsets are processed in parallel. A bottom view ray intersects the terrain data to provide a memory footprint starting point. In addition, environmental visibility settings provide a memory footprint ending point. The memory footprint starting point, the memory footprint ending point, and vertical ray adjacent data points define a terrain data subset that corresponds to a particular vertical ray. The terrain data subset includes height and color information which are used for vertical ray coherence terrain rendering.
摘要:
A task queue manager manages the task queues corresponding to virtual devices. When a virtual device function is requested, the task queue manager determines whether an SPU is currently assigned to the virtual device task. If an SPU is already assigned, the request is queued in a task queue being read by the SPU. If an SPU has not been assigned, the task queue manager assigns one of the SPUs to the task queue. The queue manager assigns the task based upon which SPU is least busy as well as whether one of the SPUs recently performed the virtual device function. If an SPU recently performed the virtual device function, it is more likely that the code used to perform the function is still in the SPU's local memory and will not have to be retrieved from shared common memory using DMA operations.
摘要:
A system and method for balancing computational load across a plurality of processors. Source code subtasks are compiled into byte code subtasks whereby the byte code subtasks are translated into processor-specific object code subtasks at runtime. The processor-type selection is based upon one of three approaches which are 1) a brute force approach, 2) higher-level approach, or 3) processor availability approach. Each object code subtask is loaded in a corresponding processor type for execution. In one embodiment, a compiler stores a pointer in a byte code file that references the location of a byte code subtask. In this embodiment, the byte code subtask is stored in a shared library and, at runtime, a runtime loader uses the pointer to identify the location of the byte code subtask in order to translate the byte code subtask.