Abstract:
A method comprises receiving scene model data including a scene geometry model and a plurality of pixel data describing objects arranged in a scene. The method generates a primary ray based on a selected first pixel data. In the event the primary ray intersects an object in the scene, the method determines primary hit color data and generates a plurality of secondary rays. The method groups the secondary packets and arranges the packets in a queue based on the octant of each direction vector in the secondary ray packet. The method generates secondary color data based on the secondary ray packets in the queue and generates a pixel color based on the primary hit color data, and the secondary color data. The method generates an image based on the pixel color for the pixel data.
Abstract:
An approach that uses a handler to detect asynchronous lock line reservation lost events, and switching tasks based upon whether a condition is true or a mutex lock is acquired is presented. A synergistic processing unit (SPU) invokes a first thread and, during execution, the first thread requests external data that is shared with other threads or processors in the system. This shared data may be protected with a mutex lock or other shared memory synchronization constructs. When requested data is not available, the SPU switches to a second thread and monitors lock line reservation lost events in order to check when the data is available. When the data is available, the SPU switches back to the first thread and processes the first thread's request.
Abstract:
A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location.
Abstract:
Loading software on a plurality of processors is presented. A processing unit (PU) retrieves a file from system memory and loads it into its internal memory. The PU extracts a processor type from the file's header which identifies whether the file should execute on the PU or a synergistic processing unit (SPU). If an SPU should execute the file, the PU DMA's the file to the SPU for execution. In one embodiment, the file is a combined file which includes both PU and SPU code. In this embodiment, the PU identifies one or more section headers included in the file which indicates embedded SPU code within the combined file. In this embodiment, the PU extracts the SPU code from the combined file and DMA's the extracted code to an SPU for execution.
Abstract:
An approach to hiding memory latency in a multi-thread environment is presented. Branch Indirect and Set Link (BISL) and/or Branch Indirect and Set Link if External Data (BISLED) instructions are placed in thread code during compilation at instances that correspond to a prolonged instruction. A prolonged instruction is an instruction that instigates latency in a computer system, such as a DMA instruction. When a first thread encounters a BISL or a BISLED instruction, the first thread passes control to a second thread while the first thread's prolonged instruction executes. In turn, the computer system masks the latency of the first thread's prolonged instruction. The system can be optimized based on the memory latency by creating more threads and further dividing a register pool amongst the threads to further hide memory latency in operations that are highly memory bound.
Abstract:
To alleviate at least some of the costs associated with context switching, addition fields, either with associated Application Program Interfaces (APIs) or coupled to application modules, can be employed to indicate points of light weight context during the operation of an application. Therefore, an operating system can pre-empt applications at points where the context is relatively light, reducing the costs on both storage and bus usage.
Abstract:
A computer system's multiple processors are managed as devices. The operating system accesses the multiple processors using processor device modules loaded into the operating system to facilitate a communication between an application requesting access to a processor and the processor. A device-like access is determined for accessing each one of the processors similar to device-like access for other devices in the system such as disk drives, printers, etc. An application seeking access to a processor issues device-oriented instructions for processing data, and in addition, the application provides the processor with the data to be processed. The processor processes the data according to the instructions provided by the application.
Abstract:
An approach is provided to allow virtual devices that use a plurality of processors in a multiprocessor systems, such as the BE environment. Using this method, a synergistic processing unit (SPU) can either be dedicated to performing a particular function (i.e., audio, video, etc.) or a single SPU can be programmed to perform several functions on behalf of the other processors in the system. The application, preferably running in one of the primary (PU) processors, issues IOCTL commands through device drivers that correspond to SPUs. The kernel managing the primary processors responds by sending an appropriate message to the SPU that is performing the dedicated function. Using this method, an SPU can be virtualized for swapping multiple tasks or dedicated to performing a particular task.
Abstract:
Grouping processors is presented. A processing unit (PU) initiates an application and identifies the application's requirements. The PU assigns one or more synergistic processing units (SPUs) and a memory space to the application in the form of a group. The application specifies whether the task requires shared memory or private memory. Shared memory is a memory space that is accessible by the SPUs and the PU. Private memory, however, is a memory space that is only accessible by the SPUs that are included in the group. When the application executes, the resources within the group are allocated to the application's execution thread. Each group has its own group properties, such as address space, policies (i.e. real-time, FIFO, run-to-completion, etc.) and priority (i.e. low or high). These group properties are used during thread execution to determine which groups take precedence over other tasks.
Abstract:
A system, method and program product for securely saving a program context to a shared memory is presented. A secured program running on an special purpose processor core running in isolation mode is interrupted. The isolated special purpose processor core is included in a heterogeneous processing environment, that includes purpose processors and general purpose processor cores that each access a shared memory. In isolation mode, the special purpose processor core's local memory is inaccessible from the other heterogeneous processors. The secured program's context is securely saved to the shared memory using a random persistent security data. The lines of code stored in the isolated special purpose processor core's local memory are read along with data values, such as register settings, set by the secured program. The lines of code and data values are encrypted using the persistent security data, and the encrypted code lines and data values are stored in the shared memory.