摘要:
A System and method for hiding memory latency in a multi-thread environment is presented. Branch Indirect and Set Link (BISL) and/or Branch Indirect and Set Link if External Data (BISLED) instructions are placed in thread code during compilation at instances that correspond to a prolonged instruction. A prolonged instruction is an instruction that instigates latency in a computer system, such as a DMA instruction. When a first thread encounters a BISL or a BISLED instruction, the first thread passes control to a second thread while the first thread's prolonged instruction executes. In turn, the computer system masks the latency of the first thread's prolonged instruction. The system can be optimized based on the memory latency by creating more threads and further dividing a register pool amongst the threads to further hide memory latency in operations that are highly memory bound.
摘要:
The present invention renders a triangular mesh for employment in graphical displays. The triangular mesh comprises triangle-shaped graphics primitives. The triangle-shaped graphics primitives represent a subdivided triangular shape. Each triangle-shaped graphics primitive shares defined vertices with adjoining triangle-shaped graphics primitives. These shared vertices are transmitted and employed for the rendering of the triangle-shaped graphics primitives.
摘要:
A system and method for partitioning processor resources based on memory usage is provided. A compiler determines the extent to which a process is memory-bound and accordingly divides the process into a number of threads. When a first thread encounters a prolonged instruction, the compiler inserts a conditional branch to a second thread. When the second thread encounters a prolonged instruction, a conditional branch to a third thread is executed. This continues until the last thread conditionally branches back to the first thread. An indirect segmented register file is used so that the “return to” and “branch to” logical registers within each thread are the same (e.g., R1 and R2) for each thread. These logical registers are mapped to hardware registers that store actual addresses. The indirect mapping is altered to bypass completed threads. When the last thread completes it may signal an external process.
摘要:
A system and method for compiling source code for multi-processor environments is presented. Source code is compiled which creates an object file whereby the object file includes multiple object code subtasks. Source code subtasks are compiled into object code subtasks using one of three approaches which are 1) a lowbrow approach, 2) a brute force approach, and 3) a program directive approach. Each object code subtask is formatted to run on a particular processor type with a particular architecture, such as a microprocessor-based architecture or a digital signal processor-based architecture. During runtime, each object code is loaded onto its corresponding processor type for execution.
摘要:
An apparatus and method for efficient communication of producer/consumer buffer status are provided. With the apparatus and method, devices in a data processing system notify each other of updates to head and tail pointers of a shared buffer region when the devices perform operations on the shared buffer region using signal notification channels of the devices. Thus, when a producer device that produces data to the shared buffer region writes data to the shared buffer region, an update to the head pointer is written to a signal notification channel of a consumer device. When a consumer device reads data from the shared buffer region, the consumer device writes a tail pointer update to a signal notification channel of the producer device. In addition, channels may operate in a blocking mode so that the corresponding device is kept in a low power state until an update is received over the channel.
摘要:
A task queue manager manages the task queues corresponding to virtual devices. When a virtual device function is requested, the task queue manager determines whether an SPU is currently assigned to the virtual device task. If an SPU is already assigned, the request is queued in a task queue being read by the SPU. If an SPU has not been assigned, the task queue manager assigns one of the SPUs to the task queue. The queue manager assigns the task based upon which SPU is least busy as well as whether one of the SPUs recently performed the virtual device function. If an SPU recently performed the virtual device function, it is more likely that the code used to perform the function is still in the SPU's local memory and will not have to be retrieved from shared common memory using DMA operations.
摘要:
A program is into at least two object files: one object file for each of the supported processor environments. During compilation, code characteristics, such as data locality, computational intensity, and data parallelism, are analyzed and recorded in the object file. During run time, the code characteristics are combined with runtime considerations, such as the current load on the processors and the size of the data being processed, to arrive at an overall value. The overall value is then used to determine which of the processors will be assigned the task. The values are assigned based on the characteristics of the various processors. For example, if one processor is better at handling intensive computations against large streams of data, programs that are highly computationally intensive and process large quantities of data are weighted in favor of that processor. The corresponding object is then loaded and executed on the assigned processor.
摘要:
A system and method is provided to allow virtual devices that use a plurality of processors in a multiprocessor systems, such as the BE environment. Using this method, a synergistic processing unit (SPU) can either be dedicated to performing a particular function (i.e., audio, video, etc.) or a single SPU can be programmed to perform several functions on behalf of the other processors in the system. The application, preferably running in one of the primary (PU) processors, issues IOCTL commands through device drivers that correspond to SPUs. The kernel managing the primary processors responds by sending an appropriate message to the SPU that is performing the dedicated function. Using this method, an SPU can be virtualized for swapping multiple tasks or dedicated to performing a particular task.
摘要:
A method and system for encrypting and verifying the integrity of a message using a three-phase encryption process is provided. A source having a secret master key that is shared with a target receives the message and generates a random number. The source then generates: a first set of intermediate values from the message and the random number; a second set of intermediate values from the first set of values; and a cipher text from the second set of values. At the three phases, the values are generated using the encryption function of a block cipher encryption/decryption algorithm. The random number and the cipher text are transmitted to the target, which decrypts the cipher text by reversing the encryption process. The target verifies the integrity of the message by comparing the received random number with the random number extracted from the decrypted cipher text.
摘要:
The present invention provides for authenticating a message. A security function is performed upon the message. The message is sent to a target. The output of the security function is sent to the target. At least one publicly known constant is sent to the target. The received message is authenticated as a function of at least a shared key, the received publicly known constants, the security function, the received message, and the output of the security function. If the output of the security function received by the target is the same as the output generated as a function of at least the received message, the received publicly known constants, the security function, and the shared key, neither the message nor the constants have been altered.