摘要:
An I/O address translation method for specifying relaxed ordering for I/O accesses are provided. With the apparatus and method, storage ordering (SO) bits are provided in an I/O address translation data structure, such as a page table or segment table. These SO bits define the order in which reads and/or writes initiated by an I/O device may be performed. These SO bits are combined with an ordering bit, e.g., the Relaxed Ordering Attribute bit of PCI Express, on the I/O interface. The weaker ordering indicated either in the I/O address translation data structure or in the I/O interface relaxed ordering bit is used to control the order in which I/O operations may be performed.
摘要:
Mechanisms for priority control in resource allocation is provided. With these mechanisms, when a unit makes a request to a token manager, the unit identifies the priority of its request as well as the resource which it desires to access and the unit's resource access group (RAG). This information is used to set a value of a storage device associated with the resource, priority, and RAG identified in the request. When the token manager generates and grants a token to the RAG, the token is in turn granted to a unit within the RAG based on a priority of the pending requests identified in the storage devices associated with the resource and RAG. Priority pointers are utilized to provide a round-robin fairness scheme between high and low priority requests within the RAG for the resource.
摘要:
A mechanism for priority control in resource allocation for low request rate, latency-sensitive units is provided. With this mechanism, when a unit makes a request to a token manager, the unit identifies the priority of its request as well as the resource which it desires to access and the unit's resource access group (RAG). This information is used to set a value of a storage device associated with the resource, priority, and RAG identified in the request. When the token manager generates and grants a token to the RAG, the token is in turn granted to a unit within the RAG based on a priority of the pending requests identified in the storage devices associated with the resource and RAG. Priority pointers are utilized to provide a round-robin fairness scheme between high and low priority requests within the RAG for the resource.
摘要:
Mechanisms for priority control in resource allocation is provided. With these mechanisms, when a unit makes a request to a token manager, the unit identifies the priority of its request as well as the resource which it desires to access and the unit's resource access group (RAG). This information is used to set a value of a storage device associated with the resource, priority, and RAG identified in the request. When the token manager generates and grants a token to the RAG, the token is in turn granted to a unit within the RAG based on a priority of the pending requests identified in the storage devices associated with the resource and RAG. Priority pointers are utilized to provide a round-robin fairness scheme between high and low priority requests within the RAG for the resource.
摘要:
A method, system, apparatus, and article of manufacture for performing cacheline polling utilizing a store and reserve instruction are disclosed. In accordance with one embodiment of the present invention, a first process initially-requests an action to be performed by a second process. A reservation is set at a cacheable memory location via a store operation. The first process reads the cacheable memory location via a load operation to determine whether or not the requested action has been completed by the second process. The load operation of the first process is stalled until the reservation on the cacheable memory location is lost. After the requested action has been completed, the reservation in the cacheable memory location is reset by the second process.
摘要:
Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determining, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory.
摘要:
Mechanisms for performing decoding of context-adaptive binary arithmetic coding (CABAC) encoded data. The mechanisms receive, in a first single instruction multiple data (SIMD) vector register of the data processing system, CABAC encoded data of a bit stream. The CABAC encoded data includes a value to be decoded and bit stream state information. The mechanisms receive, in a second SIMD vector register of the data processing system, CABAC decoder context information. The mechanisms process the value, the bit stream state information, and the CABAC decoder context information in a non-recursive manner to generate a decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms store, in a third SIMD vector register, a result vector that combines the decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms use the decoded value to generate a video output on the data processing system.
摘要:
A mechanism is provided for efficient communication of producer/consumer buffer status. With the mechanism, devices in a data processing system notify each other of updates to head and tail pointers of a shared buffer region when the devices perform operations on the shared buffer region using signal notification channels of the devices. Thus, when a producer device that produces data to the shared buffer region writes data to the shared buffer region, an update to the head pointer is written to a signal notification channel of a consumer device. When a consumer device reads data from the shared buffer region, the consumer device writes a tail pointer update to a signal notification channel of the producer device. In addition, channels may operate in a blocking mode so that the corresponding device is kept in a low power state until an update is received over the channel.
摘要:
A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.
摘要:
The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates a first group of instructions having a first set of loops and second group of instructions having a second set of loops from the instructions. The first set of loops have a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether a number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.