Abstract:
One or more client engines issues write transactions to system memory or peer parallel processor (PP) memory across a peripheral component interconnect express (PCIe) interface. The client engines may issue write transactions faster than the PCIe interface can transport those transactions, causing write transactions to accumulate within the PCIe interface. To prevent the accumulation of write transactions within the PCIe interface, an arbiter throttles write transactions received from the client engines based on the number of write transactions currently being transported across the PCIe interface.
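Below is a minimal C sketch of the throttling idea, assuming a simple credit scheme in which the arbiter counts writes in flight on the PCIe interface; the counter, the limit, and all function names are illustrative, not taken from the abstract.

```c
#include <stdbool.h>

#define MAX_INFLIGHT_WRITES 32  /* assumed capacity of the PCIe interface */

static int inflight_writes;     /* writes issued but not yet completed */

/* Arbiter decision: grant a client write only while the PCIe interface
 * has room; otherwise hold the request so writes cannot accumulate. */
bool arbiter_grant_write(void)
{
    if (inflight_writes >= MAX_INFLIGHT_WRITES)
        return false;           /* throttle the client engine */
    inflight_writes++;          /* transaction enters the PCIe interface */
    return true;
}

/* Called when a write transaction finishes crossing the interface. */
void pcie_write_completed(void)
{
    inflight_writes--;
}
```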
Abstract:
One embodiment of the present invention sets forth a technique for instruction-level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
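A compilable C toy model of the dynamic fallback, assuming a cycle budget for draining; the threshold, the in-flight count, and all names are illustrative assumptions, not from the abstract.

```c
#include <stdbool.h>
#include <stdio.h>

#define CTA_DRAIN_THRESHOLD 100    /* assumed cycle budget for draining */

static int in_flight = 250;        /* pretend instructions still in the pipe */

static bool pipeline_idle(void)    { return in_flight == 0; }
static void retire_one_cycle(void) { if (in_flight > 0) in_flight--; }

void preempt(void)
{
    /* New instruction issue has stopped; prefer compute thread array
     * granularity by letting in-flight instructions finish, so less
     * context state must be stored. */
    long cycles = 0;
    while (!pipeline_idle() && cycles < CTA_DRAIN_THRESHOLD) {
        retire_one_cycle();
        cycles++;
    }
    if (pipeline_idle())
        printf("preempted at CTA boundary after %ld cycles\n", cycles);
    else
        /* Threshold exceeded: dynamically fall back to instruction-level
         * preemption, which requires no draining. */
        printf("fell back to instruction-level preemption\n");
    /* context state would be unloaded from the pipeline here */
}

int main(void) { preempt(); return 0; }
```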
Abstract:
Generally, the present disclosure concerns systems and methods for shadowing the status of a circuit with a shadow unit. In one aspect, a system comprises a first circuit in a first dynamic clock domain of a plurality of dynamic clock domains, a processor configured to execute software instructions to generate a request for a status of the first circuit, and a second circuit coupled to the first circuit and to the processor. The second circuit, outside the first dynamic clock domain, is configured to shadow a status of the first circuit and to respond to the request for the status of the first circuit with the shadowed status.
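The following C toy model illustrates the shadowing arrangement, assuming the shadow unit samples the circuit whenever the circuit's dynamic clock is running; the struct layout and names are invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

struct circuit {
    bool clock_on;        /* dynamic clock domain may be gated off */
    unsigned status;      /* readable only while the clock runs    */
};

static unsigned shadow_status;  /* held outside the dynamic clock domain */

/* Shadow unit: mirror the circuit's status while its clock is running. */
static void shadow_update(const struct circuit *c)
{
    if (c->clock_on)
        shadow_status = c->status;
}

/* Software request path: answered from the shadow, so a status read
 * succeeds even while the circuit's clock is gated off. */
static unsigned read_status(void)
{
    return shadow_status;
}

int main(void)
{
    struct circuit c = { .clock_on = true, .status = 0x2A };
    shadow_update(&c);
    c.clock_on = false;                         /* domain is clock-gated */
    printf("status = 0x%X\n", read_status());   /* still answerable      */
    return 0;
}
```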
Abstract:
A system and method are provided for moving information between cache coherent memory systems of a partitioned multiprocessor computer system while containing faults to a single partition. The multiprocessor computer system includes a plurality of processors, memory subsystems and input/output (I/O) subsystems that can be divided into a plurality of partitions. Each I/O subsystem includes at least one I/O bridge for interfacing between one or more I/O devices and the multiprocessor system. The I/O bridge has a data mover configured to retrieve information from a “source” partition and to store that information within its own “destination” partition. When activated, the data mover issues a request to the source partition for a non-coherent copy of the information. The home memory subsystem in the source partition preferably responds to the request by sending the data mover a “valid” but non-coherent copy of the information, e.g., a “snapshot” of the information as of the time of the request. Upon receiving the information, the data mover may copy it into the memory subsystem of the destination partition.
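A short C sketch of the data-mover flow, assuming a fixed block size and using plain memcpy calls to stand in for the non-coherent snapshot read and the local store; none of these names come from the patent.

```c
#include <string.h>

#define BLOCK_WORDS 16  /* assumed transfer size */

/* Stand-in for the source partition's home memory subsystem returning a
 * valid but non-coherent copy as of the time of the request. */
static void source_snapshot(const unsigned long *src, unsigned long *out)
{
    memcpy(out, src, BLOCK_WORDS * sizeof(unsigned long));
}

/* Data mover in the destination partition's I/O bridge: no coherence
 * state ever crosses the partition boundary, containing faults. */
void data_mover_copy(const unsigned long *src_partition_addr,
                     unsigned long *dst_partition_addr)
{
    unsigned long snapshot[BLOCK_WORDS];

    source_snapshot(src_partition_addr, snapshot);  /* non-coherent read */
    memcpy(dst_partition_addr, snapshot,
           BLOCK_WORDS * sizeof(unsigned long));    /* local store       */
}
```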
Abstract:
An anti-starvation interrupt protocol for use in avoiding livelock in a multiprocessor computer system is provided. At least one processor is configured to include first and second control status registers (CSRs). The first CSR buffers information, such as interrupts, received by the processor, while the second CSR keeps track of the priority level of the interrupts. When an interrupt controller receives an interrupt, it issues a write transaction to the first CSR at the processor. If the first CSR has room to accept the write transaction, the processor returns an acknowledgment, whereas if the first CSR is already full, the processor returns a no acknowledgment. In response to a no acknowledgment, the interrupt controller increments an interrupt starvation counter and checks whether the counter exceeds a threshold. If the counter does not exceed the threshold, the interrupt controller waits a preset time and reposts the write transaction. If it does exceed the threshold, the interrupt controller issues a write transaction having a higher priority to the second CSR. In response, the processor copies all of the pending interrupts from the first CSR into the memory subsystem, thereby freeing up the first CSR to accept additional write transactions.
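A compilable C toy of the escalation path, assuming a small first CSR and a threshold of eight retries; the slot count, threshold, and helper names are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

#define STARVATION_THRESHOLD 8
#define FIRST_CSR_SLOTS      4

static unsigned first_csr_used = FIRST_CSR_SLOTS;  /* pretend it is full */

/* Post a write transaction to the first CSR: true = ACK, false = NoAck. */
static bool post_to_first_csr(unsigned irq)
{
    (void)irq;
    if (first_csr_used >= FIRST_CSR_SLOTS)
        return false;
    first_csr_used++;
    return true;
}

/* Higher-priority write to the second CSR: the processor responds by
 * copying pending interrupts to memory, freeing the first CSR. */
static void post_to_second_csr(unsigned irq)
{
    first_csr_used = 0;
    printf("irq %u escalated; first CSR drained to memory\n", irq);
}

void deliver_interrupt(unsigned irq)
{
    unsigned starvation = 0;

    while (!post_to_first_csr(irq)) {              /* NoAck: CSR full */
        if (++starvation > STARVATION_THRESHOLD) {
            post_to_second_csr(irq);               /* escalate        */
            return;
        }
        /* wait a preset time, then repost (elided in this toy model) */
    }
    printf("irq %u accepted by first CSR\n", irq);
}

int main(void) { deliver_interrupt(5); return 0; }
```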
Abstract:
A system and method avoids “livelock” and “starvation” among two or more input/output (I/O) devices of a symmetrical multiprocessor (SMP) computer system competing for the same data. The SMP computer system includes a plurality of interconnected processors, one or more memories that are shared by the processors, and a plurality of I/O bridges to which the I/O devices are coupled. A cache coherency protocol is executed by the I/O bridges, which requires the I/O bridges to obtain “exclusive” (not shared) ownership of all data stored by the bridges. In response to a request for data currently stored by an I/O bridge, the bridge first copies at least a portion of that data to a non-coherent buffer before invalidating the data. The bridge then takes the largest amount of the data saved in its non-coherent buffer that it knows to be coherent, releases only that known-coherent amount to the I/O device, and then discards all of the saved data.
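A C sketch of the copy-before-invalidate step, assuming one cache line and a caller-supplied count of bytes known to be coherent; the line size and all names are assumptions.

```c
#include <string.h>

#define LINE_BYTES 64  /* assumed cache-line size */

/* On a competing request for data the bridge holds exclusively: save the
 * data to a non-coherent buffer, invalidate the cached copy, release only
 * the known-coherent prefix to the I/O device, then discard the rest. */
void bridge_handle_recall(unsigned char *cached_line,
                          size_t coherent_bytes,  /* <= LINE_BYTES */
                          unsigned char *to_device)
{
    unsigned char noncoherent_buf[LINE_BYTES];

    memcpy(noncoherent_buf, cached_line, LINE_BYTES);  /* save first      */
    memset(cached_line, 0, LINE_BYTES);                /* then invalidate */

    /* Releasing only the known-coherent portion lets the I/O device make
     * forward progress, avoiding livelock and starvation. */
    memcpy(to_device, noncoherent_buf, coherent_bytes);

    /* noncoherent_buf goes out of scope: all saved data is discarded */
}
```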
Abstract:
One embodiment of the present invention sets forth a technique for arbitrating among a set of requesters that transmit data transmission requests to a weighted least-recently-used (LRU) arbiter. Each data transmission request is associated with a specific amount of data to be transmitted over the crossbar unit. Based on the priority state associated with each requester, the weighted LRU arbiter selects the requester in the set with the highest priority. The weighted LRU arbiter then decrements the weight associated with the selected requester, stored in a corresponding weight store, based on the size of the data to be transmitted. If the decremented weight is equal to or less than zero, then the priority associated with the selected requester is set to a lowest priority. If, however, the decremented weight is greater than zero, then the priority associated with the selected requester is not changed.
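A C sketch of one arbitration decision under this scheme, assuming integer priorities (larger wins) and byte-granular weights; field names and sizes are illustrative, not from the patent.

```c
#define NUM_REQ 4

struct requester {
    int priority;   /* larger value = higher priority                   */
    int weight;     /* remaining credit for this requester, in bytes    */
    int req_bytes;  /* size of the pending request; 0 = nothing pending */
};

/* Pick the pending requester with the highest priority, charge its
 * weight by the request size, and demote it to lowest priority if the
 * weight reaches zero or below. Returns the winner, or -1 if idle. */
int arbitrate(struct requester req[NUM_REQ])
{
    int winner = -1;

    for (int i = 0; i < NUM_REQ; i++)
        if (req[i].req_bytes > 0 &&
            (winner < 0 || req[i].priority > req[winner].priority))
            winner = i;
    if (winner < 0)
        return -1;

    req[winner].weight -= req[winner].req_bytes;  /* charge for the data */
    if (req[winner].weight <= 0)
        req[winner].priority = 0;                 /* demote to lowest    */
    /* otherwise the winner's priority is left unchanged */
    return winner;
}
```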
Abstract:
A system that supports a high performance, scalable, and efficient I/O port protocol to connect to I/O devices is disclosed. A distributed multiprocessing computer system contains a number of processors each coupled to an I/O bridge ASIC implementing the I/O port protocol. One or more I/O devices are coupled to the I/O bridge ASIC, each I/O device capable of accessing machine resources in the computer system by transmitting and receiving message packets. Machine resources in the computer system include data blocks, registers and interrupt queues. Each processor in the computer system is coupled to a memory module capable of storing data blocks shared between the processors. Coherence of the shared data blocks in this shared memory system is maintained using a directory based coherence protocol. Coherence of data blocks transferred during I/O device read and write accesses is maintained using the same coherence protocol as for the memory system. Data blocks transferred during an I/O device read or write access may be buffered in a cache by the I/O bridge ASIC only if the I/O bridge ASIC has exclusive copies of the data blocks. The I/O bridge ASIC includes a DMA device that supports both in-order and out-of-order DMA read and write streams of data blocks. An in-order stream of reads of data blocks performed by the DMA device always results in the DMA device receiving coherent data blocks that do not have to be written back to the memory module.
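The buffering rule in the middle of this abstract reduces to a single coherence-state check; a minimal C sketch, with invented names, is below.

```c
#include <stdbool.h>

enum coh_state { INVALID, SHARED, EXCLUSIVE };

struct block {
    enum coh_state state;   /* tracked by the directory-based protocol */
    unsigned long data;
};

/* The I/O bridge ASIC may keep a data block in its cache only while it
 * holds an exclusive copy; shared copies must not be buffered. */
bool bridge_may_buffer(const struct block *b)
{
    return b->state == EXCLUSIVE;
}
```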
Abstract:
A method and system for completing pending I/O device reads by periodically stalling the issuance of I/O device accesses by a program in a multiple-processor computer system.
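A toy C program showing the periodic-stall idea, assuming a fixed stall interval; the interval and helper names are illustrative.

```c
#include <stdio.h>

#define STALL_INTERVAL 8  /* assumed accesses issued between stalls */

static void issue_io_access(int n)    { printf("access %d\n", n); }
static void drain_pending_reads(void) { /* wait for outstanding reads */ }

int main(void)
{
    for (int n = 0; n < 32; n++) {
        issue_io_access(n);
        if ((n + 1) % STALL_INTERVAL == 0)
            drain_pending_reads();  /* periodic stall lets pending
                                       device reads complete */
    }
    return 0;
}
```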