Abstract:
A processor chip that provides a dynamically selectable operating mode in which particular sequences of instructions are executed without an external interrupt. The processor chip comprises an architected bit that may be set by software and that allows external interrupts of the processing system to be dynamically enabled or disabled. When a sequence of instructions constituting a data move operation is being issued, the architected bit is toggled to an interrupt-disabled state so that execution of the sequence of instructions occurs without an external interrupt. Following the execution of the sequence of instructions, the architected bit is toggled to an interrupt-enabled state, which again subjects instruction execution to external interrupts.
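A minimal C sketch of how software might exercise such a mode; the MSR_EE bit position and the read_msr/write_msr helper pair are hypothetical stand-ins, not the patent's actual interface.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MSR_EE (1u << 15)            /* hypothetical external-interrupt-enable bit */

    static uint32_t msr = MSR_EE;        /* stand-in for the architected register */

    static uint32_t read_msr(void)        { return msr; }
    static void     write_msr(uint32_t v) { msr = v; }

    /* Run a data move with external interrupts disabled for its duration. */
    static void interrupt_free_move(uint8_t *dst, const uint8_t *src, size_t n)
    {
        uint32_t saved = read_msr();
        write_msr(saved & ~MSR_EE);      /* toggle bit to the interrupt-disabled state */
        for (size_t i = 0; i < n; i++)   /* the protected instruction sequence */
            dst[i] = src[i];
        write_msr(saved | MSR_EE);       /* toggle back: interrupts enabled again */
    }

    int main(void)
    {
        uint8_t src[4] = { 1, 2, 3, 4 }, dst[4] = { 0 };
        interrupt_free_move(dst, src, sizeof dst);
        printf("%u %u\n", dst[0], dst[3]);
        return 0;
    }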
Abstract:
Processor communication registers (PCRs) contained in each processor within a multiprocessor system and interconnected by a specialized bus provide enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multi-processing. Each processor has exclusive rights to store to a sector within each PCR and has continuous access to read the contents of its own PCR. Each processor updates its exclusive sector within all of the PCRs over the specialized bus, instantly allowing all of the other processors to see the change within the PCR data and bypassing the cache subsystem. Efficiency is enhanced within the multiprocessor system because processor communications are immediately transferred into all processors without momentarily restricting access to the information or forcing all of the processors to continually contend for the same cache line, which would overwhelm the interconnect and memory system with an endless stream of load, store, and invalidate commands.
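The PCR semantics can be modeled in a few lines of C: each processor keeps a full local copy of the PCR, a store from processor i lands only in sector i but is broadcast into every copy, and reads never leave the local processor. The processor count, array layout, and function names are illustrative assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define NPROC 4                      /* assumed number of processors */

    /* pcr[p] is processor p's local copy; sector s is writable only by
       processor s, so the read side needs no locking. */
    static uint64_t pcr[NPROC][NPROC];

    /* Broadcast a store over the specialized bus: processor 'me' updates
       its exclusive sector in every copy, bypassing the cache subsystem. */
    static void pcr_store(int me, uint64_t value)
    {
        for (int p = 0; p < NPROC; p++)
            pcr[p][me] = value;
    }

    /* A load is purely local: continuous access to the processor's own copy. */
    static uint64_t pcr_load(int me, int sector)
    {
        return pcr[me][sector];
    }

    int main(void)
    {
        pcr_store(2, 0xBEEF);            /* processor 2 publishes its status */
        printf("%llx\n", (unsigned long long)pcr_load(0, 2)); /* visible to processor 0 */
        return 0;
    }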
Abstract:
A processor book designed to support both commercial workloads and technical workloads based on a dynamic or static mechanism for reconfiguring the external wiring interconnect. The processor book is configured as a building block for commercial workload processing systems with external connector buses (ECBs). The processor book is also provided with routing logic that enables the ECBs to be utilized for either book-to-book routing or routing within the same processor book. A table-specified wiring scheme is provided for coupling the ECBs running off the chips of one MCM to the chips of the second MCM on the processor book, so that the chips of the first MCM are connected directly to the chips of the second MCM that are logically furthest away, and vice versa. Once the wiring of the ECBs is completed according to the wiring scheme, the operational and functional characteristics reflect those of a processor book configured for technical workloads.
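One way to read the wiring scheme is as a lookup table mapping each chip of the first MCM to its ECB partner on the second MCM. The chip count and the index-reversal pattern below are assumptions chosen only to illustrate "logically furthest away"; they are not the patent's actual table.

    #include <stdio.h>

    #define CHIPS_PER_MCM 4              /* assumed chip count per MCM */

    /* Table-specified wiring: chip i of MCM 0 connects to the chip of
       MCM 1 that is logically furthest away, and vice versa (assumed
       here to be a simple index reversal). */
    static const int ecb_partner[CHIPS_PER_MCM] = { 3, 2, 1, 0 };

    int main(void)
    {
        for (int chip = 0; chip < CHIPS_PER_MCM; chip++)
            printf("MCM0 chip %d <-> MCM1 chip %d\n", chip, ecb_partner[chip]);
        return 0;
    }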
Abstract:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting the individual byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When a full entry is selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when the data goes stale before write permission is obtained by the RC machine.
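A software analogue of the full-entry detection, assuming a 128-byte cache line; the AND-gate tree reduces to a loop here, and all names (stq_entry, stq_entry_full, rc_dispatch) are illustrative rather than the patent's interface.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_BYTES 128               /* assumed cache line size */

    struct stq_entry {
        uint8_t data[LINE_BYTES];
        bool    byte_enable[LINE_BYTES]; /* one bit per byte gathered so far */
    };

    /* Analogue of the AND-gate tree: the "full" signal asserts only when
       every byte enable bit of the entry is set. */
    static bool stq_entry_full(const struct stq_entry *e)
    {
        for (int i = 0; i < LINE_BYTES; i++)
            if (!e->byte_enable[i])
                return false;
        return true;
    }

    /* When full, the RC machine needs only write permission; it never
       fetches the old line data, since every byte will be overwritten. */
    static void rc_dispatch(const struct stq_entry *e, uint8_t *line)
    {
        if (stq_entry_full(e))
            memcpy(line, e->data, LINE_BYTES);  /* full-line overwrite */
    }

    int main(void)
    {
        static struct stq_entry e;
        static uint8_t line[LINE_BYTES];
        for (int i = 0; i < LINE_BYTES; i++) {  /* gather stores to every byte */
            e.data[i] = 0xAA;
            e.byte_enable[i] = true;
        }
        rc_dispatch(&e, line);           /* full entry: no data fetch needed */
        return 0;
    }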
Abstract:
A method and processor system that substantially enhances the store gathering capabilities of a store queue entry to enable gathering of a maximum number of proximate-in-time store operations before the entry is selected for dispatch. A counter is provided for each entry to track the time since the last gather to the entry. When a new gather does not occur before the counter reaches a threshold saturation point, the entry is signaled ready for dispatch. By selecting an appropriate saturation threshold, the entry is given sufficient time to gather proximate-in-time store operations before it expires. The entry may be deemed eligible for selection when certain conditions occur, including the entry becoming full, issuance of a barrier operation, and saturation of the counter. The use of the counter increases the ability of a store queue entry to complete gathering of enough store operations to update an entire cache line before that entry is dispatched to an RC machine.
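A sketch of the per-entry gather window in C; the threshold value, field names, and the per-cycle dispatch_ready interface are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define GATHER_THRESHOLD 16          /* assumed saturation point, in cycles */

    struct gather_entry {
        bool     full;                   /* all byte enables set */
        bool     barrier_seen;           /* a barrier operation forces dispatch */
        uint32_t idle;                   /* cycles since the last gather */
    };

    /* A new store gathers into the entry: restart the idle window. */
    static void on_gather(struct gather_entry *e) { e->idle = 0; }

    /* Called once per cycle; true when the entry is eligible for dispatch. */
    static bool dispatch_ready(struct gather_entry *e)
    {
        if (e->idle < GATHER_THRESHOLD)
            e->idle++;                   /* counter saturates at the threshold */
        return e->full || e->barrier_seen || e->idle >= GATHER_THRESHOLD;
    }

    int main(void)
    {
        struct gather_entry e = { 0 };
        on_gather(&e);                   /* a store gathers; the window restarts */
        int cycles = 0;
        while (!dispatch_ready(&e))
            cycles++;
        printf("ready after %d idle cycles\n", cycles);
        return 0;
    }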
Abstract:
An integrated circuit, such as a processing unit, includes a substrate and integrated circuitry formed in the substrate. The integrated circuitry includes a processor core that executes instructions, an interconnect interface, coupled to the processor core, that supports communication between the processor core and a system interconnect external to the integrated circuit, and at least a portion of an external communication adapter, coupled to the processor core, that supports input/output communication via an input/output communication link.
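The claimed composition can be summarized as a structural sketch in C, with every type name invented for illustration: the point is only that the core, the interface to the external system interconnect, and at least a portion of the communication adapter share one substrate.

    /* All names are illustrative; the sketch mirrors the composition only. */
    struct processor_core    { int core_id; };
    struct interconnect_if   { int fabric_link; }; /* to the external system interconnect */
    struct comm_adapter_part { int io_link; };     /* on-substrate portion of the adapter */

    /* One integrated circuit: everything below is formed in one substrate. */
    struct processing_unit {
        struct processor_core    core;
        struct interconnect_if   interconnect;
        struct comm_adapter_part adapter;
    };

    int main(void)
    {
        struct processing_unit pu = { { 0 }, { 0 }, { 0 } };
        (void)pu;
        return 0;
    }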
Abstract:
A method and apparatus for managing cache lines in a data processing system. A special purpose register is employed that may be manipulated by user code and operating system code to set preferences, such as a level 2 cache management policy preference for an application thread. These preferences may be set dynamically, and an arbitration mechanism is employed to best satisfy the preferences of multiple threads with a single aggregate preference. Members are represented using a least recently used tree. The least recently used tree has a set of nodes forming a path to member cache lines in a hierarchical structure. A state of a selected node within the set of nodes in the least recently used tree is selectively biased. At least one node on a level below the selected node is eliminated from being selected in managing the cache lines. In this manner, members can be biased for or against selection as victims when replacing cache lines in a cache memory.
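A compact way to see the biasing is against a binary pseudo-LRU tree (an assumption; the patent's tree need not be binary): pinning the state of one node eliminates the whole subtree below it from victim selection. All names are illustrative.

    #include <stdbool.h>
    #include <stdio.h>

    /* 8-way pseudo-LRU: 7 internal nodes; state 0 = go left, 1 = go right. */
    static bool node[7];
    static bool biased[7];               /* a biased node's state is pinned */

    /* Walk the tree to pick a victim way, flipping unbiased nodes as we go. */
    static int pick_victim(void)
    {
        int n = 0;
        for (int level = 0; level < 3; level++) {
            bool right = node[n];
            if (!biased[n])
                node[n] = !right;        /* normal pseudo-LRU update */
            n = 2 * n + 1 + (right ? 1 : 0);
        }
        return n - 7;                    /* leaf index = way number */
    }

    /* Bias the root toward the right subtree: ways 0-3 (the left half) are
       eliminated from victim selection until the bias is cleared. */
    static void bias_against_left_half(void)
    {
        node[0]   = true;
        biased[0] = true;
    }

    int main(void)
    {
        printf("victim: %d\n", pick_victim());
        bias_against_left_half();
        printf("victim after bias: %d\n", pick_victim()); /* always a way in 4-7 */
        return 0;
    }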
Abstract:
A data interconnect and routing mechanism reduces data communication latency, supports dynamic route determination based upon processor activity level/traffic, and implements an architecture that supports scalable improvements in communication frequencies. In one implementation, a data processing system includes at least first through third processing units, data storage coupled to the plurality of processing units, and an interconnect fabric. The interconnect fabric includes at least a first data bus coupling the first processing unit to the second processing unit and a second data bus coupling the third processing unit to the second processing unit so that the first and third processing units can transmit data traffic to the second processing unit. The data processing system further includes a control channel coupling the first and third processing units. The first processing unit requests approval from the third processing unit via the control channel to transmit a data communication to the second processing unit, and the third processing unit approves or delays transmission of the data communication in a response transmitted via the control channel.
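A minimal sketch of the control-channel handshake, reduced to a request/grant decision: the first unit asks the third for approval before transmitting to the second, and the third grants or delays. The message names and the pending-traffic heuristic used to decide are assumptions.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical control-channel responses. */
    enum ctrl { GRANT, DELAY };

    struct unit {
        int      id;
        uint32_t pending_to_b;           /* this unit's queued traffic for unit B */
    };

    /* Unit C's side of the control channel: approve the sender's request
       unless C is about to use the shared path into unit B itself. */
    static enum ctrl handle_request(const struct unit *c)
    {
        return (c->pending_to_b == 0) ? GRANT : DELAY;
    }

    /* Unit A's side: transmit only after approval arrives on the channel. */
    static bool try_transmit(const struct unit *a, const struct unit *c)
    {
        (void)a;                         /* A's own state is not consulted here */
        return handle_request(c) == GRANT;
    }

    int main(void)
    {
        struct unit a = { .id = 1, .pending_to_b = 0 };
        struct unit c = { .id = 3, .pending_to_b = 2 };
        printf("%s\n", try_transmit(&a, &c) ? "send" : "hold");
        return 0;
    }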
Abstract:
A data processing system includes a global promotion facility and a plurality of processing units coupled by an interconnect. At least one processing unit among the plurality of processing units includes one or more caches having cache arrays in which instructions and operand data are cached, an instruction sequencing unit, an execution unit that executes an acquisition instruction to acquire a promotion bit field within the global promotion facility exclusive of at least one other processing unit, and a promotion cache separate from the one or more caches. In response to acquisition of the promotion bit field by the processing unit, the promotion cache of that processing unit stores the promotion bit field separately from instructions and operand data.
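The exclusive acquisition maps naturally onto an atomic fetch-or, used here purely as a C11 model of the hardware's exclusivity guarantee; the field width, names, and the promotion-cache structure are assumptions.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Global promotion facility: one bit per lock or shared resource. */
    static _Atomic uint64_t promotion_field;

    /* Per-processor promotion cache, held apart from the data caches. */
    struct promo_cache { uint64_t held_bits; };

    /* Acquire one promotion bit exclusive of all other processing units;
       on success the bit is cached locally in the promotion cache. */
    static bool acquire_promotion_bit(struct promo_cache *pc, int bit)
    {
        uint64_t mask = 1ull << bit;
        uint64_t old  = atomic_fetch_or(&promotion_field, mask);
        if (old & mask)
            return false;                /* another unit already holds it */
        pc->held_bits |= mask;           /* stored separately from operand data */
        return true;
    }

    int main(void)
    {
        struct promo_cache pc = { 0 };
        printf("%d\n", acquire_promotion_bit(&pc, 5)); /* 1: acquired */
        printf("%d\n", acquire_promotion_bit(&pc, 5)); /* 0: already held */
        return 0;
    }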
Abstract:
A processing state that enables a processor to resume processing operations before completion of a processor-issued data move operation. The processor executes instructions specifying a data clone operation and delays subsequent instruction execution while awaiting an indication that the data clone operation has completed. In response to the instructions, a memory cloner issues a series of naked write operations targeting the destination memory location and tracks receipt of a Null response for each of the naked write operations. When a Null response has been received for each of the naked write operations, the memory cloner transmits an acknowledgment to the processor indicating that the write operation is architecturally complete before the actual data move has completed. In response to receiving the acknowledgment, the processor resumes execution of subsequent instructions in the instruction stream.
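A sketch of the completion-tracking side of the memory cloner, assuming one naked (address-only) write per cache-line granule of the destination; the granule count, names, and callback structure are illustrative.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define GRANULES 8                   /* assumed cache lines in the clone */

    struct memory_cloner {
        uint32_t writes_issued;
        uint32_t nulls_received;
        bool     ack_sent;               /* "architecturally complete" signal */
    };

    /* Issue one naked write per destination granule. */
    static void start_clone(struct memory_cloner *mc)
    {
        mc->writes_issued  = GRANULES;
        mc->nulls_received = 0;
        mc->ack_sent       = false;
    }

    /* Snoop response handler: a Null response means no other participant
       claims the line, so the write may proceed without moving data yet. */
    static void on_null_response(struct memory_cloner *mc)
    {
        if (++mc->nulls_received == mc->writes_issued)
            mc->ack_sent = true;         /* processor may resume; data moves later */
    }

    int main(void)
    {
        struct memory_cloner mc;
        start_clone(&mc);
        for (int i = 0; i < GRANULES; i++)
            on_null_response(&mc);
        printf("ack=%d\n", mc.ack_sent); /* processor resumes its instruction stream */
        return 0;
    }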