Abstract:
A method, system, and computer program product for performance-based chip-to-chip stacking are provided in the illustrative embodiments. A first candidate chip is selected from a set of candidate chips for stacking, each candidate chip in the set of candidate chips including an integrated circuit. A part of a 3D performance determinant is activated in the first candidate chip. A value of a performance parameter is measured for a set of operating conditions. A stacked performance value is computed for the first candidate chip using the value. A subset of the set of candidate chips is stacked in a stack, the subset including the first candidate chip, such that a combined value of the performance parameter for the subset when stacked in a first order is within a defined range of values for the performance parameter.
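The selection step described above can be illustrated with a minimal Python sketch. The abstract does not say how the stacked performance value is computed or how values combine, so everything here is an assumption made for illustration: `stacked_value` applies a hypothetical fixed derating per tier, the combined value is taken to be a simple sum, and `find_stack` searches orderings exhaustively.

```python
from itertools import permutations


def stacked_value(measured, tier, derate_per_tier=0.05):
    """Hypothetical model: a chip loses a fixed fraction per tier above the base."""
    return measured * (1.0 - derate_per_tier * tier)


def find_stack(measured, size, low, high):
    """Return (order, combined) for the first ordering of `size` chips whose
    summed stacked value lies in [low, high], or None if no ordering fits.

    `measured` maps a chip id to the value measured on the unstacked chip.
    """
    for order in permutations(sorted(measured), size):
        combined = sum(
            stacked_value(measured[chip], tier)
            for tier, chip in enumerate(order)  # tier 0 is the bottom of the stack
        )
        if low <= combined <= high:
            return order, combined
    return None


if __name__ == "__main__":
    candidates = {"A": 100.0, "B": 90.0, "C": 80.0}
    print(find_stack(candidates, size=2, low=170.0, high=180.0))
```

The exhaustive search is only practical for small candidate sets; a real flow would likely prune or bin candidates first.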
Abstract:
Interfacing circuitry for asynchronously transferring data between a high-speed clock domain and a low-speed clock domain is provided. The interfacing circuitry is divided into halves, with one half being synchronized to a first clock and the second half being synchronized to a second clock. The first half and the second half are mirror images of each other. Each half has at least one storage component, such as a register or a flip-flop, for storing a valid bit as well as data, and at least one multiplexer component for gating the storage component. The valid bit is used to control the multiplexer at a receiving half. When transferring from a high-speed clock domain to a low-speed clock domain, the high-speed clock domain may probe the received data and/or the valid bit stored in the low-speed clock domain before the high-speed clock domain sends additional data.
Abstract:
A level shifter having a data input node, a first inverter having its input connected to the data input node, a second inverter connected to an output of the first inverter, a data output node, a latch having its output connected to the data output node, a first NFET connected between an input of the latch and a ground potential, and having its gate electrode connected to an output of the second inverter, and a second NFET connected between the data output node and the ground potential, and having its gate electrode connected to the output of the first inverter. The level shifter provides for a conversion of a data signal from a power supply domain of 1.8 volts to one of 3.3 volts.
Abstract:
A method and system for providing an eviction protocol within a non-uniform memory access (NUMA) computer system are disclosed. A NUMA computer system includes at least two nodes coupled to an interconnect. Each of the two nodes includes a local system memory. In response to a request for evicting an entry from a sparse directory, a non-intervention writeback request is sent to a node having the modified cache line when the entry is associated with a modified cache line. After the data from the modified cache line has been written back to a local system memory of the node, the entry can then be evicted from the sparse directory. If the entry is associated with a shared line, an invalidation request is sent to all nodes that the directory entry indicates may hold a copy of the line. Once all invalidations have been acknowledged, the entry can be evicted from the sparse directory.
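The two eviction paths described above can be sketched in Python as follows. This is an illustrative model only, not the disclosed implementation: the `DirectoryEntry` layout, the message names, and the `send` callback (assumed to return True once the target node has completed and acknowledged the request) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class LineState(Enum):
    MODIFIED = "modified"
    SHARED = "shared"


@dataclass
class DirectoryEntry:
    address: int
    state: LineState
    holders: set = field(default_factory=set)  # node ids that may hold the line


def evict_entry(directory, address, send):
    """Evict one sparse-directory entry and return the requests that were sent.

    `send(node_id, message, address)` is a hypothetical transport callback.
    """
    entry = directory[address]
    if entry.state is LineState.MODIFIED:
        # A modified line has exactly one holder; unpacking fails otherwise.
        (owner,) = entry.holders
        requests = [(owner, "writeback_no_intervention")]
    else:
        # A shared line: invalidate every node the entry says may hold a copy.
        requests = [(node, "invalidate") for node in sorted(entry.holders)]

    for node, message in requests:
        if not send(node, message, address):
            raise RuntimeError(f"node {node} did not acknowledge {message}")

    # Only after every request is acknowledged is the entry removed.
    del directory[address]
    return requests
```

The entry is deleted only after the loop completes, which mirrors the ordering in the text: writeback or invalidation first, eviction second.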
Abstract:
A method for avoiding livelocks due to stale exclusive/modified directory entries within a non-uniform memory access (NUMA) computer system is disclosed. A NUMA computer system includes at least two nodes coupled to an interconnect. Each of the two nodes includes a local system memory. In response to an attempt by a processor of a first node to read a cache line at substantially the same time as a processor of a second node attempts to access the same cache line, wherein the cache line has been silently cast out from a cache memory within the second node even though a coherency directory within the node still indicates the cache line is held exclusively in the second node, the processor of the second node is allowed to access the cache line only if the second node is an owning node of the cache line. The processor of the first node is then allowed to access the cache line.
Abstract:
A queue includes a data multiplexer having an output and at least two inputs and a plurality of data latches. The data latches include at least a first data latch and a second data latch, which each have a data input and a data output. The data output of the first data latch is coupled to a first input of the data multiplexer, and the output of the data multiplexer is coupled to the data input of the second data latch. A data value to be stored in the queue is received at a second input to the data multiplexer. In response to one or more control signals, the data value is latched into at least one of the first and second data latches, thereby storing the data value in the queue. Depending upon the design of the control logic, the queue can implement either first in, first out (FIFO) or last in, first out (LIFO) behavior.
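A behavioral Python sketch of the FIFO variant of such a queue is given below. It is a software model under stated assumptions rather than the disclosed circuit: the second latch is treated as the head of the queue, the first latch is assumed to be loadable directly from the incoming data, and `None` is used as a hypothetical "empty" marker in place of valid bits.

```python
class TwoLatchQueue:
    """Two-entry FIFO modeled on two data latches and one two-input multiplexer."""

    def __init__(self):
        self.latch1 = None  # first data latch (tail); None means empty
        self.latch2 = None  # second data latch (head); None means empty

    def _mux(self, select_new, data_in):
        # Mux inputs: the output of latch1, or the newly received data value.
        return data_in if select_new else self.latch1

    def enqueue(self, value):
        if self.latch2 is None:
            # Queue empty: the mux routes the new value straight to the head.
            self.latch2 = self._mux(True, value)
        elif self.latch1 is None:
            # Head occupied: hold the new value in the tail latch (assumed wiring).
            self.latch1 = value
        else:
            raise OverflowError("queue full")

    def dequeue(self):
        if self.latch2 is None:
            raise IndexError("queue empty")
        value = self.latch2
        # The mux selects latch1's output so the tail advances into the head.
        self.latch2 = self._mux(False, None)
        self.latch1 = None
        return value
```

A LIFO variant would differ only in the control decisions, for example by pushing the old head back into a spare latch on enqueue; the abstract leaves that control logic unspecified.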
Abstract:
A non-uniform memory access (NUMA) computer system includes first and second processing nodes that are each coupled to a node interconnect. The first processing node includes a system memory and first and second processors that each have a respective one of first and second cache hierarchies, which are coupled for communication by a local interconnect. The second processing node includes at least a system memory and a third processor having a third cache hierarchy. The first cache hierarchy and the third cache hierarchy are permitted to concurrently store an unmodified copy of a particular cache line in a Recent coherency state from which the copy of the particular cache line can be sourced by shared intervention. In response to a request for the particular cache line by the second cache hierarchy, the first cache hierarchy sources a copy of the particular cache line to the second cache hierarchy by shared intervention utilizing communication on only the local interconnect and without communication on the node interconnect.
Abstract:
A data processing system is disclosed which includes a first processor having an m-byte data width, an n-byte data bus, where n is less than m, and a second processor electrically coupled to the bus which performs bus transactions utilizing n-byte packets of data. An adaptor is electrically coupled between the first processor and the bus which converts n-byte packets of data input from the bus to m-byte packets of data, and converts m-byte packets of data input from the first processor to n-byte packets of data, thereby enabling the first processor to transmit data to and receive data from the bus utilizing m-byte packets of data. In a second aspect of the present invention, a method and system are provided for arbitrating between two bus masters having disparate bus acquisition protocols. In response to a second bus master asserting a bus request when a first bus master controls the bus, control of the bus is removed from the first bus master. Thereafter, in response to a signal transmitted from an arbitration control unit to the first bus master instructing the first bus master to terminate its bus transactions, control of the bus is granted to the second bus master. In response to the second bus master terminating its bus request, control of the bus is granted to the first bus master and a signal is transmitted from the arbitration control unit to the first bus master acknowledging the grant of control.
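The width conversion performed by the adaptor can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that m is an exact multiple of n; the abstract states only that n is less than m.

```python
from __future__ import annotations


def split_to_bus(packet: bytes, n: int) -> list[bytes]:
    """Split one m-byte processor packet into consecutive n-byte bus packets."""
    if n <= 0 or len(packet) % n != 0:
        raise ValueError("packet length must be a positive multiple of n")
    return [packet[i:i + n] for i in range(0, len(packet), n)]


def join_from_bus(packets: list[bytes], m: int) -> bytes:
    """Reassemble n-byte bus packets into a single m-byte processor packet."""
    data = b"".join(packets)
    if len(data) != m:
        raise ValueError(f"expected {m} bytes, got {len(data)}")
    return data


if __name__ == "__main__":
    word = bytes(range(8))              # an 8-byte (m = 8) processor packet
    beats = split_to_bus(word, 4)       # two 4-byte (n = 4) bus packets
    print(beats, join_from_bus(beats, 8) == word)
```

Byte order across the bus packets is kept as-is here; an actual adaptor would also need to define endianness and handle partial transfers.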
Abstract:
A non-uniform memory access (NUMA) computer system includes a node interconnect and a plurality of processing nodes that each contain at least one processor, a local interconnect, a local system memory, and a node controller coupled to both a respective local interconnect and the node interconnect. According to the method of the present invention, a communication transaction is transmitted on the node interconnect from a local processing node to a remote processing node. In response to receipt of the communication transaction by the remote processing node, a response including a coherency response field is transmitted on the node interconnect from the remote processing node to the local processing node. In response to receipt of the response at the local processing node, a request is issued on the local interconnect of the local processing node concurrently with a determination of a coherency response indicated by the coherency response field.