Abstract:
A system of interconnected chips comprising a multi-chip module (MCM) includes a first processor chip, a second processor chip, and an MCM package configured to include the first processor chip, the second processor chip, and an interconnect circuit. The first processor chip is configured to include a first ground-referenced single-ended signaling (GRS) interface circuit. A first set of electrical traces fabricated within the MCM package and configured to couple the first GRS interface circuit to the interconnect circuit. The second processor chip is configured to include a second GRS interface circuit. A second set of electrical traces fabricated within the MCM package and configured to coupled the second GRS interface circuit to the interconnect circuit.
Abstract:
A hierarchical network enables access for a stacked memory system including or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory die. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
Abstract:
The disclosure provides an improved transimpedance amplifier (TIA) that can operate at a higher bandwidth and lower noise compared to conventional TIAs. The TIA employs a data path with both feedback impedance and feedback capacitance for improved performance. The feedback impedance includes at least two resistors in series and at least one shunt capacitor, coupled between the at least two resistors, that helps to extend the circuit bandwidth and improve SNR at the same time. The capacitance value of the shunt capacitor can be selected based on both the bandwidth and noise. In one example, the TIA includes: (1) a biasing path, and (2) a data path, coupled to the biasing path, including multiple inverter stages and at least one feedback capacitance coupled across an even number of the multiple inverter stages. An optical receiver and a circuit having the TIA are also disclosed.
Abstract:
A balanced, charge-recycling repeater link is disclosed. The link includes a first set of segments operating in a first voltage domain and a second set of segments operating in a second voltage domain. The link is configured to transmit a first signal over at least one segment in the first set of segments and at least one other segment in the second set of segments. Each segment of the link includes at least one active circuit element configured to charge or discharge one or more corresponding interconnects within the link and a level shifter configured to shift the level of a signal on a last interconnect of the segment from the first voltage domain to the second voltage domain or the second voltage domain to the first voltage domain.
Abstract:
A system is provided for transmitting signals. The system comprises a first processing unit, a cache memory, and a package. The first processing unit comprises a first ground-referenced single-ended signaling (GRS) interface circuit and the second processing unit comprises a second GRS interface circuit. The cache memory comprises a third and a fourth GRS interface circuit. The package comprises one or more electrical traces that couple the first GRS interface to the third GRS interface and couple the second GRS interface to the fourth GRS interface, where the first GRS interface circuit, the second GRS interface, the third GRS interface, and the fourth GRS interface circuit are each configured to transmit a pulse along one trace of the one or more electrical traces by discharging a capacitor between the one trace and a ground network.
Abstract:
A system is provided for transmitting signals. The system comprises a first processing unit, a cache memory, and a package. The first processing unit comprises a first ground-referenced single-ended signaling (GRS) interface circuit and the second processing unit comprises a second GRS interface circuit. The cache memory comprises a third and a fourth GRS interface circuit. The package comprises one or more electrical traces that couple the first GRS interface to the third GRS interface and couple the second GRS interface to the fourth GRS interface, where the first GRS interface circuit, the second GRS interface, the third GRS interface, and the fourth GRS interface circuit are each configured to transmit a pulse along one trace of the one or more electrical traces by discharging a capacitor between the one trace and a ground network.
Abstract:
A system of interconnected chips comprising a multi-chip module (MCM) includes a first processor chip, a system function chip, and an MCM package configured to include the first processor chip and the system function chip. The first processor chip is configured to include a first ground-referenced single-ended signaling (GRS) interface circuit. The system function chip is configured to include a second GRS interface circuit. A first set of electrical traces are fabricated within the MCM package and coupled to the first GRS interface circuit and to the second GRS interface circuit. The first GRS interface circuit and second GRS interface circuit together provide a communication channel between the first processor chip and the system function chip.
Abstract:
A system of interconnected chips comprising a multi-chip module (MCM) includes a first processor chip, a second processor chip, and an MCM package configured to include the first processor chip, the second processor chip, and an interconnect circuit. The first processor chip is configured to include a first ground-referenced single-ended signaling (GRS) interface circuit. A first set of electrical traces fabricated within the MCM package and configured to couple the first GRS interface circuit to the interconnect circuit. The second processor chip is configured to include a second GRS interface circuit. A second set of electrical traces fabricated within the MCM package and configured to coupled the second GRS interface circuit to the interconnect circuit.
Abstract:
Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.
Abstract:
Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.