Abstract:
A memory controller is configured to perform one or more write-read-validate operations to calibrate a clock-cycle relationship between a data-strobe signal and a clock signal, wherein the write-read-validate operations involve varying a delay on the data-strobe signal relative to the clock signal by a multiple of a clock period. A phase detector on the memory chip receives signals from the memory controller, including the clock signal, a marking signal and the data-strobe signal, wherein the marking signal includes a pulse that marks a specific clock cycle in the clock signal. The phase detector uses the marking signal to window the specific clock cycle in the clock signal and uses the data-strobe signal to capture the windowed clock signal, thereby creating a feedback signal that is returned to the memory controller to facilitate calibration of the timing relationship.
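The phase-detector feedback loop can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the patented circuit: time is measured in clock periods, the clock is high for the first half of each period, the fine (sub-cycle) strobe phase is assumed to be already trained, and the unknown whole-cycle skew on the strobe path is a hypothetical `skew_cycles` parameter. The names `MemoryChipModel` and `calibrate_cycle_offset` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryChipModel:
    """Hypothetical behavioral model of the memory-side phase detector."""
    marked_cycle: int  # clock cycle flagged by the marking-signal pulse
    skew_cycles: int   # assumed unknown whole-cycle skew on the DQS path

    def windowed_clock(self, t: float) -> int:
        # The marking pulse gates the clock, so only the high phase of the
        # marked cycle passes through the window.
        cycle = math.floor(t)
        phase = t - cycle
        return 1 if cycle == self.marked_cycle and phase < 0.5 else 0

    def feedback(self, dqs_edge: float) -> int:
        # The DQS edge (after path skew) captures the windowed clock; the
        # captured bit is the feedback signal returned to the controller.
        return self.windowed_clock(dqs_edge + self.skew_cycles)


def calibrate_cycle_offset(chip: MemoryChipModel, launch: float,
                           max_cycles: int = 8) -> Optional[int]:
    """Vary the DQS delay by whole clock periods until feedback reads 1."""
    for k in range(-max_cycles, max_cycles + 1):
        if chip.feedback(launch + k) == 1:
            return k
    return None


chip = MemoryChipModel(marked_cycle=10, skew_cycles=3)
print(calibrate_cycle_offset(chip, launch=5.25))  # 2
```

In this model the marking pulse selects exactly one cycle, so only one whole-cycle delay setting yields a captured 1, which is how the feedback resolves which clock cycle the strobe is aligned to.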
Abstract:
A cache-coherence protocol distributes atomic operations among multiple processors (or processor cores) that share a memory space. When an atomic operation that includes an instruction to modify data stored in the shared memory space is directed to a first processor that does not have control over the address(es) associated with the data, the first processor sends a request, including the instruction to modify the data, to a second processor. Then, the second processor, which already has control of the address(es), modifies the data. Moreover, the first processor can immediately proceed to another instruction rather than waiting for the address(es) to become available.
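A toy model of the forwarding step, assuming the atomic operation is expressed as a Python function applied to the stored value, a fixed ownership map, and a per-core request queue. `Core`, `SharedMemorySystem`, `issue_atomic`, and `drain_requests` are hypothetical names; real hardware would carry the request over the coherence interconnect rather than through a Python deque.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Set, Tuple

Modify = Callable[[int], int]


@dataclass
class Core:
    cid: int
    owned: Set[int]  # addresses this core currently controls
    inbox: Deque[Tuple[int, Modify]] = field(default_factory=deque)
    retired: List[str] = field(default_factory=list)


class SharedMemorySystem:
    def __init__(self, cores: List[Core], memory: Dict[int, int]):
        self.cores = cores
        self.memory = memory

    def owner_of(self, addr: int) -> Core:
        for core in self.cores:
            if addr in core.owned:
                return core
        raise KeyError(f"no owner for address {addr:#x}")

    def issue_atomic(self, core: Core, addr: int, modify: Modify, tag: str) -> None:
        owner = self.owner_of(addr)
        if owner is core:
            # Local case: the issuing core already controls the address.
            self.memory[addr] = modify(self.memory[addr])
        else:
            # Remote case: ship the modify instruction to the owner and move
            # on without waiting for the address to become available.
            owner.inbox.append((addr, modify))
        core.retired.append(tag)

    def drain_requests(self) -> None:
        # Each owner applies the forwarded modifications to data it controls.
        for core in self.cores:
            while core.inbox:
                addr, modify = core.inbox.popleft()
                self.memory[addr] = modify(self.memory[addr])


mem = {0x40: 0}
c0, c1 = Core(0, set()), Core(1, {0x40})
system = SharedMemorySystem([c0, c1], mem)
system.issue_atomic(c0, 0x40, lambda v: v + 1, "atomic-inc")
c0.retired.append("next-instruction")  # core 0 proceeds immediately
system.drain_requests()
print(mem[0x40])  # 1
```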
Abstract:
Multiple (e.g., four) memory devices on a module are connected to a common pair of differential data strobe signal conductors. The common pair of differential data strobe conductors is also coupled to a memory controller, which uses it to time the transmission of data to the multiple memory devices and the reception of data from them. The controller calibrates two or more different data transmission delays relative to its transmission of a write data strobe signal on the common pair of differential data strobe conductors. The controller also calibrates to account for two or more different data reception delays (skew) relative to its reception of a read data strobe signal on the common pair of differential data strobe conductors.
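One common way to calibrate per-device delays against a shared strobe is to sweep a delay setting, record pass/fail for each device, and pick the center of the passing window. The sketch below assumes that technique (the abstract does not specify the search procedure), an integer tap-based delay line, and hypothetical `test_write`/`test_read` callbacks that report whether a transfer at a given tap succeeded for a given device. The pass windows in the example are synthetic.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tester = Callable[[int, int], bool]  # (device, tap) -> passed?


def center_of_passing_window(passes: List[bool]) -> Optional[int]:
    """Return the midpoint tap of the longest run of passing taps, or None."""
    best_start, best_len = None, 0
    start = None
    for tap, ok in enumerate(passes + [False]):  # sentinel closes a final run
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            if tap - start > best_len:
                best_start, best_len = start, tap - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


def calibrate_shared_strobe(devices: List[int], num_taps: int,
                            test_write: Tester, test_read: Tester
                            ) -> Tuple[Dict[int, Optional[int]], Dict[int, Optional[int]]]:
    """Find a separate write delay and read delay for every device that
    shares the one differential strobe pair."""
    write_delay: Dict[int, Optional[int]] = {}
    read_delay: Dict[int, Optional[int]] = {}
    for dev in devices:
        write_delay[dev] = center_of_passing_window(
            [test_write(dev, t) for t in range(num_taps)])
        read_delay[dev] = center_of_passing_window(
            [test_read(dev, t) for t in range(num_taps)])
    return write_delay, read_delay


# Synthetic pass windows standing in for real training results.
wr, rd = calibrate_shared_strobe(
    devices=[0, 1, 2, 3], num_taps=16,
    test_write=lambda d, t: 2 * d <= t <= 2 * d + 4,
    test_read=lambda d, t: d + 1 <= t <= d + 3)
print(wr)  # {0: 2, 1: 4, 2: 6, 3: 8}
print(rd)  # {0: 2, 1: 3, 2: 4, 3: 5}
```

Keeping one delay per device for each direction mirrors the abstract's point that a single shared strobe still requires two or more distinct transmit and receive delays.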
Abstract:
Embodiments of a circuit are described. This circuit includes an instruction fetch unit that fetches instructions for execution, where the instructions are associated with one or more virtual addresses; a translation lookaside buffer (TLB); and an execution unit that executes the instructions. The TLB converts virtual addresses into physical addresses. It includes entries for physical addresses that are dedicated to dynamic random access memory (DRAM) and entries for physical addresses that are dedicated to a memory whose storage cells have a retention time that decreases as operations are performed on them.
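A minimal sketch of a TLB whose entries record which kind of memory each physical page belongs to. The 4 KiB page size, the `MemoryKind` labels, and the `pages_of` helper are assumptions for illustration; the abstract states only that some entries are dedicated to DRAM and others to a memory whose retention time degrades with use.

```python
from enum import Enum
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size


class MemoryKind(Enum):
    DRAM = "dram"
    DEGRADING = "degrading"  # retention time falls as the cell is operated on


class TLB:
    def __init__(self) -> None:
        # Virtual page number -> (physical page number, memory kind).
        self.entries: Dict[int, Tuple[int, MemoryKind]] = {}

    def insert(self, vpn: int, ppn: int, kind: MemoryKind) -> None:
        self.entries[vpn] = (ppn, kind)

    def translate(self, vaddr: int) -> Tuple[int, MemoryKind]:
        """Map a virtual address to (physical address, memory kind).
        A missing entry raises KeyError, standing in for a TLB miss."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        ppn, kind = self.entries[vpn]
        return ppn * PAGE_SIZE + offset, kind

    def pages_of(self, kind: MemoryKind) -> List[int]:
        """List the virtual pages whose entries are dedicated to one kind."""
        return sorted(v for v, (_, k) in self.entries.items() if k is kind)


tlb = TLB()
tlb.insert(vpn=1, ppn=7, kind=MemoryKind.DRAM)
tlb.insert(vpn=2, ppn=9, kind=MemoryKind.DEGRADING)
print(tlb.translate(4096 + 12))  # (28684, <MemoryKind.DRAM: 'dram'>)
```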
Abstract:
A neural-network accelerator die is stacked on and integrated with a high-bandwidth memory so that the stack behaves as a single three-dimensional (3-D) integrated circuit. The accelerator die includes a high-bandwidth memory (HBM) interface that allows a host processor to store training data in memory and to retrieve inference-model and output data from it. The accelerator die additionally includes accelerator tiles with direct inter-die memory interfaces to a stack of underlying memory banks. The 3-D IC thus supports both HBM memory channels optimized for external access and accelerator-specific memory channels optimized for training and inference.
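The two access paths into the same memory stack can be sketched as a small model. Everything here is an illustrative assumption: banks are a flat dictionary, the host reaches any bank over an HBM channel, and each accelerator tile is restricted to the banks directly beneath it over its inter-die channel. `MemoryStack`, `AcceleratorDie`, and `tile_banks` are hypothetical names, and the doubling step merely stands in for real training or inference work.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class MemoryStack:
    num_banks: int
    cells: Dict[Tuple[int, int], float] = field(default_factory=dict)  # (bank, addr) -> value


@dataclass
class AcceleratorDie:
    stack: MemoryStack
    tile_banks: Dict[int, Set[int]]  # tile id -> banks under that tile (assumed)

    def _check_bank(self, bank: int) -> None:
        if not 0 <= bank < self.stack.num_banks:
            raise ValueError(f"bank {bank} out of range")

    def _check_tile_bank(self, tile: int, bank: int) -> None:
        self._check_bank(bank)
        if bank not in self.tile_banks.get(tile, set()):
            raise ValueError(f"tile {tile} has no direct channel to bank {bank}")

    # Host-facing HBM channel: external access to any bank.
    def hbm_write(self, bank: int, addr: int, value: float) -> None:
        self._check_bank(bank)
        self.stack.cells[(bank, addr)] = value

    def hbm_read(self, bank: int, addr: int) -> Optional[float]:
        self._check_bank(bank)
        return self.stack.cells.get((bank, addr))

    # Accelerator-specific channel: direct inter-die access from one tile.
    def tile_read(self, tile: int, bank: int, addr: int) -> Optional[float]:
        self._check_tile_bank(tile, bank)
        return self.stack.cells.get((bank, addr))

    def tile_write(self, tile: int, bank: int, addr: int, value: float) -> None:
        self._check_tile_bank(tile, bank)
        self.stack.cells[(bank, addr)] = value


die = AcceleratorDie(MemoryStack(num_banks=4), tile_banks={0: {0, 1}, 1: {2, 3}})
die.hbm_write(bank=0, addr=0, value=1.5)             # host stores training data
x = die.tile_read(tile=0, bank=0, addr=0)            # tile pulls it directly
die.tile_write(tile=0, bank=1, addr=0, value=x * 2)  # tile writes an output
print(die.hbm_read(bank=1, addr=0))                  # 3.0
```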
Abstract:
A DRAM includes at least four groups of memory cores and at least four memory access channel interfaces. In a first mode, each memory access channel interface receives memory access commands directed to a corresponding one of the groups of memory cores. In a second mode, each interface in one-half of the memory access channel interfaces receives memory access commands directed to a corresponding two of the four groups of memory cores. The memory access channel interfaces in a first one-half of that one-half have electrical connection conductors that lie on the opposite side of at least one line of reflectional symmetry from those of a second one-half of that one-half.
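The two modes amount to different channel-to-group routing tables. This sketch assumes exactly four groups and four channels, and assumes that in the second mode channel `c` serves groups `2c` and `2c + 1`; the abstract does not say which pair of groups each channel takes, so that pairing is hypothetical, as are the function names. The physical symmetry of the connection conductors is not modeled.

```python
from typing import Dict, List

NUM_GROUPS = 4
NUM_CHANNELS = 4


def channel_map(mode: int) -> Dict[int, List[int]]:
    """Return channel -> groups of memory cores it receives commands for."""
    if mode == 1:
        # Every channel drives exactly one group.
        return {ch: [ch] for ch in range(NUM_CHANNELS)}
    if mode == 2:
        # Only half of the channels are used; each drives two groups.
        return {ch: [2 * ch, 2 * ch + 1] for ch in range(NUM_CHANNELS // 2)}
    raise ValueError(f"unknown mode {mode}")


def route(mode: int, group: int) -> int:
    """Find which channel interface carries commands for a given group."""
    for ch, groups in channel_map(mode).items():
        if group in groups:
            return ch
    raise ValueError(f"group {group} not reachable in mode {mode}")


for mode in (1, 2):
    print(mode, [route(mode, g) for g in range(NUM_GROUPS)])
# 1 [0, 1, 2, 3]
# 2 [0, 0, 1, 1]
```

In both modes every group remains reachable; the second mode simply trades channel count for wider per-channel coverage.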
Abstract:
Embodiments in the present disclosure pertain to an apparatus and method for segmentation of a memory device. A bit line (100) comprises at least two bit line segments (102, 103) separated by a segment switch (101). When memory cells (105) coupled to the bit line segment closest to the sense amplifier (104) are accessed, the switch is non-conducting. Holding the switch non-conducting electrically isolates the other bit line segment, and with it the capacitance and resistance inherent to that segment, from the sense amplifier. Because that capacitance and resistance are isolated from the sense amplifier, self-refresh, refresh, and row activation can be performed with lower power consumption and lower access latency.
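The benefit of isolating the far segment can be estimated from the standard charge-sharing relation, in which the signal a cell develops on the bit line is dV = (V_cell - V_pre) * C_s / (C_s + C_BL), and from the supply energy C_BL * V_DD * dV_swing needed to charge the bit line. The sketch assumes a cell storing a 1, a precharge of V_DD/2, and made-up round capacitance values (not data from the disclosure); resistance and latency effects are not modeled.

```python
def charge_share_signal(c_cell: float, c_bitline: float,
                        v_cell: float, v_precharge: float) -> float:
    """Bit-line voltage change after the cell shares charge with the bit line."""
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)


def restore_energy(c_bitline: float, vdd: float, v_swing: float) -> float:
    """Supply energy to charge the bit line by v_swing: Q * Vdd = C * dV * Vdd."""
    return c_bitline * v_swing * vdd


VDD = 1.2
C_CELL = 20e-15                  # assumed cell capacitance (F)
C_NEAR, C_FAR = 40e-15, 40e-15   # assumed segment capacitances (F)

for label, c_bl in (("switch conducting", C_NEAR + C_FAR),
                    ("switch non-conducting", C_NEAR)):
    dv = charge_share_signal(C_CELL, c_bl, v_cell=VDD, v_precharge=VDD / 2)
    e = restore_energy(c_bl, VDD, v_swing=VDD / 2)
    print(f"{label}: signal {dv * 1e3:.0f} mV, restore energy {e * 1e15:.1f} fJ")
```

With these assumed values, isolating the far segment raises the sensed signal from about 120 mV to about 200 mV and halves the bit-line restore energy, consistent with the lower power described above.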