Abstract:
A device includes processors configured to execute instructions and memories storing the instructions. When executed by the processors, the instructions configure the processors to train a transformer model having a plurality of encoders and a plurality of decoders by: dividing batches of training data into a plurality of micro-batches; selecting layer pairs for the plurality of micro-batches; assembling a processing order of the layer pairs; determining resource information to be allocated to the layer pairs; and allocating resources to the layer pairs based on the determined resource information, dependent on the processing order of the layer pairs.
Abstract:
A device and method with batch normalization are provided. An accelerator includes: core modules, each core module including a respective plurality of cores configured to perform a first convolution operation using feature map data and a weight; local reduction operation modules adjacent to the respective core modules, each including a respective plurality of local reduction operators configured to perform a first local operation that obtains first local statistical values of the corresponding core module; a global reduction operation module configured to perform a first global operation that generates first global statistical values of the core modules based on the first local statistical values of the core modules; and a normalization operation module configured to perform a first normalization operation on the feature map data based on the first global statistical values.
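The local-then-global reduction described above can be illustrated in software. The following is a minimal sketch, not the accelerator's actual design: it assumes each core module contributes a count, a sum, and a sum of squares as its local statistical values, and that the global statistical values are a mean and a variance. The function names (`local_reduce`, `global_reduce`, `batch_norm_modules`) and the `eps` parameter are hypothetical.

```python
import math

def local_reduce(values):
    """Local statistics for one core module: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)

def global_reduce(partials):
    """Combine per-module partial statistics into a global mean and variance."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    var = total_sq / n - mean * mean  # E[x^2] - (E[x])^2
    return mean, var

def normalize(values, mean, var, eps=1e-5):
    """Normalize one module's feature values with the global statistics."""
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]

def batch_norm_modules(modules, eps=1e-5):
    """modules: list of per-core-module feature value lists (assumed layout)."""
    partials = [local_reduce(m) for m in modules]
    mean, var = global_reduce(partials)
    return [normalize(m, mean, var, eps) for m in modules]
```

Each module only needs to send three numbers to the global stage, which is the usual motivation for splitting the reduction this way.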
Abstract:
A memory device includes a first memory cell, a second memory cell, a precharge circuit, a sense amplifier, a switch circuit, and a controller. The first memory cell is connected to a first bit line, the second memory cell is connected to a second bit line, and the precharge circuit is connected between the first bit line and the second bit line. The sense amplifier includes a first input terminal and a second input terminal. The switch circuit is connected to the first bit line and the first input terminal and to the second bit line and the second input terminal, and is configured to control a connection between the first bit line and the first input terminal and a connection between the second bit line and the second input terminal in response to a switch signal. The controller is configured to generate the switch signal in response to a command.
Abstract:
A row hammer prevention circuit for providing a reference address to perform an additional refresh operation includes a history storage circuit configured to store one or more first addresses, each of the first addresses having been provided as the reference address. The row hammer prevention circuit further includes an address storage circuit configured to store a row address corresponding to an active command, a reference address storage circuit configured to store one or more second addresses, and a control circuit configured to provide the reference address in response to a refresh command.
Abstract:
Proposed are a counter-based selective row hammer refresh apparatus and method for row hammer prevention. More particularly, the apparatus and method reduce the energy consumption of dynamic random access memory (DRAM) by improving counter-based algorithms that address the row hammer problem when applying the refresh management (RFM) command, a new command introduced in recent DRAM standards such as DDR5 and LPDDR5.
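The abstract does not spell out the improved algorithm, so the following is only a rough sketch of one way a counter-based tracker could respond selectively to RFM commands. It assumes a per-row activation counter and a refresh threshold, and it skips the extra refresh when no row has reached the threshold. The class name `SelectiveRfmTracker`, the threshold, and the adjacent-row victim model are all hypothetical.

```python
from collections import Counter

class SelectiveRfmTracker:
    """Toy model: count row activations, refresh neighbors only when needed."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed activation count that triggers a refresh
        self.counts = Counter()      # per-row activation counters
        self.refreshes_issued = 0    # how many RFM commands led to a refresh

    def activate(self, row):
        """Record one activation (ACT command) of the given row."""
        self.counts[row] += 1

    def on_rfm(self):
        """Handle an RFM command; return victim rows to refresh, or [] if skipped."""
        if not self.counts:
            return []
        row, count = self.counts.most_common(1)[0]
        if count < self.threshold:
            return []                # selective: no row is hot enough, save the energy
        del self.counts[row]         # reset the counter of the handled aggressor row
        self.refreshes_issued += 1
        return [r for r in (row - 1, row + 1) if r >= 0]
```

In this toy model the energy saving comes from the early return: an RFM command that finds no row at or above the threshold performs no additional refresh.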
Abstract:
A memory system and a method for error correction of memory are disclosed. The method is performed by a memory system including a plurality of memory chips and may include: reading, by a first ECC engine unit included in each of the plurality of memory chips, a chunk including a plurality of data bursts, first parity bits, and position bits from each of the plurality of memory chips; extracting, by the first ECC engine unit, a single data burst having an error from the plurality of data bursts using the position bits; and performing, by the first ECC engine unit, first error correction using the first parity bits corresponding to the extracted error data burst.
Abstract:
An accelerator, an operation method of the accelerator, and an accelerator apparatus including the accelerator are disclosed. The operation method includes receiving one or more workloads assigned by a main processor, performing at least one operation involved with the workloads in an internal memory of the accelerator or in a direct memory access (DMA) configured to control data input to or output from the internal memory, and providing a result of performing the at least one operation.
Abstract:
A device and method with transformer model implementation are provided. The electronic device includes a processor configured to perform an inference by implementing a transformer model including a plurality of encoders and a plurality of decoders, and a memory configured to store instructions to be executed by the processor. Each of the encoders and the decoders includes an attention block that determines an attention value. The processor is configured to perform a first sub-softmax tile-wise operation in the attention block, perform a reduction operation to determine an adjustment factor based on a resulting value of the first sub-softmax operation, and perform a second sub-softmax tile-wise operation based on a resulting value of the reduction operation.
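One common way to compute softmax tile by tile matches the three steps named above, and the sketch below illustrates it. It is an interpretation under stated assumptions, not necessarily the claimed method: the first sub-softmax is assumed to compute per-tile exponentials against a per-tile maximum, the adjustment factor is assumed to rescale each tile to the global maximum, and the second sub-softmax applies that factor and the global denominator. The function name `tiled_softmax` and the `tile_size` parameter are hypothetical.

```python
import math

def tiled_softmax(scores, tile_size):
    """Softmax over a list of scores, computed in tiles of tile_size elements."""
    tiles = [scores[i:i + tile_size] for i in range(0, len(scores), tile_size)]

    # First sub-softmax (per tile): exponentials relative to the tile's own max.
    local = []
    for tile in tiles:
        tile_max = max(tile)
        exps = [math.exp(x - tile_max) for x in tile]
        local.append((tile_max, exps, sum(exps)))

    # Reduction: global max, per-tile adjustment factors, global denominator.
    global_max = max(tile_max for tile_max, _, _ in local)
    factors = [math.exp(tile_max - global_max) for tile_max, _, _ in local]
    denom = sum(f * tile_sum for f, (_, _, tile_sum) in zip(factors, local))

    # Second sub-softmax (per tile): rescale local exponentials to global terms.
    out = []
    for f, (_, exps, _) in zip(factors, local):
        out.extend(e * f / denom for e in exps)
    return out
```

Because each tile subtracts its own maximum before exponentiating, the first pass stays numerically stable without needing the global maximum in advance.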
Abstract:
An accelerator includes: a memory configured to store input data; a plurality of shift buffers each configured to shift input data received sequentially from the memory in each cycle, and in response to input data being stored in each of internal elements of the shift buffer, output the stored input data to a processing element (PE) array; a plurality of backup buffers each configured to store input data received sequentially from the memory and transfer the stored input data to one of the shift buffers; and the PE array configured to perform an operation on input data received from one or more of the shift buffers and on a corresponding kernel.
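The shift-buffer behavior described above can be modeled in a few lines. This is a simplified software sketch, assuming one input value arrives per cycle and that the buffer forwards its contents to the PE array once every internal element holds data. Backup buffers are not modeled, and the names `ShiftBuffer` and `convolve_1d` are hypothetical.

```python
from collections import deque

class ShiftBuffer:
    """Toy shift buffer: shifts in one value per cycle, emits a window when full."""

    def __init__(self, depth):
        self.slots = deque(maxlen=depth)  # the oldest value falls out on each shift

    def shift_in(self, value):
        """Shift in one value; return the full window, or None while still filling."""
        self.slots.append(value)
        if len(self.slots) == self.slots.maxlen:
            return list(self.slots)
        return None

def convolve_1d(stream, kernel):
    """Stand-in for the PE array: multiply-accumulate each full window with the kernel."""
    buf = ShiftBuffer(len(kernel))
    outputs = []
    for value in stream:                  # one loop iteration models one cycle
        window = buf.shift_in(value)
        if window is not None:
            outputs.append(sum(w * k for w, k in zip(window, kernel)))
    return outputs
```

For example, `convolve_1d([1, 2, 3, 4], [1, 0, -1])` produces one output per full window, here `[-2, -2]`.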