Abstract:
A memory management method includes determining a stride value for stride access by referring to a size of two-dimensional (2D) data, and allocating neighboring data in a vertical direction of the 2D data to a plurality of banks that are different from one another according to the determined stride value. Thus, the data in the vertical direction may be efficiently accessed by using a memory having a large data width.
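A minimal sketch in C of one way such a mapping can work, assuming a word-interleaved memory with eight banks; the bank count, the pad-by-one rule, and the helper names are illustrative, not taken from the abstract. If the row stride is a multiple of the bank count, every element of a column lands in the same bank; padding the stride by one word spreads the column across all banks.

#include <stdio.h>

#define NUM_BANKS 8

/* Illustrative rule: pad the row stride (in words) so vertically adjacent
 * elements of the 2D data map to different banks. */
static int choose_stride(int width)
{
    int stride = width;
    if (stride % NUM_BANKS == 0)
        stride += 1;    /* assumption: one padding word suffices */
    return stride;
}

/* Bank holding element (row, col) under word-interleaved addressing. */
static int bank_of(int row, int col, int stride)
{
    return (row * stride + col) % NUM_BANKS;
}

int main(void)
{
    int stride = choose_stride(16);         /* 16-word-wide 2D data */
    for (int row = 0; row < NUM_BANKS; row++)
        printf("row %d -> bank %d\n", row, bank_of(row, 0, stride));
    return 0;
}
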
Abstract:
An operation processing apparatus is provided. The operation processing apparatus includes a vector operator and cores. The vector operator processes a vector operation with respect to an instruction that uses a vector operation, and each core includes a scalar operator that processes a scalar operation with respect to an instruction that does not use a vector operation. The vector operator is shared by the cores.
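As a rough sketch of how several cores might share one vector operator, here is a per-cycle round-robin arbiter in C; the abstract does not say how the shared unit is arbitrated, so the round-robin policy, the core count, and the names are assumptions.

#include <stdio.h>

#define NUM_CORES 4

/* Pick which requesting core gets the shared vector unit this cycle.
 * Scalar instructions never contend: each core has its own scalar unit. */
static int grant_vector_unit(const int wants_vector[NUM_CORES], int last_grant)
{
    /* Round-robin arbitration (an assumption, not the abstract's design). */
    for (int i = 1; i <= NUM_CORES; i++) {
        int core = (last_grant + i) % NUM_CORES;
        if (wants_vector[core])
            return core;
    }
    return -1;  /* no core needs the vector unit this cycle */
}

int main(void)
{
    int wants[NUM_CORES] = { 1, 0, 1, 1 };
    int grant = NUM_CORES - 1;
    for (int cycle = 0; cycle < 3; cycle++) {
        grant = grant_vector_unit(wants, grant);
        if (grant >= 0) {
            printf("cycle %d: core %d uses the shared vector unit\n",
                   cycle, grant);
            wants[grant] = 0;   /* request served */
        }
    }
    return 0;
}
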
Abstract:
A direct memory access (DMA) controller is provided. The DMA controller includes a processor interface configured to directly receive information representing a first operation sent by a processor to a buffer, and to transmit data corresponding to the first operation stored in the buffer to the processor or record data corresponding to the first operation in the buffer, and a buffer group connected to the processor interface and including a plurality of buffers.
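A compact C model of the described structure, with a buffer group behind a single processor-facing read/write interface; the buffer count, word sizes, and function names are illustrative only.

#include <stdint.h>
#include <stdio.h>

#define NUM_BUFFERS 4
#define BUF_WORDS   16

/* The buffer group: a plurality of buffers behind one processor interface. */
static uint32_t buffers[NUM_BUFFERS][BUF_WORDS];

/* Processor posts an operation: record its data in the selected buffer. */
static void dma_write(int buf, int offset, uint32_t value)
{
    buffers[buf][offset] = value;
}

/* Processor fetches the data corresponding to an operation from a buffer. */
static uint32_t dma_read(int buf, int offset)
{
    return buffers[buf][offset];
}

int main(void)
{
    dma_write(0, 3, 0xCAFEu);
    printf("read back: 0x%X\n", (unsigned)dma_read(0, 3));
    return 0;
}
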
Abstract:
Provided are an instruction compression apparatus and method for a very long instruction word (VLIW) processor, and an instruction fetching apparatus and method. The instruction compression apparatus includes: an indicator generator configured to generate an indicator code that indicates an issue width of an instruction bundle to be executed in the VLIW processor, and a number of No-Operation (NOP) instruction bundles following the instruction bundle; an instruction compressor configured to compress the instruction bundle by removing at least one NOP instruction from the instruction bundle and removing the NOP instruction bundles following it; and an instruction converter configured to include the generated indicator code in the compressed instruction bundle.
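A C sketch of the compression step, assuming a four-slot VLIW and an opcode of 0 for NOP; the slot count, the NOP encoding, and the struct layout of the indicator are assumptions made only to show the idea of recording the issue width plus the count of removed all-NOP bundles.

#include <stdio.h>

#define SLOTS 4
#define NOP   0u

typedef struct {
    unsigned issue_width;   /* non-NOP instructions kept from the bundle */
    unsigned nop_bundles;   /* all-NOP bundles that followed and were removed */
    unsigned insts[SLOTS];  /* compressed bundle: non-NOP instructions only */
} packed_bundle;

/* Compress one bundle plus the run of all-NOP bundles after it.
 * Returns how many input bundles were consumed. */
static int compress(unsigned bundles[][SLOTS], int n, packed_bundle *out)
{
    out->issue_width = 0;
    for (int s = 0; s < SLOTS; s++)
        if (bundles[0][s] != NOP)
            out->insts[out->issue_width++] = bundles[0][s];

    out->nop_bundles = 0;
    for (int b = 1; b < n; b++) {
        int all_nop = 1;
        for (int s = 0; s < SLOTS; s++)
            if (bundles[b][s] != NOP) { all_nop = 0; break; }
        if (!all_nop)
            break;
        out->nop_bundles++;
    }
    return 1 + (int)out->nop_bundles;
}

int main(void)
{
    unsigned prog[3][SLOTS] = {
        { 0x11, NOP, 0x22, NOP },   /* bundle with two real instructions */
        { NOP, NOP, NOP, NOP },     /* all-NOP bundle (a stall cycle) */
        { 0x33, 0x44, 0x55, 0x66 },
    };
    packed_bundle pb;
    int used = compress(prog, 3, &pb);
    printf("indicator: width=%u, nop bundles=%u (consumed %d)\n",
           pb.issue_width, pb.nop_bundles, used);
    return 0;
}
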
Abstract:
Provided are a method and apparatus for processing a convolution operation in a neural network. The apparatus may include a memory, and a processor configured to read one of the divided blocks of input data stored in the memory; generate an output block by performing the convolution operation on that block with a kernel; generate a feature map by using the output block; and write the feature map to the memory.
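A small C sketch of the per-block flow, assuming 4x4 blocks, a 3x3 kernel, and "valid" border handling; all of those specifics are assumptions for illustration, since the abstract fixes none of them.

#include <stdio.h>

#define BLK 4   /* block height and width */
#define K   3   /* kernel size */

/* Convolve one divided block with the kernel, producing one output block. */
static void conv_block(float in[BLK][BLK], float k[K][K],
                       float out[BLK - K + 1][BLK - K + 1])
{
    for (int y = 0; y <= BLK - K; y++)
        for (int x = 0; x <= BLK - K; x++) {
            float acc = 0.0f;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    acc += in[y + ky][x + kx] * k[ky][kx];
            out[y][x] = acc;
        }
}

int main(void)
{
    float block[BLK][BLK] = { {1,1,1,1}, {1,1,1,1}, {1,1,1,1}, {1,1,1,1} };
    float kernel[K][K]    = { {0,0,0}, {0,1,0}, {0,0,0} };   /* identity */
    float out[BLK - K + 1][BLK - K + 1];
    conv_block(block, kernel, out);          /* output block of the feature map */
    printf("out[0][0] = %.1f\n", out[0][0]); /* 1.0 with the identity kernel */
    return 0;
}
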
Abstract:
A computing system is disclosed. The computing system according to one embodiment of the present disclosure comprises: a memory device for storing an application program; a processor for executing a loader for loading data of the application program into a memory space allocated for execution of the application program; a local memory having a width corresponding to the size of a register of the processor; and a constant memory having a width smaller than that of the local memory, wherein, according to the size of constant data included in the application program, the processor loads the constant data into one of the local memory and the constant memory.
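The placement decision can be sketched in a few lines of C; the widths and the size threshold are assumptions, chosen only to show size-based selection between the two memories.

#include <stddef.h>
#include <stdio.h>

/* Illustrative widths: local memory matches the register size,
 * constant memory is narrower. */
#define LOCAL_WIDTH_BYTES    8   /* e.g. 64-bit registers */
#define CONSTANT_WIDTH_BYTES 4

typedef enum { LOCAL_MEMORY, CONSTANT_MEMORY } target;

/* Loader policy sketch: constants that fit the narrow constant memory go
 * there; larger ones go to the register-width local memory. */
static target place_constant(size_t size)
{
    return (size <= CONSTANT_WIDTH_BYTES) ? CONSTANT_MEMORY : LOCAL_MEMORY;
}

int main(void)
{
    printf("4-byte constant -> %s\n",
           place_constant(4) == CONSTANT_MEMORY ? "constant memory"
                                                : "local memory");
    printf("8-byte constant -> %s\n",
           place_constant(8) == CONSTANT_MEMORY ? "constant memory"
                                                : "local memory");
    return 0;
}
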
Abstract:
An electronic apparatus for performing machine learning, a method of machine learning, and a non-transitory computer-readable recording medium are provided. The electronic apparatus includes an operation module configured to include a plurality of processing elements arranged in a predetermined pattern, which share data with adjacent processing elements to perform an operation; and a processor configured to control the operation module to perform a convolution operation by applying a filter to input data, wherein the processor controls the operation module to perform the convolution operation by inputting each of a plurality of elements constituting a two-dimensional filter to the plurality of processing elements in a predetermined order and sequentially applying the plurality of elements to the input data.
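A functional C sketch of the order of application, not a model of the PE array itself: each element of the 2D filter is taken in a fixed row-major order and applied to the whole input in turn, with partial products accumulated into the output. The sizes and the row-major order are assumptions.

#include <stdio.h>

#define N 4     /* input size */
#define K 3     /* filter size */

/* Apply the K*K filter elements one at a time, in a predetermined
 * (row-major) order, accumulating each element's contribution. */
static void conv_by_elements(float in[N][N], float f[K][K],
                             float out[N - K + 1][N - K + 1])
{
    for (int y = 0; y <= N - K; y++)
        for (int x = 0; x <= N - K; x++)
            out[y][x] = 0.0f;

    for (int ky = 0; ky < K; ky++)          /* predetermined order: */
        for (int kx = 0; kx < K; kx++)      /* row-major over the filter */
            for (int y = 0; y <= N - K; y++)
                for (int x = 0; x <= N - K; x++)
                    out[y][x] += f[ky][kx] * in[y + ky][x + kx];
}

int main(void)
{
    float in[N][N] = { {1,2,3,4}, {5,6,7,8}, {9,10,11,12}, {13,14,15,16} };
    float f[K][K]  = { {0,0,0}, {0,1,0}, {0,0,0} };
    float out[N - K + 1][N - K + 1];
    conv_by_elements(in, f, out);
    printf("out[0][0] = %.1f\n", out[0][0]);  /* 6.0: center tap picks in[1][1] */
    return 0;
}
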
Abstract:
Methods and apparatuses are provided for compressing configuration data. The configuration data, which includes control data corresponding to at least one processing unit used in each of a plurality of cycles, is stored. A plurality of processing units of a reconfigurable processor is divided into a plurality of groups. The configuration data is partitioned into a plurality of pieces of sub-configuration data, each piece corresponding to a respective one of the plurality of groups. If a plurality of adjacent cycles include identical control data, the configuration data is compressed by deleting the control data of all but one of those adjacent cycles, for each piece of sub-configuration data.
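In C, the per-group compression amounts to run-length coding over cycles. The count field that records how many cycles a kept word covers is an assumption (the abstract only says the duplicates are deleted), as are the sizes and names.

#include <stdio.h>

#define CYCLES 6

/* One kept control word and the number of adjacent cycles it covers. */
typedef struct { unsigned word; int count; } run;

/* Compress one group's control words: keep a word only when it differs
 * from the previous cycle's word; identical adjacent cycles are deleted. */
static int compress_group(const unsigned ctrl[CYCLES], run out[CYCLES])
{
    int n = 0;
    for (int c = 0; c < CYCLES; c++) {
        if (n > 0 && out[n - 1].word == ctrl[c])
            out[n - 1].count++;         /* identical adjacent cycle: drop it */
        else
            out[n++] = (run){ ctrl[c], 1 };
    }
    return n;
}

int main(void)
{
    unsigned ctrl[CYCLES] = { 0xA, 0xA, 0xA, 0xB, 0xB, 0xA };
    run packed[CYCLES];
    int n = compress_group(ctrl, packed);
    for (int i = 0; i < n; i++)         /* prints 0xA x3, 0xB x2, 0xA x1 */
        printf("word 0x%X x%d\n", packed[i].word, packed[i].count);
    return 0;
}
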
Abstract:
Provided is a data processing method including the operations of storing, in a register, a first immediate portion included in a first instruction, from among the first immediate portion and a second immediate portion that constitute an immediate value, which is an operand; determining the immediate value by concatenating the second immediate portion included in a second instruction with the stored first immediate portion; and performing an operation by using a value indicated by the second instruction and the determined immediate value.
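A C sketch of the two-step immediate, assuming a 16/16 split of a 32-bit value and an add as the operation named by the second instruction; the split point, the operation, and the names are illustrative.

#include <stdint.h>
#include <stdio.h>

#define IMM_LOW_BITS 16     /* split point is an assumption */

static uint32_t imm_register;   /* holds the first immediate portion */

/* First instruction: store the upper portion of the immediate. */
static void set_imm_high(uint32_t high)
{
    imm_register = high;
}

/* Second instruction: concatenate its portion with the stored one and
 * use the full immediate as an operand (here, an add). */
static uint32_t add_with_imm(uint32_t value, uint32_t low)
{
    uint32_t imm = (imm_register << IMM_LOW_BITS) | (low & 0xFFFFu);
    return value + imm;
}

int main(void)
{
    set_imm_high(0x1234u);                    /* first instruction */
    uint32_t r = add_with_imm(1u, 0x5678u);   /* second instruction */
    printf("result = 0x%X\n", (unsigned)r);   /* 0x12345679 */
    return 0;
}
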
Abstract:
Methods and apparatuses for parallel processing of data include reading items of data from a memory by using a memory access address, identifying items of data that have the same memory address from among the read items, masking all but one of the identified items, generating a correction value by using the identified items, performing an operation by using the items of data and the correction value, and storing, in the memory, the data obtained by operating on the data that has not been masked.
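A C sketch of the masking-and-correction idea for lanes that collide on one address, assuming four lanes and additive updates; the lane count, the additive combine, and the names are assumptions.

#include <stdio.h>

#define LANES 4

/* Lanes sharing a memory address are masked off except one; their
 * increments are folded into a correction value so the single unmasked
 * lane stores the combined result. */
static void parallel_accumulate(int mem[], const int addr[LANES],
                                const int delta[LANES])
{
    int masked[LANES] = { 0 };

    for (int i = 0; i < LANES; i++) {
        if (masked[i])
            continue;
        int correction = delta[i];
        for (int j = i + 1; j < LANES; j++)
            if (addr[j] == addr[i]) {       /* same address: mask the lane */
                masked[j] = 1;
                correction += delta[j];     /* fold into the correction */
            }
        mem[addr[i]] += correction;         /* one store per unique address */
    }
}

int main(void)
{
    int mem[8] = { 0 };
    int addr[LANES]  = { 2, 5, 2, 2 };      /* three lanes collide on 2 */
    int delta[LANES] = { 1, 1, 1, 1 };
    parallel_accumulate(mem, addr, delta);
    printf("mem[2] = %d, mem[5] = %d\n", mem[2], mem[5]);   /* 3 and 1 */
    return 0;
}
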