Abstract:
A method of controlling a multi-core processor includes allocating at least one core of the multi-core processor to at least one process for execution; generating a translation table with respect to the at least one process to translate a logical ID of the at least one core allocated to the at least one process to a physical ID; and controlling the at least one process based on the translation table generated with respect to the at least one process.
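The per-process logical-to-physical translation described above can be sketched as a small lookup table; the function names and the dict representation are illustrative assumptions, not the patent's implementation:

```python
# Illustrative sketch: a per-process table translating logical core IDs
# (as seen by the process) to the physical IDs of the allocated cores.

def build_translation_table(allocated_physical_ids):
    """Map logical IDs 0..n-1 to the physical IDs allocated to a process."""
    return {logical: physical
            for logical, physical in enumerate(allocated_physical_ids)}

def to_physical(table, logical_id):
    """Resolve a process-local logical core ID to its physical core ID."""
    return table[logical_id]

# A process granted physical cores 4 and 7 sees them as logical cores 0 and 1.
table = build_translation_table([4, 7])
```

Controlling the process then only ever references logical IDs, with the table resolving them to physical cores at the last step.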
Abstract:
A dynamic library profiling method and a dynamic library profiling system including writing a first break point instruction at a start address of a dynamic library function, recording a first event count value that is a process performance management unit (PMU) count when a target process executes the first break point instruction, writing a second break point instruction to a return address of the dynamic library function, and calculating a PMU count generated in a processor core while the dynamic library function is executed, by comparing the recorded first event count value with a second event count value that is a process PMU count when the target process executes the second break point instruction, wherein the process PMU count is a cumulative value of PMU counts generated in the processor core while the target process is executed.
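The core of the profiling method above is a count delta: the cost of one dynamic library call is the cumulative PMU count at the return breakpoint minus the count at the entry breakpoint. A minimal sketch, with illustrative names:

```python
# Hedged sketch of the event-count comparison described above. The
# breakpoint machinery is omitted; only the arithmetic is shown.

def library_function_count(first_event_count, second_event_count):
    """PMU events attributable to one dynamic library function call.

    first_event_count:  cumulative process PMU count recorded when the
                        entry (first) break point instruction executes
    second_event_count: cumulative process PMU count recorded when the
                        return (second) break point instruction executes
    """
    return second_event_count - first_event_count
```

Because the process PMU count is cumulative, the subtraction isolates only the events generated while the library function ran.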
Abstract:
A method and apparatus with scheduling a neural network (NN), which relate to extracting and scheduling priorities of operation sets, are provided. A scheduler may be configured to receive a loop structure corresponding to a NN model, generate a plurality of operation sets based on the loop structure, generate a priority table for the operation sets based on memory benefits of the operation sets, and schedule the operation sets based on the priority table.
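A minimal sketch of the priority-table step, assuming (purely for illustration) that each operation set is scored by a single scalar memory benefit and higher benefit means higher priority; the abstract does not define the metric:

```python
# Illustrative sketch: build a priority table for operation sets from
# their memory benefits, then schedule in priority order.

def build_priority_table(op_sets):
    """op_sets: dict mapping operation-set name -> memory benefit.
    Returns names ordered from highest to lowest memory benefit."""
    return sorted(op_sets, key=op_sets.get, reverse=True)

def schedule(op_sets):
    """Return operation sets in the order given by the priority table."""
    return build_priority_table(op_sets)
```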
Abstract:
Provided are computing devices, each including a plurality of cores, and methods of allocating power to the plurality of cores. The computing device includes: a control core group including a plurality of control cores, the control core group configured to allocate a power budget to processing cores according to an energy management policy and state information of the processing cores, and transmit the allocated power budget to at least one of a lower control core and the processing cores; and a processing core group including one or more of the processing cores, the processing core group configured to perform computations based on the power budget allocated by the control core group, and transmit state information of the processing cores to the control core group, the state information of the processing cores having been modified based on the computations performed.
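One possible energy management policy for the budget-allocation step above is proportional division by reported utilization; this policy and all names are assumptions for illustration, since the abstract leaves the policy open:

```python
# Minimal sketch: a control core splitting its power budget among its
# processing cores in proportion to each core's reported utilization.

def allocate_power(total_budget, utilizations):
    """Return a per-core power budget proportional to utilization.
    Idle groups (all-zero utilization) share the budget equally."""
    total = sum(utilizations)
    if total == 0:
        return [total_budget / len(utilizations)] * len(utilizations)
    return [total_budget * u / total for u in utilizations]
```

The processing cores would then compute under these budgets and report updated state, closing the feedback loop the abstract describes.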
Abstract:
A method of processing data is performed by a computing device including processing hardware and storage hardware, the method including: converting, by the processing hardware, a neural network, stored in the storage hardware, from a first neural network format into a second neural network format; obtaining, by the processing hardware, information about hardware configured to perform a neural network operation for the neural network and obtaining partition information; dividing the neural network in the second neural network format into partitions, wherein the dividing is based on the information about the hardware and the partition information, wherein each partition includes a respective layer with an input thereto and an output thereof; optimizing each of the partitions based on a relationship between the input and the output of the corresponding layer; and converting the optimized partitions into the first neural network format.
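The dividing step above can be sketched as splitting a layer sequence into consecutive partitions; the fixed partition size stands in for the abstract's hardware-and-partition-information inputs, and all names are illustrative:

```python
# Illustrative sketch: divide a neural network's layer sequence into
# consecutive partitions, each later optimized using only the
# input/output relationship of its own layers.

def partition_layers(layers, partition_size):
    """Split a list of layers into consecutive partitions of at most
    partition_size layers each."""
    return [layers[i:i + partition_size]
            for i in range(0, len(layers), partition_size)]
```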
Abstract:
Apparatuses and methods for drawing a quantization configuration are disclosed. A method may include generating genes by cataloging possible combinations of a quantization precision and a calibration method for each of the layers of a pre-trained neural network, determining layer sensitivity for each of the layers based on the combinations corresponding to the genes, determining priorities of the genes and selecting some of the genes based on their respective priorities, generating progeny genes by performing crossover on the selected genes, calculating layer sensitivity for each of the layers corresponding to a combination resulting from the crossover, and updating one or more of the genes using the progeny genes based on a comparison of the layer sensitivity of the genes and the layer sensitivity of the progeny genes.
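The crossover and population-update steps above follow the shape of a standard genetic algorithm. A hedged sketch, assuming genes are per-layer choice lists, single-point crossover, and that lower sensitivity is better; none of these specifics come from the abstract:

```python
# Illustrative sketch of the genetic search: single-point crossover on
# per-layer (precision, calibration) genes, with a progeny gene replacing
# the worst parent when its sensitivity improves on it.

import random

def crossover(parent_a, parent_b, rng=random.Random(0)):
    """Single-point crossover: prefix from parent_a, suffix from parent_b."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def update_population(genes, progeny, sensitivity):
    """Replace the worst gene when a progeny gene has lower sensitivity."""
    worst = max(genes, key=sensitivity)
    best_child = min(progeny, key=sensitivity)
    if sensitivity(best_child) < sensitivity(worst):
        genes = [g for g in genes if g != worst] + [best_child]
    return genes
```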
Abstract:
An apparatus includes a processor configured to generate intermediate representation codes, each corresponding to one of a plurality of loop structures that correspond to a neural network computation and are obtained based on an input specification file of hardware; schedule instructions included in each of the intermediate representation codes corresponding to the plurality of loop structures; select, based on latency values predicted according to scheduling results of the intermediate representation codes, any one code among the intermediate representation codes; and allocate, based on a scheduling result of the selected intermediate representation code, instructions included in the selected intermediate representation code to resources of the hardware included in the apparatus.
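The selection step above reduces to an arg-min over predicted latencies. A minimal sketch, where the latency predictor is assumed to exist and all names are illustrative:

```python
# Illustrative sketch: among scheduled intermediate representation (IR)
# code candidates, select the one with the lowest predicted latency.

def select_best_ir(ir_codes, predict_latency):
    """Return the IR code whose schedule has the smallest predicted latency.

    ir_codes:        iterable of candidate IR codes (one per loop structure)
    predict_latency: callable mapping an IR code to a predicted latency
    """
    return min(ir_codes, key=predict_latency)
```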
Abstract:
A method and an apparatus for generating a code for a neural network operation are disclosed. The method includes receiving information on hardware configured to perform a neural network operation of the neural network, generating, using a processor, a target mapping model that maps the neural network operation onto processing elements available to perform the neural network operation based on the information and a structure of the neural network, and generating a code to configure the hardware to perform the neural network operation based on the target mapping model.
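A target mapping model, at its simplest, is an assignment of operations to processing elements. The sketch below uses a round-robin assignment purely for illustration; a real mapping would be driven by the hardware information and network structure the abstract mentions:

```python
# Hedged sketch: a toy "target mapping model" assigning each neural
# network operation to an available processing element, round-robin.

def map_operations(operations, processing_elements):
    """Return a dict mapping each operation to a processing element."""
    return {op: processing_elements[i % len(processing_elements)]
            for i, op in enumerate(operations)}
```

Code generation would then emit, for each processing element, the configuration needed to run its assigned operations.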
Abstract:
Methods and apparatuses are provided for compressing configuration data. The configuration data, which includes control data corresponding to at least one processing unit used in each of a plurality of cycles, is stored. A plurality of processing units of a reconfigurable processor is divided into a plurality of groups, and the configuration data is partitioned into a plurality of pieces of sub-configuration data, each piece corresponding to a respective one of the plurality of groups. If a plurality of adjacent cycles include identical control data, the configuration data is compressed by deleting the control data of all but one of the adjacent cycles, for each piece of sub-configuration data.
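Deleting the control data of all but one of a run of identical adjacent cycles amounts to run-length compression within each piece of sub-configuration data. A minimal sketch, with an illustrative (control_data, run_length) output format:

```python
# Illustrative sketch: within one piece of sub-configuration data,
# collapse runs of cycles carrying identical control data into a single
# entry plus a repeat count.

def compress_sub_config(cycles):
    """Run-length compress per-cycle control data.
    cycles: list of control data entries, one per cycle.
    Returns a list of (control_data, run_length) pairs."""
    compressed = []
    for data in cycles:
        if compressed and compressed[-1][0] == data:
            compressed[-1] = (data, compressed[-1][1] + 1)
        else:
            compressed.append((data, 1))
    return compressed
```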
Abstract:
A method for verifying an operation of a reconfigurable processor is provided. The method includes generating a random test program using a test description and an architecture description, executing the generated random test program in a reconfigurable processor and in a simulator, and then comparing the output values of the two execution results.
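The comparison step above can be sketched as running the same program through both back-ends and checking the outputs for equality; both runners below are stand-ins for the real processor and simulator:

```python
# Illustrative sketch: co-simulation check. The same random test program
# runs on the reconfigurable processor and on the simulator; the
# operation is considered verified when the outputs match.

def verify(program, run_on_processor, run_on_simulator):
    """Return True when both executions produce identical output values."""
    return run_on_processor(program) == run_on_simulator(program)
```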