Abstract:
An apparatus and a job scheduling method are provided. For example, the apparatus is a multi-core processing apparatus. The apparatus and method minimize performance degradation of a core caused by sharing resources by dynamically managing a maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core including an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles and a job scheduler configured to assign at least one job to each of the at least one core, based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to a number of active cycles for a core is too great, the job scheduler assigns fewer jobs to that core to improve performance.
Abstract:
An apparatus and a job scheduling method are provided. For example, the apparatus is a multi-core processing apparatus. The apparatus and method minimize performance degradation of a core caused by sharing resources by dynamically managing a maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core including an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles and a job scheduler configured to assign at least one job to each of the at least one core, based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to a number of active cycles for a core is too great, the job scheduler assigns fewer jobs to that core to improve performance.
Abstract:
Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.
Abstract:
An apparatus and method for supporting a multi-mode. The apparatus for supporting a multi-mode may include an instruction distributor configured to select, according to a current execution mode, at least one instruction from among a plurality of received instructions that each include an operand and an opcode, and transfer the opcode included in each of at least one selected instruction to the plurality of functional units; an operand switch controller configured to generate, based on the operand included in each of the selected at least one instruction, switch configuration information for routing in order to execute the selected at least one instruction; and an operand switch configured to route, based on the switch configuration information, a functional unit output or a register file output to either a functional unit input or a register file input.
Abstract:
Provided is a processor with a data transfer structure that is excellent in performance and efficiency. According to an aspect, the processor may include a plurality of processing elements, a plurality of routers respectively connected to the processing elements, and a plurality of connection links formed between the routers such that data is transferred between the processors via a network.
Abstract:
A processor for batch thread processing includes a central register file, and one or more function unit batches each including two or more function units and one or more ports to access the central register file. The function units of the function unit batches execute an instruction batch including one or more instructions to sequentially execute the one or more instructions in the instruction batch.
Abstract:
A region growing apparatus using multi-core includes a plurality of cores, each core including an operation controller configured to perform an operation for region growing of a 2D pixel region or 3D pixel region and an inner memory configured to store a queue associated with a seed pixel as a target of the operation; and a shared memory connected to the plurality of cores over a network and shared by the plurality of cores.
Abstract:
The present examples relate to prefetching, and to a cache control device for prefetching and a prefetching method using the cache control device, wherein the cache control device analyzes a memory access pattern of program code, inserts, into the program code, a prefetching command generated by encoding the analyzed access pattern, and executes the prefetching command inserted into the program code in order to prefetch data into a cache, thereby maximizing prefetching efficiency.
Abstract:
Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.
Abstract:
A method of compiling a program to be executed on a multicore processor is provided. The method may include generating an initial solution by mapping a task to a source processing element (PE) and a destination PE, and selecting a communication scheme for transmission of the task from the source PE to the destination PE, approximately optimizing the mapping and communication scheme included in the initial solution, and scheduling the task, wherein the communication scheme is designated in a compiling process.