Abstract:
An apparatus and a job scheduling method are provided. For example, the apparatus is a multi-core processing apparatus. The apparatus and method minimize performance degradation of a core caused by sharing resources by dynamically managing a maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core including an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles and a job scheduler configured to assign at least one job to each of the at least one core, based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to a number of active cycles for a core is too great, the job scheduler assigns fewer jobs to that core to improve performance.
Abstract:
An apparatus and a job scheduling method are provided. For example, the apparatus is a multi-core processing apparatus. The apparatus and method minimize performance degradation of a core caused by sharing resources by dynamically managing a maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core including an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles and a job scheduler configured to assign at least one job to each of the at least one core, based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to a number of active cycles for a core is too great, the job scheduler assigns fewer jobs to that core to improve performance.
Abstract:
Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.
Abstract:
An apparatus and method for supporting a multi-mode. The apparatus for supporting a multi-mode may include an instruction distributor configured to select, according to a current execution mode, at least one instruction from among a plurality of received instructions that each include an operand and an opcode, and transfer the opcode included in each of at least one selected instruction to the plurality of functional units; an operand switch controller configured to generate, based on the operand included in each of the selected at least one instruction, switch configuration information for routing in order to execute the selected at least one instruction; and an operand switch configured to route, based on the switch configuration information, a functional unit output or a register file output to either a functional unit input or a register file input.
Abstract:
A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
Abstract:
A multi-core apparatus includes cores each including an active cycle counting unit configured to store an active cycle count, and a stall cycle counting unit configured to store a stall cycle count. The multi-core apparatus further includes a job scheduler configured to determine an optimal number of cores in an active state based on state information received from each of the cores, and adjust power to maintain the optimal number of cores.
Abstract:
An apparatus and a job scheduling method are provided. For example, the apparatus is a multi-core processing apparatus. The apparatus and method minimize performance degradation of a core caused by sharing resources by dynamically managing a maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core including an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles and a job scheduler configured to assign at least one job to each of the at least one core, based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to a number of active cycles for a core is too great, the job scheduler assigns fewer jobs to that core to improve performance.
Abstract:
A processor for batch thread processing includes a central register file, and one or more function unit batches each including two or more function units and one or more ports to access the central register file. The function units of the function unit batches execute an instruction batch including one or more instructions to sequentially execute the one or more instructions in the instruction batch.
Abstract:
A region growing apparatus using multi-core includes a plurality of cores, each core including an operation controller configured to perform an operation for region growing of a 2D pixel region or 3D pixel region and an inner memory configured to store a queue associated with a seed pixel as a target of the operation; and a shared memory connected to the plurality of cores over a network and shared by the plurality of cores.
Abstract:
A functional unit for supporting multithreading, a processor including the same, and an operating method of the processor are provided. The functional unit for supporting multithreading includes a plurality of input ports configured to receive opcodes and operands for a plurality of threads, wherein each of the plurality of input ports is configured to receive an opcode and an operand for a different thread, a plurality of operators configured to perform operations using the received operands, an operator selector configured to select, based on each opcode, an operator from among the plurality of operators to perform a specific operation using an operand from among the received operands, and a plurality of output ports configured to output operation results of operations for each thread.