Abstract:
Provided are a multi-network and a method of transmitting packets. The multi-network includes a mesh network, a tree network, and a network interface connected to both. The network interface is configured to transmit a packet, generated at a starting point by one processing unit of a processing system having plural processing units, through the mesh network and the tree network to a destination point for another processing unit of the processing system, and to selectively inject the packet into one of the mesh network and the tree network to transmit it to the other processing unit.
Abstract:
A method of managing a cache includes storing first data of an upper level cache in a lower level cache, predicting a reuse distance level of second data having the same signature as the first data based on access information about the first data, and storing the second data in one of the lower level cache and a main memory based on the predicted reuse distance level of the second data.
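The placement decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the two-level "near"/"far" classification, the saturating-free counter, the threshold, and the use of "was the data reused while in the lower level cache" as the access information are all hypothetical and are not specified by the abstract.

```python
LOWER_LEVEL_CACHE = "lower_level_cache"
MAIN_MEMORY = "main_memory"


class ReuseDistancePredictor:
    """Hypothetical per-signature predictor (names and policy are assumed)."""

    def __init__(self, threshold=0):
        # A counter at or above the threshold predicts "near" reuse.
        self.threshold = threshold
        self.counters = {}

    def record_access(self, signature, reused_in_lower_cache):
        """Update access information gathered for first data with this signature."""
        delta = 1 if reused_in_lower_cache else -1
        self.counters[signature] = self.counters.get(signature, 0) + delta

    def predict_level(self, signature):
        """Return an assumed two-level reuse distance: 'near' or 'far'."""
        count = self.counters.get(signature, 0)
        return "near" if count >= self.threshold else "far"

    def placement(self, signature):
        """Choose where second data with the same signature is stored."""
        if self.predict_level(signature) == "near":
            return LOWER_LEVEL_CACHE
        return MAIN_MEMORY


predictor = ReuseDistancePredictor()
predictor.record_access("sig_A", reused_in_lower_cache=True)
predictor.record_access("sig_B", reused_in_lower_cache=False)
```

In this sketch, data whose signature has a history of reuse is kept in the lower level cache, while data predicted to have a far reuse distance bypasses it and goes to main memory.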
Abstract:
An apparatus, for example a multi-core processing apparatus, and a job scheduling method are provided. The apparatus and method minimize the performance degradation that a core suffers from sharing resources by dynamically managing the maximum number of jobs assigned to each core of the apparatus. The apparatus includes at least one core, which includes an active cycle counting unit configured to store a number of active cycles and a stall cycle counting unit configured to store a number of stall cycles, and a job scheduler configured to assign at least one job to each of the at least one core based on the number of active cycles and the number of stall cycles. When the ratio of the number of stall cycles to the number of active cycles for a core is too great, the job scheduler assigns fewer jobs to that core to improve performance.
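A minimal sketch of this kind of ratio-based scheduling is shown below. The threshold values, the step of one job, the option to raise the limit again when stalls are rare, and the round-robin assignment are assumptions made for illustration; the abstract only states that a core with too high a stall-to-active ratio receives fewer jobs.

```python
from dataclasses import dataclass


@dataclass
class Core:
    active_cycles: int = 0  # value held by the active cycle counting unit
    stall_cycles: int = 0   # value held by the stall cycle counting unit
    max_jobs: int = 4       # current job limit (initial value is assumed)


def adjust_max_jobs(core, high_ratio=0.5, low_ratio=0.1, floor=1, ceiling=8):
    """Lower the job limit when stalls dominate; raise it when stalls are rare.

    The thresholds and the +/-1 step are illustrative assumptions.
    """
    if core.active_cycles == 0:
        return core.max_jobs  # no measurements yet; leave the limit unchanged
    ratio = core.stall_cycles / core.active_cycles
    if ratio > high_ratio:
        core.max_jobs = max(floor, core.max_jobs - 1)
    elif ratio < low_ratio:
        core.max_jobs = min(ceiling, core.max_jobs + 1)
    return core.max_jobs


def assign_jobs(cores, jobs):
    """Hand out jobs round-robin without exceeding any core's limit."""
    assignment = {i: [] for i in range(len(cores))}
    pending = list(jobs)
    progress = True
    while pending and progress:
        progress = False
        for i, core in enumerate(cores):
            if pending and len(assignment[i]) < core.max_jobs:
                assignment[i].append(pending.pop(0))
                progress = True
    return assignment, pending


cores = [Core(active_cycles=1000, stall_cycles=800),
         Core(active_cycles=1000, stall_cycles=50)]
limits = [adjust_max_jobs(c) for c in cores]
assignment, leftover = assign_jobs(cores, range(10))
```

Here the first core stalls for most of its active time, so its limit drops, while the second core's limit rises; jobs that do not fit under any limit stay pending for a later scheduling round.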
Abstract:
A method of managing graphics data in a graphics processing device may include: receiving a first draw call having a first identifier, generating a first lookup table having the first identifier mapped in association with a first handle value by allocating the first handle value to the first identifier, generating a second lookup table having the first handle value mapped in association with a first graphics state setting value by allocating the first handle value to the first graphics state setting value, wherein the first graphics state setting value corresponds to the first identifier, and performing at least one graphics pipeline operation to process the first draw call by using the first graphics state setting value obtained from the second lookup table.
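The two-table indirection described above can be sketched as follows. The class and method names, the sequential integer handles, and the dictionary used to stand in for a graphics state setting value are hypothetical; the abstract does not specify how handles are allocated or what a state setting contains.

```python
class GraphicsStateManager:
    """Hypothetical holder of the two lookup tables (names are assumed)."""

    def __init__(self):
        self.id_to_handle = {}     # first lookup table: identifier -> handle value
        self.handle_to_state = {}  # second lookup table: handle value -> state setting
        self._next_handle = 0      # sequential handle allocation is an assumption

    def register_draw_call(self, identifier, state_setting):
        """Allocate a handle for a new identifier and record its state setting."""
        if identifier not in self.id_to_handle:
            handle = self._next_handle
            self._next_handle += 1
            self.id_to_handle[identifier] = handle
            self.handle_to_state[handle] = state_setting
        return self.id_to_handle[identifier]

    def state_for(self, identifier):
        """Resolve identifier -> handle -> graphics state setting value."""
        handle = self.id_to_handle[identifier]
        return self.handle_to_state[handle]


def process_draw_call(manager, identifier, state_setting):
    """Stand-in for a pipeline operation: returns the state it would use."""
    manager.register_draw_call(identifier, state_setting)
    return manager.state_for(identifier)


manager = GraphicsStateManager()
state = process_draw_call(manager, "draw_A", {"blend": True})
```

A repeated draw call with the same identifier reuses the handle it was given the first time, so the pipeline reads the state through the second table rather than rebuilding it.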
Abstract:
A texture cache architecture facilitates access to compressed texture data in non-power-of-two formats, such as the Adaptive Scalable Texture Compression (ASTC) codec. In one implementation, the texture cache architecture includes a controller, a first buffer, a second buffer, and a texture decompressor. The first buffer stores one or more blocks of compressed texel data fetched, in response to a first request, from a first texture cache, where the one or more blocks of compressed texel data include at least the requested texel data. The second buffer stores the one or more blocks after decompression and provides the decompressed requested texel data as output to a second texture cache. The one or more blocks of compressed texel data stored by the first buffer include second texel data in addition to the requested texel data.
Abstract:
Embodiments include a processor capable of supporting multi-mode operation and corresponding methods. The processor includes front end units, processing elements that outnumber the front end units, and a controller configured to determine whether thread divergence occurs due to conditional branching. If thread divergence occurs, the processor may set control information to control the processing elements using currently activated front end units. If it does not, the processor may set control information to control the processing elements using a currently activated front end unit.
Abstract:
A texture processor includes: a texture cache configured to store textures; a controller configured to determine a texture address corresponding to a requested texture among the stored textures and read a texture corresponding to the texture address from the texture cache; a format converter configured to convert a format of the read texture into another format, based on a degree of texture precision required by a graphics processing unit (GPU); and a texture filter configured to perform texture filtering using the read texture having its format converted into the other format.
Abstract:
A cache memory and a method of managing the same are provided. The method of managing a cache memory includes determining whether a number of bits of a data bandwidth stored in a bank is an integer multiple of a number of bits of unit data in data to be stored, storing first unit data, among the data to be stored, in a first region of a first address in the bank in response to the number of bits of the data bandwidth not being the integer multiple of the number of bits of the unit data, and storing part of second unit data, among the data to be stored, in a second region of the first address.
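The storage scheme above can be illustrated with a small layout calculation. This is a sketch under assumptions: the function name and the fragment representation are hypothetical, and placing the remainder of a split unit at the next address is assumed, since the abstract does not say where the rest of the second unit data goes.

```python
def layout_units(num_units, unit_bits, bank_bits):
    """Lay out fixed-width unit data across bank addresses of bank_bits each.

    Returns (needs_split, rows). Each row models one address and holds
    (unit_index, bit_offset_within_unit, bit_count) fragments.
    """
    if unit_bits <= 0 or bank_bits <= 0:
        raise ValueError("bit widths must be positive")

    # The check from the abstract: is the bank width an integer multiple
    # of the unit width? If not, some units must straddle two addresses.
    needs_split = bank_bits % unit_bits != 0

    rows = []
    row, free = [], bank_bits
    for unit in range(num_units):
        remaining, offset = unit_bits, 0
        while remaining > 0:
            if free == 0:
                # Current address is full; continue at the next one (assumed).
                rows.append(row)
                row, free = [], bank_bits
            take = min(remaining, free)
            row.append((unit, offset, take))
            offset += take
            remaining -= take
            free -= take
    if row:
        rows.append(row)
    return needs_split, rows


# 24-bit units in a 32-bit-wide bank: the first address holds all of unit 0
# in its first region and the first 8 bits of unit 1 in its second region.
needs_split, rows = layout_units(num_units=2, unit_bits=24, bank_bits=32)
```

With widths that divide evenly, such as 16-bit units in a 32-bit bank, the same function never splits a unit, which is the case the initial check distinguishes.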
Abstract:
Provided is a processor with a data transfer structure that is excellent in performance and efficiency. According to an aspect, the processor may include a plurality of processing elements, a plurality of routers respectively connected to the processing elements, and a plurality of connection links formed between the routers such that data is transferred between the processing elements via a network.