Data processing apparatus and method for performing vector processing
Abstract:
A data processing apparatus and method are provided for processing execution threads, each of which specifies at least one instruction. The apparatus has a vector processing unit providing a plurality M of lanes of parallel processing; within each lane, the vector processing unit is configured to perform a processing operation on a data element input to that lane for each of one or more input operands. A vector instruction specified by a group of the execution threads is received. The vector instruction identifies an associated processing operation and provides an indication of the data elements of each input operand that are to be subjected to that operation. Based on that indication, vector merge circuitry determines the number of lanes of parallel processing required to perform the associated processing operation. If the required number of lanes is less than or equal to half the number of lanes available within the vector processing unit, the vector merge circuitry allocates a plurality of the execution threads of the group to the vector processing unit, such that each execution thread in that plurality is allocated different lanes amongst the available lanes. The vector processing unit then performs the associated processing operation in parallel for each of the plurality of execution threads, which can significantly increase performance.
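The lane allocation rule described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuitry: the function name `allocate_lanes`, the thread identifiers, and the choice to pack floor(M / required) threads into contiguous lane ranges per pass are all hypothetical. The abstract itself only states that a plurality of threads is allocated to different lanes when the required number of lanes is at most half of M.

```python
def allocate_lanes(m_lanes, required_lanes, thread_ids):
    """Model of the merge decision: returns a list of passes, where each
    pass maps a thread id to the list of lanes it occupies.

    Assumptions (not from the source): threads are packed into contiguous
    lane ranges, and as many threads as fit are packed into each pass.
    """
    if not 1 <= required_lanes <= m_lanes:
        raise ValueError("required_lanes must be between 1 and m_lanes")

    # Merge only when the required lanes are at most half of the available
    # lanes, i.e. when at least two threads fit side by side.
    if 2 * required_lanes <= m_lanes:
        threads_per_pass = m_lanes // required_lanes
    else:
        threads_per_pass = 1  # no merging: one thread per pass

    passes = []
    for start in range(0, len(thread_ids), threads_per_pass):
        batch = thread_ids[start:start + threads_per_pass]
        passes.append({
            tid: list(range(i * required_lanes, (i + 1) * required_lanes))
            for i, tid in enumerate(batch)
        })
    return passes


if __name__ == "__main__":
    # M = 8 lanes, each thread needs 4 lanes: two threads share each pass.
    for n, assignment in enumerate(allocate_lanes(8, 4, ["t0", "t1", "t2", "t3"])):
        print(n, assignment)
```

In this model, four threads that each need 4 of 8 lanes complete in two passes instead of four, whereas threads that each need 5 lanes are not merged and still take one pass each.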