Abstract:
A system and method for performing computational processing using a systolic array. The systolic array includes an array of processing elements (PEs) arranged in rows and columns; logic to perform a horizontal shift operation across the entire systolic array; and logic to mark columns of PEs as enabled or disabled. The systolic array is divided horizontally into horizontal groups, and when the horizontal shift operation is performed, valid data that crosses from a first column of PEs of a first horizontal group to a second column of PEs of an adjacent second horizontal group is invalidated.
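The group-boundary behaviour can be illustrated with a minimal software sketch. It is not the claimed hardware: the function name `horizontal_shift`, the use of `None` to mark invalid data, the equal-width groups, and the one-column shift to the right are all assumptions made for illustration, and the column enable/disable logic is not modelled.

```python
from typing import List, Optional

Row = List[Optional[int]]  # None stands for "invalid data" (assumption)


def horizontal_shift(rows: List[Row], group_width: int) -> List[Row]:
    """Shift every row one column to the right across the whole array.

    Columns are split into equal-width horizontal groups (assumption).
    A value that would move from one group into the next is invalidated.
    """
    shifted = []
    for row in rows:
        new_row: Row = [None] * len(row)  # column 0 receives no data
        for col in range(1, len(row)):
            src = col - 1
            same_group = (src // group_width) == (col // group_width)
            # Keep the value only if it stays inside its own group.
            new_row[col] = row[src] if same_group else None
        shifted.append(new_row)
    return shifted


# Four columns split into two groups of two columns each.
result = horizontal_shift([[1, 2, 3, 4]], group_width=2)
```

Here the value `2` sits in the last column of the first group, so after the shift the first column of the second group holds invalid data rather than `2`.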
Abstract:
A logic circuit in a processor including a plurality of input registers, each for storing a vector containing data elements, a coefficient register for storing a vector containing N coefficients, an output register for storing a result vector, and an arithmetic unit configured to: obtain a pattern for selecting N data elements from the plurality of input registers; select a plurality of groups of N data elements from the plurality of input registers in parallel, wherein each group is selected in accordance with the pattern and is shifted with respect to the previously selected group; perform an arithmetic operation between each of the selected groups and the coefficients in parallel; and store results of the arithmetic operations in the output register.
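A sequential sketch of the selection scheme may help. It rests on several assumptions not stated in the abstract: the input registers are treated as one concatenated vector, the pattern is a list of N offsets, each group is shifted by `step` elements from the previous one, and the arithmetic operation is a multiply-accumulate. The name `apply_pattern` is hypothetical, and the loop stands in for what the circuit would do in parallel.

```python
from typing import List


def apply_pattern(inputs: List[List[int]], coeffs: List[int],
                  pattern: List[int], num_groups: int,
                  step: int = 1) -> List[int]:
    """Select num_groups groups of N elements and combine each with coeffs.

    pattern holds N offsets into the concatenated input registers.
    Group g uses the same offsets, moved forward by g * step elements.
    """
    data = [x for reg in inputs for x in reg]  # concatenate the registers
    output = []
    for g in range(num_groups):
        group = [data[g * step + p] for p in pattern]
        # Assumed arithmetic operation: multiply-accumulate with the coefficients.
        output.append(sum(c * x for c, x in zip(coeffs, group)))
    return output


registers = [[1, 2, 3, 4], [5, 6, 7, 8]]
# Contiguous pattern: behaves like a 3-tap sliding filter.
taps = apply_pattern(registers, coeffs=[1, 2, 3], pattern=[0, 1, 2], num_groups=4)
# Strided pattern: every second element, same shift between groups.
strided = apply_pattern(registers, coeffs=[1, 0, -1], pattern=[0, 2, 4], num_groups=4)
```

With a contiguous pattern the sketch reduces to an ordinary sliding-window filter; a strided pattern shows that the selection and the shift between groups are independent choices.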
Abstract:
A system and method are provided for executing a conditional branch instruction. The system and method may include a branch predictor to predict one or more instructions that depend on the conditional branch instruction and a branch mis-prediction buffer to store correct instructions that were not predicted by the branch predictor during a branch mis-prediction.
Abstract:
A decoder to search a tree graph to decode a received signal. The tree graph may have a plurality of levels, each level having a plurality of nodes and each node representing a different value of an element of a candidate transmit signal corresponding to the received signal. The decoder may include a first module to execute a branch prediction at each branch node to select one of a plurality of candidate nodes stemming from the branch node that has a smallest distance increment, and a second module, running in parallel to the first module, to evaluate the branch prediction made by the first module at each branch node by computing an accumulated distance of the selected node. If the accumulated distance of the selected node is greater than or equal to a search radius, the first module may override the branch prediction and select an alternative candidate node.
Abstract:
Embodiments of the invention are directed to a system and method for sub-pixel motion estimation for video encoding. The method includes providing a best match between a source frame and a reference frame by generating a plurality of nonlinear building surfaces; generating, in real time, an estimated matching criteria surface representing a matching criteria between the source frame and the reference frame, based on the building surfaces and a plurality of sample points of an actual matching criteria surface; and selecting, in real time, a position on the estimated matching criteria surface.
Abstract:
An instruction packet having an extended machine language instruction may include at least a machine language instruction having encoded bits of an operation and a control word including bits of one or more extension fields. The structure and meaning of the extension fields may depend upon the extended machine language instruction. An association between an extension field and a machine language instruction may depend on the relative position of the extension field and the machine language instruction in the instruction packet.
Abstract:
A processor core architecture includes a cluster having at least a register file and predefined functional units having access to the register file. The architecture also includes an interface to one or more arbitrary functional units external to the processor core. The interface is to provide the arbitrary functional units with access to the register file.
Abstract:
A method and system for finding a Kth element in a series of values, including: organizing the series of values in a PDF by counting a number of occurrences of each value of the series of values; organizing the series of values in a CDF that includes adjacent bins of ranges of values, by counting for each bin an accumulated number of occurrences of values of the series of values up to a bin index of that bin; finding in the CDF a bin for which the associated accumulated number of occurrences is a largest accumulated number of occurrences among the accumulated number of occurrences that is smaller than K; and finding the Kth largest element by searching the PDF for the Kth largest element, starting from the found bin index.
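The two-level search can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: values are non-negative integers no larger than `max_value`, the "PDF" is a per-value histogram, the "CDF" accumulates counts over fixed-width bins in ascending order, and the rank K is counted from the smallest value. The names `kth_smallest`, `max_value` and `bin_width` are hypothetical.

```python
from typing import List


def kth_smallest(values: List[int], k: int, max_value: int, bin_width: int) -> int:
    """Find the k-th smallest value using a fine histogram and a coarse CDF."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")

    # "PDF": number of occurrences of each value in 0..max_value.
    pdf = [0] * (max_value + 1)
    for v in values:
        pdf[v] += 1

    # "CDF": cdf[b] is the number of values falling in bins 0..b.
    num_bins = max_value // bin_width + 1
    cdf = []
    total = 0
    for b in range(num_bins):
        total += sum(pdf[b * bin_width:(b + 1) * bin_width])
        cdf.append(total)

    # Coarse step: last bin whose accumulated count is still smaller than k.
    start_bin = -1
    for b, acc in enumerate(cdf):
        if acc >= k:
            break
        start_bin = b

    # Fine step: walk the PDF from the first value after that bin.
    count = cdf[start_bin] if start_bin >= 0 else 0
    for v in range((start_bin + 1) * bin_width, max_value + 1):
        count += pdf[v]
        if count >= k:
            return v
    raise ValueError("k out of range")


samples = [5, 1, 9, 3, 7, 3]
fourth = kth_smallest(samples, k=4, max_value=9, bin_width=4)
```

The coarse pass skips whole bins, so the fine pass scans at most one bin's worth of histogram entries. Under the same assumptions, the Kth largest element can be obtained by calling the sketch with `len(values) - K + 1`.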
Abstract:
A method and system for performing quadrature amplitude modulation (QAM) decoding of a received signal includes finding, for each layer, a region in a first constellation diagram of the received signal, the region including a portion of the first constellation diagram, the portion having the same size as a second constellation diagram, wherein a first constellation order of the received signal is higher than a second constellation order of the second constellation diagram; and, for each layer: finding a first portion of bits based on bits that are constant among constellation points located in the region of the layer; decoding the received signal using a QAM decoder having the second constellation order to obtain a second portion of bits; adjusting the second portion of bits based on the region of the layer; and merging the first portion of bits with the second portion of bits to obtain a decoded symbol.