Abstract:
A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during converting of the fixed-point input value to the floating-point output value. The method further includes performing both the initial shifting and the additional shifting together to form a dynamic, range constrained, normalized floating-point output value.
Abstract:
Systems and methods pertain to looking up entries of a table. A processor receives one or more single instruction multiple data (SIMD) instructions, including a first SIMD instruction which specifies a first subset of indices. A first subset of table entries is looked up, using a crossbar, with the first subset of indices. A first vector output of the first SIMD instruction is based on whether the outputs of the crossbar belong to a desired subset of table entries. Similarly, second, third, and fourth SIMD instructions specify corresponding second, third, and fourth subsets of indices to lookup the remaining table entries using the crossbar. The size of the crossbar is based on the number of indices in the subset of indices used to lookup table entries.
Abstract:
Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduling circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation to generate the same produced value as was generated by previous execution of producer instruction in the instruction pipeline. Thus, the produced value will remain available in the instruction pipeline to again be available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by a consumer instruction, which may avoid a pipeline stall.
Abstract:
A processor includes a vector register configured to load data responsive to a special purpose load instruction. The processor also includes circuitry configured to replicate a selected sub-vector value from the vector register.
Abstract:
Systems and methods relate to efficient memory operations. A single instruction multiple data (SIMD) gather operation is implemented with a gather result buffer located within or in close proximity to memory, to receive or gather multiple data elements from multiple orthogonal locations in a memory, and once the gather result buffer is complete, the gathered data is transferred to a processor register. A SIMD copy operation is performed by executing two or more instructions for copying multiple data elements from multiple orthogonal source addresses to corresponding multiple destination addresses within the memory, without an intermediate copy to a processor register. Thus, the memory operations are performed in a background mode without direction by the processor.
Abstract:
Systems and methods relate to multiply-and-horizontal-reduce operations, implemented in a digital filter, for example. A single instruction multiple data (SIMD) instruction comprising a first vector comprising M + C multiplicand elements, wherein M and C are positive integers and a second vector comprising M + C corresponding multiplier elements, wherein the C multiplier elements have a value of 1, is received. Using M multipliers in a processor, M multiplications of M multiplicand elements with corresponding M multiplier elements which do not include the C multiplier elements whose values are 1, are performed to generate M products. The C multiplicand elements whose corresponding C multiplier elements have values of 1 are added to or vertically accumulated with the M products.
Abstract:
A device includes one or more processors configured to retrieve a first block of data, the data corresponding to array of values arranged along at least a first dimension and a second dimension, to retrieve at least a portion of a second block of the data, and to perform a first hybrid convolution operation that applies a filter across the first block and at least the portion of the second block to generate output data. The output data includes a first accumulated block and at least a portion of a second accumulated block. The one or more processors are also configured to store the first accumulated block as first output data. The portion of the second block is adjacent to the first block along the first dimension and the portion of the second accumulated block is adjacent to the first accumulated block along the second dimension.
Abstract:
A method of constraining data represented in a deep neural network is described. The method includes determining an initial shifting specified to convert a fixed-point input value to a floating-point output value. The method also includes determining an additional shifting specified to constrain a dynamic range during converting of the fixed-point input value to the floating-point output value. The method further includes performing both the initial shifting and the additional shifting together to form a dynamic, range constrained, normalized floating-point output value.
Abstract:
Systems and methods relate to controlling voltage deviations in processing systems. A scheduler receives transactions and to be issued for execution in a pipeline. A voltage deviation that will occur if a particular transaction is executed in the pipeline is estimated before the transaction is issued. Threshold comparators are used to determine if the estimated voltage deviation will exceed specified thresholds to cause voltage overshoots or undershoots. The scheduler is configured to implement one or more corrective measures, such as increasing or decreasing energy in the pipeline, to mitigate possible voltage overshoots or undershoots, before the transaction is issued to be executed in the pipeline.
Abstract:
Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to data elements of the other vector operand expressed as a single register.