Abstract:
Techniques to suppress redundant reads to register addresses and to replicate read data are disclosed. The redundant reads are suppressed when multiple source operands specify the same register address to read. Additionally, the read data is replicated to a data stream or data location corresponding to the source operands where the data read was suppressed.
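As a minimal sketch of the idea in this abstract, the Python below models a register file as a dictionary, issues one read per distinct register address, and copies the value to every source operand that names the same address. The function name `read_operands` and the dictionary model are assumptions made for illustration; the abstract does not describe an implementation.

```python
def read_operands(register_file, operand_addrs):
    """Fetch source operands, reading each distinct register address once.

    register_file: dict mapping register address -> value (illustrative model).
    operand_addrs: register addresses named by the source operands, in order.
    Returns (values, reads_issued).
    """
    fetched = {}          # address -> data already read for this instruction
    reads_issued = 0
    values = []
    for addr in operand_addrs:
        if addr not in fetched:
            fetched[addr] = register_file[addr]   # the only real read
            reads_issued += 1
        values.append(fetched[addr])              # replicate to this operand
    return values, reads_issued
```

With three source operands of which two name the same register, the sketch issues two reads rather than three.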
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for constructing and programming quantum hardware for quantum annealing processes.
Abstract:
In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one-dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable components include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load/store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.
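Loop collapsing of this kind can be illustrated in software. The sketch below is an assumption-laden model, not the accelerator's design: the hypothetical `address_pattern` generator takes per-dimension iteration counts and strides and produces the full addressing sequence from a single flat loop, and `nested_reference` gives the equivalent two-level nested loop for comparison.

```python
def address_pattern(base, counts, strides):
    """Yield addresses for a multi-dimensional access using one flat loop.

    counts and strides are listed outermost dimension first.
    """
    total = 1
    for count in counts:
        total *= count
    for flat_index in range(total):          # the single collapsed loop
        addr = base
        remainder = flat_index
        # Peel off per-dimension indices, innermost dimension first.
        for count, stride in zip(reversed(counts), reversed(strides)):
            addr += (remainder % count) * stride
            remainder //= count
        yield addr


def nested_reference(base, rows, cols, row_stride, col_stride):
    """Equivalent two-level nested loop, for comparison."""
    return [base + r * row_stride + c * col_stride
            for r in range(rows) for c in range(cols)]
```

For a 2 x 3 object with a row stride of 10 and a column stride of 1, both functions visit the same addresses in the same order.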
Abstract:
An asynchronous multiple-core processor may be adapted for carrying out sets of known tasks, such as the tasks in the LAPACK and BLAS packages. Conveniently, the known tasks may be handled by the asynchronous multiple-core processor in a manner that may be considered to be more power efficient than carrying out the same known tasks on a single-core processor. Indeed, some of the power savings are realized through the use of token-based single core processors. Use of such token-based single core processors may be considered to be power efficient due to the lack of a global clock tree.
Abstract:
An apparatus and method for generating a very long instruction word (VLIW) command that supports predicated execution, and a VLIW processor and method for processing a VLIW are provided herein. The VLIW command includes an instruction bundle formed of a plurality of instructions to be executed in parallel and a single value indicating predicated execution, and is generated using the apparatus and method for generating a VLIW command. The VLIW processor decodes the instruction bundle and executes the instructions, which are included in the decoded instruction bundle, in parallel, according to the value indicating predicated execution.
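The abstract does not spell out the predication semantics, so the following is only one plausible reading, sketched in Python. It assumes the single value names a predicate register that guards the whole bundle, and that all instructions in the bundle read their sources before any result is written. The `Bundle` class, the `execute` function, and the register names are hypothetical.

```python
import operator
from dataclasses import dataclass


@dataclass
class Bundle:
    predicate: str   # hypothetical: name of the one predicate guarding the bundle
    ops: list        # tuples of (dest, function, src_a, src_b)


def execute(bundle, regs, preds):
    """Return the register state after issuing one bundle."""
    if not preds[bundle.predicate]:
        return dict(regs)                    # predicate false: nothing commits
    # Read every source before writing any result, to mimic parallel issue.
    results = [(dest, fn(regs[a], regs[b])) for dest, fn, a, b in bundle.ops]
    new_regs = dict(regs)
    for dest, value in results:
        new_regs[dest] = value
    return new_regs


# Illustrative bundle: both instructions see the old values of r0 and r1.
example = Bundle("p0", [("r0", operator.add, "r0", "r1"),
                        ("r1", operator.sub, "r0", "r1")])
```

Carrying one predicate per bundle rather than one per instruction keeps the encoding short, at the cost of guarding all parallel instructions together.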
Abstract:
A method for sorting data in an array processor. Each of a first tier of processing elements in the array processor receives data inputs from a load streaming unit. Each of the first tier processing elements compares input data portions received from the load streaming unit, wherein the input data portions are stored for processing in respective queues. The first tier processing elements select one of the input data portions to be an output data portion based on the comparison, and in response to the selection, remove a corresponding queue entry and request next input data from the load streaming unit. Each of the first tier processing elements further provides the output data portion as an input data portion to a second tier processing element that generates output data based on a comparison of output data received from at least two first tier processing elements.
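The two-tier arrangement resembles a merge tree, which can be sketched in Python with generators. The sketch assumes that each input stream is already sorted in ascending order and that a processing element selects the smaller of its two queue heads; the abstract states neither, and the names `processing_element` and `two_tier_sort` are illustrative.

```python
_EMPTY = object()   # marks an exhausted input queue


def processing_element(left, right):
    """Compare the heads of two input streams and emit the smaller one."""
    left, right = iter(left), iter(right)
    a, b = next(left, _EMPTY), next(right, _EMPTY)
    while a is not _EMPTY or b is not _EMPTY:
        if b is _EMPTY or (a is not _EMPTY and a <= b):
            yield a
            a = next(left, _EMPTY)    # entry removed, request next input
        else:
            yield b
            b = next(right, _EMPTY)


def two_tier_sort(s0, s1, s2, s3):
    """Two first-tier elements feed one second-tier element."""
    first_a = processing_element(s0, s1)
    first_b = processing_element(s2, s3)
    return list(processing_element(first_a, first_b))
```

Because each element pulls its next input only after emitting an output, the generators mirror the request-on-selection behavior described above.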
Abstract:
We describe a method for using a classical computer to generate a particular sequence of elementary operations (SEO), an instruction set for a quantum computer. Such a SEO will induce a quantum computer to perform a unitary transformation U that we call an Irreps Gen U. This U simultaneously diagonalizes a set of operators Hμ called HYPs (Hermitian Young Projectors) for n particles with d colors or, equivalently, for n qu(d)its. Hμ projects out n particle irrep μ of U(d).
Abstract:
A very long instruction word (VLIW) processor performs efficient processing including bit-expanded operations, such as processing performed in response to instructions commonly used in image processing, image recognition, and other processing, while preventing scaling up of the circuit. The VLIW processor includes an instruction control unit, a register file unit, and an instruction execution unit. The instruction execution unit includes a plurality of slots, and a state register arranged between the second slot and the third slot to transfer N-bit data between the second and third slots. The VLIW processor stores data output from the third slot into the state register and uses the data, and thus achieves efficient processing including bit-expanded operations, such as processing performed in response to instructions commonly used in image processing, image recognition, and other processing, while preventing scaling up of the circuit.
Abstract:
A data normalization system is described herein that represents multiple data types that are common within database systems in a normalized form that can be processed uniformly to achieve faster processing of data on superscalar CPU architectures. The data normalization system includes changes to internal data representations of a database system as well as functional processing changes that leverage normalized internal data representations for a high density of independently executable CPU instructions. Because most data in a database is small, a majority of data can be represented by the normalized format. Thus, the data normalization system allows for fast superscalar processing in a database system in a variety of common cases, while maintaining compatibility with existing data sets.
Abstract:
Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described.
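For reference, the arithmetic that such complex multiplication instructions would compute can be written out in Python. This is a plain software sketch with complex numbers held as (real, imag) pairs; the function names are illustrative, and the radix-2 butterfly is included as a common FFT building block on the assumption that it is relevant, since the abstract does not name a specific FFT structure.

```python
def cmul(x, y):
    """Multiply complex numbers given as (real, imag) pairs."""
    xr, xi = x
    yr, yi = y
    return (xr * yr - xi * yi, xr * yi + xi * yr)


def cmac(acc, x, y):
    """Multiply-accumulate: acc + x * y."""
    pr, pi = cmul(x, y)
    return (acc[0] + pr, acc[1] + pi)


def butterfly(a, b, w):
    """Radix-2 FFT butterfly: returns (a + w*b, a - w*b)."""
    tr, ti = cmul(w, b)
    return (a[0] + tr, a[1] + ti), (a[0] - tr, a[1] - ti)
```

Each complex product costs four real multiplications and two additions in this form, which is the work a dedicated instruction would fold into one operation.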