Operation method and apparatus for matrix multiplication
    1.
    Invention application

    Publication No.: WO2023078364A1

    Publication date: 2023-05-11

    Application No.: PCT/CN2022/129619

    Filing date: 2022-11-03

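    Abstract: Embodiments of the present invention provide an operation method and apparatus for matrix multiplication. The operation method comprises: splitting two 2N-bit floating-point data items into their corresponding sign bits, precision bits and exponent bits, and splitting four N-bit integer data items into their corresponding sign bits and precision bits; performing the matrix multiplication operation on the two floating-point data items by adding the exponent bits, XORing the sign bits and multiplying the precision bits, and performing the matrix multiplication operation pairwise on the four integer data items by XORing the sign bits and multiplying the precision bits, with the multiplication units and addition units reused across the floating-point and integer matrix multiplication operations. By splitting input data of different data types, the invention allows the accelerator's multiplication and addition resources to be reused during matrix multiplication, which greatly reduces the accelerator's chip area and cost.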
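The split-and-reuse idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit: it assumes 2N = 32 (IEEE 754 single precision), covers only the scalar product that a matrix multiplication is built from, and ignores rounding, subnormals, infinities and NaNs. The names `split_fp32`, `multiply_unit`, `fp_mul` and `int_mul` are hypothetical.

```python
import struct

def split_fp32(x):
    """Split a float32 into (sign bit, biased exponent, 24-bit significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Normal numbers carry an implicit leading 1; subnormals are not handled.
    significand = (fraction | (1 << 23)) if exponent else fraction
    return sign, exponent, significand

def multiply_unit(a, b):
    """Stand-in for the shared unsigned hardware multiplier (assumption)."""
    return a * b

def fp_mul(x, y):
    """Floating-point product: XOR the signs, add the exponents, multiply the significands."""
    sx, ex, mx = split_fp32(x)
    sy, ey, my = split_fp32(y)
    sign = sx ^ sy
    exponent = (ex - 127) + (ey - 127)      # remove the bias, then add
    product = multiply_unit(mx, my)         # up to 48 bits, exact in a Python int
    value = product * 2.0 ** (exponent - 46)  # two 23-bit fraction scalings
    return -value if sign else value

def int_mul(a, b):
    """Integer product: XOR the signs, multiply the magnitudes on the same unit."""
    sign = (a < 0) ^ (b < 0)
    product = multiply_unit(abs(a), abs(b))
    return -product if sign else product
```

Both `fp_mul` and `int_mul` route their magnitude product through the same `multiply_unit`, which is the resource sharing the abstract describes; in hardware the exponent adder would be shared in a similar way.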

    APPARATUS AND METHOD FOR ENERGY-EFFICIENT AND ACCELERATED PROCESSING OF AN ARITHMETIC OPERATION

    Publication No.: WO2023000110A1

    Publication date: 2023-01-26

    Application No.: PCT/CA2022/051140

    Filing date: 2022-07-22

    Abstract: An apparatus and a method for accelerated processing of an arithmetic operation. The apparatus comprises an operand pre-arithmetic status register configured to generate a status notification that flags that one of a set of predetermined combinatory conditions between a first operand and a second operand is met; and a modified arithmetic logic unit. The modified arithmetic logic unit comprises an electronic logic circuit configured to, in response to receiving the status notification from the operand pre-arithmetic status register, readdress execution of the arithmetic operation towards an expedited routine within the modified arithmetic logic unit if the status notification comprises one or more flags, or to a conventional routine if the status notification is a blank status notification, the expedited routine needing fewer calculation cycles to output an operation result than the conventional routine.
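The dispatch described in this abstract can be sketched as follows. The abstract does not say which operand conditions are flagged, so the zero and one checks here are assumptions, as are the names `pre_arithmetic_status`, `conventional_multiply` and `modified_alu_multiply` and the cycle counts, which only model a shift-and-add loop.

```python
def pre_arithmetic_status(a, b):
    """Return the set of flags raised for this operand pair (empty set = blank notification)."""
    flags = set()
    if a == 0 or b == 0:
        flags.add("zero_operand")
    if a == 1:
        flags.add("a_is_one")
    if b == 1:
        flags.add("b_is_one")
    return flags

def conventional_multiply(a, b):
    """Shift-and-add multiply; returns (result, cycles), one cycle per multiplier bit."""
    negative = (a < 0) ^ (b < 0)
    a, b = abs(a), abs(b)
    result, cycles = 0, 0
    while b:
        if b & 1:
            result += a
        a <<= 1
        b >>= 1
        cycles += 1
    return (-result if negative else result), cycles

def modified_alu_multiply(a, b):
    """Take the expedited routine when a flag is set, else the conventional one."""
    flags = pre_arithmetic_status(a, b)
    if "zero_operand" in flags:
        return 0, 1          # expedited: result known without multiplying
    if "a_is_one" in flags:
        return b, 1
    if "b_is_one" in flags:
        return a, 1
    return conventional_multiply(a, b)
```

For example, `modified_alu_multiply(0, 12345)` finishes in one modelled cycle, whereas `modified_alu_multiply(6, 7)` falls through to the conventional loop.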

    PROCESSOR UNIT FOR MULTIPLY AND ACCUMULATE OPERATIONS

    Publication No.: WO2021111272A1

    Publication date: 2021-06-10

    Application No.: PCT/IB2020/061262

    Filing date: 2020-11-30

    Abstract: A processor unit for multiply and accumulate ("MAC") operations is provided, the processor unit comprising: a plurality of MAC units for performing a set of MAC operations, wherein each MAC unit of the plurality of MAC units includes an execution unit and a one-write one-read ("1W/1R") register file, the 1W/1R register file having at least one accumulator; and another register file, wherein the execution unit of each MAC unit is configured to perform a subset of the MAC operations by computing a product of a set of values received from the other register file and adding the computed product to a content of the at least one accumulator, and wherein each MAC unit is configured to perform the subset of MAC operations in a single clock cycle.
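As a rough software analogy of the arrangement in this abstract, the sketch below gives each MAC unit its own accumulator and feeds all units from one shared operand source. Using the units for a matrix-vector product is an assumption chosen for illustration, and `MacUnit` and `matvec` are hypothetical names; the register files and single-cycle timing are not modelled.

```python
class MacUnit:
    """One MAC unit with a private accumulator (stands in for its 1W/1R register file)."""

    def __init__(self):
        self.accumulator = 0

    def step(self, a, b):
        # Multiply the two received values and add the product to the accumulator.
        self.accumulator += a * b

def matvec(matrix, vector):
    """Matrix-vector product with one MAC unit per output row."""
    units = [MacUnit() for _ in matrix]
    for k, x in enumerate(vector):            # one modelled clock cycle per k
        for unit, row in zip(units, matrix):  # all units step in the same cycle
            unit.step(row[k], x)
    return [unit.accumulator for unit in units]
```

Calling `matvec([[1, 2], [3, 4]], [5, 6])` accumulates two products in each of two units.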

    Apparatus and method for performing matrix addition/subtraction operations

    Publication No.: WO2017185396A1

    Publication date: 2017-11-02

    Application No.: PCT/CN2016/081117

    Filing date: 2016-05-05

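    Abstract: The present disclosure provides an apparatus for performing matrix addition and subtraction operations, comprising: a storage unit for storing the matrix data associated with a matrix operation instruction; a register unit for storing the scalar data associated with the matrix operation instruction; a control unit for decoding the matrix operation instruction and controlling its execution; and a matrix operation unit for performing matrix addition or subtraction on the input matrices according to the decoded matrix operation instruction, wherein the matrix operation unit is a custom hardware circuit. The present disclosure also provides a method for performing matrix addition and subtraction operations.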

    Apparatus and method for performing vector merge operations

    Publication No.: WO2017185385A1

    Publication date: 2017-11-02

    Application No.: PCT/CN2016/080963

    Filing date: 2016-05-04

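    Abstract: An apparatus for performing vector merge operations, comprising: a storage unit for storing the vector data associated with a vector merge instruction; a register unit for storing the scalar data associated with the vector merge instruction; a control unit for decoding the vector merge instruction and controlling its execution; and a vector merge unit for merging two input vectors according to the decoded vector merge instruction, wherein the vector merge unit is a custom hardware circuit. The apparatus and method implement the complete vector merge process in a custom hardware circuit, so that a single compact vector merge instruction is enough to carry out the vector merge operation.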

    APPARATUS AND METHOD FOR SELECTING ELEMENTS OF A VECTOR COMPUTATION
    6.
    Invention application
    Status: under examination (published)

    Publication No.: WO2013147869A1

    Publication date: 2013-10-03

    Application No.: PCT/US2012/031596

    Filing date: 2012-03-30

    Abstract: An apparatus and method are described for performing a vector reduction. For example, an apparatus according to one embodiment comprises: a reduction logic tree comprised of a set of N-1 reduction logic blocks used to perform reduction in a single operation cycle for N vector elements; a first input vector register storing a first input vector communicatively coupled to the set of reduction logic blocks; a second input vector register storing a second input vector communicatively coupled to the set of reduction logic blocks; a mask register storing a mask value controlling a set of one or more multiplexers, each of the set of multiplexers selecting a value directly from the first input vector register or an output containing a processed value from one of the reduction logic blocks; and an output vector register coupled to outputs of the one or more multiplexers to receive values passed through by each of the multiplexers responsive to the control signals.

    MULTI-ELEMENT INSTRUCTION WITH DIFFERENT READ AND WRITE MASKS
    7.
    Invention application
    Status: under examination (published)

    Publication No.: WO2013095659A1

    Publication date: 2013-06-27

    Application No.: PCT/US2011/067248

    Filing date: 2011-12-23

    Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation on the set of elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different from the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
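The sequence of steps in this abstract can be sketched in Python as below. The abstract leaves several points open, so the following are assumptions: masks are integers whose bit i controls lane i, the operation is a sum, and lanes not selected by the write mask keep the old destination value. The function name `masked_reduce_broadcast` is hypothetical.

```python
def masked_reduce_broadcast(vector, read_mask, write_mask, dest, op=sum):
    """Reduce the read-masked elements, broadcast the result, then write-mask it into dest."""
    # Apply the read mask: keep only lanes whose mask bit is set.
    selected = [v for i, v in enumerate(vector) if (read_mask >> i) & 1]
    # Perform the operation on the selected elements.
    result = op(selected)
    # Create the output vector from multiple instances of the result.
    output = [result] * len(vector)
    # Apply the write mask: unselected lanes keep the destination's old value (assumption).
    return [o if (write_mask >> i) & 1 else d
            for i, (o, d) in enumerate(zip(output, dest))]
```

With `vector=[1, 2, 3, 4]`, `read_mask=0b0101` and `write_mask=0b1010`, lanes 0 and 2 are summed and the sum is written only to lanes 1 and 3.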

    MICROPROCESSOR AND METHOD FOR ENHANCED PRECISION SUM-OF-PRODUCTS CALCULATION ON A MICROPROCESSOR
    10.
    Invention application
    Status: under examination (published)

    Publication No.: WO2011063824A1

    Publication date: 2011-06-03

    Application No.: PCT/EP2009/008522

    Filing date: 2009-11-30

    Inventor: RAUBUCH, Martin

    Abstract: A microprocessor (10) comprises at least one general-purpose register (12) arranged to store and provide a number of destination bits to a multiply unit (14); a control unit (18) adapted to provide at least a multiply-high instruction (20) and a multiply-high-and-accumulate instruction (22) to the multiply unit. The multiply unit is further arranged to receive at least a first and a second source operand (24, 26), each having an associated number of source bits and a sum of the associated numbers of source bits exceeding the number of destination bits, connected to a register-extension cache (28) comprising at least one cache entry arranged to store and provide a number of precision-enhancement bits, and adapted to store a destination portion of a result operand in the general-purpose register and a precision-enhancement portion of the result operand in the cache entry. The result operand is generated by a multiply-high operation or by a multiply-high-and-accumulate operation, depending on the received instruction.
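A minimal sketch of why the extra precision-enhancement bits matter in a sum of products, assuming unsigned 32-bit source operands and a 32-bit destination (the abstract fixes neither width). The high half of each product plays the role of the general-purpose register contents and the low half the role of the cache entry. The names `mulh` and `mulh_acc` are hypothetical and do not correspond to any real instruction set.

```python
BITS = 32
MASK = (1 << BITS) - 1

def mulh(a, b):
    """Multiply-high: return (destination bits, precision-enhancement bits)."""
    product = (a & MASK) * (b & MASK)
    return product >> BITS, product & MASK

def mulh_acc(a, b, gpr, ext):
    """Multiply-high-and-accumulate onto the combined (gpr, ext) value."""
    total = ((gpr << BITS) | ext) + (a & MASK) * (b & MASK)
    total &= (1 << (2 * BITS)) - 1   # wrap to the combined register width
    return total >> BITS, total & MASK
```

Accumulating `0x80000000 * 1` twice shows the effect: each high half alone is 0, so adding only the high halves gives 0, whereas carrying the low halves along via `mulh_acc(0x80000000, 1, *mulh(0x80000000, 1))` yields a high half of 1.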
