-
Publication No.: US20180113708A1
Publication date: 2018-04-26
Application No.: US15299420
Filing date: 2016-10-20
Applicant: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
Inventor: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
CPC classification: G06F9/3001, G06F7/483, G06F7/5443, G06F9/30036, G06F9/30109, G06F9/30112, G06F9/3016
Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
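The iteration described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chained_fma` and its list-based operands are hypothetical, and ordinary Python arithmetic rounds after both the multiply and the add, so the single rounding of a true fused operation is not modeled.

```python
def chained_fma(initial, sources, scalar_sub_elements):
    """Sketch of a chained multiply-accumulate over packed sources.

    initial: list of accumulator lanes (the initial destination value).
    sources: list of packed source vectors, one per iteration.
    scalar_sub_elements: one sub-element of the scalar value per iteration.
    All names are hypothetical; rounding is NOT fused in this sketch.
    """
    acc = list(initial)
    for vec, s in zip(sources, scalar_sub_elements):
        # Each iteration adds vec * s to the result of the previous iteration.
        acc = [a + x * s for a, x in zip(acc, vec)]
    return acc


result = chained_fma([1.0, 2.0], [[1.0, 2.0], [3.0, 4.0]], [10.0, 100.0])
```

The first iteration starts from the initial value; every later iteration starts from the previous result, which is the chaining the abstract refers to.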
-
Publication No.: US20180095749A1
Publication date: 2018-04-05
Application No.: US15283606
Filing date: 2016-10-03
Inventor: THOMAS ELMER
CPC classification: G06F9/3001, G06F7/483, G06F7/485, G06F7/4876, G06F7/49915, G06F7/49936, G06F7/49952, G06F7/5443, G06F9/30014
Abstract: A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp−Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to −K (where K is related to a width of a datapath in the partial product adder), and when C is denormal and its number of leading zeroes plus K exceeds −ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.
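The selection rule in this abstract can be written as a small predicate. This is a rough sketch with two loudly stated assumptions: the abstract does not say how the three conditions combine, so each is treated here as an independent trigger, and the mass-cancellation test is passed in as a flag because the abstract does not say how it is detected. The function and parameter names are hypothetical.

```python
def accumulate_in_partial_product_adder(a_exp, b_exp, c_exp, k,
                                        c_is_denormal=False,
                                        c_leading_zeros=0,
                                        may_mass_cancel=False):
    """Return True if C would be accumulated with the partial products.

    Assumption: the three conditions from the abstract are alternatives.
    may_mass_cancel is supplied by the caller; its detection is not modeled.
    """
    exp_delta = a_exp + b_exp - c_exp  # ExpDelta = Aexp + Bexp - Cexp
    if may_mass_cancel:
        return True
    if exp_delta >= -k:
        return True
    if c_is_denormal and c_leading_zeros + k > -exp_delta:
        return True
    return False  # otherwise C is left for the second accumulation stage
```

For example, with `k=4` and exponents giving `exp_delta = -20`, a normal C is deferred to the second stage, while a denormal C with 17 leading zeroes is accumulated early because 17 + 4 exceeds 20.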
-
Publication No.: US20180095728A1
Publication date: 2018-04-05
Application No.: US15283295
Filing date: 2016-10-01
Applicant: Intel Corporation
CPC classification: G06F7/5443, G06F7/4876
Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.
-
Publication No.: US20180095722A1
Publication date: 2018-04-05
Application No.: US15282021
Filing date: 2016-09-30
Inventor: Brent Buchanan, Le Zheng, John Paul Strachan
CPC classification: G06F7/523, G06F7/5443, G06F2207/4802, G06F2207/4828, G11C13/0007, G11C13/0028, G11C13/004
Abstract: In some examples, a method may be performed by a multiply-accumulate circuit. As part of the method, a row driver of the multiply-accumulate circuit may drive a row value line based on an input vector bit of an input vector received by the row driver. The row driver may also drive a row line that controls a corresponding memristor according to the input vector bit. The corresponding memristor may store a weight value bit of a weight value to apply to the input vector for a multiply-accumulate operation. The method may further include a sense amplifier generating an output voltage based on a current output from the corresponding memristor and counter circuitry adjusting a counter value that represents a running total of the multiply-accumulate operation based on the row value line, the output voltage generated by the sense amplifier, or a combination of both.
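A purely behavioral model of the bit-level flow in this abstract might look like the following. It is a sketch under assumptions not stated in the source: the sense amplifier output is modeled as 1 only when the row is driven and the stored weight bit is 1, and the counter increments by one for each such row. The function name `bitwise_mac` is hypothetical, and no analog behavior is modeled.

```python
def bitwise_mac(input_bits, weight_bits):
    """Behavioral sketch of a one-bit-per-row multiply-accumulate.

    input_bits: input vector bits received by the row drivers.
    weight_bits: weight value bits stored in the corresponding memristors.
    Assumption: current is sensed only when both bits are 1.
    """
    counter = 0  # running total of the multiply-accumulate operation
    for x, w in zip(input_bits, weight_bits):
        row_value_line = x                    # driven from the input vector bit
        sense_output = 1 if (x and w) else 0  # assumed sense amplifier result
        if row_value_line and sense_output:
            counter += 1
    return counter


total = bitwise_mac([1, 0, 1, 1], [1, 1, 0, 1])
```

With these inputs, rows 0 and 3 contribute, so the counter ends at 2, which equals the dot product of the two bit vectors.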
-
Publication No.: US20180088908A1
Publication date: 2018-03-29
Application No.: US15275037
Filing date: 2016-09-23
IPC classification: G06F7/544
CPC classification: G06F7/5443
Abstract: A circuit includes a multiplier, an adder, a first result register and a second result register coupled to outputs of the multiplier and the adder, respectively. The circuit further includes: a first selection unit configured to selectively provide, to the multiplier and in response to a first control signal, a first value from a first plurality of values; and a second selection unit configured to selectively provide, to the multiplier and in response to a second control signal, a second value from a second plurality of values. The circuit also includes: a third selection unit configured to selectively provide, to the adder and in response to a third control signal, a third value from a third plurality of values; and a fourth selection unit configured to selectively provide, to the adder and in response to a fourth control signal, a fourth value from a fourth plurality of values.
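The datapath in this abstract, with four selection units feeding a multiplier and an adder, can be modeled as one function call per cycle. This is a minimal sketch: the function name `mac_cycle`, the use of list indices as control signals, and the tuple standing in for the two result registers are all assumptions made for illustration.

```python
def mac_cycle(mul_a_values, mul_b_values, add_a_values, add_b_values,
              sel1, sel2, sel3, sel4):
    """One cycle of a multiplier and adder fed by four selection units.

    Each *_values list is the plurality of values available to one
    selection unit; each sel* index plays the role of a control signal.
    Returns (first_result_register, second_result_register).
    """
    product = mul_a_values[sel1] * mul_b_values[sel2]  # multiplier output
    total = add_a_values[sel3] + add_b_values[sel4]    # adder output
    return product, total


mul_reg, add_reg = mac_cycle([2, 3], [4, 5], [10, 20], [1, 7], 1, 0, 0, 1)
```

Here the multiplier computes 3 * 4 and the adder computes 10 + 7. Routing a result register back into one of the value lists would give an accumulate step, but the abstract does not say which values are selectable, so that wiring is left out.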
-
Publication No.: US20180046916A1
Publication date: 2018-02-15
Application No.: US15458837
Filing date: 2017-03-14
Applicant: NVIDIA Corporation
Inventor: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
CPC classification: G06N3/063, G06F7/523, G06F7/5443, G06F2207/4824, G06N3/04, G06N3/0454, G06N3/082, G06N3/084
Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.
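The flow in this abstract (multiply non-zero values, derive a destination address from their positions, scatter into accumulators) can be illustrated with a one-dimensional toy. The assumptions are significant and are not from the source: positions are 1-D rather than multi-dimensional, the destination address is taken as activation position minus weight position (a "valid" cross-correlation), and the nested loops run sequentially where the hardware would work in parallel. The name `sparse_conv1d` is hypothetical.

```python
def sparse_conv1d(nz_activations, nz_weights, out_len):
    """Scatter-accumulate over non-zero (position, value) pairs.

    nz_activations, nz_weights: lists of (position, value) for non-zeros.
    out_len: number of destination accumulators.
    Assumption: destination address = activation position - weight position.
    """
    accumulators = [0] * out_len
    for w_pos, w_val in nz_weights:
        for a_pos, a_val in nz_activations:
            result = w_val * a_val   # result value from non-zero elements
            dest = a_pos - w_pos     # destination address from positions
            if 0 <= dest < out_len:
                accumulators[dest] += result
    return accumulators


# Dense equivalents: activations [0, 2, 0, 3], weights [1, 0, 4].
out = sparse_conv1d([(1, 2), (3, 3)], [(0, 1), (2, 4)], out_len=2)
```

Only four products are formed instead of the six a dense computation would need, and products whose address falls outside the output range are dropped.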
-
Publication No.: US20180032312A1
Publication date: 2018-02-01
Application No.: US15224176
Filing date: 2016-07-29
Inventor: Craig Hansen, John Moussouris, Alexia Massalin
CPC classification: G06F7/523, G06F7/5045, G06F7/5443, G06F2207/3828
Abstract: A processor and method for performing outer product and outer product accumulation operations on vector operands requiring large numbers of multiplies and accumulations is disclosed.
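For readers unfamiliar with the operation named here, outer product accumulation adds u[i] * v[j] to every element (i, j) of an accumulator matrix. The sketch below shows only the arithmetic; the function name is hypothetical and nothing about the patented processor's organization is implied.

```python
def outer_product_accumulate(acc, u, v):
    """Return acc + outer(u, v) for list-of-lists acc and vectors u, v."""
    return [[acc[i][j] + u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]


acc = [[0, 0], [0, 0]]
acc = outer_product_accumulate(acc, [1, 2], [3, 4])
acc = outer_product_accumulate(acc, [1, 0], [1, 1])
```

Two vectors of length n produce n * n multiplies and accumulations per step, which is why the operation calls for large numbers of multipliers.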
-
Publication No.: US09841948B2
Publication date: 2017-12-12
Application No.: US14824547
Filing date: 2015-08-12
Inventor: Liang-Kai Wang
CPC classification: G06F7/483, G06F7/5443
Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.
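The effect of rounding only once can be shown with exact rational arithmetic. This sketch assumes the operation computes (a * b + c) * 2^n, which the abstract does not state explicitly, and it models only the numerical result, not the alignment hardware. The function name `fmasc` is hypothetical.

```python
from fractions import Fraction


def fmasc(a, b, c, n):
    """Sketch of a fused multiply-accumulate with scaling.

    Assumed semantics: (a * b + c) * 2**n, computed exactly and rounded
    once when converting back to a float.
    """
    exact = (Fraction(a) * Fraction(b) + Fraction(c)) * Fraction(2) ** n
    return float(exact)  # the only rounding step


a = 1 + 2.0 ** -30
b = 1 - 2.0 ** -30
unfused = a * b + -1.0         # the product rounds to 1.0 first
fused = fmasc(a, b, -1.0, 60)  # keeps the -2**-60 term, then scales
```

In the unfused expression, the product is rounded to 1.0 before the add, so the small term is lost and the result is 0.0. With a single final rounding, the term survives and scaling by 2^60 yields -1.0.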
-
Publication No.: US09747110B2
Publication date: 2017-08-29
Application No.: US14717657
Filing date: 2015-05-20
Applicant: Altera Corporation
Inventor: Martin Langhammer
CPC classification: G06F9/3869, G06F7/523, G06F7/5443, G06F9/3001, G06F9/30105, G06F9/3012, G06F9/3826, G06F9/3867, G06F15/80, G06F2207/3868, G06F2207/3888, G06F2207/3892
Abstract: Circuitry operating under a floating-point mode or a fixed-point mode includes a first circuit accepting a first data input and generating a first data output. The first circuit includes a first arithmetic element accepting the first data input, a plurality of pipeline registers disposed in connection with the first arithmetic element, and a cascade register that outputs the first data output. The circuitry further includes a second circuit accepting a second data input and generating a second data output. The second circuit is cascaded to the first circuit such that the first data output is connected to the second data input via the cascade register. The cascade register is selectively bypassed when the first circuit is operated under the fixed-point mode.
-
Publication No.: US20170199726A1
Publication date: 2017-07-13
Application No.: US15469919
Filing date: 2017-03-27
Applicant: Intel Corporation
CPC classification: G06F7/57, G06F5/01, G06F5/012, G06F7/483, G06F7/49947, G06F7/49957, G06F7/5443, G06F9/30014, G06F9/3893
Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.