Abstract:
Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems. In one aspect, a method includes the actions of receiving a request to process a neural network using a processing system that performs neural network computations using fixed point arithmetic; for each node of each layer of the neural network, determining a respective scaling value for the node from the respective set of floating point weight values for the node, and converting each floating point weight value of the node into a corresponding fixed point weight value using the respective scaling value for the node to generate a set of fixed point weight values for the node; and providing the sets of fixed point weight values for the nodes to the processing system for use in processing inputs using the neural network.
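The per-node conversion described above can be illustrated with a minimal Python sketch. The scaling rule used here (mapping the largest weight magnitude of a node onto the largest representable integer) and the names `quantize_node`, `quantize_network`, and `total_bits` are assumptions for illustration only; the abstract does not state how the scaling value is derived.

```python
def quantize_node(weights, total_bits=8):
    """Convert one node's floating point weights to signed fixed point integers.

    Assumed rule (not taken from the abstract): the scaling value maps the
    largest weight magnitude of the node onto the largest positive integer.
    """
    max_int = 2 ** (total_bits - 1) - 1          # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = max_int / largest if largest > 0 else 1.0
    # Round, then clamp so every result fits in the signed range.
    fixed = [max(-max_int, min(max_int, round(w * scale))) for w in weights]
    return scale, fixed


def quantize_network(layers, total_bits=8):
    """Apply quantize_node to every node of every layer.

    `layers` is a list of layers; each layer is a list of nodes; each node is
    a list of floating point weights.
    """
    return [[quantize_node(node, total_bits) for node in layer] for layer in layers]


# Two nodes with very different weight ranges each get their own scaling value.
layers = [[[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]]
quantized = quantize_network(layers)
```

Because each node has its own scaling value, a node with small weights keeps as much resolution as a node with large weights. To interpret a node's output, a fixed point result would have to be divided by that node's scaling value.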
Abstract:
In a shift and shift-out detecting circuit, a plurality of partial shift circuits, each having a different bit shift quantity, are connected in series. Each of the plurality of partial shift circuits receives a shift result as a previous shift result from the partial shift circuit of a previous stage and a corresponding shift instruction, shifts the previous shift result by the corresponding bit shift quantity in response to the shift instruction to produce a current shift result, and outputs the current shift result to the partial shift circuit of a subsequent stage. A plurality of shift-out detecting circuits are respectively provided for the plurality of partial shift circuits. Each of the plurality of shift-out detecting circuits detects a shift-out of a "1" bit from the current shift result and the corresponding shift instruction and generates a partial sticky signal when the shift-out is detected. A collecting circuit collects the partial sticky signals from the plurality of shift-out detecting circuits and generates a sticky signal to indicate generation of the shift-out.
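As a rough software model of the staged arrangement above, the following Python sketch chains right-shift stages and ORs together one partial sticky flag per stage. The power-of-two stage sizes, the use of the bits of `amount` as the per-stage shift instructions, and the function name are assumptions; the abstract only requires that the stages have different shift quantities. For simplicity the sketch inspects the bits a stage is about to drop.

```python
def shift_right_with_sticky(value, amount, stage_sizes=(1, 2, 4, 8, 16)):
    """Right-shift `value` through a series of partial shift stages.

    Stage i shifts by stage_sizes[i] when bit i of `amount` is set (the
    assumed form of the shift instruction). Returns (result, sticky), where
    sticky is True if any stage shifted out a 1 bit.
    """
    sticky = False
    for bit_index, size in enumerate(stage_sizes):
        if (amount >> bit_index) & 1:
            lost = value & ((1 << size) - 1)   # bits this stage drops
            partial_sticky = lost != 0         # per-stage detector
            sticky = sticky or partial_sticky  # collecting circuit (OR)
            value >>= size
    return value, sticky


result, sticky = shift_right_with_sticky(0b101100, 3)
```

With the default stages the model handles shift amounts from 0 to 31. The sticky flag matches what a single wide check of the low `amount` bits would produce, but each stage only inspects the few bits it discards.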
Abstract:
In a floating-point arithmetic unit for performing floating-point arithmetic of first and second input data which are represented by a floating-point representation and composed of first and second exponent parts and first and second mantissa parts, a shift amount calculating circuit comprises first and second subtracters (26, 27) supplied with lower (n + 1) bits of the first and the second exponent parts. The first subtracter subtracts a first lower number (#EA1) from a second lower number (#EB1) to produce a first difference signal (RS1). The second subtracter subtracts the second lower number (#EB1) from the first lower number (#EA1) to produce a second difference signal (RS2). Supplied with the first and the second exponent parts, an exponent comparing unit (28) compares the first exponent part with the second exponent part to produce a comparison result signal (CP1, CP2, CP3, CP4). Responsive to the comparison result signal, a first selector (31) selects one of the first difference signal and first and second value signals ("0", "64") as a first right-shift amount signal (SD1). Responsive to the comparison result signal, a second selector (32) selects one of the second difference signal and the first and the second value signals as a second right-shift amount signal (SD2).
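The selection logic above can be sketched in Python as follows. The width of the lower exponent field (`LOW_BITS = 6`, so that differences up to 63 fit and "64" acts as the saturated shift) and the exact conditions under which each selector picks "0", "64", or a difference are assumptions made for illustration; the abstract does not spell out how the comparison result signals CP1 to CP4 map to those choices.

```python
LOW_BITS = 6                 # assumed value of n + 1
MAX_SHIFT = 1 << LOW_BITS    # 64, the saturated shift amount


def shift_amounts(ea, eb):
    """Return (sd1, sd2): right-shift amounts for the first and second mantissa."""
    mask = MAX_SHIFT - 1
    # The two subtracters only see the lower bits and work modulo 64.
    rs1 = ((eb & mask) - (ea & mask)) & mask   # first subtracter:  EB1 - EA1
    rs2 = ((ea & mask) - (eb & mask)) & mask   # second subtracter: EA1 - EB1
    # The full-width comparison stands in for the exponent comparing unit.
    diff = eb - ea
    if diff >= MAX_SHIFT:        # second exponent far larger
        return MAX_SHIFT, 0
    if diff > 0:                 # second exponent larger, difference fits
        return rs1, 0
    if diff == 0:                # equal exponents, no alignment shift
        return 0, 0
    if diff > -MAX_SHIFT:        # first exponent larger, difference fits
        return 0, rs2
    return 0, MAX_SHIFT          # first exponent far larger


sd1, sd2 = shift_amounts(70, 60)
```

The point of the arrangement is that the narrow subtracters and the wide comparison can run in parallel: whenever the true difference is below 64, the low-bit difference already equals it, so no full-width subtraction is needed on the critical path.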
Abstract:
A method and system are disclosed for performing a leading 0/1 anticipation (LZA) in parallel with the floating-point addition of two operands (A and B) in a computer to significantly reduce the Addition-Normalization time. A combinational network is used to process appropriate XOR (P), AND (G), and NOR (Z) state signals resulting from the comparison of the bits in corresponding bit positions of the operands (A and B), starting with the most significant bit (MSB) side of the addition. The state of the initial state signal is detected, and shift amount signals are produced and counted for each successive state signal detected, as long as the state remains TRUE. When the state becomes NOT TRUE, adjustments are made depending on the initial state and the successive state, production of the shift amount signals is halted, and an adjustment signal is produced. To determine the exponent of the sum of the floating-point addition, the shift amount count is summed with the adjustment signal. The latter sum will be the exponent of the sum of the operands, thus providing a normalized result. The adjustment signal may be based on the CARRY at the NOT TRUE bit position, and the state at the NOT TRUE position may be used to determine whether the result of the addition is positive or negative. In addition to a serial network, an implementing network of a parallel form, which accepts appropriate state inputs as blocks of n bits in length, is disclosed, along with certain special implementations.
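The first part of this scheme, deriving the P, G, and Z state signals and counting how long the initial state persists from the MSB side, can be sketched in Python. This is a simplified serial model only: the function names are hypothetical, and the adjustment step (the carry-dependent correction at the NOT TRUE position) is not modelled. For the case where the initial state is Z, the run length is known to differ from the true leading-zero count of the sum by at most one, which is the gap the adjustment signal would close.

```python
def state_signals(a, b, width):
    """Per-bit states from MSB to LSB: 'P' (XOR), 'G' (AND), 'Z' (NOR)."""
    states = []
    for i in reversed(range(width)):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x & y:
            states.append("G")
        elif x ^ y:
            states.append("P")
        else:
            states.append("Z")
    return "".join(states)


def initial_run_length(states):
    """Count how many positions, starting at the MSB, keep the initial state."""
    if not states:
        return 0
    first = states[0]
    run = 0
    for s in states:
        if s != first:      # state became NOT TRUE: stop counting
            break
        run += 1
    return run


signals = state_signals(0b00010110, 0b00001001, 8)
run = initial_run_length(signals)
```

Because the states depend only on the operand bits and not on the carry chain, this count can be produced while the adder is still working, which is where the time saving in the abstract comes from.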
Abstract:
In a total sum calculation circuit for use in calculating a total sum of first through n-th input data which are represented by a floating point representation and which are composed of first through n-th exponent parts and first through n-th fraction parts, where n is an integer greater than two, an n-input data comparison circuit (32) simultaneously compares the first through the n-th exponent parts with one another to produce a maximum one of the first through the n-th exponent parts and a comparison result signal representative of which one of the first through the n-th exponent parts is the maximum exponent part. Supplied with the first through the n-th exponent parts and the comparison result signal, a shift number calculation circuit (33) calculates first through n-th shift digit numbers between the maximum exponent part and the first through the n-th exponent parts. The first through the n-th fraction parts are shifted by the first through the n-th shift digit numbers in first through n-th shifters (411-41n) and are produced as first through n-th shifted fraction parts, which are summed up into an unnormalized fraction part. The unnormalized fraction part is normalized into a total sum fraction part by the use of normalization information derived from the unnormalized fraction part. The maximum exponent part is also normalized by the normalization information into a total sum exponent part. A combination of the total sum exponent part and the total sum fraction part is produced as the total sum represented by the floating point representation.
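The data flow above (find the maximum exponent, align every fraction to it, add, then normalize) can be modelled with a short Python sketch. The representation used here is an assumption: each input is an `(exponent, fraction)` pair of integers with value `fraction * 2**exponent`, fractions are non-negative, and a normalized fraction has its top bit at position `FRACTION_BITS - 1`. Bits shifted out during alignment are simply dropped.

```python
FRACTION_BITS = 24   # assumed fraction width


def total_sum(values):
    """Sum (exponent, fraction) pairs, where value = fraction * 2**exponent."""
    # n-input comparison: pick the maximum exponent.
    max_exp = max(e for e, _ in values)
    # Shift number calculation: distance of each exponent from the maximum.
    shifts = [max_exp - e for e, _ in values]
    # Shifters: align every fraction to the maximum exponent.
    shifted = [f >> s for (_, f), s in zip(values, shifts)]
    # Adder: unnormalized fraction.
    raw = sum(shifted)
    if raw == 0:
        return 0, 0
    # Normalization: adjust fraction and exponent together.
    exp = max_exp
    while raw >= (1 << FRACTION_BITS):          # overflowed the fraction width
        raw >>= 1
        exp += 1
    while raw < (1 << (FRACTION_BITS - 1)):     # leading zeros remain
        raw <<= 1
        exp -= 1
    return exp, raw


# 1.0 + 2.0 + 0.5 in units of 2**23
exp, frac = total_sum([(0, 0x800000), (1, 0x800000), (-1, 0x800000)])
```

Aligning all n inputs to one maximum exponent means a single multi-operand addition and a single normalization replace a chain of n - 1 two-operand floating point additions, each of which would otherwise align and normalize on its own.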
Abstract:
A shift control circuit comprising an arithmetic circuit (20) for producing a string of a predetermined number of data bits, a logic circuit (22) for detecting the positive or negative sign of the bit string and producing a first switch signal responsive to the positive sign of the bit string or a second switch signal responsive to the negative sign of the bit string, a ones complement generator circuit (24) for producing a signal representative of the ones complement of the bit string, a first selective signal transfer circuit (26) such as a multiplexer which is transparent directly to the bit string in response to the first switch signal or to the signal from the ones complement generator circuit in response to the second switch signal, a decoder circuit (28) for decoding the bit string or the signal passed through the first selective signal transfer circuit for producing a decoded output signal, a single-bit shifter circuit (30) for shifting the bit of the decoded output signal by a single bit in a predetermined direction for producing a single-bit shifted output signal, and a second selective signal transfer circuit (32) such as a multiplexer which is transparent directly to the decoded output signal in response to the first switch signal or to the signal from the single-bit shifter circuit (30) in response to the second switch signal.
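One plausible reading of this structure is that it decodes the magnitude of a signed two's complement shift count without a full negation: for a negative value x, the magnitude is ~x + 1, so decoding ~x and moving the one-hot result up by one position gives the same output as decoding |x|. The Python sketch below models that reading. The 4-bit width, the one-hot decoder, and the left direction of the single-bit shift are assumptions; the abstract does not state them.

```python
WIDTH = 4   # assumed width of the two's complement bit string


def decode(code):
    """One-hot decoder: output bit `code` is set."""
    return 1 << code


def shift_control(bits):
    """Return (negative, one_hot) for a WIDTH-bit two's complement input."""
    mask = (1 << WIDTH) - 1
    bits &= mask
    negative = bool(bits >> (WIDTH - 1))        # sign-detecting logic circuit
    ones_complement = ~bits & mask              # ones complement generator
    selected = ones_complement if negative else bits   # first multiplexer
    decoded = decode(selected)                  # decoder
    shifted = decoded << 1                      # single-bit shifter
    return negative, (shifted if negative else decoded)  # second multiplexer


negative, one_hot = shift_control(0b1101)   # -3 in 4-bit two's complement
```

Under these assumptions the increment that two's complement negation would need is replaced by a fixed one-position shift of the decoded signal, which avoids a carry chain.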
Abstract:
Shifting numeric words is an arithmetic operation frequently required in computing systems. According to the invention, the arithmetic unit serving this purpose has a shifter array that is constructed as a triangular matrix with n(n + 1)/2 tristate elements in n columns and n rows for an n-digit numeric word. With this array, the numeric word can be read out shifted by any number of positions in a single step of short duration that is independent of the extent of the shift. Such a shifter array can also perform both left and right shifts if input and output units are provided, each consisting of two sets of n tristate elements and each accepting the n-digit numeric word in mutually mirror-image representations.
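A small Python model can make the triangular arrangement concrete. It assumes that each tristate element connects one input position i to one output position j with j <= i, that a shift by k enables exactly the elements on the diagonal i - j = k, and that undriven outputs read as 0. These wiring details and the function names are illustrative assumptions, not taken from the abstract. Bit lists are least significant bit first.

```python
def triangular_shift(bits, k):
    """Shift an n-bit word toward the low end by k using a triangular matrix.

    bits[0] is the least significant bit. Element (i, j) connects input i to
    output j and exists only for j <= i, giving n(n + 1)/2 elements.
    """
    n = len(bits)
    elements = [(i, j) for i in range(n) for j in range(i + 1)]
    assert len(elements) == n * (n + 1) // 2
    out = [0] * n                      # undriven outputs modelled as 0
    for i, j in elements:
        if i - j == k:                 # only one diagonal is enabled
            out[j] = bits[i]
    return out


def mirrored_shift(bits, k):
    """Shift the other way by mirroring the word on input and on output."""
    return triangular_shift(bits[::-1], k)[::-1]


word = [1, 0, 1, 1]                    # 13, LSB first
right = triangular_shift(word, 1)      # 13 >> 1
left = mirrored_shift(word, 1)         # (13 << 1) truncated to 4 bits
```

Every shift amount is a single enabled diagonal, so the delay does not grow with the shift distance. The mirrored input and output stages reuse the same triangle for the opposite direction, which is why a full n by n array is not needed.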
Abstract:
A processor configured to: receive, at a floating-point-input-terminal, an input-block of data comprising a plurality of floating-point numbers (420) each floating-point number (420) comprising a mantissa and an exponent; determine an input-scale-factor (416) based on a previous-input-block-exponent-value associated with a previous-input-block of data; and convert the input-block of data into a fixed-point-block of data (424) in accordance with the input-scale-factor (416), wherein the fixed-point-block of data (424) comprises a plurality of fixed-point-values that can represent the plurality of floating-point numbers within a particular range.
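The idea of scaling the current block with an exponent taken from the previous block can be sketched in Python. Everything specific here is an assumption for illustration: the 16-bit signed word, the use of the largest binary exponent of the previous block as the previous-input-block-exponent-value, the power-of-two scale factor, and saturation when the current block turns out to be larger than the previous one.

```python
import math


def block_exponent(block):
    """Largest binary exponent in the block, as reported by math.frexp."""
    return max((math.frexp(x)[1] for x in block if x != 0.0), default=0)


def to_fixed_block(block, prev_exponent, word_bits=16):
    """Convert a block of floats to fixed point using the previous block's exponent.

    Returns (fixed_values, exponent_of_this_block); the second item is what
    the next call would pass as prev_exponent.
    """
    # Assumed input scale factor: a power of two chosen so that values with
    # exponent <= prev_exponent fit into the signed word.
    frac_bits = word_bits - 1 - prev_exponent
    limit = (1 << (word_bits - 1)) - 1
    fixed = []
    for x in block:
        q = round(x * 2.0 ** frac_bits)
        fixed.append(max(-limit - 1, min(limit, q)))   # saturate on overflow
    return fixed, block_exponent(block)


first, exp1 = to_fixed_block([0.5, -0.25, 0.75], prev_exponent=0)
second, exp2 = to_fixed_block([3.0, 1.5], prev_exponent=exp1)   # saturates
third, exp3 = to_fixed_block([3.0, 1.5], prev_exponent=exp2)    # fits
```

Using the previous block's exponent means the scale factor is available before the current block has been scanned, so conversion can start immediately. The cost, visible in `second`, is that a sudden jump in signal level is clipped until the exponent catches up one block later.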