摘要:
Parallel signal processor (10) (FIG. 2) performs a Fourier Transformation of an input signal; The transformation coefficients are converted once to logarithmic form and stored in a cache memory. The input data is converted serially to logarithmic form, and fed to all processing units in parallel. The processing units compute their respective products as additions in the logarithmic domain. Then, the products are converted back to the normal domain. The products with the correct sign are summed by an accumulator of the respective processing element. After the last signal data point has run through the processing elements and the last products are added to their respective sums, all complex output signal data points are complete simultaneously.
摘要:
With reference to FIG. 1 signal processor (10) for performing transformations of sets of input data points comprises a memory for storing a first half input data points and a second half input data points, an adder unit for pairwise adding one real part of each one first half input data point and a second half input data point and providing adder output data, and a computing unit for performing transformations upon the adder output data. Addition for data reduction and data transformation are carried out simultaneously by different units.
摘要:
In a parallel computer system having N parallel computing units a data pipeline connects all the computing units. In addition the computing units are coupled to a random access memory so that each computing unit is assigned to one column of the memory array. To perform a digital signal processing filter operation the required coefficients are stored in the memory so that one or more different filter operations can be carried out in an interleaved way.
摘要:
An encoding system (400) receives samples and coefficients from a bus (422). The system comprises a plurality of parallel operating memory devices (430-k), registers (435-k), computing units (440-k), and accumulator units (460-k). The system (400) further comprises a parallel-to-serial buffer (470) coupled to the accumulator units (440-k) and a pair generator (480) for providing amplitude/index pairs. The system (400) performs encoding steps such as transforming, quantizing, zigzagging, rate controlling, and run-length coding. Transforming is explained for the example of a Forward Discrete Cosine Transformation (FDCT). According to a method (500) of the present invention, zigzagging (510) occurs prior to transforming (570) and performed only once when transformation coefficients are provided to the memory devices (430-k) in a zigzag arrangement. Quantizing occurs prior to transforming by pre-calculating the coefficients with quantizers. Pair generator (480) performes rate-controlling and run-length encoding (550). (with reference to FIGS. 2 and 9)
摘要:
The invention relates to a method for electronically representing a number V in a binary data word. Both the exponent and the mantissa are represented as 2' complement. The mantissa is normalized to 0.1.F if the number V is positive where F is the fraction of the mantissa. In case that the number V is negative the fraction F is normalized to 10.F. Usage of this format allows to design an improved adder which requires less hardware.
摘要:
A system (400) alternatively performs real matrix operation in a first mode or performs complex matrix multiplication in a second mode. One input matrix (e.g., {B}) stays in a plurality of memory fields (430-k), while the other input matrix (e.g., {A}) is loaded into a plurality of registers (410-k). Parallel operating groups (405-k, 409-(k+1)) combine elements of {A} with elements of {B}. The groups (405-k, 409-(k+1)) comprise the memory fields (430-k), the registers (410-k) as well as computational units (440-k), switches (420-k) and adder units (460-k). The adder units (460-k) are configured by the switches (420-k) to operate as adders or to operate as accumulators, depending on the mode. Adders provide intermediate results and accumulators accumulate these intermediate results (e.g., Sum) to elements of the resulting matrix {C}. For complex multiplication, real (Re) and imaginary (Im) parts of matrix elements are in processed in adjacent groups. The system (400) uses logarithmic representations of the matrix elements and further comprises a log converter (490) and a plurality of inverse log converters (450-k).
摘要翻译:系统(400)可选地以第一模式执行实矩阵运算或在第二模式中执行复矩阵乘法。 一个输入矩阵(例如{+ E,uns B + EE})停留在多个存储器场(430-k)中,而另一个输入矩阵(例如,{+ E,uns A + EE})被加载到 多个寄存器(410-k)。 并行操作组(405-k,409-(k + 1))将{+ E,uns A + EE}的元素与{+ E,uns B + EE}的元素组合。 组(405-k,409-(k + 1))包括存储器字段(430-k),寄存器(410-k)以及计算单元(440-k),开关(420-k)和 加法器单元(460-k)。 加法器单元(460-k)由开关(420-k)配置,作为加法器运行或者作为累加器运行,这取决于模式。 加法器提供中间结果,并且累加器将这些中间结果(例如,Sum)累积到所得矩阵{+ E,C C + EE}的元素。 对于复数乘法,在相邻组中处理矩阵元素的实数(Re)和虚部(Im)部分。 系统(400)使用矩阵元素的对数表示,并且还包括对数转换器(490)和多个逆对数转换器(450-k)。
摘要:
A multiply-multiply-accumulate (MMA) system (10) efficiently evaluates matrix products X=F*C. Matrix C is dissected into submatrices A and B taking advantage of symmetry in C. LOG unit (14) converts B, A, and F to LOG values B', A' and F'. These are summed in K parallel calculating units CU's (18) and converted back to Normal domain as P=F*B*A in ALOG units (22) and sent to accumulators ACU's (24). The ACU's (24) accumulate the results. An output buffer (26) combines the results. The B', A' values (32,34) are held in a cache memory (20) and the LOG sums are performed in two steps with intermediate storage.
摘要翻译:乘法 - 累积(MMA)系统(10)有效地评估矩阵乘积X = F * C。 利用C中的对称性将矩阵C解剖为子矩阵A和B. LOG单元(14)将B,A和F转换为LOG值B',A'和F'。 这些以K并行计算单元CU(18)相加,并在ALOG单元(22)中被转换回正常域为P = F * B * A并发送到累加器ACU(24)。 ACU(24)积累了结果。 输出缓冲器(26)组合结果。 B',A'值(32,34)被保存在高速缓冲存储器(20)中,并且LOG和数以两个步骤执行中间存储。
摘要:
A memory system (3) for storing data messages communicated between a processor unit (13) and a communication module (11), each data message comprising at least one data word, comprises a memory array (4) having a plurality of memory buffers (B0-BM), each buffer for storing a data message, and logic circuitry (24) coupled to the memory array (4). The logic circuitry (24) sets one bit of a data message stored in a memory buffer to a first logic state during a processor unit read access when the processor unit (13) reads a current data message from the memory buffer, and negates the one bit to a second logic state during a communication module write access when the communication module (11) writes a new data message into the memory buffer. When the communication module (11) is to write a new data message to one of the memory buffers, the state of the one bit of the current data message stored in the one memory buffer provides an indication as to whether the new data message will overwrite the current data message, which current data message has not been read by the processor unit (13) or whether the new data message will overwrite the current data message, which current data message has been read by the processor unit (13).
摘要:
A processor system is adapted to carry out a predicate swap instruction of an instruction set to swap, via a data pathway, predicate data in a first predicate data location of a predicate register with data in a corresponding additional predicate data location of a first additional predicate data container and to swap, via a data pathway, predicate data in a second predicate storage location of the predicate register with data in a corresponding additional predicate data location in a second additional predicate data container.
摘要:
A processor system is adapted to carry out a predicate swap instruction of an instruction set to swap, via a data pathway, predicate data in a first predicate data location of a predicate register with data in a corresponding additional predicate data location of a first additional predicate data container and to swap, via a data pathway, predicate data in a second predicate storage location of the predicate register with data in a corresponding additional predicate data location in a second additional predicate data container.