摘要:
A multiply-multiply-accumulate (MMA) system (10) efficiently evaluates matrix products X=F*C. Matrix C is dissected into submatrices A and B taking advantage of symmetry in C. LOG unit (14) converts B, A, and F to LOG values B', A' and F'. These are summed in K parallel calculating units CU's (18) and converted back to Normal domain as P=F*B*A in ALOG units (22) and sent to accumulators ACU's (24). The ACU's (24) accumulate the results. An output buffer (26) combines the results. The B', A' values (32,34) are held in a cache memory (20) and the LOG sums are performed in two steps with intermediate storage.
摘要翻译:乘法 - 累积(MMA)系统(10)有效地评估矩阵乘积X = F * C。 利用C中的对称性将矩阵C解剖为子矩阵A和B. LOG单元(14)将B,A和F转换为LOG值B',A'和F'。 这些以K并行计算单元CU(18)相加,并在ALOG单元(22)中被转换回正常域为P = F * B * A并发送到累加器ACU(24)。 ACU(24)积累了结果。 输出缓冲器(26)组合结果。 B',A'值(32,34)被保存在高速缓冲存储器(20)中,并且LOG和数以两个步骤执行中间存储。
摘要:
A system (400) alternatively performs real matrix operation in a first mode or performs complex matrix multiplication in a second mode. One input matrix (e.g., {B}) stays in a plurality of memory fields (430-k), while the other input matrix (e.g., {A}) is loaded into a plurality of registers (410-k). Parallel operating groups (405-k, 409-(k+1)) combine elements of {A} with elements of {B}. The groups (405-k, 409-(k+1)) comprise the memory fields (430-k), the registers (410-k) as well as computational units (440-k), switches (420-k) and adder units (460-k). The adder units (460-k) are configured by the switches (420-k) to operate as adders or to operate as accumulators, depending on the mode. Adders provide intermediate results and accumulators accumulate these intermediate results (e.g., Sum) to elements of the resulting matrix {C}. For complex multiplication, real (Re) and imaginary (Im) parts of matrix elements are in processed in adjacent groups. The system (400) uses logarithmic representations of the matrix elements and further comprises a log converter (490) and a plurality of inverse log converters (450-k).
摘要翻译:系统(400)可选地以第一模式执行实矩阵运算或在第二模式中执行复矩阵乘法。 一个输入矩阵(例如{+ E,uns B + EE})停留在多个存储器场(430-k)中,而另一个输入矩阵(例如,{+ E,uns A + EE})被加载到 多个寄存器(410-k)。 并行操作组(405-k,409-(k + 1))将{+ E,uns A + EE}的元素与{+ E,uns B + EE}的元素组合。 组(405-k,409-(k + 1))包括存储器字段(430-k),寄存器(410-k)以及计算单元(440-k),开关(420-k)和 加法器单元(460-k)。 加法器单元(460-k)由开关(420-k)配置,作为加法器运行或者作为累加器运行,这取决于模式。 加法器提供中间结果,并且累加器将这些中间结果(例如,Sum)累积到所得矩阵{+ E,C C + EE}的元素。 对于复数乘法,在相邻组中处理矩阵元素的实数(Re)和虚部(Im)部分。 系统(400)使用矩阵元素的对数表示,并且还包括对数转换器(490)和多个逆对数转换器(450-k)。
摘要:
The invention relates to a method for electronically representing a number V in a binary data word. Both the exponent and the mantissa are represented as 2' complement. The mantissa is normalized to 0.1.F if the number V is positive where F is the fraction of the mantissa. In case that the number V is negative the fraction F is normalized to 10.F. Usage of this format allows to design an improved adder which requires less hardware.
摘要:
Parallel signal processor (10) (FIG. 2) performs a Fourier Transformation of an input signal; The transformation coefficients are converted once to logarithmic form and stored in a cache memory. The input data is converted serially to logarithmic form, and fed to all processing units in parallel. The processing units compute their respective products as additions in the logarithmic domain. Then, the products are converted back to the normal domain. The products with the correct sign are summed by an accumulator of the respective processing element. After the last signal data point has run through the processing elements and the last products are added to their respective sums, all complex output signal data points are complete simultaneously.
摘要:
Apparatus and method for providing information to a cache module, the apparatus includes: (i) at least one processor, connected to the cache module, for initiating a first and second requests to retrieve, from the cache module, a first and a second data unit; (ii) logic, adapted to receive the requests and determine if the first and second data units are mandatory data units; and (iii) a controller, connected to the cache module, adapted to initiate a single fetch burst if a memory space retrievable during the single fetch burst comprises the first and second mandatory data units, and adapted to initiate multiple fetch bursts if a memory space retrievable during a single fetch burst does not comprise the first and the second mandatory data units.
摘要:
A device that includes: a first bus, connected between a first logic and a first circuit; a group of second buses connected between the first logic and between multiple non-high impedance circuit access logics associated with multiple circuits; wherein each circuit access logic is adapted to: (i) provide to the first logic, a circuit write value during a circuit writing period and during an idle period that follows the circuit writing period and ends when another circuit is allowed to write; and (ii) provide a default value when another circuit is allowed to write; and wherein the first logic is adapted to alter a state of the first bus in response to a change between two consecutive circuit write values.
摘要:
A device that includes: a first bus, connected between a first logic and a first circuit; a group of second buses connected between the first logic and between multiple non-high impedance circuit access logics associated with multiple circuits; wherein each circuit access logic is adapted to: (i) provide to the first logic, a circuit write value during a circuit writing period and during an idle period that follows the circuit writing period and ends when another circuit is allowed to write; and (ii) provide a default value when another circuit is allowed to write; and wherein the first logic is adapted to alter a state of the first bus in response to a change between two consecutive circuit write values.
摘要:
A memory cache control arrangement for performing a coherency operation on a memory cache comprises a receive processor for receiving an address group indication for an address group comprising a plurality of addresses associated with a main memory. The address group indication may indicate a task identity and an address range corresponding to a memory block of the main memory. A control unit processes each line of a group of cache lines sequentially. Specifically it is determined if each cache line is associated with an address of the address group by evaluating a match criterion. If the match criterion is met, a coherency operation is performed on the cache line. If a conflict exists between the coherency operation and another memory operation the coherency means inhibits the coherency operation. The invention allows a reduced duration of a cache coherency operation. The duration is further independent of the size of the main memory address space covered by the coherency operation.
摘要:
Apparatus and method for providing information to a cache module, the apparatus includes: (i) at least one processor, connected to the cache module, for initiating a first and second requests to retrieve, from the cache module, a first and a second data unit; (ii) logic, adapted to receive the requests and determine if the first and second data units are mandatory data units; and (iii) a controller, connected to the cache module, adapted to initiate a single fetch burst if a memory space retrievable during the single fetch burst comprises the first and second mandatory data units, and adapted to initiate multiple fetch bursts if a memory space retrievable during a single fetch burst does not comprise the first and the second mandatory data units.