Abstract:
Apparatus and method for a modified, balanced throughput data-path architecture is given for efficiently implementing the digital signal processing algorithms of filtering, convolution and correlation in computer hardware, in which both data and coefficient buffers can be implemented as sliding windows. This architecture uses a multiplexer and a data path branch from the Address Generator unit to the multiply-accumulate execution unit. By selecting between the data path of Address Generator to execution unit and the data path of register to execution unit, the unbalanced throughput and multiply-accumulate bubble cycles caused by misaligned addressing on coefficients can be overcome. The modified balanced throughput data-path architecture can achieve a high multiply-accumulate operation rate per cycle in implementing digital signal processing algorithms.
Abstract:
A register file organization is used to support multiple accesses from more than one processor or pipeline. This shared register file is organized for a multiple processor device that includes a high performance (HP) and a low power (LP) core. The shared register file includes separate HP and LP storage units coupled to separate HP and LP write and read ports.
Abstract:
Apparatus and method for a modified, balanced throughput data-path architecture is given for efficiently implementing the digital signal processing algorithms of filtering, convolution and correlation in computer hardware, in which both data and coefficient buffers can be implemented as sliding windows. This architecture uses a multiplexer and a data path branch from the Address Generator unit to the multiply-accumulate execution unit. By selecting between the data path of Address Generator to execution unit and the data path of register to execution unit, the unbalanced throughput and multiply-accumulate bubble cycles caused by misaligned addressing on coefficients can be overcome. The modified balanced throughput data-path architecture can achieve a high multiply-accumulate operation rate per cycle in implementing digital signal processing algorithms.
Abstract:
A register file organization is used to support multiple accesses from more than one processor or pipeline. This shared register file is organized for a multiple processor device that includes a high performance (HP) and a low power (LP) core. The shared register file includes separate HP and LP storage units coupled to separate HP and LP write and read ports.
Abstract:
An apparatus and method are disclosed to implement digital signal processing operations involving multiply-accumulate (MAC) operations, by using a modified balanced data structure and accessing architecture. This architecture maintains a data-path connecting one address generation unit, one register file and one MAC execution unit. The register file has a hierarchical grouping organization of individual registers, which reduces bubble cycles caused by memory misalignments. This architecture uses parallel execution and can achieve two or more MAC operations per cycle.
Abstract:
An apparatus and method are disclosed to implement digital signal processing operations involving multiply-accumulate (MAC) operations, by using a modified balanced data structure and accessing architecture. This architecture maintains a data-path connecting one address generation unit, one register file and one MAC execution unit. The register file has a hierarchical grouping organization of individual registers, which reduces bubble cycles caused by memory misalignments. This architecture uses parallel execution and can achieve two or more MAC operations per cycle.