Abstract:
A method and system for training an apparatus to recognize a pattern includes providing the apparatus with a host processor executing steps of a machine learning process; providing the apparatus with an accelerator including at least two processors; inputting training pattern data into the host processor; determining coefficient changes in the machine learning process with the host processor using the training pattern data; transferring the training data to the accelerator; determining kernel dot-products with the at least two processors of the accelerator using the training data; and transferring the dot-products back to the host processor.
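The division of labor described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the patented method: the abstract does not name the learning algorithm, so a kernel perceptron with a linear kernel stands in for it, and a two-thread pool (the hypothetical `accelerator_dot_products`) stands in for the accelerator's processors.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    """Plain dot-product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def accelerator_dot_products(training_data, query, n_workers=2):
    """Emulated accelerator (assumption): n_workers threads compute the
    dot-product of every training vector with `query`; the results are
    returned to the caller, i.e. the host, in the original order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda row: dot(row, query), training_data))


def train(training_data, labels, epochs=5):
    """Host side: decides coefficient changes from the returned dot-products.
    Kernel perceptron update, chosen only for illustration."""
    alphas = [0.0] * len(training_data)
    for _ in range(epochs):
        for i, x in enumerate(training_data):
            k = accelerator_dot_products(training_data, x)
            score = sum(a * y * kij for a, y, kij in zip(alphas, labels, k))
            if labels[i] * score <= 0:  # misclassified: change coefficient
                alphas[i] += 1.0
    return alphas


def predict(training_data, labels, alphas, x):
    """Classify x as +1 or -1 using the trained coefficients."""
    k = accelerator_dot_products(training_data, x)
    score = sum(a * y * kij for a, y, kij in zip(alphas, labels, k))
    return 1 if score > 0 else -1


data = [[2, 1], [1, 2], [-1, -2], [-2, -1]]
labels = [1, 1, -1, -1]
alphas = train(data, labels)
```

In this sketch the host only ever sees dot-products, so swapping the thread pool for real accelerator hardware would leave `train` unchanged.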
Abstract:
A method for code compression of a program, the method comprising separating code from data. Software transformations necessary to make address mappings between compressed and uncompressed space are introduced into the code. Statistics are obtained about the frequency of occurrence of instructions, wherein said statistics include the frequency of occurrence of two consecutive instructions. The program is parsed to identify occurrences of instructions or instruction pairs. The identified instructions are replaced with an address to a compressed bus-word table. An address mapping is generated from uncompressed addresses to compressed addresses.
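The steps in this abstract (gather statistics on single instructions and consecutive pairs, replace frequent ones with table addresses, and record an address mapping) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented encoding: the names `build_table`, `compress`, and `decompress`, the ranking by instruction words saved, and the `("idx", n)` / `("raw", ins)` output format are all hypothetical.

```python
from collections import Counter


def build_table(program, table_size):
    """Count single instructions and consecutive pairs, then keep the
    entries that cover the most instruction words (assumed ranking)."""
    counts = Counter((ins,) for ins in program)
    counts.update(zip(program, program[1:]))
    ranked = sorted(counts, key=lambda e: (-counts[e] * len(e), e))
    return ranked[:table_size]


def compress(program, table):
    """Replace table hits with table indices; also return a mapping from
    uncompressed address to compressed address."""
    index = {entry: i for i, entry in enumerate(table)}
    out, addr_map, pc = [], {}, 0
    while pc < len(program):
        # Only addresses that start a compressed word are mapped; a real
        # scheme must also handle branch targets inside a pair.
        addr_map[pc] = len(out)
        pair = tuple(program[pc:pc + 2])
        if len(pair) == 2 and pair in index:
            out.append(("idx", index[pair]))
            pc += 2
        elif (program[pc],) in index:
            out.append(("idx", index[(program[pc],)]))
            pc += 1
        else:
            out.append(("raw", program[pc]))  # left uncompressed
            pc += 1
    return out, addr_map


def decompress(words, table):
    """Expand table indices back into the original instructions."""
    program = []
    for kind, value in words:
        program.extend(table[value] if kind == "idx" else [value])
    return program


program = ["ld", "add", "ld", "add", "st", "ld", "add", "nop"]
table = build_table(program, table_size=2)
compressed, addr_map = compress(program, table)
```

Here the eight-instruction program shrinks to five words, and `addr_map` lets a branch to uncompressed address 5 be redirected to compressed address 3.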
Abstract:
An architecture for content-aware compression and/or encryption of various segments of an application is disclosed. The architecture advantageously allows decompression and decryption units to be placed at various levels of a memory hierarchy.
Abstract:
A new compression and decompression architecture is herein disclosed which advantageously uses a plurality of parallel content addressable memories of different sizes to perform fast matching during compression.
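As a rough software analogy for matching against several content addressable memories at once, the sketch below queries every CAM at the current input position and keeps the longest hit. The abstract does not say whether "different sizes" means entry width or capacity; this sketch assumes entry width. The `Cam` class, its pre-loaded patterns, and the output tuples are hypothetical, and the lookups run sequentially here where hardware would run them in parallel.

```python
class Cam:
    """Emulated content addressable memory holding fixed-width patterns."""

    def __init__(self, width):
        self.width = width
        self.entries = {}

    def store(self, pattern):
        if len(pattern) != self.width:
            raise ValueError("pattern width does not match this CAM")
        self.entries.setdefault(pattern, len(self.entries))

    def match(self, data, pos):
        """Return the entry index matching data at pos, or None."""
        return self.entries.get(data[pos:pos + self.width])


def compress(data, cams):
    """Query every CAM at each position and emit the widest match,
    falling back to a one-byte literal when no CAM hits."""
    out, pos = [], 0
    while pos < len(data):
        hits = [(cam.width, cam.match(data, pos)) for cam in cams]
        hits = [(w, i) for w, i in hits if i is not None]
        if hits:
            width, idx = max(hits)  # widest match wins
            out.append(("cam", width, idx))
            pos += width
        else:
            out.append(("lit", data[pos:pos + 1]))
            pos += 1
    return out


wide, narrow = Cam(4), Cam(2)
wide.store(b"abcd")
narrow.store(b"ab")
narrow.store(b"cd")
result = compress(b"abcdabxcd", [wide, narrow])
```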
Abstract:
Code compression is known as an effective technique to reduce instruction memory size on an embedded system. However, code compression can also be very effective in increasing the processor-to-memory bandwidth and hence provide increased system performance. A code decompression engine having a plurality of dictionary tables, coupled with decoding circuitry and appropriate control circuitry, is coupled between the processor core and the instruction cache. The code decompression engine provides one-cycle decompression of compressed instructions that are intermixed with uncompressed instructions, thereby increasing processor-to-memory bandwidth and avoiding processor stalls.
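A software model can make the bandwidth argument concrete. In the sketch below, each fetched word is either an uncompressed instruction or a reference into one of several dictionary tables that expands to one or more instructions. The word format (`("raw", ins)` and `("dict", table_id, index)`), the table contents, and the function name are assumptions for illustration; the abstract does not specify an encoding, and nothing here models the one-cycle hardware timing.

```python
def decompress_stream(words, tables):
    """Expand a stream that mixes uncompressed instructions with
    references into several dictionary tables (assumed word format)."""
    instructions = []
    for word in words:
        if word[0] == "raw":
            instructions.append(word[1])  # passes through unchanged
        else:
            _, table_id, index = word
            instructions.extend(tables[table_id][index])
    return instructions


# Two hypothetical dictionary tables: one for single instructions,
# one for instruction pairs.
tables = [
    [("nop",), ("ret",)],
    [("ld r1", "add r1"), ("cmp r1", "bne loop")],
]
fetched = [("dict", 1, 0), ("raw", "st r1"), ("dict", 1, 1), ("dict", 0, 1)]
decoded = decompress_stream(fetched, tables)

# Instructions delivered per word fetched from memory.
effective_bandwidth = len(decoded) / len(fetched)
```

In this example four fetched words deliver six instructions, which is the sense in which compression raises effective processor-to-memory bandwidth.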