-
Publication No.: US12086066B1
Publication Date: 2024-09-10
Application No.: US18184536
Filing Date: 2023-03-15
Applicant: Cornami, Inc.
IPC Classification: G06F12/0813
CPC Classification: G06F12/0813, G06F2212/1041
Abstract: A cache architecture for an array of identical cores arranged in a grid. Each of the cores includes interconnections to neighboring cores in the grid, a memory, and an algorithmic logic unit. A first core of the array is configured to receive a memory access request for data from at least one core of the array of cores configured to perform a computational operation. A second core of the array is configured to determine whether the requested data is present in a cache memory via a cache index including addresses in the cache memory. A third core of the array is configured as the cache memory. The memory of the third core is used as the cache memory. An address of the requested data from the cache index is passed to the third core to output the requested data.
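The abstract splits a cache across three roles: a request core, an index core, and a storage core whose local memory holds the cached data. The Python sketch below models that division of labor in software. It is illustrative only and not the patented design: the class names (`RequestCore`, `IndexCore`, `StorageCore`), the direct-mapped placement, and the eight-slot size are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

NUM_SLOTS = 8  # hypothetical cache size, in slots

@dataclass
class StorageCore:
    """Hypothetical 'third core': its local memory is used as the cache memory."""
    memory: list = field(default_factory=lambda: [None] * NUM_SLOTS)

    def read(self, slot: int):
        return self.memory[slot]

    def write(self, slot: int, value) -> None:
        self.memory[slot] = value

@dataclass
class IndexCore:
    """Hypothetical 'second core': holds the cache index (slot -> cached address)."""
    tags: dict = field(default_factory=dict)

    def lookup(self, address: int):
        slot = address % NUM_SLOTS  # direct-mapped placement (assumption)
        return slot, self.tags.get(slot) == address

class RequestCore:
    """Hypothetical 'first core': receives memory access requests from compute cores."""

    def __init__(self, index: IndexCore, storage: StorageCore, backing: dict):
        self.index, self.storage, self.backing = index, storage, backing
        self.hits = self.misses = 0

    def load(self, address: int):
        slot, hit = self.index.lookup(address)
        if hit:
            self.hits += 1
        else:
            # Miss: fill the slot from the backing store and update the index.
            self.misses += 1
            self.storage.write(slot, self.backing[address])
            self.index.tags[slot] = address
        # The index hands the slot address to the storage core, which outputs the data.
        return self.storage.read(slot)
```

For example, with `backing = {a: a * 10 for a in range(32)}`, loading addresses 3, 3, 11, 3 returns 30, 30, 110, 30 and records one hit and three misses, because address 11 maps to the same slot as address 3 and evicts it.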
-
Publication No.: US11977509B2
Publication Date: 2024-05-07
Application No.: US17967173
Filing Date: 2022-10-17
Applicant: Cornami, Inc.
Inventors: Paul L. Master, Steven K. Knapp, Raymond J. Andraka, Alexei Beliaev, Martin A. Franz, Rene Meessen, Frederick Curtis Furtek
IPC Classification: G06F7/76, G06F5/01, G06F7/487, G06F7/50, G06F7/52, G06F7/523, G06F7/544, G06F9/30, G06F9/38, G06F9/48, G06F9/54, G06F15/173, G06F15/80, H03K19/21
CPC Classification: G06F15/80, G06F5/01, G06F7/487, G06F7/50, G06F7/52, G06F7/523, G06F7/5443, G06F9/30098, G06F9/3856, G06F9/4881, G06F9/54, H03K19/21, G06F2207/382
Abstract: A representative reconfigurable processing circuit and a reconfigurable arithmetic circuit are disclosed, each of which may include input reordering queues; a multiplier shifter and combiner network coupled to the input reordering queues; an accumulator circuit; and a control logic circuit, along with a processor and various interconnection networks. A representative reconfigurable arithmetic circuit has a plurality of operating modes, such as floating point and integer arithmetic modes, logical manipulation modes, Boolean logic, shift, rotate, conditional operations, and format conversion, and is configurable for a wide variety of multiplication modes. Dedicated routing connecting multiplier adder trees allows multiple reconfigurable arithmetic circuits to be reconfigurably combined, in pair or quad configurations, for larger adders, complex multiplies and general sum of products use, for example.
-
Publication No.: US11599367B2
Publication Date: 2023-03-07
Application No.: US16752239
Filing Date: 2020-01-24
Applicant: CORNAMI, INC.
Inventor: Tianfang Liu
Abstract: A system and method to compress application control data, such as weights for a layer of a convolutional neural network, is disclosed. A multi-core system for executing at least one layer of the convolutional neural network includes a storage device storing a compressed weight matrix of a set of weights of the at least one layer of the convolutional network and a decompression matrix. The compressed weight matrix is formed by matrix factorization and quantization of a floating point value of each weight to a floating point format. A decompression module is operable to obtain an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix. A plurality of cores executes the at least one layer of the convolutional neural network with the approximation of weight values to produce an inference output.
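The abstract combines matrix factorization with quantization to a floating point format. The stdlib-only Python sketch below illustrates that general idea and is not the patented method. It assumes a rank-k factorization computed by power iteration with deflation (a stand-in for whatever factorization the patent uses) and assumes IEEE 754 half precision as the target format. The function names are hypothetical. A weight matrix W (m x n) is stored as a compressed factor Uq (m x k) plus a "decompression matrix" Vq (k x n), and decompression is the product Uq·Vq.

```python
import random
import struct

def to_fp16(x: float) -> float:
    """Quantize a float to IEEE 754 half precision (assumed target format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rank_k_factor(W, k, iters=100, seed=0):
    """Approximate W (m x n) as U (m x k) times V (k x n) via power iteration with deflation."""
    rng = random.Random(seed)
    R = [row[:] for row in W]  # residual matrix
    U_cols, V_rows = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(len(W[0]))]
        for _ in range(iters):
            v = matvec(transpose(R), matvec(R, v))  # v <- R^T R v
            norm = sum(x * x for x in v) ** 0.5 or 1.0
            v = [x / norm for x in v]
        u = matvec(R, v)  # u = sigma * (left singular vector)
        U_cols.append(u)
        V_rows.append(v)
        # Deflate: remove the component just found before finding the next one.
        R = [[R[i][j] - u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return transpose(U_cols), V_rows

def compress(W, k):
    """Return (compressed factor, decompression matrix), both quantized to fp16."""
    U, V = rank_k_factor(W, k)
    return ([[to_fp16(x) for x in row] for row in U],
            [[to_fp16(x) for x in row] for row in V])

def decompress(Uq, Vq):
    """Approximate the original weights as the product Uq @ Vq."""
    k, n = len(Vq), len(Vq[0])
    return [[sum(Uq[i][t] * Vq[t][j] for t in range(k)) for j in range(n)]
            for i in range(len(Uq))]
```

For a 3 x 4 rank-1 matrix, `compress(W, 1)` stores 3 + 4 = 7 half-precision values instead of 12 full-precision weights, and `decompress` reconstructs W up to the fp16 rounding error.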
-
Publication No.: US20220058199A1
Publication Date: 2022-02-24
Application No.: US17467231
Filing Date: 2021-09-05
Applicant: Cornami, Inc.
Abstract: Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded ("RLE") data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
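The two-phase scheme in the abstract (parallel partial decompression, then sequencing of dependent data) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method. The token format is hypothetical: `("lit", bytes)` for independent data, and `("ref", distance, length)` for dependent data that copies `length` bytes starting `distance` bytes before the current output position, in the style of LZ77. The sketch also assumes that block boundaries are already known.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tokens: ("lit", bytes) or ("ref", distance, length).

def block_length(block):
    """Decompressed size of a block, computable without resolving any references."""
    return sum(len(t[1]) if t[0] == "lit" else t[2] for t in block)

def partial_decode(block, start):
    """Phase 1 (independent per block): place literals, record dependent copies."""
    literals, refs = [], []
    pos = start
    for tok in block:
        if tok[0] == "lit":
            literals.append((pos, tok[1]))
            pos += len(tok[1])
        else:
            _, distance, length = tok
            refs.append((pos, pos - distance, length))  # (dest, src, length)
            pos += length
    return literals, refs

def parallel_decompress(blocks):
    starts, total = [], 0
    for b in blocks:  # each block's output offset, from token lengths alone
        starts.append(total)
        total += block_length(b)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_decode, blocks, starts))
    out = bytearray(total)
    all_refs = []
    for literals, refs in parts:
        for pos, data in literals:
            out[pos:pos + len(data)] = data
        all_refs.extend(refs)
    # Phase 2 (sequencing): resolve copies in output order, so a copy whose source
    # is itself copied data (a "linked" dependency) always reads resolved bytes.
    for dest, src, length in sorted(all_refs):
        for i in range(length):  # byte-by-byte copying handles overlapping runs
            out[dest + i] = out[src + i]
    return bytes(out)
```

For example, the blocks `[[("lit", b"abc"), ("ref", 3, 6)], [("ref", 9, 4), ("lit", b"!")]]` decode to `b"abcabcabcabca!"`. The second block's reference crosses a block boundary, which is why the sketch defers it to the sequencing phase.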
-
Publication No.: US20210297361A1
Publication Date: 2021-09-23
Application No.: US16825585
Filing Date: 2020-03-20
Applicant: Cornami, Inc.
IPC Classification: H04L12/801, H04L12/861, H04L12/823
Abstract: A method and system for providing robust streaming of data from a multi-core die is disclosed. The techniques include using a high bandwidth memory (HBM) device as retransmit buffers for large amounts of data to ensure robust communication over links with a relatively high round-trip time (RTT). Another technique is to support two or more Ethernet ports between components and to transmit the same data packets on both ports to ensure robustness. Another technique is to use sequence numbers, send data packets from the different ports in a round-robin fashion, and reorder the packets upon receipt at an external device. Another technique is to dynamically add and remove paths for data packets between devices with multiple ports based on the quality of each path.
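The sequence-number technique in the abstract can be illustrated with a small reorder buffer. The Python sketch below is an assumption-laden model, not the patented implementation. The sender tags packets with consecutive sequence numbers and deals them across ports round-robin. The receiver holds out-of-order packets until the gap fills, and it drops duplicates such as a copy of the same packet sent on a second port. The names `round_robin_send` and `ReorderBuffer` are hypothetical.

```python
def round_robin_send(payloads, num_ports):
    """Tag each payload with a sequence number and spread packets over ports in turn."""
    ports = [[] for _ in range(num_ports)]
    for seq, payload in enumerate(payloads):
        ports[seq % num_ports].append((seq, payload))
    return ports

class ReorderBuffer:
    """Receiver side: releases payloads strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number to deliver
        self.pending = {}   # out-of-order packets waiting for the gap to fill

    def receive(self, seq, payload):
        """Accept one packet; return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate, e.g. the same packet also sent on another port
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

If port 2's packets arrive first, followed by ports 0 and 1 and then a redundant copy of port 0's traffic, the receiver still delivers the payloads exactly once and in the original order.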
-
Publication No.: US20210232407A1
Publication Date: 2021-07-29
Application No.: US16752239
Filing Date: 2020-01-24
Applicant: CORNAMI, INC.
Inventor: Tianfang LIU
Abstract: A system and method to compress application control data, such as weights for a layer of a convolutional neural network, is disclosed. A multi-core system for executing at least one layer of the convolutional neural network includes a storage device storing a compressed weight matrix of a set of weights of the at least one layer of the convolutional network and a decompression matrix. The compressed weight matrix is formed by matrix factorization and quantization of a floating point value of each weight to a floating point format. A decompression module is operable to obtain an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix. A plurality of cores executes the at least one layer of the convolutional neural network with the approximation of weight values to produce an inference output.
-
Publication No.: US20170206183A1
Publication Date: 2017-07-20
Application No.: US15476598
Filing Date: 2017-03-31
Applicant: CORNAMI, INC.
Inventors: Solomon Harsha, Paul Master
CPC Classification: G06F15/8046, G06F8/45, G06Q40/00
Abstract: An apparatus, computer-readable medium, and computer-implemented method for parallelization of a computer program on a plurality of computing cores includes receiving a computer program comprising a plurality of commands, decomposing the plurality of commands into a plurality of node networks, each node network corresponding to a command in the plurality of commands and including one or more nodes corresponding to execution dependencies of the command, mapping the plurality of node networks to a plurality of systolic arrays, each systolic array comprising a plurality of cells and each non-data node in each node network being mapped to a cell in the plurality of cells, and mapping each cell in each systolic array to a computing core in the plurality of computing cores.
-
Publication No.: US20240160825A1
Publication Date: 2024-05-16
Application No.: US18054460
Filing Date: 2022-11-10
Applicant: Cornami, Inc.
Inventors: Muthiah Annamalai, Steven Knapp, Syed Ahmed, Paul L. Master, Martin Alan Franz, II, Tu Nghiem
IPC Classification: G06F30/392
CPC Classification: G06F30/392
Abstract: A system and method to create a robust topology of a layout of cores, for performing a function on an array of cores arranged in a grid, is disclosed. A defective-core file giving the locations of defective cores in the array, and an optimal initial topology of a configuration layout of at least some of the cores, are input. The location of at least one defective core of the array is determined. At least some of the cores in the array are assigned to the optimal initial topology. It is determined whether at least one defective core lies within the optimal initial topology. The functions of the cores in the row and the column of the at least one defective core are assigned to additional neighboring cores in the array to create the robust topology.
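One simple reading of the final step is row/column sparing. Under that reading, every physical row and column that contains a defective core is skipped, and the logical layout shifts onto the next good neighbors. The Python sketch below shows only that interpretation; the patent's actual assignment procedure may differ. The function name `robust_topology` and its parameters are hypothetical.

```python
def robust_topology(logical_rows, logical_cols, phys_rows, phys_cols, defective):
    """Map each logical (row, col) of the initial layout to a physical core.

    Assumption: the functions in the row and column of each defective core shift to
    neighboring cores, i.e. every physical row and column that contains a defect is
    skipped. With no defects, this reduces to the identity (initial) topology.
    """
    bad_rows = {r for r, _ in defective}
    bad_cols = {c for _, c in defective}
    rows = [r for r in range(phys_rows) if r not in bad_rows]
    cols = [c for c in range(phys_cols) if c not in bad_cols]
    if len(rows) < logical_rows or len(cols) < logical_cols:
        raise ValueError("not enough defect-free rows/columns for this topology")
    return {(r, c): (rows[r], cols[c])
            for r in range(logical_rows) for c in range(logical_cols)}
```

For a 3 x 3 layout on a 4 x 4 array with one defect at (1, 2), logical row 1 moves to physical row 2 and logical column 2 moves to physical column 3, so logical core (1, 2) lands on physical core (2, 3), and no logical core is placed on the defective core's row or column.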
-
Publication No.: US11693662B2
Publication Date: 2023-07-04
Application No.: US16743257
Filing Date: 2020-01-15
Applicant: Cornami Inc.
CPC Classification: G06F9/3812, G06F7/4806, G06F8/4436, G06F17/142
Abstract: Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code; transforming the FHE logic gate code into a set of code modules that each have an input and an output that is a function of the input and which do not pass control to other functions; creating a node wrapper around each code module; and configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode, wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality of arithmetic logic units.
-
Publication No.: US20220179823A1
Publication Date: 2022-06-09
Application No.: US17681163
Filing Date: 2022-02-25
Applicant: Cornami Inc.
Abstract: Systems and methods for reconfiguring a reduced instruction set computer processor architecture are disclosed. Exemplary implementations may: provide a primary processing core consisting of a RISC processor; provide a node wrapper associated with each of the plurality of secondary cores, the node wrapper comprising access memory associated with each secondary core and a load/unload matrix associated with each secondary core; and operate the architecture in a manner in which, for at least one core, data is read from and written to the at least one cache memory in a control-centric mode, and in which the secondary cores are selectively partitioned to operate in a streaming mode wherein data streams out of the corresponding secondary core into the main memory and other ones of the plurality of secondary cores.
-