-
Publication No.: US20240378259A1
Publication date: 2024-11-14
Application No.: US18144818
Application date: 2023-05-08
Applicant: SambaNova Systems, Inc.
Inventor: Mark William Gottscho, Ram SIVARAMAKRISHNAN, David Brian JACKSON, Ruddhi CHAPHEKAR, Tuowen Zhao, Lei Xia
Abstract: A convolution calculation engine to perform a convolution operation includes a convolution address compute unit. The convolution address compute unit includes an outer output base location register to provide an outer output base location for the convolution operation and an outer input base location register to provide an outer input base location for the convolution operation. It also includes a kernel element counter that starts to count from an initial kernel count value to a maximum kernel count value in response to a change in the outer output base location and a kernel offset generator to generate a kernel offset based on an output of the kernel element counter. In addition, the convolution address compute unit includes inner location logic to calculate an output location based on the outer output base location and an input location based on the outer input base location and output of the kernel element counter.
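The address generation described in this abstract can be sketched in software as follows. This is only an illustrative model, not the claimed hardware: the function names, the row-major 2-D layout, the absence of padding, and the use of `divmod` as the kernel offset generator are all assumptions. The products are accumulated without flipping the kernel, as is common in machine learning frameworks.

```python
def conv_addresses(out_h, out_w, k_h, k_w, stride=1):
    """Yield (output_location, input_location, kernel_offset) triples."""
    for oy in range(out_h):
        for ox in range(out_w):
            out_base = (oy, ox)                    # outer output base location
            in_base = (oy * stride, ox * stride)   # outer input base location
            # The kernel element counter restarts each time the base changes.
            for count in range(k_h * k_w):
                ky, kx = divmod(count, k_w)        # kernel offset generator
                # Inner location logic: combine the base and the kernel offset.
                in_loc = (in_base[0] + ky, in_base[1] + kx)
                yield out_base, in_loc, (ky, kx)


def conv2d(image, kernel, stride=1):
    """Accumulate products using only the generated addresses."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for (oy, ox), (iy, ix), (ky, kx) in conv_addresses(out_h, out_w, k_h, k_w, stride):
        out[oy][ox] += image[iy][ix] * kernel[ky][kx]
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d(image, kernel))  # [[6, 8], [12, 14]]
```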
-
Publication No.: US20240192935A1
Publication date: 2024-06-13
Application No.: US18583845
Application date: 2024-02-21
Applicant: SambaNova Systems, Inc.
Inventor: Raghu PRABHAKAR, David Brian JACKSON, Scott BURSON
CPC classification number: G06F8/441, G06F9/3001, G06F9/44505, G06F15/80
Abstract: A compiler generates a configuration file to configure a fracturable data path in a coarse-grained reconfigurable processor. The compiler analyzes two address calculations to determine the number of pipeline stages for each calculation. The configuration file, when loaded into the reconfigurable processor, enables a fracturable data path in a configurable unit of the reconfigurable processor to produce multiple independent address sequences. The configuration file includes first and second configuration data for distinct sets of computational stages within the pipelined computation stages, allowing the processor to generate a first address sequence using N pipeline stages and a second address sequence using M pipeline stages, where N and M are positive integers.
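As a rough software analogy, a loaded configuration could split one chain of stages into two sub-paths that each produce their own address sequence. The dictionary layout, the `ALU_OPS` table, and the function names below are hypothetical; the abstract does not define a configuration file format or an instruction set. Here N = 2 and M = 3.

```python
import operator

# Hypothetical ALU operations; the source does not list an instruction set.
ALU_OPS = {
    "add": operator.add,
    "mul": operator.mul,
    "shl": operator.lshift,
    "and": operator.and_,
}


def run_stages(stages, value):
    """Pass a value through a chain of (op, immediate) stages."""
    for op, immediate in stages:
        value = ALU_OPS[op](value, immediate)
    return value


def run_fractured(config, counter_values):
    """Split the stage list into two sub-paths and run each independently."""
    n = config["first_stage_count"]
    first, second = config["stages"][:n], config["stages"][n:]
    seq_a = [run_stages(first, i) for i in counter_values]
    seq_b = [run_stages(second, i) for i in counter_values]
    return seq_a, seq_b


config = {
    "first_stage_count": 2,  # N stages for the first sequence
    "stages": [("mul", 4), ("add", 0x100),             # sub-path 1
               ("shl", 1), ("add", 3), ("and", 0x7)],  # sub-path 2 (M = 3)
}
seq_a, seq_b = run_fractured(config, range(4))
print(seq_a)  # [256, 260, 264, 268]
print(seq_b)  # [3, 5, 7, 1]
```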
-
Publication No.: US20230367844A1
Publication date: 2023-11-16
Application No.: US18225339
Application date: 2023-07-24
Applicant: SambaNova Systems, Inc.
Inventor: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
IPC: G06F17/16
CPC classification number: G06F17/16
Abstract: A computing method comprises generating an integrated matrix having (K+P) number of columns, columns 1 through K of the integrated matrix comprising columns 1 through K of a multiplicand matrix and columns (K+1) through (K+P) of the integrated matrix comprising addend columns. The method computes K number of products of elements of a row of the integrated matrix multiplied by elements of a column of a second multiplicand matrix; computes a (K+1) product comprising an element of an addend column multiplied by a constant; and computes a sum of the K number of products added to the (K+1) product. The sum is equivalent to a sum of products of a column of the M×K matrix multiplied by a row of the K×N matrix added to an element of an addend column of the integrated matrix. A computing system and a computer program product can implement the method.
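For a single output element, the computation can be illustrated with a short sketch. The function names are hypothetical, and the constant is assumed to be 1 so that the extra product simply adds the addend element to the dot product.

```python
def integrated_row(multiplicand_row, addend_elements):
    """Append the addend elements to a row of the multiplicand matrix."""
    return list(multiplicand_row) + list(addend_elements)


def integrated_sum(row, column, constant=1):
    """Sum the K ordinary products plus each addend element times a constant."""
    k = len(column)
    products = [row[i] * column[i] for i in range(k)]  # K products
    products += [a * constant for a in row[k:]]        # addend products
    return sum(products)


row = integrated_row([1, 2, 3], [10])
print(integrated_sum(row, [4, 5, 6]))  # 42, i.e. (1*4 + 2*5 + 3*6) + 10
```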
-
Publication No.: US20230229623A1
Publication date: 2023-07-20
Application No.: US18099218
Application date: 2023-01-19
Applicant: SambaNova Systems, Inc.
Inventor: Raghu PRABHAKAR, David Brian JACKSON
CPC classification number: G06F15/80, G06F9/3001
Abstract: A coarse-grained reconfigurable (CGR) processor includes a configurable unit comprising a fracturable data path with a plurality of sub-paths. The fracturable data path includes multiple stages that each include an arithmetic logic unit (ALU), selection logic to select two or more inputs for the ALU, and sub-path pipeline registers. The fracturable data path also includes a first output configurable to provide first data selected from any one of the sub-path pipeline registers and a second output configurable to provide second data selected from any one of the sub-path pipeline registers. The configurable unit includes a configuration store to store configuration data that provides two or more immediate data fields for each stage of the fracturable data path, configuration information for the ALUs and the selection logic, and the selection of the first data and the second data for the first output and the second output.
-
Publication No.: US20230229407A1
Publication date: 2023-07-20
Application No.: US18099214
Application date: 2023-01-19
Applicant: SambaNova Systems, Inc.
Inventor: Raghu PRABHAKAR, David Brian JACKSON, Scott BURSON
CPC classification number: G06F8/441, G06F9/44505
Abstract: A compiler produces a configuration file to configure a fracturable data path of a configurable unit in a coarse-grained reconfigurable processor to concurrently generate different address sequences using different address calculations associated with different operations. The fracturable data path includes multiple computation stages, each including a pipeline register. The compiler analyzes a first address calculation and a second address calculation and, based on the analysis, assigns a first set of stages to the first operation to generate the first address sequence and a second set of stages to the second operation to generate the second address sequence. The compiler generates a configuration file for the configurable unit that assigns the first set of stages to the first operation and the second set of stages to the second operation and includes two or more immediate values for each computation stage.
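The stage assignment step might look roughly like the sketch below. Everything in it is an assumption made for illustration: each address calculation is represented as a list of `(alu_op, immediates)` pairs, one pair per required stage, and the first calculation is simply given the lowest-numbered stages. The abstract does not describe the analysis or the configuration file layout in this detail.

```python
def assign_stages(first_calc, second_calc, total_stages):
    """Give each address calculation its own disjoint set of stages."""
    n, m = len(first_calc), len(second_calc)
    if n + m > total_stages:
        raise ValueError("address calculations need more stages than available")
    stages = []
    for index, (alu_op, immediates) in enumerate(first_calc + second_calc):
        stages.append({
            "stage": index,
            "operation": "first" if index < n else "second",
            "alu_op": alu_op,
            "immediates": tuple(immediates),  # two immediate values per stage
        })
    return {
        "first_stages": list(range(n)),
        "second_stages": list(range(n, n + m)),
        "stages": stages,
    }


first_calc = [("mul", (4, 0)), ("add", (256, 0))]
second_calc = [("shl", (1, 0)), ("add", (3, 0)), ("and", (7, 0))]
config = assign_stages(first_calc, second_calc, total_stages=6)
print(config["first_stages"], config["second_stages"])  # [0, 1] [2, 3, 4]
```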
-
Publication No.: US20220027308A1
Publication date: 2022-01-27
Application No.: US17492403
Application date: 2021-10-01
Applicant: SambaNova Systems, Inc.
Inventor: Raghu PRABHAKAR, Manish K. SHAH, Ram SIVARAMAKRISHNAN, Pramod NATARAJA, David Brian JACKSON, Gregory Frederick GROHOSKI
Abstract: A processing system comprises a control bus and a plurality of logic units. The control bus is configurable by configuration data to form signal routes in a control barrier network coupled to processing units in an array of processing units. The plurality of logic units has inputs and outputs connected to the control bus and to the array of processing units. A logic unit in the plurality of logic units is operatively coupled to a processing unit in the array of processing units and is configurable by the configuration data to consume source tokens and a status signal from the processing unit on the inputs and to produce barrier tokens and an enable signal on the outputs based on the source tokens and the status signal on the inputs.
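A toy behavioral model of one such logic unit is sketched below. The firing rule is an assumption made for illustration: the unit fires when every configured source has delivered at least one token and the status signal is asserted, then consumes one token per source and emits a barrier token together with an enable signal. The class and method names are hypothetical.

```python
class BarrierLogicUnit:
    """Toy model; the firing rule is an assumption, not taken from the source."""

    def __init__(self, sources):
        # One pending-token counter per configured source.
        self.pending = {name: 0 for name in sources}

    def receive(self, source):
        """Record a source token arriving on an input."""
        self.pending[source] += 1

    def evaluate(self, status):
        """Return (barrier_token, enable) for the current inputs."""
        ready = status and all(count > 0 for count in self.pending.values())
        if ready:
            for name in self.pending:
                self.pending[name] -= 1  # consume one token per source
        return ready, ready


unit = BarrierLogicUnit(["producer_a", "producer_b"])
unit.receive("producer_a")
print(unit.evaluate(status=True))  # (False, False): producer_b has not arrived
unit.receive("producer_b")
print(unit.evaluate(status=True))  # (True, True)
```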
-
Publication No.: US20240378147A1
Publication date: 2024-11-14
Application No.: US18144819
Application date: 2023-05-08
Applicant: SambaNova Systems, Inc.
Inventor: Mark William Gottscho, Ram SIVARAMAKRISHNAN, David Brian JACKSON, Ruddhi CHAPHEKAR, Tuowen Zhao, Lei Xia
Abstract: A convolution calculation engine includes a kernel element counter for a convolution operation between a kernel and an input tensor. The kernel element counter wraps back to an initial kernel count value after reaching a maximum kernel count value. The convolution calculation engine also includes an offset look-up table (LUT) that provides a relative input offset into the input tensor based on an output of the kernel element counter and input location calculation logic that provides an input location within an input tensor for the convolution operation based on the relative input offset provided by the offset LUT.
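A minimal sketch of the wrapping counter and the offset look-up table follows. It assumes flat, row-major input addresses and a table precomputed from the kernel shape, input width, and dilation; these choices and all names are illustrative and are not taken from the source.

```python
def build_offset_lut(k_h, k_w, input_width, dilation=1):
    """Relative flat offset into a row-major input for each kernel element."""
    return [ky * dilation * input_width + kx * dilation
            for ky in range(k_h) for kx in range(k_w)]


class KernelElementCounter:
    """Counts from an initial value to a maximum value, then wraps back."""

    def __init__(self, max_count, initial=0):
        self.initial = initial
        self.max_count = max_count
        self.value = initial

    def advance(self):
        """Return the current count, then step (wrapping at max_count)."""
        current = self.value
        self.value = self.initial if current == self.max_count else current + 1
        return current


def input_locations(base, lut, steps):
    """Input location = base location + LUT entry selected by the counter."""
    counter = KernelElementCounter(max_count=len(lut) - 1)
    return [base + lut[counter.advance()] for _ in range(steps)]


lut = build_offset_lut(2, 2, input_width=5)
print(lut)                         # [0, 1, 5, 6]
print(input_locations(7, lut, 6))  # [7, 8, 12, 13, 7, 8]
```

Because the offsets come from a table rather than from arithmetic on the count, irregular patterns such as dilated kernels only change the table contents, not the lookup logic.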
-
Publication No.: US20230367845A1
Publication date: 2023-11-16
Application No.: US18225365
Application date: 2023-07-24
Applicant: SambaNova Systems, Inc.
Inventor: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
IPC: G06F17/16
CPC classification number: G06F17/16
Abstract: A method comprises executing (K+P) number of transposition cycles to generate a transpose-extended matrix having N rows and (K+P) columns, in which columns 1 to K comprise a transposition of a first matrix having K rows and N columns, and columns (K+1) to (K+P) comprise constants or elements of an N×1 matrix. The method includes computing a sum-product of a row of a second matrix, having M rows and N columns, multiplied by a column among columns 1 to K of the transpose-extended matrix; and, computing a second sum-product of the row of the second matrix multiplied by a column among columns (K+1) to (K+P) of the transpose-extended matrix. The sum-products can comprise gradients of input matrices. A transpose processing unit can execute the transposition cycles to read K rows of the first matrix and insert P number of constant or N×1 columns to generate the transpose-extended matrix.
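The construction of the transpose-extended matrix can be sketched with plain Python lists. Each loop iteration stands in for one transposition cycle; the function names and the choice of a column of ones as the inserted column are assumptions for illustration.

```python
def transpose_extend(first, extra_columns):
    """Build an N x (K+P) matrix from a K x N matrix and P columns of length N."""
    n = len(first[0])
    extended = [[] for _ in range(n)]
    # K cycles: each reads one row of `first` and writes it as a column.
    for row in first:
        for j in range(n):
            extended[j].append(row[j])
    # P cycles: each inserts one constant or N x 1 column.
    for column in extra_columns:
        for j in range(n):
            extended[j].append(column[j])
    return extended


def sum_product(second_row, extended, column_index):
    """Multiply a row of the second matrix by one column of the extended matrix."""
    return sum(x * extended[j][column_index] for j, x in enumerate(second_row))


first = [[1, 2, 3], [4, 5, 6]]                   # K = 2 rows, N = 3 columns
extended = transpose_extend(first, [[1, 1, 1]])  # P = 1 column of ones
print(extended)                                  # [[1, 4, 1], [2, 5, 1], [3, 6, 1]]
print([sum_product([1, 0, 2], extended, c) for c in range(3)])  # [7, 16, 3]
```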
-
Publication No.: US20230195686A1
Publication date: 2023-06-22
Application No.: US18109817
Application date: 2023-02-14
Applicant: SambaNova Systems, Inc.
Inventor: Raghu PRABHAKAR, Manish K. SHAH, Ram SIVARAMAKRISHNAN, Pramod NATARAJA, David Brian JACKSON, Gregory Frederick GROHOSKI
IPC: G06F15/78, G06F13/20, G06F15/80, G06F9/52, G06F15/173
CPC classification number: G06F15/7867, G06F13/20, G06F15/80, G06F9/522, G06F15/17325, G06F2213/40
Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.
-
Publication No.: US20240256631A1
Publication date: 2024-08-01
Application No.: US18102658
Application date: 2023-01-27
Applicant: SambaNova Systems, Inc.
Inventor: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
IPC: G06F17/16
CPC classification number: G06F17/16
Abstract: A computing method comprises combining an M×K multiplicand matrix and P number of addend vectors to generate an M×(K+P) integrated matrix. The addend vectors can comprise a vector of constants and/or a column of an addend matrix. The method further comprises generating a row-extended matrix comprising a K×N multiplicand matrix and P rows of a constant vector. The method computes (K+P) products of a row of the integrated matrix multiplied by a column of the row-extended matrix and computes an integrated sum of the products. A multiply-accumulate computation can compute the integrated sum and is equivalent to a sum of K number of products of a column of the M×K matrix multiplied by a row of the K×N multiplicand matrix and added to the P number of addend vectors. A computing system can implement the method and can include a matrix computation unit.
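At the whole-matrix level, the equivalence can be checked with a small sketch. The helper names are hypothetical, and the constant vector is assumed to be all ones, so that a single matrix multiplication yields the product plus the addend vectors.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def integrate(a, addend_vectors):
    """M x K matrix plus P addend vectors of length M -> M x (K+P) matrix."""
    return [list(row) + [v[i] for v in addend_vectors] for i, row in enumerate(a)]


def row_extend(b, p, constant=1):
    """K x N matrix plus P rows of a constant vector -> (K+P) x N matrix."""
    n = len(b[0])
    return [list(row) for row in b] + [[constant] * n for _ in range(p)]


a = [[1, 2], [3, 4]]   # M x K
b = [[5, 6], [7, 8]]   # K x N
addends = [[10, 20]]   # P = 1 addend vector of length M

fused = matmul(integrate(a, addends), row_extend(b, p=len(addends)))
print(fused)           # [[29, 32], [63, 70]]

# Reference: ordinary product followed by a separate addition.
separate = [[v + addends[0][i] for v in row] for i, row in enumerate(matmul(a, b))]
print(fused == separate)  # True
```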