-
1.
Publication number: US11966857B2
Publication date: 2024-04-23
Application number: US17223921
Filing date: 2021-04-06
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
CPC classification: G06N5/04, G06F9/5027, G06F17/16, G06N20/00
Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post-processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post-processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block, instead of streaming the data from an on-chip memory (OCM); perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables; and stream the post-processing result of the per-element tanh and/or sigmoid operation back to the OCM once the operation is complete.
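The per-element lookup-table approach this abstract describes can be sketched in software. The table size, input range, and nearest-entry indexing below are illustrative assumptions, not details taken from the patent:

```python
import math

# Hypothetical sketch: per-element tanh via a precomputed lookup table,
# as an inline post-processing unit might apply it. Table size and
# input range are illustrative choices.
TABLE_SIZE = 1024
X_MIN, X_MAX = -4.0, 4.0
STEP = (X_MAX - X_MIN) / (TABLE_SIZE - 1)
TANH_LUT = [math.tanh(X_MIN + i * STEP) for i in range(TABLE_SIZE)]

def tanh_lut(x: float) -> float:
    """Approximate tanh(x) by nearest-entry lookup; saturates outside the range."""
    if x <= X_MIN:
        return TANH_LUT[0]
    if x >= X_MAX:
        return TANH_LUT[-1]
    idx = round((x - X_MIN) / STEP)
    return TANH_LUT[idx]

def postprocess(block_output: list[float]) -> list[float]:
    # Apply the operation on a per-element basis, then "stream" the
    # result back (here: simply return the list).
    return [tanh_lut(v) for v in block_output]
```

A larger table or linear interpolation between adjacent entries would trade area for accuracy; the nearest-entry scheme keeps the per-element cost to one index computation and one read.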
-
2.
Publication number: US11687136B2
Publication date: 2023-06-27
Application number: US17726924
Filing date: 2022-04-22
Applicant: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Srinivas Sripada, Ramacharan Sundararaman, Chia-Hsin Chen, Nikhil Jayakumar
Abstract: A power throttling engine includes a register configured to receive a power throttling signal, whose value specifies the amount of power throttling applied to a device. The engine further includes a decoder configured to generate a vector based on the value of the power throttling signal, and clock-gating logic configured to receive both the vector and a clocking signal. The clock-gating logic removes clock edges from the clocking signal based on the vector to generate a throttled clocking signal.
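The decode-then-gate flow can be sketched as follows; the 8-edge window and the evenly spread decoder mapping are hypothetical choices, not taken from the patent:

```python
# Hypothetical sketch of vector-based clock gating: a throttle level
# selects how many edges per 8-cycle window are suppressed. The window
# size and the decoder mapping are illustrative.

def decode(throttle: int, window: int = 8) -> list[int]:
    """Decode a throttle level (0..window) into an enable vector:
    1 keeps the clock edge, 0 gates (removes) it."""
    assert 0 <= throttle <= window
    keep = window - throttle
    # Spread the kept edges as evenly as the window allows.
    return [1 if (i * keep) // window < ((i + 1) * keep) // window else 0
            for i in range(window)]

def gate_clock(clock_edges: list[int], enable: list[int]) -> list[int]:
    """AND each incoming clock edge with the repeating enable vector."""
    n = len(enable)
    return [edge & enable[i % n] for i, edge in enumerate(clock_edges)]
```

Spreading the suppressed edges across the window, rather than gating a contiguous burst, keeps the throttled clock's duty pattern smoother; real gating logic would also have to emit glitch-free gated clocks, which this software model ignores.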
-
3.
Publication number: US20220188109A1
Publication date: 2022-06-16
Application number: US17686682
Filing date: 2022-03-04
Applicant: Marvell Asia Pte Ltd
Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
Abstract: A method includes receiving input data at a floating point arithmetic unit configured to perform a floating point arithmetic operation on that data. Before performing the operation, the method determines whether the received input data is a qNaN (quiet not-a-number) or an sNaN (signaling not-a-number). If it is either, the method converts the value of the input data to a modified value before performing the operation, where the conversion eliminates the special handling otherwise associated with performing the floating point arithmetic operation on a qNaN or sNaN input.
-
4.
Publication number: US20210318740A1
Publication date: 2021-10-14
Application number: US16947446
Filing date: 2020-07-31
IPC classification: G06F1/3203, G06F1/10, G06F1/08
Abstract: A system includes a first and a second group of cores in a multicore system, each core configured to process data. Each core in the first/second group enters an idle state after being idle for a first/second period of time, respectively. Every idle core in the first/second group transitions out of the idle state and into an operational mode upon receiving a signal having a first/second value, respectively, provided it also has a pending operation to process.
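The two-group idle/wake behavior can be sketched as a small state machine; the thresholds, wake-signal values, and field names below are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical sketch of the described behavior: a core enters idle
# after its group's idle threshold and wakes only when it sees its
# group's wake value AND has pending work. All constants are illustrative.

THRESHOLD = {1: 100, 2: 1000}   # group-specific idle periods (cycles)
WAKE_VALUE = {1: 0xA, 2: 0xB}   # group-specific wake signal values

@dataclass
class Core:
    group: int            # 1 or 2
    idle_cycles: int = 0
    idle: bool = False
    pending_ops: int = 0

def tick(core: Core) -> None:
    """Advance one cycle with no work; enter idle past the group threshold."""
    core.idle_cycles += 1
    if core.idle_cycles >= THRESHOLD[core.group]:
        core.idle = True

def on_signal(core: Core, value: int) -> None:
    """Wake only if the signal matches the group AND work is pending."""
    if core.idle and value == WAKE_VALUE[core.group] and core.pending_ops > 0:
        core.idle = False
        core.idle_cycles = 0
```

Requiring both the matching signal value and a pending operation means a broadcast wake signal does not needlessly power up cores with nothing to do.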
-
5.
Publication number: US20230096994A1
Publication date: 2023-03-30
Application number: US18075678
Filing date: 2022-12-06
Applicant: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
IPC classification: G06N20/00
Abstract: A method of converting data stored in memory from a first format to a second format is disclosed. The method extends the number of bits in data stored in a double data rate (DDR) memory by one bit to form extended data, and determines whether the data stored in the DDR is signed or unsigned. If the data is signed, a sign value is placed in the most significant bit of the extended data and the data is copied into its lower-order bits. If the data is unsigned, the data is copied into the lower-order bits and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
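The one-bit widening can be sketched directly on raw bit patterns; the 8-bit input width and function name are illustrative assumptions:

```python
# Hypothetical sketch of the described conversion: widen an n-bit raw
# value to n+1 bits, replicating the sign bit for signed data and
# prepending 0 for unsigned data. The 8-bit width is illustrative.

def extend_by_one(value: int, bits: int, signed: bool) -> int:
    """Return the (bits+1)-wide encoding of an n-bit raw value."""
    assert 0 <= value < (1 << bits)
    if signed:
        sign = (value >> (bits - 1)) & 1   # copy the existing sign bit
        return (sign << bits) | value      # MSB = sign, low bits = data
    return value                           # MSB = 0, low bits = data
```

Widening both signed and unsigned inputs into one common (n+1)-bit signed format lets downstream arithmetic treat every operand uniformly, since every unsigned n-bit value is representable as a non-negative (n+1)-bit signed value.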
-
6.
Publication number: US11494676B2
Publication date: 2022-11-08
Application number: US17247826
Filing date: 2020-12-23
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post-processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post-processing unit is further configured to accept data from a set of registers maintaining output from a processing block, instead of streaming the data from an on-chip memory (OCM); perform the one or more non-linear mathematical operations on elements of the data via their corresponding lookup tables; and stream the post-processing result back to the OCM once the operations are complete.
-
7.
Publication number: US20220244767A1
Publication date: 2022-08-04
Application number: US17726924
Filing date: 2022-04-22
Applicant: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Srinivas Sripada, Ramacharan Sundararaman, Chia-Hsin Chen, Nikhil Jayakumar
Abstract: A power throttling engine includes a register configured to receive a power throttling signal, whose value specifies the amount of power throttling applied to a device. The engine further includes a decoder configured to generate a vector based on the value of the power throttling signal, and clock-gating logic configured to receive both the vector and a clocking signal. The clock-gating logic removes clock edges from the clocking signal based on the vector to generate a throttled clocking signal.
-
8.
Publication number: US20220188108A1
Publication date: 2022-06-16
Application number: US17686676
Filing date: 2022-03-04
Applicant: Marvell Asia Pte Ltd
Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
Abstract: A method includes receiving input data at a floating point arithmetic unit configured to perform a floating point arithmetic operation on that data to generate an output result. The method determines whether the output result will cause a floating point hardware exception in response to the operation. If so, the method converts the value of the output result to a modified value, where the modified value eliminates the floating point hardware exception.
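The result-side conversion can be sketched by saturating overflowed results and canonicalizing invalid ones; these particular policies, and the single-precision limit used, are illustrative assumptions rather than the patent's actual method:

```python
import math

# Hypothetical sketch: before committing an FP result, detect whether
# it would trigger a hardware exception (overflow or invalid operation
# here) and substitute a modified, representable value. Saturating to
# the largest finite single-precision value is an illustrative policy.

F32_MAX = 3.4028234663852886e38  # largest finite single-precision value

def resolve_exception(result: float) -> float:
    """Map exception-causing results to safe, representable values."""
    if math.isnan(result):                      # invalid-operation case
        return float('nan')                     # pass through as quiet NaN
    if math.isinf(result) or abs(result) > F32_MAX:
        return math.copysign(F32_MAX, result)   # overflow: saturate
    return result                               # ordinary result: unchanged
```

With the exceptional cases resolved to in-range values up front, the downstream pipeline never has to trap or stall on a hardware exception.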
-
9.
Publication number: US20210342734A1
Publication date: 2021-11-04
Application number: US16862549
Filing date: 2020-04-29
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
IPC classification: G06N20/00
Abstract: A method of converting data stored in memory from a first format to a second format is disclosed. The method extends the number of bits in data stored in a double data rate (DDR) memory by one bit to form extended data, and determines whether the data stored in the DDR is signed or unsigned. If the data is signed, a sign value is placed in the most significant bit of the extended data and the data is copied into its lower-order bits. If the data is unsigned, the data is copied into the lower-order bits and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
-
10.
Publication number: US11995569B2
Publication date: 2024-05-28
Application number: US17223921
Filing date: 2021-04-06
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
CPC classification: G06N5/04, G06F9/5027, G06F17/16, G06N20/00
Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post-processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post-processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block, instead of streaming the data from an on-chip memory (OCM); perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables; and stream the post-processing result of the per-element tanh and/or sigmoid operation back to the OCM once the operation is complete.