-
1.
Publication No.: US11966857B2
Publication date: 2024-04-23
Application No.: US17223921
Filing date: 2021-04-06
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
CPC classification: G06N5/04, G06F9/5027, G06F17/16, G06N20/00
Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream the post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
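As a rough illustration of the per-element lookup-table idea in this abstract, the sketch below builds a table for tanh or sigmoid and applies it element by element. The table size (256), the input range [-4, 4], the nearest-entry indexing, and the names build_lut and lut_apply are all assumptions made for the example; the abstract does not specify any of them.

```python
import math

def build_lut(fn, lo=-4.0, hi=4.0, size=256):
    """Sample fn at `size` evenly spaced points in [lo, hi] (assumed layout)."""
    step = (hi - lo) / (size - 1)
    return [fn(lo + i * step) for i in range(size)]

def lut_apply(x, lut, lo=-4.0, hi=4.0):
    """Approximate fn(x) by the nearest table entry; inputs are clamped to [lo, hi]."""
    x = min(max(x, lo), hi)
    idx = round((x - lo) / (hi - lo) * (len(lut) - 1))
    return lut[idx]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

tanh_lut = build_lut(math.tanh)
sigmoid_lut = build_lut(sigmoid)

# Per-element post-processing of a block of outputs (hypothetical data).
block_output = [-2.0, -0.5, 0.0, 0.5, 2.0]
activated = [lut_apply(v, tanh_lut) for v in block_output]
```

A larger table or interpolation between neighbouring entries would trade memory for accuracy; the nearest-entry form is used here only because it is the simplest.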
-
2.
Publication No.: US20230205540A1
Publication date: 2023-06-29
Application No.: US18115206
Filing date: 2023-02-28
Applicant: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Gopal Nalamalapu
CPC classification: G06F9/3851, G06F9/52, G06F9/3836, G06F9/4881, G06F15/80, G06F9/38, G06F15/76
Abstract: A new approach for supporting tag-based synchronization among different tasks of a machine learning (ML) operation. When a first task tagged with a set tag indicating that one or more subsequent tasks need to be synchronized with it is received at an instruction streaming engine, the engine saves the set tag in a tag table and transmits instructions of the first task to a set of processing tiles for execution. When a second task having an instruction sync tag indicating that it needs to be synchronized with one or more prior tasks is received at the engine, the engine matches the instruction sync tag with the set tags in the tag table to identify prior tasks that the second task depends on. The engine holds instructions of the second task until these matching prior tasks have been completed and then releases the instructions to the processing tiles for execution.
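The hold-and-release behaviour described in this abstract can be sketched in software as follows. This is only an illustrative model: the class name, the use of task names as identifiers, and the explicit complete() callback are assumptions, and real hardware would track completion signals from the processing tiles rather than method calls.

```python
class InstructionStreamingEngine:
    """Toy model of tag-based task synchronization (hypothetical API)."""

    def __init__(self):
        self.tag_table = {}  # set tag -> names of unfinished tasks carrying it
        self.held = {}       # held task name -> unfinished prior tasks it waits on
        self.released = []   # task names in the order sent to the tiles

    def receive(self, name, set_tag=None, sync_tag=None):
        # Match the sync tag against set tags of prior, still-running tasks.
        deps = set(self.tag_table.get(sync_tag, ())) if sync_tag is not None else set()
        if set_tag is not None:
            self.tag_table.setdefault(set_tag, set()).add(name)
        if deps:
            self.held[name] = deps          # hold until dependencies finish
        else:
            self.released.append(name)      # no pending match: release now

    def complete(self, name):
        # Mark a task finished and release any held task whose dependencies are done.
        for names in self.tag_table.values():
            names.discard(name)
        for task in list(self.held):
            self.held[task].discard(name)
            if not self.held[task]:
                del self.held[task]
                self.released.append(task)

engine = InstructionStreamingEngine()
engine.receive("load_weights", set_tag=1)
engine.receive("matmul", sync_tag=1)   # held: depends on load_weights
engine.receive("unrelated")            # released immediately
order_before = list(engine.released)
engine.complete("load_weights")        # matmul is released here
order_after = list(engine.released)
```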
-
Publication No.: US11687136B2
Publication date: 2023-06-27
Application No.: US17726924
Filing date: 2022-04-22
Applicant: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Srinivas Sripada, Ramacharan Sundararaman, Chia-Hsin Chen, Nikhil Jayakumar
Abstract: A power throttling engine includes a register configured to receive a power throttling signal. The power throttling engine further includes a decoder configured to generate a vector based on a value of the power throttling signal. The value of the power throttling signal is an amount of power throttling of a device. The power throttling engine further includes a clock gating logic configured to receive the vector and further configured to receive a clocking signal. The clock gating logic is configured to remove clock edges of the clocking signal based on the vector to generate a throttled clocking signal.
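One way to picture the decoder and clock gating logic in this abstract is as a bit mask applied cyclically to a stream of clock edges. The sketch below is a software analogy under stated assumptions: an 8-bit vector, a throttle level interpreted as "edges dropped per 8-edge window", and an even spreading of the dropped edges. None of these details come from the abstract.

```python
def decode_throttle(level, width=8):
    """Decode a throttle level into a width-bit vector; 0 marks an edge to drop.

    Assumption: `level` is the number of edges removed per `width` edges,
    spread as evenly as possible across the window.
    """
    if not 0 <= level <= width:
        raise ValueError("level must be between 0 and width")
    vector, acc = [], 0
    for _ in range(width):
        acc += level
        if acc >= width:
            acc -= width
            vector.append(0)   # gate this edge
        else:
            vector.append(1)   # pass this edge
    return vector

def gate_clock(edges, vector):
    """Keep only the clock edges whose position in the repeating vector is 1."""
    return [e for i, e in enumerate(edges) if vector[i % len(vector)]]

vector = decode_throttle(2)
throttled = gate_clock(list(range(16)), vector)  # 16 edges in, 12 edges out
```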
-
Publication No.: US11507170B1
Publication date: 2022-11-22
Application No.: US17086264
Filing date: 2020-10-30
Inventors: Atul Bhattarai, Srinivas Sripada, Avinash Sodani, Michael Dudek, Darren Walworth, Roshan Fernando, James Irvine, Mani Gopal
IPC classification: G06F1/3206, G06F11/30
Abstract: A system includes a multicore chip configured to perform machine learning (ML) operations. The system also includes a power monitoring module configured to measure power consumption of the multicore chip on a main power rail of the multicore chip. The power monitoring module is further configured to assert a signal in response to the measured power consumption exceeding a first threshold. The power monitoring module is further configured to transmit the asserted signal to a power throttling module to initiate a power throttling for the multicore chip.
-
Publication No.: US20220188109A1
Publication date: 2022-06-16
Application No.: US17686682
Filing date: 2022-03-04
Applicant: Marvell Asia Pte Ltd
Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
Abstract: A method includes receiving input data at a floating point arithmetic operating unit, wherein the floating point operating unit is configured to perform a floating point arithmetic operation on the input data. The method includes determining whether the received input data is a qNaN (quiet not-a-number) or an sNaN (signaling not-a-number) prior to performing the floating point arithmetic operation. The method also includes converting a value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is either qNaN or sNaN, wherein the converting eliminates special handling associated with the floating point arithmetic operation on the input data being either qNaN or sNaN.
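The check-and-convert step in this abstract can be illustrated on raw IEEE 754 binary32 bit patterns. In the sketch below, the quiet/signaling distinction follows the common convention that the top fraction bit is set for a quiet NaN. The replacement value (positive zero) and the function names are assumptions; the abstract does not say what the modified value is.

```python
EXP_MASK = 0x7F800000   # binary32 exponent field
FRAC_MASK = 0x007FFFFF  # binary32 fraction field
QUIET_BIT = 0x00400000  # top fraction bit: set for quiet NaN (common convention)

def classify(bits):
    """Classify a 32-bit pattern as 'qnan', 'snan', or 'other'."""
    if (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK):
        return "qnan" if bits & QUIET_BIT else "snan"
    return "other"  # includes zeros, normals, subnormals, and infinities

def sanitize(bits, replacement=0x00000000):
    """Replace any NaN input before the arithmetic unit sees it.

    Assumption: NaNs are mapped to +0.0; other inputs pass through unchanged.
    """
    return replacement if classify(bits) != "other" else bits

inputs = [0x3F800000, 0x7FC00000, 0x7F800001, 0x7F800000]  # 1.0, qNaN, sNaN, +inf
cleaned = [sanitize(b) for b in inputs]
```

With every NaN rewritten up front, the downstream arithmetic path in this model never has to branch on NaN operands.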
-
Publication No.: US11210105B1
Publication date: 2021-12-28
Application No.: US17087556
Filing date: 2020-11-02
Inventor: Avinash Sodani
Abstract: A system to support data gathering for a machine learning (ML) operation comprises a memory unit configured to maintain data for the ML operation in a plurality of memory blocks each accessible via a memory address. The system further comprises an inference engine comprising a plurality of processing tiles each comprising one or more of an on-chip memory (OCM) configured to load and maintain data for local access by components in the processing tile. The system also comprises a core configured to program components of the processing tiles of the inference engine according to an instruction set architecture (ISA) and a data streaming engine configured to stream data between the memory unit and the OCMs of the processing tiles of the inference engine, wherein the data streaming engine is configured to perform a data gathering operation via a single data gathering instruction of the ISA at the same time.
-
Publication No.: US20210318740A1
Publication date: 2021-10-14
Application No.: US16947446
Filing date: 2020-07-31
IPC classification: G06F1/3203, G06F1/10, G06F1/08
Abstract: A system includes a first and a second group of cores in a multicore system. Each core of the first/second group is configured to process data. Each core within the first/second group is configured to enter into an idle state in response to being idle for a first/second period of time respectively. Every idle core in the first/second group is configured to transition out of the idle state and into an operational mode in response to receiving a signal having a first/second value respectively and further in response to having a pending operation to process.
-
Publication No.: US11995463B2
Publication date: 2024-05-28
Application No.: US17237752
Filing date: 2021-04-22
IPC classification: G06F9/48, G06F3/06, G06F9/52, G06N20/00, G06F9/30, G06F9/38, G06F15/78, G06F15/80, G06F17/16, G06N5/04
CPC classification: G06F9/4818, G06F3/0604, G06F3/0659, G06F3/0673, G06F9/4881, G06F9/52, G06N20/00, G06F9/30018, G06F9/30087, G06F9/3869, G06F9/3871, G06F9/522, G06F15/7807, G06F15/7846, G06F15/8053, G06F17/16, G06N5/04
Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, so that each processing tile waits for its current task to finish before starting a new one.
-
Publication No.: US11977963B2
Publication date: 2024-05-07
Application No.: US18075678
Filing date: 2022-12-06
Applicant: Marvell Asia Pte Ltd
Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
IPC classification: G06N20/00
CPC classification: G06N20/00
Abstract: A method of converting data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR memory is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
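The one-bit extension in this abstract amounts to sign extension for signed data and zero extension for unsigned data. A minimal sketch follows, assuming two's-complement inputs, that the "sign value" is a copy of the input's own sign bit, and hypothetical helper names. With 8-bit inputs, the 9-bit result can represent both int8 and uint8 values in one signed format.

```python
def extend_by_one_bit(raw, width, signed):
    """Extend a width-bit raw value to width + 1 bits.

    Signed data: the new most significant bit copies the original sign bit.
    Unsigned data: the new most significant bit is zero.
    The original bits are copied unchanged into the lower-order positions.
    """
    if not 0 <= raw < (1 << width):
        raise ValueError("raw does not fit in width bits")
    sign = (raw >> (width - 1)) & 1 if signed else 0
    return (sign << width) | raw

def as_signed(raw, width):
    """Interpret a width-bit raw value as a two's-complement integer."""
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

# 0x80 is -128 as int8 and 128 as uint8; both survive as 9-bit signed values.
ext_signed = extend_by_one_bit(0x80, 8, signed=True)     # 0x180
ext_unsigned = extend_by_one_bit(0x80, 8, signed=False)  # 0x080
```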
-
Publication No.: US11934863B2
Publication date: 2024-03-19
Application No.: US17237752
Filing date: 2021-04-22
IPC classification: G06F9/48, G06F3/06, G06F9/52, G06N20/00, G06F9/30, G06F9/38, G06F15/78, G06F15/80, G06F17/16, G06N5/04
CPC classification: G06F9/4818, G06F3/0604, G06F3/0659, G06F3/0673, G06F9/4881, G06F9/52, G06N20/00, G06F9/30018, G06F9/30087, G06F9/3869, G06F9/3871, G06F9/522, G06F15/7807, G06F15/7846, G06F15/8053, G06F17/16, G06N5/04
Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, so that each processing tile waits for its current task to finish before starting a new one.