SYSTEM AND METHODS FOR TAG-BASED SYNCHRONIZATION OF TASKS FOR MACHINE LEARNING OPERATIONS

    Publication No.: US20230205540A1

    Publication Date: 2023-06-29

    Application No.: US18115206

    Filing Date: 2023-02-28

    Abstract: A new approach for supporting tag-based synchronization among different tasks of a machine learning (ML) operation. When a first task tagged with a set tag, indicating that one or more subsequent tasks need to be synchronized with it, is received at an instruction streaming engine, the engine saves the set tag in a tag table and transmits the instructions of the first task to a set of processing tiles for execution. When a second task carrying an instruction sync tag, indicating that it needs to be synchronized with one or more prior tasks, is received at the engine, the engine matches the instruction sync tag against the set tags in the tag table to identify the prior tasks that the second task depends on. The engine holds the instructions of the second task until those matching prior tasks have completed and then releases the instructions to the processing tiles for execution.
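
    The synchronization scheme summarized above can be illustrated with a small software sketch. The C++ below is only an illustration under assumed names (Task, StreamingEngine, tag_table_) and a simplified in-order hold queue; it is not the patented engine's actual interface or ISA.

        // Sketch of tag-based task synchronization: producers register a set tag,
        // consumers with a matching sync tag are held until those producers finish.
        #include <algorithm>
        #include <cstdint>
        #include <queue>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        struct Task {
            uint32_t id;
            uint32_t set_tag;   // nonzero: later tasks may synchronize on this task
            uint32_t sync_tag;  // nonzero: must wait for prior tasks with this set tag
        };

        class StreamingEngine {
        public:
            // Called when a new task arrives at the instruction streaming engine.
            void receive(const Task& t) {
                if (t.set_tag != 0)
                    tag_table_[t.set_tag].push_back(t.id);        // record the producer
                if (t.sync_tag != 0 && !tag_table_[t.sync_tag].empty()) {
                    held_.push({t, t.sync_tag});                  // hold until producers complete
                    return;
                }
                dispatch(t);                                      // no open dependency: stream now
            }

            // Called when a processing tile reports that a tagged task has completed.
            void on_complete(uint32_t task_id, uint32_t set_tag) {
                auto& producers = tag_table_[set_tag];
                producers.erase(std::remove(producers.begin(), producers.end(), task_id),
                                producers.end());
                while (!held_.empty() && tag_table_[held_.front().second].empty()) {
                    dispatch(held_.front().first);                // all producers done: release
                    held_.pop();
                }
            }

        private:
            void dispatch(const Task& /*t*/) { /* stream instructions to the processing tiles */ }

            std::unordered_map<uint32_t, std::vector<uint32_t>> tag_table_;
            std::queue<std::pair<Task, uint32_t>> held_;
        };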

    SYSTEM AND METHOD FOR HANDLING FLOATING POINT HARDWARE EXCEPTION

    Publication No.: US20220188109A1

    Publication Date: 2022-06-16

    Application No.: US17686682

    Filing Date: 2022-03-04

    Abstract: A method includes receiving input data at a floating point arithmetic unit, wherein the unit is configured to perform a floating point arithmetic operation on the input data. The method includes determining, prior to performing the floating point arithmetic operation, whether the received input data is a qNaN (quiet not-a-number) or an sNaN (signaling not-a-number). The method also includes converting the value of the received input data to a modified value prior to performing the floating point arithmetic operation if the received input data is either a qNaN or an sNaN, wherein the conversion eliminates the special handling otherwise associated with performing the floating point arithmetic operation on a qNaN or sNaN input.
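
    As a rough illustration of the pre-operation NaN conversion described above, the sketch below canonicalizes qNaN/sNaN operands for IEEE-754 binary32 before an addition. The helper names (bits_of, canonicalize, fp_add) and the choice of canonical quiet NaN payload are assumptions for illustration, not the patented hardware path.

        // Replace qNaN/sNaN inputs with one canonical quiet NaN so the downstream
        // arithmetic needs no per-operand special handling or exception logic.
        #include <cstdint>
        #include <cstring>

        static inline uint32_t bits_of(float f) {
            uint32_t b; std::memcpy(&b, &f, sizeof b); return b;
        }
        static inline float float_of(uint32_t b) {
            float f; std::memcpy(&f, &b, sizeof f); return f;
        }

        // Any NaN (quiet or signaling): exponent all ones, mantissa nonzero.
        static inline bool is_nan_bits(uint32_t b) {
            return (b & 0x7F800000u) == 0x7F800000u && (b & 0x007FFFFFu) != 0;
        }

        // Convert the value to a modified value before the arithmetic operation.
        static inline float canonicalize(float x) {
            return is_nan_bits(bits_of(x)) ? float_of(0x7FC00000u)   // canonical quiet NaN
                                           : x;
        }

        float fp_add(float a, float b) {
            return canonicalize(a) + canonicalize(b);   // operands are pre-sanitized
        }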

    Data transmission between memory and on chip memory of inference engine for machine learning via a single data gathering instruction

    Publication No.: US11210105B1

    Publication Date: 2021-12-28

    Application No.: US17087556

    Filing Date: 2020-11-02

    Inventor: Avinash Sodani

    Abstract: A system to support data gathering for a machine learning (ML) operation comprises a memory unit configured to maintain data for the ML operation in a plurality of memory blocks, each accessible via a memory address. The system further comprises an inference engine comprising a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to load and maintain data for local access by components in the processing tile. The system also comprises a core configured to program components of the processing tiles of the inference engine according to an instruction set architecture (ISA), and a data streaming engine configured to stream data between the memory unit and the OCMs of the processing tiles, wherein the data streaming engine is configured to perform a data gathering operation across multiple memory addresses at the same time via a single data gathering instruction of the ISA.
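
    A simplified software model of the single-instruction gather described above is sketched next. The GatherInstr fields, the flat byte-addressable DDR model, and execute_gather are assumptions made for illustration; the patent's actual ISA encoding and streaming hardware are not reproduced here.

        // One gather instruction: walk a list of DDR source addresses and pack the
        // referenced blocks contiguously into a processing tile's on-chip memory.
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        struct GatherInstr {
            std::vector<uint64_t> src_addrs;  // DDR addresses of the scattered blocks
            size_t block_bytes;               // size of each gathered block
            size_t ocm_offset;                // destination offset in the tile's OCM
        };

        struct Tile {
            std::vector<uint8_t> ocm;         // on-chip memory local to the tile
        };

        void execute_gather(const std::vector<uint8_t>& ddr, const GatherInstr& g, Tile& tile) {
            size_t dst = g.ocm_offset;
            for (uint64_t addr : g.src_addrs) {
                for (size_t i = 0; i < g.block_bytes; ++i)
                    tile.ocm[dst + i] = ddr[addr + i];   // copy one block into OCM
                dst += g.block_bytes;
            }
        }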

    System and method for INT9 quantization

    Publication No.: US11977963B2

    Publication Date: 2024-05-07

    Application No.: US18075678

    Filing Date: 2022-12-06

    IPC Classification: G06N20/00

    CPC Classification: G06N20/00

    Abstract: A method of converting data stored in a memory from a first format to a second format is disclosed. The method includes extending the number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned. Responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to the lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to the lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
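
    The widening step in the abstract above maps naturally to a short sketch. The code below assumes the 9-bit result is carried in the low nine bits of a uint16_t; that container and the helper names (int8_to_int9, uint8_to_int9, int9_value) are illustrative choices, not the patented OCM storage format.

        // 8-bit -> 9-bit extension: signed inputs get the sign value in the new MSB,
        // unsigned inputs get a zero MSB, so both ranges share one 9-bit format.
        #include <cstdint>

        // Signed int8: copy the 8 data bits to the lower order bits, sign into bit 8.
        uint16_t int8_to_int9(int8_t v) {
            uint16_t lower = static_cast<uint8_t>(v);   // lower order bits of extended data
            uint16_t sign  = (v < 0) ? 1u : 0u;         // sign value for the added MSB
            return static_cast<uint16_t>((sign << 8) | lower);
        }

        // Unsigned uint8: copy the 8 data bits; the added MSB stays at the unsigned value 0.
        uint16_t uint8_to_int9(uint8_t v) {
            return static_cast<uint16_t>(v);
        }

        // Read a 9-bit value back as a signed integer, e.g. to verify the conversion:
        // int8_to_int9(-128) == 0x180 and uint8_to_int9(255) == 0x0FF both decode correctly.
        int16_t int9_value(uint16_t nine_bits) {
            return (nine_bits & 0x100u) ? static_cast<int16_t>(nine_bits | 0xFE00u)
                                        : static_cast<int16_t>(nine_bits & 0x1FFu);
        }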