METHOD AND APPARATUS WITH TRANSFORMER MODEL TRAINING

    Publication No.: US20240135147A1

    Publication date: 2024-04-25

    Application No.: US18450839

    Filing date: 2023-08-15

    IPC classification: G06N3/0455

    CPC classification: G06N3/0455

    Abstract: A device including processors configured to execute instructions and memories storing the instructions, which, when executed by the processors, configure the processors to perform an operation for training a transformer model having a plurality of encoders and a plurality of decoders. To do so, the processors divide batches of training data into a plurality of micro-batches, select layer pairs for the plurality of micro-batches, assemble a processing order of the layer pairs, determine resource information to be allocated to the layer pairs, and allocate resources to the layer pairs based on the determined resource information, dependent on the processing order of the layer pairs.
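    The abstract does not say how layer pairs are formed, how the processing order is built, or how resource information is computed. The sketch below is one hypothetical reading, not the patented method. It assumes that a "layer pair" means encoder layer i paired with decoder layer i, that the processing order is a GPipe-style diagonal pipeline (at step t, pair p handles micro-batch t - p), and that at each step a fixed pool of resource units is split among the active pairs in proportion to an assumed per-pair cost. All names (LayerPair, split_into_micro_batches, allocate_resources, and so on) are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPair:
    """Hypothetical pairing of one encoder layer with one decoder layer."""
    encoder_idx: int
    decoder_idx: int
    cost: float  # assumed relative compute cost of running this pair


def split_into_micro_batches(batch, num_micro_batches):
    """Divide a batch (list of samples) into near-equal, order-preserving micro-batches."""
    size, rem = divmod(len(batch), num_micro_batches)
    micro_batches, start = [], 0
    for i in range(num_micro_batches):
        end = start + size + (1 if i < rem else 0)  # first `rem` chunks get one extra sample
        micro_batches.append(batch[start:end])
        start = end
    return micro_batches


def select_layer_pairs(costs):
    """Assumption: encoder layer i is paired with decoder layer i."""
    return [LayerPair(i, i, c) for i, c in enumerate(costs)]


def processing_order(num_micro_batches, pairs):
    """Diagonal pipeline order: at step t, pair p processes micro-batch t - p if it exists."""
    order = []
    for t in range(num_micro_batches + len(pairs) - 1):
        for p, pair in enumerate(pairs):
            m = t - p
            if 0 <= m < num_micro_batches:
                order.append((t, m, pair))
    return order


def allocate_resources(order, total_units):
    """At each step, split total_units among the active pairs in proportion to their cost."""
    step_cost = defaultdict(float)
    for t, _, pair in order:
        step_cost[t] += pair.cost
    # Key: (step, micro-batch index, encoder index of the pair) -> allocated units.
    return {
        (t, m, pair.encoder_idx): total_units * pair.cost / step_cost[t]
        for t, m, pair in order
    }


# Usage: 10 samples, 3 micro-batches, 2 layer pairs with assumed costs 1.0 and 3.0.
batch = list(range(10))
micro = split_into_micro_batches(batch, 3)  # sizes 4, 3, 3
pairs = select_layer_pairs([1.0, 3.0])
order = processing_order(len(micro), pairs)
alloc = allocate_resources(order, total_units=8)
print(alloc[(1, 0, 1)])  # 6.0
```

    The diagonal order lets different layer pairs work on different micro-batches in the same step, which is the usual reason for micro-batching; the proportional split is only one possible allocation policy.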

    METHOD AND APPARATUS WITH TRANSFORMER MODEL TRAINING

    Publication No.: US20240232581A9

    Publication date: 2024-07-11

    Application No.: US18450839

    Filing date: 2023-08-16

    IPC classification: G06N3/0455

    CPC classification: G06N3/0455

    Abstract: A device including processors configured to execute instructions and memories storing the instructions, which, when executed by the processors, configure the processors to perform an operation for training a transformer model having a plurality of encoders and a plurality of decoders. To do so, the processors divide batches of training data into a plurality of micro-batches, select layer pairs for the plurality of micro-batches, assemble a processing order of the layer pairs, determine resource information to be allocated to the layer pairs, and allocate resources to the layer pairs based on the determined resource information, dependent on the processing order of the layer pairs.

    DEVICE AND METHOD WITH BATCH NORMALIZATION
    Type: Patent application publication

    Publication No.: US20240184630A1

    Publication date: 2024-06-06

    Application No.: US18526603

    Filing date: 2023-12-01

    IPC classification: G06F9/50 G06F5/01 G06F15/80

    Abstract: A device and method with batch normalization are provided. An accelerator includes: core modules, each including a respective plurality of cores configured to perform a first convolution operation using feature map data and a weight; local reduction operation modules adjacent to the respective core modules, each including a respective plurality of local reduction operators configured to perform a first local operation that obtains first local statistical values of the corresponding core module; a global reduction operation module configured to perform a first global operation that generates first global statistical values of the core modules based on the first local statistical values of the core modules; and a normalization operation module configured to perform a first normalization operation on the feature map data based on the first global statistical values.
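    The abstract splits batch normalization into a per-module local reduction followed by a single global reduction. A minimal pure-Python sketch of that data flow follows. It assumes the feature map data held by each core module is already the convolution output (rows are positions, columns are channels), that the local statistics exchanged are a per-channel count, sum, and sum of squares, and that variance is computed as E[x^2] - E[x]^2. The patent does not state which statistics are exchanged, and the function names are hypothetical.

```python
import math


def local_reduce(chunk):
    """Per-channel (count, sum, sum of squares) for one core module's non-empty slice."""
    num_channels = len(chunk[0])
    sums = [0.0] * num_channels
    sq_sums = [0.0] * num_channels
    for row in chunk:
        for c, x in enumerate(row):
            sums[c] += x
            sq_sums[c] += x * x
    return len(chunk), sums, sq_sums


def global_reduce(local_stats):
    """Combine all modules' local statistics into per-channel mean and biased variance."""
    count = sum(n for n, _, _ in local_stats)
    num_channels = len(local_stats[0][1])
    mean, var = [], []
    for c in range(num_channels):
        s = sum(stats[1][c] for stats in local_stats)
        sq = sum(stats[2][c] for stats in local_stats)
        mu = s / count
        mean.append(mu)
        var.append(max(0.0, sq / count - mu * mu))  # E[x^2] - E[x]^2, clamped against rounding
    return mean, var


def normalize(chunk, mean, var, eps=1e-5):
    """Normalize one module's slice using the shared global statistics."""
    return [
        [(x - mean[c]) / math.sqrt(var[c] + eps) for c, x in enumerate(row)]
        for row in chunk
    ]


# Usage: two core modules, each holding two positions with two channels.
modules = [
    [[1.0, 10.0], [2.0, 20.0]],  # slice held by core module 0
    [[3.0, 30.0], [4.0, 40.0]],  # slice held by core module 1
]
stats = [local_reduce(m) for m in modules]  # local operation, one per module
mean, var = global_reduce(stats)            # global operation over all modules
out = [normalize(m, mean, var) for m in modules]
print(mean, var)  # [2.5, 25.0] [1.25, 125.0]
```

    Exchanging sums rather than per-module means keeps the global step a plain addition. A numerically safer variance update (for example Chan's parallel algorithm) could replace E[x^2] - E[x]^2; this sketch does not use one.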