EFFICIENT INTER-CHIP INTERCONNECT TOPOLOGY FOR DISTRIBUTED PARALLEL DEEP LEARNING

    Publication Number: US20210240532A1

    Publication Date: 2021-08-05

    Application Number: US16777683

    Application Date: 2020-01-30

    Inventors: Liang HAN; Yang JIAO

    Abstract: The present disclosure provides a system comprising: a first group of computing nodes and a second group of computing nodes, wherein the first and second groups are neighboring devices and each of the first and second groups comprises: a set of computing nodes A-D, and a set of intra-group interconnects, wherein the set of intra-group interconnects communicatively couples computing node A with computing nodes B and C and computing node D with computing nodes B and C; and a set of inter-group interconnects, wherein the set of inter-group interconnects communicatively couples computing node A of the first group with computing node A of the second group, computing node B of the first group with computing node B of the second group, computing node C of the first group with computing node C of the second group, and computing node D of the first group with computing node D of the second group.
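The topology described in this abstract can be sketched as a small graph-construction routine. This is an illustrative model only, assuming two groups, node labels A-D, and a chain of neighboring groups; the function and variable names are hypothetical, not from the patent.

```python
# Sketch of the interconnect topology in the abstract. Within each group,
# A connects to B and C, and D connects to B and C (a 4-node ring with no
# A-D or B-C link). Same-letter nodes of neighboring groups are linked.

def build_topology(num_groups=2):
    """Return a set of undirected links between (group, node) pairs."""
    nodes = "ABCD"
    links = set()
    for g in range(num_groups):
        # Intra-group interconnects: A<->B, A<->C, D<->B, D<->C.
        for a, b in [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]:
            links.add(frozenset({(g, a), (g, b)}))
        # Inter-group interconnects: same-letter nodes of neighboring groups.
        if g + 1 < num_groups:
            for n in nodes:
                links.add(frozenset({(g, n), (g + 1, n)}))
    return links

topology = build_topology(2)
# Two groups contribute 4 intra-group links each, plus 4 inter-group links.
print(len(topology))  # 12
```

Note that each node ends up with a small, fixed number of links, which is the point of the topology: bandwidth between neighboring groups without all-to-all wiring.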

    PROGRAMMABLE MULTIPLY-ADD ARRAY HARDWARE

    Publication Number: US20190196788A1

    Publication Date: 2019-06-27

    Application Number: US16054783

    Application Date: 2018-08-03

    Abstract: An integrated circuit includes a data architecture comprising N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders. The selected data flow includes two options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.
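The two selectable data flows can be modeled in a few lines. This is a software sketch of hardware behavior, assuming the usual meanings of the two modes: element-wise accumulation over time versus a sum-of-products reduction through an adder tree; the function names are hypothetical.

```python
# Illustrative model of the two data flows named in the abstract.

def multiply_accumulate(acc, a, b):
    """MAC mode: multiplier i feeds adder i, accumulating element-wise."""
    return [acc_i + x * y for acc_i, x, y in zip(acc, a, b)]

def multiply_reduce(a, b):
    """Reduce mode: the N products are summed by an adder tree into one value."""
    products = [x * y for x, y in zip(a, b)]
    total = 0
    for p in products:
        total += p
    return total

print(multiply_accumulate([0, 0, 0, 0], [1, 2, 3, 4], [5, 6, 7, 8]))  # [5, 12, 21, 32]
print(multiply_reduce([1, 2, 3, 4], [5, 6, 7, 8]))                    # 70
```

The programmability claim is that one array of N multipliers and N adders serves both modes, with the instruction stream choosing how the adders are wired to the multiplier outputs.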

    EFFICIENT INTER-CHIP INTERCONNECT TOPOLOGY FOR DISTRIBUTED PARALLEL DEEP LEARNING

    Publication Number: US20230153164A1

    Publication Date: 2023-05-18

    Application Number: US18151384

    Application Date: 2023-01-06

    Inventors: Liang HAN; Yang JIAO

    CPC classification number: G06F9/505 G06F9/5044 G06N3/08 G06N3/063

    Abstract: The present disclosure provides a system comprising: a first group of computing nodes and a second group of computing nodes, wherein the first and second groups are neighboring devices and each of the first and second groups comprises: a set of computing nodes A-D, and a set of intra-group interconnects, wherein the set of intra-group interconnects communicatively couples computing node A with computing nodes B and C and computing node D with computing nodes B and C; and a set of inter-group interconnects, wherein the set of inter-group interconnects communicatively couples computing node A of the first group with computing node A of the second group, computing node B of the first group with computing node B of the second group, computing node C of the first group with computing node C of the second group, and computing node D of the first group with computing node D of the second group.

    MULTI-SIZE CONVOLUTIONAL LAYER

    Publication Number: US20210357730A1

    Publication Date: 2021-11-18

    Application Number: US16872979

    Application Date: 2020-05-12

    Abstract: Systems and methods for improved convolutional layers for neural networks are disclosed. An improved convolutional layer can obtain at least two input feature maps of differing channel sizes. The improved convolutional layer can generate an output feature map for each one of the at least two input feature maps. Each input feature map can be applied to a convolutional sub-layer to generate an intermediate feature map. For each intermediate feature map, versions of the remaining intermediate feature maps can be resized to match the channel size of the intermediate feature map. For each intermediate feature map, an output feature map can be generated by combining the intermediate feature map and the corresponding resized versions of the remaining intermediate feature maps.
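The resize-and-combine pattern in this abstract can be sketched in pure Python, with feature maps reduced to flat lists of channel values. The nearest-neighbor resize, the identity sub-layer, and element-wise addition as the combine step are all simplifying assumptions for illustration, not the patented implementation.

```python
# Toy sketch of the multi-size convolutional layer's combination pattern.

def resize_channels(fmap, size):
    """Nearest-neighbor resize of a channel list to the requested length."""
    n = len(fmap)
    return [fmap[i * n // size] for i in range(size)]

def conv_sublayer(fmap):
    """Stand-in for a convolutional sub-layer (identity here)."""
    return list(fmap)

def multi_size_layer(feature_maps):
    # One intermediate feature map per input feature map.
    intermediates = [conv_sublayer(f) for f in feature_maps]
    outputs = []
    for i, inter in enumerate(intermediates):
        combined = list(inter)
        for j, other in enumerate(intermediates):
            if j != i:
                # Resize the other intermediates to this map's channel size.
                resized = resize_channels(other, len(inter))
                # Combine by element-wise addition (one possible choice).
                combined = [c + r for c, r in zip(combined, resized)]
        outputs.append(combined)
    return outputs

outs = multi_size_layer([[1.0, 2.0], [10.0, 20.0, 30.0, 40.0]])
print(outs)  # [[11.0, 32.0], [11.0, 21.0, 32.0, 42.0]]
```

Each output keeps the channel size of its own input while still mixing in information from the differently sized inputs.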

    HETEROGENEOUS DEEP LEARNING ACCELERATOR

    Publication Number: US20210125042A1

    Publication Date: 2021-04-29

    Application Number: US16664668

    Application Date: 2019-10-25

    Inventor: Liang HAN

    Abstract: Systems and methods for heterogeneous hardware acceleration are disclosed. The systems and methods can include a neural network processing unit comprising compute tiles. Each of a first set of the compute tiles can include a first tensor array configured to support operations in a first number format. Each of a second set of the compute tiles can include a second tensor array configured to support operations in a second number format, the second number format supporting a greater range or a greater precision than the first number format, and a de-quantizer configured to convert data in the first number format to data in the second number format. The systems and methods can include neural network processing units, multi-chip hardware accelerators, and distributed hardware accelerators including low-precision components for performing inference tasks and high-precision components for performing training tasks. Transfer learning tasks can be performed using both low-precision and high-precision components.
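The de-quantizer role described here, converting data from a low-precision number format to a higher-precision one, can be sketched with a common scale-based int8-to-float scheme. The specific quantization scheme is an assumption for illustration; the abstract does not fix one.

```python
# Minimal sketch of quantization (low-precision inference format) and
# de-quantization (conversion back to the high-precision training format).

def quantize(values, scale):
    """Map floats to int8-range integers using a shared scale factor."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Convert low-precision integers back to floats (de-quantizer role)."""
    return [q * scale for q in qvalues]

q = quantize([0.5, -1.25, 2.0], 0.25)
print(q)                    # [2, -5, 8]
print(dequantize(q, 0.25))  # [0.5, -1.25, 2.0]
```

In the accelerator described above, this conversion is what lets results produced by the low-precision tiles feed into the high-precision tiles, e.g. when a transfer-learning task mixes both.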

    NEURAL PROCESSING UNIT SYNCHRONIZATION SYSTEMS AND METHODS

    Publication Number: US20230259486A1

    Publication Date: 2023-08-17

    Application Number: US18006845

    Application Date: 2020-11-02

    CPC classification number: G06F15/17325 G06F15/167

    Abstract: Systems and methods for exchanging synchronization information between processing units using a synchronization network are disclosed. The disclosed systems and methods include a device including a host and associated neural processing units. Each of the neural processing units can include a command communication module and a synchronization communication module. The command communication module can include circuitry for communicating with the host device over a host network. The synchronization communication module can include circuitry enabling communication between neural processing units over a synchronization network. The neural processing units can be configured to each obtain a synchronized update for a machine learning model. This synchronized update can be obtained at least in part by exchanging synchronization information using the synchronization network. The neural processing units can each maintain a version of the machine learning model and can synchronize it using the synchronized update.
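The synchronization flow in this abstract, where each processing unit holds its own copy of the model and the units exchange information to obtain one shared update, can be sketched as follows. Averaging the local updates (an all-reduce-style reduction) is an assumption; the abstract does not specify how the synchronized update is computed.

```python
# Sketch of synchronized model updates across neural processing units.

def synchronize(models, local_updates):
    """Average the local updates and apply the result to every model copy."""
    n = len(local_updates)
    # Exchange step: reduce per-weight updates across all units.
    synced = [sum(us) / n for us in zip(*local_updates)]
    # Each unit applies the same synchronized update to its model copy.
    return [[w + u for w, u in zip(model, synced)] for model in models]

models = [[1.0, 1.0], [1.0, 1.0]]       # two units, identical model copies
updates = [[0.2, 0.0], [0.0, 0.4]]      # each unit's locally computed update
print(synchronize(models, updates))     # [[1.1, 1.2], [1.1, 1.2]]
```

The point of the separate synchronization network is that this exchange happens between the units directly, without routing the updates through the host.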

    MULTI-SIZE CONVOLUTIONAL LAYER

    Publication Number: US20210142144A1

    Publication Date: 2021-05-13

    Application Number: US16677462

    Application Date: 2019-11-07

    Inventor: Liang HAN

    Abstract: Systems and methods for improved convolutional layers for neural networks are disclosed. The improved convolutional layers can obtain an input feature map comprising groups of channels. Each group of channels can include one or more channels having a predetermined size. The predetermined sizes can differ between the groups. The convolutional layer can generate, for each one of the groups of channels, an output channel. Generation of the output channel can include resizing the channels in the remaining groups of channels to match the predetermined size of that group of channels. Generation can further include combining the channels in that group with the resized channels and applying the combined channels to a convolutional sub-layer to generate the output channel.

    PROGRAMMABLE MULTIPLY-ADD ARRAY HARDWARE

    Publication Number: US20200293283A1

    Publication Date: 2020-09-17

    Application Number: US16886613

    Application Date: 2020-05-28

    Abstract: An integrated circuit includes a data architecture comprising N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders. The selected data flow includes two options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.

    METHOD AND SYSTEM FOR PERFORMING MACHINE LEARNING

    Publication Number: US20190332940A1

    Publication Date: 2019-10-31

    Application Number: US16396563

    Application Date: 2019-04-26

    Inventor: Liang HAN

    Abstract: Embodiments of the disclosure provide methods and systems for performing machine learning. The method can include: receiving training data; training a machine learning model based on the training data, wherein the machine learning model includes multiple layers each having one or more nodes having one or more connections with a node from another layer of the machine learning model; evaluating weights associated with the connections of the machine learning model, wherein each connection has a corresponding weight; removing, from the machine learning model, one or more connections having a weight that does not satisfy a threshold condition; and after the connections have been removed, updating the machine learning model.
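The pruning step in this abstract, removing connections whose weight fails a threshold condition, can be sketched in a few lines. Using weight magnitude against a fixed threshold as the condition is an assumption for illustration; the abstract leaves the threshold condition unspecified.

```python
# Sketch of connection pruning: drop connections whose weight does not
# satisfy the threshold condition (here, |w| >= threshold).

def prune_connections(connections, threshold):
    """Keep only connections whose weight magnitude meets the threshold."""
    return {conn: w for conn, w in connections.items() if abs(w) >= threshold}

# Connections as (source node, destination node) -> weight.
weights = {("a1", "b1"): 0.9, ("a1", "b2"): 0.01, ("a2", "b1"): -0.5}
print(prune_connections(weights, 0.1))  # {('a1', 'b1'): 0.9, ('a2', 'b1'): -0.5}
```

After pruning, the remaining weights are updated (e.g. by further training), which is the "updating the machine learning model" step in the abstract.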

    UNIFIED MEMORY ORGANIZATION FOR NEURAL NETWORK PROCESSORS

    Publication Number: US20190196970A1

    Publication Date: 2019-06-27

    Application Number: US15984255

    Application Date: 2018-05-18

    CPC classification number: G06F12/0813 G06F12/084 G06F12/0842 G06N3/063

    Abstract: The present disclosure relates to a unified memory apparatus having a unified storage medium and one or more processing units. The unified memory apparatus can include a first storage module having a first plurality of storage cells, and a second storage module having a second plurality of storage cells, each of the first and second plurality of storage cells configured to store data and to be identified by a unique cell identifier. The one or more processing units are in communication with the unified storage medium and the processing units are configured to receive a first input data from one of the first plurality of storage cells, receive a second input data from one of the second plurality of storage cells, and generate an output data based on the first and second input data.
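The organization described in this abstract, two storage modules whose cells share one identifier space and a processing unit that reads one input from each module, can be sketched as a small class. The contiguous cell-ID layout and the sum as the generated output are hypothetical choices for illustration.

```python
# Sketch of the unified memory organization: two storage modules, a single
# cell-identifier space, and a processing unit reading one input from each.

class UnifiedMemory:
    def __init__(self, cells_per_module=4):
        # Cell IDs 0..N-1 belong to module 1, N..2N-1 to module 2; every
        # cell is identified by a unique cell identifier.
        self.cells = {i: 0 for i in range(2 * cells_per_module)}
        self.module2_base = cells_per_module

    def store(self, cell_id, value):
        self.cells[cell_id] = value

    def process(self, cell_a, cell_b):
        """Read one input from each module and generate an output (sum here)."""
        assert cell_a < self.module2_base <= cell_b, "one cell per module"
        return self.cells[cell_a] + self.cells[cell_b]

mem = UnifiedMemory()
mem.store(1, 10)   # first storage module
mem.store(5, 32)   # second storage module
print(mem.process(1, 5))  # 42
```

The unified identifier space is what lets the processing units address both modules through one storage medium rather than two separate memories.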
