METHOD TO AVOID MEMORY BANK CONFLICTS AND PIPELINE CONFLICTS IN TENSOR MEMORY LAYOUT

    Publication Number: US20230021472A1

    Publication Date: 2023-01-26

    Application Number: US17954695

    Application Date: 2022-09-28

    Abstract: A method for optimizing a layout of a tensor memory defines at least one hard constraint for allocating a plurality of input/output (I/O) vectors for reading and writing data for a task in the tensor memory. The at least one hard constraint is applied to determine one or more potential conflicts between the plurality of I/O vectors. One or more soft constraints aimed at mitigating the one or more potential conflicts between the I/O vectors may also be generated. The at least one hard constraint is applied in a maximum satisfiability (MaxSAT) solver. The one or more soft constraints may also be applied in the MaxSAT solver. The MaxSAT solver determines locations of the data in the tensor memory. The starting addresses of the input data to be read and of the output data to be written by each of the I/O vectors are then updated in the tensor memory.
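
    The hard/soft-constraint formulation above can be illustrated with a toy stand-in for the MaxSAT solver: exhaustive search replaces a real MaxSAT engine, and the bank count, vector names, sizes, and conflict pairs below are all hypothetical, not taken from the patent.

```python
from itertools import product

NUM_BANKS = 4
BANK_SIZE = 8                    # words per bank
MEM_WORDS = NUM_BANKS * BANK_SIZE

# hypothetical I/O vectors: name -> length in words
vectors = {"in0": 4, "in1": 4, "out0": 4}
# soft constraints: pairs accessed in the same cycle should start in different banks
conflict_pairs = [("in0", "in1"), ("in1", "out0")]

def bank(addr):
    return addr % NUM_BANKS      # low-order address interleaving across banks

def hard_ok(layout):
    # hard constraint: allocations must fit in memory and must not overlap
    spans = []
    for name, start in layout.items():
        end = start + vectors[name]
        if end > MEM_WORDS:
            return False
        spans.append((start, end))
    spans.sort()
    return all(a_end <= b_start for (_, a_end), (b_start, _) in zip(spans, spans[1:]))

def soft_score(layout):
    # soft constraints: count conflict pairs whose starting banks differ
    return sum(bank(layout[a]) != bank(layout[b]) for a, b in conflict_pairs)

best, best_score = None, -1
names = list(vectors)
for starts in product(range(MEM_WORDS), repeat=len(names)):
    layout = dict(zip(names, starts))
    if hard_ok(layout) and soft_score(layout) > best_score:
        best, best_score = layout, soft_score(layout)

print(best, best_score)          # a conflict-free layout satisfying both soft constraints
```

    A real implementation would encode the same hard and soft constraints as weighted clauses and hand them to a MaxSAT solver rather than enumerating placements.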

    NODE FUSION METHOD FOR COMPUTATIONAL GRAPH AND DEVICE

    Publication Number: US20230334292A1

    Publication Date: 2023-10-19

    Application Number: US18214101

    Application Date: 2023-06-26

    CPC classification number: G06N3/045 G06N3/10

    Abstract: Embodiments of this application disclose a node fusion method for a computational graph and a device. The method includes: converting a neural network into a computational graph; extracting one or more parallelizable branch groups from the computational graph based on a dependency relationship between nodes in the computational graph, where the dependency relationship indicates at least one of the following: the parallelizable branch group has a common parent node, the parallelizable branch group has a common child node, the parallelizable branch group has no parent node, or the parallelizable branch group has no child node; and fusing a plurality of nodes in any parallelizable branch group that respectively belong to different sub-branches to obtain a new computational graph.
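
    One of the listed relationships, sub-branches sharing a common parent node, can be sketched as follows. The graph, node names, and the string-based "fusion" are hypothetical illustrations, not the patent's actual representation:

```python
# hypothetical computational graph: node -> list of child nodes
graph = {
    "A": ["B", "C", "D"],   # B, C, D are sibling sub-branches under parent A
    "B": ["E"],
    "C": ["E"],
    "D": ["E"],
    "E": [],
}

def parallelizable_branch_groups(g):
    """Collect sub-branches that share a common parent node; siblings have
    no data dependency on each other, so they can run in parallel."""
    groups = []
    for parent, children in g.items():
        if len(children) > 1:
            groups.append({"parent": parent, "branches": [[c] for c in children]})
    return groups

def fuse(group):
    """Fuse one node taken from each sub-branch into a single fused node."""
    picked = [branch[0] for branch in group["branches"]]
    return "fused(" + "+".join(picked) + ")"

groups = parallelizable_branch_groups(graph)
print(fuse(groups[0]))   # fused(B+C+D)
```

    The other relationships (common child node, no parent, no child) would be detected analogously by scanning incoming edges or roots/leaves of the graph.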

    DRIVER MANAGEMENT METHOD AND HOST

    Publication Number: US20190310874A1

    Publication Date: 2019-10-10

    Application Number: US16431948

    Application Date: 2019-06-05

    Inventor: Xiong GAO

    Abstract: Embodiments of the present disclosure disclose a driver management method and a host. The method includes: allocating a first hardware device to a target virtual machine on the host; obtaining a target driver package of the first hardware device from N pre-stored driver packages, where the N driver packages are driver packages of N types of hardware devices, a type of the first hardware device is one of the N types of hardware devices, and N is a positive integer greater than or equal to 1; adding the target driver package into the target virtual machine to enable the target virtual machine to read the target driver package; and installing the target driver package, where a driver obtained by installing the target driver package is used by the target virtual machine to invoke the first hardware device in a hardware pass-through manner.
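
    The selection step, matching the allocated device's type against the N pre-stored packages, reduces to a lookup. The package names, device types, and the dict-based VM model below are purely hypothetical:

```python
# hypothetical pre-stored driver packages: device type -> package name
driver_packages = {
    "gpu": "gpu-driver.pkg",
    "nic": "nic-driver.pkg",
    "nvme": "nvme-driver.pkg",
}

def allocate_and_install(vm, device_type):
    """Allocate a device of device_type to vm, pick the matching one of the
    N pre-stored driver packages, and hand it to the VM for installation."""
    pkg = driver_packages.get(device_type)
    if pkg is None:
        raise ValueError(f"no driver package for device type {device_type!r}")
    vm.setdefault("devices", []).append(device_type)   # pass-through allocation
    vm.setdefault("installed", []).append(pkg)         # VM reads and installs pkg
    return pkg

vm = {"name": "target-vm"}
print(allocate_and_install(vm, "gpu"))   # gpu-driver.pkg
```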

    SYNCHRONIZATION INSTRUCTION INSERTION METHOD AND APPARATUS

    Publication Number: US20220113971A1

    Publication Date: 2022-04-14

    Application Number: US17558076

    Application Date: 2021-12-21

    Inventor: Xiong GAO Kun ZHANG

    Abstract: This application discloses example synchronization instruction insertion methods and example apparatuses. One example method includes obtaining a first program block comprising one or more statements, where each of the one or more statements includes one or more function instructions. A first function instruction and a second function instruction between which data dependency exists in the first program block can then be determined. A synchronization instruction pair between a first statement including the first function instruction and a second statement including the second function instruction can then be inserted.
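
    The detect-then-insert flow above can be sketched with a minimal model: statements are lists of (instruction, reads, writes) triples, a read-after-write intersection marks the dependency, and a hypothetical set_flag/wait_flag pair stands in for the synchronization instruction pair:

```python
# hypothetical program block: each statement holds (instr, reads, writes) triples
statements = [
    [("copy_in", set(), {"buf"})],       # statement 0: writes buf
    [("compute", {"buf"}, {"out"})],     # statement 1: reads buf -> RAW dependency
]

def find_dependency(stmts):
    """Return (producer, consumer) statement indices for the first
    read-after-write dependency between instructions in different statements."""
    for i, s1 in enumerate(stmts):
        for j, s2 in enumerate(stmts[i + 1:], start=i + 1):
            for _, _, writes in s1:
                for _, reads, _ in s2:
                    if writes & reads:
                        return i, j
    return None

def insert_sync_pair(stmts):
    """Emit the flattened instruction stream with a sync pair inserted
    between the producer and consumer statements."""
    dep = find_dependency(stmts)
    out = []
    for k, s in enumerate(stmts):
        if dep and k == dep[1]:
            out.append("wait_flag")      # consumer waits for the producer
        out.extend(instr for instr, _, _ in s)
        if dep and k == dep[0]:
            out.append("set_flag")       # producer signals completion
    return out

print(insert_sync_pair(statements))
# ['copy_in', 'set_flag', 'wait_flag', 'compute']
```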

    DATA FLOW PROCESSING METHOD AND APPARATUS, AND SYSTEM

    Publication Number: US20180367460A1

    Publication Date: 2018-12-20

    Application Number: US16054283

    Application Date: 2018-08-03

    Abstract: Embodiments of the present disclosure provide a data flow processing method and apparatus, and a system. A processing process performed on a packet is divided into multiple processing actions. Some processing actions are spread only when traffic of a current data flow meets a preset condition. Therefore, multiple processor cores may process a packet in a pipeline manner, so as to improve processing efficiency. When a bandwidth fluctuation amplitude of a data flow is relatively large and a peak bandwidth of the data flow is relatively large, compared with a static pipeline manner, the method provided in the embodiments avoids a waste of processing resources to some extent when traffic is relatively low, and can also better support data flow processing when traffic is relatively high.
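
    The conditional spreading can be sketched as a core-assignment policy: below one threshold the whole pipeline stays on a single core, above it the actions are spread across cores as pipeline stages. The action names, threshold value, and round-robin assignment are hypothetical:

```python
# hypothetical processing actions and traffic threshold (packets per tick)
ACTIONS = ["parse", "classify", "encrypt", "forward"]
SPREAD_THRESHOLD = 100

def assign_cores(traffic, num_cores=4):
    """Spread processing actions across cores only when traffic meets the
    preset condition; otherwise run the whole pipeline on one core."""
    if traffic < SPREAD_THRESHOLD:
        return {0: list(ACTIONS)}            # low traffic: one core, no waste
    cores = {c: [] for c in range(num_cores)}
    for idx, action in enumerate(ACTIONS):   # high traffic: one stage per core
        cores[idx % num_cores].append(action)
    return cores

print(assign_cores(10))    # {0: ['parse', 'classify', 'encrypt', 'forward']}
print(assign_cores(500))   # {0: ['parse'], 1: ['classify'], 2: ['encrypt'], 3: ['forward']}
```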
