-
1.
Publication No.: US20250086022A1
Publication Date: 2025-03-13
Application No.: US18569295
Filing Date: 2023-10-11
Applicant: ZHEJIANG LAB
Inventor: Yong LI , Laiping ZHAO , Jie LI , Wen CHENG , Guang CHEN , Lingfang ZENG
Abstract: A method for data processing is provided, including: obtaining each piece of to-be-processed data and determining whether a set amount of the to-be-processed data can be processed by a data processing model under the current processing configuration; if not, obtaining the data processing periods of the data processing model under multiple configuration combinations; for the data processing period of each configuration combination, determining the amount of data the data processing model can process within that period as a target data amount; and, with the goal of enabling the data processing model to process the set amount of to-be-processed data, selecting a target configuration combination from the multiple configuration combinations according to the target data amounts of their data processing periods.
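As a rough, hypothetical illustration of the selection step only (the class, field names, and the cost tie-breaker below are assumptions, not criteria taken from the patent), the sketch picks a configuration combination whose target data amount per period covers the set amount of to-be-processed data.

```python
# Hypothetical sketch: choose a configuration combination whose per-period
# target data amount can cover the set amount of to-be-processed data.
from dataclasses import dataclass

@dataclass
class ConfigCombination:
    name: str
    period_s: float          # data processing period under this configuration
    throughput_per_s: float  # assumed processing rate metric
    cost: float              # assumed tie-breaker, not part of the abstract

def select_target_config(combos, set_amount):
    """Return the cheapest combination able to process `set_amount` within its period."""
    feasible = []
    for c in combos:
        target_data_amount = c.throughput_per_s * c.period_s
        if target_data_amount >= set_amount:
            feasible.append(c)
    return min(feasible, key=lambda c: c.cost) if feasible else None

combos = [
    ConfigCombination("2-replicas", period_s=1.0, throughput_per_s=800, cost=2.0),
    ConfigCombination("4-replicas", period_s=1.0, throughput_per_s=1600, cost=4.0),
]
print(select_target_config(combos, set_amount=1000).name)  # -> 4-replicas
```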
-
2.
Publication No.: US20240354577A1
Publication Date: 2024-10-24
Application No.: US18374669
Filing Date: 2023-09-29
Applicant: ZHEJIANG LAB
Inventor: Yong LI , Laiping ZHAO , Zezheng MAO , Wen CHENG , Guang CHEN , Lingfang ZENG
IPC: G06N3/084
CPC classification number: G06N3/084
Abstract: A method, a system, a device, and a storage medium for operation resource placement in deep learning are provided. The method includes: acquiring the training operations to be placed and their corresponding priorities; in order of priority, selecting a network structure for operation placement according to the resource amount required by each training operation, the network structure including a server, a top-of-rack switch, a container group set denoted as a Podset, and a trunk-layer switch; and, based on the selected network structure, performing a minimization optimization that takes the amount of network data transmitted during training as the optimization target, thereby obtaining a corresponding operation placement scheme.
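A minimal, hypothetical sketch of priority-ordered placement follows; preferring the lowest network tier with free capacity is only a crude stand-in for the patent's minimization of network data transmission, and every name and number here is an assumption.

```python
# Hypothetical sketch: place training operations by priority, preferring the
# lowest network tier (server < ToR < Podset < trunk) that still has capacity,
# as a crude stand-in for minimizing network data transmission.
TIERS = ["server", "top_of_rack", "podset", "trunk_switch"]

def place_jobs(jobs, capacity):
    """jobs: list of (name, priority, required_resources); capacity: tier -> free units."""
    placement = {}
    for name, _prio, need in sorted(jobs, key=lambda j: -j[1]):
        for tier in TIERS:                      # lower tiers mean less cross-network traffic
            if capacity.get(tier, 0) >= need:
                capacity[tier] -= need
                placement[name] = tier
                break
        else:
            placement[name] = None              # no tier can host this operation
    return placement

jobs = [("job-a", 3, 4), ("job-b", 1, 8), ("job-c", 2, 2)]
print(place_jobs(jobs, {"server": 4, "top_of_rack": 8, "podset": 16, "trunk_switch": 32}))
```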
-
3.
Publication No.: US20240118897A1
Publication Date: 2024-04-11
Application No.: US18071978
Filing Date: 2022-11-30
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Guang CHEN , Lingfang ZENG , Aimin PAN
IPC: G06F9/38
CPC classification number: G06F9/3838 , G06F9/3885
Abstract: Disclosed are an instruction execution method and apparatus for graph computation. The method includes the following steps: S1: sending the operators of each node in a computational graph used for neural network computation to an operator interpreter; S2: building, by the operator interpreter, instructions at runtime; S3: defining an instruction dependency relationship; S4: building an instruction dependency relationship graph; S5: building a topological order of parallel instructions; S6: scheduling the parallel instructions onto hardware resources; S7: building the shortest schedules for the parallel instructions, i.e., the shortest time required to execute the parallel instructions under limited hardware resources; and S8: releasing the completed instructions.
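The sketch below is a hypothetical illustration of steps S4-S6 only: it builds a dependency graph, derives a topological order, and greedily issues ready instructions onto a fixed number of assumed execution units; it is not the disclosed scheduling algorithm.

```python
# Hypothetical sketch: instruction dependency graph -> topological order ->
# greedy scheduling onto a limited number of execution units.
from collections import deque

def topo_order(deps):
    """deps: instruction -> set of instructions it depends on."""
    indeg = {i: len(d) for i, d in deps.items()}
    users = {i: [] for i in deps}
    for i, d in deps.items():
        for p in d:
            users[p].append(i)
    ready, order = deque(i for i, n in indeg.items() if n == 0), []
    while ready:
        i = ready.popleft()
        order.append(i)
        for u in users[i]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order

def schedule(deps, units=2):
    """Each step issues up to `units` instructions whose dependencies are done."""
    done, steps = set(), []
    remaining = topo_order(deps)
    while remaining:
        batch = [i for i in remaining if deps[i] <= done][:units]
        steps.append(batch)
        done |= set(batch)
        remaining = [i for i in remaining if i not in done]
    return steps

deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
print(schedule(deps))  # e.g. [['a', 'b'], ['c'], ['d']]
```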
-
4.
Publication No.: US20240104341A1
Publication Date: 2024-03-28
Application No.: US17992822
Filing Date: 2022-11-22
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Guang CHEN , Lingfang ZENG
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: A memory optimization method includes: compiling a neural network into a computational graph for neural network computation on a computer; transforming the computational graph into a topological graph; constructing a life cycle relationship graph of the tensor variables in the computational graph and analyzing the life cycle relationships among the tensor variables in each node of the computational graph; iteratively merging tensor variables connected by edges of the second type, and caching into memory any tensor variable that exceeds the number of idle registers and is not allocated to a register, until all such tensor variables have been cached into memory; and pushing onto a stack any node of the life cycle relationship graph whose degree is smaller than the number of registers.
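The final steps resemble the simplify/spill loop of graph-coloring register allocation; the sketch below illustrates only that generic idea under assumed interference edges and an assumed spill heuristic, not the patent's merging rules or edge types.

```python
# Hypothetical simplify/spill sketch: nodes (tensor variables) whose degree in a
# life-cycle interference graph is below the register count are pushed onto a
# stack; otherwise one variable is spilled (cached) into memory.
def simplify_and_spill(interference, num_registers):
    """interference: var -> set of vars whose life cycles overlap."""
    graph = {v: set(n) for v, n in interference.items()}
    stack, spilled = [], []
    while graph:
        low = next((v for v, n in graph.items() if len(n) < num_registers), None)
        victim = low if low is not None else max(graph, key=lambda v: len(graph[v]))
        if low is None:
            spilled.append(victim)      # cache this tensor variable into memory
        else:
            stack.append(victim)        # low degree: push onto the stack
        for n in graph.pop(victim):
            graph[n].discard(victim)
    return stack, spilled

interference = {"t0": {"t1", "t2"}, "t1": {"t0", "t2"}, "t2": {"t0", "t1"}, "t3": set()}
print(simplify_and_spill(interference, num_registers=2))
```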
-
5.
Publication No.: US20230334334A1
Publication Date: 2023-10-19
Application No.: US17833088
Filing Date: 2022-06-06
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Hujun BAO , Guang CHEN
Abstract: The disclosure provides a method of executing a dynamic graph for neural network computation and an apparatus thereof. The method includes the following steps: S1: constructing and distributing an operator and a tensor; S2: deducing the operator execution process by an operator interpreter; S3: constructing, by the operator interpreter, an instruction of a virtual machine at runtime; S4: sending the instruction to the virtual machine at runtime by the operator interpreter; S5: scheduling the instruction by the virtual machine; and S6: releasing the executed instruction by the virtual machine. In this method and apparatus, the runtime is abstracted as a virtual machine: the virtual machine acquires in real time, through the interpreter, the sub-graph of each step constructed by the user, and schedules, issues, and executes each sub-graph.
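A toy, hypothetical sketch of the interpreter/virtual-machine split follows: the "interpreter" side builds instructions and sends them to a VM object, which schedules those whose input tensors are ready, executes them, and releases them. The class and method names are assumptions, not the disclosed apparatus.

```python
# Hypothetical toy: interpreter builds runtime instructions; the virtual machine
# schedules instructions whose inputs are ready, executes them, and releases them.
class Instruction:
    def __init__(self, op, inputs, output):
        self.op, self.inputs, self.output = op, inputs, output

class VirtualMachine:
    def __init__(self):
        self.pending, self.tensors = [], {}

    def receive(self, instr):              # interpreter sends instructions here
        self.pending.append(instr)

    def run(self):
        while self.pending:
            ready = [i for i in self.pending if all(x in self.tensors for x in i.inputs)]
            if not ready:
                break                                          # no runnable instruction
            for instr in ready:
                args = [self.tensors[x] for x in instr.inputs]
                self.tensors[instr.output] = instr.op(*args)   # execute
                self.pending.remove(instr)                     # release

vm = VirtualMachine()
vm.tensors.update({"x": 2.0, "y": 3.0})
vm.receive(Instruction(lambda a, b: a + b, ["x", "y"], "z"))   # built by "interpreter"
vm.receive(Instruction(lambda a: a * a, ["z"], "w"))
vm.run()
print(vm.tensors["w"])  # 25.0
```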
-
6.
Publication No.: US20240385881A1
Publication Date: 2024-11-21
Application No.: US18456921
Filing Date: 2023-08-28
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Guang CHEN , Feng LIN , Fei WU
IPC: G06F9/50
Abstract: Methods, systems, apparatus, and computer-readable media for distributed communication are provided. In one aspect, a system includes a first Dynamic Communication Network Object (DCNO) configured on a first device and a second DCNO configured on a second device. The second DCNO is configured to, based on a notification message sent by a first worknode, allocate a target memory in a memory of the second device to store target data, generate a read request based on the target data and the target memory, and transmit the read request to the first DCNO. The first DCNO is configured to, based on one or more properties of the target data, retrieve the target data from a memory of the first device and write the target data to the target memory in the second device. A second worknode is configured to perform one or more data processing tasks based on the target data.
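As a hypothetical, in-process sketch of the notify, allocate, read-request, and remote-write flow (two plain Python objects stand in for DCNOs on two devices; nothing here is the disclosed API or transport):

```python
# Hypothetical in-process sketch: worknode 1 notifies DCNO 2, DCNO 2 allocates
# target memory and issues a read request, DCNO 1 retrieves the data from
# device 1's memory and writes it into the allocated region on device 2.
class DCNO:
    def __init__(self, device_memory):
        self.memory = device_memory        # name -> buffer on this device
        self.peer = None

    def notify(self, data_name, size):     # triggered by the first worknode's message
        buf = bytearray(size)              # allocate target memory on this device
        self.memory[data_name] = buf
        self.peer.handle_read_request(data_name, buf)
        return buf

    def handle_read_request(self, data_name, target_buf):
        data = self.memory[data_name]      # retrieve from the local device memory
        target_buf[: len(data)] = data     # write into the peer's target memory

dcno1 = DCNO({"grad": b"\x01\x02\x03\x04"})
dcno2 = DCNO({})
dcno1.peer, dcno2.peer = dcno2, dcno1
dcno2.notify("grad", size=4)
print(bytes(dcno2.memory["grad"]))         # data now available to the second worknode
```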
-
7.
Publication No.: US20240104395A1
Publication Date: 2024-03-28
Application No.: US18072969
Filing Date: 2022-12-01
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Guang CHEN
IPC: G06N3/10
CPC classification number: G06N3/10
Abstract: Disclosed are a memory optimization method and device oriented to neural network computing. The method includes the following steps: step S1: reconstructing a computation graph into a topological-structure computation graph; step S2: constructing life cycle intervals for tensor variables; step S3: constructing a scanning line over the life cycle intervals; step S4: allocating the tensor variables to idle registers; step S5: handling tensor variables that exceed the number of available registers; step S6: allocating registers freed by expired life cycle intervals to tensor variables that exceed the number of available registers; and step S7: adding tensor variables that were transferred to memory back to the life cycle intervals in an activated state and allocating idle registers to them. According to the present disclosure, the memory used by the data flow of a computation graph for neural network computing is optimized.
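Steps S2-S6 read like a linear-scan style allocation; the hypothetical sketch below shows only that generic flavour (scan intervals in start order, expired intervals free their registers, overflow variables go to memory) and omits step S7's re-activation of spilled variables. The interval format and tie-breaking are assumptions.

```python
# Hypothetical linear-scan sketch over life cycle intervals with a scanning line.
def linear_scan(intervals, num_registers):
    """intervals: list of (var, start, end) life cycle intervals, scanned by start."""
    free = list(range(num_registers))
    active, assignment, spilled = [], {}, []
    for var, start, end in sorted(intervals, key=lambda t: t[1]):
        still_active = []
        for v, e in active:                # expire intervals the scanning line passed
            if e <= start:
                free.append(assignment[v]) # their registers become idle again
            else:
                still_active.append((v, e))
        active = still_active
        if free:
            assignment[var] = free.pop(0)  # allocate an idle register
            active.append((var, end))
        else:
            spilled.append(var)            # exceeds the registers: transfer to memory
    return assignment, spilled

intervals = [("t0", 0, 4), ("t1", 1, 3), ("t2", 2, 6), ("t3", 5, 7)]
print(linear_scan(intervals, num_registers=2))
```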
-
8.
Publication No.: US20250086503A1
Publication Date: 2025-03-13
Application No.: US18580048
Filing Date: 2023-10-12
Applicant: ZHEJIANG LAB
Inventor: Guang CHEN , Yong LI , Shiqiang ZHU
IPC: G06N20/00
Abstract: The present disclosure provides a method and an apparatus for training a distributed model based on node fault perception, a storage medium, and an electronic device. During model training, a backup node is assigned to each device node used for training, so that when a device node is detected to be faulty, the backup node corresponding to the faulty device node can take over its model training task, thereby maintaining the efficiency of the model training task.
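Purely as a hypothetical sketch of the fault-perception idea (a heartbeat timeout is an assumed detection mechanism; the abstract does not specify how faults are monitored):

```python
# Hypothetical sketch: each device node has an assigned backup node; a monitor
# that stops receiving heartbeats from a node hands its task to the backup.
import time

HEARTBEAT_TIMEOUT_S = 5.0

class FaultMonitor:
    def __init__(self, backups):
        self.backups = backups             # device node -> assigned backup node
        self.last_seen = {}

    def heartbeat(self, node):
        self.last_seen[node] = time.time()

    def active_node(self, node):
        """Return the node that should run the task (the backup if `node` is faulty)."""
        seen = self.last_seen.get(node, 0.0)
        if time.time() - seen > HEARTBEAT_TIMEOUT_S:
            return self.backups.get(node, node)   # backup takes over the training task
        return node

monitor = FaultMonitor({"worker-0": "backup-0"})
monitor.heartbeat("worker-0")
print(monitor.active_node("worker-0"))    # worker-0 while heartbeats are fresh
```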
-
9.
Publication No.: US20240104016A1
Publication Date: 2024-03-28
Application No.: US18071958
Filing Date: 2022-11-30
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Aimin PAN , Guang CHEN
IPC: G06F12/0802 , G06N3/063
CPC classification number: G06F12/0802 , G06N3/063
Abstract: The disclosure provides an intermediate representation method for compiling computation graphs, including: step 1: compiling a neural network into a computation graph for neural network computation; step 2: constructing a node for each tensor variable in the computation graph; step 3: associating the node representing the tensor variable in the computation graph with a set of pointers to that tensor variable; step 4: analyzing the constraint relationships between the tensor variables in the computation graph; step 5: iteratively constructing a topological graph of the intermediate representation based on these constraint relationships; and step 6: analyzing, based on the intermediate representation, tensor variables whose different aliases point to the same memory location, and allocating a register for such tensor variables. The method optimizes the compilation efficiency for tensor variables that point to the same memory location in the computation graph.
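The sketch below illustrates, hypothetically, the points-to flavour of steps 3-6: each tensor variable carries a set of pointed-to locations, copy constraints are iterated to a fixed point, and variables whose sets overlap are treated as aliases of one memory location. The constraint form and helper names are assumptions, not the patent's intermediate representation.

```python
# Hypothetical alias-analysis sketch: propagate copy constraints over points-to
# sets, then report tensor variables that may share a memory location.
def propagate(points_to, copy_constraints):
    """points_to: var -> set of memory locations; copy_constraints: (dst, src) pairs."""
    changed = True
    while changed:                          # iterate constraints to a fixed point
        changed = False
        for dst, src in copy_constraints:
            before = len(points_to[dst])
            points_to[dst] |= points_to[src]
            changed |= len(points_to[dst]) != before
    return points_to

def aliases(points_to, var):
    """Tensor variables that may refer to the same memory location as `var`."""
    return {v for v, locs in points_to.items() if v != var and locs & points_to[var]}

pts = {"t0": {"buf0"}, "t1": set(), "t2": {"buf1"}}
pts = propagate(pts, [("t1", "t0")])        # t1 = t0 (a copy/view constraint)
print(aliases(pts, "t0"))                   # {'t1'} -> candidates for one register
```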
-
10.
Publication No.: US20240127027A1
Publication Date: 2024-04-18
Application No.: US17992814
Filing Date: 2022-11-22
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG , Shuibing HE , Guang CHEN
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: Disclosed are an optimization method and apparatus for compiling a computation graph. The optimization method includes the following steps: step S1: converting the computation graph into an intermediate representation; step S2: analyzing the dependency relationships; step S3: constructing a work stack; step S4: initializing all nodes to a non-activated state; step S5: popping the stack-top node elements and updating the input node set in the current round of iteration; step S6: pushing the node elements that depend on the nodes popped in step S5 onto the top of the stack in sequence, until the work stack is empty; step S7: representing the intermediate representation in the fixed node state using bit vectors; and step S8: allocating registers for the effective tensor variables contained in the nodes of the intermediate representation in the fixed node state.
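Steps S3-S7 resemble a work-stack fixed-point iteration over bit vectors; the following hypothetical sketch shows that generic pattern with assumed gen/kill transfer functions, and is not the patent's analysis.

```python
# Hypothetical work-stack fixed point: node states are bit vectors of live tensor
# variables; a popped node recomputes its state from its predecessors' outputs
# and pushes dependent nodes back when the state changes.
def fixed_point(nodes, preds, gen, kill):
    """gen/kill: node -> bit mask over tensor variables; preds: node -> predecessor list."""
    state = {n: 0 for n in nodes}           # start in the non-activated (all-zero) state
    succs = {n: [m for m in nodes if n in preds[m]] for n in nodes}
    stack = list(nodes)                     # the work stack
    while stack:
        n = stack.pop()
        incoming = 0
        for p in preds[n]:                  # merge the input node set for this round
            incoming |= state[p]
        new = (incoming & ~kill[n]) | gen[n]
        if new != state[n]:
            state[n] = new
            stack.extend(succs[n])          # re-examine nodes that depend on n
    return state                            # fixed bit vectors per node

nodes = ["n0", "n1", "n2"]
preds = {"n0": [], "n1": ["n0"], "n2": ["n1"]}
gen = {"n0": 0b001, "n1": 0b010, "n2": 0b000}
kill = {"n0": 0, "n1": 0b001, "n2": 0}
print(fixed_point(nodes, preds, gen, kill))  # -> {'n0': 1, 'n1': 2, 'n2': 2}
```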