MODEL TRAINING METHOD, MODEL REASONING METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20250094802A1

    Publication Date: 2025-03-20

    Application No.: US18965684

    Application Date: 2024-12-02

    Abstract: Provided are a model training method, a model reasoning method, an electronic device, and a storage medium, relating to the field of data processing, and in particular to the technical fields of artificial intelligence, big data, deep learning, and large models. The model training method includes: folding an initial token sequence used for training a model, based on a folding feature value for folding a token sequence, to obtain at least a first token sequence, wherein the initial token sequence is a token sequence composed of T1 tokens, and the first token sequence has a sequence length less than that of the initial token sequence; and inputting at least the first token sequence into a preset model to train the preset model, so as to obtain a target model.
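The abstract does not specify the exact folding operation, so the sketch below assumes one plausible reading: every group of `fold_value` consecutive tokens is merged into a single composite token, shortening a sequence of T1 tokens to roughly T1 / fold_value. The function name and grouping scheme are assumptions for illustration, not the patented method.

```python
# Illustrative sketch: "fold" a token sequence by grouping every
# `fold_value` consecutive tokens into one composite token, so the
# folded sequence is shorter than the initial T1-token sequence.

def fold_token_sequence(tokens, fold_value):
    """Group every `fold_value` consecutive tokens into one tuple."""
    if fold_value < 1:
        raise ValueError("fold_value must be >= 1")
    return [tuple(tokens[i:i + fold_value])
            for i in range(0, len(tokens), fold_value)]

initial_sequence = list(range(8))              # T1 = 8 tokens
folded = fold_token_sequence(initial_sequence, 2)
print(folded)    # [(0, 1), (2, 3), (4, 5), (6, 7)] -- length 4 < 8
```

The folded sequence could then be fed to the preset model in place of (or alongside) the initial sequence, reducing the effective sequence length during training.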

    TRAINING METHOD FOR A DEEP LEARNING MODEL

    Publication No.: US20250061305A1

    Publication Date: 2025-02-20

    Application No.: US18936686

    Application Date: 2024-11-04

    Abstract: A training method, an inference method, a device, an apparatus, and a medium for a deep learning model are provided. A first model includes a plurality of first parameters, and a second model includes a plurality of second parameters, which are initialized to the parameter values of a plurality of target parameters selected from the plurality of first parameters. The training method includes: determining a target loss for both the first model and the second model; and adjusting parameter values, including: in response to determining that the target loss indicates that the parameter values of at least part of the target parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding second parameters; and in response to determining that the target loss indicates that the parameter values of at least part of the second parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding target parameters.
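The synchronized update described above can be sketched as a pair of models whose overlapping parameters are kept tied: the second model is initialized from a selected subset of the first model's parameters, and any adjustment on either side is mirrored to the other. The class name, index mapping, and additive update are illustrative assumptions, not the patented training procedure.

```python
# Hypothetical sketch of tied parameters between two models: the second
# model's parameters are initialized from selected "target" parameters
# of the first model, and updates to either copy are mirrored.

class TiedModels:
    def __init__(self, first_params, target_indices):
        self.first = list(first_params)
        self.target_indices = list(target_indices)
        # Second model initialized to the selected target parameters.
        self.second = [self.first[i] for i in self.target_indices]

    def adjust_target(self, j, delta):
        """Adjust the j-th target parameter; mirror into the second model."""
        self.first[self.target_indices[j]] += delta
        self.second[j] += delta

    def adjust_second(self, j, delta):
        """Adjust the j-th second parameter; mirror into the first model."""
        self.second[j] += delta
        self.first[self.target_indices[j]] += delta

models = TiedModels([1.0, 2.0, 3.0, 4.0], target_indices=[1, 3])
models.adjust_target(0, 0.5)    # updates first[1] and second[0] together
print(models.first[1], models.second[0])    # 2.5 2.5
```

Keeping both copies in lockstep means the smaller second model can later be used on its own for inference while remaining consistent with the parameters it shares with the first model.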

    METHOD OF EXECUTING TASK FOR LARGE LANGUAGE MODEL, DEVICE, AND STORAGE MEDIUM

    Publication No.: US20240378077A1

    Publication Date: 2024-11-14

    Application No.: US18782617

    Application Date: 2024-07-24

    Abstract: A method of executing a task for a large language model, a device, and a storage medium are provided, which relate to the field of artificial intelligence technology, and in particular to the fields of deep learning, large language models, natural language processing, and computer vision technologies. The method includes: determining, by using a determination unit, a target attention task from a plurality of attention tasks to be processed, based on a sparse representation corresponding to a feature to be processed, where the target attention task is a task corresponding to a non-fully masked region of the feature, the sparse representation represents a mask position of the feature, and the mask position represents mask endpoint positions in at least two non-intersecting intervals in a mask matrix corresponding to the feature; and executing the target attention task by using a computing unit, so as to obtain an attention feature.
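The selection step can be sketched as an interval-overlap test, under the assumption that each attention task covers a block of columns in the mask matrix and the sparse representation stores the endpoints of the non-intersecting unmasked intervals. A task is a "target" worth computing only if its block overlaps some unmasked interval; fully masked blocks are skipped. The function name and block layout are illustrative assumptions.

```python
# Illustrative sketch: pick target attention tasks (non-fully-masked
# blocks) from interval endpoints stored in a sparse mask representation.

def select_target_tasks(blocks, unmasked_intervals):
    """blocks: list of (start, end) column ranges, end exclusive.
    unmasked_intervals: non-intersecting (start, end) unmasked ranges.
    Returns the ids of tasks whose block is not fully masked."""
    targets = []
    for task_id, (b_start, b_end) in enumerate(blocks):
        # Half-open intervals overlap iff each starts before the other ends.
        if any(b_start < u_end and u_start < b_end
               for u_start, u_end in unmasked_intervals):
            targets.append(task_id)
    return targets

blocks = [(0, 4), (4, 8), (8, 12)]
# Only columns 5..10 are unmasked, so block 0 is fully masked.
print(select_target_tasks(blocks, [(5, 11)]))    # [1, 2]
```

Only the selected tasks would then be dispatched to the computing unit, so attention is never evaluated over regions the mask fully excludes.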

    CONTENT INITIALIZATION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: US20240275848A1

    Publication Date: 2024-08-15

    Application No.: US18020618

    Application Date: 2022-08-01

    CPC classification number: H04L67/1097 G06F7/582 G06F7/588

    Abstract: The present disclosure provides a content initialization method and apparatus, an electronic device and a storage medium, which relate to the field of computer technology, and in particular to the fields of deep learning and distributed computing. The content initialization method is applied to any one of a plurality of devices included in a distributed system. A specific implementation scheme of the content initialization method is: determining, according to size information of a resource space for the distributed system and identification information of the any one of the plurality of devices, space information of a first sub-space for the any one of the plurality of devices in the resource space, wherein the space information includes position information of the first sub-space in the resource space; and determining an initialization content for the first sub-space according to a random seed and the position information.
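A minimal sketch of this scheme, with assumed names and an even partitioning: each device derives its sub-space's position from the total resource size and its device id, then generates that sub-space's content from the shared random seed combined with the position. No device needs the whole space, and re-running on any device reproduces the same content. The seed-mixing formula is an illustrative assumption.

```python
import random

# Illustrative sketch: deterministic per-device initialization of a
# sub-space of a shared resource space from (shared seed, position).

def subspace_position(total_size, num_devices, device_id):
    """Evenly partition the resource space; return (offset, length)."""
    length = total_size // num_devices
    return device_id * length, length

def init_subspace(seed, offset, length):
    """Deterministic content from the shared seed and the position.
    The mixing constant is arbitrary; it just derives a distinct
    per-position stream from the one shared seed."""
    rng = random.Random(seed * 1_000_003 + offset)
    return [rng.random() for _ in range(length)]

seed = 42
# Two devices independently initialize disjoint halves of a size-8 space.
parts = [init_subspace(seed, *subspace_position(8, 2, d)) for d in (0, 1)]
full = parts[0] + parts[1]
```

Because the content depends only on the seed and the position, the concatenation of all sub-spaces is identical to what a single device would produce for the full space with the same per-position streams, which is what makes the distributed initialization consistent.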
