Publication No.: US20250094806A1
Publication Date: 2025-03-20
Application No.: US18967167
Filing Date: 2024-12-03
Inventor: Junyuan Shang , Yilong Chen , Zhenyu Zhang , Shuohuan Wang , Yu Sun , Hua Wu
IPC: G06N3/082 , G06N3/0475
Abstract: Provided are a large language model training method, an electronic device, and a storage medium, relating to the field of artificial intelligence technologies and, in particular, to the fields of deep learning, natural language processing, and large models. The method includes: performing dimension-reduction parameter fusion on the two-dimensional parameter matrix of each channel in each network layer of a first large language model to obtain a second large language model; performing layer-reduction parameter fusion on the network layers of the second large language model based on the three-dimensional parameter matrix of each network layer in the second large language model to obtain a third large language model; and training the third large language model to obtain a target large language model when a target loss function, determined based on the first and third large language models, meets a preset first function condition.
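The abstract does not disclose the concrete fusion rules, so the following is only a minimal sketch of the two-stage compression it describes, under assumed criteria: dimension-reduction fusion folds low-norm rows of a layer's two-dimensional parameter matrix into their most similar kept rows, layer-reduction fusion merges adjacent layers of the stacked three-dimensional parameter matrix by averaging, and the target loss is taken to be a mean-squared difference between the first and third models' outputs (all function names and selection rules here are hypothetical, not from the patent).

```python
import numpy as np

def dim_reduction_fusion(w, keep):
    # w: (d_out, d_in) two-dimensional parameter matrix for one channel group.
    # Hypothetical rule: keep the `keep` rows with the largest L2 norm and
    # fuse (add) each dropped row into its most similar kept row.
    norms = np.linalg.norm(w, axis=1)
    kept = np.argsort(norms)[-keep:]
    dropped = [i for i in range(w.shape[0]) if i not in set(kept)]
    fused = w[kept].astype(float).copy()
    for i in dropped:
        sims = w[kept] @ w[i]          # similarity to each kept row
        fused[int(np.argmax(sims))] += w[i]
    return fused

def layer_reduction_fusion(layers, keep):
    # layers: (L, d_out, d_in) three-dimensional parameter matrix, one 2-D
    # weight matrix per network layer. Hypothetical rule: repeatedly average
    # the first two layers until only `keep` layers remain.
    layers = np.asarray(layers, dtype=float)
    while layers.shape[0] > keep:
        merged = (layers[0] + layers[1]) / 2.0
        layers = np.concatenate([merged[None], layers[2:]], axis=0)
    return layers

def target_loss(out_first, out_third):
    # Hypothetical target loss comparing outputs of the first (original) and
    # third (fused) models; training would continue until this meets a preset
    # threshold (the "first function condition" of the abstract).
    return float(np.mean((np.asarray(out_first) - np.asarray(out_third)) ** 2))
```

For example, `dim_reduction_fusion` shrinks a 4x3 matrix to 2x3 while preserving the summed contribution of dropped rows, and `layer_reduction_fusion` collapses a 4-layer stack to 2 layers; a real implementation would choose fusion criteria learned or derived from the models rather than these fixed rules.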