TRAINING METHOD FOR A DEEP LEARNING MODEL

    Publication No.: US20250061305A1

    Publication Date: 2025-02-20

    Application No.: US18936686

    Application Date: 2024-11-04

    Abstract: A training method, an inference method, a device, an apparatus, and a medium for a deep learning model are provided. A first model includes a plurality of first parameters; a second model includes a plurality of second parameters, which are initialized to the parameter values of a plurality of target parameters selected from the plurality of first parameters. The training method includes: determining a target loss for both the first model and the second model; and adjusting parameter values, including: in response to determining that the target loss indicates that the parameter values of at least part of the target parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding second parameters; and in response to determining that the target loss indicates that the parameter values of at least part of the second parameters need to be adjusted, synchronously adjusting the parameter values of the corresponding target parameters.
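The synchronized adjustment described above can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are assumptions, not from the patent): a second model shares a selected subset of a first model's parameters, and an update on either side is mirrored to the other.

```python
# Hypothetical sketch: a "second model" borrows a subset of a "first model"'s
# parameters (the target parameters), and any adjustment to one side is
# synchronously mirrored to the other. Names are illustrative only.

class SyncedModels:
    def __init__(self, first_params, target_indices):
        self.first = list(first_params)      # parameters of the first model
        self.idx = list(target_indices)      # which first-params are targets
        # second-model parameters are initialized from the selected targets
        self.second = [self.first[i] for i in self.idx]

    def adjust_target(self, pos, delta):
        """Adjust a target parameter of the first model; mirror to the second."""
        i = self.idx[pos]
        self.first[i] += delta
        self.second[pos] = self.first[i]

    def adjust_second(self, pos, delta):
        """Adjust a second-model parameter; mirror back to the first model."""
        self.second[pos] += delta
        self.first[self.idx[pos]] = self.second[pos]
```

In a real framework the mirroring would typically be achieved by weight sharing (aliasing the same tensors) rather than by explicit copies.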

    MODEL TRAINING

    Publication No.: US20220198153A1

    Publication Date: 2022-06-23

    Application No.: US17694034

    Application Date: 2022-03-14

    Abstract: A model training method, a model training platform, an electronic device and a storage medium are provided, which can be used in the field of artificial intelligence, particularly in the fields of natural language processing and deep learning. The model training method includes: receiving an input; determining, based on the input, a user-oriented prefabricated function; determining, based on the input, a model training function; determining, based on the input, a pre-trained model; determining, based on the input, a network structure associated with the pre-trained model so as to support use of the pre-trained model; training the model, based on the input, by using the prefabricated function, the model training function, and the pre-trained model; and providing an output associated with the trained model.
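The platform flow above (input selects a prefabricated function, a training function, and a pre-trained model, then training runs) can be sketched as a registry-based dispatch. All registry keys, function names, and return values here are hypothetical stand-ins, not taken from the patent.

```python
# Hypothetical sketch of the platform flow: the user's input selects a
# user-oriented prefabricated function, a model training function, and a
# pre-trained model from registries, then training produces an output.
PREFABS = {"classify": lambda text: text.lower()}                 # toy prefab
TRAINERS = {"finetune": lambda model, data: model + "+tuned"}     # toy trainer
PRETRAINED = {"ernie": "ernie-base"}                              # toy model id

def train_from_input(user_input):
    prefab = PREFABS[user_input["prefab"]]      # determined based on the input
    trainer = TRAINERS[user_input["trainer"]]   # determined based on the input
    model = PRETRAINED[user_input["model"]]     # determined based on the input
    data = [prefab(x) for x in user_input["data"]]  # apply prefabricated function
    trained = trainer(model, data)                  # train with selected pieces
    return {"model": trained, "samples": len(data)} # output about trained model
```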

    MODEL TRAINING METHOD, MODEL REASONING METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20250094802A1

    Publication Date: 2025-03-20

    Application No.: US18965684

    Application Date: 2024-12-02

    Abstract: Provided are a model training method, a model reasoning method, an electronic device, and a storage medium, relating to the field of data processing, and especially to the technical fields of artificial intelligence, big data, deep learning and large models. The model training method includes: folding, based on a folding feature value, an initial token sequence used for training a model to obtain at least a first token sequence, wherein the initial token sequence is a token sequence composed of T1 tokens, and the first token sequence has a sequence length less than that of the initial token sequence; and inputting at least the first token sequence into a preset model to train the preset model, so as to obtain a target model.
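One plausible reading of the folding operation is grouping consecutive tokens so the folded sequence is shorter than the original. The sketch below is an assumption about what "folding by a feature value" could mean, not the patent's definition.

```python
def fold_tokens(tokens, fold):
    """Fold a token sequence by grouping every `fold` consecutive tokens.

    The folded sequence has ceil(len(tokens) / fold) elements, i.e. a
    sequence length less than that of the initial sequence when fold > 1.
    This interpretation of "folding" is a hypothetical illustration.
    """
    return [tokens[i:i + fold] for i in range(0, len(tokens), fold)]
```

A shorter folded sequence reduces the effective sequence length seen by the model, which is the stated motivation for training on it.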

    METHOD AND APPARATUS FOR TRAINING MODEL, AND METHOD AND APPARATUS FOR PREDICTING TEXT

    Publication No.: US20220129768A1

    Publication Date: 2022-04-28

    Application No.: US17646851

    Application Date: 2022-01-03

    Abstract: The present disclosure provides a method and apparatus for training a model. The method can include: acquiring at least one paragraph text, each paragraph text comprising a plurality of fine-grained samples; processing a fine-grained sample in each paragraph text to obtain a coarse-grained sample; annotating the coarse-grained sample in each paragraph text and obscuring one coarse-grained sample using a mask of one fine-grained sample to obtain a training sample set, wherein the training sample set comprises a plurality of annotated texts, and each annotated text comprises at least one of a fine-grained sample or an annotated coarse-grained sample; and training a fine-grained model using the training sample set to obtain a trained fine-grained model, the fine-grained model being used to learn preceding fine-grained content and predict adjacent coarse-grained content.
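To make the fine/coarse distinction concrete, one common instantiation treats characters as fine-grained samples and words as coarse-grained samples, with one coarse-grained sample obscured by a single mask token. The helper below is a hypothetical sketch under that assumption; the mask position and token are illustrative.

```python
MASK = "[MASK]"

def build_annotated_text(words):
    """Treat characters as fine-grained samples and words as coarse-grained
    samples; obscure one coarse-grained sample (here, the first word) with a
    single fine-grained mask token. Both choices are illustrative assumptions.
    """
    fine = [ch for w in words for ch in w]  # fine-grained view of the text
    coarse = list(words)                    # coarse-grained (annotated) view
    coarse[0] = MASK                        # obscure one coarse-grained sample
    return fine, coarse
```

Masking a whole word with one token forces the model to predict a coarse-grained unit from surrounding fine-grained context, matching the abstract's stated objective.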

    DIALOGUE MODEL TRAINING METHOD

    Publication No.: US20240412002A1

    Publication Date: 2024-12-12

    Application No.: US18747641

    Application Date: 2024-06-19

    Abstract: A method is provided. The method includes: obtaining a first sample dataset; inputting at least one first question text corresponding to at least one piece of first sample data into a dialog model separately to obtain at least one first answer prediction result; inputting each second question text into the dialog model to obtain a second answer prediction result output by the dialog model; inputting the second answer prediction result into a reward model to obtain a score of the second answer prediction result output by the reward model; determining a comprehensive loss based on the at least one first answer prediction result, a first answer text of each of the at least one piece of first sample data, and a score corresponding to each of at least one piece of second sample data; and adjusting at least one parameter of the dialog model based on the comprehensive loss.
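The comprehensive loss combines a supervised term over (prediction, answer) pairs with a reward-model term over scored predictions. The weighting, the 0/1 supervised term, and the sign convention below are all hypothetical simplifications, not the patent's formula.

```python
def comprehensive_loss(pred_and_gold, reward_scores, alpha=0.5):
    """Toy combined objective: a supervised term over (prediction, answer)
    pairs plus a reward term that decreases as reward-model scores improve.
    `alpha` and both term definitions are illustrative assumptions.
    """
    # supervised term: fraction of predictions not matching the answer text
    sup = sum(0.0 if p == g else 1.0 for p, g in pred_and_gold) / len(pred_and_gold)
    # reward term: higher mean score from the reward model -> lower loss
    rl = -sum(reward_scores) / len(reward_scores)
    return alpha * sup + (1 - alpha) * rl
```

The dialog model's parameters would then be adjusted by gradient descent on this combined quantity, trading off imitation of the first answer texts against reward-model scores.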

    MULTI-SYSTEM-BASED INTELLIGENT QUESTION ANSWERING METHOD AND APPARATUS, AND DEVICE

    Publication No.: US20220391426A1

    Publication Date: 2022-12-08

    Application No.: US17820285

    Application Date: 2022-08-17

    Abstract: The present disclosure provides a multi-system-based intelligent question answering method and apparatus, and a device, relating to the field of artificial intelligence, in particular to the field of knowledge graphs. The specific implementation solution is: determining a question category of question information in response to a question answering instruction of a user, wherein the question answering instruction is used to indicate the question information; determining a query engine corresponding to the question category, and invoking, according to the query engine, multiple question analysis systems corresponding to the query engine; and feeding back answer information to the user when the answer information corresponding to the question information is determined by a current question analysis system in a process of processing the question information by sequentially using the multiple question analysis systems according to their system priorities.
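The priority-ordered dispatch across analysis systems can be sketched as follows; the data shapes and the convention that a system returns None when it has no answer are illustrative assumptions.

```python
def answer_question(question, systems):
    """Try question analysis systems in priority order and return the first
    answer found. `systems` is a list of (priority, solver) pairs; a lower
    priority value is tried first, and a solver returns None when it cannot
    answer. Both conventions are assumptions for this sketch.
    """
    for _, solver in sorted(systems, key=lambda s: s[0]):
        result = solver(question)
        if result is not None:
            return result          # feed back the first determined answer
    return None                    # no system produced an answer
```

This mirrors the abstract's behavior: processing stops and the answer is fed back as soon as the current system in the priority sequence determines one.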

    MULTIMODAL DATA GENERATION

    Publication No.: US20250094713A1

    Publication Date: 2025-03-20

    Application No.: US18967529

    Application Date: 2024-12-03

    Abstract: A multimodal data generation method is provided. The method includes: inputting a query data sequence into a multimodal model, to obtain a plurality of tokens in a response data sequence, where a current token is generated through the following operations: inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model generates the current token based on the query data sequence and the current response data sequence, in response to determining that the current token belongs to a first data modality; or inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model denoises an initial token sequence based on the query data sequence and the current response data sequence, to generate a result token sequence, in response to determining that the current token belongs to a second data modality.
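The modality-dependent branch (autoregressive generation for the first modality, iterative denoising for the second) can be sketched as below. The token format, sequence length, and denoising rule are toy assumptions, not the model's actual behavior.

```python
def generate_next(modality, query, response_so_far, denoise_steps=3):
    """First data modality (e.g. text): emit one autoregressive token
    conditioned on the query and the response so far. Second data modality
    (e.g. image): start from an initial token sequence and iteratively
    refine (denoise) it into a result token sequence. All concrete values
    here are illustrative stand-ins.
    """
    if modality == "text":
        # next discrete token; the position encodes the response so far
        return f"tok{len(response_so_far)}"
    seq = [0.0] * 4                      # stand-in for a noisy initial sequence
    for _ in range(denoise_steps):       # toy iterative denoising loop
        seq = [v + 1.0 for v in seq]     # each step refines the whole sequence
    return seq                           # result token sequence
```

The key structural point is that text-like modalities yield one token per call, while the denoised modality yields a whole result token sequence in one generation step.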
