Method and Apparatus for Selecting Sample Corpus Used to Optimize Translation Model

    Publication No.: US20230140997A1

    Publication date: 2023-05-11

    Application No.: US18089392

    Application date: 2022-12-27

    CPC classification number: G06F40/58

    Abstract: A method and apparatus for selecting a sample corpus used to optimize a translation model, an electronic device, a computer readable storage medium, and a computer program product are provided. The method includes: acquiring a first corpus; translating the first corpus by using a to-be-optimized translation model to acquire a second corpus in a different language; translating the second corpus by using the to-be-optimized translation model to acquire a third corpus; determining a difficulty level of the first corpus based on a similarity between the first corpus and the third corpus; and, in response to the difficulty level satisfying a difficulty level threshold, determining the first corpus as a sample corpus used to perform optimization training on the to-be-optimized translation model.
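    The selection loop in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that the second translation goes back into the source language (a round trip), that similarity is measured with `difflib.SequenceMatcher`, that difficulty is `1 - similarity`, and that a sample is kept when its difficulty is at or above the threshold. The `translate` callable, the language codes, and the default threshold are hypothetical placeholders.

```python
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical translator interface: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def round_trip_difficulty(text: str, translate: Translator,
                          src: str = "en", pivot: str = "zh") -> float:
    """Return a difficulty score in [0, 1] for one first-corpus sentence."""
    second = translate(text, src, pivot)    # first corpus -> second corpus
    third = translate(second, pivot, src)   # second corpus -> third corpus
    similarity = SequenceMatcher(None, text, third).ratio()
    # Assumption: the less the round trip resembles the input, the harder it is.
    return 1.0 - similarity


def select_samples(corpus: List[str], translate: Translator,
                   threshold: float = 0.3) -> List[str]:
    """Keep the sentences whose difficulty meets the (assumed) threshold rule."""
    return [t for t in corpus
            if round_trip_difficulty(t, translate) >= threshold]
```

    Passing the model as a plain callable keeps the sketch independent of any particular translation library; any similarity measure could be swapped in for `SequenceMatcher`.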

    Method for Evaluating Text Content, and Related Apparatus

    Publication No.: US20230196026A1

    Publication date: 2023-06-22

    Application No.: US18109813

    Application date: 2023-02-14

    CPC classification number: G06F40/30 G06F40/289

    Abstract: A method for evaluating text content, which may include: splitting a to-be-evaluated text into a plurality of clauses arranged in sequence according to punctuation information of the to-be-evaluated text, and determining a first clause of the plurality of clauses as an actual tune name; then, in response to the number of clauses from the third clause to the last clause whose Chinese character counts satisfy the character count requirements of the corresponding clauses of the actual tune name exceeding a number threshold, determining actual prosodic information based on a Chinese phonetic alphabet (pinyin) text of the third clause to the last clause; and finally, in response to the actual prosodic information being consistent with standard prosodic information of the actual tune name, evaluating the to-be-evaluated text as a Ci-poetry text.
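    The first two stages of this abstract (clause splitting and the character count check) can be sketched as below. This is a rough sketch under stated assumptions: the punctuation set, the `CLAUSE_LENGTHS` table with its single illustrative entry, and all function names are hypothetical, and the prosody comparison step is deliberately left out because it would need a pinyin and tone resource that the abstract does not specify.

```python
import re
from typing import Dict, List

# Assumed clause delimiters: common Chinese and ASCII punctuation plus whitespace.
PUNCT = r"[，。！？、；：,.!?;:\s]+"

# Illustrative table: tune name -> expected Chinese character count per clause.
CLAUSE_LENGTHS: Dict[str, List[int]] = {"如梦令": [6, 6, 5, 6, 2, 2, 6]}


def split_clauses(text: str) -> List[str]:
    """Split the text at punctuation and drop empty pieces."""
    return [c for c in re.split(PUNCT, text) if c]


def count_han(clause: str) -> int:
    """Count characters in the basic CJK Unified Ideographs range."""
    return len(re.findall(r"[\u4e00-\u9fff]", clause))


def passes_length_check(text: str, min_matches: int) -> bool:
    """True if more than min_matches body clauses have the expected length."""
    clauses = split_clauses(text)
    if len(clauses) < 3:
        return False
    expected = CLAUSE_LENGTHS.get(clauses[0])  # first clause is the tune name
    if expected is None:
        return False
    body = clauses[2:]  # third clause to last clause
    matches = sum(1 for c, n in zip(body, expected) if count_han(c) == n)
    return matches > min_matches
```

    Only texts that pass this check would go on to the prosody comparison described in the abstract.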

    TRANSLATION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: US20230153548A1

    Publication date: 2023-05-18

    Application No.: US17885152

    Application date: 2022-08-10

    CPC classification number: G06F40/58

    Abstract: A translation method, an electronic device and a storage medium are disclosed, which relate to the field of artificial intelligence technologies, such as machine learning and information processing technologies. An implementation includes: acquiring an intermediate translation result generated by each of multiple pre-trained translation models for a to-be-translated specified sentence in a same iteration of a translation process, so as to obtain multiple intermediate translation results; acquiring a co-occurrence word based on the multiple intermediate translation results; and acquiring a target translation result of the specified sentence based on the co-occurrence word.
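    One way to picture the co-occurrence step is a per-iteration vote across models. The sketch below is an illustration only: the abstract does not say how the co-occurrence word is derived, so majority voting with ties broken by model order is an assumption, and the `NextToken` interface, `max_len`, and the `</s>` end marker are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical model interface: (source sentence, tokens so far) -> next token.
NextToken = Callable[[str, Sequence[str]], str]


def co_occurrence_word(candidates: Sequence[str]) -> str:
    """Return the token proposed by the most models (first model wins ties)."""
    counts = Counter(candidates)
    best = max(counts.values())
    for token in candidates:
        if counts[token] == best:
            return token
    raise ValueError("candidates must not be empty")


def ensemble_translate(sentence: str, models: Sequence[NextToken],
                       max_len: int = 50, eos: str = "</s>") -> List[str]:
    """Build the translation token by token from the agreed-upon proposals."""
    prefix: List[str] = []
    for _ in range(max_len):
        # Intermediate results from every model in the same iteration.
        candidates = [model(sentence, prefix) for model in models]
        token = co_occurrence_word(candidates)
        if token == eos:
            break
        prefix.append(token)
    return prefix
```

    Because every model continues from the same shared prefix, a single model's early mistake is outvoted instead of propagating through the rest of the sentence.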

    TRAINING METHOD, TEXT TRANSLATION METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20230076471A1

    Publication date: 2023-03-09

    Application No.: US17982965

    Application date: 2022-11-08

    Abstract: A training method, a text translation method, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, in particular to the fields of natural language processing and deep learning technologies. A specific implementation solution includes: performing a feature extraction on source sample text data to obtain a sample feature vector sequence; obtaining a target sample feature vector according to the sample feature vector sequence; performing an autoregressive decoding and a non-autoregressive decoding on the sample feature vector sequence, respectively, to obtain an autoregressive text translation result and a non-autoregressive text translation result; performing a length prediction on the target sample feature vector to obtain a first predicted length value and a second predicted length value; and training a predetermined model by using translation sample data, the autoregressive text translation result, the non-autoregressive text translation result, a true length value of the source sample text, the first predicted length value, a true length value of the translation sample text, and the second predicted length value to obtain the text translation model.

    METHOD FOR TRAINING NON-AUTOREGRESSIVE TRANSLATION MODEL

    Publication No.: US20230051373A1

    Publication date: 2023-02-16

    Application No.: US17974317

    Application date: 2022-10-26

    Abstract: A method for training a non-autoregressive translation (NAT) model includes: acquiring a source language text, a target language text corresponding to the source language text and a target length of the target language text; generating a target language prediction text and a prediction length by inputting the source language text into the NAT model, in which initialization parameters of the NAT model are determined based on parameters of a pre-trained translation model; and obtaining a target NAT model by training the NAT model based on the target language text, the target language prediction text, the target length and the prediction length.
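    The training signal described here combines a text term and a length term. The following is a minimal sketch assuming a token-level cross-entropy, a squared-error length penalty, and a weighted sum of the two; the abstract does not name the loss functions, so these choices, the `length_weight` value, and the parameter-copy helper are all assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def token_cross_entropy(pred_probs: Sequence[Sequence[float]],
                        target_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target tokens, one per position."""
    total = -sum(math.log(p[t]) for p, t in zip(pred_probs, target_ids))
    return total / len(target_ids)


def length_loss(pred_len: float, target_len: int) -> float:
    """Squared error between predicted and target length (assumed form)."""
    return (pred_len - target_len) ** 2


def nat_loss(pred_probs: Sequence[Sequence[float]], target_ids: Sequence[int],
             pred_len: float, target_len: int,
             length_weight: float = 0.1) -> float:
    """Combine the text term and the length term into one training objective."""
    return (token_cross_entropy(pred_probs, target_ids)
            + length_weight * length_loss(pred_len, target_len))


def init_from_pretrained(nat_params: Dict[str, float],
                         pretrained: Dict[str, float]) -> Dict[str, float]:
    """Copy pre-trained values for shared parameter names; keep the rest."""
    return {name: pretrained.get(name, value)
            for name, value in nat_params.items()}
```

    In this sketch, parameters that exist only in the NAT model, such as a hypothetical length predictor, keep their own initial values, while shared names start from the pre-trained translation model.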
