PARALLEL CORPUS CONSTRUCTION PROGRAM, PARALLEL CORPUS CONSTRUCTION METHOD, AND INFORMATION PROCESSING APPARATUS

    Publication No.: EP4394648A1

    Publication date: 2024-07-03

    Application No.: EP23206653.0

    Filing date: 2023-10-30

    Applicant: FUJITSU LIMITED

    Inventor: NGUYEN, An Le

    Abstract: An information processing apparatus acquires a first parallel corpus in which a first sentence, which includes a first named entity in a first language, is associated with a second sentence, which includes a second named entity in a second language corresponding to the first named entity. The apparatus extracts, from first dictionary data including a plurality of named entities in the first language, a third named entity whose degree of similarity with the first named entity exceeds a threshold. It then specifies a fourth named entity corresponding to the third named entity, using second dictionary data indicating correspondence between named entities in the first language and named entities in the second language. Finally, it generates a second parallel corpus by replacing the first named entity included in the first sentence with the third named entity and replacing the second named entity included in the second sentence with the fourth named entity.
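
    The flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the degree of similarity is computed, so the character-level ratio from Python's standard `difflib` module is used here purely as a stand-in. The function names (`similarity`, `augment_pair`), the threshold value of 0.5, and the sample sentences and dictionaries are all hypothetical and are not taken from the application.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumption: a character-level ratio stands in for the unspecified
    # "degree of similarity" between two named entities.
    return SequenceMatcher(None, a, b).ratio()


def augment_pair(src_sentence, tgt_sentence, src_entity, tgt_entity,
                 src_entities, bilingual_dict, threshold=0.5):
    """Return new sentence pairs made by swapping in similar named entities.

    src_entities   -- first dictionary data: named entities in the first language
    bilingual_dict -- second dictionary data: first-language entity -> second-language entity
    threshold      -- hypothetical cut-off; candidates must strictly exceed it
    """
    new_pairs = []
    for candidate in src_entities:
        if candidate == src_entity:
            continue  # replacing an entity with itself adds nothing new
        if similarity(src_entity, candidate) <= threshold:
            continue  # keep only candidates whose similarity exceeds the threshold
        translated = bilingual_dict.get(candidate)
        if translated is None:
            continue  # no known counterpart in the second language
        new_pairs.append((
            src_sentence.replace(src_entity, candidate),
            tgt_sentence.replace(tgt_entity, translated),
        ))
    return new_pairs


# Hypothetical sample data (English as the first language, Japanese as the second).
src_entities = ["Kanagawa", "Kagawa", "Hokkaido"]
bilingual_dict = {"Kanagawa": "神奈川", "Kagawa": "香川", "Hokkaido": "北海道"}

pairs = augment_pair(
    "I live in Kanagawa.", "私は神奈川に住んでいます。",
    "Kanagawa", "神奈川",
    src_entities, bilingual_dict,
)
for src, tgt in pairs:
    print(src, "|", tgt)
```

    In this sketch, "Kagawa" is close enough to "Kanagawa" to pass the threshold, while "Hokkaido" is not, so only one new sentence pair is produced. An actual implementation might use a different similarity measure (for example, one based on embeddings or entity type), which the abstract leaves open.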