Publication No.: US11972214B2
Publication date: 2024-04-30
Application No.: US18348317
Filing date: 2023-07-06
Applicant: ZHEJIANG LAB
Inventor: Jingsong Li, Lixin Shi, Ran Xin, Zongfeng Yang, Yu Tian, Tianshu Zhou
IPC: G06F17/00, G06F40/169, G06F40/284, G06F40/295, G06F40/30, G06F40/40
CPC classification number: G06F40/295, G06F40/169, G06F40/284, G06F40/30, G06F40/40
Abstract: Disclosed are a method and an apparatus for NER-oriented Chinese clinical text data augmentation. Data preprocessing yields unannotated data and annotated data that has undergone label linearization. Using the unannotated data, part of the information in the text is concealed and the concealed part is predicted from the retained information; at the same time, an entity word-level discrimination task is introduced for pre-training a span-based language model. In the fine-tuning stage, a plurality of decoding mechanisms are introduced: the relationship between a text vector and text data is obtained from the pre-trained span-based language model, and the linearized data with entity labels is converted into the text vector. In the prediction stage of the text generation model, text is generated through forward decoding and reverse decoding to obtain enhanced data with annotation information.
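
The abstract mentions label linearization but does not spell out the scheme. As a rough illustration only, the sketch below assumes one common convention: each non-"O" BIO tag is inserted as a bracketed marker directly before the character it labels, so that text and labels form a single sequence a generation model can be trained on. The function names `linearize` and `delinearize`, the `<TAG>` marker format, and the `DIS` (disease) label are all hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def linearize(tokens: List[str], tags: List[str]) -> List[str]:
    """Merge tokens and BIO tags into one sequence.

    Assumed convention (not from the patent): a non-"O" tag is emitted
    as "<TAG>" immediately before the token it labels.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    sequence: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag != "O":
            sequence.append(f"<{tag}>")
        sequence.append(token)
    return sequence


def delinearize(sequence: List[str]) -> Tuple[List[str], List[str]]:
    """Recover (tokens, tags) from a linearized sequence.

    Any item of the form "<...>" is treated as a tag marker, so this
    sketch assumes the raw text contains no such tokens.
    """
    tokens: List[str] = []
    tags: List[str] = []
    pending = "O"  # tag waiting for the next token
    for item in sequence:
        if len(item) > 2 and item.startswith("<") and item.endswith(">"):
            pending = item[1:-1]
        else:
            tokens.append(item)
            tags.append(pending)
            pending = "O"
    return tokens, tags


# Character-level example: "患者有高血压" (the patient has hypertension),
# with "高血压" marked as a hypothetical DIS entity.
example_tokens = list("患者有高血压")
example_tags = ["O", "O", "O", "B-DIS", "I-DIS", "I-DIS"]
example_linearized = linearize(example_tokens, example_tags)
```

Running `delinearize` on generated sequences of this shape would give back token and tag lists, which is how generated text could carry its annotation with it under this assumed scheme.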