METHOD AND APPARATUS FOR SEQUENCE LABELING ON ENTITY TEXT, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM

    Publication No.: US20220164536A1

    Publication date: 2022-05-26

    Application No.: US17455967

    Filing date: 2021-11-22

    IPC classification: G06F40/295

    Abstract: A method and an apparatus for sequence labeling on an entity text, and a non-transitory computer-readable recording medium, are provided. In the method, a start position of an entity text within a target text is determined. A first matrix is then generated based on the start position of the entity text. The elements of the first matrix indicate focusable weights of each word with respect to the other words in the target text. A named entity recognition model is then generated using the first matrix. The named entity recognition model is obtained by training with first training data, which includes word embeddings corresponding to respective texts in a training text set, the texts having already been labeled with entity labels. Finally, the target text is input to the named entity recognition model, and a probability distribution of the entity label is output.
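
The abstract does not state how the first matrix is computed from the entity start positions, so the following Python sketch is only one possible reading. It assumes a hypothetical rule in which every word gives a larger weight to words at entity start positions, with each row normalized to sum to 1. The function name `build_focus_matrix` and the `boost` parameter are illustrative and do not come from the patent.

```python
def build_focus_matrix(num_words, entity_starts, boost=2.0):
    """Return a num_words x num_words row-normalized weight matrix.

    Entry [i][j] is how strongly word i may focus on word j. Columns at
    entity start positions receive `boost` times the base weight of 1.0.
    This weighting rule is an assumption made for illustration only.
    """
    starts = set(entity_starts)
    matrix = []
    for _ in range(num_words):
        # Every row uses the same raw weights in this simplified sketch.
        row = [boost if j in starts else 1.0 for j in range(num_words)]
        total = sum(row)
        matrix.append([w / total for w in row])
    return matrix


# Example: entity texts start at "Alice" (index 0) and "New" (index 2).
words = ["Alice", "visited", "New", "York"]
focus = build_focus_matrix(len(words), entity_starts=[0, 2])
```

In a full system, a matrix of this kind might be supplied to an attention layer of the named entity recognition model as a bias or mask; that step depends on the model architecture and is not shown here.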

    METHOD AND APPARATUS FOR NAMED ENTITY RECOGNITION, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM

    Publication No.: US20230394240A1

    Publication date: 2023-12-07

    Application No.: US18326292

    Filing date: 2023-05-31

    IPC classification: G06F40/295 G06F40/40

    CPC classification: G06F40/295 G06F40/40

    Abstract: A method and an apparatus for named entity recognition, and a non-transitory computer-readable recording medium, are provided. In the method, the text elements of a text to be recognized are traversed according to a text span to obtain candidate entity words. Then, the class to which each candidate entity word belongs is recognized. Recognizing the class includes: generating a prompt template corresponding to the candidate entity word, and concatenating the text to be recognized and the prompt template to obtain a concatenated text; generating vector representations of the text elements in the concatenated text; generating a vector representation of the candidate entity word from the vector representations of the text elements of that candidate entity word in the concatenated text and the vector representation of the text element of a mask word; and classifying the vector representation of the candidate entity word to obtain the class of the candidate entity word.
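
The first two steps of this abstract, span traversal and prompt concatenation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: text elements are taken to be whitespace-separated tokens, and the template wording, the `[MASK]` and `[SEP]` markers, and the function names `enumerate_spans` and `build_prompt_input` are all hypothetical rather than taken from the patent. The later steps (vector representations and classification) require an encoder model and are not shown.

```python
MASK = "[MASK]"  # hypothetical mask-word marker
SEP = "[SEP]"    # hypothetical separator between text and template


def enumerate_spans(tokens, max_span_len):
    """Yield (start, end, text) for every span of 1..max_span_len tokens.

    `end` is exclusive, so tokens[start:end] is the candidate entity word.
    """
    for start in range(len(tokens)):
        last_end = min(start + max_span_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            yield start, end, " ".join(tokens[start:end])


def build_prompt_input(text, candidate):
    """Concatenate the text to be recognized with a prompt template.

    The template wording is an assumption made for illustration.
    """
    template = f"{candidate} is a {MASK} entity."
    return f"{text} {SEP} {template}"


tokens = ["Alice", "visited", "New", "York"]
text = " ".join(tokens)
spans = list(enumerate_spans(tokens, max_span_len=2))
prompts = [build_prompt_input(text, candidate) for _, _, candidate in spans]
```

Each string in `prompts` would then be encoded, and the vectors at the candidate's positions and at the mask position combined and classified, as the abstract describes.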