ENCODING TEXTUAL INFORMATION FOR TEXT ANALYSIS

    Publication No.: US20200334410A1

    Publication Date: 2020-10-22

    Application No.: US16846756

    Application Date: 2020-04-13

    Abstract: A computer-implemented method of encoding a word for use in a method of text analysis comprises receiving input text to be analysed, the input text comprising a first word which is not represented in a vocabulary set held in storage. The vocabulary set comprises a plurality of words and an associated word embedding vector for each word in the set. The method comprises identifying the first word as a word which is not represented in the vocabulary set and determining one or more sub-words within the first word with which to encode the first word. Each of the one or more sub-words corresponds to a word that is represented in the vocabulary set and has an embedding vector in the vocabulary set. The method comprises determining an encoding for the first word based on the one or more sub-words.
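
The encoding step described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how sub-words are selected or how their vectors are combined, so the greedy longest-match search, the `min_len` threshold, and the element-wise averaging used here are all assumptions made for illustration. The names `find_subwords` and `encode_word` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Optional

Vector = List[float]


def find_subwords(word: str, vocab: Dict[str, Vector], min_len: int = 2) -> List[str]:
    """Return in-vocabulary sub-words of `word`.

    Assumption: greedy left-to-right longest match; characters that start
    no in-vocabulary sub-word are skipped.
    """
    subwords: List[str] = []
    i = 0
    while i < len(word):
        match = None
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i + min_len - 1, -1):
            if word[i:j] in vocab:
                match = word[i:j]
                break
        if match is None:
            i += 1
        else:
            subwords.append(match)
            i += len(match)
    return subwords


def encode_word(word: str, vocab: Dict[str, Vector]) -> Optional[Vector]:
    """Encode `word`, falling back to sub-words when it is out of vocabulary."""
    if word in vocab:
        return vocab[word]
    subwords = find_subwords(word, vocab)
    if not subwords:
        return None  # no in-vocabulary sub-word was found
    # Assumption: combine the sub-word vectors by element-wise averaging.
    vectors = [vocab[s] for s in subwords]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy vocabulary with 2-dimensional embeddings (illustrative values only).
vocab = {"heart": [1.0, 0.0], "burn": [0.0, 1.0]}
encoding = encode_word("heartburn", vocab)
```

With this toy vocabulary, "heartburn" is out of vocabulary and is split into "heart" and "burn", so its encoding is derived from their two embedding vectors. Other combination rules, such as summation or a learned weighting, would fit the same structure.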

    METHOD AND APPARATUS FOR SEGMENTING A MEDICAL TEXT REPORT INTO SECTIONS

    Publication No.: US20220398374A1

    Publication Date: 2022-12-15

    Application No.: US17713673

    Application Date: 2022-04-05

    Abstract: A framework for segmenting a medical text report into sections is disclosed. For each sentence of the report, a first sentence representation is determined by inputting a word-level context representation for each sentence sequentially into a neural network. A second sentence representation is determined by inputting an aggregated representation for each sentence sequentially into another neural network. For each sentence, a third sentence representation is determined based on a combination of the first and second sentence representations, and a section classification for the sentence is determined by inputting the third sentence representation into a section classifier. Each sentence is assigned the section classification determined for it.
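
The data flow described in the abstract above can be sketched in plain Python. The abstract specifies only "a neural network", "another neural network", and "a section classifier", so the Elman-style recurrent layer, the concatenation used to combine the two representations, and the linear argmax classifier are assumptions. All weights are random and untrained, so the predicted labels are meaningless; the sketch shows only how the representations move through the pipeline. The names `SimpleRNN` and `segment_report` and the section labels are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


class SimpleRNN:
    """Tiny Elman-style recurrent layer with random, untrained weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
                     for _ in range(hidden_dim)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]

    def run(self, inputs: Sequence[Vector]) -> List[Vector]:
        """Feed the sentence vectors in order; return one state per sentence."""
        state = [0.0] * self.hidden_dim
        outputs: List[Vector] = []
        for x in inputs:
            state = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[h], x))
                    + sum(w * si for w, si in zip(self.w_rec[h], state))
                )
                for h in range(self.hidden_dim)
            ]
            outputs.append(state)
        return outputs


def segment_report(word_level_reps: Sequence[Vector],
                   aggregated_reps: Sequence[Vector],
                   sections: Sequence[str],
                   hidden_dim: int = 4) -> List[str]:
    """Assign one section label to each sentence of a report."""
    if len(word_level_reps) != len(aggregated_reps):
        raise ValueError("both inputs need one vector per sentence")
    if not word_level_reps:
        return []
    # First and second sentence representations from two separate networks.
    first = SimpleRNN(len(word_level_reps[0]), hidden_dim, seed=0).run(word_level_reps)
    second = SimpleRNN(len(aggregated_reps[0]), hidden_dim, seed=1).run(aggregated_reps)
    # Assumption: a linear classifier with one weight row per section.
    rng = random.Random(2)
    classifier = [[rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)]
                  for _ in sections]
    labels: List[str] = []
    for f, s in zip(first, second):
        third = f + s  # assumption: combine by concatenation
        scores = [sum(w * v for w, v in zip(row, third)) for row in classifier]
        labels.append(sections[scores.index(max(scores))])
    return labels


# Three sentences, each with a 2-d word-level and a 1-d aggregated vector.
labels = segment_report(
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0], [0.0], [1.0]],
    ["Findings", "Impression"],
)
```

In a real system the two recurrent layers and the classifier would be trained jointly on labelled reports, and the input vectors would come from upstream encoders rather than hand-written numbers.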