UNSUPERVISED DOCUMENT SUMMARIZATION BY ATTENTION AND RECONSTRUCTION

    Publication No.: US20200167391A1

    Publication Date: 2020-05-28

    Application No.: US16200872

    Filing Date: 2018-11-27

    Applicant: SAP SE

    Inventor: Xin Zheng, Aixin Sun

    Abstract: Methods, systems, and computer-readable storage media for: processing, through an encoder of an attention-reconstruction-based summarization (ARS) platform, the primary document to provide a contextual vector; identifying, using reconstruction regularizers of the ARS platform, information that is absent from the one or more linking documents, and is present in the primary document at least partially based on the contextual vector; for each word of the primary document, providing, using an attention mechanism of the ARS platform, a word salience score at least partially based on the information that is absent from the one or more linking documents; for each sentence in the primary document, determining a sentence salience score based on word salience scores of words within the sentence; ranking sentences of the primary document based on sentence salience scores; and selecting two or more sentences of the primary document based on ranking to provide a summary of the primary document.
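The last three steps of this abstract (sentence scoring, ranking, selection) can be illustrated with a minimal sketch. It assumes the per-word salience scores have already been produced by some attention mechanism, and it assumes the mean as the aggregation rule, since the abstract only says the sentence score is "based on" the word scores. The function name `select_summary` and the example scores are hypothetical and do not come from the patent.

```python
from typing import List


def select_summary(word_scores: List[List[float]], k: int = 2) -> List[int]:
    """Return indices of the k most salient sentences, in document order.

    word_scores[i] holds one salience score per word of sentence i.
    """
    if k < 2:
        raise ValueError("the abstract selects two or more sentences")
    # Assumption: sentence salience is the mean of its word salience scores.
    sentence_scores = [
        sum(scores) / len(scores) if scores else 0.0 for scores in word_scores
    ]
    # Rank by score, highest first; ties go to the earlier sentence.
    ranked = sorted(
        range(len(word_scores)), key=lambda i: (-sentence_scores[i], i)
    )
    # Restore document order so the summary reads naturally.
    return sorted(ranked[:k])


# Hypothetical word salience scores for a three-sentence document.
example_scores = [
    [0.1, 0.9, 0.8],
    [0.1, 0.1, 0.2],
    [0.1, 0.9, 0.9, 0.6, 0.1, 0.8],
]
print(select_summary(example_scores, k=2))  # prints [0, 2]
```

Using the mean rather than the sum keeps long sentences from winning on length alone; either choice is consistent with the wording of the abstract.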

    Aspect-driven multi-document summarization

    Publication No.: US11087089B2

    Publication Date: 2021-08-10

    Application No.: US16152553

    Filing Date: 2018-10-05

    Applicant: SAP SE

    Abstract: Methods, systems, and computer-readable storage media for: generating document representations of documents in a set of documents based on sentence embeddings that are provided using a recurrent neural network (RNN) encoder, each document including an electronic document; generating aspect representations based on sentences included in documents of the set of documents, and comment documents in a set of comment documents; determining a first topic representation based on the document representations; determining a second topic representation based on aspect saliences with respect to the first topic representation; calculating salience scores, each salience score associated with a respective sentence, and calculated based on a set of initial salience scores, and a respective aspect salience score; and generating a summary of the set of documents based on the salience scores, the summary including one or more sentences included in documents of the set of documents.

    Unsupervised document summarization by attention and reconstruction

    Publication No.: US10831834B2

    Publication Date: 2020-11-10

    Application No.: US16200872

    Filing Date: 2018-11-27

    Applicant: SAP SE

    Inventor: Xin Zheng, Aixin Sun

    Abstract: Methods, systems, and computer-readable storage media for: processing, through an encoder of an attention-reconstruction-based summarization (ARS) platform, the primary document to provide a contextual vector; identifying, using reconstruction regularizers of the ARS platform, information that is absent from the one or more linking documents, and is present in the primary document at least partially based on the contextual vector; for each word of the primary document, providing, using an attention mechanism of the ARS platform, a word salience score at least partially based on the information that is absent from the one or more linking documents; for each sentence in the primary document, determining a sentence salience score based on word salience scores of words within the sentence; ranking sentences of the primary document based on sentence salience scores; and selecting two or more sentences of the primary document based on ranking to provide a summary of the primary document.

    Collecting event related tweets
    Type: Granted invention patent

    Publication No.: US10229193B2

    Publication Date: 2019-03-12

    Application No.: US15284509

    Filing Date: 2016-10-03

    Applicant: SAP SE

    Inventor: Xin Zheng, Aixin Sun

    Abstract: Described herein is a framework for collecting event related tweets. In accordance with one aspect of the framework, an initial set of keywords is constructed from a reference source. Tweets are collected from a messaging stream using the initial set of keywords for a first time window. The collected tweets are filtered to generate a candidate keywords set. The selected tweets of the candidate keywords set are grouped into a plurality of clusters. The clusters are classified into event related and non-event related clusters. The initial set of keywords is updated to obtain a new set of keywords.
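A rough sketch of one round of this keyword-update loop is given below. It covers only collection and candidate-keyword filtering; the clustering and event/non-event classification steps are left out, so every candidate is accepted, which the described framework would not do. The frequency-based filter, the stopword list, and all function names are assumptions made for illustration, not details taken from the patent.

```python
from collections import Counter
from typing import List, Set

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "in", "of", "is", "at"}


def collect(window: List[str], keywords: Set[str]) -> List[str]:
    """Keep tweets from one time window that contain at least one keyword."""
    return [t for t in window if keywords & set(t.lower().split())]


def candidate_keywords(tweets: List[str], keywords: Set[str],
                       min_count: int = 2) -> Set[str]:
    """Assumed filter: new non-stopwords seen in at least min_count tweets."""
    counts = Counter(
        w
        for t in tweets
        for w in set(t.lower().split())
        if w not in keywords and w not in STOPWORDS
    )
    return {w for w, c in counts.items() if c >= min_count}


# Hypothetical tweets from a single time window.
window = [
    "earthquake hits tokyo",
    "tokyo earthquake magnitude 6",
    "lunch in tokyo",
    "earthquake magnitude rising",
]
keywords = {"earthquake"}

collected = collect(window, keywords)
candidates = candidate_keywords(collected, keywords)
# Clustering and classification are omitted, so all candidates are kept here.
new_keywords = keywords | candidates
print(sorted(new_keywords))  # prints ['earthquake', 'magnitude', 'tokyo']
```

In the next time window, `new_keywords` would replace `keywords`, so tweets mentioning only "tokyo" or "magnitude" would also be collected.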
