Aspect-driven multi-document summarization

    Publication No.: US11087089B2

    Publication Date: 2021-08-10

    Application No.: US16152553

    Filing Date: 2018-10-05

    Applicant: SAP SE

    Abstract: Methods, systems, and computer-readable storage media for generating document representations of documents in a set of documents based on sentence embeddings that are provided using a recurrent neural network (RNN) encoder, each document including an electronic document, generating aspect representations based on sentences included in documents of the set of documents, and comment documents in a set of comment documents, determining a first topic representation based on the document representations, determining a second topic representation based on aspect saliences with respect to the first topic representation, calculating salience scores, each salience score associated with a respective sentence, and calculated based on a set of initial salience scores, and a respective aspect salience score, and generating a summary of the set of documents based on the salience scores, the summary including one or more sentences included in documents of the set of documents.

    Unsupervised document summarization by attention and reconstruction

    Publication No.: US10831834B2

    Publication Date: 2020-11-10

    Application No.: US16200872

    Filing Date: 2018-11-27

    Applicant: SAP SE

    Inventor: Xin Zheng, Aixin Sun

    Abstract: Methods, systems, and computer-readable storage media for processing, through an encoder of an attention-reconstruction-based summarization (ARS) platform, the primary document to provide a contextual vector, identifying, using reconstruction regularizers of the ARS platform, information that is absent from the one or more linking documents, and is present in the primary document at least partially based on the contextual vector, for each word of the primary document, providing, using an attention mechanism of the ARS platform, a word salience score at least partially based on the information that is absent from the one or more linking documents, for each sentence in the primary document, determining a sentence salience score based on word salience scores of words within the sentence, ranking sentences of the primary document based on sentence salience scores, and selecting two or more sentences of the primary document based on ranking to provide a summary of the primary document.
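The last three steps of this abstract (sentence salience from word salience, ranking, selection) can be illustrated with a minimal sketch. It assumes the word salience scores have already been produced by some attention model and that a sentence score is the mean of its word scores; the abstract does not state the aggregation, and the names `summarize`, `word_scores`, and `top_k` are hypothetical.

```python
from typing import List


def summarize(sentences: List[str],
              word_scores: List[List[float]],
              top_k: int = 2) -> List[str]:
    """Pick the top_k sentences by sentence salience, kept in document order.

    word_scores[i] holds one salience score per word of sentences[i].
    Mean aggregation is an assumption, not the patented method.
    """
    sentence_scores = [
        sum(scores) / len(scores) if scores else 0.0
        for scores in word_scores
    ]
    # Rank sentence indices from most to least salient.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_scores[i],
                    reverse=True)
    # Keep the best top_k, then restore the original order for readability.
    chosen = sorted(ranked[:top_k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = ["First sentence.", "Second sentence.", "Third sentence."]
    scores = [[0.25, 0.25], [1.0, 0.5], [0.5, 0.5]]
    print(summarize(doc, scores))
```

The default `top_k=2` mirrors the abstract's "two or more sentences"; any other selection rule would need the full claims.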

    Document representation for machine-learning document classification

    Publication No.: US10482118B2

    Publication Date: 2019-11-19

    Application No.: US15623071

    Filing Date: 2017-06-14

    Applicant: SAP SE

    Inventor: Xin Zheng

    Abstract: Methods, systems, and computer-readable storage media for providing weighted vector representations of documents, with actions including receiving text data, the text data including a plurality of documents, each document including a plurality of words, processing the text data to provide a plurality of word-vectors, each word-vector being based on a respective word of the plurality of words, determining a plurality of similarity scores based on the plurality of word-vectors, each similarity score representing a degree of similarity between word-vectors, grouping words of the plurality of words into clusters based on the plurality of similarity scores, each cluster including two or more words of the plurality of words, and providing a document representation for each document in the plurality of documents, each document representation including a feature vector, each feature corresponding to a cluster.
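The pipeline in this abstract (word vectors, similarity scores, clusters, one feature per cluster) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: cosine similarity, a greedy seed-based grouping with a fixed `threshold`, and raw counts as feature values are all choices made here, and every function name is hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors: Dict[str, List[float]],
                  threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: a word joins the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []
    for word, vec in vectors.items():
        for cluster in clusters:
            # The first word of a cluster acts as its seed (an assumption).
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    # The abstract requires two or more words per cluster, so drop singletons.
    return [c for c in clusters if len(c) >= 2]


def document_features(doc_words: List[str],
                      clusters: List[List[str]]) -> List[int]:
    """One feature per cluster: how many words of the document fall in it."""
    return [sum(1 for w in doc_words if w in c) for c in clusters]


if __name__ == "__main__":
    vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1],
               "car": [0.0, 1.0], "bus": [0.1, 0.9], "sky": [-1.0, -1.0]}
    clusters = cluster_words(vectors)
    print(clusters)
    print(document_features(["cat", "dog", "cat", "bus"], clusters))
```

The title mentions weighted representations; a weighting scheme could replace the raw counts in `document_features`, but the abstract does not specify one.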

    Position-dependent word salience estimation

    Publication No.: US10346524B1

    Publication Date: 2019-07-09

    Application No.: US15940041

    Filing Date: 2018-03-29

    Applicant: SAP SE

    Inventor: Xin Zheng

    Abstract: Methods, systems, and computer-readable storage media for receiving two or more electronic documents, each electronic document including text data, a second electronic document including a link to a first electronic document, processing word representations of words of the first electronic document using a first encoder to provide first output and a context vector, processing text data of the second electronic document and the context vector using a first decoder to provide second output, determining, by an attention mechanism, a plurality of weights for each word in the text data of the first electronic document based on the first output, and the second output, and providing a word salience value for each word, a word salience value comprising a sum of weights of a respective word.
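The final step of this abstract, a word salience value as a sum of attention weights, can be shown with a small sketch. It assumes the attention weights are already available as a matrix with one row per decoder step over the second document and one column per word of the first document; that layout and the name `word_salience` are assumptions for illustration only.

```python
from typing import List


def word_salience(attention: List[List[float]]) -> List[float]:
    """Sum each source word's attention weights over all decoder steps.

    attention[t][j] is the weight given to word j of the first document
    at decoder step t (assumed layout).
    """
    if not attention:
        return []
    n_words = len(attention[0])
    return [sum(row[j] for row in attention) for j in range(n_words)]


if __name__ == "__main__":
    weights = [
        [0.5, 0.25, 0.25],   # decoder step 1
        [0.25, 0.5, 0.25],   # decoder step 2
    ]
    print(word_salience(weights))
```

Because each row typically sums to 1, a word that repeatedly attracts attention across decoder steps ends up with a higher total.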

    Collecting event related tweets

    Type: Granted invention patent

    Publication No.: US10229193B2

    Publication Date: 2019-03-12

    Application No.: US15284509

    Filing Date: 2016-10-03

    Applicant: SAP SE

    Inventor: Xin Zheng, Aixin Sun

    Abstract: Described herein is a framework for collecting event related tweets. In accordance with one aspect of the framework, an initial set of keywords is constructed from a reference source. Tweets are collected from a messaging stream using the initial set of keywords for a first time window. The collected tweets are filtered to generate a candidate keywords set. The selected tweets of the candidate keywords set are grouped into a plurality of clusters. The clusters are classified into event related and non-event related clusters. The initial set of keywords is updated to obtain a new set of keywords.
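One iteration of the collect, filter, cluster, classify, and update loop described in this abstract can be sketched in a heavily simplified form. Everything specific here is an assumption made for illustration: tweets are plain strings, a "cluster" is just the set of tweets sharing a candidate keyword, and a cluster counts as event related when at least `min_overlap` of its tweets mention a current keyword. The function name and parameters are hypothetical.

```python
from collections import Counter
from typing import List, Set


def update_keywords(stream: List[str],
                    keywords: Set[str],
                    min_count: int = 2,
                    min_overlap: float = 0.5) -> Set[str]:
    """Run one time window and return the updated keyword set."""
    tokenized = [set(tweet.lower().split()) for tweet in stream]
    # Collect: keep tweets that mention at least one current keyword.
    collected = [toks for toks in tokenized if toks & keywords]
    # Filter: candidates are new tokens seen in at least min_count collected tweets.
    counts = Counter(tok for toks in collected for tok in toks
                     if tok not in keywords)
    candidates = {tok for tok, n in counts.items() if n >= min_count}

    updated = set(keywords)
    for cand in candidates:
        # Cluster (stand-in): all tweets in the window containing the candidate.
        cluster = [toks for toks in tokenized if cand in toks]
        # Classify (stand-in): share of the cluster that mentions a current keyword.
        overlap = sum(1 for toks in cluster if toks & keywords) / len(cluster)
        if overlap >= min_overlap:
            updated.add(cand)
    return updated


if __name__ == "__main__":
    window = [
        "earthquake hits city",
        "earthquake tsunami warning",
        "tsunami warning issued earthquake",
        "lunch was great",
        "great warning sign",
        "great day",
    ]
    print(sorted(update_keywords(window, {"earthquake"})))
```

Calling the function again on the next time window with the returned set gives the iterative behavior the abstract describes; a real system would replace the stand-in clustering and classification with trained components.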

    DOCUMENT REPRESENTATION FOR MACHINE-LEARNING DOCUMENT CLASSIFICATION

    Publication No.: US20180365248A1

    Publication Date: 2018-12-20

    Application No.: US15623071

    Filing Date: 2017-06-14

    Applicant: SAP SE

    Inventor: Xin Zheng

    Abstract: Methods, systems, and computer-readable storage media for providing weighted vector representations of documents, with actions including receiving text data, the text data including a plurality of documents, each document including a plurality of words, processing the text data to provide a plurality of word-vectors, each word-vector being based on a respective word of the plurality of words, determining a plurality of similarity scores based on the plurality of word-vectors, each similarity score representing a degree of similarity between word-vectors, grouping words of the plurality of words into clusters based on the plurality of similarity scores, each cluster including two or more words of the plurality of words, and providing a document representation for each document in the plurality of documents, each document representation including a feature vector, each feature corresponding to a cluster.

    UNSUPERVISED DOCUMENT SUMMARIZATION BY ATTENTION AND RECONSTRUCTION

    Publication No.: US20200167391A1

    Publication Date: 2020-05-28

    Application No.: US16200872

    Filing Date: 2018-11-27

    Applicant: SAP SE

    Inventor: Xin Zheng, Aixin Sun

    Abstract: Methods, systems, and computer-readable storage media for processing, through an encoder of an attention-reconstruction-based summarization (ARS) platform, the primary document to provide a contextual vector, identifying, using reconstruction regularizers of the ARS platform, information that is absent from the one or more linking documents, and is present in the primary document at least partially based on the contextual vector, for each word of the primary document, providing, using an attention mechanism of the ARS platform, a word salience score at least partially based on the information that is absent from the one or more linking documents, for each sentence in the primary document, determining a sentence salience score based on word salience scores of words within the sentence, ranking sentences of the primary document based on sentence salience scores, and selecting two or more sentences of the primary document based on ranking to provide a summary of the primary document.
