-
Publication No.: US11087089B2
Publication Date: 2021-08-10
Application No.: US16152553
Filing Date: 2018-10-05
Applicant: SAP SE
Inventor: Xin Zheng, Karthik Muthuswamy, Aixin Sun
IPC: G06F17/00, G06F40/30, G06N3/04, G06F16/93, G06F40/166, G06F40/169, G06F40/14, G06F40/137
Abstract: Methods, systems, and computer-readable storage media for generating document representations of documents in a set of documents based on sentence embeddings that are provided using a recurrent neural network (RNN) encoder, each document including an electronic document, generating aspect representations based on sentences included in documents of the set of documents, and comment documents in a set of comment documents, determining a first topic representation based on the document representations, determining a second topic representation based on aspect saliences with respect to the first topic representation, calculating salience scores, each salience score associated with a respective sentence, and calculated based on a set of initial salience scores, and a respective aspect salience score, and generating a summary of the set of documents based on the salience scores, the summary including one or more sentences included in documents of the set of documents.
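The abstract does not fix the scoring functions. The following is a minimal Python sketch of the last two steps only, under stated assumptions: the first topic representation is taken as the mean of the document vectors, a sentence's initial salience is the cosine similarity of its embedding to that topic vector, and the final salience is a linear blend (weight `lam`) of initial and aspect salience. The helper names `cosine`, `mean_vector` and `summarize`, and the blend itself, are hypothetical and are not the patented method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Element-wise mean; used here as an assumed first topic representation."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def summarize(sentences, sentence_vecs, doc_vecs, aspect_scores, k=2, lam=0.5):
    """Hypothetical scoring: blend initial and aspect salience, keep the top k sentences."""
    topic = mean_vector(doc_vecs)
    # Assumption: initial salience = similarity of each sentence embedding to the topic.
    initial = [cosine(v, topic) for v in sentence_vecs]
    # Assumption: linear interpolation between initial salience and aspect salience.
    scores = [lam * s + (1 - lam) * a for s, a in zip(initial, aspect_scores)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Emit the chosen sentences in their original order so the summary reads naturally.
    return [sentences[i] for i in sorted(top)]
```

Setting `lam=1.0` ignores aspect salience entirely, which makes it easy to see how much the aspect term changes the selection.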
-
Publication No.: US10831834B2
Publication Date: 2020-11-10
Application No.: US16200872
Filing Date: 2018-11-27
Applicant: SAP SE
Abstract: Methods, systems, and computer-readable storage media for processing, through an encoder of an attention-reconstruction-based summarization (ARS) platform, the primary document to provide a contextual vector, identifying, using reconstruction regularizers of the ARS platform, information that is absent from the one or more linking documents, and is present in the primary document at least partially based on the contextual vector, for each word of the primary document, providing, using an attention mechanism of the ARS platform, a word salience score at least partially based on the information that is absent from the one or more linking documents, for each sentence in the primary document, determining a sentence salience score based on word salience scores of words within the sentence, ranking sentences of the primary document based on sentence salience scores, and selecting two or more sentences of the primary document based on ranking to provide a summary of the primary document.
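A minimal sketch of the sentence-level steps (scoring, ranking, selection), assuming the word salience scores have already been produced by the attention mechanism and are passed in as a plain dictionary. Averaging word scores per sentence is an assumption; the abstract only says the sentence score is based on them. The function names are hypothetical.

```python
def sentence_salience(sentence, word_scores):
    """Mean word salience of a sentence's whitespace tokens; unseen words score 0.

    Averaging is an assumption: a plain sum would favour longer sentences.
    """
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(word_scores.get(t, 0.0) for t in tokens) / len(tokens)


def extract_summary(sentences, word_scores, n=2):
    """Rank sentences by salience and return the top n in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_salience(sentences[i], word_scores),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:n])]
```

Selecting by rank but emitting in document order keeps the extract readable; ranking alone would reorder the sentences.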
-
Publication No.: US10482118B2
Publication Date: 2019-11-19
Application No.: US15623071
Filing Date: 2017-06-14
Applicant: SAP SE
Inventor: Xin Zheng
Abstract: Methods, systems, and computer-readable storage media for providing weighted vector representations of documents, with actions including receiving text data, the text data including a plurality of documents, each document including a plurality of words, processing the text data to provide a plurality of word-vectors, each word-vector being based on a respective word of the plurality of words, determining a plurality of similarity scores based on the plurality of word-vectors, each similarity score representing a degree of similarity between word-vectors, grouping words of the plurality of words into clusters based on the plurality of similarity scores, each cluster including two or more words of the plurality of words, and providing a document representation for each document in the plurality of documents, each document representation including a feature vector, each feature corresponding to a cluster.
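A minimal sketch of the clustering and representation steps, assuming word vectors are already available. The abstract names no clustering algorithm, so a greedy single-pass cosine-threshold clustering is swapped in here, and each document feature is a raw count of the document's words in that cluster. `cluster_words`, `document_vector` and the `threshold` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words(word_vecs, threshold=0.8):
    """Greedy single-pass clustering (a stand-in; the abstract names no algorithm).

    Each word joins the first cluster whose seed word is at least `threshold`
    cosine-similar to it, otherwise it seeds a new cluster. Singletons are
    dropped because the abstract requires two or more words per cluster.
    """
    clusters = []  # each cluster is a list of words; element 0 is its seed
    for word, vec in word_vecs.items():
        for members in clusters:
            if cosine(vec, word_vecs[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            clusters.append([word])
    return [c for c in clusters if len(c) >= 2]


def document_vector(tokens, clusters):
    """One feature per cluster: the number of document tokens falling in it."""
    index = {w: i for i, c in enumerate(clusters) for w in c}
    features = [0] * len(clusters)
    for tok in tokens:
        if tok in index:
            features[index[tok]] += 1
    return features
```

A weighting such as tf-idf could replace the raw counts if a "weighted" representation is wanted; that choice is likewise an assumption.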
-
Publication No.: US10346524B1
Publication Date: 2019-07-09
Application No.: US15940041
Filing Date: 2018-03-29
Applicant: SAP SE
Inventor: Xin Zheng
Abstract: Methods, systems, and computer-readable storage media for receiving two or more electronic documents, each electronic document including text data, a second electronic document including a link to a first electronic document, processing word representations of words of the first electronic document using a first encoder to provide first output and a context vector, processing text data of the second electronic document and the context vector using a first decoder to provide second output, determining, by an attention mechanism, a plurality of weights for each word in the text data of the first electronic document based on the first output, and the second output, and providing a word salience value for each word, a word salience value comprising a sum of weights of a respective word.
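A minimal sketch of the final step, where a word's salience is the sum of its attention weights. It assumes the encoder hidden states (one per word of the first document) and the decoder hidden states (one per step over the second document) are already computed, and it uses dot-product attention scores, which the abstract does not specify. `word_salience` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def word_salience(encoder_states, decoder_states):
    """Sum each source word's attention weight over all decoder steps.

    encoder_states[j]: hidden state for word j of the first (linked-to) document.
    decoder_states[t]: hidden state at step t while decoding the second document.
    Dot-product scoring is an assumption; the abstract does not fix the form.
    """
    salience = [0.0] * len(encoder_states)
    for d in decoder_states:
        scores = [sum(a * b for a, b in zip(h, d)) for h in encoder_states]
        for j, weight in enumerate(softmax(scores)):
            salience[j] += weight
    return salience
```

Because each step's weights sum to 1, the salience values always sum to the number of decoder steps, so scores are comparable only within one document pair.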
-
Publication No.: US10229193B2
Publication Date: 2019-03-12
Application No.: US15284509
Filing Date: 2016-10-03
Applicant: SAP SE
Abstract: Described herein is a framework for collecting event related tweets. In accordance with one aspect of the framework, an initial set of keywords is constructed from a reference source. Tweets are collected from a messaging stream using the initial set of keywords for a first time window. The collected tweets are filtered to generate a candidate keywords set. The selected tweets of the candidate keywords set are grouped into a plurality of clusters. The clusters are classified into event related and non-event related clusters. The initial set of keywords is updated to obtain a new set of keywords.
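A minimal sketch of one keyword-update iteration, using heavily simplified stand-ins: tweets are "clustered" by the first tracked keyword they mention, and the cluster classifier is a caller-supplied function `is_event_cluster`. Both stand-ins, the function name `update_keywords`, and the `top_n` parameter are hypothetical and are not the framework's actual components.

```python
from collections import Counter


def update_keywords(tweets, keywords, is_event_cluster, top_n=3):
    """One hypothetical iteration of the keyword-update loop.

    1. Collect tweets mentioning any tracked keyword (whole-token match).
    2. Group them into clusters keyed by the first keyword they mention
       (a stand-in for a real clustering step).
    3. Keep clusters that `is_event_cluster` (a caller-supplied stand-in for a
       trained classifier) labels as event related.
    4. Add the most frequent new terms from those clusters as keywords.
    """
    kw = {k.lower() for k in keywords}
    clusters = {}
    for tweet in tweets:
        tokens = tweet.lower().split()
        matched = [t for t in tokens if t in kw]
        if matched:
            clusters.setdefault(matched[0], []).append(tokens)
    counts = Counter()
    for members in clusters.values():
        if is_event_cluster(members):
            counts.update(t for toks in members for t in toks if t not in kw)
    return kw | {term for term, _ in counts.most_common(top_n)}
```

Running the function again on the next time window with the returned set gives the iterative expansion described in the abstract; a real system would also filter stop words before counting.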
-
Publication No.: US20180365248A1
Publication Date: 2018-12-20
Application No.: US15623071
Filing Date: 2017-06-14
Applicant: SAP SE
Inventor: Xin Zheng
CPC classification number: G06F16/355, G06F16/358, G06F17/2715, G06F17/277, G06F17/2785, G06F17/28, G06N20/00
Abstract: Methods, systems, and computer-readable storage media for providing weighted vector representations of documents, with actions including receiving text data, the text data including a plurality of documents, each document including a plurality of words, processing the text data to provide a plurality of word-vectors, each word-vector being based on a respective word of the plurality of words, determining a plurality of similarity scores based on the plurality of word-vectors, each similarity score representing a degree of similarity between word-vectors, grouping words of the plurality of words into clusters based on the plurality of similarity scores, each cluster including two or more words of the plurality of words, and providing a document representation for each document in the plurality of documents, each document representation including a feature vector, each feature corresponding to a cluster.
-
Publication No.: US11416680B2
Publication Date: 2022-08-16
Application No.: US15241040
Filing Date: 2016-08-18
Applicant: SAP SE
Inventor: Danqing Cai, Wei Tah Chai, Pek Gnee Ng, Subashini Rengarajan, Xin Zheng, Hang Guo, Weile Chen
IPC: G06F40/279, G06F16/35
Abstract: Described herein is a framework for classifying social media inputs. In accordance with one aspect of the framework, one or more social media inputs is acquired from one or more social media platforms. The social media inputs are cleaned to remove redundant elements. One or more features are extracted from the cleaned social media inputs. The social media inputs are classified by a trained classifier into predefined categories using the extracted one or more features.
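A minimal sketch of the clean, extract, classify pipeline. The choices are assumptions: "redundant elements" are taken to be URLs, @mentions, a leading retweet marker and punctuation; features are bag-of-words counts; and a multinomial Naive Bayes with add-one smoothing stands in for the trained classifier. `clean`, `features` and `NaiveBayes` are hypothetical names.

```python
import math
import re
from collections import Counter, defaultdict

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")


def clean(text):
    """Remove elements assumed redundant: URLs, @mentions, a leading RT, punctuation."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = re.sub(r"^\s*RT\b", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.lower().split())


def features(text):
    """Bag-of-words token counts of the cleaned text."""
    return Counter(clean(text).split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(features(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        feats = features(text)
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each observed token.
            score = math.log(n / total)
            for w, c in feats.items():
                score += c * math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

The predefined categories are simply whatever labels appear in the training data passed to `fit`.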
-
Publication No.: US20200167391A1
Publication Date: 2020-05-28
Application No.: US16200872
Filing Date: 2018-11-27
Applicant: SAP SE
Abstract: Methods, systems, and computer-readable storage media for processing, through an encoder of an attention-reconstruction-based summarization (ARS) platform, the primary document to provide a contextual vector, identifying, using reconstruction regularizers of the ARS platform, information that is absent from the one or more linking documents, and is present in the primary document at least partially based on the contextual vector, for each word of the primary document, providing, using an attention mechanism of the ARS platform, a word salience score at least partially based on the information that is absent from the one or more linking documents, for each sentence in the primary document, determining a sentence salience score based on word salience scores of words within the sentence, ranking sentences of the primary document based on sentence salience scores, and selecting two or more sentences of the primary document based on ranking to provide a summary of the primary document.
-