Publication (Announcement) Number: US09619457B1
Publication (Announcement) Date: 2017-04-11
Application Number: US14332996
Application Date: 2014-07-16
Applicant: Google Inc.
Inventors: Daniel Gillick, Amarnag Subramanya
CPC classification numbers: G06K9/00442, G06F17/271, G06F17/2715, G06F17/277, G06K9/00483, G06K9/6218
Abstract: A computer-implemented technique can include obtaining a training corpus including pairs of (i) documents and (ii) corresponding abstracts. The technique can include identifying a set of entity mentions in each abstract and each corresponding document based on their respective part-of-speech (POS) tags and dependency parses. The technique can include clustering the sets of entity mentions referring to the same underlying entity to obtain clusters for each document and each corresponding abstract. The technique can include aligning specific abstract entity mentions to corresponding document entity mentions to obtain a set of aligned abstract and document entities. The technique can include labeling the set of aligned entities as salient and unaligned entities as non-salient to generate a labeled corpus. The technique can also include training features of a classifier using the labeled corpus to obtain a trained classifier.
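The labeling pipeline in the abstract (identify mentions, cluster them, align abstract entities to document entities, label, train) can be illustrated with a toy Python sketch. Everything concrete here is an assumption, not the patented method. Texts arrive pre-tagged as (token, POS) pairs using Universal-style tags such as `PROPN`. Mention detection keeps runs of proper nouns and ignores dependency parses. "Clustering" groups mentions that share a last token (so "Barack Obama" and "Obama" merge), a crude stand-in for coreference resolution. Alignment simply matches cluster keys between abstract and document. The features (bias, mention count, earliness of first mention) and the logistic-regression classifier trained by stochastic gradient descent are hypothetical choices. All function names are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical input format: a text is a list of (token, coarse_POS_tag) pairs.
# The patent also uses dependency parses; this toy sketch uses POS tags only.


def extract_mentions(tagged_tokens):
    """Return (mention_text, start_index) for each maximal run of PROPN tokens."""
    mentions, run, start = [], [], 0
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag == "PROPN":
            if not run:
                start = i
            run.append(word)
        elif run:
            mentions.append((" ".join(run), start))
            run = []
    if run:
        mentions.append((" ".join(run), start))
    return mentions


def cluster_mentions(mentions):
    """Toy coreference: mentions sharing a lowercased last token form one entity."""
    clusters = defaultdict(list)
    for text, start in mentions:
        clusters[text.split()[-1].lower()].append((text, start))
    return dict(clusters)


def label_entities(doc_clusters, abstract_clusters):
    """1 (salient) if a document entity aligns to an abstract entity, else 0.
    Alignment is approximated by matching cluster keys (an assumption)."""
    return {key: int(key in abstract_clusters) for key in doc_clusters}


def features(cluster, doc_len):
    """Hypothetical features: bias, mention count, earliness of first mention."""
    first = min(start for _, start in cluster)
    return [1.0, float(len(cluster)), 1.0 - first / max(doc_len, 1)]


def build_labeled_corpus(pairs):
    """Turn (document, abstract) pairs into (feature_vector, label) examples."""
    examples = []
    for doc_tokens, abstract_tokens in pairs:
        doc_clusters = cluster_mentions(extract_mentions(doc_tokens))
        abstract_clusters = cluster_mentions(extract_mentions(abstract_tokens))
        labels = label_entities(doc_clusters, abstract_clusters)
        for key, cluster in doc_clusters.items():
            examples.append((features(cluster, len(doc_tokens)), labels[key]))
    return examples


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Probability that the entity with feature vector x is salient."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(examples, epochs=200, lr=0.1):
    """Logistic regression fit by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            err = y - predict(weights, x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
    return weights


# Usage on one made-up (document, abstract) pair.
DOC = [("Barack", "PROPN"), ("Obama", "PROPN"), ("met", "VERB"),
       ("Angela", "PROPN"), ("Merkel", "PROPN"), ("in", "ADP"),
       ("Berlin", "PROPN"), (".", "PUNCT"), ("Obama", "PROPN"),
       ("spoke", "VERB"), (".", "PUNCT")]
ABSTRACT = [("Obama", "PROPN"), ("met", "VERB"), ("Merkel", "PROPN"), (".", "PUNCT")]

corpus = build_labeled_corpus([(DOC, ABSTRACT)])  # labels: obama=1, merkel=1, berlin=0
weights = train_logistic(corpus)
```

On this made-up pair, "Obama" and "Merkel" appear in both texts and are labeled salient, while "Berlin" appears only in the document and is labeled non-salient. A real system would swap each toy step for a proper component (a POS tagger and dependency parser, a coreference resolver, a fuzzier alignment) while keeping the same order: align, label, then train.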