Publication number: US09773166B1
Publication date: 2017-09-26
Application number: US14931576
Application date: 2015-11-03
Applicant: Google Inc.
Inventor: Miriam King Connor, Isabelle L. Stanton, Amarnag Subramanya
CPC classification number: G06F17/271, G06F17/30705, G06N99/005
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying documents. One of the methods includes obtaining a collection of training documents, the training documents including positive documents identified as being longform documents and negative documents identified as not being longform documents; extracting one or more features from the training documents, wherein the features represent lexical or textual content of the training documents; and generating a longform document classifier trained using feature instances extracted from the training documents, wherein the generated longform document classifier is trained such that input documents are classified as being longform documents or classified as not being longform documents.
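The three steps named in the abstract (collect labeled training documents, extract textual features, train a binary classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the two features (log word count and average sentence length), the choice of logistic regression, the toy training documents, and the function names `extract_features`, `train_longform_classifier`, and `is_longform`.

```python
import math
import re


def extract_features(text):
    """Hypothetical textual features: log word count and mean words per sentence."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    return [math.log1p(len(words)), avg_sentence_len]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_longform_classifier(docs, labels, epochs=500, lr=0.5):
    """Fit logistic regression with batch gradient descent on standardized features."""
    X = [extract_features(d) for d in docs]
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0
            for j in range(k)]
    Xs = [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in X]

    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(Xs, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return {"w": w, "b": b, "means": means, "stds": stds}


def is_longform(model, text):
    """Classify an input document as longform (True) or not longform (False)."""
    x = [(f - m) / s for f, m, s in
         zip(extract_features(text), model["means"], model["stds"])]
    z = model["b"] + sum(wi * xi for wi, xi in zip(model["w"], x))
    return sigmoid(z) >= 0.5


# Toy training collection (invented for this sketch): 1 = longform, 0 = not longform.
positive_docs = [
    "The river survey traced each tributary back to its source in the northern hills. " * 25,
    "Over several decades the archive gathered letters, maps, and ledgers from the valley towns. " * 30,
]
negative_docs = ["Sale ends today.", "New photos posted. Click here."]

model = train_longform_classifier(positive_docs + negative_docs, [1, 1, 0, 0])

long_input = "Researchers spent years documenting how the coastline shifted after each major storm season. " * 20
short_input = "Thanks for reading!"
print(is_longform(model, long_input), is_longform(model, short_input))
```

Features are standardized before training so that gradient descent behaves well with a single learning rate. A practical system would likely use many more features and far more training documents than this toy example.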