Identifying longform articles
    Granted invention patent

    Publication number: US09773166B1

    Publication date: 2017-09-26

    Application number: US14931576

    Filing date: 2015-11-03

    Applicant: Google Inc.

    CPC classification number: G06F17/271 G06F17/30705 G06N99/005

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying documents. One of the methods includes obtaining a collection of training documents, the training documents including positive documents identified as being longform documents and negative documents identified as not being longform documents; extracting one or more features from the training documents, wherein the features represent lexical or textual content of the training documents; and generating a longform document classifier trained using feature instances extracted from the training documents, wherein the generated longform document classifier is trained such that input documents are classified as being longform documents or classified as not being longform documents.
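The training flow described in the abstract (labeled positive and negative documents, lexical or textual feature extraction, then a trained classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four features, the function names `extract_features` and `train_longform_classifier`, and the choice of a plain perceptron as the learner. The abstract names no particular features or learning algorithm, so this should not be read as the patented method.

```python
import math
import re


def extract_features(text):
    """Map a document to a small vector of lexical/textual features.

    These four features are illustrative assumptions, not taken from the patent.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return [
        1.0,                                                # bias term
        math.log1p(n_words),                                # document length (log scale)
        n_words / max(len(sentences), 1) / 10.0,            # mean sentence length, scaled
        len({w.lower() for w in words}) / max(n_words, 1),  # type-token ratio
    ]


def train_longform_classifier(positive_docs, negative_docs, max_epochs=1000):
    """Train a perceptron (an assumed learner) and return a classify function."""
    examples = [(extract_features(d), 1) for d in positive_docs]
    examples += [(extract_features(d), -1) for d in negative_docs]
    weights = [0.0] * len(examples[0][0])

    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features))
            if label * score <= 0:  # misclassified: move weights toward the label
                weights = [w + label * x for w, x in zip(weights, features)]
                mistakes += 1
        if mistakes == 0:  # every training document is now classified correctly
            break

    def is_longform(text):
        """Return True if the input document is classified as longform."""
        features = extract_features(text)
        return sum(w * x for w, x in zip(weights, features)) > 0

    return is_longform
```

Calling `train_longform_classifier(positive_docs, negative_docs)` with two lists of document strings returns a function that labels an input document as longform (`True`) or not (`False`). A perceptron was chosen only because it is short and dependency-free; any supervised classifier could take its place.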
