DOCUMENT CLASSIFICATION USING ATTENTION NETWORKS

    Publication No.: US20200210526A1

    Publication Date: 2020-07-02

    Application No.: US16237853

    Application Date: 2019-01-02

    Applicant: NETAPP, INC.

    Abstract: A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.

    Document classification using attention networks

    Publication No.: US10824815B2

    Publication Date: 2020-11-03

    Application No.: US16237853

    Application Date: 2019-01-02

    Applicant: NETAPP, INC.

    Abstract: A system comprising at least one hardware processor; and a non-transitory computer-readable storage medium having stored thereon program instructions, the program instructions executable by the at least one hardware processor to: receive, as input, a plurality of electronic documents, apply a trained machine learning classifier to automatically classify at least some of said plurality of electronic documents, wherein said machine learning classifier comprises two or more attention layers, and wherein at least one of the attention layers comprises an adjustable parameter which controls a distribution of attention weights assigned by said attention layer.
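    The abstract does not say what the adjustable parameter is. One common way to control how attention weights are distributed is a softmax temperature, so the sketch below assumes that form purely for illustration. The names `attention_weights`, `attend`, and `temperature` are hypothetical and do not come from the patent.

```python
import math


def attention_weights(scores, temperature=1.0):
    """Turn raw attention scores into weights that sum to 1.

    `temperature` is an assumed stand-in for the "adjustable parameter":
    values below 1 concentrate weight on the highest-scoring items,
    values above 1 spread weight more evenly.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, scores, temperature=1.0):
    """Weighted sum of equal-length vectors using the attention weights."""
    weights = attention_weights(scores, temperature)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
```

    In a classifier with two or more attention layers, a function like `attend` could be applied once over word vectors to build sentence vectors and again over sentence vectors to build a document vector. That stacking is also an assumption here, not something the abstract specifies.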

    Token matching in large document corpora

    Publication No.: US10796092B2

    Publication Date: 2020-10-06

    Application No.: US16271839

    Application Date: 2019-02-10

    Applicant: NETAPP, INC.

    Inventor: Guy Leibovitz

    Abstract: A method comprising receiving a dictionary comprising a plurality of entities, wherein each entity has a length of between 1 and n tokens; constructing a probabilistic data representation model comprising n Bloom filter (BF) pairs indexed from 1 to n; populating said probabilistic data representation model with a data representation of said entities, wherein, with respect to each BF pair indexed i: (i) a first BF is populated with the first i tokens of all said entities having at least i+1 tokens, and (ii) a second BF is populated with all said entities having exactly i tokens; receiving a text corpus, wherein said text corpus is segmented into tokens; and automatically matching each token in said text corpus against said populated probabilistic data representation model, wherein said matching comprises sequentially querying each said BF pair in the order of said indexing, to determine a match.
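    The paired Bloom filter scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `BloomFilter` class, its SHA-256 based hashing, the filter size, whitespace tokenization, and the helper names `build_pairs` and `match` are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; size and hash count are arbitrary assumptions."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def _key(tokens):
    # Join tokens with a separator that is unlikely to occur inside a token.
    return "\x1f".join(tokens)


def build_pairs(entities, n):
    """Build n (prefix_bf, exact_bf) pairs; pair i is stored at index i - 1."""
    pairs = [(BloomFilter(), BloomFilter()) for _ in range(n)]
    for entity in entities:
        tokens = entity.split()
        if not 1 <= len(tokens) <= n:
            raise ValueError(f"entity must have between 1 and {n} tokens: {entity!r}")
        # First BF of pair i: the first i tokens of entities with >= i + 1 tokens.
        for i in range(1, len(tokens)):
            pairs[i - 1][0].add(_key(tokens[:i]))
        # Second BF of pair i: entities with exactly i tokens.
        pairs[len(tokens) - 1][1].add(_key(tokens))
    return pairs


def match(tokens, pairs):
    """Return (start, length) candidates found by querying the pairs in order."""
    hits = []
    for start in range(len(tokens)):
        for i, (prefix_bf, exact_bf) in enumerate(pairs, start=1):
            if start + i > len(tokens):
                break
            key = _key(tokens[start:start + i])
            if key in exact_bf:
                hits.append((start, i))  # probable entity of exactly i tokens
            if key not in prefix_bf:
                break  # no longer entity can begin with these i tokens
    return hits
```

    As a usage example, building pairs from the dictionary `["new york", "new york city", "boston"]` with `n = 3` and matching the tokens of "i flew from new york city to boston" should report candidates at token offsets 3 (lengths 2 and 3) and 7 (length 1). Because Bloom filters admit false positives, each hit is a candidate that a real system would presumably verify against the original dictionary.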
