Systems and methods for classifying data objects

    Publication number: US11755626B1

    Publication date: 2023-09-12

    Application number: US17390289

    Filing date: 2021-07-30

    Applicant: SPLUNK Inc.

    CPC classification number: G06F16/285 G06F16/2237 G06F16/2264 G06F16/93

    Abstract: A computer-implemented method is disclosed that includes operations of receiving a document to be classified, performing pre-processing operations on the document resulting in generation of a tokenized document, performing word embedding operations on the tokenized document resulting in generation of a vectorized document, and performing text similarity operations on the vectorized document and each of one or more vectorized topics resulting in a set of one or more similarity scores. A first similarity score indicates a level of similarity between the vectorized document and a first vectorized topic, and each vectorized topic represents one of a predetermined set of topics. The method further includes classifying the document into one of the predetermined set of topics based on the set of one or more similarity scores. Performing the word embedding operations includes mapping each token of the remaining subset of tokens to a multi-dimensional vector, with each multi-dimensional vector representing a semantic meaning of a token.
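The pipeline described in the abstract (pre-process, embed, score against each topic, pick a topic) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the three-dimensional `TOY_EMBEDDINGS` table, the `STOP_WORDS` set, averaging token vectors to form a document vector, and cosine similarity as the similarity score. The abstract does not say which embedding model, pooling step, or similarity measure is used.

```python
import math
import re

# Hypothetical 3-dimensional embeddings, invented for this example only.
# A real system would load vectors from a trained embedding model.
TOY_EMBEDDINGS = {
    "error": (0.9, 0.1, 0.0),
    "crash": (0.8, 0.2, 0.0),
    "server": (0.7, 0.1, 0.2),
    "invoice": (0.0, 0.9, 0.1),
    "payment": (0.1, 0.9, 0.0),
    "refund": (0.1, 0.8, 0.1),
}

# Hypothetical stop-word list used by the pre-processing step.
STOP_WORDS = {"the", "a", "an", "on", "for", "my", "was", "is"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(tokens):
    """Map each known token to its vector and average them (an assumed pooling step)."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(text, topics):
    """Return the best-scoring topic name and the full score table."""
    doc_vec = vectorize(preprocess(text))
    scores = {
        name: cosine_similarity(doc_vec, vectorize(preprocess(description)))
        for name, description in topics.items()
    }
    return max(scores, key=scores.get), scores


# Predetermined topics, each represented by a short description that is vectorized.
TOPICS = {
    "outage": "server error crash",
    "billing": "invoice payment refund",
}

label, scores = classify("The server crash was an error", TOPICS)
print(label, scores)
```

In this sketch each topic is vectorized the same way as the document, so the score table holds one similarity value per topic and the classification is simply the topic with the highest value. A threshold for "no matching topic" could be added, but the abstract does not mention one.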
