-
Publication No.: US11755626B1
Publication date: 2023-09-12
Application No.: US17390289
Application date: 2021-07-30
Applicant: SPLUNK Inc.
Inventor: Ningwei Liu, Deepanjan Basu, Todd M. Miller, Craig Morea
CPC classification number: G06F16/285, G06F16/2237, G06F16/2264, G06F16/93
Abstract: A computer-implemented method is disclosed that includes operations of receiving a document to be classified, performing pre-processing operations on the document resulting in generation of a tokenized document, performing word embedding operations on the tokenized document resulting in generation of a vectorized document, performing text similarity operations on the vectorized document and each of one or more vectorized topics resulting in a set of one or more similarity scores, wherein a first similarity score indicates a level of similarity between the vectorized document and a first vectorized topic, and wherein each vectorized topic represents one of a predetermined set of topics, and classifying the document into one of the predetermined set of topics based on the set of one or more similarity scores. Performing the word embedding operations includes mapping each token of the remaining subset to a multi-dimensional vector, with each multi-dimensional vector representing a semantic meaning of a token.
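
The pipeline in this abstract (tokenize, embed, score against topic vectors, pick a topic) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three-dimensional `EMBEDDINGS` table, the stop-word list, mean pooling of token vectors, cosine similarity as the text similarity operation, and the topic names.

```python
import math
import re

# Hypothetical toy embeddings; a real system would load a trained model.
EMBEDDINGS = {
    "server": (0.9, 0.1, 0.0),
    "outage": (0.8, 0.2, 0.1),
    "invoice": (0.1, 0.9, 0.0),
    "payment": (0.0, 0.8, 0.2),
    "password": (0.1, 0.0, 0.9),
    "login": (0.2, 0.1, 0.8),
}
STOP_WORDS = {"the", "a", "an", "is", "my", "for", "of", "to", "and"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(tokens):
    """Word embedding: average the vectors of known tokens (assumed pooling)."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Each predetermined topic is represented by a vector (hypothetical topics).
TOPICS = {
    "infrastructure": vectorize(["server", "outage"]),
    "billing": vectorize(["invoice", "payment"]),
    "access": vectorize(["password", "login"]),
}


def classify(document):
    """Return the best-scoring topic and the full set of similarity scores."""
    doc_vec = vectorize(tokenize(document))
    scores = {topic: cosine(doc_vec, vec) for topic, vec in TOPICS.items()}
    return max(scores, key=scores.get), scores
```

Calling `classify("The payment for my invoice is late")` selects `"billing"` under these toy vectors, because the only tokens with embeddings are `payment` and `invoice`.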
-
Publication No.: US11775767B1
Publication date: 2023-10-03
Application No.: US17752221
Application date: 2022-05-24
Applicant: Splunk, Inc.
Inventor: Ningwei Liu, Wangyan Feng, Aaron Chan, Joel Fulton
IPC: G06F40/30, G06N5/048, G06F16/332, G06F40/284, G10L15/22, G06N5/022, G06F16/9032, G06F16/33
CPC classification number: G06F40/30, G06F16/3329, G06F16/3344, G06F16/90332, G06F40/284, G06N5/022, G06N5/048, G10L15/22
Abstract: A computerized method is disclosed including operations of receiving a plurality of request texts, and for each request text of the plurality of request texts: performing a pre-processing operation, performing a first text similarity procedure or a second text similarity procedure that each result in a determination of a most similar request text in a knowledge base, wherein the second text similarity procedure includes performance of word embedding operations, determining a degree of similarity between the current request text and the most similar request text, when the degree of similarity satisfies a similarity threshold comparison, associating an answer associated with the most similar request text with the current request text, and when the degree of similarity does not satisfy the similarity threshold comparison, flagging the current request text or associating a placeholder answer with the current request text. Performing pre-processing may include removing stop words and punctuation and creating tokenized text.
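
The threshold logic in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: Jaccard token overlap stands in for one of the two text similarity procedures, the `similarity` parameter is a hypothetical hook where an embedding-based procedure could be swapped in, and the `PLACEHOLDER` value, stop-word list, default threshold of 0.5, and non-empty knowledge base are all invented for the example.

```python
import string

# Hypothetical stop-word list and placeholder answer, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "do", "you", "how", "at", "of", "to"}
PLACEHOLDER = "NEEDS_REVIEW"


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, return tokens."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in cleaned.split() if t not in STOP_WORDS]


def jaccard(tokens_a, tokens_b):
    """Token-set overlap in [0, 1]; an assumed stand-in similarity measure."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def answer_requests(requests, knowledge_base, similarity=jaccard, threshold=0.5):
    """Map each request to a knowledge-base answer or to the placeholder.

    knowledge_base maps known request texts to answers and is assumed non-empty.
    """
    kb_tokens = {question: preprocess(question) for question in knowledge_base}
    results = {}
    for request in requests:
        tokens = preprocess(request)
        # Find the most similar known request text and its degree of similarity.
        best = max(kb_tokens, key=lambda q: similarity(tokens, kb_tokens[q]))
        score = similarity(tokens, kb_tokens[best])
        # Reuse the stored answer only when the threshold comparison is satisfied.
        results[request] = knowledge_base[best] if score >= threshold else PLACEHOLDER
    return results
```

A request that clears the threshold inherits the stored answer, and any other request receives the placeholder so that it can be reviewed manually.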
-
Publication No.: US11379670B1
Publication date: 2022-07-05
Application No.: US16588718
Application date: 2019-09-30
Applicant: SPLUNK INC.
Inventor: Ningwei Liu, Wangyan Feng, Aaron Chan, Joel Fulton
IPC: G06F40/30, G06N5/02, G06N5/04, G06F40/284, G06F16/33, G06F16/332, G06F16/9032, G10L15/22
Abstract: A computerized method is disclosed including operations of receiving one or more request texts, including at least a first request text, automatically performing processing on the first request text to determine a most similar request text in a knowledge base, determining a degree of similarity between the first request text and the most similar request text, and in response to a comparison between the degree of similarity and a similarity threshold, retrieving, from the knowledge base, an answer corresponding to the most similar request text. Performing processing may include (i) removing stop words and punctuation and creating tokenized text, (ii) converting the tokenized text into a vector using a trained neural network, and (iii) performing an analysis of the vector with the entries of the knowledge base using one or more of a Word Mover's Distance (WMD) algorithm, or a Soft Cosine Measure (SCM) algorithm.
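
Of the two measures named in this abstract, the Soft Cosine Measure is compact enough to sketch directly. For bag-of-words vectors a and b and a term-similarity matrix S, it is (aᵀSb) / (√(aᵀSa) · √(bᵀSb)). The sketch below is an illustration only: the two-dimensional `WORD_VECTORS` table is hypothetical, and deriving S from clipped cosine similarity between word vectors is an assumption, not something the abstract specifies.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors; a trained embedding model would supply these.
WORD_VECTORS = {
    "reset": (0.9, 0.1),
    "change": (0.8, 0.3),
    "password": (0.1, 0.9),
    "credentials": (0.2, 0.8),
}


def cosine(u, v):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_similarity(w1, w2):
    """Entry S[w1][w2]: 1 for identical terms, clipped cosine for known pairs."""
    if w1 == w2:
        return 1.0
    if w1 in WORD_VECTORS and w2 in WORD_VECTORS:
        return max(0.0, cosine(WORD_VECTORS[w1], WORD_VECTORS[w2]))
    return 0.0


def soft_dot(bag_a, bag_b):
    """Compute a^T S b over two bag-of-words Counters."""
    return sum(
        count_a * count_b * term_similarity(word_a, word_b)
        for word_a, count_a in bag_a.items()
        for word_b, count_b in bag_b.items()
    )


def soft_cosine(tokens_a, tokens_b):
    """Soft Cosine Measure between two token lists."""
    bag_a, bag_b = Counter(tokens_a), Counter(tokens_b)
    denom = math.sqrt(soft_dot(bag_a, bag_a)) * math.sqrt(soft_dot(bag_b, bag_b))
    return soft_dot(bag_a, bag_b) / denom if denom else 0.0
```

With these toy vectors, `soft_cosine(["reset", "password"], ["change", "credentials"])` is close to 1 even though the two token lists share no words, whereas ordinary cosine similarity over the same bags would be 0. That behavior is why a measure of this kind suits matching paraphrased requests against a knowledge base.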
-