Parsing of unstructured log data into structured data and creation of schema
Abstract:
Herein are techniques for training a parser by categorizing and generalizing messages and abstracting message templates for parsing after training. In an embodiment, a computer generates a message signature based on a message sequence of tokens that were extracted from a training message. The message signature is matched to a cluster signature that represents messages of one of many clusters that have distinct signatures. The training message is added to the cluster. Based on a data type of the cluster signature, a value is extracted from a second message, such as a live message after training. Fuzzy signatures may be probabilistically matched to select a best matching cluster for a message. The value range of a token may be broadened or narrowed by adding or removing candidate data types, by adding or removing literals to a data type, and/or by promoting a narrow data type to a broader data type.
Information query
Patent Agency Ranking
0/0