Parsing of unstructured log data into structured data and creation of schema

    公开(公告)号:US11372868B2

    公开(公告)日:2022-06-28

    申请号:US16246765

    申请日:2019-01-14

    Abstract: Herein are techniques for training a parser by categorizing and generalizing messages and abstracting message templates for parsing after training. In an embodiment, a computer generates a message signature based on a message sequence of tokens that were extracted from a training message. The message signature is matched to a cluster signature that represents messages of one of many clusters that have distinct signatures. The training message is added to the cluster. Based on a data type of the cluster signature, a value is extracted from a second message, such as a live message after training. Fuzzy signatures may be probabilistically matched to select a best matching cluster for a message. The value range of a token may be broadened or narrowed by adding or removing candidate data types, by adding or removing literals to a data type, and/or by promoting a narrow data type to a broader data type.

Patent Agency Ranking