Epsilon-closure for frequent pattern analysis

    Publication No.: US11366821B2

    Publication date: 2022-06-21

    Application No.: US16119960

    Filing date: 2018-08-31

    Abstract: Methods, systems, and devices supporting epsilon (ε)-closure for frequent pattern (FP) analysis are described. Some database systems may analyze data sets to determine FPs. In some cases, the FP set may include a large number of semi-redundant patterns, resulting in significant memory or processing overhead. To reduce the redundancy of these patterns, the database system may implement pre-configured or dynamic threshold occurrence differences (e.g., ε values) to test against related patterns. For example, the database system may calculate the difference between the data objects covered by a sub-pattern and a super-pattern (e.g., where the super-pattern includes all the same data attributes of the sub-pattern, plus one additional attribute). This difference may be compared to a corresponding ε value, and if the difference is less than the ε value, the database system may remove one of the patterns (e.g., the sub-pattern) from the set of valid FPs to limit redundancy.
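    A minimal Python sketch of the pruning step, assuming patterns are frozensets of attribute names, data objects are sets of attributes, and a single global ε (the abstract also allows dynamic, per-pair ε values). The helper names `cover`, `frequent_patterns`, and `epsilon_closure` are hypothetical, and the brute-force miner is only a stand-in for a real FP algorithm such as FP-growth.

```python
from itertools import combinations


def cover(pattern, transactions):
    """Indices of the data objects (transactions) containing every attribute in pattern."""
    return {i for i, t in enumerate(transactions) if pattern <= t}


def frequent_patterns(transactions, min_support):
    """Brute-force miner for toy data: maps each frequent pattern to its cover."""
    attrs = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(attrs) + 1):
        found = False
        for combo in combinations(attrs, k):
            pattern = frozenset(combo)
            objs = cover(pattern, transactions)
            if len(objs) >= min_support:
                result[pattern] = objs
                found = True
        if not found:  # no frequent k-pattern means no frequent (k+1)-pattern
            break
    return result


def epsilon_closure(patterns, eps):
    """Drop a sub-pattern when a super-pattern with exactly one extra attribute
    covers fewer than eps fewer data objects (assumes a single global eps)."""
    kept = {}
    for sub, sub_cover in patterns.items():
        redundant = any(
            len(super_pattern) == len(sub) + 1
            and sub < super_pattern
            and len(sub_cover) - len(super_cover) < eps
            for super_pattern, super_cover in patterns.items()
        )
        if not redundant:
            kept[sub] = sub_cover
    return kept
```

    With ε = 1 only zero-difference sub-patterns are dropped, which leaves exactly the closed patterns; larger ε values trade some precision in the support counts for a smaller pattern set. On the toy data [{a, b, c}, {a, b}, {a, b, c}, {a}] with minimum support 2, ε = 1 keeps {a}, {a, b} and {a, b, c}, while ε = 2 keeps only {a, b, c}.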

    Data attribution using frequent pattern analysis

    Publication No.: US11294917B2

    Publication date: 2022-04-05

    Application No.: US16156018

    Filing date: 2018-10-10

    Abstract: Methods, systems, and devices for data attribution using frequent pattern analysis are described. In some cases, data stored at a multi-tenant database server may be analyzed to understand various interactions and patterns between data attributes associated with multiple users. The multi-tenant database server may effectively cluster and/or perform calculations on attributes of the data to understand user patterns. In some examples, the multi-tenant database server may determine a change (e.g., a probability change) in the user patterns by removing one or more attributes from the data set and re-performing the analysis. By re-performing the analysis, the multi-tenant database server may attribute a value to individual pieces and combinations of the data in order to indicate the effect that each piece of data has on the analysis.
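    The attribution step can be illustrated with a leave-one-out ablation. This sketch assumes the "analysis" is the confidence P(target | pattern) of a single rule, a hypothetical stand-in for whatever probability the patented server actually computes; each attribute's score is the drop in confidence after that attribute is removed from every record and the analysis is re-run. `confidence` and `leave_one_out_attribution` are illustrative names only.

```python
def confidence(records, pattern, target):
    """Hypothetical analysis: P(target | pattern) over the records containing the pattern."""
    matching = [r for r in records if pattern <= r]
    if not matching:
        return 0.0
    return sum(target in r for r in matching) / len(matching)


def leave_one_out_attribution(records, pattern, target):
    """Score each attribute by how much the confidence drops when that
    attribute is removed from the data set and the analysis is re-run."""
    base = confidence(records, pattern, target)
    scores = {}
    for attr in pattern:
        reduced_records = [r - {attr} for r in records]
        scores[attr] = base - confidence(reduced_records, pattern - {attr}, target)
    return scores
```

    For the records [{email, web, buy}, {email, web, buy}, {email}, {web}, {web, buy}], pattern {email, web} and target `buy`, the base confidence is 1.0; removing `email` lowers it to 0.75 and removing `web` lowers it to 2/3, so `web` receives the larger attribution (about 0.33 versus 0.25). Scoring combinations of attributes would repeat the same loop over subsets of the pattern.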

    Differential support for frequent pattern analysis

    Publication No.: US11275768B2

    Publication date: 2022-03-15

    Application No.: US16120067

    Filing date: 2018-08-31

    Abstract: Methods, systems, and devices supporting differential support for frequent pattern (FP) analysis are described. Some database systems may analyze data sets to determine FPs of data attributes within the data sets. However, if data distributions for different types of data attributes vary greatly, more frequent data attribute types may skew the FPs away from the less frequent types. To reduce the noise of common attributes while maintaining sensitivity to the less common attributes, the database system may implement multiple minimum support (e.g., frequency) thresholds. For example, the database system may adaptively categorize the different data attribute types into data categories based on their distributions and may dynamically determine support thresholds for the categories. Using different minimum support thresholds for different data categories allows the system to filter out data attribute patterns based on the distributions of the data attribute types in the pattern.
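    A sketch of multiple minimum supports, under several assumptions not stated in the abstract: each attribute maps to a type; a type is "common" when the median support of its attributes reaches a hypothetical `split` value and "rare" otherwise; each category's threshold is `fraction` times the median support of its attributes; and a pattern is held to the lowest threshold among its attributes (the rule used by MSapriori-style miners). All function and parameter names are hypothetical.

```python
import math
from statistics import median


def attribute_support(transactions):
    """Number of records containing each attribute."""
    counts = {}
    for t in transactions:
        for attr in t:
            counts[attr] = counts.get(attr, 0) + 1
    return counts


def categorize(attr_types, support, split):
    """Hypothetical rule: a type is 'common' if the median support of its attributes >= split."""
    by_type = {}
    for attr, typ in attr_types.items():
        by_type.setdefault(typ, []).append(support.get(attr, 0))
    return {typ: "common" if median(v) >= split else "rare" for typ, v in by_type.items()}


def category_thresholds(attr_types, type_category, support, fraction):
    """Per-category minimum support: fraction x median support of the category's attributes."""
    by_cat = {}
    for attr, typ in attr_types.items():
        by_cat.setdefault(type_category[typ], []).append(support.get(attr, 0))
    return {cat: max(1, math.ceil(fraction * median(v))) for cat, v in by_cat.items()}


def filter_patterns(patterns, transactions, attr_types, type_category, thresholds):
    """Keep a pattern if its support meets the lowest threshold among its attributes' categories."""
    kept = []
    for p in patterns:
        support = sum(1 for t in transactions if p <= t)
        needed = min(thresholds[type_category[attr_types[a]]] for a in p)
        if support >= needed:
            kept.append(p)
    return kept
```

    On a toy set of ten records where `chrome` appears 6 times, `safari` 4, `promoA` 2 and `promoB` 1, `split=3` and `fraction=1.0` give thresholds of 5 (common) and 2 (rare), so {chrome}, {promoA} and {chrome, promoA} survive while {safari}, {promoB} and {safari, promoB} are filtered out. A single global threshold of 5 would have discarded every pattern containing `promoA`. Taking the minimum keeps rare-attribute combinations visible, at the cost of admitting them alongside common attributes at the lower bar.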

    Method and system for classifying user identifiers into similar segments

    Publication No.: US11061937B2

    Publication date: 2021-07-13

    Application No.: US16144715

    Filing date: 2018-09-27

    Abstract: A database system performs lookalike analysis on a data set including a plurality of user identifiers, which are associated with one or more attribute records. The database system classifies the user identifiers into one or more segments of user identifiers based on the attribute records. The database system performs Linear Discriminant Analysis (LDA) to calculate a measure of importance of the attribute records relative to the one or more segments. The database system auto-correlates the attribute records based on the numbers of attribute records in the user identifier population and the one or more segments. The database system identifies a set of user identifiers relative to one or more segments using the measures of importance and the auto-correlated parameters.

    Probabilistic framework for determining device associations

    Publication No.: US11030545B2

    Publication date: 2021-06-08

    Application No.: US16172288

    Filing date: 2018-10-26

    Abstract: Methods, systems, and devices for determining device associations are described. Some database systems may store information related to device characteristics. Each of these devices may be operated by one or more users, and each user may operate one or more devices. In some cases, information about users may be more valuable than information about devices. As such, a system may determine probable associations between devices, where an association can correspond to operation by a same user. To determine device associations, the system may perform a machine-learning process (e.g., using probabilistic soft logic (PSL) and a hinge-loss Markov Random Field (HL-MRF) model) on input device characteristics and connection information to generate a probability density function. The probability density function may indicate associations between devices within the system. Based on one or more thresholds, the system may determine sets of associated devices and may transmit this association information for analysis or display.
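    Only the final grouping step is sketched here; the PSL / HL-MRF inference is not reproduced. Assuming that inference has already produced a probability for each device pair (the `pair_probs` input and its values are hypothetical), one plausible reading of "based on one or more thresholds" is to link every pair at or above a threshold and take connected components with a union-find, which treats association as transitive. `group_devices` is an illustrative name.

```python
def group_devices(devices, pair_probs, threshold):
    """Group devices into connected components of pairs whose
    (hypothetical, precomputed) association probability >= threshold."""
    parent = {d: d for d in devices}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    for (a, b), p in pair_probs.items():
        if p >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups

    groups = {}
    for d in devices:
        groups.setdefault(find(d), set()).add(d)
    # Deterministic output: each group sorted, groups ordered by first member
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

    With pair probabilities (laptop, phone) 0.92, (phone, tablet) 0.81, (tablet, tv) 0.30 and (laptop, tv) 0.10, a threshold of 0.8 yields the groups [laptop, phone, tablet] and [tv], while 0.9 splits tablet off on its own. Transitive linking can chain weakly related devices together, so a stricter design might require every pair within a group to clear the threshold.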
