ATTRIBUTE DIVERSITY FOR FREQUENT PATTERN ANALYSIS

    Publication No.: US20210157847A1

    Publication Date: 2021-05-27

    Application No.: US17163081

    Filing Date: 2021-01-29

    Abstract: A data processing server may receive a set of data objects for frequent pattern (FP) analysis. The set of data objects may be analyzed using an attribute diversity technique. For the set of data attributes of the set of data objects, the server may arrange the attributes in one or more dimensions. The server may initialize a set of centroids on data points and identify mean values of nearby data points. Based on an iteration of the mean value calculation, the server may identify a set of attributes corresponding to final mean values as being groups of similarly frequent attributes. These groups of similarly frequent attributes may be analyzed using an FP analysis procedure to identify frequent patterns of data attributes.
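The centroid step in this abstract can be pictured with a small sketch. It is a hedged illustration, not the patented procedure: it assumes a single dimension (how many data objects contain each attribute), places one centroid on every data point, and repeatedly moves each centroid to the mean of the points within a fixed `bandwidth`. The function name `group_by_frequency` and the `bandwidth` and `iterations` parameters are hypothetical.

```python
from collections import Counter


def group_by_frequency(records, bandwidth=2.0, iterations=50):
    """Group attributes whose occurrence counts lie close together.

    records: iterable of attribute lists, one list per data object.
    Assumption: one dimension only (the attribute's occurrence count).
    """
    # Arrange attributes along one dimension: number of objects containing them.
    counts = Counter(attr for record in records for attr in set(record))
    points = {attr: float(n) for attr, n in counts.items()}

    # Initialize one centroid on every data point.
    centroids = dict(points)
    for _ in range(iterations):
        updated = {}
        for attr, centre in centroids.items():
            # Mean value of the data points near this centroid.
            nearby = [p for p in points.values() if abs(p - centre) <= bandwidth]
            updated[attr] = sum(nearby) / len(nearby)
        if updated == centroids:  # no centroid moved: stop iterating
            break
        centroids = updated

    # Attributes that end on the same final mean form one group.
    groups = {}
    for attr, centre in centroids.items():
        groups.setdefault(round(centre, 6), []).append(attr)
    return sorted(sorted(group) for group in groups.values())
```

Each returned group could then be passed separately to an FP mining routine, so that very common attributes do not drown out the rarer ones.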

    Data shards for distributed processing

    Publication No.: US12236264B2

    Publication Date: 2025-02-25

    Application No.: US17163386

    Filing Date: 2021-01-30

    Abstract: Systems, devices, and techniques are disclosed for data shards for distributed processing. Data sets of data for users may be received. The data sets may belong to separate groups. User identifiers in the data sets may be hashed to generate hashed identifiers for the data sets. The user identifiers in the data sets may be replaced with the hashed identifiers. The data sets may be split to generate shards. The data sets may be split into the same number of shards. Merged shards may be generated by merging the shards using a separate running process for each of the merged shards. The merged shards may be generated using shards from more than one of the two or more data sets. An operation may be performed on all of the merged shards.
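The hash, split, and merge steps can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: SHA-256 is an arbitrary choice of hash, rows are assigned to shards by the hashed identifier modulo the shard count (so the same user lands at the same shard index in every data set), and the helper names `hash_id`, `split_into_shards`, and `merge_shards` are hypothetical. The merge is done sequentially here; the abstract describes a separate running process per merged shard.

```python
import hashlib


def hash_id(user_id: str) -> str:
    """Replace a raw user identifier with a hashed identifier (SHA-256 assumed)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def split_into_shards(data_set, num_shards):
    """Split (user_id, payload) rows into num_shards shards keyed by hashed id."""
    shards = [[] for _ in range(num_shards)]
    for user_id, payload in data_set:
        hashed = hash_id(user_id)
        # Assumption: shard index = hashed id mod shard count.
        shards[int(hashed, 16) % num_shards].append((hashed, payload))
    return shards


def merge_shards(sharded_sets):
    """Merge shard i of every data set into merged shard i."""
    # zip(*...) lines up the shards that share an index across data sets.
    # Each merged shard is independent, so each could run in its own process.
    return [
        [row for shard in same_index for row in shard]
        for same_index in zip(*sharded_sets)
    ]
```

Because every data set is split into the same number of shards with the same rule, all rows for a given user end up in a single merged shard, and a per-user operation can run on each merged shard independently.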

    LOCALIZATION OF MACHINE LEARNING MODELS TRAINED WITH GLOBAL DATA

    Publication No.: US20220207407A1

    Publication Date: 2022-06-30

    Application No.: US17134430

    Filing Date: 2020-12-27

    Abstract: Systems, devices, and techniques are disclosed for localization of machine learning models trained with global data. Data sets of event data for users may be received. The data sets may belong to separate groups. The data sets of event data may be combined to generate a global data set. A matrix factorization model may be trained using the global data set to generate a globally trained matrix factorization model. A localization group data set may be generated including event data from the global data set for users from a first of the groups. The globally trained matrix factorization model may be trained with the localization group data set to generate a localized matrix factorization model for the first of the groups.
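The two-stage training described here (global first, then one group's data) can be sketched with a tiny matrix factorization trained by stochastic gradient descent. Everything concrete below is an assumption made for illustration: the `train` function, the `(user, item, value)` event format, the hyperparameters, and the choice of plain SGD are not taken from the patent.

```python
import copy
import random


def train(events, model=None, factors=2, epochs=20, lr=0.05, reg=0.01, seed=0):
    """Fit user and item factor vectors to (user, item, value) events with SGD.

    If `model` is given, training continues from a copy of it; the original
    model is left untouched.
    """
    rng = random.Random(seed)
    model = copy.deepcopy(model) if model is not None else {"users": {}, "items": {}}
    # Create small random factor vectors for users and items not seen before.
    for user, item, _ in events:
        if user not in model["users"]:
            model["users"][user] = [rng.uniform(-0.1, 0.1) for _ in range(factors)]
        if item not in model["items"]:
            model["items"][item] = [rng.uniform(-0.1, 0.1) for _ in range(factors)]
    for _ in range(epochs):
        for user, item, value in events:
            p, q = model["users"][user], model["items"][item]
            err = value - sum(a * b for a, b in zip(p, q))
            for k in range(factors):
                p_k, q_k = p[k], q[k]  # keep old values for a symmetric update
                p[k] += lr * (err * q_k - reg * p_k)
                q[k] += lr * (err * p_k - reg * q_k)
    return model


# Hypothetical event data for two groups.
group_a = [("u1", "i1", 1.0), ("u2", "i1", 1.0), ("u2", "i2", 1.0)]
group_b = [("u3", "i2", 1.0), ("u3", "i3", 1.0)]

global_model = train(group_a + group_b)              # trained on the global data set
localized_a = train(group_a, model=global_model)     # further trained on group A only
```

In this sketch the localized model starts from the globally learned factors, so items that group A users never interacted with keep the vectors learned from the other groups.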

    Differential support for frequent pattern analysis

    Publication No.: US11275768B2

    Publication Date: 2022-03-15

    Application No.: US16120067

    Filing Date: 2018-08-31

    Abstract: Methods, systems, and devices supporting differential support for frequent pattern (FP) analysis are described. Some database systems may analyze data sets to determine FPs of data attributes within the data sets. However, if data distributions for different types of data attributes vary greatly, more frequent data attribute types may skew the FPs away from the less frequent types. To reduce the noise of common attributes while maintaining sensitivity to the less common attributes, the database system may implement multiple minimum support (e.g., frequency) thresholds. For example, the database system may adaptively categorize the different data attribute types into data categories based on their distributions and may dynamically determine support thresholds for the categories. Using different minimum support thresholds for different data categories allows the system to filter out data attribute patterns based on the distributions of the data attribute types in the pattern.
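A minimal sketch of per-category support thresholds follows. It rests on several assumptions that the abstract does not specify: each attribute's category is supplied by the caller through the hypothetical `attr_type` mapping (the abstract describes adaptive categorization), a category's threshold is `ratio` times the mean support of its attributes, a pattern must meet the lowest threshold among its attributes' categories, and patterns are enumerated by brute force up to `max_size`.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(records, attr_type, ratio=0.5, max_size=2):
    """Return {pattern: support} using one minimum support per attribute category.

    records:   iterable of attribute lists, one list per data object.
    attr_type: dict mapping each attribute to its category (assumed given).
    """
    # Count the support of every attribute combination up to max_size.
    support = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            support.update(combinations(items, size))

    # Per-category threshold: ratio * mean support of that category's attributes.
    per_type = {}
    for pattern, count in support.items():
        if len(pattern) == 1:
            per_type.setdefault(attr_type[pattern[0]], []).append(count)
    threshold = {t: ratio * sum(c) / len(c) for t, c in per_type.items()}

    # Keep a pattern if it meets the lowest threshold among its categories.
    return {
        pattern: count
        for pattern, count in support.items()
        if count >= min(threshold[attr_type[a]] for a in pattern)
    }
```

With a single global threshold, rare categories such as niche interests would be filtered out entirely; here an attribute is judged against the typical frequency of its own category.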

    Method and system for classifying user identifiers into similar segments

    Publication No.: US11061937B2

    Publication Date: 2021-07-13

    Application No.: US16144715

    Filing Date: 2018-09-27

    Abstract: A database system performs lookalike analysis on a data set including a plurality of user identifiers, which are associated with one or more attribute records. The database system classifies the user identifiers into one or more segments of user identifiers based on the attribute records. The database system performs Linear Discriminant Analysis (LDA) to calculate a measure of importance of the attribute records relative to the one or more segments. The database system auto-correlates the attribute records based on the numbers of attribute records in the user identifier population and the one or more segments. The database system identifies a set of user identifiers relative to one or more segments using the measures of importance and the auto-correlated parameters.

    Attribute diversity for frequent pattern analysis

    Publication No.: US10963519B2

    Publication Date: 2021-03-30

    Application No.: US16355996

    Filing Date: 2019-03-18

    Abstract: A data processing server may receive a set of data objects for frequent pattern (FP) analysis. The set of data objects may be analyzed using an attribute diversity technique. For the set of data attributes of the set of data objects, the server may arrange the attributes in one or more dimensions. The server may initialize a set of centroids on data points and identify mean values of nearby data points. Based on an iteration of the mean value calculation, the server may identify a set of attributes corresponding to final mean values as being groups of similarly frequent attributes. These groups of similarly frequent attributes may be analyzed using an FP analysis procedure to identify frequent patterns of data attributes.

    Attribute diversity for frequent pattern analysis

    Publication No.: US11556595B2

    Publication Date: 2023-01-17

    Application No.: US17163081

    Filing Date: 2021-01-29

    Abstract: A data processing server may receive a set of data objects for frequent pattern (FP) analysis. The set of data objects may be analyzed using an attribute diversity technique. For the set of data attributes of the set of data objects, the server may arrange the attributes in one or more dimensions. The server may initialize a set of centroids on data points and identify mean values of nearby data points. Based on an iteration of the mean value calculation, the server may identify a set of attributes corresponding to final mean values as being groups of similarly frequent attributes. These groups of similarly frequent attributes may be analyzed using an FP analysis procedure to identify frequent patterns of data attributes.
