OPTIMIZED SUBSET PROCESSING FOR DE-DUPLICATION

    Publication number: US20170242891A1

    Publication date: 2017-08-24

    Application number: US15052556

    Application date: 2016-02-24

    CPC classification number: G06F16/24556 G06F7/32 G06F16/2455 G06F16/285

    Abstract: Some embodiments of the present invention include a method for identifying duplicate records from a group of records in a database system. The method includes generating a cluster of records from a group of records based on one or more keys; splitting the cluster of records into multiple subsets of records, each subset having fewer records than the cluster, wherein the splitting is based on the number of records in the cluster exceeding a threshold; causing duplicate sets of records in each of the subsets to be identified, wherein a duplicate set of records includes one or more records, and wherein, when a duplicate set includes two or more records, those records are duplicates of one another; merging all of the duplicate sets of records identified from the multiple subsets to form a first group of duplicate sets of records; and forming a representative set of records by selecting a representative record from each of the duplicate sets in the first group.
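
    The steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the fixed-size chunking used for splitting, the greedy grouping inside each subset, and the choice of the first record as representative are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_by_key(records, key_fn):
    """Group records that share the same key (hypothetical key function)."""
    clusters = defaultdict(list)
    for record in records:
        clusters[key_fn(record)].append(record)
    return list(clusters.values())


def split_cluster(cluster, threshold):
    """Split only when the cluster size exceeds the threshold.

    Assumption: fixed-size chunks of at most `threshold` records.
    """
    if len(cluster) <= threshold:
        return [cluster]
    return [cluster[i:i + threshold] for i in range(0, len(cluster), threshold)]


def find_duplicate_sets(subset, is_duplicate):
    """Greedy grouping: each record joins the first set whose head it matches."""
    duplicate_sets = []
    for record in subset:
        for dup_set in duplicate_sets:
            if is_duplicate(dup_set[0], record):
                dup_set.append(record)
                break
        else:
            duplicate_sets.append([record])  # a set of one is allowed
    return duplicate_sets


def representative_records(records, key_fn, is_duplicate, threshold):
    """Cluster, split, de-duplicate each subset, merge, pick representatives."""
    representatives = []
    for cluster in cluster_by_key(records, key_fn):
        merged = []  # the "first group" of duplicate sets for this cluster
        for subset in split_cluster(cluster, threshold):
            merged.extend(find_duplicate_sets(subset, is_duplicate))
        # Assumption: the first record of each set is its representative.
        representatives.extend(dup_set[0] for dup_set in merged)
    return representatives
```

    In this sketch, duplicates that land in different subsets both survive as representatives, so the representative set is smaller than the cluster but not necessarily duplicate-free.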

    SYSTEM AND METHOD FOR USING A STATISTICAL CLASSIFIER TO SCORE CONTACT ENTITIES
    Type: Invention application
    Status: Granted

    Publication number: US20130166489A1

    Publication date: 2013-06-27

    Application number: US13773141

    Application date: 2013-02-21

    CPC classification number: G06N5/02 G06F17/30985 G06N7/005 G06Q30/02

    Abstract: A system and method for associating a character string with one or more defined entities of a contact record. An input character string is received. The string is first evaluated to see if the structure of the string is recognized. If not, then the string is compared to entries in a look up table. If the string format is not recognized, and the string is not found in the look up table, then a posterior probability is calculated for a set of defined entities over a limited set of string processing features. The result of probabilistic scoring determines which of the defined entities to associate with the character string.
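
    The three-stage fallback described here (structure check, then lookup table, then probabilistic scoring) might look like the sketch below. The regular expressions, lookup entries, entity names, priors, and feature likelihoods are invented for illustration, and the naive Bayes form of the posterior is an assumption; the abstract only says a posterior probability is computed over a limited feature set.

```python
import re

# Stage 1: recognizable string structures (illustrative patterns only).
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
}

# Stage 2: lookup table of known strings (hypothetical entries).
LOOKUP = {"ceo": "title", "london": "city"}

# Stage 3: made-up priors and per-feature likelihoods P(feature | entity).
PRIORS = {"name": 0.5, "company": 0.5}
LIKELIHOODS = {
    "name": {"has_digit": 0.05, "ends_with_inc": 0.01},
    "company": {"has_digit": 0.30, "ends_with_inc": 0.40},
}


def extract_features(text):
    """A deliberately small set of boolean string features."""
    lowered = text.lower().rstrip(".")
    return {
        "has_digit": any(ch.isdigit() for ch in text),
        "ends_with_inc": lowered.endswith(" inc"),
    }


def posterior(text):
    """Naive Bayes posterior over the entities in PRIORS (an assumption)."""
    features = extract_features(text)
    scores = {}
    for entity, prior in PRIORS.items():
        score = prior
        for name, present in features.items():
            p = LIKELIHOODS[entity][name]
            score *= p if present else (1.0 - p)
        scores[entity] = score
    total = sum(scores.values())
    return {entity: score / total for entity, score in scores.items()}


def classify(text):
    """Try structure first, then the lookup table, then probabilistic scoring."""
    text = text.strip()
    for entity, pattern in PATTERNS.items():
        if pattern.match(text):
            return entity
    if text.lower() in LOOKUP:
        return LOOKUP[text.lower()]
    scores = posterior(text)
    return max(scores, key=scores.get)
```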

    RECOMMENDING DATA PROVIDERS' DATASETS BASED ON DATABASE VALUE DENSITIES

    Publication number: US20180373732A1

    Publication date: 2018-12-27

    Application number: US15631306

    Application date: 2017-06-23

    Abstract: Recommending data providers' datasets based on database value densities is described. A database system determines a provider dataset density for a value by identifying a frequency of the value in a dataset that is provided by a data provider. The database system determines a user database density for the value by identifying a frequency of the value in a database used by a data user. The database system determines a relative density based on a relationship between the provider dataset density and the user database density. The database system determines an evaluation metric for the value, based on a combination of the relative density and the user database density. The database system causes a recommendation to be outputted, based on a relationship of the evaluation metric relative to other evaluation metrics for other values, which recommends that the data user acquire at least a part of the dataset.
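
    A small worked sketch of the density calculation follows. The abstract does not give formulas, so everything concrete here is an assumption: density is taken as relative frequency, relative density as the provider's share `p / (p + u)`, and the evaluation metric as relative density multiplied by user density.

```python
from collections import Counter


def value_densities(values):
    """Relative frequency of each value in a collection."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def evaluation_metrics(provider_values, user_values):
    """Score each value the user already holds.

    Assumed formulas (not from the source):
      relative density = p / (p + u)
      metric           = relative density * u
    where p and u are the provider and user densities of the value.
    """
    provider = value_densities(provider_values)
    user = value_densities(user_values)
    metrics = {}
    for value, u in user.items():
        p = provider.get(value, 0.0)
        relative = p / (p + u)  # u > 0 for every value taken from `user`
        metrics[value] = relative * u
    return metrics


def best_value(metrics):
    """The value whose metric is highest relative to the other values."""
    return max(metrics, key=metrics.get)
```

    For a provider dataset `["CA", "CA", "CA", "NY"]` and a user database with two `CA`, two `NY`, and four `TX` rows, `CA` scores highest under these assumed formulas, which would support recommending the provider's `CA` records.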

    INTRUSION DETECTION BASED ON LOGIN ATTEMPTS
    Type: Invention application

    Publication number: US20180205748A1

    Publication date: 2018-07-19

    Application number: US15408483

    Application date: 2017-01-18

    CPC classification number: H04L63/1416 H04L63/083 H04L63/102

    Abstract: An attempt by a user to log in to a destination server from a source server is identified. A destination score is determined based on the count of attempts by the user to log in to the destination server and the count of attempts by the user to log in to all destination servers. A source given destination score is determined based on the count of attempts by the user to log in from the source server to the destination server, and the count of attempts by the user to log in to the destination server. An outlier score is determined based on values associated with the destination score and the source given destination score. An alert is output if the outlier score satisfies a threshold.
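
    The two scores can be read as conditional frequencies, which the sketch below computes from running counts. The class and method names are hypothetical, and combining the scores as a sum of negative logarithms is an assumption; the abstract only says the outlier score is based on values associated with the two scores.

```python
import math
from collections import Counter


class LoginMonitor:
    """Tracks login attempts and scores each new attempt (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.by_user = Counter()           # attempts per user, all destinations
        self.by_user_dest = Counter()      # attempts per (user, destination)
        self.by_user_src_dest = Counter()  # attempts per (user, source, destination)

    def observe(self, user, source, destination):
        """Record an attempt; return (outlier_score, alert)."""
        self.by_user[user] += 1
        self.by_user_dest[(user, destination)] += 1
        self.by_user_src_dest[(user, source, destination)] += 1

        # Destination score: share of this user's attempts aimed at this destination.
        dest_score = self.by_user_dest[(user, destination)] / self.by_user[user]
        # Source-given-destination score: share of those attempts made from this source.
        src_given_dest = (
            self.by_user_src_dest[(user, source, destination)]
            / self.by_user_dest[(user, destination)]
        )
        # Assumed combination: rare destination or rare source gives a high score.
        outlier = -math.log(dest_score) - math.log(src_given_dest)
        return outlier, outlier >= self.threshold
```

    Because the counts include the current attempt, a user's very first login scores zero in this sketch; a real system would need some separate handling of that cold-start case.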

    USER SCORES BASED ON BULK RECORD UPDATES
    Type: Invention application

    Publication number: US20160125442A1

    Publication date: 2016-05-05

    Application number: US14529413

    Application date: 2014-10-31

    CPC classification number: G06Q30/0213

    Abstract: User scores based on bulk record updates are described. A system receives record updates submitted by a user. The system subtracts a penalty debit from a user score, which corresponds to the user, for each record which corresponds to at least one of the record updates and which is removed from purchasing availability. The system adds a full credit to the user score for each record which corresponds to at least one of the record updates and which is purchased. The system adds a partial credit to the user score for each record which corresponds to at least one of the record updates and which is yet to be purchased and which is yet to be removed from purchasing availability, wherein the partial credit is a positive value that is less than the full credit. The system enables the user to access records, based on the user score.
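
    The scoring rule reduces to simple arithmetic over the outcome of each updated record. In the sketch below the status names, the numeric values of the credits and penalty, and the access cutoff are all placeholders; the only constraint taken from the abstract is that the partial credit is positive and smaller than the full credit.

```python
FULL_CREDIT = 1.0      # assumed value
PARTIAL_CREDIT = 0.25  # assumed value; must satisfy 0 < partial < full
PENALTY_DEBIT = 1.0    # assumed value


def updated_score(score, record_statuses):
    """Apply one credit or debit per updated record.

    Each status is one of the hypothetical labels:
      "removed"   - record was removed from purchasing availability
      "purchased" - record was purchased
      "pending"   - neither purchased nor removed yet
    """
    for status in record_statuses:
        if status == "removed":
            score -= PENALTY_DEBIT
        elif status == "purchased":
            score += FULL_CREDIT
        elif status == "pending":
            score += PARTIAL_CREDIT
        else:
            raise ValueError(f"unknown record status: {status!r}")
    return score


def can_access_records(score, minimum_score=0.0):
    """Assumed access rule: the score must reach a minimum."""
    return score >= minimum_score
```

    For example, one purchased record, two pending records, and one removed record move a score of 0.0 to 0.5 under these placeholder values.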

    AUGMENTING MATCH INDICES
    Type: Invention application

    Publication number: US20180165354A1

    Publication date: 2018-06-14

    Application number: US15590371

    Application date: 2017-05-09

    CPC classification number: G06F16/31 G06F16/90335

    Abstract: A system creates three tries based on the values that records store in a first field, a second field, and a third field. The system associates a node in the third trie with a record, based on the value that the record stores in the third field. The system associates the node with a first dispersion measure, based on the values stored in the first field by the records associated with the node, and with a second dispersion measure, based on the values stored in the second field by those records. The system identifies a branch sequence in the third trie as a key for a prospective record, based on the value that the prospective record stores in the third field. The system uses the key to identify a subset of records that match the prospective record. If the count of the subset exceeds a threshold, the system identifies another branch sequence in the first trie or the second trie as another key for the prospective record, based on the first dispersion measure and the second dispersion measure. The system uses the key and the other key to identify at least one record that matches the prospective record.

    OPTIMIZED MATCH KEYS FOR FIELDS WITH PREFIX STRUCTURE

    Publication number: US20180165294A1

    Publication date: 2018-06-14

    Application number: US15374924

    Application date: 2016-12-09

    CPC classification number: G06F16/1727 G06F16/164 G06F16/9027

    Abstract: The system tokenizes the values stored in records' fields and creates a trie from the tokenized values, in which each branch is labeled with a tokenized value and each node stores a count indicating the number of records associated with the tokenized value sequence beginning from the trie root. The system tokenizes the value stored in a record's field and identifies nodes, beginning from the trie root, that correspond to the token value sequence associated with the tokenized value, until it identifies a node that stores a count less than a node threshold. The system identifies the branch sequence comprising each identified node as the record's key, associates the key with the node storing the count less than the node threshold, and associates the record with the key. The system tokenizes a prospective value stored in a prospective record's field and identifies nodes, beginning from the trie root, that correspond to another token value sequence associated with the tokenized prospective value, until it identifies another node that stores another count less than the node threshold. The system identifies the other node's key as the prospective record's key and uses that key to identify an existing record that matches the prospective record.
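
    A compact sketch of the count-bounded trie key follows. Whitespace tokenization, the class and function names, and the behavior when a value runs out of tokens or leaves the trie are assumptions made for the example.

```python
from collections import defaultdict


class TrieNode:
    def __init__(self):
        self.count = 0      # records whose token sequence passes through this node
        self.children = {}  # token -> TrieNode


def tokenize(value):
    """Assumed tokenizer: lowercase, split on whitespace."""
    return value.lower().split()


def build_trie(values):
    root = TrieNode()
    for value in values:
        node = root
        for token in tokenize(value):
            node = node.children.setdefault(token, TrieNode())
            node.count += 1
    return root


def match_key(root, value, node_threshold):
    """Walk from the root until a node's count drops below the threshold.

    Common prefixes therefore yield longer keys and rare prefixes shorter
    ones. The walk also stops if the tokens run out or leave the trie.
    """
    node, key = root, []
    for token in tokenize(value):
        child = node.children.get(token)
        if child is None:
            break
        key.append(token)
        node = child
        if node.count < node_threshold:
            break
    return tuple(key)


def build_index(root, values, node_threshold):
    """Associate each existing record value with its key."""
    index = defaultdict(list)
    for value in values:
        index[match_key(root, value, node_threshold)].append(value)
    return dict(index)
```

    With existing values such as `"acme tools"` and several `"bank of ..."` names and a node threshold of 2, a prospective value `"Acme Tools Ltd"` stops at the rare first token and gets the key `("acme",)`, while `"Bank of England PLC"` must descend to `("bank", "of", "england")` before the count falls below the threshold.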

    TRANSFORMING COLUMNS FROM SOURCE FILES TO TARGET FILES
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20170060919A1

    Publication date: 2017-03-02

    Application number: US14840547

    Application date: 2015-08-31

    Abstract: Transforming columns from source files to target files is described. A system associates a source column in a source file with an entity of multiple entities associated with target columns comprising a target file, based on a first set of features that describes contents of cells of a first source column that is adjacent to the source column, a second set of features that describes contents of cells of a second source column that is adjacent to the source column, and a third set of features that describes contents of cells of the source column. The system creates a mapping of the source column to a target column associated with the entity, and transforms the mapped source column to the target column in accord with the mapping.

    CLIENT-SERVER HYBRID AI SCORES FOR CUSTOMIZED ACTIONS

    Publication number: US20180198889A1

    Publication date: 2018-07-12

    Application number: US15400331

    Application date: 2017-01-06

    CPC classification number: G06N3/004 H04L67/02 H04L67/22 H04L67/306

    Abstract: Client-server hybrid A.I. scores for customized actions are described. A client generates client scores corresponding to client customized actions by applying a user-specific model to an action received from a user, the user-specific model based on at least one historical action received from the user. The client requests a server to provide server scores corresponding to server customized actions by applying a cross-user model to the action received from the user, the cross-user model based on historical actions associated with server users. The client generates hybrid scores corresponding to hybrid customized actions by combining the client scores with the server scores, in response to receiving the server scores from the server. The client causes the hybrid customized actions to be outputted based on the corresponding hybrid scores.
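
    The combining step could be as simple as the sketch below, which assumes both models have already produced a score per candidate action. The weighted average, the default weight, and the treatment of an action missing from one side as scoring zero are assumptions; the abstract says only that client and server scores are combined.

```python
def hybrid_scores(client_scores, server_scores, client_weight=0.5):
    """Combine per-action scores from the user-specific and cross-user models.

    Assumption: a weighted average, with a missing score treated as 0.0.
    """
    actions = set(client_scores) | set(server_scores)
    return {
        action: client_weight * client_scores.get(action, 0.0)
        + (1.0 - client_weight) * server_scores.get(action, 0.0)
        for action in actions
    }


def ranked_actions(scores):
    """Actions ordered by descending hybrid score, ties broken by name."""
    return sorted(scores, key=lambda action: (-scores[action], action))
```

    With client scores `{"reply": 0.8, "archive": 0.2}` and server scores `{"reply": 0.4, "forward": 0.6}`, equal weighting ranks `reply` first, then `forward`, then `archive`.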

    RULE SET INDUCTION
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20180157988A1

    Publication date: 2018-06-07

    Application number: US15368173

    Application date: 2016-12-02

    CPC classification number: G06N5/025

    Abstract: A system receives inputs, each associated with a label and having features; creates a rule for each feature, each rule including a feature and a label and each rule stored in a hierarchy; and distributes each rule into a rule partition associated with a label or another rule partition associated with another label. The system identifies a number of inputs that include a feature for a rule in the rule partition, and identifies another number of inputs that include both the feature for the rule and another feature for another rule in the rule partition. The system deletes the rule from the hierarchy if the ratio of the other number of inputs to the number of inputs satisfies a threshold and an additional number of inputs that include the other antecedent feature is at least as large as the number. The system predicts a label for an input that includes features by applying each remaining rule to the input.
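
    The pruning condition can be illustrated with the sketch below, which treats each rule as a (feature, label) pair and drops a rule when another rule in the same partition nearly always co-occurs with it and covers at least as many inputs. Several points are assumptions: counts are taken over inputs carrying the partition's label, rules are examined in sorted order against the rules still remaining, prediction is a majority vote, and the rule hierarchy is not modeled at all.

```python
from collections import Counter


def induce_rules(inputs, threshold=0.9):
    """inputs: list of (set_of_features, label). Returns kept (feature, label) rules."""
    partitions = {}  # label -> set of features that have a rule for that label
    for features, label in inputs:
        partitions.setdefault(label, set()).update(features)

    kept = set()
    for label, features in partitions.items():
        labelled = [fs for fs, lab in inputs if lab == label]

        def count(*needed):
            """Number of inputs in this partition containing all given features."""
            return sum(1 for fs in labelled if all(f in fs for f in needed))

        remaining = sorted(features)
        for feature in sorted(features):
            n_feature = count(feature)
            for other in remaining:
                if other == feature:
                    continue
                # Delete when `other` co-occurs often enough and is at least as common.
                if (count(feature, other) / n_feature >= threshold
                        and count(other) >= n_feature):
                    remaining.remove(feature)
                    break
        kept.update((feature, label) for feature in remaining)
    return kept


def predict(rules, features):
    """Assumed prediction: majority vote over the remaining rules that fire."""
    votes = Counter(label for feature, label in rules if feature in features)
    return votes.most_common(1)[0][0] if votes else None
```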
