-
Publication No.: US20220398241A1
Publication date: 2022-12-15
Application No.: US17342895
Filing date: 2021-06-09
Inventors: Abhishek Seth; Soma Shekar Naganna; James Albert O'Neill, JR.; Geetha Sravanthi Pulipaty; Neeraj Ramkrishna Singh
Abstract: A method that includes receiving an additional dataset including a plurality of additional data records; determining a record type using classifiers and an internal domain knowledge corpus; dividing the plurality of additional data records into a plurality of indexing groups; assigning the given additional data record to a match set based on completeness and similarity of natures of attributes of the given additional data record; and assigning the given additional data record to a comparison group based on completeness and similarity of natures of attributes of the given additional data record.
-
Publication No.: US20230029643A1
Publication date: 2023-02-02
Application No.: US17443341
Filing date: 2021-07-26
Inventors: Neeraj Ramkrishna Singh; James Albert O'Neill, JR.; Soma Shekar Naganna; Geetha Sravanthi Pulipaty; Abhishek Seth
IPC classification: G06F16/22
Abstract: A method, a structure, and a computer system for mapping data fields. The exemplary embodiments may include, based on determining that a first data set and a second data set contain homogeneous data, mapping at least one column of the first data set to at least one column of the second data set based on comparing at least one of relative column position and unique value sets. Based on determining that the first data set and the second data set contain heterogeneous data, the exemplary embodiments may include mapping the at least one column of the first data set to the at least one column of the second data set based on a difference between distribution signatures of unique value sets within each of the first data set and the second data set being less than a threshold.
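The heterogeneous case can be illustrated with a small sketch. The abstract does not define a "distribution signature", so the code below assumes one: the relative frequencies of a column's most common values, sorted in descending order and zero-padded. The function names, the L1 distance, and the 0.2 threshold are hypothetical choices for illustration, not the patented method.

```python
from collections import Counter


def distribution_signature(values, size=5):
    """Assumed signature: top relative frequencies, descending, zero-padded."""
    counts = Counter(values)
    total = sum(counts.values())
    freqs = sorted((c / total for c in counts.values()), reverse=True)[:size]
    return freqs + [0.0] * (size - len(freqs))


def signature_distance(sig_a, sig_b):
    """L1 distance between two equal-length signatures (an assumption)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def map_columns(first, second, threshold=0.2):
    """Map each column of `first` to the closest column of `second`.

    Both arguments are dicts of column name -> list of values. A pair is
    mapped only when the signature difference is below `threshold`.
    """
    mapping = {}
    for name_a, col_a in first.items():
        sig_a = distribution_signature(col_a)
        best = None
        for name_b, col_b in second.items():
            dist = signature_distance(sig_a, distribution_signature(col_b))
            if dist < threshold and (best is None or dist < best[1]):
                best = (name_b, dist)
        if best is not None:
            mapping[name_a] = best[0]
    return mapping


first = {"gender": ["M", "F", "M", "F"], "state": ["NY", "NY", "NY", "CA"]}
second = {
    "sex": ["male", "female", "female", "male"],
    "region": ["north", "north", "south", "north"],
}
print(map_columns(first, second))
```

The two data sets share no literal values, yet the columns still pair up because the shape of each value distribution is compared rather than the values themselves.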
-
Publication No.: US20210110295A1
Publication date: 2021-04-15
Application No.: US16599427
Filing date: 2019-10-11
Inventors: Martin Oberhofer; Sergio Luis Olvera Gutierrez; Soma Shekar Naganna; Abhishek Seth; James Albert O'Neill, JR.
Abstract: A method for relating different types of records. The method may include providing comparison functions, wherein each comparison function corresponds to a semantical class, and wherein a computational cost is associated with each comparison function. The method may include determining one or more attribute pairs between the different types of records. The method may include sorting the comparison functions according to a determined accuracy. The method may include selecting a set of comparison functions associated with semantical classes according to a predefined rule. The method may include determining a total computational cost based on the computational cost of the selected set of comparison functions. The method may include determining whether two or more records are related using the selected set of comparison functions. The method may include relating the two or more records. The method may include determining a rate of false negative records.
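A minimal sketch of the selection step follows. The abstract leaves the "predefined rule" open, so the rule used here is an assumption: walk the functions from most to least accurate, keep at most one per semantical class, and skip any function that would push the total cost over a budget. `ComparisonFunction`, `select_functions`, and `records_related` are hypothetical names, and using the semantical class as the attribute key is a simplification of the attribute-pair step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ComparisonFunction:
    name: str
    semantic_class: str          # e.g. "name" or "phone"
    cost: float                  # computational cost per comparison
    accuracy: float              # previously determined accuracy
    compare: Callable[[str, str], bool]


def select_functions(functions, budget):
    """Assumed rule: best function per class, greedily, within the budget."""
    selected, total_cost, seen = [], 0.0, set()
    for fn in sorted(functions, key=lambda f: f.accuracy, reverse=True):
        if fn.semantic_class in seen or total_cost + fn.cost > budget:
            continue
        selected.append(fn)
        seen.add(fn.semantic_class)
        total_cost += fn.cost
    return selected, total_cost


def records_related(rec_a, rec_b, selected):
    """Assumed decision: related only if every selected function agrees."""
    return all(
        fn.compare(rec_a[fn.semantic_class], rec_b[fn.semantic_class])
        for fn in selected
    )


functions = [
    ComparisonFunction("exact_name", "name", 1.0, 0.80, lambda a, b: a == b),
    ComparisonFunction("fuzzy_name", "name", 5.0, 0.95,
                       lambda a, b: a.lower() == b.lower()),
    ComparisonFunction("exact_phone", "phone", 1.0, 0.90, lambda a, b: a == b),
]
selected, total_cost = select_functions(functions, budget=3.0)
print([fn.name for fn in selected], total_cost)
```

With a budget of 3.0 the most accurate function is skipped because it alone costs 5.0, which shows how cost and accuracy trade off under this particular rule.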
-
Publication No.: US20220164396A1
Publication date: 2022-05-26
Application No.: US17105425
Filing date: 2020-11-25
IPC classification: G06F16/903, G06F16/908, G06F16/901
Abstract: A method, apparatus, computer system, and computer program product for managing information. A set of bucket hashes and comparison information for a data record are identified by a computer system. The set of bucket hashes is generated from the comparison information, wherein the set of bucket hashes and the comparison information form a metadata record. A number of candidate metadata records in a metadata database is identified by the computer system using the set of bucket hashes, wherein the number of candidate metadata records comprises a set of candidate bucket hashes and candidate comparison information. An entity membership is identified by the computer system for the data record from a comparison of the comparison information in the metadata record with the candidate comparison information in the number of candidate metadata records.
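The flow in this abstract (hash, look up candidates, compare) can be sketched as below. Everything concrete is an assumption made for illustration: comparison information is modeled as a dict of string attributes, one bucket hash is derived per non-empty attribute using SHA-256, the metadata database is an in-memory index, and membership is decided by the fraction of matching attributes. `MetadataStore` and `resolve_entity` are hypothetical names.

```python
import hashlib


def _norm(value):
    return value.strip().lower()


def bucket_hashes(comparison_info):
    """Assumed bucketing rule: one short hash per non-empty attribute."""
    hashes = set()
    for key, value in comparison_info.items():
        if value:
            token = f"{key}={_norm(value)}".encode("utf-8")
            hashes.add(hashlib.sha256(token).hexdigest()[:16])
    return hashes


class MetadataStore:
    """In-memory stand-in for the metadata database."""

    def __init__(self):
        self._records = []      # (entity_id, bucket hashes, comparison info)
        self._by_bucket = {}    # bucket hash -> set of record indexes

    def add(self, entity_id, comparison_info):
        hashes = bucket_hashes(comparison_info)
        index = len(self._records)
        self._records.append((entity_id, hashes, comparison_info))
        for h in hashes:
            self._by_bucket.setdefault(h, set()).add(index)

    def candidates(self, hashes):
        indexes = set()
        for h in hashes:
            indexes |= self._by_bucket.get(h, set())
        return [self._records[i] for i in sorted(indexes)]


def similarity(info_a, info_b):
    """Fraction of attributes whose normalized values are equal and non-empty."""
    keys = set(info_a) | set(info_b)
    if not keys:
        return 0.0
    same = sum(
        1 for k in keys
        if info_a.get(k) and _norm(info_a[k]) == _norm(info_b.get(k, ""))
    )
    return same / len(keys)


def resolve_entity(store, comparison_info, threshold=0.5):
    """Return the best-matching entity id among candidates, or None."""
    best_id, best_score = None, 0.0
    for entity_id, _, info in store.candidates(bucket_hashes(comparison_info)):
        score = similarity(comparison_info, info)
        if score >= threshold and score > best_score:
            best_id, best_score = entity_id, score
    return best_id


store = MetadataStore()
store.add("e1", {"name": "Ann Lee", "phone": "555-0100"})
store.add("e2", {"name": "Bob Roy", "phone": "555-0199"})
print(resolve_entity(store, {"name": "ann lee ", "phone": "555-0100"}))
```

Only records that share at least one bucket hash are ever compared in detail, which is the point of hashing first: the expensive comparison runs on a short candidate list instead of the whole database.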
-
Publication No.: US20220350782A1
Publication date: 2022-11-03
Application No.: US17246767
Filing date: 2021-05-03
Inventors: Soma Shekar Naganna; James Albert O'Neill, JR.; Geetha Sravanthi Pulipaty; Abhishek Seth; Neeraj Ramkrishna Singh
Abstract: Configuring a data management system by receiving user interaction data associated with search results associated with a first system configuration, identifying a usage pattern in the user interaction data using a first machine learning model, and altering the first system configuration according to the usage pattern.
-
Publication No.: US20210319026A1
Publication date: 2021-10-14
Application No.: US16842813
Filing date: 2020-04-08
Inventors: Abhishek Seth; Soma Shekar Naganna; James Albert O'Neill, JR.; Lars Bremer; Mariya Chkalova
IPC classification: G06F16/2455, G06N5/04, G06N20/00
Abstract: Matching records in an entity resolution system by defining entity attribute feature vectors, determining an entity attribute matching score according to a distance between two entity attribute feature vectors, assigning a statistical weight to an entity attribute matching score, adjusting the entity attribute matching score according to the statistical weight and an entity attribute frequency of occurrence, and determining an aggregate entity attribute matching score.
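The scoring chain described here can be sketched under several assumptions, none of which come from the abstract: feature vectors are character-bigram counts, the distance is cosine distance, the frequency adjustment multiplies the score by (1 - frequency) so that common values count for less, and the aggregate is a weight-normalized sum. All function names and parameters are hypothetical.

```python
import math
from collections import Counter


def feature_vector(value):
    """Assumed feature vector: counts of character bigrams, space-padded."""
    text = f" {value.strip().lower()} "
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_distance(vec_a, vec_b):
    dot = sum(count * vec_b[key] for key, count in vec_a.items())
    norm = math.sqrt(sum(v * v for v in vec_a.values())) * math.sqrt(
        sum(v * v for v in vec_b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def attribute_score(value_a, value_b, weight, frequency):
    """Raw similarity, scaled by the weight and discounted for common values."""
    raw = 1.0 - cosine_distance(feature_vector(value_a), feature_vector(value_b))
    return raw * weight * (1.0 - frequency)


def aggregate_score(rec_a, rec_b, weights, frequencies):
    """Weight-normalized sum of adjusted attribute scores.

    `weights` maps attribute -> statistical weight; `frequencies` maps
    attribute -> relative frequency of the compared value (0.0 if unknown).
    """
    total_weight = sum(weights.values())
    if not total_weight:
        return 0.0
    total = sum(
        attribute_score(rec_a[attr], rec_b[attr], weight, frequencies.get(attr, 0.0))
        for attr, weight in weights.items()
    )
    return total / total_weight


weights = {"name": 2.0, "city": 1.0}
frequencies = {"name": 0.0, "city": 0.5}   # the city value is assumed common
same = aggregate_score(
    {"name": "Ann Lee", "city": "Austin"},
    {"name": "Ann Lee", "city": "Austin"},
    weights, frequencies,
)
different = aggregate_score(
    {"name": "Ann Lee", "city": "Austin"},
    {"name": "Zed Q", "city": "Austin"},
    weights, frequencies,
)
print(round(same, 3), round(different, 3))
```

Even two identical records score below 1.0 in this sketch, because the matching city is discounted for being a frequent value and so contributes less evidence than the matching name.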
-