-
Publication No.: US11531717B2
Publication date: 2022-12-20
Application No.: US16794895
Filing date: 2020-02-19
IPC classification: G06F16/9535, G06F16/25, G06F16/27, G06F16/2457
Abstract: Data records are linked across a plurality of datasets. Each dataset contains at least one data record, and each data record is associated with an entity and includes one or more attributes of that entity and a value for each attribute. Values associated with attributes are compared across datasets, and matching attributes having values that satisfy a predetermined similarity threshold are identified. In addition, linkage points between pairs of datasets are identified. Each linkage point links one or more pairs of data records. Each data record in the pair of data records is contained in one of a given pair of datasets, and each pair of data records is associated with a common entity having matching attributes in the given pair of datasets. Data records associated with the common entities are linked across datasets using the identified linkage points.
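The linkage idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the Jaccard measure, the exact-match linking rule, the function names (`jaccard`, `matching_attributes`, `link_records`) and the sample data are all assumptions chosen for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two collections of values (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def matching_attributes(ds1, ds2, threshold):
    """Return attribute pairs whose value sets meet the similarity threshold."""
    attrs1 = {k for rec in ds1 for k in rec}
    attrs2 = {k for rec in ds2 for k in rec}
    pairs = []
    for a1, a2 in product(sorted(attrs1), sorted(attrs2)):
        v1 = [rec[a1] for rec in ds1 if a1 in rec]
        v2 = [rec[a2] for rec in ds2 if a2 in rec]
        if jaccard(v1, v2) >= threshold:
            pairs.append((a1, a2))
    return pairs


def link_records(ds1, ds2, threshold=0.5):
    """Link record index pairs that share a value on a matching attribute pair."""
    links = set()
    for a1, a2 in matching_attributes(ds1, ds2, threshold):
        for i, r1 in enumerate(ds1):
            for j, r2 in enumerate(ds2):
                if a1 in r1 and a2 in r2 and r1[a1] == r2[a2]:
                    links.add((i, j))
    return sorted(links)


# Hypothetical sample datasets.
companies = [{"name": "Acme", "ticker": "ACM"}, {"name": "Globex", "ticker": "GBX"}]
filings = [{"symbol": "GBX", "year": 2020}, {"symbol": "ACM", "year": 2021}]

print(matching_attributes(companies, filings, 0.5))  # [('ticker', 'symbol')]
print(link_records(companies, filings))              # [(0, 1), (1, 0)]
```

Here the attribute pair (`ticker`, `symbol`) plays the role of a linkage point between the two datasets, and the returned index pairs are the records linked through it.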
-
Publication No.: US11227002B2
Publication date: 2022-01-18
Application No.: US14954664
Filing date: 2015-11-30
IPC classification: G06F16/35, G06F16/215
Abstract: An apparatus and method of identifying semantically related records, including receiving input data from an input device, splitting the input data into a plurality of clusters according to semantic relationship, each of the clusters including a plurality of source terms and a plurality of target terms, transforming each of the plurality of clusters based on the transformation which includes tokenization of the plurality of clusters, for each of the plurality of clusters that are transformed, finding relatedness scores of a plurality of semantic relatedness measures with the plurality of target terms, building a vector of similarity scores for each of the plurality of target terms, and for each of the plurality of source terms, selecting a predetermined number of the plurality of target terms according to the similarity scores.
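The scoring and selection steps can be sketched roughly as follows. The two relatedness measures (token Jaccard and a character-level ratio from the standard library), the averaging of the score vector, and all names are assumptions for illustration; the abstract does not specify which measures are used or how they are combined.

```python
from difflib import SequenceMatcher


def tokenize(term):
    """Lowercase a term and split it into tokens."""
    return term.lower().replace("-", " ").split()


def token_jaccard(a, b):
    """Relatedness measure 1 (assumed): overlap of token sets."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def char_ratio(a, b):
    """Relatedness measure 2 (assumed): character-level similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


MEASURES = (token_jaccard, char_ratio)


def score_vector(source, target):
    """Vector of similarity scores, one entry per relatedness measure."""
    return [measure(source, target) for measure in MEASURES]


def top_k_targets(source, targets, k):
    """Select the k targets with the highest mean score (averaging is an assumption)."""
    scored = [(sum(score_vector(source, t)) / len(MEASURES), t) for t in targets]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical terms from one cluster.
targets = ["heart attack acute", "kidney stone", "attack of the heart"]
print(top_k_targets("heart attack", targets, 1))
```

A real system would likely weight or learn the combination of measures rather than average them; the mean is used here only to keep the sketch short.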
-
Publication No.: US11074266B2
Publication date: 2021-07-27
Application No.: US16157304
Filing date: 2018-10-11
IPC classification: G06F17/27, G06F16/2458, G06K9/46, G06F16/2452, G06F16/2457, G06F40/30, G06F40/279
Abstract: A concept discovery method, system, and computer program product include preparing a concept index for concepts built over a set of input data having input terms, building a vector representation of the concepts in the input data, receiving a set of query terms as an additional input, mapping the set of query terms to the concepts in the concept index, calculating at least one of a co-occurrence score for each of the concepts in the concept index by measuring their frequency of co-occurrence with the input terms' concepts and a similarity score for each of the concepts in the concept index by measuring the similarity of their vector representations according to a vector similarity measure, and ranking the concepts with respect to their relevance to the input terms by the at least one of the co-occurrence score and the similarity score.
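The two scores named in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the patented system: cosine similarity stands in for the unspecified vector similarity measure, co-occurrence is counted per document, and the names and toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cooccurrence_scores(documents, query_concepts):
    """Count documents in which a concept appears together with a query concept."""
    query = set(query_concepts)
    counts = Counter()
    for doc in documents:
        concepts = set(doc)
        if concepts & query:
            counts.update(concepts - query)
    return counts


def rank_by_similarity(vectors, query_concepts):
    """Rank non-query concepts by their best cosine similarity to any query concept."""
    query = set(query_concepts)
    scores = {
        concept: max(cosine(vec, vectors[q]) for q in query)
        for concept, vec in vectors.items()
        if concept not in query
    }
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical concept index: documents as concept lists, plus toy 2-d vectors.
documents = [["python", "pandas"], ["python", "numpy"],
             ["java", "spring"], ["python", "pandas", "numpy"]]
vectors = {"python": [1.0, 0.0], "pandas": [0.9, 0.1], "java": [0.0, 1.0]}

print(cooccurrence_scores(documents, ["python"]))
print(rank_by_similarity(vectors, ["python"]))  # ['pandas', 'java']
```

The sketch assumes every query concept has a vector in `vectors`; a fuller version would first map free-text query terms to concepts in the index.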
-
Publication No.: US10740304B2
Publication date: 2020-08-11
Application No.: US14467640
Filing date: 2014-08-25
Inventors: Achille Belly Fokoue-Nkoutche, Oktie Hassanzadeh, Anastasios Kementsietsidis, Kavitha Srinivas, Michael J. Ward
Abstract: Various embodiments virtualize data across heterogeneous formats. In one embodiment, a plurality of heterogeneous data sources is received as input. A local schema graph including a set of attribute nodes and a set of type nodes is generated for each of the plurality of heterogeneous data sources. A global schema graph is generated based on each local schema graph that has been generated. The global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs. The edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
-
Publication No.: US20200234102A1
Publication date: 2020-07-23
Application No.: US16840846
Filing date: 2020-04-06
Inventors: Nicolas R. Fauceglia, Alfio M. Gliozzo, Oktie Hassanzadeh, Thien H. Nguyen, Mariano Rodriguez Muro, Mohammad Sadoghi Hamedani
IPC classification: G06N3/04
Abstract: A system, method and computer program product for disambiguating one or more entity mentions in one or more documents. The method facilitates the simultaneous linking of entity mentions in a document based on convolutional neural networks and recurrent neural networks that model both the local and global features for entity linking. The framework uses the capacity of convolutional neural networks to induce the underlying representations for local contexts and the advantage of recurrent neural networks to adaptively compress variable length sequences of predictions for global constraints. The RNN functions to accumulate information about the previous entity mentions and/or target entities, and provide them as the global constraints for the linking process of a current entity mention.
-
Publication No.: US20200097861A1
Publication date: 2020-03-26
Application No.: US16141303
Filing date: 2018-09-25
Inventors: Nandana Sampath Mihindukulasooriya, Oktie Hassanzadeh, Alfio Massimiliano Gliozzo, Sarthak Dash
Abstract: Techniques regarding autonomous classification and/or identification of various types of noise comprised within a knowledge graph are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a knowledge extraction component, operatively coupled to the processor, that can classify a type of noise comprised within a knowledge graph. The type of noise can be generated by an information extraction process.
-
Publication No.: US20190258675A1
Publication date: 2019-08-22
Application No.: US16399535
Filing date: 2019-04-30
IPC classification: G06F16/901, G06F16/36
Abstract: A method, system, and recording medium for knowledge graph augmentation using data based on a statistical analysis of attributes in the data, including a ranking device configured to rank semantically similar input data elements to create a ranked list of attributes to augment an input of structured data and populate with a data string corresponding to the instances, where the ranking device further combines a set of filters to refine the ranked list of attributes, the set of filters including a first filter according to column ranges of columns, a second filter according to a column uniqueness of the columns, a third filter according to a type of data in a column of the columns, and a fourth filter according to a distribution of values in the columns.
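The four filters listed in this abstract might be combined along the following lines. Everything concrete here is an assumption: the abstract does not define "column ranges", so the sketch reads it as a limit on column position, and the thresholds, the entropy-based distribution check, and the function names are hypothetical.

```python
import math
from collections import Counter


def uniqueness(values):
    """Fraction of distinct values in a column."""
    return len(set(values)) / len(values) if values else 0.0


def dominant_type(values):
    """Name of the most common Python type among the values."""
    return Counter(type(v).__name__ for v in values).most_common(1)[0][0]


def normalized_entropy(values):
    """Shannon entropy of the value distribution, scaled to the range 0 to 1."""
    counts = Counter(values)
    if len(counts) < 2:
        return 0.0
    n = len(values)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))


def filter_columns(table, max_columns=10, min_unique=0.5,
                   allowed_types=("str",), min_entropy=0.5):
    """Refine a ranked mapping of column name -> values with four filters."""
    kept = []
    for index, (name, values) in enumerate(table.items()):
        if not values:
            continue
        if index >= max_columns:                        # filter 1: column range (assumed: position)
            continue
        if uniqueness(values) < min_unique:             # filter 2: column uniqueness
            continue
        if dominant_type(values) not in allowed_types:  # filter 3: type of data
            continue
        if normalized_entropy(values) < min_entropy:    # filter 4: value distribution
            continue
        kept.append(name)
    return kept


# Hypothetical table, assumed to be already in ranked order.
table = {
    "name": ["a", "b", "c", "d"],
    "flag": ["y", "y", "y", "y"],
    "id": [1, 2, 3, 4],
}
print(filter_columns(table))  # ['name']
```

In this toy table, `flag` fails the uniqueness filter and `id` fails the type filter, so only `name` survives as a candidate attribute.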
-
Publication No.: US11755885B2
Publication date: 2023-09-12
Application No.: US16840846
Filing date: 2020-04-06
Inventors: Nicolas R. Fauceglia, Alfio M. Gliozzo, Oktie Hassanzadeh, Thien H. Nguyen, Mariano Rodriguez Muro, Mohammad Sadoghi Hamedani
Abstract: A system, method and computer program product for disambiguating one or more entity mentions in one or more documents. The method facilitates the simultaneous linking of entity mentions in a document based on convolutional neural networks and recurrent neural networks that model both the local and global features for entity linking. The framework uses the capacity of convolutional neural networks to induce the underlying representations for local contexts and the advantage of recurrent neural networks to adaptively compress variable length sequences of predictions for global constraints. The RNN functions to accumulate information about the previous entity mentions and/or target entities, and provide them as the global constraints for the linking process of a current entity mention.
-
Publication No.: US11599826B2
Publication date: 2023-03-07
Application No.: US16741084
Filing date: 2020-01-13
Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz
Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
-
Publication No.: US10795937B2
Publication date: 2020-10-06
Application No.: US15230932
Filing date: 2016-08-08
IPC classification: G06F16/901, G06N20/00, G06N5/02
Abstract: Methods, systems, and computer program products for expressive temporal predictions over semantically-driven time windows are provided herein. A computer-implemented method includes identifying, within a knowledge graph pertaining to a given prediction, a subset of the knowledge graph related to one or more predicted training examples, wherein the subset comprises (i) a set of nodes and (ii) one or more relationships among the set of nodes; determining, for the identified subset, one or more snapshots of the knowledge graph relevant to the given prediction; quantifying a validity window for the one or more predicted training examples, wherein the validity window comprises a temporal bound for prediction validity; and computing a validity window for the given prediction based on the quantified validity window for the one or more predicted training examples.