Abstract:
Techniques for computing a solution to a query formulated against a knowledge base (KB) are provided. The techniques include receiving a query formulated against a knowledge base, wherein the knowledge base comprises a set of one or more axioms, wherein each axiom is annotated with a specific probability value indicating a degree of certainty assigned thereto, ignoring each probability value of the one or more axioms and computing a solution to the query, computing each of one or more justifications for the query solution, wherein computing each of one or more justifications for the query solution comprises determining a minimal set of one or more axioms in the knowledge base that entail the query solution, and using each probability value of the one or more axioms in each justification to compute a net probability of an inferred query solution.
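The abstract does not say how the net probability is derived from the justifications. As a rough illustration only, the sketch below assumes that axioms are statistically independent and that the query solution holds whenever at least one justification holds in full, then combines the justifications by inclusion-exclusion. The function name `net_probability` and the axiom labels are hypothetical.

```python
from itertools import combinations
from math import prod


def net_probability(justifications, axiom_prob):
    """Probability that at least one justification holds.

    Assumption (not stated in the abstract): axioms are independent,
    so the probability of a set of axioms is the product of their
    individual probabilities.
    """
    total = 0.0
    n = len(justifications)
    for k in range(1, n + 1):
        # Inclusion-exclusion: add odd-sized intersections, subtract even-sized ones.
        sign = 1 if k % 2 == 1 else -1
        for combo in combinations(justifications, k):
            # All justifications in the combo hold iff every axiom in their union holds.
            axioms = frozenset().union(*combo)
            total += sign * prod(axiom_prob[a] for a in axioms)
    return total


# Hypothetical axiom labels and probabilities, for illustration only.
axiom_prob = {"a1": 0.9, "a2": 0.8, "a3": 0.5}
justifications = [frozenset({"a1", "a2"}), frozenset({"a3"})]
result = net_probability(justifications, axiom_prob)
```

Inclusion-exclusion is exponential in the number of justifications, so this form is only practical when a query solution has a handful of them.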
Abstract:
Aspects of the present invention provide a tool for extracting schema from a spreadsheet. In an embodiment, a set of data that is stored in an uncataloged tabular format, such as a spreadsheet, is retrieved. The structure of the retrieved set of data is surveyed to determine the dataset schema thereof. Then, data elements within the dataset schema are analyzed to obtain information regarding the data elements. Based on the dataset schema and the element information, an interface can be constructed that allows remote access to the set of data.
Abstract:
Methods, apparatus and systems, including computer program products, for reducing an error rate when mapping entities between a first ontology and a second ontology. One or more of a general language dictionary and an industry-specific dictionary are provided. Natural language processing of the first ontology is performed to identify one or more candidate relationship entities in the first ontology. Each candidate relationship entity includes a compound name having two or more semantic labels, and each candidate relationship entity has a name that exists in neither the general language dictionary nor the industry-specific dictionary. Each of the one or more candidate relationship entities in the first ontology is mapped to one or more entities in the second ontology using one or more configurable computer-implemented mapping algorithms.
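The candidate test described above (a compound name with two or more semantic labels that appears in neither dictionary) might be sketched as follows. The tokenization rule (splitting on underscores, hyphens, whitespace, and camelCase boundaries) and the names `split_labels` and `is_candidate` are assumptions made for illustration; the abstract does not specify how labels are extracted.

```python
import re


def split_labels(name):
    """Split an entity name into lowercase labels.

    Assumed rule: break on underscores, hyphens, whitespace,
    and camelCase boundaries.
    """
    labels = []
    for part in re.split(r"[_\-\s]+", name):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        labels.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [label.lower() for label in labels]


def is_candidate(name, general_dict, industry_dict):
    """True if the name is compound and appears in neither dictionary."""
    labels = split_labels(name)
    known = name.lower() in general_dict or name.lower() in industry_dict
    return len(labels) >= 2 and not known


# Hypothetical dictionaries, for illustration only.
general_dict = {"customer", "has", "part", "number"}
industry_dict = {"sku"}

compound = is_candidate("hasPartNumber", general_dict, industry_dict)
simple = is_candidate("customer", general_dict, industry_dict)
```

Names that pass this filter would then be handed to whichever mapping algorithms are configured; that step is not shown here.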
Abstract:
Methods and systems for determining schema element types are shown that include pooling potential annotations for an element of an unlabeled schema from a plurality of heterogeneous sources, scoring the pool of potential annotations according to relevancy using instance information from the plurality of heterogeneous sources to produce a relevancy score, and annotating the element of the unlabeled schema using the most relevant potential annotations.
Abstract:
A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data.