摘要:
A method for refining a dictionary for information extraction, the operations including: inputting a set of extracted results from execution of an extractor comprising the dictionary on a collection of text, wherein the extracted results are labeled as correct results or incorrect results; processing the extracted results using an algorithm configured to set a score of the extractor above a score threshold, wherein the score threshold balances a precision and a recall of the extractor; and outputting a set of candidate dictionary entries corresponding to a full set of dictionary entries, wherein the candidate dictionary entries are candidates to be removed from the dictionary based on the extracted results.
摘要:
Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components.
摘要:
Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule; a category ranking rule, and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed.
摘要:
The present invention relates to a methodology to translate exact interpretations of keyword queries into meaningful and grammatically correct plain-language queries in order to convey the meaning of these interpretations to the initiator of the search. The method includes the steps of generating at least one grammatically valid plain-language sentence interpretation for a keyword query from a generated sentence plain-language sentence clauses, wherein the grammatically valid plain-language sentence is based upon differing matching elements, and presenting at least one grammatically valid plain-language sentence interpretation for the keyword query to a keyword query system user for the user's review.
摘要:
A text annotation structured storage system stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The system exploits type information in a type system to develop corresponding schemas in a structured data model. The system comprises techniques for mapping annotations to an XML data model and a relational data model. The system captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.
摘要:
A text annotation structured storage method stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The method exploits type information in a type system to develop corresponding schemas in a structured data model. The method comprises techniques for mapping annotations to an XML data model and a relational data model. The method captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.
摘要:
A text annotation structured storage system stores text annotations with associated type information in a structured data store. The present system persists or stores annotations in a structured data store in an indexable and queryable format. Exemplary structured data stores comprise XML databases and relational databases. The system exploits type information in a type system to develop corresponding schemas in a structured data model. The system comprises techniques for mapping annotations to an XML data model and a relational data model. The system captures various features of the type system, such as complex types and inheritance, in the schema for the persistent store. In particular, the repository provides support for path navigation over the hierarchical type system starting at any type.
摘要:
A query interpretation system exploits semantic annotations in keyword queries over a collection of text documents, casting semantic annotations produced by text analysis engines into a formal annotation type system. The system uses the annotation type system to enumerate various interpretations of a keyword query and automatically translate a keyword query into a set of interpretations expressed in some intermediate query language. The system returns a result list of documents by combining the results of executing one or more of these interpretations. Even though the system generates and uses a complex type system, a user is able to use simple keyword queries to locate documents.
摘要:
The present invention relates to a methodology to translate exact interpretations of keyword queries into meaningful and grammatically correct plain-language queries in order to convey the meaning of these interpretations to the initiator of the search. The method includes the steps of generating at least one grammatically valid plain-language sentence interpretation for a keyword query form a generated sentence is based upon differing matching elements, and presenting at least one grammatically valid plain-language sentence interpretation for the keyword query to a keyword query system user for the user's review.
摘要:
Disclosed are a system, method, and program storage device implementing the method of extracting information, wherein the method comprises inputting a query; searching a database of documents based on the query; retrieving documents from the database matching the query using a plurality of classifiers arranged in a hierarchical cascade of classifier layers, wherein each classifier comprises a set of weighted training data points comprising feature vectors representing any portion of a document; and weighing an output from the cascade according to a rate of success of query terms being matched by each layer of the cascade, wherein the weighing is performed using a terminal classifier.