摘要:
An information extraction model is trained on format features identified within labeled training documents. Information from a document is extracted by assigning labels to units based on format features of the units within the document. A begin label and end label are identified and the information is extracted between the begin label and the end label. The extracted information can be used in various document processing tasks such as ranking.
摘要:
A typed separable mixture model is used to mine associative relationships between sets of objects. Instead of modeling only one type of co-occurrence among the sets of objects, the typed separable mixture model can model multiple different types of co-occurrences among more than two sets of objects, and co-occurrences that exist in different contexts.
摘要:
The present invention relates to extracting information from an information source. During extraction, strings in the information source are accessed. These strings in the information source are matched with generalized extraction patterns that include words and wildcards. The wildcards denote that at least one word in an individual string can be skipped in order to match the individual string to an individual generalized extraction pattern.
摘要:
A method of finding documents. A method of finding documents comprising, ranking documents according to relevance to form a ranked relevance list, ranking documents according to type to form a ranked type list, and combining the ranked relevance list and the ranked type list to form a list of documents ranked by relevance and type.
摘要:
A query and a factoid type selection are received from a user. An index of passages, indexed based on factoids, is accessed and passages that are related to the query, and that have the selected factoid type, are retrieved. The retrieved passages are ranked and provided to the user based on a calculated score, in rank order.
摘要:
A method of processing information is provided. The method includes collecting text strings of definition candidates from a data source. The definition candidates are ranked based on the text strings.
摘要:
A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.
摘要:
A computer-implemented method is provided that includes receiving a document and determining a file type for the document. In addition, the document is segmented into blocks of text as a function of the file type and at least one keyword and a summary is generated for the document.
摘要:
A two stage model identifies individuals having knowledge in a subject matter area relevant to a query. A relevance model receives a query and identifies documents, or other information, relevant to the query. A co-occurrence model identifies individuals, in the retrieved documents, related to the subject matter of the query. Individuals identified can be scored by combining scores from the relevance model and the co-occurrence model and output in a rank ordered list.
摘要:
A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.