Abstract:
A similar-document search method comprises four steps. First, characteristic word candidates are extracted from a seed document that contains the desired retrieval content. Second, when an extracted candidate is a compound characteristic word made up of a plurality of characteristic words, both the compound word and its constituent characteristic words are extracted as characteristic words of the seed document. Third, the similarity between the seed document and a registered document is calculated from the extracted characteristic words. Fourth, the calculated similarity is output as the retrieval result.
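The compound-word step can be illustrated with a minimal sketch. It assumes that a compound candidate is a space-separated phrase and that similarity is the cosine of word-count vectors; neither detail is stated in the abstract, and the names `expand_compounds` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def expand_compounds(candidates, separator=" "):
    """Keep every candidate and, for compounds, also add each constituent word.

    Assumption: a compound is any candidate containing the separator.
    """
    words = []
    for candidate in candidates:
        words.append(candidate)
        parts = candidate.split(separator)
        if len(parts) > 1:
            words.extend(parts)
    return words


def cosine_similarity(words_a, words_b):
    """Cosine of the two word-count vectors (an assumed similarity measure)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Characteristic words of the seed document and of one registered document.
seed_words = expand_compounds(["document retrieval", "index"])
registered_words = expand_compounds(["retrieval", "index", "database"])
score = cosine_similarity(seed_words, registered_words)
```

Because the constituents "document" and "retrieval" are kept alongside the compound, the seed document still matches a registered document that mentions only "retrieval".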
Abstract:
In text mining, if the system extracts only the characteristic words and phrases that frequently co-occur with each component of an analysis axis (the analysis condition), similar words and phrases tend to be extracted for every component. To make clear which characteristic words and phrases do not appear as co-occurring words and phrases for the other components of the analysis axis, it is desirable to present the user with the features that distinguish the components. For this purpose, the frequency of appearance of a plurality of characteristic words and phrases in the documents satisfying each analysis condition is calculated. As a result, multiple-cooccurrence words and phrases and component-cooccurrence words and phrases are displayed so that they can be told apart. The user can therefore appropriately analyze the contents of a plurality of documents.
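One way to picture the distinction is the sketch below. It assumes each document is a list of words, that a document satisfies an analysis condition when it contains the component word, and that a word counts as co-occurring once it appears in at least `min_count` such documents. These rules and the function names are illustrative assumptions, not details from the abstract.

```python
from collections import Counter


def cooccurring_words(docs, component, min_count=2):
    """Words appearing in at least min_count documents that contain the component."""
    counts = Counter()
    for words in docs:
        if component in words:
            counts.update(set(words) - {component})
    return {w for w, c in counts.items() if c >= min_count}


def split_by_specificity(docs, components, min_count=2):
    """Separate words shared by several components from component-specific ones."""
    per_component = {c: cooccurring_words(docs, c, min_count) for c in components}
    owners = Counter(w for words in per_component.values() for w in words)
    shared = {w for w, n in owners.items() if n > 1}
    specific = {c: words - shared for c, words in per_component.items()}
    return shared, specific


docs = [
    ["printer", "noise", "jam"],
    ["printer", "jam", "noise"],
    ["scanner", "noise", "blur"],
    ["scanner", "blur", "noise"],
]
shared, specific = split_by_specificity(docs, ["printer", "scanner"])
```

Here "noise" co-occurs with both components, while "jam" and "blur" each distinguish a single component, which is the kind of contrast the abstract wants shown to the user.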
Abstract:
A similar document retrieval method and system retrieve similar documents with high accuracy from a document database storing documents written in different languages, while suppressing retrieval noise even when the number of registered documents differs from one language to another. Statistical information about the documents to be registered is collected on a language-by-language basis at registration. When documents similar to a query document are retrieved, the weights of the words extracted from the query document are taken into account on a language-by-language basis by referencing the statistical information.
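A minimal sketch of per-language statistics follows. It assumes the statistical information is a document count and per-word document frequency for each language, and that word weights are a smoothed inverse document frequency; the abstract names neither, and the class `LanguageStats` is hypothetical.

```python
import math
from collections import defaultdict


class LanguageStats:
    """Collects document statistics separately for each language."""

    def __init__(self):
        # language -> number of registered documents
        self.doc_count = defaultdict(int)
        # language -> word -> number of documents containing the word
        self.doc_freq = defaultdict(lambda: defaultdict(int))

    def register(self, language, words):
        """Update the statistics of one language when a document is registered."""
        self.doc_count[language] += 1
        for word in set(words):
            self.doc_freq[language][word] += 1

    def weight(self, language, word):
        """Smoothed IDF computed only from documents in the same language."""
        n = self.doc_count[language]
        df = self.doc_freq[language].get(word, 0)
        return math.log((n + 1) / (df + 1)) + 1.0


stats = LanguageStats()
stats.register("en", ["search", "patent"])
stats.register("en", ["search", "index"])
stats.register("en", ["search", "query"])
stats.register("de", ["suche", "patent"])
```

Since each weight is normalized by the document count of its own language, a language with few registered documents is not penalized relative to one with many.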
Abstract:
In document retrieval with a relevance feedback function, which modifies a search profile based on the user's evaluation of a search result as pertinent or impertinent, relevance feedback can be restarted from a desired earlier point in time. Each evaluation input by the user, the search profile modified by that evaluation, and the search result based on that search profile are all saved in association with one another. When restoration of a search profile is requested, the search profile corresponding to the evaluation designated by the user is restored.
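The save-and-restore behavior can be sketched as below. The sketch assumes a search profile is a word-to-weight dictionary and that the user designates an evaluation by its step number; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeedbackStep:
    evaluation: dict  # document id -> True (pertinent) / False (impertinent)
    profile: dict     # word -> weight after applying the evaluation
    result: tuple     # document ids returned for that profile


class FeedbackHistory:
    """Keeps each evaluation, profile and result together as one step."""

    def __init__(self):
        self._steps = []

    def save(self, evaluation, profile, result):
        """Store copies of the three items and return the step number."""
        self._steps.append(
            FeedbackStep(dict(evaluation), dict(profile), tuple(result))
        )
        return len(self._steps) - 1

    def restore(self, step_id):
        """Return a copy of the profile saved with the designated evaluation."""
        return dict(self._steps[step_id].profile)


history = FeedbackHistory()
first = history.save({"d1": True}, {"search": 1.0}, ["d1", "d2"])
second = history.save({"d2": False}, {"search": 1.0, "index": 0.5}, ["d1", "d3"])
restored = history.restore(first)
```

Returning copies means later feedback on the restored profile does not alter the saved history, so the user can return to any step again.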
Abstract:
A word boundary identification operation such as morphological analysis is performed on documents to be registered, and the start and end positions of words are identified. Word boundary information is obtained from these identification results. Search indexes are created for substrings of a predetermined length (n-grams) extracted from the document being registered. Each search index entry includes document identification information, occurrence position information indicating that the string is located at the n-th position from the beginning of the text data, and word boundary information for the n-gram in the document.
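An index of this shape can be sketched as follows. Whitespace splitting stands in for morphological analysis, positions are 0-based character offsets, and the word boundary information is reduced to two flags (whether the n-gram starts at a word start and whether it ends at a word end). All of these are simplifying assumptions.

```python
from collections import defaultdict


def word_boundaries(text):
    """Stand-in for morphological analysis: split on spaces.

    Returns the sets of character offsets where words start and end.
    """
    starts, ends = set(), set()
    pos = 0
    for word in text.split(" "):
        if word:
            starts.add(pos)
            ends.add(pos + len(word) - 1)
        pos += len(word) + 1
    return starts, ends


def build_index(docs, n=2):
    """Map each n-gram to (doc id, position, starts word, ends word) entries."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        starts, ends = word_boundaries(text)
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            index[gram].append((doc_id, i, i in starts, (i + n - 1) in ends))
    return index


index = build_index({1: "cat scat"})
```

With the boundary flags stored in the index, a search for "cat" as a whole word can reject the occurrence inside "scat" without re-reading the document text.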
Abstract:
A text mining method allows documents (texts) to be analyzed from a wide variety of viewpoints. The text mining method includes: a distinctive word and/or phrase extraction step of extracting words and/or phrases that characteristically appear in a processing subject document set obtained by taking out all or part of a set of documents registered beforehand; a definition information setting step of setting definition information including a specified word or phrase or specified bibliographic information; a coincident word and/or phrase acquisition step of acquiring, from among the words and/or phrases extracted in the distinctive word and/or phrase extraction step, coincident words and/or phrases that are coincident within a predetermined range with a word, phrase, or bibliographic information included in the definition information; and a multiplex coincident word and/or phrase acquisition step of acquiring coincident words and/or phrases that are coincident within a predetermined range with an individual word, phrase, or bibliographic information acquired from each of a plurality of different pieces of definition information.
Abstract:
Retrieval conditions input by a plurality of users are registered. An input text is searched according to each retrieval condition. From the retrieval result, the similarity of the text is calculated for each retrieval condition. The text is delivered to each user whose retrieval condition the similarity satisfies.
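A minimal sketch of this delivery scheme is shown below. It assumes each retrieval condition is a set of terms with a per-user threshold, and that similarity is the fraction of the condition's terms found in the text; the abstract does not define either, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCondition:
    user: str
    terms: frozenset
    threshold: float  # minimum similarity required for delivery (assumed)


def similarity(condition_terms, text_words):
    """Fraction of the condition's terms present in the text (illustrative)."""
    if not condition_terms:
        return 0.0
    return len(condition_terms & text_words) / len(condition_terms)


def recipients(conditions, text):
    """Return the users whose registered condition the text satisfies."""
    words = set(text.lower().split())
    return [
        c.user for c in conditions if similarity(c.terms, words) >= c.threshold
    ]


conditions = [
    RetrievalCondition("alice", frozenset({"patent", "search"}), 0.5),
    RetrievalCondition("bob", frozenset({"football"}), 0.5),
]
delivered_to = recipients(conditions, "New patent filed today")
```

Each incoming text is scored once per registered condition, so the same text can reach several users or none.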
Abstract:
A document retrieval method using a computer program includes retrieving a first set of documents using a first query expression generated by the computer program. The first set of documents is provided to a user. An evaluation of the first set of documents is received from the user. The first query expression is changed to a second query expression generated by the computer program based on the evaluation.
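The query-changing step could look like the sketch below. The abstract does not say how the second query expression is generated; a Rocchio-style weight update is swapped in here as an assumption, with a query modeled as a word-to-weight dictionary and hypothetical parameters `alpha`, `beta`, and `gamma`.

```python
from collections import defaultdict


def update_query(query, relevant_docs, irrelevant_docs,
                 alpha=1.0, beta=0.5, gamma=0.25):
    """Rocchio-style update: move the query toward relevant documents
    and away from irrelevant ones. Each document is a list of words."""
    new_query = defaultdict(float)
    for word, weight in query.items():
        new_query[word] += alpha * weight
    for doc in relevant_docs:
        for word in set(doc):
            new_query[word] += beta / len(relevant_docs)
    for doc in irrelevant_docs:
        for word in set(doc):
            new_query[word] -= gamma / len(irrelevant_docs)
    # Words whose weight is not positive are dropped from the new query.
    return {w: v for w, v in new_query.items() if v > 0}


first_query = {"search": 1.0}
second_query = update_query(
    first_query,
    relevant_docs=[["search", "index"]],
    irrelevant_docs=[["football"]],
)
```

In this example the second query gains "index" from the document the user judged relevant and never picks up "football" from the one judged irrelevant.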