Abstract:
Appearance information recording which individual elements (characteristic character strings) indicative of the characteristics of a registered document appear in that document is stored in advance. When the similarity of a registered document is calculated, a query designated by a searcher is analyzed and represented by a characteristic vector whose individual elements take the relations among a plurality of words into consideration. The appearance information of the individual words contained in the query is counted, and the counted appearance information is compared with a searching index to calculate the similarity between documents.
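As a rough illustration of this kind of similarity calculation, the sketch below treats single words and adjacent word pairs as the "individual elements" and uses cosine similarity. Both choices are assumptions made for the example; the abstract names neither the element definition nor the similarity measure, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def extract_elements(text):
    """Assumed element set: single words plus adjacent word pairs."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

def build_search_index(docs):
    """Store in advance how often each element appears in each registered document."""
    index = defaultdict(dict)   # element -> {doc_id: count}
    norms = {}                  # doc_id -> vector length
    for doc_id, text in docs.items():
        counts = Counter(extract_elements(text))
        for element, count in counts.items():
            index[element][doc_id] = count
        norms[doc_id] = math.sqrt(sum(c * c for c in counts.values()))
    return index, norms

def query_similarity(query, index, norms):
    """Count the query's elements and compare them with the index (cosine, assumed)."""
    q_counts = Counter(extract_elements(query))
    q_norm = math.sqrt(sum(c * c for c in q_counts.values()))
    dots = defaultdict(float)
    for element, q_count in q_counts.items():
        for doc_id, d_count in index.get(element, {}).items():
            dots[doc_id] += q_count * d_count
    return {d: dots[d] / (q_norm * norms[d]) for d in dots if q_norm and norms[d]}

docs = {"d1": "similar document retrieval", "d2": "document image coding"}
index, norms = build_search_index(docs)
scores = query_similarity("document retrieval", index, norms)
```

Here `d1` scores higher than `d2` because it shares the word pair "document retrieval" with the query in addition to the single words.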
Abstract:
Retrieval conditions inputted by a plurality of users are registered. An inputted text is searched according to the retrieval conditions. As a result of the retrieval, the similarity of the text is calculated for each retrieval condition. The text is delivered to each user whose retrieval condition is satisfied by the calculated similarity.
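A minimal sketch of this delivery step follows. It assumes each retrieval condition is a set of terms plus a similarity threshold, and that similarity is the fraction of condition terms found in the text; the abstract specifies neither, so `Profile`, `condition_similarity`, and `route` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    user: str
    terms: frozenset      # assumed form of a registered retrieval condition
    threshold: float      # assumed minimum similarity for delivery

def condition_similarity(text_words, terms):
    """Fraction of the condition's terms that occur in the text (assumed measure)."""
    if not terms:
        return 0.0
    return len(terms & text_words) / len(terms)

def route(text, profiles):
    """Return the users whose retrieval condition is satisfied by the incoming text."""
    words = set(text.lower().split())
    return [p.user for p in profiles
            if condition_similarity(words, p.terms) >= p.threshold]

profiles = [
    Profile("alice", frozenset({"patent", "search"}), 0.5),
    Profile("bob", frozenset({"image", "coding"}), 0.5),
]
recipients = route("new patent search engine", profiles)
```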
Abstract:
A document retrieval method using a computer program includes retrieving a first set of documents using a first query expression generated by the computer program. The first set of documents is provided to a user. An evaluation of the first set of documents is received from the user. The first query expression is changed to a second query expression generated by the computer program based on the evaluation.
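One well-known way to change a query based on user evaluations is Rocchio relevance feedback. The sketch below uses it purely as a stand-in; the abstract does not say how the second query expression is generated, and the weights `alpha`, `beta`, and `gamma` are conventional illustrative values, not values from the source.

```python
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Derive a second query vector from the first one and the user's evaluation.

    query, and each document in relevant/nonrelevant, is a {term: weight} dict.
    """
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    for doc in nonrelevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant)
    # Drop terms whose weight became zero or negative.
    return {t: w for t, w in new_query.items() if w > 0}

first_query = {"retrieval": 1.0}
judged_relevant = [{"retrieval": 1.0, "index": 1.0}]
judged_nonrelevant = [{"image": 1.0}]
second_query = rocchio(first_query, judged_relevant, judged_nonrelevant)
```

Terms from documents the user judged relevant ("index") enter the second query, while terms seen only in rejected documents ("image") are excluded.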
Abstract:
A registration/search method for structured documents in which, for each structured document, correspondence data between each fixed-length string in the document and its occurrence position is prepared for all fixed-length strings. A list of each character, all hierarchical elements containing the character, and the lengths of those elements is also prepared. An occurrence frequency and an occurrence position of a search term are obtained using a plurality of fixed-length substrings and the occurrence frequency extracting index. A search character is selected from the search term, and a hierarchical element containing the search character is obtained from the element length index using that character. The length of the element corresponding to the search range is extracted using the obtained occurrence position. A matching degree for the search term is calculated from the obtained occurrence frequency of the search term and the extracted length of the element corresponding to the search range.
Abstract:
In a text mining technique, if the system only extracts characteristic words and phrases frequently cooccurring with the respective components of an analysis axis as an analysis condition, similar words and phrases are extracted for any component. To clearly indicate existence of characteristic words and phrases which do not appear as cooccurrence words and phrases for other components of the analysis axis, it is desired to appropriately present distinguishable features between the components to the user. For this purpose, the frequency of appearances of a plurality of characteristic words and phrases in a document satisfying each analysis condition is calculated. As a result, multiple cooccurrence words and phrases and component-cooccurrence words and phrases are discriminatively displayed. It is therefore possible for the user to appropriately analyze the contents of a plurality of documents.
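The distinction between words that co-occur with several components and words that co-occur with only one can be sketched as follows. The input format (a list of `(component, text)` pairs), whitespace tokenization, and the `min_count` cutoff are assumptions for the example, not details from the abstract.

```python
from collections import defaultdict

def cooccurrence_by_component(docs, min_count=1):
    """Split words into those shared by several components and those specific to one.

    docs is a list of (component, text) pairs, where component is one value
    of the analysis axis (for example, a product name).
    """
    counts = defaultdict(lambda: defaultdict(int))   # word -> component -> doc count
    for component, text in docs:
        for word in set(text.lower().split()):
            counts[word][component] += 1

    shared = {}
    specific = defaultdict(list)
    for word, per_component in counts.items():
        components = sorted(c for c, n in per_component.items() if n >= min_count)
        if len(components) > 1:
            shared[word] = components
        elif len(components) == 1:
            specific[components[0]].append(word)
    return shared, {c: sorted(ws) for c, ws in specific.items()}

docs = [
    ("printer", "paper jam error"),
    ("scanner", "driver error"),
    ("printer", "paper tray"),
]
shared, specific = cooccurrence_by_component(docs)
```

In this toy input, "error" appears for both components and would be shown as a multiple co-occurrence word, while "jam", "paper", and "tray" distinguish the printer component.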
Abstract:
A document retrieval system is provided whose document display interface makes the important portions easy to recognize, even when the displayed document was retrieved using a query expression designated by a document or a long sentence. When a text is registered, predetermined character strings extracted from the text and their location information are stored in a location information file. A weight for each character string is calculated by a predetermined method and stored in a weight file. In retrieving a document, predetermined character strings are extracted from a designated query expression. A similarity between the query expression and the texts in the database is calculated using the location information and the weights acquired from the location information file and the weight file. In displaying the document, character strings having high weights are extracted from the character strings used for the retrieval, and the display format of each portion containing the extracted character strings is changed when the text is displayed.
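The display step can be sketched as below: pick the highest-weighted strings among those used for retrieval and mark every occurrence in the text. The `[[ ]]` markers, the `top_k` cutoff, and the in-memory `weights` dictionary (standing in for the weight file) are assumptions chosen for illustration.

```python
import re

def highlight(text, query_terms, weights, top_k=2, mark=("[[", "]]")):
    """Mark occurrences of the top_k highest-weighted query terms in text."""
    ranked = sorted(query_terms, key=lambda t: weights.get(t, 0.0), reverse=True)
    top = [t for t in ranked[:top_k] if t]
    if not top:
        return text
    # Try longer terms first so an overlapping shorter term does not win.
    alternatives = sorted(top, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in alternatives), re.IGNORECASE)
    return pattern.sub(lambda m: f"{mark[0]}{m.group(0)}{mark[1]}", text)

weights = {"the": 0.1, "index": 2.0, "location": 1.5}
shown = highlight("The index stores location data.",
                  ["the", "index", "location"], weights)
```

With these weights the low-weight term "the" is left unmarked, so only the portions that mattered most for the match stand out.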
Abstract:
A registration method for structured documents includes the steps of: preparing correspondence data between a string and a string occurrence position within a structured document for each structured document, and additionally storing the correspondence data in an occurrence frequency extracting index; and preparing a list of a character, an element containing the character and a length of the element and additionally storing the list in an element length index. A search method for structured documents includes the steps of: inputting search conditions including a search term and an element for specifying a search range; decomposing the search term into a plurality of substrings, obtaining an occurrence frequency and an occurrence position of the search term using the plurality of substrings from the occurrence frequency extracting index; selecting a character from the search term, obtaining an element containing the character using the character from the element length index, and further extracting a length of the element within the search range; calculating a matching degree for the search conditions from the occurrence frequency and the occurrence position of the search term and the length of the element within the search range; and outputting the element containing the search term and the matching degree.
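A simplified sketch of the search side is given below. It assumes bigram substrings, represents the search-range element as a `(start, end)` character span instead of a per-character element length index, and defines the matching degree as occurrence frequency divided by element length. The abstract only says the degree is calculated from these quantities, so the formula and all function names are assumptions.

```python
from collections import defaultdict

def build_ngram_index(text, n=2):
    """Occurrence index: each n-character substring -> list of positions."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return index

def find_term(term, index, n=2):
    """Recover the positions of term by aligning the positions of its n-grams."""
    if len(term) < n:
        raise ValueError("search term is shorter than the n-gram length")
    grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    candidates = set(index.get(grams[0], []))
    for offset, gram in enumerate(grams[1:], start=1):
        candidates &= {p - offset for p in index.get(gram, [])}
    return sorted(candidates)

def matching_degree(term, index, element_span, n=2):
    """Assumed formula: occurrences inside the element / element length."""
    start, end = element_span          # end is exclusive
    length = end - start
    hits = [p for p in find_term(term, index, n)
            if start <= p and p + len(term) <= end]
    return len(hits) / length if length else 0.0

text = "index search index index data"
title_span, body_span = (0, 12), (13, 29)   # hypothetical element spans
index = build_ngram_index(text)
positions = find_term("index", index)
body_degree = matching_degree("index", index, body_span)
title_degree = matching_degree("index", index, title_span)
```

Normalizing by element length keeps a long element from outranking a short one merely because it has more room for the term to occur.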
Abstract:
In document retrieval with a relevance feedback function, which modifies a searching profile on the basis of a user's evaluation of a search result as pertinent or impertinent, relevance feedback can be resumed from a desired earlier point in time. An evaluation inputted by a user, the searching profile modified by the evaluation, and the search result based on that searching profile are all saved in correspondence with one another. When restoration of a searching profile is requested, the searching profile corresponding to an evaluation designated by the user is restored.
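The bookkeeping described here could look roughly like the following. The in-memory list, the dictionary-shaped profile, and the use of a list position as the identifier of an evaluation are assumptions for the sketch; the abstract does not describe the storage format.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackStep:
    evaluation: dict   # e.g. {doc_id: True for pertinent, False for impertinent}
    profile: dict      # searching profile after applying the evaluation
    results: list      # search result obtained with that profile

@dataclass
class FeedbackHistory:
    steps: list = field(default_factory=list)

    def record(self, evaluation, profile, results):
        """Save an evaluation together with the profile and result it produced."""
        self.steps.append(FeedbackStep(dict(evaluation), dict(profile), list(results)))
        return len(self.steps) - 1     # identifier the user can designate later

    def restore(self, step_id):
        """Return copies of the profile and result saved for a designated evaluation."""
        step = self.steps[step_id]
        return dict(step.profile), list(step.results)

history = FeedbackHistory()
first = history.record({"d1": True}, {"retrieval": 1.0}, ["d1", "d2"])
second = history.record({"d2": False}, {"retrieval": 1.0, "index": 0.5}, ["d1", "d3"])
profile, results = history.restore(first)
```

Copies are stored and returned so that later feedback rounds cannot silently alter a saved state.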
Abstract:
Word boundary identification operations such as morpheme analysis are performed on documents to be registered, and the start positions and end positions of words are identified. Word boundary information is obtained from these identification results. Search indexes are created for substrings of a predetermined length (n-grams) extracted from the document being registered. Each search index includes document identification information, occurrence position information indicating that the string is located at the n-th position from the beginning of the text data, and word boundary information for the n-gram in the document.
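A small sketch of such an index follows. Splitting on spaces stands in for morpheme analysis, and the boundary information is stored as two flags per posting (whether the n-gram starts a word and whether it ends one). Both are assumptions made to keep the example short, and the function names are hypothetical.

```python
from collections import defaultdict

def word_boundaries(text):
    """Start and end character positions of words (space splitting is an assumption)."""
    starts, ends = set(), set()
    pos = 0
    for word in text.split(" "):
        if word:
            starts.add(pos)
            ends.add(pos + len(word) - 1)
        pos += len(word) + 1
    return starts, ends

def build_boundary_index(doc_id, text, n=2, index=None):
    """Add postings (doc_id, position, starts_word, ends_word) for every n-gram."""
    index = defaultdict(list) if index is None else index
    starts, ends = word_boundaries(text)
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        index[gram].append((doc_id, i, i in starts, (i + n - 1) in ends))
    return index

index = build_boundary_index("d1", "cat catalog")
ca_postings = index["ca"]
at_postings = index["at"]
```

A whole-word search for "cat" can then require that its first n-gram starts a word and its last n-gram ends one, which accepts position 0 and rejects the "cat" inside "catalog".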