Abstract:
A registration method for structured documents includes the steps of: preparing correspondence data between a string and a string occurrence position within a structured document for each structured document, and additionally storing the correspondence data in an occurrence frequency extracting index; and preparing a list of a character, an element containing the character and a length of the element and additionally storing the list in an element length index. A search method for structured documents includes the steps of: inputting search conditions including a search term and an element for specifying a search range; decomposing the search term into a plurality of substrings, obtaining an occurrence frequency and an occurrence position of the search term using the plurality of substrings from the occurrence frequency extracting index; selecting a character from the search term, obtaining an element containing the character using the character from the element length index, and further extracting a length of the element within the search range; calculating a matching degree for the search conditions from the occurrence frequency and the occurrence position of the search term and the length of the element within the search range; and outputting the element containing the search term and the matching degree.
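The substring lookup described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes the substrings are overlapping two-character strings, and the names `build_ngram_index` and `find_term` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_ngram_index(docs, n=2):
    """Map each n-character substring to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((doc_id, pos))
    return index

def find_term(index, term, n=2):
    """Return sorted (doc_id, start) pairs for `term` by chaining its n-grams."""
    if len(term) < n:
        raise ValueError("term must be at least n characters long")
    grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    # Start from every position of the first substring ...
    candidates = set(index.get(grams[0], []))
    # ... and keep only those where each later substring follows at the right offset.
    for offset, gram in enumerate(grams[1:], start=1):
        postings = set(index.get(gram, []))
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return sorted(candidates)

# Hypothetical sample documents (markup omitted for brevity).
docs = {"d1": "structured search", "d2": "search and research"}
index = build_ngram_index(docs)
hits = find_term(index, "search")
frequency = len(hits)  # occurrence frequency of the search term
```

The length of `hits` gives the occurrence frequency and its entries give the occurrence positions, so both values come from the same index lookup.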
Abstract:
A method for extracting features from the contents of a document without using a word dictionary, and a system using the method for accurately searching for a relevant document or documents at high speed. The method includes steps of storing character strings present in a text in a text database, together with their probabilities of appearing at word boundaries in the text, in the form of an occurrence probability file; storing occurrence frequencies of the character strings in the text as an occurrence frequency file; extracting characteristic strings from a text specified by a user using the occurrence probability file; and counting their occurrence frequencies in the user-specified text. The method calculates similarities to the user-specified text using the occurrence frequency file and the occurrence frequencies in the user-specified text.
Abstract:
A registration/search method for structured documents where correspondence data is prepared between a fixed-length-string and a string occurrence position within a structured document for all fixed-length-strings in the document and for each structured document. A list of a character and all hierarchical elements containing the character and element lengths is prepared. An occurrence frequency and an occurrence position of a search term is obtained using the plurality of fixed-length-substrings and the occurrence frequency extracting index. A search character is selected from the search term. A hierarchical element containing the search character is obtained using the character from the element length index. A length of the element corresponding to a search range is extracted using the obtained occurrence position. A matching degree for the search term is calculated from the obtained occurrence frequency of the search term and the extracted element length of the element corresponding to the search range.
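A rough sketch of the element length index and the matching degree follows. The abstract does not give the scoring formula, so the sketch assumes a simple one (occurrence frequency divided by element length). The element spans and function names are hypothetical, and `str.find` stands in for the occurrence frequency extracting index.

```python
from collections import defaultdict

# Hypothetical sample: element spans are (path, start offset, end offset).
text = "index search. search terms and search ranges"
elements = [
    ("doc", 0, len(text)),
    ("doc/title", 0, 13),
    ("doc/body", 14, len(text)),
]

def build_element_length_index(text, elements):
    """Map each character to the (path, start, length) of every element containing it."""
    index = defaultdict(set)
    for path, start, end in elements:
        for ch in set(text[start:end]):
            index[ch].add((path, start, end - start))
    return index

def matching_degree(text, index, term, range_path):
    """Assumed score: occurrences of `term` inside the element / element length."""
    ch = term[0]  # any character of the term can serve as the lookup key
    spans = [(s, n) for (p, s, n) in index.get(ch, ()) if p == range_path]
    if not spans:
        return 0.0
    start, length = spans[0]
    # Count only occurrences that lie entirely within the element span.
    freq, pos = 0, text.find(term, start)
    while pos != -1 and pos + len(term) <= start + length:
        freq += 1
        pos = text.find(term, pos + 1)
    return freq / length

element_index = build_element_length_index(text, elements)
title_score = matching_degree(text, element_index, "search", "doc/title")
body_score = matching_degree(text, element_index, "search", "doc/body")
```

Under this assumed formula the short title, with one hit in 13 characters, scores higher than the longer body, with two hits in 30 characters. That is the kind of length normalization the element length index makes possible.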
Abstract:
A similar document retrieving method and system for retrieving similar documents with high accuracy from a document database storing plural documents written in different languages, while suppressing retrieval noise even when the number of registered documents differs depending on the description language. Statistical information concerning the documents to be registered is collected on a language-by-language basis upon registration. Upon retrieval of documents similar to a query document, weights of words extracted from the query document are taken into account on a language-by-language basis by referencing the statistical information.
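One way to picture the language-by-language statistics is shown below. The weighting formula is not given in the abstract, so this sketch assumes a smoothed inverse document frequency computed separately for each language. The class `LanguageStats` and the sample words are hypothetical.

```python
import math
from collections import Counter, defaultdict

class LanguageStats:
    """Per-language registration statistics (hypothetical structure)."""

    def __init__(self):
        self.doc_count = Counter()            # language -> number of registered documents
        self.doc_freq = defaultdict(Counter)  # language -> word -> documents containing it

    def register(self, language, words):
        self.doc_count[language] += 1
        for word in set(words):
            self.doc_freq[language][word] += 1

    def weight(self, language, word):
        """Assumed weight: smoothed IDF using only that language's statistics."""
        n = self.doc_count[language]
        df = self.doc_freq[language][word]
        return math.log((n + 1) / (df + 1)) + 1.0

stats = LanguageStats()
stats.register("en", ["index", "search"])
stats.register("en", ["search", "engine"])
stats.register("en", ["search", "query"])
stats.register("en", ["search", "rank"])
stats.register("ja", ["検索", "索引"])

w_index = stats.weight("en", "index")    # rare within the English collection
w_search = stats.weight("en", "search")  # present in every English document
w_kensaku = stats.weight("ja", "検索")    # judged only against Japanese documents
```

Because each weight uses only its own language's document counts, a word from a sparsely registered language is not automatically treated as rare relative to the whole database.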
Abstract:
A similar document search method includes a step of extracting a characteristic word candidate as a candidate for a characteristic word from a seeds document including desired retrieval contents, a step of extracting as characteristic words of the seeds document, when the characteristic word candidate extracted by the extracting step is a compound characteristic word including a plurality of characteristic words, the compound characteristic word and constituent characteristic words included in the compound characteristic word from the characteristic word candidate, a step of calculating, according to the characteristic words extracted by the extracting step, similarity between the seeds document and a registration document, and a step of outputting as a retrieval result a result of the similarity calculated by the similarity calculating step.
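The handling of compound characteristic words might look like the sketch below. It assumes constituents are separated by whitespace, which will not hold for languages written without spaces, and it assumes a very simple similarity (the fraction of characteristic words found in the registration document). Both function names are hypothetical.

```python
def expand_candidates(candidates):
    """Return each candidate plus, for compounds, its constituent words (first-seen order)."""
    words, seen = [], set()
    for cand in candidates:
        parts = cand.split()
        # A compound contributes itself and every constituent; a simple word only itself.
        expanded = [cand] + parts if len(parts) > 1 else [cand]
        for word in expanded:
            if word not in seen:
                seen.add(word)
                words.append(word)
    return words

def similarity(words, registered_text):
    """Assumed score: share of characteristic words that occur in the registered text."""
    if not words:
        return 0.0
    found = sum(1 for word in words if word in registered_text)
    return found / len(words)

seed_words = expand_candidates(["document search", "index"])
score = similarity(seed_words, "a document index")
```

Keeping both the compound and its constituents lets a registration document that mentions only "document" still contribute to the score, while an exact compound match counts as an additional hit.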
Abstract:
A computerized document management system manages and allows viewing of attachment documents in groups of electronic mail messages. A determination is first made as to whether an electronic mail message is a task message. If so, task history information, including the main text of the electronic mail message, attribute information, and information about relations with other messages, is stored. The attachment documents are then extracted and stored together with attachment document management information. Upon receipt of a search request, a list of attachment documents and task histories can then be displayed.
Abstract:
A document search method and apparatus and a portable medium used therefor are described, in which when registering a document in a database, the logic structures of each document to be registered are superposed one on another to generate a structure index in which the structure elements having the same position of occurrence in the document are represented by a single meta-node. At the time of document search, a set of the meta-nodes meeting a specified structural condition is determined with reference to the structure index. A string index is searched with the meta-node identifiers as a key thereby to determine a set of documents meeting the specified condition. As a result, a highly accurate structure-specified search is made possible on a document database including a large number of structured documents. In the structure-specified search of structured documents, the conditions for the position of occurrence of the logic elements in the document are specified, thereby making possible a highly accurate structure-specified search.
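The superposition into meta-nodes can be illustrated with the sketch below. It assumes that "the same position of occurrence" can be modeled as an identical element path and that the string index is keyed by (meta-node identifier, word). The class `StructureIndex` and the sample paths are hypothetical.

```python
from collections import defaultdict

class StructureIndex:
    """Structure index of meta-nodes plus a string index keyed by meta-node id."""

    def __init__(self):
        self.meta_nodes = {}                  # element path -> meta-node id
        self.string_index = defaultdict(set)  # (meta-node id, word) -> document ids

    def register(self, doc_id, elements):
        """`elements` is an iterable of (element path, text) pairs for one document."""
        for path, text in elements:
            # Elements at the same path in different documents share one meta-node.
            node_id = self.meta_nodes.setdefault(path, len(self.meta_nodes))
            for word in text.lower().split():
                self.string_index[(node_id, word)].add(doc_id)

    def search(self, path_condition, word):
        """Documents containing `word` in an element at or below `path_condition`."""
        hits = set()
        for path, node_id in self.meta_nodes.items():
            if path == path_condition or path.startswith(path_condition + "/"):
                hits |= self.string_index.get((node_id, word.lower()), set())
        return hits

idx = StructureIndex()
idx.register("d1", [("doc/title", "Structured search"), ("doc/body", "Index design")])
idx.register("d2", [("doc/title", "Index notes"), ("doc/body", "Search tips")])

in_title = idx.search("doc/title", "search")
anywhere = idx.search("doc", "search")
```

The structural condition is resolved against the small set of meta-nodes first, so the string index is only consulted for meta-node identifiers that already satisfy it.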
Abstract:
A document searching system searches for other documents that cite a user-specified document as a referred document, thereby uncovering the latest document associated with the user-specified document. In the related document searching method, document information is registered in a text storage region, a referred document table and a related document table are created, and referred documents associated with the user-specified document are searched for using the created tables.
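The two tables could be modeled roughly as below. The abstract does not define their layout, so this sketch assumes that the referred document table maps a document to the documents it cites and that the related document table is the inverse mapping. Following citation chains and ordering by registration date are further assumptions, and the class name is hypothetical.

```python
from collections import defaultdict
from datetime import date

class CitationTables:
    """Referred/related document tables (assumed layout)."""

    def __init__(self):
        self.registered = {}              # document id -> registration date
        self.referred = defaultdict(set)  # document id -> documents it cites
        self.related = defaultdict(set)   # document id -> documents that cite it

    def register(self, doc_id, when, cited_ids):
        self.registered[doc_id] = when
        for cited in cited_ids:
            self.referred[doc_id].add(cited)
            self.related[cited].add(doc_id)

    def citing(self, doc_id):
        """Documents citing `doc_id` directly or through a chain, newest first."""
        found, stack = set(), [doc_id]
        while stack:
            for citer in self.related.get(stack.pop(), ()):
                if citer not in found:
                    found.add(citer)
                    stack.append(citer)
        return sorted(found, key=lambda d: self.registered[d], reverse=True)

tables = CitationTables()
tables.register("A", date(2001, 1, 10), [])
tables.register("B", date(2003, 5, 2), ["A"])
tables.register("C", date(2005, 9, 30), ["B"])

later_documents = tables.citing("A")
```

With the results sorted newest first, the first entry is the most recent document reachable from the user-specified one.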
Abstract:
A typical file server system has a plurality of file servers connected in parallel on a network and shares files distributed across the file servers among a plurality of client computers. A specific file server among the plurality of file servers is provided with a load information monitoring device for measuring the respective loads of the plurality of file servers, and a file access request distributing device for referring to the loads measured by the load information monitoring device so as to select a lightly loaded file server from the plurality of file servers and distribute a file access request transmitted from a client computer to the selected file server.
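The interaction between the load monitor and the request distributor can be sketched as follows. This is a simplified in-memory model, not a networked implementation: load is assumed to be a single number per server, the least loaded server is always chosen, and the class and server names are hypothetical.

```python
class RequestDistributor:
    """Routes each file access request to the server with the lightest measured load."""

    def __init__(self, servers):
        self.loads = {name: 0 for name in servers}  # latest measured load per server

    def report_load(self, name, load):
        """Called by the (hypothetical) load monitor when it measures a server."""
        self.loads[name] = load

    def dispatch(self, request):
        # Pick the server with the smallest load; ties go to the first one listed.
        target = min(self.loads, key=self.loads.get)
        # Account for the new request until the monitor reports a fresh measurement.
        self.loads[target] += 1
        return target

distributor = RequestDistributor(["fs1", "fs2", "fs3"])
distributor.report_load("fs1", 5)
distributor.report_load("fs2", 2)
distributor.report_load("fs3", 2)

first = distributor.dispatch("read /share/a.txt")
second = distributor.dispatch("read /share/b.txt")
```

Incrementing the selected server's load locally keeps consecutive requests from piling onto the same server between measurements.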
Abstract:
In a document information processing apparatus handling a large number of documents, when a document is registered, retrieval data for document retrieval is created for each registered document. Moreover, for each registered document, an access control table is produced in which information indicating accessibility of the document to groups of users acting as document retrievers is registered. When a user desires to retrieve a document, the accessibility of the user as the retriever is determined in accordance with the access control table for the documents retrieved with the retrieval data.
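The retrieve-then-filter flow can be illustrated with the sketch below. It assumes that the access control table lists the groups allowed to read each document and that a user may belong to several groups. A plain substring match stands in for the retrieval data, and all names and sample data are hypothetical.

```python
# Hypothetical registered documents and their per-document access control table.
documents = {
    "d1": "quarterly sales report",
    "d2": "engineering design report",
    "d3": "public product brochure",
}
access_control = {
    "d1": {"sales"},
    "d2": {"engineering"},
    "d3": {"sales", "engineering", "guests"},
}
user_groups = {
    "alice": {"sales"},
    "bob": {"engineering", "guests"},
}

def retrieve(term, user):
    """Return ids of documents matching `term` that `user` is allowed to access."""
    # Step 1: find candidate documents (stand-in for the retrieval data).
    candidates = [doc_id for doc_id, text in documents.items() if term in text]
    # Step 2: keep only candidates whose allowed groups include one of the user's groups.
    groups = user_groups.get(user, set())
    return [doc_id for doc_id in candidates if access_control.get(doc_id, set()) & groups]

alice_reports = retrieve("report", "alice")
bob_reports = retrieve("report", "bob")
```

Unknown users and documents without a table entry fall through to an empty group set, so the sketch denies access by default.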