Abstract:
A neighboring plural-character occurrence bitmap of practical capacity, capable of eliminating noise by hashing, is realized. It provides, in effect, a high-speed full text search by greatly reducing the number of documents to be searched, even when the search term is a combination of English characters and words. Text data is segmented into words, and n-character strings are extracted from each word at every (m+1)-th character position. A neighboring plural-character occurrence bitmap is created in which a given entry stores data representing the presence of each neighboring plural-character string. N-character strings are likewise extracted from a search term at every (m+1)-th character position, and the bitmap is searched under a search control program. Because the bitmap is searched before the condensed texts, documents not relevant to the search term can be discarded early and a high-speed full text search can be realized.
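The bitmap construction and lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stride is read as m+1 characters (an assumption about the abstract's notation), the bitmap size, the CRC-32 hash, and all function names are hypothetical, and the search term is assumed to consist of whole words or word prefixes so that its n-character strings line up with those taken from the text.

```python
import re
import zlib

BITMAP_BITS = 1 << 16  # assumed bitmap size; the abstract gives no figure


def extract_ngrams(text, n=2, m=0):
    """Return the n-character strings found at every (m+1)-th position of each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if len(word) < n:
            grams.add(word)  # assumption: short words are kept whole
            continue
        for start in range(0, len(word) - n + 1, m + 1):
            grams.add(word[start:start + n])
    return grams


def bit_for(gram):
    """Hash an n-character string to a bitmap entry (CRC-32 is an assumed choice)."""
    return zlib.crc32(gram.encode("utf-8")) % BITMAP_BITS


def build_bitmap(text, n=2, m=0):
    """Set one bit per n-character string occurring in the document."""
    bitmap = 0
    for gram in extract_ngrams(text, n, m):
        bitmap |= 1 << bit_for(gram)
    return bitmap


def may_contain(bitmap, term, n=2, m=0):
    """False means the document can be discarded; True means it needs a closer look."""
    return all((bitmap >> bit_for(g)) & 1 for g in extract_ngrams(term, n, m))
```

A document whose bitmap fails `may_contain` cannot contain the term and is dropped before any condensed text is read. A document that passes may still be a false hit, because different strings can hash to the same entry.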
Abstract:
A text search method and apparatus for structured documents. A structured document database to be searched is created from a structured document, which consists of a plurality of logical structures and logical structure discriminating information, by adding logical structure length information to the discriminating information. When the database is searched in accordance with an entered search key consisting of a logical structure and a search character string, logical structures other than the entered one are skipped on the basis of the length information. In the method and apparatus, the structured document database is built from the structured document so as to comprise a condensed text for each logical structure (a list of the words contained in that structure), a character occurrence bitmap for each logical structure (a list of the characters contained in that structure), and a source structured document. The database is then searched in accordance with the entered search key, and the source structured document is searched optionally, depending on the search key.
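The length-based skipping can be illustrated with a small sketch. The record layout used here (a 1-byte tag length, the tag, a 4-byte body length, the body) and the function names are assumptions made for the example. The abstract only states that length information is added to the discriminating information.

```python
import struct


def encode(structures):
    """Serialize (tag, text) pairs as: tag length, tag, body length, body."""
    out = bytearray()
    for tag, text in structures:
        tag_bytes = tag.encode("utf-8")
        body = text.encode("utf-8")
        out += struct.pack(">B", len(tag_bytes)) + tag_bytes
        out += struct.pack(">I", len(body)) + body
    return bytes(out)


def search(db, wanted_tag, needle):
    """Return the indexes of records with the wanted tag whose body contains needle."""
    wanted = wanted_tag.encode("utf-8")
    needle_bytes = needle.encode("utf-8")
    hits = []
    pos = 0
    index = 0
    while pos < len(db):
        tag_len = db[pos]
        pos += 1
        tag = db[pos:pos + tag_len]
        pos += tag_len
        (body_len,) = struct.unpack_from(">I", db, pos)
        pos += 4
        # Only the requested logical structure is scanned.
        if tag == wanted and needle_bytes in db[pos:pos + body_len]:
            hits.append(index)
        # Every body, scanned or not, is stepped over using its stored length.
        pos += body_len
        index += 1
    return hits
```

Because each record carries its own length, a structure with a different tag costs one header read and one pointer jump, regardless of how long its body is.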
Abstract:
A high-speed full document retrieval method and system capable of providing retrieval results within a practically acceptable search time. When documents are registered in a document database, condensed texts are created by decomposing each textual character string of the documents into fragmental character strings according to character species and by checking the mutual inclusion relations among the fragmental character strings. A component character table is created in which the characters occurring in each condensed text are registered without duplication. The condensed texts and the component character table are registered in the database together with the texts of the documents. When a document containing a user-designated search term is retrieved, a component character table search is executed first to extract the documents that contain every species of character constituting the search term. A condensed text search is then executed on the condensed texts of those documents. Finally, a text body search is executed on the texts of the documents that passed both earlier searches, to extract the documents that satisfy the query condition imposed on the search term.
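A rough sketch of the registration step and the three search stages follows. It assumes that fragments are runs of letters or runs of digits, that a fragment contained in another fragment can be dropped from the condensed text, and that the search term is a single fragment of one character species. The names `condense`, `register`, and `retrieve` are hypothetical.

```python
import re


def condense(text):
    """Split text into fragments by character species, then drop duplicates
    and any fragment that is contained in another fragment."""
    fragments = set(re.findall(r"[a-z]+|[0-9]+", text.lower()))
    return sorted(f for f in fragments
                  if not any(f != g and f in g for g in fragments))


def register(db, doc_id, text):
    """Store the text, its condensed text, and its component character set."""
    condensed = condense(text)
    db[doc_id] = {
        "text": text,
        "condensed": condensed,
        "chars": set("".join(condensed)),  # one row of the component character table
    }


def retrieve(db, term):
    """Return the surviving document ids after each of the three stages."""
    term = term.lower()
    # Stage 1: keep documents that contain every character of the term.
    stage1 = [d for d, e in db.items() if set(term) <= e["chars"]]
    # Stage 2: keep documents whose condensed text contains the term.
    stage2 = [d for d in stage1 if any(term in f for f in db[d]["condensed"])]
    # Stage 3: confirm against the text body (a plain containment test here).
    stage3 = [d for d in stage2 if term in db[d]["text"].lower()]
    return stage1, stage2, stage3
```

With one document reading "Search engines search texts" and another reading "The chaser", a query for "search" passes both documents through stage 1, since "chaser" happens to contain all six letters, and only the first document through stage 2.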
Abstract:
A document search method and system for searching and retrieving documents containing a specific character string in response to search requests issued by a plurality of search request sources. A search request received while search processing for an earlier request is still executing is stored in a queue buffer. When a plurality of search requests have accumulated in the queue buffer in this manner, search processing is performed for all of the stored requests simultaneously. The results are then distributed to the respective search request sources. Output buffers that store the results of past search processing may be provided for the respective search request sources and used to screen the documents on which the character string search is to be performed.
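The queue-and-batch behaviour can be sketched in a single-threaded form. The class and method names are hypothetical, requests are assumed to be (source, term) pairs, and "simultaneously" is modelled as one pass over the documents that checks every queued term.

```python
from collections import deque


class BatchSearcher:
    """Queues requests that arrive while busy, then serves them in one scan."""

    def __init__(self, documents):
        self.documents = documents  # doc_id -> text
        self.queue = deque()        # stands in for the queue buffer

    def submit(self, source, term):
        """Store a request that arrived during an earlier search."""
        self.queue.append((source, term))

    def run_batch(self):
        """Serve every queued request with a single pass over the texts."""
        batch = list(self.queue)
        self.queue.clear()
        results = {source: [] for source, _ in batch}
        for doc_id, text in self.documents.items():
            for source, term in batch:
                if term in text:
                    results[source].append((term, doc_id))
        return results  # keyed by source, ready to be sent back separately
```

Each document text is read once per batch rather than once per request, which is where the saving comes from when the texts dominate the cost.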
Abstract:
An information search terminal apparatus and information search system that perform information searches through a variety of windows and offer the user a high degree of operability by making available the results of past searches and the current system state. The terminal and system include a query statement input window for inputting a search query statement for a search term, a search history display window for displaying the search query statement and the number of documents hit by the search, a search result list display window for displaying the titles of the hit documents side by side as a list, and a document display window for displaying a document that contains the search term and results from the search.
Abstract:
A typical file server system has a plurality of file servers connected in parallel on a network and shares files distributed among the file servers with a plurality of client computers. A specific file server among the plurality of file servers is provided with a load information monitoring device, which measures the respective loads of the file servers, and a file access request distributing device, which refers to the measured loads to select a lightly loaded file server from the plurality of file servers and distributes a file access request transmitted from a client computer to the selected file server.
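The interplay between the two devices can be sketched as below. The class names, the use of a single number as the load, and the rule that each dispatched request adds one unit of load are assumptions for illustration. The sketch also assumes that any server can serve any request, which the abstract does not state.

```python
class LoadMonitor:
    """Stands in for the load information monitoring device."""

    def __init__(self, servers):
        self.loads = {name: 0 for name in servers}

    def report(self, server, load):
        """Record the most recent load measured for a server."""
        self.loads[server] = load

    def measure(self):
        """Return a snapshot of the current loads."""
        return dict(self.loads)


class RequestDistributor:
    """Stands in for the file access request distributing device."""

    def __init__(self, monitor):
        self.monitor = monitor

    def dispatch(self, request):
        """Pick the least loaded server and account for the new request."""
        loads = self.monitor.measure()
        target = min(loads, key=loads.get)  # ties go to the first server listed
        self.monitor.report(target, loads[target] + 1)  # assumed cost: one unit
        return target
```

Selecting the minimum is only one reading of "a file server having a light load". A threshold-based choice among several lightly loaded servers would fit the wording equally well.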
Abstract:
A character string retrieval method and system for deciding en bloc whether a plurality of designated search terms exist in a text composed of characters expressed as character codes. The system includes a character string storage unit for storing the text, a filtering unit that fetches character codes from the text read out of the character string storage unit and outputs only those character codes that are included in the search terms, and a character string matching unit that matches en bloc to decide whether the search terms exist in the string of character codes output by the filtering unit.
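A minimal sketch of the filter-then-match arrangement follows. The abstract does not say how the matching unit is kept from joining characters that were not adjacent in the original text. This sketch assumes the filter emits a single separator for each run of dropped characters, and that assumption, the separator value, and the function names are all hypothetical.

```python
SEP = "\x00"  # assumed marker for a run of dropped characters


def filter_text(text, terms):
    """Pass through only characters that occur in some search term."""
    alphabet = set("".join(terms))
    out = []
    for ch in text:
        if ch in alphabet:
            out.append(ch)
        elif not out or out[-1] != SEP:
            out.append(SEP)  # collapse each dropped run into one separator
    return "".join(out)


def match_terms(text, terms):
    """Decide for every term at once whether it occurs in the filtered stream."""
    filtered = filter_text(text, terms)
    return {term: term in filtered for term in terms}
```

The matching stage sees a stream that is usually much shorter than the text. Here it uses Python's substring test as a stand-in for the matching unit.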
Abstract:
A character stream search system that uses a finite state automaton (FSA) to determine at one time whether a plurality of character streams serving as search objects exist in a search character stream, which undergoes the search operation and comprises a plurality of characters expressed as codes. In the system, the search character stream is collated with the search object characters. When a matching search object character exists, a state transition to a predetermined state indicated by the FSA is carried out. When no matching search object character exists, failure processing is carried out to effect a state transition to a transition destination determined by the configuration of the FSA. This processing is completed within a count limited by a predetermined upper-limit value for each character that undergoes the search operation.
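The match and failure transitions described above resemble the well-known Aho-Corasick automaton, which is used here as a stand-in. It is not claimed to be the system's actual FSA. In particular, this sketch bounds failure transitions only on average over the whole text and does not enforce a fixed upper limit per character.

```python
from collections import deque


def build_fsa(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: a state's failure target is always shallower than it.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def search(text, fsa):
    """Return (start index, pattern) for every occurrence of every pattern."""
    goto, fail, out = fsa
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]            # failure processing
        state = goto[state].get(ch, 0)     # transition on a matched character
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits
```

For the patterns "he", "she", "his", and "hers", scanning "ushers" reports "she" at index 1 and both "he" and "hers" at index 2, all in one pass.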
Abstract:
A method for searching document information. A document search for a desired keyword is preceded by two stages of presearch. For the first stage, a character component table is generated that states, for every document, the presence of each character code contained in the text data of the stored documents. The table is searched for all the characters constituting a designated search keyword, thereby extracting all documents that contain every character code of the keyword. For the second stage, contracted text data is generated for every document in advance by eliminating adjuncts and repeated occurrences of words from the text data, and the documents that contain the search keyword on a word-by-word basis are extracted from those extracted by the first presearch. After the second stage of presearch, a text search is performed in accordance with a neighbor condition, a contextual condition, or the like.
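One way to picture the two presearch stages and the final neighbor-condition search is sketched below. The character component table is stored as one bitmask of documents per character, which is an assumed layout. The adjunct list, the whitespace tokenization, and the reading of the neighbor condition as "within a given number of words" are also assumptions.

```python
ADJUNCTS = {"a", "an", "the", "of", "in", "is", "for"}  # assumed adjunct list


def build_tables(docs):
    """Build the character component table and the contracted text per document."""
    char_table = {}  # character -> bitmask of the documents that contain it
    contracted = []  # per document: distinct words with adjuncts removed
    for i, text in enumerate(docs):
        for ch in set(text.lower()):
            char_table[ch] = char_table.get(ch, 0) | (1 << i)
        contracted.append({w for w in text.lower().split() if w not in ADJUNCTS})
    return char_table, contracted


def presearch(docs, char_table, contracted, keyword):
    """Return the document indexes surviving the first and second presearch."""
    keyword = keyword.lower()
    mask = (1 << len(docs)) - 1
    for ch in set(keyword):
        mask &= char_table.get(ch, 0)  # AND the rows of the keyword's characters
    first = [i for i in range(len(docs)) if (mask >> i) & 1]
    second = [i for i in first if keyword in contracted[i]]
    return first, second


def near(text, kw1, kw2, distance):
    """Neighbor condition: do kw1 and kw2 occur within `distance` words?"""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == kw1]
    pos2 = [i for i, w in enumerate(words) if w == kw2]
    return any(abs(a - b) <= distance for a in pos1 for b in pos2)
```

Only the documents left after `presearch` need the comparatively expensive `near` check on their full text.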
Abstract:
A parallel comparator is provided in the stage preceding an automaton executing device. The comparator collates, in parallel and at high speed, partial character strings taken from a plurality of character strings to be searched for against the character string to be searched, in which the document data is arranged sequentially from its leading character. Only when a part of the character string to be searched coincides with a partial character string set in the comparator does the automaton executing device collate the remaining portion of the character string to be searched. It is also possible to set a "don't care" condition, under which the character at any given position in the partial character string is ignored during comparison, and a negation condition, under which the comparison takes the negation of the character at any given position in the partial character string.
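The comparator front stage can be simulated in software as follows. The hardware compares all partial strings in parallel, while this sketch loops over them. The representation of each position as an ("eq" | "ne" | "any", character) pair, the helper names, and the use of a plain prefix test in place of the automaton executing device are assumptions.

```python
ANY = ("any", None)  # "don't care": the character at this position is ignored


def lit(ch):
    """Position that must equal ch."""
    return ("eq", ch)


def neg(ch):
    """Negation condition: position that must differ from ch."""
    return ("ne", ch)


def comparator_hit(window, partial):
    """Compare one window of text with one partial string set in the comparator."""
    if len(window) < len(partial):
        return False
    for ch, (mode, ref) in zip(window, partial):
        if mode == "eq" and ch != ref:
            return False
        if mode == "ne" and ch == ref:
            return False
    return True


def search(text, entries):
    """entries: list of (partial, rest). The rest is collated only after a hit."""
    hits = []
    for pos in range(len(text)):
        for partial, rest in entries:
            window = text[pos:pos + len(partial)]
            if comparator_hit(window, partial):
                # Stand-in for the automaton executing device.
                if text.startswith(rest, pos + len(partial)):
                    hits.append((pos, window + rest))
    return hits
```

With the partial string `[lit("s"), ANY, lit("a")]` and the remainder "rch", the text "search and starch" yields hits at positions 0 and 11. Replacing `ANY` with `neg("t")` leaves only the hit at position 0.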