摘要:
A method for making document information searches. In performing a document search with respect to the desired key word, two stages of presearch are carried out. In a first stage of presearch, a character component table in which an existence of character codes for every document is stated with respect to all the character codes contained in the group of document text data of stored documents is generated, and the character component table is searched for all the character strings constituting a desiredly designated search subject key word to thereby extract all the documents each containing all the character codes constituting the search subject key word. In a second stage of presearch, contracted text data for every document in which adjuncts and duplication of repeatedly stated words contained in advance in the text data are eliminated is generated, and the documents each containing the search subject key words by word are extracted from the documents extracted by the first presearch. After the second stage of presearch, text search is performed in accordance with a neighbor condition, a contextual condition, or the like.
摘要:
A method and apparatus for performing a document information search to uncover specified text data containing a given search subject key word from a group of document text data stored in a memory. In the document information search method, two stages of presearch are carried out to perform the document search with respect to a desired subject key word. In a first stage of presearch, a character component table is generated in which the existence of character codes for every document is set forth with respect to all the character codes contained in the group of document text data of stored documents. The character component table is searched for all the character codes comprising a designated search subject key word to thereby extract all documents containing all the character codes comprising the search subject key word. Further, in the presearch step, all texts without the possibility of containing the search subject key word are eliminated. A comprehensive, narrowed text search is thereby performed in accordance with the search subject key word.
摘要:
A method and apparatus for making document information search and a magnetic disk unit to be used for realizing the method and apparatus. In the document information search method, in performing document search with respect to a desired subject key word, two stages of presearch are carried out. In a first stage of presearch (step 402), a character component table (500) in which existence of character codes for every document is stated with respect to all the character codes contained in the group of document text data of stored documents is generated, and the character component table is searched for all the character codes constituting a desiredly designated search subject key word to thereby extract all the documents each containing all the character codes constituting the search subject key word. In a second stage of presearch step 403), contracted text data for every document in which adjuncts and duplication of repeatedly stated words contained in advance in the text data are eliminated is generated, and the documents each containing the search subject key words by word are extracted from the documents extracted by the first presearch. After the second stage of presearch, text search is performed in accordance with a neighbor condition, a contextual condition, or the like (step 404). Further, as a term comparator means, hardware (1106) for exclusive use for term comparison in accordance with a finite automation is employed. Further, as for different notation and synonym, inputted terms are once subject to different notation development in a different notation development processing portion (2601), each of the different-notation developed terms is subject to synonym development in a synonym development processing portion (2602) while referring to a synonym dictionary, and then the results of synonym development are further subject to different notation development in a different notation development processing portion (2603) in accordance with a conversion rule table (2603).
摘要:
A compact character string retrieving system capable of producing correctly the result of matching without omission even upon occurrence of multiple matching in which a plurality of search terms are matched for one character string by a finite automation. A destination state for transition brought about by a trailing character of the search term is newly created instead of an initial state. A transition table storage stores the destination state. On the basis of the source state number and a specified pattern character code, the destination state number is read out from the state transition table storage. When the state number read out represents the destination state of the transition brought about by the trailing character of the specified pattern character string, an identifier thereof is outputted. The identifiers of the search terms matched are each represented by one bit information, and a group of corresponding flags is stored in one slot. Multiple matching can be performed without omission. The character string retrieving system is implemented in a reduced size.
摘要:
A parallel comparator for performing a parallel and high-speed processing for collation of partial character strings which are partially taken out of a plurality of character strings of interest to be searched out with a character string to be searched in which document data to be searched is arranged sequentially from a leading character, is provided in a front stage of an automaton executing device. Only when a part of the character string to be searched coincides with the partial character string set in the comparator, the collation of the remaining portion of the character string to be searched is performed by the automaton executing device. Also, it is possible to set "don't care" in which a character at any position in the partial character string is ignored at the time of comparison by the comparator and to set a negation condition in which the comparison by the comparator is made taking the negation of a character at any position in the partial character string.
摘要:
A document retrieval method and system for retrieving, from a document database storing document data in the form of character codes, a document which contains given search terms and which meets a given search query condition. From documents loaded from the document database, a document containing terms which match the search terms is searched to generate document identification (ID) information including a document identifier of the searched document and containing match terms found to match with the search terms as well as term identifiers of the match terms and position information of the match terms in the searched document. A decision is then made as to whether or not the position information of the match terms satisfies a positional condition specified in the search query condition concerning a positional relation between the search terms, and match information is then generated indicating satisfaction of the search query condition when the positional condition is satisfied. Through a proximity condition decision, it is ascertained whether the match terms satisfy an inter-term distance condition specified in the search query condition. Through a contextual condition decision, it is determined whether the match terms satisfy a concurrence condition specifying concurrence of the search terms in a same sub-sentence, a same sentence or a same paragraph. Through a logical condition, it is decided whether the match terms satisfy a logical condition between the search terms specified in the search query condition.
摘要:
A character stream search system using an FSA for determining at a time whether or not a plurality of character streams as search objects exist in a search character stream which undergoes a search operation and which comprises a plurality of characters expressed with codes. In the system, a collation is conducted between the search character stream and a search object character. In a case where there exists a matched search object character as a result of the collation, a state transition is carried out of a predetermined state indicated by the FSA. In a case where there does not exist a matched search object character, a failure processing to effect a state transition to a transistion destination which is determined in association with the configuration of the FSA. The following processing is completed at a count which is a predetermined upper-limit value for each character undergone the search operation.
摘要:
A character stream search system using an FSA for determining at a time whether or not a plurality of character streams as search objects exist in a search character stream which undergoes a search operation and which comprises a plurality of characters expressed with codes. In the system, a collation is conducted between the search character stream and a search object character. In a case where there exists a matched search object character as a result of the collation, a state transition is carried out to a predetermined state indicated by the FSA. In a case where there does not exist a matched search object character, a failure processing to effect a state transition to a transition destination which is determined in association with the configuration of the FSA. The failure processing is completed at a count which is a predetermined upper-limit value for each character undergone the search operation.
摘要:
High-speed full document retrieval method and system capable of providing result of retrieval within practically acceptable short search time. Upon registration of documents in a document database, condensed texts are created by decomposing each of textual character strings of the documents to be registered into fragmental character strings in dependence on character species and by checking mutual inclusion relations existing among the fragmental character strings. A component character table is created in which characters occurring in each of the condensed texts are registered without duplication. The condensed texts and the component character table are registered in the data base together with the texts of the documents to be registered. Upon retrieval of a document containing a search term designated by a user, a component character table search is first executed to extract those documents which contain all species of characters constituting the search term by consulting the component character table, and subsequently a condensed text search is executed by consulting the condensed texts of the documents. Finally, a text body search is executed for extracting a document which satisfies query condition imposed on the search term by consulting the texts of the documents extracted through the component character table search and the condensed text search.
摘要:
An image filing apparatus and method for receiving as multivalue data an image corresponding to a document, converting the data to binary image data and storing the binary image data. According to the features of the present invention, pixels of that portion of an input image corresponding to a particular color are extracted, and luminance data expressing monochromatic binary image data and binary image data designating colored portions are stored in different planes. Not only black pixels but also pixels expressed in a particular color such as red are described in the plane for the luminance data. Pixels having a particular color to be expressed in "red" such as red characters are extracted and recorded as "1" in another particular color plane, for example, in R-plane different from the luminance plane. The R-plane is recorded with binary image data; only pixels written in "red" are expressed as "1" there and other pixels as "0". When outputted, the pixels in which the contents of the luminance plane are "1" and the contents of the R-plane are "0" are displayed in black and the pixels in which the contents of the R-plane are "1" are displayed in red.