Abstract:
A method for performing document information searches. In searching documents for a desired key word, two stages of presearch are carried out. In the first stage of presearch, a character component table is generated that records, for every character code contained in the group of stored document text data, which documents contain that code. The table is then searched for all the character codes making up a user-designated search subject key word, thereby extracting every document that contains all of those character codes. In the second stage of presearch, contracted text data is generated in advance for every document by eliminating adjuncts and duplicated occurrences of repeated words from the text data, and, from the documents extracted by the first stage, those containing the search subject key word as a word are extracted. After the second stage of presearch, a text search is performed in accordance with a neighbor condition, a contextual condition, or the like.
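The second-stage presearch can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes whitespace-separated English words, and `ADJUNCTS`, `contract_text` and `second_presearch` are hypothetical names, with `ADJUNCTS` standing in for the adjuncts (such as particles) that the abstract says are removed. It shows why the stage is useful: a document can pass the character-level first stage yet fail the word-level check.

```python
# Hypothetical adjunct list; the abstract does not enumerate the adjuncts it removes.
ADJUNCTS = {"the", "a", "an", "of", "to", "in"}


def contract_text(text: str) -> set[str]:
    """Contracted text: adjuncts dropped, repeated words kept only once."""
    return {w for w in text.lower().split() if w not in ADJUNCTS}


def second_presearch(candidates: dict[int, str], keyword: str) -> list[int]:
    """From first-stage candidates, keep documents containing the key word as a word."""
    # The method prepares contracted text in advance; it is built here for brevity.
    contracted = {doc_id: contract_text(text) for doc_id, text in candidates.items()}
    key = keyword.lower()
    return sorted(doc_id for doc_id, words in contracted.items() if key in words)


candidates = {
    1: "the search of the search index",
    2: "research in the archive",  # has every character of "search", but not the word
}
print(second_presearch(candidates, "search"))  # [1]
```

Storing each document's words as a set gives the duplicate elimination for free and makes the per-document word check a constant-time lookup on average.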
Abstract:
A method and apparatus for performing a document information search that finds the text data containing a given search subject key word within a group of document text data stored in a memory. In the document information search method, two stages of presearch are carried out to search the documents for a desired subject key word. In the first stage of presearch, a character component table is generated that records, for every character code contained in the group of stored document text data, which documents contain that code. The character component table is searched for all the character codes making up a designated search subject key word, thereby extracting all documents that contain every one of those character codes. The presearch thus eliminates all texts that cannot possibly contain the search subject key word, so that a comprehensive text search can then be performed on the narrowed set of documents in accordance with the search subject key word.
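One way to picture the character component table is as one bit per document for each character code, so that the first-stage presearch reduces to a bitwise AND over the key word's characters. The Python sketch below is an assumption about representation (the abstract does not specify a storage layout), and `build_component_table` and `first_presearch` are hypothetical names.

```python
from collections import defaultdict


def build_component_table(docs: list[str]) -> dict[str, int]:
    """Map each character code to a bitmask; bit i is set if document i contains it."""
    table: dict[str, int] = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for ch in set(text):
            table[ch] |= 1 << doc_id
    return dict(table)


def first_presearch(table: dict[str, int], keyword: str, n_docs: int) -> list[int]:
    """Return ids of documents containing every character code of the key word."""
    mask = (1 << n_docs) - 1  # start with all documents
    for ch in set(keyword):
        mask &= table.get(ch, 0)  # a character absent everywhere eliminates all documents
    return [i for i in range(n_docs) if mask >> i & 1]


docs = ["data search", "image filing", "research notes"]
table = build_component_table(docs)
print(first_presearch(table, "search", len(docs)))  # [0, 2]
```

Documents that survive are only candidates: character order and word boundaries are ignored here and must be checked by the later stages.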
Abstract:
A method and apparatus for performing document information searches, and a magnetic disk unit used to realize the method and apparatus. In the document information search method, two stages of presearch are carried out when searching documents for a desired subject key word. In the first stage of presearch (step 402), a character component table (500) is generated that records, for every character code contained in the group of stored document text data, which documents contain that code; the table is then searched for all the character codes making up a user-designated search subject key word, thereby extracting every document that contains all of those character codes. In the second stage of presearch (step 403), contracted text data is generated in advance for every document by eliminating adjuncts and duplicated occurrences of repeated words from the text data, and, from the documents extracted by the first stage, those containing the search subject key word as a word are extracted. After the second stage of presearch, a text search is performed in accordance with a neighbor condition, a contextual condition, or the like (step 404). Further, dedicated hardware (1106) that performs term comparison with a finite automaton is employed as the term comparator means.
Further, to handle different notations and synonyms, input terms are first subjected to different notation development in a different notation development processing portion (2601); each of the resulting terms is then subjected to synonym development in a synonym development processing portion (2602) by reference to a synonym dictionary; and the results of synonym development are further subjected to different notation development in a different notation development processing portion (2603) in accordance with a conversion rule table (2603).
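The three-step expansion (notation, then synonyms, then notation again) can be sketched in Python as below. The tables are hypothetical stand-ins: `NOTATION_RULES` plays the role of the conversion rule table and `SYNONYMS` the synonym dictionary, using English spelling variants instead of the notation variants the original system targets. The example shows why both notation passes matter: the synonym dictionary is keyed on only one spelling, and the synonym it returns has its own variant.

```python
# Hypothetical conversion rules: each term maps to its variant notations.
NOTATION_RULES: dict[str, list[str]] = {
    "analyse": ["analyze"],
    "analyze": ["analyse"],
    "scrutinize": ["scrutinise"],
    "scrutinise": ["scrutinize"],
}
# Hypothetical synonym dictionary, keyed on a single notation only.
SYNONYMS: dict[str, list[str]] = {"analyze": ["scrutinize"]}


def expand_notation(terms: set[str]) -> set[str]:
    """Add every variant notation listed in the rule table for each term."""
    out = set(terms)
    for t in terms:
        out.update(NOTATION_RULES.get(t, []))
    return out


def expand_synonyms(terms: set[str]) -> set[str]:
    """Add dictionary synonyms for each term."""
    out = set(terms)
    for t in terms:
        out.update(SYNONYMS.get(t, []))
    return out


def expand_query(term: str) -> set[str]:
    # Notation -> synonym -> notation, mirroring portions 2601, 2602 and 2603.
    return expand_notation(expand_synonyms(expand_notation({term})))


print(sorted(expand_query("analyse")))
# ['analyse', 'analyze', 'scrutinise', 'scrutinize']
```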
Abstract:
A compact character string retrieving system capable of producing correct matching results without omission even when multiple matching occurs, that is, when a plurality of search terms match one character string in a finite automaton. Instead of returning to the initial state, the transition caused by the trailing character of each search term leads to a newly created destination state, which is stored in a state transition table storage. On the basis of the source state number and a specified pattern character code, the destination state number is read out from the state transition table storage. When the state number read out represents the destination state of the transition caused by the trailing character of a specified pattern character string, the identifier of that string is output. The identifier of each matched search term is represented by one bit of information, and the group of corresponding flags is stored in one slot, so that multiple matching can be performed without omission. The character string retrieving system is thereby implemented in a reduced size.
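The bit-flag idea for multiple matching can be illustrated with the textbook Aho-Corasick automaton, swapped in here as a software analogue; it is not the patented hardware. Each term's trailing character leads to its own state, each state carries an integer whose bit i stands for term i, and flags are merged along failure links so that "he" is reported together with "she". Unlike the abstract's state transition table indexed by (source state, character), this sketch uses per-state dictionaries plus failure links; `build_automaton` and `search` are hypothetical names.

```python
from collections import deque


def build_automaton(terms: list[str]) -> tuple[list[dict[str, int]], list[int], list[int]]:
    """Return (goto table, failure links, match flags); bit i of a flag word means terms[i]."""
    goto: list[dict[str, int]] = [{}]
    flags: list[int] = [0]
    for i, term in enumerate(terms):
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                flags.append(0)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        flags[state] |= 1 << i  # trailing character: set this term's one-bit flag
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail back to the initial state
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            flags[t] |= flags[fail[t]]  # also report shorter terms ending here
            queue.append(t)
    return goto, fail, flags


def search(text: str, goto, fail, flags) -> list[tuple[int, int]]:
    """Return (end position, flag word) for every position where some term ends."""
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if flags[state]:
            hits.append((pos, flags[state]))
    return hits


terms = ["he", "she", "his", "hers"]
hits = search("ushers", *build_automaton(terms))
for pos, word in hits:
    print(pos, [t for i, t in enumerate(terms) if word >> i & 1])
# 3 ['he', 'she']
# 5 ['hers']
```

Packing the matches into one integer per state is what lets a single lookup report several terms at once.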
Abstract:
An image filing apparatus and method for receiving an image corresponding to a document as multivalue data, converting the data to binary image data, and storing the binary image data. According to the invention, the pixels of the portion of an input image that corresponds to a particular color are extracted, and luminance data expressing monochromatic binary image data and binary image data designating the colored portions are stored in different planes. The luminance plane describes not only black pixels but also pixels expressed in a particular color such as red. Pixels of a particular color to be expressed as "red", such as red characters, are extracted and recorded as "1" in a separate color plane, for example an R-plane distinct from the luminance plane. The R-plane holds binary image data in which only pixels written in "red" are "1" and all other pixels are "0". On output, pixels whose luminance-plane value is "1" and whose R-plane value is "0" are displayed in black, and pixels whose R-plane value is "1" are displayed in red.
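A minimal Python sketch of the two-plane scheme follows. The color test (`threshold` and the "red" rule) is an assumption, since the abstract does not say how red pixels are extracted; `split_planes` and `render` are hypothetical names. The output rule in `render` follows the abstract directly.

```python
Pixel = tuple[int, int, int]


def split_planes(rgb: list[list[Pixel]], threshold: int = 128) -> tuple[list[list[int]], list[list[int]]]:
    """Split an RGB image into a luminance plane and an R-plane (1 = marked pixel)."""
    lum_plane, r_plane = [], []
    for row in rgb:
        lum_row, r_row = [], []
        for r, g, b in row:
            is_red = r >= threshold and g < threshold and b < threshold  # assumed red test
            is_dark = (r + g + b) / 3 < threshold
            # Red pixels are written to the luminance plane as well as the R-plane.
            lum_row.append(1 if is_dark or is_red else 0)
            r_row.append(1 if is_red else 0)
        lum_plane.append(lum_row)
        r_plane.append(r_row)
    return lum_plane, r_plane


def render(lum_plane: list[list[int]], r_plane: list[list[int]]) -> list[str]:
    """R-plane 1 -> red 'R'; luminance 1 and R-plane 0 -> black 'K'; otherwise white 'W'."""
    return ["".join("R" if r else "K" if lum else "W" for lum, r in zip(lum_row, r_row))
            for lum_row, r_row in zip(lum_plane, r_plane)]


image = [[(0, 0, 0), (230, 20, 30), (250, 250, 250)]]
lum, red = split_planes(image)
print(lum, red, render(lum, red))  # [[1, 1, 0]] [[0, 1, 0]] ['KRW']
```

Because red pixels also appear in the luminance plane, a monochrome device can print the luminance plane alone and still show the red characters, in black.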
Abstract:
An image processing system in which, for an input composite image composed of a line image and a dither image, line image processing and dither image processing are carried out in parallel, and one of the two processed results is selected in accordance with the result of image region discrimination. The dither image processing consists of data conversion that calculates a multivalued gray scale image from the input image data, gray scale conversion that adjusts the gray scale image data to match an output device, and re-binarization of the gray scale image data after the gray scale conversion. Image region discrimination, which determines whether an image region is a line image or a dither image, is based on the ratio of the number of black or white pixels within the region to the contour line length within the region. An ordered dither image produced with a screened-type dither matrix is discriminated in accordance with the correlation between adjacent pixel trains, each having a predetermined number of pixels.
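The region-discrimination feature, black pixels per unit of contour length, can be computed as in the Python sketch below. Both the contour definition (black-pixel edges facing a white pixel or the region border, in the 4-neighborhood) and the decision rule with its `threshold` are assumptions for illustration; the abstract names the ratio but not how it is thresholded. `black_to_contour_ratio` and `classify_region` are hypothetical names.

```python
def black_to_contour_ratio(region: list[list[int]]) -> float:
    """Black pixels (1) divided by contour length, counted as 4-neighbor edges to white or border."""
    h, w = len(region), len(region[0])
    black = contour = 0
    for y in range(h):
        for x in range(w):
            if not region[y][x]:
                continue
            black += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or not region[ny][nx]:
                    contour += 1
    # A region with no black pixels returns infinity and is treated as a line region.
    return black / contour if contour else float("inf")


def classify_region(region: list[list[int]], threshold: float = 0.5) -> str:
    # Hypothetical rule: scattered dither dots give a long contour per black pixel.
    return "dither" if black_to_contour_ratio(region) < threshold else "line"


checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
bar = [[0] * 4, [1] * 4, [1] * 4, [0] * 4]
print(black_to_contour_ratio(checker), classify_region(checker))  # 0.25 dither
print(round(black_to_contour_ratio(bar), 3), classify_region(bar))  # 0.667 line
```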
Abstract:
Shading can be removed from an input document image to be filed, producing a deshaded image that is well suited to highly efficient compression encoding and can then be stored and transmitted in that efficiently encoded form. When the image is retrieved or received, shading may be restored to the decoded, deshaded image by combining it with a shaded template image stored in a template image memory, or by synthesizing shading in specified regions. Removing shading from regions can also improve optical character recognition, shading identification, and other processing such as smear prevention.
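A minimal Python sketch of deshading and reshading on binary images, assuming the shading pattern is known exactly as a binary template image. The abstract does not state how shading is detected or removed, so the pixel-wise AND-NOT / OR operations, the dot pattern in `synthesize_shading`, and all function names here are hypothetical.

```python
Image = list[list[int]]  # binary image, 1 = black


def deshade(image: Image, template: Image) -> Image:
    """Clear every pixel that belongs to the shading template."""
    return [[p & (1 - t) for p, t in zip(row, trow)] for row, trow in zip(image, template)]


def reshade(deshaded: Image, template: Image) -> Image:
    """Return shading to a decoded image by OR-ing in the stored template."""
    return [[p | t for p, t in zip(row, trow)] for row, trow in zip(deshaded, template)]


def synthesize_shading(image: Image, box: tuple[int, int, int, int], period: int = 2) -> Image:
    """Add a regular dot pattern inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            if x % period == 0 and y % period == 0:
                out[y][x] = 1
    return out


template = [[1, 0, 1, 0], [0, 0, 0, 0]]  # shading dots
image = [[1, 1, 1, 0], [0, 1, 0, 0]]     # shading plus a vertical text stroke
clean = deshade(image, template)
print(clean)                                    # [[0, 1, 0, 0], [0, 1, 0, 0]]
print(reshade(clean, template) == image)        # True
print(synthesize_shading(clean, (0, 0, 4, 2)))  # [[1, 1, 1, 0], [0, 1, 0, 0]]
```

The deshaded image has longer runs of identical pixels, which is what makes run-length-based encodings more efficient on it.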
Abstract:
A partial region of an image is designated as a secret region, and an ID code is assigned to the data concerning this secret region and stored with it in a memory. Unless an ID code input from a keyboard matches the stored ID code, the image is selectively prevented from being displayed on a CRT display, printed by a printer, or subjected to additional writing, revision, or cutting on the display screen. It is thus possible to restrict the visual output or revision of secret portions of images, and therefore to control individually the images that need to be kept secret and those that do not.
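The access rule can be sketched in Python as follows. `SecretRegion`, `visible_image` and `may_edit` are hypothetical names; blanking a locked region to 0 (white) is an assumed way of "preventing display", and the same masked image would be what goes to a printer. `hmac.compare_digest` from the standard library is used only to compare the entered code with the stored one without timing leaks.

```python
import hmac
from dataclasses import dataclass


@dataclass
class SecretRegion:
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    id_code: str

    def contains(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def unlocked_by(self, entered: str) -> bool:
        # Constant-time comparison of the keyboard input with the stored ID code.
        return hmac.compare_digest(entered.encode(), self.id_code.encode())


def visible_image(image: list[list[int]], regions: list[SecretRegion], entered: str) -> list[list[int]]:
    """Blank (set to 0) every secret region whose ID code does not match."""
    out = [row[:] for row in image]
    for reg in regions:
        if not reg.unlocked_by(entered):
            for y in range(reg.y0, reg.y1):
                for x in range(reg.x0, reg.x1):
                    out[y][x] = 0
    return out


def may_edit(x: int, y: int, regions: list[SecretRegion], entered: str) -> bool:
    """Writing, revision or cutting at (x, y) is refused inside a locked region."""
    return all(reg.unlocked_by(entered) or not reg.contains(x, y) for reg in regions)


image = [[1, 1, 1], [1, 1, 1]]
regions = [SecretRegion(1, 0, 3, 1, "1234")]
print(visible_image(image, regions, "0000"))  # [[1, 0, 0], [1, 1, 1]]
print(visible_image(image, regions, "1234"))  # [[1, 1, 1], [1, 1, 1]]
print(may_edit(2, 0, regions, "0000"), may_edit(0, 1, regions, "0000"))  # False True
```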
Abstract:
A color document image processing apparatus comprising: an image input means for inputting document image data including a multivalue color image; a binarizing means for binarizing the input document image data by simple binarization or an artificial binary-halftone process; an image memory means for temporarily storing the image data binarized by the binarizing means; a codec means for applying predetermined coding to the image data held in the image memory means for storage, and for decoding the stored document image data; an image storing means for storing the document image data encoded by the codec means; a binary-halftone transducing means for transducing the binary image data decoded by the codec means into multivalue image data; and an image output means for outputting the multivalue image data produced by the binary-halftone transducing means.
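To illustrate the binarizing and binary-halftone transducing steps, the Python sketch below pairs a 2x2 ordered dither with block averaging. Both techniques are assumptions chosen for simplicity: the abstract does not name the halftone process or the transducing method, and `BAYER_2X2`, `dither` and `binary_to_multivalue` are hypothetical names. The codec stage is omitted.

```python
BAYER_2X2 = [[0, 2], [3, 1]]  # ordered-dither index matrix (assumed)


def dither(gray: list[list[int]]) -> list[list[int]]:
    """Binarize 0-255 gray levels with a 2x2 ordered dither (1 = bright)."""
    return [[1 if v > (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4 else 0
             for x, v in enumerate(row)]
            for y, row in enumerate(gray)]


def binary_to_multivalue(binary: list[list[int]], size: int = 2) -> list[list[int]]:
    """Recover gray levels by averaging each size x size block of binary pixels."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, size):
        for bx in range(0, w, size):
            cells = [(y, x) for y in range(by, min(by + size, h))
                     for x in range(bx, min(bx + size, w))]
            level = round(255 * sum(binary[y][x] for y, x in cells) / len(cells))
            for y, x in cells:
                out[y][x] = level
    return out


gray = [[128] * 4 for _ in range(4)]
halftone = dither(gray)
print(halftone[0], halftone[1])           # [1, 0, 1, 0] [0, 1, 0, 1]
print(binary_to_multivalue(halftone)[0])  # [128, 128, 128, 128]
```

Matching the averaging block to the dither matrix size is what lets a flat gray area round-trip exactly; real transducers typically use smoother filters at the cost of some sharpness.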
Abstract:
An image processing system in which, for an input composite image composed of a line image and a dither image, line image processing and dither image processing are carried out in parallel, and one of the processed results is selected in accordance with the image region discrimination result. The dither image processing is carried out through data conversion that calculates a multivalued gray scale image from the input image data, gray scale data conversion that adjusts the gray scale image data to match an output device, and re-binarization of the gray scale image data after it has undergone the gray scale conversion. Image region discrimination, which determines whether an image region is a line image or a dither image, is based on the ratio of the number of black or white pixels within the region to the contour line length within the region. An ordered dither image produced with a screened-type dither matrix is discriminated in accordance with the correlation between adjacent pixel trains, each having a predetermined number of pixels.
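The ordered-dither test can be sketched as a similarity score between consecutive fixed-length pixel trains in a row. In this Python illustration, "correlation" is taken as the fraction of agreeing pixels, and the train length `n` and the `threshold` are hypothetical parameters; the abstract does not define the exact measure. `train_correlation` and `looks_like_ordered_dither` are hypothetical names.

```python
def train_correlation(row: list[int], n: int) -> float:
    """Mean agreement between consecutive n-pixel trains of a binary row (1.0 = identical)."""
    trains = [row[i:i + n] for i in range(0, len(row) - n + 1, n)]
    if len(trains) < 2:
        return 0.0
    scores = [sum(p == q for p, q in zip(a, b)) / n for a, b in zip(trains, trains[1:])]
    return sum(scores) / len(scores)


def looks_like_ordered_dither(row: list[int], n: int = 4, threshold: float = 0.9) -> bool:
    # Hypothetical decision rule: a screened dither matrix repeats with a fixed period,
    # so trains whose length is a multiple of that period agree almost everywhere.
    return train_correlation(row, n) >= threshold


screened = [1, 0, 0, 0] * 4
text_like = [1, 1, 1, 1, 0, 0, 0, 0] * 2
print(train_correlation(screened, 4), looks_like_ordered_dither(screened))    # 1.0 True
print(train_correlation(text_like, 4), looks_like_ordered_dither(text_like))  # 0.0 False
```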