Character string retrieving system and method
    4.
    Granted invention patent
    Status: Expired

    Publication No.: US5140644A

    Publication date: 1992-08-18

    Application No.: US733982

    Filing date: 1991-07-22

    IPC classification: G06F17/30 G06K9/62 G06K9/72

    Abstract: A compact character string retrieving system that correctly produces matching results without omission even when multiple matching occurs, that is, when a finite automaton matches a plurality of search terms against one character string. For the transition brought about by the trailing character of a search term, a new destination state is created instead of returning to the initial state. A state transition table storage stores the destination states. On the basis of the source state number and a specified pattern character code, the destination state number is read out from the state transition table storage. When the state number read out represents the destination state of the transition brought about by the trailing character of a specified pattern character string, the identifier of that string is output. The identifier of each matched search term is represented by one bit of information, and the group of corresponding flags is stored in one slot. Multiple matching can thus be performed without omission, and the system can be implemented in a reduced size.
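    The one-bit-per-term flags described in the abstract can be illustrated with a minimal software sketch. It assumes the classic Aho-Corasick construction (trie edges plus failure links), which the abstract does not name, and models each state's slot as a Python integer in which bit i flags terms[i]. The names build_automaton and search are hypothetical.

```python
from collections import deque


def build_automaton(terms):
    """Build trie edges, failure links, and one bit-flag slot per state."""
    goto = [{}]   # goto[state][char] -> destination state number
    flags = [0]   # bit i set => terms[i] ends at this state
    for i, term in enumerate(terms):
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                flags.append(0)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        # The trailing character leads to its own state, not back to state 0.
        flags[state] |= 1 << i

    # Breadth-first pass: compute failure links and merge flag slots so a
    # state also reports every shorter term that ends at the same position.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            flags[t] |= flags[fail[t]]
            queue.append(t)
    return goto, fail, flags


def search(text, terms):
    """Return (end_position, matched_terms) for every position with a match."""
    goto, fail, flags = build_automaton(terms)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        slot = flags[state]
        if slot:
            matched = [terms[i] for i in range(len(terms)) if (slot >> i) & 1]
            hits.append((pos, matched))
    return hits
```

    For the text "ushers" and the terms he, she, his, hers, the state reached after "she" carries the flags of both she and he, so neither match is dropped.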


    System for plural-string search with a parallel collation of a first
partition of each string followed by finite automata matching of second
partitions
    5.
    Granted invention patent
    Status: Expired

    Publication No.: US5452451A

    Publication date: 1995-09-19

    Application No.: US349124

    Filing date: 1994-12-01

    IPC classification: G06F17/30

    Abstract: A parallel comparator is provided in a front stage of an automaton executing device. The comparator performs parallel, high-speed collation of partial character strings, each taken from part of one of a plurality of character strings of interest to be searched for, against a character string to be searched, in which the document data to be searched is arranged sequentially from its leading character. Only when a part of the character string to be searched coincides with a partial character string set in the comparator is the remaining portion of the character string collated by the automaton executing device. It is also possible to set a "don't care" condition, in which the character at any given position in the partial character string is ignored during comparison, and a negation condition, in which the comparison is made against the negation of the character at any given position in the partial character string.
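    The two-stage flow can be sketched in software as follows. This is only an illustration under stated assumptions: the hardware comparator's parallelism is simulated by a loop, the second stage uses a plain string comparison as a stand-in for the automaton executing device, and the names ANY, Not, prefix_matches, and two_stage_search are hypothetical.

```python
ANY = None  # "don't care": any character is accepted at this position


class Not:
    """Negation condition: accepts any character except `ch`."""

    def __init__(self, ch):
        self.ch = ch


def cell_matches(cell, ch):
    if cell is ANY:
        return True
    if isinstance(cell, Not):
        return ch != cell.ch
    return ch == cell


def prefix_matches(prefix, window):
    """First stage: compare one window of text against one stored prefix."""
    return len(window) == len(prefix) and all(
        cell_matches(cell, ch) for cell, ch in zip(prefix, window)
    )


def two_stage_search(text, terms):
    """terms is a list of (prefix_cells, remainder_string) pairs.

    Returns (start_position, term_index) for each full match.
    """
    hits = []
    for pos in range(len(text)):
        for idx, (prefix, rest) in enumerate(terms):
            window = text[pos:pos + len(prefix)]
            # The remainder is checked only when the prefix comparison passes.
            if prefix_matches(prefix, window):
                if text.startswith(rest, pos + len(prefix)):
                    hits.append((pos, idx))
    return hits
```

    With the terms [(["c", ANY], "t"), ([Not("c"), "a"], "r")], the first term matches "cat" and "cut", and the second matches "bar" but not "car".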


    Document retrieval method and system
    6.
    Granted invention patent
    Status: Expired

    Publication No.: US5757983A

    Publication date: 1998-05-26

    Application No.: US517722

    Filing date: 1995-08-21

    Abstract: A document retrieval method and system for retrieving, from a document database storing document data in the form of character codes, a document which contains given search terms and which meets a given search query condition. Documents loaded from the document database are searched for terms that match the search terms, and for each document found, document identification (ID) information is generated that includes the document identifier, the match terms found, their term identifiers, and their positions in the document. A decision is then made as to whether the position information of the match terms satisfies a positional condition, specified in the search query condition, concerning the positional relation between the search terms; when it does, match information indicating satisfaction of the search query condition is generated. A proximity condition decision ascertains whether the match terms satisfy an inter-term distance condition specified in the search query condition. A contextual condition decision determines whether the match terms satisfy a concurrence condition requiring the search terms to occur in the same sub-sentence, sentence, or paragraph. A logical condition decision determines whether the match terms satisfy a logical condition between the search terms specified in the search query condition.
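    The proximity and contextual decisions can be illustrated with a small sketch that works on recorded match positions. It rests on simplifying assumptions that are not in the abstract: sentences end at a period, paragraphs are separated by a blank line, and distance is measured in characters. The names Match, find_matches, proximity, and same_unit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    term: str
    pos: int        # character offset of the match in the document
    sentence: int   # number of periods before the match
    paragraph: int  # number of blank-line breaks before the match


def find_matches(text, terms):
    """Record every occurrence of every search term with its position info."""
    matches = []
    for term in terms:
        start = text.find(term)
        while start != -1:
            sentence = text.count(".", 0, start)
            paragraph = text.count("\n\n", 0, start)
            matches.append(Match(term, start, sentence, paragraph))
            start = text.find(term, start + 1)
    return matches


def proximity(matches, a, b, max_gap):
    """Proximity decision: some a and some b lie within max_gap characters."""
    return any(
        abs(x.pos - y.pos) <= max_gap
        for x in matches if x.term == a
        for y in matches if y.term == b
    )


def same_unit(matches, a, b, unit):
    """Contextual decision: a and b occur in the same 'sentence' or 'paragraph'."""
    return any(
        getattr(x, unit) == getattr(y, unit)
        for x in matches if x.term == a
        for y in matches if y.term == b
    )
```

    A logical condition then reduces to combining these boolean results, for example proximity(...) and not same_unit(...).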


    System for character stream search using finite state automaton technique
    7.
    Granted invention patent
    Status: Expired

    Publication No.: US5051886A

    Publication date: 1991-09-24

    Application No.: US205923

    Filing date: 1988-06-13

    IPC classification: G06F17/21 G06F17/30

    Abstract: A character stream search system using a finite state automaton (FSA) for determining at one time whether a plurality of character streams serving as search objects exist in a search character stream which undergoes a search operation and which comprises a plurality of characters expressed as codes. In the system, the search character stream is collated against search object characters. When the collation finds a matched search object character, a state transition is carried out to a predetermined state indicated by the FSA. When no matched search object character exists, failure processing is performed to effect a state transition to a transition destination determined in association with the configuration of the FSA. For each character subjected to the search operation, the failure processing is completed within a count given by a predetermined upper-limit value.


    Character stream search apparatus using a finite state automation
    8.
    Granted invention patent
    Status: Expired

    Publication No.: US5278981A

    Publication date: 1994-01-11

    Application No.: US761442

    Filing date: 1991-09-18

    Abstract: A character stream search system using a finite state automaton (FSA) for determining at one time whether a plurality of character streams serving as search objects exist in a search character stream which undergoes a search operation and which comprises a plurality of characters expressed as codes. In the system, the search character stream is collated against search object characters. When the collation finds a matched search object character, a state transition is carried out to a predetermined state indicated by the FSA. When no matched search object character exists, failure processing is performed to effect a state transition to a transition destination determined in association with the configuration of the FSA. For each character subjected to the search operation, the failure processing is completed within a count given by a predetermined upper-limit value.


    Document data processing method and apparatus for document retrieval
    9.
    Granted invention patent
    Status: Expired

    Publication No.: US5469354A

    Publication date: 1995-11-21

    Application No.: US843162

    Filing date: 1992-02-28

    Abstract: A high-speed full-document retrieval method and system capable of providing retrieval results within a practically acceptable short search time. When documents are registered in a document database, condensed texts are created by decomposing each textual character string of the documents to be registered into fragmental character strings according to character species and by checking the mutual inclusion relations among the fragmental character strings. A component character table is created in which the characters occurring in each condensed text are registered without duplication. The condensed texts and the component character table are registered in the database together with the texts of the documents. To retrieve a document containing a search term designated by a user, a component character table search is executed first, consulting the component character table to extract those documents which contain all species of characters constituting the search term; a condensed text search is then executed by consulting the condensed texts of those documents. Finally, a text body search consults the texts of the documents extracted by the component character table search and the condensed text search to extract a document which satisfies the query condition imposed on the search term.
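    The three-stage filtering can be sketched as follows. This is a simplified illustration, not the patented procedure: "character species" is approximated by runs of ASCII letters versus runs of digits, the search term is assumed to consist of a single species, and the names condense, register, and retrieve are hypothetical.

```python
import re


def condense(text):
    """Split text into same-species fragments and drop contained fragments."""
    fragments = set(re.findall(r"[A-Za-z]+|[0-9]+", text))
    # Mutual inclusion check: keep a fragment only if no other fragment
    # already contains it as a substring.
    return sorted(
        f for f in fragments
        if not any(f != g and f in g for g in fragments)
    )


def register(doc_id, text, db):
    """Store the text, its condensed text, and its component character table."""
    condensed = condense(text)
    db[doc_id] = {
        "text": text,
        "condensed": condensed,
        "chars": set("".join(condensed)),  # each character registered once
    }


def retrieve(term, db):
    """Return ids of documents that pass all three search stages."""
    hits = []
    for doc_id, rec in db.items():
        # Stage 1: component character table search.
        if not set(term) <= rec["chars"]:
            continue
        # Stage 2: condensed text search.
        if not any(term in frag for frag in rec["condensed"]):
            continue
        # Stage 3: text body search.
        if term in rec["text"]:
            hits.append(doc_id)
    return hits
```

    Each stage is cheaper than the next, so most non-matching documents are rejected before the full text is scanned. In this sketch a term such as "tab" is rejected for a document without the letter b at stage 1.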


    Image filing apparatus and method for thereby encoding and storing
various documents
    10.
    Granted invention patent
    Status: Expired

    Publication No.: US5231482A

    Publication date: 1993-07-27

    Application No.: US761770

    Filing date: 1991-08-15

    IPC classification: H04N1/41 H04N1/64

    CPC classification: H04N1/642 H04N1/4105 H04N1/64

    Abstract: An image filing apparatus and method for receiving, as multivalue data, an image corresponding to a document, converting the data to binary image data, and storing the binary image data. Pixels of the portion of an input image corresponding to a particular color are extracted, and luminance data expressing monochromatic binary image data and binary image data designating colored portions are stored in different planes. The plane for the luminance data describes not only black pixels but also pixels expressed in a particular color such as red. Pixels having a particular color, such as red characters, are extracted and recorded as "1" in a separate plane for that color, for example an R-plane distinct from the luminance plane. The R-plane holds binary image data in which only pixels written in red are "1" and all other pixels are "0". On output, pixels for which the luminance plane is "1" and the R-plane is "0" are displayed in black, and pixels for which the R-plane is "1" are displayed in red.
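    The two-plane storage and the output rule can be sketched as below. The sketch assumes the input has already been classified into the color names "black", "red", and "white"; the conversion from multivalue data is not shown. The names encode and decode are hypothetical.

```python
def encode(pixels):
    """Split a grid of color names into a luminance plane and an R-plane."""
    # The luminance plane marks black pixels and red pixels alike.
    lum = [[1 if p in ("black", "red") else 0 for p in row] for row in pixels]
    # The R-plane marks only the red pixels.
    red = [[1 if p == "red" else 0 for p in row] for row in pixels]
    return lum, red


def decode(lum, red):
    """Apply the output rule: R-plane 1 -> red, luminance-only 1 -> black."""
    out = []
    for lum_row, red_row in zip(lum, red):
        row = []
        for l_bit, r_bit in zip(lum_row, red_row):
            if r_bit == 1:
                row.append("red")
            elif l_bit == 1:
                row.append("black")
            else:
                row.append("white")
        out.append(row)
    return out
```

    Because red pixels are also set in the luminance plane, a viewer that reads only that plane still shows the red characters, rendered as ordinary black text.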

    PCT information: PCT No. PCT/JP90/01630; Sec. 371 date: 1991-08-15; Sec. 102(e) date: 1991-08-15; PCT filed: 1990-12-13; PCT Pub. No. WO91/09488; PCT Pub. date: 1991-06-27.