System for plural-string search with a parallel collation of a first
partition of each string followed by finite automata matching of second
partitions
    1.
    Granted invention patent
    System for plural-string search with a parallel collation of a first partition of each string followed by finite automata matching of second partitions (expired)

    Publication No.: US5452451A

    Publication date: 1995-09-19

    Application No.: US349124

    Filing date: 1994-12-01

    IPC classification: G06F17/30

    Abstract: A parallel comparator is provided in a front stage of an automaton executing device. The comparator collates, in parallel and at high speed, partial character strings, each taken from one of a plurality of search strings of interest, against the text to be searched, in which document data is arranged sequentially from a leading character. Only when a part of the text to be searched coincides with a partial character string set in the comparator is the remaining portion of the text collated by the automaton executing device. It is also possible to set a "don't care" condition, in which a character at any position in the partial character string is ignored during comparison by the comparator, and a negation condition, in which the comparison is made against the negation of a character at any position in the partial character string.
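
The two-stage idea in this abstract can be sketched in software. The following is only an illustration under stated assumptions, not the patented hardware: the names `ANY`, `Not`, `prefilter_hits` and `two_stage_search` are hypothetical, all head patterns are assumed to have the same length, the parallel comparator is imitated by a sequential loop, and a plain `str.startswith` check stands in for the automaton executing device.

```python
from dataclasses import dataclass

ANY = None  # hypothetical marker for a "don't care" position


@dataclass(frozen=True)
class Not:
    """Hypothetical negation condition: matches any character except `ch`."""
    ch: str


def cell_matches(cell, ch):
    """Compare one pattern cell with one text character."""
    if cell is ANY:
        return True
    if isinstance(cell, Not):
        return ch != cell.ch
    return ch == cell


def prefilter_hits(window, heads):
    """Stage 1: indices of the head patterns that match the window.

    A hardware comparator would test all heads at once; this loop is a
    sequential stand-in.
    """
    return [i for i, head in enumerate(heads)
            if len(window) == len(head)
            and all(cell_matches(c, ch) for c, ch in zip(head, window))]


def two_stage_search(text, terms):
    """terms: list of (head, tail); head is a tuple of cells, tail a string.

    Returns (position, term_index) pairs for every full match.
    """
    heads = [head for head, _ in terms]
    width = len(heads[0])  # assumption: every head has the same length
    found = []
    for pos in range(len(text) - width + 1):
        for i in prefilter_hits(text[pos:pos + width], heads):
            tail = terms[i][1]
            # Stage 2: the remaining portion is checked only after a head hit.
            if text.startswith(tail, pos + width):
                found.append((pos, i))
    return found
```

For example, with the terms `(("c", "a"), "t")`, `((ANY, "o"), "g")` and `((Not("c"), "a"), "r")`, the text `"cat dog bar car"` yields matches for "cat", "dog" and "bar", while "car" is rejected by the negation condition on its first character.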

    Document retrieval method and system
    3.
    Granted invention patent
    Document retrieval method and system (expired)

    Publication No.: US5757983A

    Publication date: 1998-05-26

    Application No.: US517722

    Filing date: 1995-08-21

    Abstract: A document retrieval method and system for retrieving, from a document database storing document data in the form of character codes, a document which contains given search terms and which meets a given search query condition. Documents loaded from the document database are searched for a document containing terms that match the search terms, and document identification (ID) information is generated that includes a document identifier of the found document, the match terms found to match the search terms, term identifiers of the match terms, and position information of the match terms in the document. A decision is then made as to whether or not the position information of the match terms satisfies a positional condition, specified in the search query condition, concerning a positional relation between the search terms; match information indicating satisfaction of the search query condition is generated when the positional condition is satisfied. Through a proximity condition decision, it is ascertained whether the match terms satisfy an inter-term distance condition specified in the search query condition. Through a contextual condition decision, it is determined whether the match terms satisfy a concurrence condition specifying concurrence of the search terms in the same sub-sentence, the same sentence or the same paragraph. Through a logical condition decision, it is decided whether the match terms satisfy a logical condition between the search terms specified in the search query condition.
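
The position-based decisions described in this abstract can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names are hypothetical, sentences are taken to end at `.`, `!` or `?`, paragraphs at a blank line, and distance is measured in characters between term start positions.

```python
import re


def locate(text, terms):
    """Return {term: [(char_pos, sentence_no, paragraph_no), ...]}."""
    sent_of, para_of = [], []
    sent = para = 0
    for i, ch in enumerate(text):
        sent_of.append(sent)
        para_of.append(para)
        if ch in ".!?":
            sent += 1                      # next character starts a new sentence
        elif ch == "\n" and i > 0 and text[i - 1] == "\n":
            para += 1                      # blank line ends a paragraph
    hits = {}
    for term in terms:
        hits[term] = [(m.start(), sent_of[m.start()], para_of[m.start()])
                      for m in re.finditer(re.escape(term), text)]
    return hits


def within_distance(hits, a, b, max_chars):
    """Proximity condition: some pair of occurrences is close enough."""
    return any(abs(pa - pb) <= max_chars
               for pa, _, _ in hits[a] for pb, _, _ in hits[b])


def same_sentence(hits, a, b):
    """Contextual condition: both terms occur in one sentence."""
    return any(sa == sb for _, sa, _ in hits[a] for _, sb, _ in hits[b])


def same_paragraph(hits, a, b):
    """Contextual condition: both terms occur in one paragraph."""
    return any(qa == qb for _, _, qa in hits[a] for _, _, qb in hits[b])
```

A logical condition can then be expressed over the same match information, for instance `bool(hits["automaton"]) and not hits["trie"]` for "automaton AND NOT trie".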

    Range-conditional character string retrieving method and system
    6.
    Granted invention patent
    Range-conditional character string retrieving method and system (expired)

    Publication No.: US5138669A

    Publication date: 1992-08-11

    Application No.: US724161

    Filing date: 1991-07-01

    IPC classification: G06F17/30 G06K9/62 G06K9/72

    Abstract: A range-conditional character string retrieving method and system capable of retrieving a numerical value from a character string at increased speed by shortening the time taken to generate a finite automaton, of range-condition retrieval for a character string containing a mixture of numeric characters and non-numeric characters such as alphabetic letters, and of highly intelligent retrieval of a numerical value with designation of preceding and succeeding characters. A given range condition is partitioned in accordance with the difference in the number of digits between the upper and lower limit values, whereupon retrieval is performed in each of the partitioned ranges in parallel. When a finite automaton transits from a predetermined state to at least two states depending on the result of collation of a character string subjected to retrieval, conditions for the state transitions are designated in terms of corresponding codes. A numerical value detecting unit for detecting a numerical value of interest from the character string subjected to retrieval is provided in association with a range decision unit for deciding whether the numerical value detected by the numerical value detecting unit falls within a specified range. A character string collating unit for retrieving a specific character string from the string subjected to retrieval is provided in association with a range condition collating unit for detecting a numerical value falling within a specific range from the specific character string.
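
The partitioning of a range by digit count can be sketched as follows. This is a simplified illustration, not the patented circuit: the function names are hypothetical, only non-negative integers with `lo <= hi` are handled, a regular expression stands in for the numerical value detecting unit, and the sub-ranges are checked one after another rather than in parallel. Within one sub-range both bounds have the same number of digits, so a digit string can be compared character by character.

```python
import re


def partition_by_digits(lo, hi):
    """Split [lo, hi] into sub-ranges whose bounds have equal digit counts."""
    parts = []
    while len(str(lo)) < len(str(hi)):
        top = 10 ** len(str(lo)) - 1       # largest value with lo's digit count
        parts.append((lo, top))
        lo = top + 1
    parts.append((lo, hi))
    return parts


def in_range(digits, parts):
    """Range decision on a digit string, one equal-length sub-range at a time."""
    digits = digits.lstrip("0") or "0"     # ignore leading zeros
    return any(len(digits) == len(str(a)) and str(a) <= digits <= str(b)
               for a, b in parts)


def numbers_in_range(text, lo, hi, before="", after=""):
    """Find numbers within [lo, hi], optionally framed by given characters."""
    parts = partition_by_digits(lo, hi)
    pattern = re.escape(before) + r"(?<!\d)(\d+)(?!\d)" + re.escape(after)
    return [int(m.group(1)) for m in re.finditer(pattern, text)
            if in_range(m.group(1), parts)]
```

For instance, the range 95 to 1200 splits into 95 to 99, 100 to 999 and 1000 to 1200, and the `after="mm"` argument restricts hits to numbers followed by the unit "mm".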

    System for character stream search using finite state automaton technique
    7.
    Granted invention patent
    System for character stream search using finite state automaton technique (expired)

    Publication No.: US5051886A

    Publication date: 1991-09-24

    Application No.: US205923

    Filing date: 1988-06-13

    IPC classification: G06F17/21 G06F17/30

    Abstract: A character stream search system using a finite state automaton (FSA) for determining at one time whether or not a plurality of character streams as search objects exist in a search character stream which undergoes a search operation and which comprises a plurality of characters expressed with codes. In the system, a collation is conducted between the search character stream and a search object character. Where a matched search object character exists as a result of the collation, a state transition is carried out to a predetermined state indicated by the FSA. Where no matched search object character exists, failure processing is performed to effect a state transition to a transition destination which is determined in association with the configuration of the FSA. The failure processing is completed at a count which is a predetermined upper-limit value for each character that has undergone the search operation.
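
Matching several search strings in one pass, with a failure transition taken on a mismatch, is the pattern of the well-known Aho-Corasick automaton. The sketch below uses that textbook algorithm as a stand-in; it is not claimed to be the patented system, it does not model the upper limit on failure processing mentioned in the abstract, and the names `build_fsa` and `search_fsa` are hypothetical.

```python
from collections import deque


def build_fsa(terms):
    """Build goto, failure and output tables for the given search terms."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure destination of each state
    out = [[]]    # terms recognised on entering each state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)
    # Breadth-first pass: failure links of shallow states are known first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search_fsa(text, terms):
    """Return (start_position, term) for every occurrence of every term."""
    goto, fail, out = build_fsa(terms)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]            # failure processing on a mismatch
        state = goto[state].get(ch, 0)     # normal state transition
        for term in out[state]:
            hits.append((pos - len(term) + 1, term))
    return hits
```

Each text character is read once; the inner `while` loop is the part that corresponds to failure processing.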

    Document data processing method and apparatus for document retrieval
    8.
    Granted invention patent
    Document data processing method and apparatus for document retrieval (expired)

    Publication No.: US5469354A

    Publication date: 1995-11-21

    Application No.: US843162

    Filing date: 1992-02-28

    Abstract: A high-speed full document retrieval method and system capable of providing retrieval results within a practically acceptable short search time. Upon registration of documents in a document database, condensed texts are created by decomposing each of the textual character strings of the documents to be registered into fragmental character strings according to character species and by checking mutual inclusion relations existing among the fragmental character strings. A component character table is created in which characters occurring in each of the condensed texts are registered without duplication. The condensed texts and the component character table are registered in the database together with the texts of the documents to be registered. Upon retrieval of a document containing a search term designated by a user, a component character table search is first executed to extract those documents which contain all species of characters constituting the search term by consulting the component character table, and subsequently a condensed text search is executed by consulting the condensed texts of the documents. Finally, a text body search is executed to extract a document which satisfies a query condition imposed on the search term by consulting the texts of the documents extracted through the component character table search and the condensed text search.
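
The three-stage narrowing described here (component character table, then condensed text, then text body) can be sketched as below. This is a loose illustration under several assumptions: the names `fragments`, `condense`, `register` and `retrieve` are hypothetical, only two character species (letters and digits) are distinguished, matching is case-insensitive, the database is a plain dictionary, and the final query condition is reduced to simple containment of the search term.

```python
import re


def fragments(text):
    """Split text into runs of one character species (letters or digits)."""
    return re.findall(r"[^\W\d_]+|\d+", text.lower())


def condense(text):
    """Keep only fragments not contained in another, longer kept fragment."""
    kept = []
    for frag in sorted(set(fragments(text)), key=len, reverse=True):
        if not any(frag in longer for longer in kept):
            kept.append(frag)
    return kept


def register(doc_id, text, db):
    """Store the text, its condensed text and its component character table."""
    condensed = condense(text)
    db[doc_id] = {
        "text": text,
        "condensed": condensed,
        "chars": set("".join(condensed)),  # each character registered once
    }


def retrieve(term, db):
    """Return ids of documents containing `term`, filtering in three stages."""
    parts = fragments(term)
    needed = set("".join(parts))
    hits = []
    for doc_id, entry in db.items():
        # Stage 1: component character table search.
        if not needed <= entry["chars"]:
            continue
        # Stage 2: condensed text search, fragment by fragment.
        if not all(any(p in frag for frag in entry["condensed"]) for p in parts):
            continue
        # Stage 3: text body search on the surviving documents only.
        if term.lower() in entry["text"].lower():
            hits.append(doc_id)
    return hits
```

The first two stages can only rule documents out; a document that passes both may still fail the text body search, which is why the full text is consulted last and only for the survivors.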

    Character stream search apparatus using a finite state automation
    9.
    Granted invention patent
    Character stream search apparatus using a finite state automation (expired)

    Publication No.: US5278981A

    Publication date: 1994-01-11

    Application No.: US761442

    Filing date: 1991-09-18

    Abstract: A character stream search system using a finite state automaton (FSA) for determining at one time whether or not a plurality of character streams as search objects exist in a search character stream which undergoes a search operation and which comprises a plurality of characters expressed with codes. In the system, a collation is conducted between the search character stream and a search object character. Where a matched search object character exists as a result of the collation, a state transition is carried out to a predetermined state indicated by the FSA. Where no matched search object character exists, failure processing is performed to effect a state transition to a transition destination which is determined in association with the configuration of the FSA. The failure processing is completed at a count which is a predetermined upper-limit value for each character that has undergone the search operation.

    Character string retrieving system and method
    10.
    Granted invention patent
    Character string retrieving system and method (expired)

    Publication No.: US5140644A

    Publication date: 1992-08-18

    Application No.: US733982

    Filing date: 1991-07-22

    IPC classification: G06F17/30 G06K9/62 G06K9/72

    Abstract: A compact character string retrieving system capable of correctly producing the result of matching without omission even upon occurrence of multiple matching, in which a plurality of search terms are matched for one character string by a finite automaton. A destination state for the transition brought about by the trailing character of a search term is newly created instead of using the initial state. A state transition table storage stores the destination state. On the basis of the source state number and a specified pattern character code, the destination state number is read out from the state transition table storage. When the state number read out represents the destination state of the transition brought about by the trailing character of the specified pattern character string, an identifier thereof is outputted. The identifiers of the matched search terms are each represented by one bit of information, and a group of corresponding flags is stored in one slot. Multiple matching can thus be performed without omission, and the character string retrieving system is implemented in a reduced size.
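
The combination of a state transition table with one flag bit per search term can be sketched as follows. The construction is a generic textbook one chosen for brevity (states are the prefixes of the search terms), not the table layout of the patent; the names `build_table` and `search_table` are hypothetical, and search terms are assumed to be non-empty.

```python
def build_table(terms):
    """Return (transition table, flag slot per state, initial state number)."""
    # Every prefix of every term is a state; "" is the initial state.
    prefixes = sorted({t[:i] for t in terms for i in range(len(t) + 1)})
    number = {p: n for n, p in enumerate(prefixes)}
    alphabet = sorted(set("".join(terms)))
    table = {}
    flags = [0] * len(prefixes)
    for p in prefixes:
        # One bit per term: set when the term ends at this state.
        for bit, t in enumerate(terms):
            if p.endswith(t):
                flags[number[p]] |= 1 << bit
        for ch in alphabet:
            s = p + ch
            while s not in number:
                s = s[1:]                  # longest suffix that is still a prefix
            table[number[p], ch] = number[s]
    return table, flags, number[""]


def search_table(text, terms):
    """Return (end_position, [matched terms]) for each position with a match."""
    table, flags, start = build_table(terms)
    state, hits = start, []
    for pos, ch in enumerate(text):
        # Look up the destination from (source state, character code).
        state = table.get((state, ch), start)
        if flags[state]:
            matched = [t for bit, t in enumerate(terms) if flags[state] >> bit & 1]
            hits.append((pos, matched))
    return hits
```

Because the flag slot of a state can have several bits set, a position where two terms end at once, such as "he" and "she" in "ushers", reports both without a second table lookup.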
