Abstract:
A high-speed document retrieval system builds a regular expression dictionary and a word index from a retrieval document and a word dictionary, and then uses them to search the document at high speed. From the word dictionary, it creates a regular expression dictionary in which each regular expression represents a set of character strings of the same length. For each character string in the retrieval document that matches a regular expression in the regular expression dictionary, an index element is recorded in the word index only when no other index element already allows the element under consideration to be deduced. This yields a word index that supports high-speed full-text retrieval without a noticeable increase in index size. Because the document retrieval system searches the retrieval document using the word dictionary, the regular expression dictionary, and the word index together, high-speed full-text retrieval remains possible without loss of retrieval efficiency, even when the search string can be covered only by short words that overlap little.
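The indexing and search steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. Each regular expression is assumed to be an alternation of all dictionary words of one length. The deducibility rule is assumed to be "a shorter word at a position is deducible from a longer word matched at the same position": words matched at one position are always prefixes of one another, so only the longest is recorded. A query is assumed to be covered greedily by possibly overlapping dictionary words. All function names are hypothetical.

```python
import re
from collections import defaultdict


def build_regex_dictionary(words):
    """One compiled regex per word length; each matches exactly the dictionary words of that length."""
    by_length = defaultdict(set)
    for w in words:
        if w:  # skip empty entries
            by_length[len(w)].add(w)
    # The zero-width lookahead lets finditer report overlapping occurrences.
    return {
        n: re.compile("(?=(" + "|".join(sorted(map(re.escape, ws))) + "))")
        for n, ws in by_length.items()
    }


def build_word_index(document, regex_dict):
    """Record only the longest word matched at each position; shorter ones are deducible from it."""
    longest = {}  # position -> longest dictionary word starting there
    for pattern in regex_dict.values():
        for m in pattern.finditer(document):
            word, pos = m.group(1), m.start()
            if len(word) > len(longest.get(pos, "")):
                longest[pos] = word
    index = defaultdict(list)
    for pos in sorted(longest):
        index[longest[pos]].append(pos)
    return dict(index)


def positions_of(word, index):
    """Recorded positions of `word` plus those deduced from longer indexed words it prefixes."""
    return {p for w, ps in index.items() if w.startswith(word) for p in ps}


def cover_query(query, words):
    """Greedily cover the query with dictionary words; overlaps are allowed."""
    spans = [(i, w) for i in range(len(query)) for w in words if w and query.startswith(w, i)]
    cover, reached = [], 0
    while reached < len(query):
        # Among words starting at or before `reached`, take the one reaching furthest right.
        best = max((s for s in spans if s[0] <= reached),
                   key=lambda s: s[0] + len(s[1]), default=None)
        if best is None or best[0] + len(best[1]) <= reached:
            return None  # some character cannot be covered
        cover.append(best)
        reached = best[0] + len(best[1])
    return cover


def search(query, words, index):
    """Start positions of `query`, found by intersecting shifted word positions."""
    cover = cover_query(query, words)
    if not cover:
        return set()
    result = None
    for offset, w in cover:
        starts = {p - offset for p in positions_of(w, index)}
        result = starts if result is None else result & starts
    return result


if __name__ == "__main__":
    words = ["ab", "abc", "bc", "c", "cd"]
    doc = "xabcdabc"
    idx = build_word_index(doc, build_regex_dictionary(words))
    print(idx)                         # {'abc': [1, 5], 'bc': [2, 6], 'cd': [3], 'c': [7]}
    print(search("abcd", words, idx))  # {1}
    print(search("cda", words, idx))   # set()
```

Every character of the query is covered by some word whose occurrence the index confirms. The intersected start positions are therefore exact matches and need no re-check against the text. A query that no combination of dictionary words can cover (such as `"cda"` above) returns no hits in this sketch, even though it occurs in the document.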
Abstract:
In a similar character string expanding apparatus, a table of derivation elements and a state-transition table indicating the applicable strings of derivation types are produced according to pronunciation expanding rules. Each derivation element consists of three parts. The first is a derived sound, derived from the key sound at a key position of the question pronunciation character string. The second is the sound position of that derived sound in each character string expanded from the question pronunciation character string. The third is one or more derivation types indicating how the derived sound at that sound position is derived from the key sound at the key position. In a character string retrieving apparatus, strings of derivation types are produced one by one by arranging the derivation types from the table of derivation elements in order of sound position. Each string is then checked against the applicable strings in the state-transition table to judge whether it satisfies the pronunciation expanding rules. The trademark numbers corresponding to the strings of derivation types that satisfy the rules are then retrieved. Because the character strings similar in pronunciation to the question pronunciation character string are not used directly, trademark numbers indicating trademarks similar in pronunciation to the question pronunciation character string can be retrieved at high speed.
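A minimal Python sketch of the produce-and-judge loop described above follows. Everything concrete in it is hypothetical. This includes the syllables and the two derivation types `SAME` and `SUB`. It also includes the rule "at most one substituted sound", which is encoded as the state-transition table. The sketch further assumes a one-to-one alignment of sound positions (no insertions or deletions). Finally, it assumes an inverted index from (sound position, sound) to trademark numbers. Each candidate is retrieved by intersecting per-position posting lists rather than by looking up a stored list of similar-pronunciation strings.

```python
from collections import defaultdict
from itertools import product

# Hypothetical pronunciation-expanding rule as a state-transition table:
# a derivation-type string is applicable if it contains at most one "SUB".
START = 0
ACCEPTING = {0, 1}
TRANSITIONS = {(0, "SAME"): 0, (0, "SUB"): 1, (1, "SAME"): 1}


def is_applicable(types):
    """Judge a derivation-type string by walking the state-transition table."""
    state = START
    for t in types:
        state = TRANSITIONS.get((state, t))
        if state is None:
            return False  # no transition defined: the rule is violated
    return state in ACCEPTING


def build_postings(trademarks):
    """Inverted index (sound position, sound) -> trademark numbers, plus length -> numbers."""
    postings, by_length = defaultdict(set), defaultdict(set)
    for number, sounds in trademarks.items():
        by_length[len(sounds)].add(number)
        for pos, sound in enumerate(sounds):
            postings[(pos, sound)].add(number)
    return postings, by_length


def retrieve(derivation_table, postings, by_length):
    """derivation_table[pos] lists the (derived sound, derivation type) pairs for that position."""
    hits = set()
    # Produce derivation-type strings one by one, arranged in sound-position order.
    for choice in product(*derivation_table):
        if not is_applicable([t for _, t in choice]):
            continue
        candidates = set(by_length.get(len(choice), ()))
        for pos, (sound, _) in enumerate(choice):
            candidates &= postings.get((pos, sound), set())
        hits |= candidates
    return hits


if __name__ == "__main__":
    trademarks = {101: ("ka", "to"), 102: ("ga", "to"), 103: ("ga", "do"), 104: ("ka", "do", "ra")}
    table = [[("ka", "SAME"), ("ga", "SUB")], [("to", "SAME"), ("do", "SUB")]]
    postings, by_length = build_postings(trademarks)
    print(sorted(retrieve(table, postings, by_length)))  # [101, 102]
```

Trademark 103 is excluded because its derivation-type string `SUB SUB` has no path through the state-transition table. Trademark 104 is excluded by length. Walking the state-transition table while the product is being expanded would prune rejected prefixes earlier. The sketch keeps the produce-then-judge order the abstract describes.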