Abstract:
A character recognition method corrects misrecognized characters using linguistic knowledge. In this method, extracting candidate words by searching a word dictionary accounts for a large part of the processing. To speed up candidate-word extraction, the method searches the dictionary either with a group of candidate characters or with dictionary headwords that handle the inflected forms of verbs. The method also calculates a word matching cost that makes the correction of misrecognitions more efficient. The word search uses a "hybrid method" that combines "candidate-character-driven word extraction" with "dictionary-driven word extraction". In addition, the word dictionary contains headwords composed of a verb's inflectional ending followed by an auxiliary verb or a particle. When computing the matching cost, the method gives more weight to cost differences for characters with a high overall confidence ratio than to cost differences for characters with a low overall confidence ratio.
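The candidate-character-driven half of the hybrid search can be sketched as follows. The sketch assumes a lattice that holds a list of (character, confidence) candidates for each recognized position, and a plain set of dictionary words. It walks the lattice from every start position and prunes any path that is not a dictionary prefix. The cost function is hypothetical: it weights each position's gap from the top candidate by the top candidate's confidence, so gaps at confidently recognized positions count more, in the spirit of the abstract. The dictionary-driven half and the verb-inflection headwords are not shown. All names here are illustrative.

```python
from typing import List, Set, Tuple

Lattice = List[List[Tuple[str, float]]]  # per position: (candidate char, confidence)

def build_prefixes(words: Set[str]) -> Set[str]:
    # Every prefix of every dictionary word, so path extension can stop early.
    return {w[:i] for w in words for i in range(1, len(w) + 1)}

def candidate_driven_words(lattice: Lattice, words: Set[str]) -> List[Tuple[int, str, List[float]]]:
    """Return (start, word, confidences of chosen candidates) for every dictionary
    word spelled by picking one candidate character per consecutive position."""
    prefixes = build_prefixes(words)
    found = []
    for start in range(len(lattice)):
        paths: List[Tuple[str, List[float]]] = [("", [])]
        for pos in range(start, len(lattice)):
            extended = []
            for text, confs in paths:
                for ch, conf in lattice[pos]:
                    s = text + ch
                    if s in prefixes:  # prune strings that cannot become a word
                        extended.append((s, confs + [conf]))
            paths = extended
            if not paths:
                break
            found.extend((start, s, c) for s, c in paths if s in words)
    return found

def matching_cost(confs: List[float], top_confs: List[float]) -> float:
    """Hypothetical cost: gap between the best and the chosen candidate at each
    position, weighted by the best confidence at that position."""
    return sum(top * (top - c) for c, top in zip(confs, top_confs))
```

For example, `candidate_driven_words([[("c", 0.9), ("e", 0.6)], [("a", 0.8), ("o", 0.7)], [("t", 0.95)]], {"cat", "cot", "eat", "at"})` finds all four words. Scoring them against the per-position maxima ranks "cat" (cost 0) ahead of "cot" and "eat".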
Abstract:
Document databases registered in an associative search server are ordered appropriately. In an associative search server that can perform an associative search by correlating a plurality of document databases, associative search recording table storing means stores the history of associative searches as an associative search recording table. Using this table, showing order changing means sets an appropriate order in which document database selecting means presents the document databases. Alternatively, registration fee calculating means calculates appropriate registration fees for the document databases registered in the associative search server.
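The abstract says only that the display order is set "properly" from the search history. The sketch below assumes one concrete rule: each row of the recording table stores a (source database, target database) pair, and candidate databases are ranked by how often users moved to them from the current database. The class and method names are hypothetical, and the fee calculation is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

class AssociativeSearchLog:
    """Hypothetical stand-in for the associative search recording table:
    one row per associative search from a source to a target database."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, str]] = []

    def record(self, source_db: str, target_db: str) -> None:
        self.rows.append((source_db, target_db))

    def show_order(self, current_db: str, candidates: Iterable[str]) -> List[str]:
        # Rank candidates by how often searches went from current_db to them.
        # sorted() is stable, so ties keep their original candidate order.
        counts = Counter(t for s, t in self.rows if s == current_db)
        return sorted(candidates, key=lambda db: -counts[db])
```

With two recorded searches from "patents" to "genes" and one to "papers", `show_order("patents", ["news", "papers", "genes"])` returns `["genes", "papers", "news"]`.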
Abstract:
A system displays the results of a search performed by one of two different search systems so that the search can be continued in the other system. One search system provides a search takeover data production command that outputs the articles found by a search as search takeover data. The other search system provides a search takeover data reading command that reads search takeover data. A document identifier correspondence table associates the identifiers specified in the search takeover data with their corresponding document identifiers. When a user clicks the search system transfer instruction button in one search system, the search takeover data production command is executed, and the resulting search takeover data is passed to the other search system. The receiving search system treats the list of article identifiers read by the search takeover data reading command as its own search results, so the search continues without interruption.
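A minimal sketch of the hand-off between the two systems follows. It assumes the takeover data is serialized as JSON containing the producing system's name and its article identifiers. It also assumes the correspondence table is a nested dict mapping source identifiers to the receiving system's identifiers. The function names and the data layout are hypothetical.

```python
import json
from typing import Dict, List

def produce_takeover_data(source_system: str, article_ids: List[str]) -> str:
    # Hypothetical "search takeover data": the producing system plus its result IDs.
    return json.dumps({"source": source_system, "ids": article_ids})

def read_takeover_data(data: str, correspondence: Dict[str, Dict[str, str]]) -> List[str]:
    """Translate the passed IDs into the receiving system's IDs through the
    document identifier correspondence table. IDs without a counterpart are
    skipped. The returned list serves as the receiving system's search results."""
    payload = json.loads(data)
    table = correspondence[payload["source"]]
    return [table[i] for i in payload["ids"] if i in table]
```

Keeping the result order from the source system lets the receiving system present the continued search in the ranking the user already saw.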
Abstract:
A user designates terms of a first kind and terms of a second kind and wants to find how the two are related. Using relations between terms stored in advance, the system dynamically displays how these terms are correlated, gradually adding nodes and edges to the display. This makes it easy to find relations between concepts (terms) that appear unrelated, and enables an efficient search.
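One way to produce the gradually growing display is a breadth-first expansion over the stored relations. It starts from the first-kind terms and stops once any second-kind term is reached. The abstract does not say how nodes and edges are chosen, so the breadth-first rule, the `max_steps` cap, and all names below are assumptions. Each yielded list is the batch of edges a display would add at that step.

```python
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[str, str]

def grow_relation_graph(relations: Dict[str, Set[str]], first: Set[str],
                        second: Set[str], max_steps: int = 5) -> Iterator[List[Edge]]:
    """Yield, one step at a time, the new edges found by expanding outward from
    the first-kind terms over stored relations; stop at a second-kind term."""
    visited = set(first)
    frontier = sorted(first)
    for _ in range(max_steps):
        new_edges: List[Edge] = []
        next_frontier: List[str] = []
        for term in frontier:
            for nb in sorted(relations.get(term, ())):
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
                    new_edges.append((term, nb))
        if not new_edges:
            return  # nothing left to add: the term groups are not connected
        yield new_edges  # a display would draw these nodes and edges now
        if visited & second:
            return
        frontier = next_frontier
```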
Abstract:
An information processing system is constructed that makes effective use of differences in nucleotide sequence-related information among individual organisms. It provides each individual organism with useful semantic information and/or information associated with that semantic information. The system performs the steps of (a) receiving nucleotide sequence-related information concerning a predetermined individual, and (b) identifying, in a memory, a nucleotide sequence-related information group that contains nucleotide sequence-related information consistent with the received information. The memory holds one nucleotide sequence-related information group per individual. Each group comprises a plurality of sets, and each set associates positional information representing a position in a nucleotide sequence with the nucleotide sequence-related information corresponding to that position.
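Step (b) can be sketched as a filter over the per-individual groups. The sketch assumes the memory maps each individual's ID to a dict of {position: nucleotide sequence-related information}, such as a base call at that position. It also reads "consistent" as exact agreement at every received position. Both the layout and the consistency rule are assumptions.

```python
from typing import Dict, List

# Hypothetical memory layout: individual ID -> {position: sequence-related info}
Memory = Dict[str, Dict[int, str]]

def find_consistent_groups(memory: Memory, received: Dict[int, str]) -> List[str]:
    """Return the IDs of individuals whose stored group agrees with every
    received (position, info) pair. A position missing from a group counts
    as inconsistent."""
    return [
        ind for ind, group in memory.items()
        if all(group.get(pos) == info for pos, info in received.items())
    ]
```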
Abstract:
Selecting words (or word sequences) is an important aspect of information retrieval. Known selection methods cannot eliminate high-frequency common words, and the threshold that separates important from unimportant words is often set arbitrarily. These problems are solved as follows. The method measures the difference between two word distributions: one in the subset of documents that contain the word to be extracted (or a subset thereof), the other in the set of all documents. This difference is then normalized with the number of words in that subset as a parameter. The accuracy of information-retrieval support is thereby enhanced.
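The abstract does not name a distance measure or a normalization function. The sketch below makes two assumptions. First, it uses KL divergence as the distance between the two word distributions. Second, it normalizes by the average divergence of randomly drawn document sets holding at least as many words as the subset, so that the word count acts as the parameter. A word that occurs in every document then scores 0. A score near 1 means the word's documents look no more distinctive than a random sample of the same size. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Doc = List[str]

def distribution(docs: List[Doc]) -> Tuple[Counter, int]:
    counts = Counter(t for d in docs for t in d)
    return counts, sum(counts.values())

def kl_divergence(sub: Counter, sub_total: int, full: Counter, full_total: int) -> float:
    # D(P_sub || P_full); every subset word also occurs in the full set.
    return sum((n / sub_total) * math.log((n / sub_total) / (full[t] / full_total))
               for t, n in sub.items())

def random_baseline(docs: List[Doc], full: Counter, full_total: int,
                    target_words: int, trials: int, rng: random.Random) -> float:
    # Mean divergence of random document sets holding >= target_words words.
    total = 0.0
    for _ in range(trials):
        order = list(range(len(docs)))
        rng.shuffle(order)
        picked, n = [], 0
        for i in order:
            if n >= target_words:
                break
            picked.append(docs[i])
            n += len(docs[i])
        counts, size = distribution(picked)
        total += kl_divergence(counts, size, full, full_total)
    return total / trials

def representativeness(word: str, docs: List[Doc], trials: int = 50, seed: int = 0) -> float:
    full, full_total = distribution(docs)
    containing = [d for d in docs if word in d]
    if not containing:
        return 0.0
    sub, sub_total = distribution(containing)
    observed = kl_divergence(sub, sub_total, full, full_total)
    expected = random_baseline(docs, full, full_total, sub_total, trials, random.Random(seed))
    return observed / expected if expected > 0 else 0.0
```

Because the baseline depends only on the number of words drawn, a fixed cutoff such as 1.0 can separate topical words from common ones without tuning a raw-frequency threshold.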
Abstract:
The features of a compound are predicted using information on interactions between substances. A database of interactions between compounds and genes/proteins is constructed from information collected from bibliographic databases, gene/protein databases, and disease databases. The collected information is mapped into an interaction network, which makes it possible to predict the features of a compound.
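The abstract does not state the prediction rule. The sketch below assumes a simple guilt-by-association rule on an undirected network. It scores each candidate feature (for example, a disease node) by how many of the compound's interacting genes/proteins are linked to that feature. Node names and functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    # Undirected interaction network over compounds, genes/proteins, and diseases.
    net: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        net[a].add(b)
        net[b].add(a)
    return net

def predict_features(net: Dict[str, Set[str]], compound: str, features: Set[str]) -> Dict[str, int]:
    """Hypothetical rule: count, for each feature node not already linked to the
    compound, how many of the compound's direct partners connect to it."""
    scores: Dict[str, int] = defaultdict(int)
    for target in net.get(compound, ()):
        for nb in net[target]:
            if nb in features and nb not in net[compound]:
                scores[nb] += 1
    return dict(scores)
```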
Abstract:
A method and apparatus for mapping cDNA sequences to genome sequences at high speed are disclosed. The genome sequence is divided into contiguous, non-overlapping partial sequences of length K bases (K-mers). These K-mers are stored in a table together with the genome coordinates at which each occurs. Using this table, K-mer correspondences are created from pairs of perfectly matching K-mers on the cDNA and on the genome sequence. Among all K-mer correspondences, the sets that represent the correct mapping rather than accidental coincidences are identified at high speed. This is done by a method based on a publicly known algorithm that extracts a longest increasing subsequence from a numerical sequence. The resulting K-mer correspondences are extended to base-level correspondences by sequence alignment, and corrections are then made at splice sites. To allow optimal selection of parameters, an interactive interface capable of real-time response is provided.
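The K-mer table and the longest-increasing-subsequence step can be sketched as follows. Genome K-mers are taken on a non-overlapping grid, as the abstract states. Reading cDNA K-mers at every offset is an assumption, made because cDNA positions need not line up with the genome grid. The chain uses the standard O(n log n) patience-sorting LIS with `bisect`. Pairs are sorted by cDNA position ascending and genome coordinate descending, so at most one hit per cDNA position enters a strictly increasing chain. Alignment extension and splice-site correction are not shown, and all names are illustrative.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

def build_kmer_table(genome: str, k: int) -> Dict[str, List[int]]:
    # Contiguous, non-overlapping K-mers starting at 0, k, 2k, ...
    table: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        table[genome[pos:pos + k]].append(pos)
    return table

def kmer_matches(cdna: str, table: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    # (cDNA offset, genome coordinate) for every exact K-mer hit.
    return [(i, g) for i in range(len(cdna) - k + 1) for g in table.get(cdna[i:i + k], ())]

def longest_increasing_chain(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Longest chain increasing in both cDNA offset and genome coordinate."""
    pairs = sorted(pairs, key=lambda p: (p[0], -p[1]))
    tails: List[int] = []     # tails[L] = smallest end coordinate of a chain of length L+1
    tail_idx: List[int] = []  # index into pairs of that chain end
    prev = [-1] * len(pairs)  # back-pointers for reconstruction
    for j, (_, g) in enumerate(pairs):
        L = bisect.bisect_left(tails, g)
        if L == len(tails):
            tails.append(g)
            tail_idx.append(j)
        else:
            tails[L] = g
            tail_idx[L] = j
        prev[j] = tail_idx[L - 1] if L > 0 else -1
    chain = []
    j = tail_idx[-1] if tail_idx else -1
    while j != -1:
        chain.append(pairs[j])
        j = prev[j]
    return chain[::-1]
```

For a genome `"AAAACCCCGGGGTTTTACGTTGCA"` with K = 4 and a cDNA built from the 2nd, 4th and 6th K-mers, the chain's genome coordinates come out as 4, 12, 20.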
Abstract:
New information is extracted efficiently and exhaustively to predict the function of genes or proteins. First, a sequence database is used to obtain known-sequence data that is highly relevant to the sequence or structure information being investigated. Next, a document database is used to retrieve documents relevant to the resulting known-sequence data. Finally, feature words common to a plurality of the retrieved documents are extracted and output.
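The three steps can be sketched as a small pipeline. The similarity search here is only a stand-in for a real homology search such as BLAST: it is a shared-3-mer Jaccard score with an assumed threshold of 0.3. The document database is assumed to map each accession to tokenized documents. "Common to a plurality of documents" is read as appearing in at least `min_docs` documents. In practice a stopword filter would also be needed. All names and thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

def kmers(seq: str, k: int = 3) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similar_entries(query: str, seq_db: Dict[str, str], min_jaccard: float = 0.3) -> List[str]:
    # Stand-in for a homology search: Jaccard similarity of shared 3-mers.
    q = kmers(query)
    hits = []
    for acc, seq in seq_db.items():
        s = kmers(seq)
        if q and s and len(q & s) / len(q | s) >= min_jaccard:
            hits.append(acc)
    return hits

def common_feature_words(accessions: Iterable[str], doc_db: Dict[str, List[List[str]]],
                         min_docs: int = 2) -> List[str]:
    # Document frequency: each word counted once per retrieved document.
    df: Counter = Counter()
    for acc in accessions:
        for doc in doc_db.get(acc, []):
            df.update(set(doc))
    return sorted(w for w, n in df.items() if n >= min_docs)
```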
Abstract:
This invention provides an information processing method for discovering correlations between predetermined individual-related information and nucleotide sequence-related information concerning an individual. The method comprises two steps. Step (a) calculates a percentage for each possible piece of nucleotide sequence-related information at a given position in a nucleotide sequence, using a first occurrence frequency and a second occurrence frequency. The first occurrence frequency is calculated for each such piece based on a predetermined population. The second occurrence frequency is calculated for each such piece at the same position, based on a population gathered for a predetermined piece of individual-related information concerning an individual. Step (b) associates, for each predetermined piece of individual-related information, the percentage calculated in step (a) with positional information representing that position and with the corresponding nucleotide sequence-related information.
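The abstract does not define how the two frequencies are combined into a "percentage". The sketch assumes a hypothetical formula: the second frequency divided by the first, times 100, so that values above 100 indicate enrichment in the trait group. It also assumes the trait group is drawn from the predetermined population, so every allele seen in the group also occurs in the population. Step (b) is modeled as keying each percentage by (individual-related information, position, allele). The names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

def allele_percentages(population: List[str], trait_group: List[str]) -> Dict[str, float]:
    """Step (a) at one position. first = allele frequency in the whole population,
    second = frequency in the trait group; hypothetically 100 * second / first.
    Computed as 100 * c2 * n1 / (c1 * n2) to keep the arithmetic exact where possible."""
    first, second = Counter(population), Counter(trait_group)
    n1, n2 = len(population), len(trait_group)
    return {a: 100.0 * second[a] * n1 / (first[a] * n2) for a in first}

def associate_with_position(trait: str, position: int,
                            pct: Dict[str, float]) -> Dict[Tuple[str, int, str], float]:
    # Step (b): key each percentage by (individual-related info, position, allele).
    return {(trait, position, allele): p for allele, p in pct.items()}
```

With a population of eight "A" and two "G" calls and a trait group of two of each, "A" scores 62.5 and "G" scores 250.0.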