Abstract:
A text classification apparatus directed to a plurality of languages includes a unit for extracting, from unclassified (unlabeled) texts in a plurality of languages, information for converting a word into a word sense; a unit for learning classification knowledge at a word sense level after converting a word extracted from a labeled text into a word sense; a unit for learning classification knowledge at a word level from the labeled text; a unit for learning the classification knowledge at the word level from the classification knowledge at the word sense level and information on a relation between words extracted from the unlabeled text; and a unit for combining the respective kinds of classification knowledge to assign a category.
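The final combining step can be illustrated with a minimal sketch. It assumes that each kind of classification knowledge is a table of per-category weights and that the two scores are mixed linearly; the names `score`, `classify`, `word_to_sense`, and the mixing factor `alpha` are hypothetical and do not come from the abstract.

```python
from collections import defaultdict

def score(features, weights):
    """Sum the per-category weights of the features found in a text."""
    totals = defaultdict(float)
    for feature in features:
        for category, weight in weights.get(feature, {}).items():
            totals[category] += weight
    return totals

def classify(words, word_to_sense, word_weights, sense_weights, alpha=0.5):
    """Combine word-level and sense-level scores (assumed linear mix)."""
    # Convert words to language-independent word senses where a mapping exists.
    senses = [word_to_sense[w] for w in words if w in word_to_sense]
    word_scores = score(words, word_weights)
    sense_scores = score(senses, sense_weights)
    categories = set(word_scores) | set(sense_scores)
    if not categories:
        return None  # no evidence at either level
    return max(
        categories,
        key=lambda c: alpha * word_scores[c] + (1 - alpha) * sense_scores[c],
    )

# Toy data: the French word was never seen in labeled text,
# but it shares a word sense with the English word.
word_to_sense = {"bank": "FINANCIAL_INSTITUTION", "banque": "FINANCIAL_INSTITUTION"}
sense_weights = {"FINANCIAL_INSTITUTION": {"economy": 1.0}}
word_weights = {"bank": {"economy": 0.8}, "goal": {"sports": 0.9}}
print(classify(["banque", "ouvre"], word_to_sense, word_weights, sense_weights))
```

In this toy setup the sense-level table lets a text in a second language receive a category even though none of its words has word-level weights.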
Abstract:
A system allows related documents to be retrieved using conventional search engines while overcoming the ambiguity of a search key entered by the user. The system includes a word sense associative network display portion for displaying word senses of the search key entered by the user together with related word senses in a network, a search portion for conducting a search by generating a search key based on word senses selected by the user, and a filtering portion for selecting, from the result of the search, documents that match the selected word senses.
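The search and filtering portions can be sketched as follows. This is only an illustration: it assumes a word sense is represented by a list of related terms and that a document "matches" a sense when it contains at least `min_hits` of them. The function names and the matching rule are hypothetical.

```python
def build_search_key(keyword, sense_terms):
    """Extend the user's keyword with terms tied to the selected word sense."""
    return [keyword] + [t for t in sense_terms if t != keyword]

def filter_by_sense(documents, sense_terms, min_hits=1):
    """Keep documents that mention enough terms of the selected sense."""
    wanted = {t.lower() for t in sense_terms}
    kept = []
    for doc in documents:
        tokens = set(doc.lower().split())
        if len(tokens & wanted) >= min_hits:
            kept.append(doc)
    return kept

docs = ["The jaguar is a large cat", "Jaguar unveils a new car model"]
animal_sense = ["cat", "animal", "predator"]
print(build_search_key("jaguar", animal_sense))
print(filter_by_sense(docs, animal_sense))
```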
Abstract:
An object of the present invention is to develop and provide a lung cancer differential marker with which lung cancer can be diagnosed conveniently and highly sensitively without depending only on increase or decrease in protein expression level between cancer patients and healthy persons. Another object of the present invention is to develop and provide a glycan marker capable of distinguishing histological types of lung cancer. Of serum glycoproteins, glycopeptide and glycoprotein groups whose glycan structures were altered specifically in lung cancer cell culture supernatants were identified, and they are provided as lung cancer differential markers.
Abstract:
The present invention is directed to developing a glycan marker capable of detecting a hepatic disease, and more specifically to developing a glycan marker indicating a hepatic disease-state. Furthermore, the present invention is also directed to developing a glycan marker capable of distinguishing hepatic disease-states with the progress of hepatocarcinoma. The present inventors identified, among the serum glycoproteins, glycopeptides and glycoproteins in which a glycan structure specifically changes due to hepatic diseases including hepatocarcinoma, and provide these as novel glycan markers (glycopeptide and glycoprotein) specific to hepatic disease-states.
Abstract:
For each word occurring in a Japanese text, a set of words co-occurring with it and their co-occurrence frequencies are extracted, where two words are regarded as co-occurring with each other when they occur in the same sentence. Likewise, for each word occurring in an English text that corresponds to the Japanese text, a set of words co-occurring with it and their co-occurrence frequencies are extracted. A correlation is calculated between a Japanese word and an English word based upon the co-occurrent word set of the Japanese word and that of the English word, with the assistance of a Japanese-English bilingual dictionary of basic words. The correlation is defined as the ratio of the number of possible correspondences between the two co-occurrent word sets to the total of the co-occurrence frequencies in the two co-occurrent word sets. Pairs of words having a mutually maximum correlation are selected as candidate translation pairs of words, and displayed on a display device. Finally, user-selected pairs are registered in the bilingual dictionary. Thus, the bilingual dictionary is augmented incrementally.
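The procedure can be sketched in Python. The abstract does not spell out how "possible correspondences" are counted, so this sketch assumes one reading: the co-occurrence frequency mass on both sides that is linked through the basic dictionary, divided by the total mass. The function names and the romanized toy data are illustrative only.

```python
from collections import Counter, defaultdict

def cooccurrence(sentences):
    """Map each word to a Counter of the words sharing a sentence with it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence)
        for word in words:
            for other in words:
                if other != word:
                    table[word][other] += 1
    return table

def correlation(ja_co, en_co, dictionary):
    """Dictionary-linked co-occurrence mass over total mass (assumed reading)."""
    total = sum(ja_co.values()) + sum(en_co.values())
    if total == 0:
        return 0.0
    matched, linked_en = 0, set()
    for ja_word, freq in ja_co.items():
        hits = [e for e in dictionary.get(ja_word, ()) if e in en_co]
        if hits:
            matched += freq
            linked_en.update(hits)
    matched += sum(en_co[e] for e in linked_en)
    return matched / total

def candidate_pairs(ja_table, en_table, dictionary):
    """Select pairs whose correlation is maximal in both directions."""
    scores = {(j, e): correlation(ja_table[j], en_table[e], dictionary)
              for j in ja_table for e in en_table}
    best_en = {j: max(en_table, key=lambda e: scores[(j, e)]) for j in ja_table}
    best_ja = {e: max(ja_table, key=lambda j: scores[(j, e)]) for e in en_table}
    return [(j, e) for j, e in best_en.items()
            if best_ja[e] == j and scores[(j, e)] > 0]

ja = cooccurrence([["ginkou", "kinri", "ageru"], ["ginkou", "kinri"]])
en = cooccurrence([["bank", "interest", "raise"], ["bank", "interest"]])
basic = {"ginkou": ["bank"], "ageru": ["raise"]}  # "kinri" is not yet registered
print(candidate_pairs(ja, en, basic))
```

With this toy data the unregistered word "kinri" is paired with "interest", because their co-occurring words are fully linked by the basic dictionary. In the described system such a pair would be shown to the user before being registered.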
Abstract:
To automatically generate translation templates containing variables which can be replaced with various words or phrases from a bilingual pair of sentences, the machine translation system reads the first language sentence and second language sentence which are mutually equivalent, analyzes the morphemes and phrases of the sentences, identifies the word correspondence between the first language sentence and the second language sentence with reference to the bilingual dictionary, generates a translation template by replacing the corresponding words of the first language sentence and second language sentence with variables which are mutually correspondent, extracts the phrase correspondence between the first language sentence and the second language sentence, generates a generalized template wherein the corresponding phrases are replaced with variables, and generates a partial template wherein the corresponding phrases are separated. In this way, a translation template can be learned (automatically generated) from a bilingual pair of sentences, and high-quality translation can be obtained.
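The word-level step, replacing corresponding words with shared variables, can be sketched as below. The sketch assumes the sentences are already tokenized and covers only word correspondence, not the phrase-level generalized or partial templates. The name `make_template` and the variable scheme `X1`, `X2`, ... are hypothetical.

```python
def make_template(src_tokens, tgt_tokens, dictionary):
    """Replace dictionary-linked word pairs with shared variables."""
    src_out, tgt_out = list(src_tokens), list(tgt_tokens)
    used_targets = set()
    bindings = []
    for i, src_word in enumerate(src_tokens):
        for j, tgt_word in enumerate(tgt_tokens):
            if j in used_targets:
                continue  # each target word binds to at most one variable
            if tgt_word in dictionary.get(src_word, ()):
                var = f"X{len(bindings) + 1}"
                src_out[i] = tgt_out[j] = var
                used_targets.add(j)
                bindings.append((var, src_word, tgt_word))
                break
    return src_out, tgt_out, bindings

src = ["I", "bought", "a", "book"]
tgt = ["watashi", "wa", "hon", "wo", "katta"]
bilingual = {"I": ["watashi"], "book": ["hon"]}
print(make_template(src, tgt, bilingual))
```

The resulting pair of token lists acts as a template: substituting another dictionary-linked pair for a variable yields a new sentence pair.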
Abstract:
An apparatus for and a method of selecting a target language equivalent of a predicate word in a source language word string for use in a machine translation system in which use is made of a dictionary having records, each including data on an entry word of a predicate source language word, on predicate target language words equivalent to the entry source language word and on semantic features of non-predicate words related to a case governed by the predicate target language words or including data on an entry word of a non-predicate source language word, on a non-predicate target language word equivalent to the entry source language word and on semantic features of the non-predicate target language word. A processor is coupled to the dictionary for fetching therefrom the semantic feature data of the non-predicate words serving as arguments for the case governed by the predicate target language words equivalent to the predicate word in the source language word string and the semantic feature data of one of the non-predicate target language words which is equivalent to the non-predicate word in the source language word string, carrying out numerical operations between the fetched data to provide a plurality of operation results, and selecting one of the operation results according to predetermined criteria and determining that one of the predicate target language words which has the data of the non-predicate words providing the selected operation result as the target language equivalent of the source language predicate word.
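The selection step can be illustrated with a small sketch. The abstract leaves the "numerical operations" and the "predetermined criteria" unspecified, so this example assumes that semantic features are weighted dictionaries, that the operation is a dot product, and that the criterion is the maximum score. All names and the toy dictionary records are hypothetical.

```python
def dot(a, b):
    """Dot product of two sparse feature dictionaries."""
    return sum(value * b.get(key, 0.0) for key, value in a.items())

def select_equivalent(candidates, argument_features):
    """Pick the target predicate whose case slot best fits the argument.

    candidates maps each target-language predicate to the semantic
    features it expects of the argument in the governed case.
    """
    return max(candidates, key=lambda pred: dot(candidates[pred], argument_features))

# Toy records for the English predicate "play" with Japanese equivalents.
play_candidates = {
    "hiku": {"instrument": 1.0},  # play (a keyboard or string instrument)
    "suru": {"sport": 1.0},       # play (a sport)
}
piano = {"instrument": 1.0, "artifact": 1.0}
tennis = {"sport": 1.0, "activity": 1.0}
print(select_equivalent(play_candidates, piano))
print(select_equivalent(play_candidates, tennis))
```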
Abstract:
A method of segmenting a text into words in which a dictionary search is made while using a character string in the text as a search key, and it is checked whether a word retrieved from the dictionary can be grammatically connected to another word adjacent thereto or not. Segmentation processing is carried out using only words registered in a word dictionary, processing for identifying an unknown word is carried out when the segmentation processing comes to a deadlock, and then the segmentation processing is continued for that portion of the text which follows the identified unknown word.
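A minimal sketch of this control flow follows. It assumes a greedy longest-match search, a lexicon that maps each word to one part of speech, and a rule that an unknown word runs until the next position where some dictionary word begins. These are simplifying assumptions, not details taken from the abstract, and the names `segment`, `can_connect`, and `UNKNOWN` are hypothetical.

```python
UNKNOWN = "UNKNOWN"

def dictionary_matches(text, start, lexicon, max_len):
    """Yield (word, pos) for dictionary words starting at `start`, longest first."""
    for end in range(min(len(text), start + max_len), start, -1):
        word = text[start:end]
        if word in lexicon:
            yield word, lexicon[word]

def segment(text, lexicon, can_connect, max_len=8):
    """Segment with dictionary words; fall back to an unknown word on deadlock."""
    result, i, prev = [], 0, None
    while i < len(text):
        match = next(
            ((w, p) for w, p in dictionary_matches(text, i, lexicon, max_len)
             if prev in (None, UNKNOWN) or can_connect(prev, p)),
            None,
        )
        if match:
            result.append(match)
            i += len(match[0])
            prev = match[1]
            continue
        # Deadlock: grow an unknown word until a dictionary word can start again.
        j = i + 1
        while j < len(text) and not any(dictionary_matches(text, j, lexicon, max_len)):
            j += 1
        result.append((text[i:j], UNKNOWN))
        i, prev = j, UNKNOWN
    return result

lexicon = {"the": "DET", "dog": "NOUN", "runs": "VERB"}
allowed = {("DET", "NOUN"), ("NOUN", "VERB")}
connect = lambda a, b: (a, b) in allowed
print(segment("thedogruns", lexicon, connect))
print(segment("thezorkruns", lexicon, connect))
```

In the second call the string "zork" is not in the lexicon, so it is emitted as an unknown word and segmentation resumes at "runs".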