Cross lingual text classification apparatus and method
    1.
    Granted invention patent
    Cross lingual text classification apparatus and method (status: in force)

    Publication number: US07467079B2

    Publication date: 2008-12-16

    Application number: US10784833

    Filing date: 2004-02-24

    IPC classification: G06F17/27 G06F17/20

    CPC classification: G06F17/2765

    Abstract: A text classification apparatus directed to a plurality of languages includes a unit for extracting information for converting a word from non-classified (unlabeled) texts, in a plurality of languages, into a word sense, a unit for learning classification knowledge at a word sense level after converting a word extracted from a labeled text into a word sense, a unit for learning classification knowledge at a word level from the labeled text, a unit for learning the classification knowledge at the word level from the classification knowledge at the word sense level and information on a relation between words extracted from the unlabeled text, and a unit for combining these sets of classification knowledge to assign a category.

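    The abstract above combines a word-level classifier with a word-sense-level one, so that labeled text in one language can help classify text in another. The following is a minimal sketch of that combination. It assumes a hard-coded word-to-sense table (the patent extracts this information from unlabeled text), multinomial naive Bayes as the learner at both levels, and a weighted sum of log scores as the combination rule. None of these choices comes from the patent, and the step that derives word-level knowledge from sense-level knowledge is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word -> word-sense table. In the patent this information is
# extracted from unlabeled multilingual text; here it is hard-coded.
WORD_TO_SENSE = {
    "bank": "FINANCIAL_INSTITUTION", "Bank": "FINANCIAL_INSTITUTION",  # English / German
    "loan": "CREDIT", "Kredit": "CREDIT",
    "goal": "SCORE", "Tor": "SCORE",
    "match": "GAME", "Spiel": "GAME",
}


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over arbitrary tokens."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> token counts
        self.docs = Counter()               # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.docs[label] += 1
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen tokens
        scores = {}
        for label, counts in self.counts.items():
            denom = sum(counts.values()) + vocab_size
            score = math.log(self.docs[label] / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            scores[label] = score
        return scores


def to_senses(words):
    """Map words to senses; words missing from the table are dropped."""
    return [WORD_TO_SENSE[w] for w in words if w in WORD_TO_SENSE]


def classify(words, word_clf, sense_clf, alpha=0.5):
    """Combine word-level and sense-level log scores; alpha weights the word level."""
    w = word_clf.log_scores(words)
    s = sense_clf.log_scores(to_senses(words))
    combined = {label: alpha * w[label] + (1 - alpha) * s[label] for label in w}
    return max(combined, key=combined.get)


# Labeled training data exists only in English.
train_docs = [["bank", "loan"], ["goal", "match"]]
train_labels = ["finance", "sports"]
word_clf = NaiveBayes().fit(train_docs, train_labels)
sense_clf = NaiveBayes().fit([to_senses(d) for d in train_docs], train_labels)

print(classify(["Bank", "Kredit"], word_clf, sense_clf))  # German input -> finance
```

    The English-trained word-level model has never seen the German words and scores both categories equally, so the decision here comes entirely from the shared sense level.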

    Document search system using a meaning relation network
    2.
    Granted invention patent
    Document search system using a meaning relation network (status: in force)

    Publication number: US07240051B2

    Publication date: 2007-07-03

    Application number: US10784768

    Filing date: 2004-02-24

    IPC classification: G06F7/00 G06F17/30

    Abstract: A system allows related documents to be retrieved using conventional search engines while overcoming the ambiguity of a search key entered by the user. The system includes a word sense associative network display portion for displaying word senses of the search key entered by the user together with related word senses in a network, a search portion for conducting a search by generating a search key based on word senses selected by the user, and a filtering portion for selecting, from the search results, the documents that match the selected word sense.

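    The search-and-filter flow described above can be sketched as follows. The sketch assumes a hypothetical hand-built network (`SENSE_NETWORK`) that maps a search key to its senses and their related words. The query-expansion rule (the key plus the first two related words) and the overlap-based filter are illustrative assumptions, not the patented method.

```python
# Hypothetical word-sense associative network: search key -> sense -> related words.
SENSE_NETWORK = {
    "java": {
        "java/programming": {"class", "code", "compiler", "jvm"},
        "java/island": {"indonesia", "island", "jakarta", "volcano"},
        "java/coffee": {"bean", "coffee", "espresso", "roast"},
    }
}


def build_query(key, sense, n_related=2):
    """Search key = original key plus a few words related to the selected sense."""
    related = sorted(SENSE_NETWORK[key][sense])[:n_related]
    return " ".join([key] + related)


def filter_results(key, sense, docs):
    """Keep documents that overlap the selected sense at least as much as any other sense."""
    senses = SENSE_NETWORK[key]
    kept = []
    for doc in docs:
        tokens = set(doc.lower().split())
        overlap = {s: len(tokens & related) for s, related in senses.items()}
        best = max(overlap.values())
        if best > 0 and overlap[sense] == best:
            kept.append(doc)
    return kept


results = [  # stand-in for hits returned by a conventional search engine
    "Java compiler and JVM tuning",
    "Volcano hike on Java island",
    "Java coffee bean roast guide",
]
print(build_query("java", "java/programming"))  # -> java class code
print(filter_results("java", "java/programming", results))
```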

    Cross lingual text classification apparatus and method
    3.
    Invention patent application
    Cross lingual text classification apparatus and method (status: in force)

    Publication number: US20050071152A1

    Publication date: 2005-03-31

    Application number: US10784833

    Filing date: 2004-02-24

    IPC classification: G06F17/30 G06F17/27 G06F17/28

    CPC classification: G06F17/2765

    Abstract: A text classification apparatus directed to a plurality of languages includes a unit for extracting information for converting a word from non-classified (unlabeled) texts, in a plurality of languages, into a word sense, a unit for learning classification knowledge at a word sense level after converting a word extracted from a labeled text into a word sense, a unit for learning classification knowledge at a word level from the labeled text, a unit for learning the classification knowledge at the word level from the classification knowledge at the word sense level and information on a relation between words extracted from the unlabeled text, and a unit for combining these sets of classification knowledge to assign a category.


    Method of computer-based automatic extraction of translation pairs of words from a bilingual text
    6.
    Granted invention patent
    Method of computer-based automatic extraction of translation pairs of words from a bilingual text (status: lapsed)

    Publication number: US5907821A

    Publication date: 1999-05-25

    Application number: US743529

    Filing date: 1996-11-04

    IPC classification: G06F17/27 G06F17/28

    Abstract: For each word occurring in a Japanese text, a set of words co-occurring with it and their co-occurrence frequencies are extracted, where two words are regarded as co-occurring with each other when they occur in the same sentence. Likewise, for each word occurring in an English text that corresponds to the Japanese text, a set of words co-occurring with it and their co-occurrence frequencies are extracted. A correlation is calculated between a Japanese word and an English word based upon the co-occurrent word set of the Japanese word and that of the English word, with the assistance of a Japanese-English bilingual dictionary of basic words. The correlation is defined as the ratio of the number of possible correspondences between the two co-occurrent word sets to the total of the co-occurrence frequencies in the two co-occurrent word sets. Pairs of words having a mutually maximum correlation are selected as candidate translation pairs of words, and displayed on a display device. Finally, user-selected pairs are registered in the bilingual dictionary. Thus, the bilingual dictionary is augmented incrementally.

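    The following sketches the correlation and mutual-maximum selection described above. The abstract does not say exactly how possible correspondences are counted. This sketch assumes that each pair (Japanese co-occurring word, English co-occurring word) listed in the basic dictionary counts once, and that ties are broken by first occurrence. Japanese words are romanized for readability, and the corpus and dictionary are toy data.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(sentences):
    """Map each word to a Counter of words sharing a sentence with it."""
    co = defaultdict(Counter)
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def correlation(j_set, e_set, dictionary):
    """Dictionary-linked pairs between the two co-occurrence sets, divided by the
    total co-occurrence frequency of both sets (one reading of the abstract)."""
    links = sum(1 for j in j_set for e in e_set if e in dictionary.get(j, ()))
    total = sum(j_set.values()) + sum(e_set.values())
    return links / total if total else 0.0


def candidate_pairs(j_co, e_co, dictionary):
    """Return (Japanese, English) pairs whose correlation is mutually maximal."""
    corr = {(j, e): correlation(j_co[j], e_co[e], dictionary) for j in j_co for e in e_co}
    best_e = {j: max(e_co, key=lambda e: corr[(j, e)]) for j in j_co}
    best_j = {e: max(j_co, key=lambda j: corr[(j, e)]) for e in e_co}
    return sorted((j, e) for j, e in best_e.items() if best_j[e] == j and corr[(j, e)] > 0)


# Sentence-aligned toy corpus: romanized Japanese and English.
ja = [["ginkou", "yokin"], ["yokin", "kinri"], ["ginkou", "kinri"]]
en = [["bank", "deposit"], ["deposit", "interest"], ["bank", "interest"]]
basic_dict = {"ginkou": {"bank"}, "kinri": {"interest"}}  # "yokin" is not yet listed

print(candidate_pairs(cooccurrence(ja), cooccurrence(en), basic_dict))
# -> [('ginkou', 'bank'), ('yokin', 'deposit')]
```

    Under the described method, the user would then confirm a candidate such as ('yokin', 'deposit') and register it in the dictionary, so the next run starts from a larger dictionary.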

    System and method for automatically generating translation templates from a pair of bilingual sentences
    7.
    Granted invention patent
    System and method for automatically generating translation templates from a pair of bilingual sentences (status: lapsed)

    Publication number: US5442546A

    Publication date: 1995-08-15

    Application number: US983147

    Filing date: 1992-11-30

    IPC classification: G06F17/28 G06F17/20

    CPC classification: G06F17/2827

    Abstract: To automatically generate, from a bilingual pair of sentences, translation templates containing variables that can be replaced with various words or phrases, the machine translation system reads the first language sentence and second language sentence which are mutually equivalent, analyzes the morphemes and phrases of the sentences, identifies the word correspondence between the first language sentence and the second language sentence with reference to the bilingual dictionary, generates a translation template by replacing the corresponding words of the first language sentence and second language sentence with variables which are mutually correspondent, extracts the phrase correspondence between the first language sentence and the second language sentence, generates a generalized template wherein the corresponding phrases are replaced with variables, and generates a partial template wherein the corresponding phrases are separated. By doing this, a translation template can be learned (automatically generated) from a bilingual pair of sentences, and high quality translation can be obtained.

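    The word-level step of the template generation described above, which replaces dictionary-corresponding words with shared variables, might look like the following minimal sketch. It assumes pre-tokenized sentences and a hypothetical toy dictionary. It skips the morphological analysis, phrase correspondence, generalized templates, and partial templates that the abstract also mentions.

```python
def make_template(src_tokens, tgt_tokens, dictionary):
    """Replace words that correspond via the dictionary with shared variables X1, X2, ..."""
    src_out, tgt_out = list(src_tokens), list(tgt_tokens)
    used = set()   # target positions already bound to a variable
    bindings = {}  # variable -> (source word, target word)
    for i, s in enumerate(src_tokens):
        for j, t in enumerate(tgt_tokens):
            if j not in used and t in dictionary.get(s, ()):
                var = f"X{len(bindings) + 1}"
                src_out[i] = tgt_out[j] = var
                used.add(j)
                bindings[var] = (s, t)
                break  # each source word binds at most one target word
    return " ".join(src_out), " ".join(tgt_out), bindings


# Romanized Japanese "kare wa hon wo yomu" / English "he reads a book".
dictionary = {"kare": {"he"}, "hon": {"book"}}
src, tgt, bindings = make_template(
    ["kare", "wa", "hon", "wo", "yomu"], ["he", "reads", "a", "book"], dictionary
)
print(src)       # -> X1 wa X2 wo yomu
print(tgt)       # -> X1 reads a X2
print(bindings)  # -> {'X1': ('kare', 'he'), 'X2': ('hon', 'book')}
```

    Filling X1 and X2 with other dictionary pairs reuses the same sentence frame for new translations, which is the point of learning templates.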

    Apparatus for and method of selecting a target language equivalent of a predicate word in a source language word string in a machine translation system
    8.
    Granted invention patent
    Apparatus for and method of selecting a target language equivalent of a predicate word in a source language word string in a machine translation system (status: lapsed)

    Publication number: US5227971A

    Publication date: 1993-07-13

    Application number: US366668

    Filing date: 1989-06-14

    IPC classification: G06F17/27 G06F17/28

    Abstract: An apparatus for and a method of selecting a target language equivalent of a predicate word in a source language word string for use in a machine translation system in which use is made of a dictionary having records, each including data on an entry word of a predicate source language word, on predicate target language words equivalent to the entry source language word and on semantic features of non-predicate words related to a case governed by the predicate target language words or including data on an entry word of a non-predicate source language word, on a non-predicate target language word equivalent to the entry source language word and on semantic features of the non-predicate target language word. A processor is coupled to the dictionary for fetching therefrom the semantic feature data of the non-predicate words serving as arguments for the case governed by the predicate target language words equivalent to the predicate word in the source language word string and the semantic feature data of one of the non-predicate target language words which is equivalent to the non-predicate word in the source language word string, carrying out numerical operations between the fetched data to provide a plurality of operation results, and selecting one of the operation results according to predetermined criteria and determining that one of the predicate target language words which has the data of the non-predicate words providing the selected operation result as the target language equivalent of the source language predicate word.

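    The following is a minimal sketch of the selection step described above. It uses English "wear" and its Japanese equivalents, which differ by the kind of object worn. The feature labels and both dictionaries are hypothetical. The "numerical operation" is assumed here to be the size of the intersection between the expected and actual feature sets, because the abstract does not state the patent's own scoring or criteria.

```python
# Hypothetical dictionary: source predicate -> candidate target predicates, each
# with the semantic features it expects of its object case.
PREDICATE_DICT = {
    "wear": [
        ("kiru", {"clothing", "upper_body"}),
        ("kaburu", {"clothing", "headwear"}),
        ("haku", {"clothing", "footwear", "lower_body"}),
        ("kakeru", {"eyewear"}),
    ]
}

# Hypothetical dictionary: source noun -> (target noun, its semantic features).
NOUN_DICT = {
    "shirt": ("shatsu", {"clothing", "upper_body"}),
    "hat": ("boushi", {"clothing", "headwear"}),
    "shoes": ("kutsu", {"clothing", "footwear"}),
    "glasses": ("megane", {"eyewear"}),
}


def select_predicate(verb, obj):
    """Pick the target predicate whose expected object features best match obj."""
    _, obj_features = NOUN_DICT[obj]
    scores = [(len(expected & obj_features), target)
              for target, expected in PREDICATE_DICT[verb]]
    # Criterion assumed here: highest overlap wins; ties go to the earlier entry.
    _, best_target = max(scores, key=lambda pair: pair[0])
    return best_target


for noun in ["shirt", "hat", "shoes", "glasses"]:
    print(noun, "->", select_predicate("wear", noun))
# prints: shirt -> kiru, hat -> kaburu, shoes -> haku, glasses -> kakeru
```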

    Method for segmenting a text into words
    9.
    Granted invention patent
    Method for segmenting a text into words (status: lapsed)

    Publication number: US4750122A

    Publication date: 1988-06-07

    Application number: US760918

    Filing date: 1985-07-31

    Abstract: A method of segmenting a text into words in which a dictionary search is made while using a character string in the text as a search key, and it is checked whether a word retrieved from the dictionary can be grammatically connected to another word adjacent thereto or not. Segmentation processing is carried out using only words registered in a word dictionary, processing for identifying an unknown word is carried out when the segmentation processing comes to a deadlock, and then the segmentation processing is continued for that portion of the text which follows the identified unknown word.

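    The segmentation flow described above can be sketched on an unspaced English string. The flow has two stages: dictionary-only segmentation with a grammatical-connection check, then unknown-word handling at a deadlock. The toy dictionary, the part-of-speech connection table, the longest-match-first search order, and the rule that an unknown word ends where some dictionary word can begin are all assumptions for illustration. The exhaustive search is exponential in the worst case and only suits short inputs.

```python
# Hypothetical toy dictionary (word -> part of speech) and connection table
# (part of speech -> parts of speech allowed to follow it; None = text start).
DICTIONARY = {"the": "DET", "cat": "NOUN", "cats": "NOUN", "sat": "VERB",
              "on": "PREP", "mat": "NOUN"}
CAN_FOLLOW = {
    None: {"DET", "NOUN"},
    "DET": {"NOUN"},
    "NOUN": {"VERB", "PREP"},
    "VERB": {"PREP", "DET"},
    "PREP": {"DET", "NOUN"},
    "UNK": {"DET", "NOUN", "VERB", "PREP"},  # assume anything may follow an unknown word
}


def dict_only(text, prev):
    """Segment text fully with dictionary words (longest match first, with
    backtracking), checking that each word may follow the previous one.
    Returns None on deadlock."""
    if not text:
        return []
    for end in range(len(text), 0, -1):
        pos = DICTIONARY.get(text[:end])
        if pos is not None and pos in CAN_FOLLOW[prev]:
            rest = dict_only(text[end:], pos)
            if rest is not None:
                return [(text[:end], pos)] + rest
    return None


def starts_with_known_word(text):
    return any(text.startswith(w) for w in DICTIONARY)


def segment(text):
    result, prev = [], None
    while text:
        # Longest prefix that dictionary-only segmentation can handle.
        seg, k = [], 0
        for cut in range(len(text), 0, -1):
            attempt = dict_only(text[:cut], prev)
            if attempt is not None:
                seg, k = attempt, cut
                break
        result += seg
        if seg:
            prev = seg[-1][1]
        text = text[k:]
        if not text:
            break
        # Deadlock: the unknown word runs until a dictionary word could start.
        end = 1
        while end < len(text) and not starts_with_known_word(text[end:]):
            end += 1
        result.append((text[:end], "UNK"))
        prev = "UNK"
        text = text[end:]
    return result


print(segment("thecatsatonthemat"))  # "cats" is tried first but leads to a deadlock
print(segment("thedogsatonthemat"))  # "dog" is not in the dictionary -> ('dog', 'UNK')
```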