-
Publication No.: US20050234709A1
Publication date: 2005-10-20
Application No.: US10398535
Filing date: 2002-09-26
Applicant: Judith Klavans, Smaranda Muresan
Inventor: Judith Klavans, Smaranda Muresan
CPC classification: G06F17/2735
Abstract: A system for automatically generating a dictionary from full text articles extracts <term, definition> pairs from those articles and stores the pairs as dictionary entries. The system includes a computer readable corpus having a plurality of documents therein. A pattern processing module (120) and a grammar processing module (125) are provided for extracting <term, definition> pairs from the corpus and storing the pairs in a dictionary database (145). A routing processing module selectively routes sentences in the corpus to at least one of the pattern processing module or grammar processing module. In one embodiment, the routing module is incorporated into the pattern processing module, which then selectively routes a portion of the sentences to the grammar processing module. A bootstrapping processing module (150) can be used to apply <term, definition> entries against the corpus to identify and extract additional <term, definition> entries.
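The flow described in the abstract (pattern module first, selective routing to a grammar module, then a bootstrapping pass) can be sketched roughly as follows. This is only an illustrative sketch: the regular expressions, the "also called" bootstrapping heuristic, and all function names are assumptions made for the example and are not taken from the patent, which does not specify them in the abstract.

```python
import re

# Hypothetical surface pattern: "<term> is defined as <definition>."
DEFINED_AS = re.compile(
    r"^(?P<term>[\w -]+?) is defined as (?P<definition>.+?)\.?$", re.I
)
# Hypothetical cue used to decide that a sentence goes to the grammar module.
GRAMMAR_CUE = re.compile(r"\b(is|are) (a|an|the)\b", re.I)
# Stand-in for real syntactic analysis: "<term> is a/an/the <definition>."
COPULA = re.compile(
    r"^(?P<term>[\w -]+?) (?:is|are) (?P<definition>(?:a|an|the) .+?)\.?$", re.I
)


def pattern_module(sentence):
    """Extract a (term, definition) pair with a fixed text pattern, or None."""
    m = DEFINED_AS.match(sentence)
    return (m["term"].strip(), m["definition"].strip()) if m else None


def grammar_module(sentence):
    """Placeholder for grammar-based extraction (a real system would parse)."""
    m = COPULA.match(sentence)
    return (m["term"].strip(), m["definition"].strip()) if m else None


def bootstrap(dictionary, corpus):
    """Apply known entries to the corpus to find further entries.

    Assumed heuristic: "<alias>, also called <known term>" gives the alias
    the definition already stored for the known term.
    """
    found = {}
    for sentence in corpus:
        for term, definition in dictionary.items():
            m = re.search(
                r"(?P<alias>[\w-]+), also called " + re.escape(term) + r"\b",
                sentence,
                re.I,
            )
            if m and m["alias"] not in dictionary:
                found[m["alias"]] = definition
    return found


def build_dictionary(corpus):
    """Route each sentence, collect pairs, then run one bootstrapping pass."""
    dictionary = {}
    for sentence in corpus:
        # Routing folded into the pattern stage: try patterns first, then
        # hand the sentence to the grammar module only if a cue is present.
        pair = pattern_module(sentence)
        if pair is None and GRAMMAR_CUE.search(sentence):
            pair = grammar_module(sentence)
        if pair:
            term, definition = pair
            dictionary.setdefault(term, definition)
    dictionary.update(bootstrap(dictionary, corpus))
    return dictionary


corpus = [
    "Angina is defined as chest pain caused by reduced blood flow.",
    "Arrhythmia is an irregular heartbeat.",
    "Stenocardia, also called angina, often occurs during exertion.",
]
for term, definition in build_dictionary(corpus).items():
    print(f"{term}: {definition}")
```

In this sketch the first sentence is handled by the pattern module, the second is routed to the grammar module, and the third yields an entry only during the bootstrapping pass, because it depends on an entry that is already in the dictionary.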
-
Publication No.: US07254530B2
Publication date: 2007-08-07
Application No.: US10398535
Filing date: 2002-09-26
CPC classification: G06F17/2735
Abstract: A system for automatically generating a dictionary from full text articles extracts <term, definition> pairs from those articles and stores the pairs as dictionary entries. The system includes a computer readable corpus having a plurality of documents therein. A pattern processing module (120) and a grammar processing module (125) are provided for extracting <term, definition> pairs from the corpus and storing the pairs in a dictionary database (145). A routing processing module selectively routes sentences in the corpus to at least one of the pattern processing module or grammar processing module. In one embodiment, the routing module is incorporated into the pattern processing module, which then selectively routes a portion of the sentences to the grammar processing module. A bootstrapping processing module (150) can be used to apply <term, definition> entries against the corpus to identify and extract additional <term, definition> entries.
-