INFORMATION EXTRACTION SYSTEM, INFORMATION EXTRACTION METHOD, INFORMATION EXTRACTION PROGRAM, AND INFORMATION SERVICE SYSTEM
    1.
    Patent application
    Legal status: In force

    Publication No.: US20110161144A1

    Publication date: 2011-06-30

    Application No.: US12294143

    Filing date: 2007-03-23

    IPC classification: G06Q30/00 G06F17/27

    Abstract: According to the present invention, phrases of the same kind can be extracted from a plurality of documents having various formats. A storage device stores a plurality of documents that have various formats. A pattern candidate creating unit receives a list of input words that are selected as samples among phrases that are to be included in a dictionary. The pattern candidate creating unit selects one document, determines forward and backward character strings of input words in the selected document as candidates of patterns, and stores the forward and backward character strings as a pattern candidate. The pattern candidate creating unit executes the above processes for each of the documents. A phrase candidate creating unit extracts phrases interposed between patterns included in the pattern candidate as candidates of phrases to be output, and stores the extracted phrases as a phrase candidate. A phrase selecting unit outputs a candidate of a phrase satisfying a predetermined condition among candidates of phrases included in the phrase candidate as an output word to an output device.
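
    Illustrative sketch (not taken from the patent text): the abstract above describes learning the character strings before and after known sample words in each document, then extracting whatever else appears between those same strings. A minimal Python version of that idea might look as follows. The function name extract_phrases, the fixed context length, and the "appears in at least min_docs documents" selection condition are all assumptions made for illustration; the abstract only says "a predetermined condition".

```python
import re
from collections import Counter


def extract_phrases(documents, seed_words, context_len=3, min_docs=1):
    """Learn (prefix, suffix) patterns around seed words per document,
    then collect other strings interposed between those patterns."""
    counts = Counter()
    for doc in documents:
        # Pattern candidates: the characters immediately before and after
        # each occurrence of a seed word in this document.
        patterns = set()
        for word in seed_words:
            for m in re.finditer(re.escape(word), doc):
                prefix = doc[max(0, m.start() - context_len):m.start()]
                suffix = doc[m.end():m.end() + context_len]
                if prefix and suffix:
                    patterns.add((prefix, suffix))
        # Phrase candidates: shortest strings found between a learned
        # prefix and suffix in the same document.
        found = set()
        for prefix, suffix in patterns:
            regex = re.escape(prefix) + r"(.+?)" + re.escape(suffix)
            found.update(m.group(1) for m in re.finditer(regex, doc))
        counts.update(found - set(seed_words))
    # Assumed selection condition: keep phrases seen in >= min_docs documents.
    return sorted(p for p, n in counts.items() if n >= min_docs)


docs = [
    "<li>Tokyo</li><li>Osaka</li><li>Kyoto</li>",
    "City: Tokyo; City: Nagoya; City: Kobe;",
]
result = extract_phrases(docs, ["Tokyo"], context_len=4)
```

    Because patterns are learned separately for each document, the HTML list and the "City: ...;" list can use entirely different delimiters; the last item of the second document is missed here only because its trailing context differs from the learned suffix.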

    COOCCURRENCE DICTIONARY CREATING SYSTEM, SCORING SYSTEM, COOCCURRENCE DICTIONARY CREATING METHOD, SCORING METHOD, AND PROGRAM THEREOF
    2.
    Patent application
    Legal status: In force

    Publication No.: US20110055228A1

    Publication date: 2011-03-03

    Application No.: US12922320

    Filing date: 2009-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A cooccurrence dictionary creating system includes: a language analyzing section which subjects a text to a morpheme analysis, a clause specification, and a modification relationship analysis between clauses, a cooccurrence relationship collecting section which collects cooccurrences of nouns in each clause of the text, modification relationships of nouns and declinable words, and modification relationships between declinable words as cooccurrence relationships, a cooccurrence score calculating section which calculates a cooccurrence score of the cooccurrence relationship based on a frequency of the collected cooccurrence relationship, and a cooccurrence dictionary storage section which stores a cooccurrence dictionary in which a correspondence between the calculated cooccurrence score and the cooccurrence relationship is described.
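
    Illustrative sketch (not taken from the patent text): the scoring and storage steps described above can be pictured as a frequency count over already-collected word pairs. The sketch below assumes the language analysis has been done elsewhere and that its output is a list of (word_a, word_b) relationships. The Dice coefficient is used as the score purely as an assumption; the abstract only states that the score is based on frequency. The name build_cooccurrence_dictionary is hypothetical.

```python
from collections import Counter


def build_cooccurrence_dictionary(relations):
    """Map each collected (word_a, word_b) relationship to a score.

    Assumed score: Dice coefficient, 2 * f(a, b) / (f(a) + f(b)), where
    f counts occurrences among the collected relationships.
    """
    pair_freq = Counter()
    word_freq = Counter()
    for a, b in relations:
        pair_freq[(a, b)] += 1
        word_freq[a] += 1
        word_freq[b] += 1
    return {
        pair: 2 * n / (word_freq[pair[0]] + word_freq[pair[1]])
        for pair, n in pair_freq.items()
    }


relations = [
    ("coffee", "drink"),
    ("coffee", "drink"),
    ("coffee", "hot"),
    ("tea", "drink"),
]
dictionary = build_cooccurrence_dictionary(relations)
```

    The returned mapping plays the role of the stored dictionary: each relationship is paired with its score, so a later scoring step can look up how typical a given word combination is.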

    POLARITY ESTIMATION SYSTEM, INFORMATION DELIVERY SYSTEM, POLARITY ESTIMATION METHOD, POLARITY ESTIMATION PROGRAM AND EVALUATION POLARITY ESTIMATIOM PROGRAM
    3.
    Patent application
    Legal status: Under examination (published)

    Publication No.: US20100017391A1

    Publication date: 2010-01-21

    Application No.: US12448010

    Filing date: 2007-11-20

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06Q10/10

    Abstract: An evaluation polarity of reputation information with an unknown evaluation polarity is estimated by utilizing reputation information with a known evaluation polarity. The present polarity estimation system is a polarity estimation system for estimating an evaluation polarity indicating whether reputation information is positive or negative, and includes a reputation information storage part that precedently stores reputation information with a known evaluation polarity; and a polarity estimating means for estimating an evaluation polarity of reputation information with an unknown evaluation polarity on the basis of the reputation information with the known evaluation polarity precedently stored in the reputation information storage part.
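
    Illustrative sketch (not taken from the patent text): the abstract above does not say how the stored, already-labeled reputation information is used to estimate an unknown polarity. One simple possibility, shown here only as an assumption, is to copy the polarity of the most similar stored item, with similarity measured as Jaccard overlap of lowercase word sets. The function name estimate_polarity and the sample reviews are hypothetical.

```python
def estimate_polarity(text, labeled_reviews):
    """Return the polarity of the stored review most similar to `text`.

    labeled_reviews: list of (review_text, polarity) pairs with a known
    polarity such as "positive" or "negative".
    Similarity (assumed): Jaccard overlap of lowercase word sets.
    """
    def tokens(s):
        return set(s.lower().split())

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    query = tokens(text)
    _, best_polarity = max(
        labeled_reviews,
        key=lambda item: jaccard(query, tokens(item[0])),
    )
    return best_polarity


known = [
    ("battery life is great", "positive"),
    ("screen broke after a week", "negative"),
]
first = estimate_polarity("great battery", known)
second = estimate_polarity("the screen broke", known)
```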

    Word classification system, method, and program
    4.
    Granted patent
    Legal status: In force

    Publication No.: US08504356B2

    Publication date: 2013-08-06

    Application No.: US12920920

    Filing date: 2009-04-02

    CPC classification: G06F17/2735 G06F17/277

    Abstract: A word classification system is provided with an inter-word pattern learning section for learning at least either the context information or the layout information between classification-known words which co-appear and creating an inter-word pattern for determining whether data relating to a word pair which is a combination of words is data relating to a same-classification word pair which is the combination of words in the same classification or data relating to a different-classification word pair which is a combination of words in different classifications on the basis of the relationship between the classification-known words which co-appear in a document.

    Information extraction system, information extraction method, information extraction program, and information service system
    5.
    Granted patent
    Legal status: In force

    Publication No.: US08886661B2

    Publication date: 2014-11-11

    Application No.: US12294143

    Filing date: 2007-03-23

    Abstract: According to the present invention, phrases of the same kind can be extracted from a plurality of documents having various formats. A storage device stores a plurality of documents that have various formats. A pattern candidate creating unit receives a list of input words that are selected as samples among phrases that are to be included in a dictionary. The pattern candidate creating unit selects one document, determines forward and backward character strings of input words in the selected document as candidates of patterns, and stores the forward and backward character strings as a pattern candidate. The pattern candidate creating unit executes the above processes for each of the documents. A phrase candidate creating unit extracts phrases interposed between patterns included in the pattern candidate as candidates of phrases to be output, and stores the extracted phrases as a phrase candidate. A phrase selecting unit outputs a candidate of a phrase satisfying a predetermined condition among candidates of phrases included in the phrase candidate as an output word to an output device.

    Attribute extraction method, system, and program
    6.
    Granted patent
    Legal status: In force

    Publication No.: US08463738B2

    Publication date: 2013-06-11

    Application No.: US12866215

    Filing date: 2009-03-05

    IPC classification: G06F17/30

    Abstract: Sets of strings of which the drawing positions are arranged in one direction are extracted from a document as attribute groups. An attribute name score is calculated for each attribute group to determine an extent to which each attribute group is a set of attribute names. Based on the attribute name scores, an attribute name group is selected out of the attribute groups. From among the attribute groups, an attribute group which includes a string which is the same as at least one string of the attribute name group and of which the drawing position is the same as that of the string of the attribute name group is selected. From the string at the same drawing position, an attribute name is extracted. From the other strings of the selected attribute group than those at the same drawing position, an attribute value corresponding to the attribute name is extracted.

    Cooccurrence dictionary creating system, scoring system, cooccurrence dictionary creating method, scoring method, and program thereof
    7.
    Granted patent
    Legal status: In force

    Publication No.: US08443008B2

    Publication date: 2013-05-14

    Application No.: US12922320

    Filing date: 2009-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A cooccurrence dictionary creating system includes: a language analyzing section which subjects a text to a morpheme analysis, a clause specification, and a modification relationship analysis between clauses, a cooccurrence relationship collecting section which collects cooccurrences of nouns in each clause of the text, modification relationships of nouns and declinable words, and modification relationships between declinable words as cooccurrence relationships, a cooccurrence score calculating section which calculates a cooccurrence score of the cooccurrence relationship based on a frequency of the collected cooccurrence relationship, and a cooccurrence dictionary storage section which stores a cooccurrence dictionary in which a correspondence between the calculated cooccurrence score and the cooccurrence relationship is described.

    TRAINING DATA GENERATION APPARATUS, CHARACTERISTIC EXPRESSION EXTRACTION SYSTEM, TRAINING DATA GENERATION METHOD, AND COMPUTER-READABLE STORAGE MEDIUM
    8.
    Patent application
    Legal status: In force

    Publication No.: US20120030157A1

    Publication date: 2012-02-02

    Application No.: US13263280

    Filing date: 2010-03-17

    IPC classification: G06F15/18

    Abstract: The disclosed apparatus uses a training data generation apparatus 2, which generates training data used for creating characteristic expression extraction rules. The training data generation apparatus 2 includes: a training data candidate clustering unit 21, which clusters a plurality of training data candidates assigned labels indicating annotation classes based on feature values containing respective context information, and a training data generation unit 22 which, by referring to each cluster obtained using the clustering results, obtains the distribution of the labels of the training data candidates within the cluster, identifies training data candidates that meet a preset condition based on the obtained distribution, and generates training data using the identified training data candidates.
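
    Illustrative sketch (not taken from the patent text): the second half of the abstract above (inspect the label distribution inside each cluster, then keep only candidates that meet a condition) can be sketched as below. Cluster assignments are assumed to be precomputed by some clustering step that is not shown. The condition used here, keeping candidates that carry the majority label of a cluster whose majority share is at least min_share, is an assumption; the abstract only says "a preset condition". The name select_training_data and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def select_training_data(candidates, min_share=0.7):
    """Filter labeled candidates using per-cluster label distributions.

    candidates: list of (example, label, cluster_id) tuples; cluster ids
    are assumed to come from an earlier clustering step.
    Keeps candidates whose label is the cluster's majority label, but only
    in clusters where that label covers at least `min_share` of members.
    """
    by_cluster = defaultdict(list)
    for example, label, cluster_id in candidates:
        by_cluster[cluster_id].append((example, label))

    selected = []
    for members in by_cluster.values():
        distribution = Counter(label for _, label in members)
        top_label, top_count = distribution.most_common(1)[0]
        if top_count / len(members) >= min_share:
            selected.extend(
                (example, label)
                for example, label in members
                if label == top_label
            )
    return selected


candidates = [
    ("Mr. Sato", "PERSON", 0),
    ("Ms. Abe", "PERSON", 0),
    ("Mr. Coffee", "PRODUCT", 0),
    ("in Tokyo", "LOCATION", 1),
    ("in May", "DATE", 1),
]
training_data = select_training_data(candidates, min_share=0.6)
```

    In this example the first cluster is dominated by one label, so its outlier is dropped as a likely annotation error, while the evenly split second cluster contributes nothing.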

    ATTRIBUTE EXTRACTION METHOD, SYSTEM, AND PROGRAM
    9.
    Patent application
    Legal status: In force

    Publication No.: US20100318525A1

    Publication date: 2010-12-16

    Application No.: US12866215

    Filing date: 2009-03-05

    IPC classification: G06F17/30

    Abstract: Sets of strings of which the drawing positions are arranged in one direction are extracted from a document as attribute groups. An attribute name score is calculated for each attribute group to determine an extent to which each attribute group is a set of attribute names. Based on the attribute name scores, an attribute name group is selected out of the attribute groups. From among the attribute groups, an attribute group which includes a string which is the same as at least one string of the attribute name group and of which the drawing position is the same as that of the string of the attribute name group is selected. From the string at the same drawing position, an attribute name is extracted. From the other strings of the selected attribute group than those at the same drawing position, an attribute value corresponding to the attribute name is extracted.

    Training data generation apparatus, characteristic expression extraction system, training data generation method, and computer-readable storage medium
    10.
    Granted patent
    Legal status: In force

    Publication No.: US09195646B2

    Publication date: 2015-11-24

    Application No.: US13263280

    Filing date: 2010-03-17

    IPC classification: G06F17/27 G06N3/08 G06N99/00

    Abstract: The disclosed apparatus uses a training data generation apparatus 2, which generates training data used for creating characteristic expression extraction rules. The training data generation apparatus 2 includes: a training data candidate clustering unit 21, which clusters a plurality of training data candidates assigned labels indicating annotation classes based on feature values containing respective context information, and a training data generation unit 22 which, by referring to each cluster obtained using the clustering results, obtains the distribution of the labels of the training data candidates within the cluster, identifies training data candidates that meet a preset condition based on the obtained distribution, and generates training data using the identified training data candidates.
