INFORMATION ANALYSIS APPARATUS, INFORMATION ANALYSIS METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
    1.
    Type: Patent application
    Status: In force

    Publication No.: US20110167027A1

    Publication date: 2011-07-07

    Application No.: US13063231

    Filing date: 2009-10-06

    IPC classification: G06N5/02 G06F15/18

    Abstract: An information analysis apparatus, an information analysis method, and a program are provided that enable target information to be determined in units of single sentences, rather than in units of plural sentences, while taking into consideration the tendency of appearance of the target information. An information analysis apparatus 5 is used to perform an analysis on text information to determine whether or not the text information corresponds to the target information. The information analysis apparatus 5 includes a density estimation unit 51 that estimates, in units of analysis each composed of a plurality of sentences of text information, a density indicating the degree to which the target information is included in the unit of analysis, and a determination unit 52 that obtains an evaluation value indicating the degree to which each sentence included in each unit of analysis corresponds to the target information from the estimated density of the unit of analysis, and determines whether or not the sentence corresponds to the target information based on the evaluation value.
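
The two-stage idea in this abstract (estimate a density per multi-sentence unit, then derive a per-sentence evaluation value from those densities) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the patent: the units of analysis are sliding windows, the density is the fraction of sentences matched by a hypothetical `is_cue` predicate, and the evaluation value is the mean density of the windows that contain the sentence.

```python
from typing import Callable, List, Sequence


def window_units(n_sentences: int, size: int) -> List[range]:
    """Sliding windows of `size` consecutive sentence indices (assumed units of analysis)."""
    if n_sentences <= size:
        return [range(n_sentences)]
    return [range(start, start + size) for start in range(n_sentences - size + 1)]


def estimate_density(sentences: Sequence[str], unit: range,
                     is_cue: Callable[[str], bool]) -> float:
    """Assumed density: share of sentences in the unit flagged by the cue predicate."""
    return sum(1 for i in unit if is_cue(sentences[i])) / len(unit)


def classify_sentences(sentences: Sequence[str], is_cue: Callable[[str], bool],
                       size: int = 3, threshold: float = 0.5) -> List[bool]:
    """Return, per sentence, whether its evaluation value reaches the threshold."""
    if not sentences:
        return []
    units = window_units(len(sentences), size)
    densities = [estimate_density(sentences, unit, is_cue) for unit in units]
    decisions = []
    for i in range(len(sentences)):
        # Evaluation value: mean density over every unit that contains sentence i.
        covering = [d for unit, d in zip(units, densities) if i in unit]
        decisions.append(sum(covering) / len(covering) >= threshold)
    return decisions
```

Because the evaluation value pools the densities of neighbouring units, a sentence surrounded by target-like sentences scores higher than an isolated one, which is one simple way to account for the "tendency of appearance" the abstract mentions.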

    Determining whether text information corresponds to target information
    2.
    Type: Granted patent
    Status: In force

    Publication No.: US08510249B2

    Publication date: 2013-08-13

    Application No.: US13063231

    Filing date: 2009-10-06

    IPC classification: G06F17/27

    Abstract: An information analysis apparatus that performs an analysis on text information to determine whether or not the text information corresponds to the target information. The information analysis apparatus includes a storage device that stores the text information; a density estimation unit that estimates, in units of analysis each composed of a plurality of sentences of text information, a density indicating the degree to which the target information is included in the unit of analysis; and a determination unit that obtains an evaluation value indicating the degree to which each sentence included in each unit of analysis corresponds to the target information from the estimated density of the unit of analysis, and determines whether or not the sentence corresponds to the target information based on the evaluation value.

    INFORMATION EXTRACTION SYSTEM, INFORMATION EXTRACTION METHOD, INFORMATION EXTRACTION PROGRAM, AND INFORMATION SERVICE SYSTEM
    3.
    Type: Patent application
    Status: In force

    Publication No.: US20110161144A1

    Publication date: 2011-06-30

    Application No.: US12294143

    Filing date: 2007-03-23

    IPC classification: G06Q30/00 G06F17/27

    Abstract: According to the present invention, phrases of the same kind can be extracted from a plurality of documents having various formats. A storage device stores a plurality of documents that have various formats. A pattern candidate creating unit receives a list of input words that are selected as samples among phrases that are to be included in a dictionary. The pattern candidate creating unit selects one document, determines forward and backward character strings of input words in the selected document as candidates of patterns, and stores the forward and backward character strings as a pattern candidate. The pattern candidate creating unit executes the above processes for each of the documents. A phrase candidate creating unit extracts phrases interposed between patterns included in the pattern candidate as candidates of phrases to be output, and stores the extracted phrases as a phrase candidate. A phrase selecting unit outputs a candidate of a phrase satisfying a predetermined condition among candidates of phrases included in the phrase candidate as an output word to an output device.
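
The pipeline described in this abstract (seed words, then per-document context patterns, then phrases interposed between those patterns, then selection) can be sketched roughly as follows. The fixed context width, the regular-expression matching, and the `min_count` selection condition are assumptions chosen for illustration; the abstract only speaks of "forward and backward character strings" and "a predetermined condition".

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def pattern_candidates(doc: str, seeds: Iterable[str], width: int = 3) -> Set[Tuple[str, str]]:
    """Collect (left, right) context strings around each seed occurrence in one document."""
    patterns = set()
    for seed in seeds:
        for m in re.finditer(re.escape(seed), doc):
            left = doc[max(0, m.start() - width):m.start()]
            right = doc[m.end():m.end() + width]
            if left and right:
                patterns.add((left, right))
    return patterns


def phrase_candidates(doc: str, patterns: Set[Tuple[str, str]], max_len: int = 20) -> Counter:
    """Extract the shortest strings interposed between each (left, right) pattern."""
    found = Counter()
    for left, right in patterns:
        regex = re.escape(left) + r"(.{1,%d}?)" % max_len + re.escape(right)
        for m in re.finditer(regex, doc):
            found[m.group(1)] += 1
    return found


def extract_phrases(docs: Iterable[str], seeds: List[str], min_count: int = 1) -> List[str]:
    """Patterns are learned and applied per document, so each format gets its own patterns."""
    totals = Counter()
    for doc in docs:
        totals.update(phrase_candidates(doc, pattern_candidates(doc, seeds)))
    # Assumed selection condition: extracted at least `min_count` times and not a seed.
    return sorted(p for p, c in totals.items() if c >= min_count and p not in seeds)
```

For example, with the seed `"Tokyo"`, an HTML list and a `City: ...;` text line each yield their own context pattern, and the other city names enclosed by the same contexts are returned. A final item with no trailing delimiter is missed, which shows how brittle fixed-width contexts can be.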

    COOCCURRENCE DICTIONARY CREATING SYSTEM, SCORING SYSTEM, COOCCURRENCE DICTIONARY CREATING METHOD, SCORING METHOD, AND PROGRAM THEREOF
    4.
    Type: Patent application
    Status: In force

    Publication No.: US20110055228A1

    Publication date: 2011-03-03

    Application No.: US12922320

    Filing date: 2009-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A cooccurrence dictionary creating system includes: a language analyzing section which subjects a text to a morpheme analysis, a clause specification, and a modification relationship analysis between clauses; a cooccurrence relationship collecting section which collects cooccurrences of nouns in each clause of the text, modification relationships of nouns and declinable words, and modification relationships between declinable words as cooccurrence relationships; a cooccurrence score calculating section which calculates a cooccurrence score of the cooccurrence relationship based on a frequency of the collected cooccurrence relationship; and a cooccurrence dictionary storage section which stores a cooccurrence dictionary in which a correspondence between the calculated cooccurrence score and the cooccurrence relationship is described.
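
As a rough illustration of the collecting and scoring sections, the sketch below counts noun pairs that share a clause and turns the frequencies into a score. It assumes the language analysis has already been done (the input is a list of noun lists, one per clause), it covers only the noun-noun case, and it uses pointwise mutual information as the score. PMI is a stand-in chosen here; the abstract says only that the score is "based on a frequency".

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def build_cooccurrence_dictionary(clauses: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Map each noun pair seen in the same clause to an assumed PMI-style score."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for nouns in clauses:
        unique = sorted(set(nouns))          # count each noun once per clause
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # pairs are alphabetically ordered
    n = len(clauses)
    return {
        (a, b): math.log((count * n) / (word_counts[a] * word_counts[b]))
        for (a, b), count in pair_counts.items()
    }
```

The resulting dictionary plays the role of the stored correspondence between each cooccurrence relationship and its score; pairs that appear together more often than their individual frequencies would suggest receive positive values.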

    POLARITY ESTIMATION SYSTEM, INFORMATION DELIVERY SYSTEM, POLARITY ESTIMATION METHOD, POLARITY ESTIMATION PROGRAM AND EVALUATION POLARITY ESTIMATIOM PROGRAM
    5.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20100017391A1

    Publication date: 2010-01-21

    Application No.: US12448010

    Filing date: 2007-11-20

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06Q10/10

    Abstract: An evaluation polarity of reputation information with an unknown evaluation polarity is estimated by utilizing reputation information with a known evaluation polarity. The present polarity estimation system is a polarity estimation system for estimating an evaluation polarity indicating whether reputation information is positive or negative, and includes a reputation information storage part that precedently stores reputation information with a known evaluation polarity; and a polarity estimating means for estimating an evaluation polarity of reputation information with an unknown evaluation polarity on the basis of the reputation information with the known evaluation polarity precedently stored in the reputation information storage part.
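
The abstract does not say how the stored known-polarity items are used, so the following is only one plausible reading, sketched under stated assumptions: each stored item votes for its own polarity, weighted by its word-overlap (Jaccard) similarity to the unknown item. The function names and the similarity measure are hypothetical.

```python
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Whitespace tokenisation, lower-cased (a deliberate simplification)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_polarity(unknown: str, known: List[Tuple[str, int]]) -> str:
    """`known` holds (text, polarity) pairs with polarity +1 (positive) or -1 (negative)."""
    target = tokens(unknown)
    # Similarity-weighted vote over the stored reputation information.
    score = sum(polarity * jaccard(target, tokens(text)) for text, polarity in known)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"
```

Here the `known` list stands in for the reputation information storage part; an item that shares no words with any stored item is left as `"unknown"` rather than forced into a class.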

    Information extraction system, information extraction method, information extraction program, and information service system
    6.
    Type: Granted patent
    Status: In force

    Publication No.: US08886661B2

    Publication date: 2014-11-11

    Application No.: US12294143

    Filing date: 2007-03-23

    Abstract: According to the present invention, phrases of the same kind can be extracted from a plurality of documents having various formats. A storage device stores a plurality of documents that have various formats. A pattern candidate creating unit receives a list of input words that are selected as samples among phrases that are to be included in a dictionary. The pattern candidate creating unit selects one document, determines forward and backward character strings of input words in the selected document as candidates of patterns, and stores the forward and backward character strings as a pattern candidate. The pattern candidate creating unit executes the above processes for each of the documents. A phrase candidate creating unit extracts phrases interposed between patterns included in the pattern candidate as candidates of phrases to be output, and stores the extracted phrases as a phrase candidate. A phrase selecting unit outputs a candidate of a phrase satisfying a predetermined condition among candidates of phrases included in the phrase candidate as an output word to an output device.

    Word classification system, method, and program
    7.
    Type: Granted patent
    Status: In force

    Publication No.: US08504356B2

    Publication date: 2013-08-06

    Application No.: US12920920

    Filing date: 2009-04-02

    CPC classification: G06F17/2735 G06F17/277

    Abstract: A word classification system is provided with an inter-word pattern learning section for learning at least either the context information or the layout information between classification-known words which co-appear and creating an inter-word pattern for determining whether data relating to a word pair which is a combination of words is data relating to a same-classification word pair which is the combination of words in the same classification or data relating to a different-classification word pair which is a combination of words in different classifications on the basis of the relationship between the classification-known words which co-appear in a document.

    Attribute extraction method, system, and program
    8.
    Type: Granted patent
    Status: In force

    Publication No.: US08463738B2

    Publication date: 2013-06-11

    Application No.: US12866215

    Filing date: 2009-03-05

    IPC classification: G06F17/30

    Abstract: Sets of strings of which the drawing positions are arranged in one direction are extracted from a document as attribute groups. An attribute name score is calculated for each attribute group to determine an extent to which each attribute group is a set of attribute names. Based on the attribute name scores, an attribute name group is selected out of the attribute groups. From among the attribute groups, an attribute group which includes a string which is the same as at least one string of the attribute name group and of which the drawing position is the same as that of the string of the attribute name group is selected. From the string at the same drawing position, an attribute name is extracted. From the strings of the selected attribute group other than those at the same drawing position, an attribute value corresponding to the attribute name is extracted.
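
The steps in this abstract can be sketched for the simple case of a grid-like layout. The assumptions are all illustrative: each string carries integer `(x, y)` drawing coordinates, strings sharing an `x` or a `y` form an attribute group, and the attribute name score is the fraction of a group's strings found in a small hypothetical lexicon `KNOWN_NAMES`. The patent may compute positions and scores quite differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[str, int, int]  # (text, x, y) drawing position

KNOWN_NAMES = {"name", "price", "weight"}  # hypothetical attribute-name lexicon


def attribute_groups(cells: List[Cell]) -> List[List[Cell]]:
    """Group strings aligned in one direction: same x (column) or same y (row)."""
    groups = defaultdict(list)
    for cell in cells:
        _, x, y = cell
        groups[("col", x)].append(cell)
        groups[("row", y)].append(cell)
    return [g for g in groups.values() if len(g) > 1]


def name_score(group: List[Cell]) -> float:
    """Assumed score: share of strings in the group that look like attribute names."""
    return sum(text.lower() in KNOWN_NAMES for text, _, _ in group) / len(group)


def extract_attributes(cells: List[Cell]) -> Dict[str, List[str]]:
    groups = attribute_groups(cells)
    if not groups:
        return {}
    name_group = max(groups, key=name_score)  # the attribute name group
    result = {}
    for group in groups:
        if group is name_group:
            continue
        # A group sharing exactly one string (same text and position) with the name group.
        shared = [cell for cell in group if cell in name_group]
        if len(shared) == 1:
            name = shared[0]
            result[name[0]] = [cell[0] for cell in group if cell != name]
    return result
```

For a two-column table whose left column holds `Name`, `Price`, and `Weight`, the left column wins as the name group, and each row then contributes one name with the remaining strings of that row as its values.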

    Cooccurrence dictionary creating system, scoring system, cooccurrence dictionary creating method, scoring method, and program thereof
    9.
    Type: Granted patent
    Status: In force

    Publication No.: US08443008B2

    Publication date: 2013-05-14

    Application No.: US12922320

    Filing date: 2009-04-01

    IPC classification: G06F17/30

    CPC classification: G06F17/2735

    Abstract: A cooccurrence dictionary creating system includes: a language analyzing section which subjects a text to a morpheme analysis, a clause specification, and a modification relationship analysis between clauses; a cooccurrence relationship collecting section which collects cooccurrences of nouns in each clause of the text, modification relationships of nouns and declinable words, and modification relationships between declinable words as cooccurrence relationships; a cooccurrence score calculating section which calculates a cooccurrence score of the cooccurrence relationship based on a frequency of the collected cooccurrence relationship; and a cooccurrence dictionary storage section which stores a cooccurrence dictionary in which a correspondence between the calculated cooccurrence score and the cooccurrence relationship is described.

    TRAINING DATA GENERATION APPARATUS, CHARACTERISTIC EXPRESSION EXTRACTION SYSTEM, TRAINING DATA GENERATION METHOD, AND COMPUTER-READABLE STORAGE MEDIUM
    10.
    Type: Patent application
    Status: In force

    Publication No.: US20120030157A1

    Publication date: 2012-02-02

    Application No.: US13263280

    Filing date: 2010-03-17

    IPC classification: G06F15/18

    Abstract: The disclosed apparatus uses a training data generation apparatus 2, which generates training data used for creating characteristic expression extraction rules. The training data generation apparatus 2 includes: a training data candidate clustering unit 21, which clusters a plurality of training data candidates assigned labels indicating annotation classes based on feature values containing respective context information; and a training data generation unit 22 which, by referring to each cluster obtained using the clustering results, obtains the distribution of the labels of the training data candidates within the cluster, identifies training data candidates that meet a preset condition based on the obtained distribution, and generates training data using the identified training data candidates.
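
The role of the training data generation unit can be illustrated with a small sketch that starts after clustering. It assumes each candidate already carries a cluster id (the clustering on context features is not reproduced here), and it uses one possible "preset condition": keep only candidates carrying the majority label of a cluster whose majority share is at least `min_share`. Both the condition and the parameter name are assumptions.

```python
from collections import Counter, defaultdict
from typing import List, Tuple

Candidate = Tuple[str, str, int]  # (text, label, cluster_id)


def select_training_data(candidates: List[Candidate],
                         min_share: float = 0.6) -> List[Tuple[str, str]]:
    """Keep majority-label candidates from clusters with a sufficiently pure label distribution."""
    by_cluster = defaultdict(list)
    for candidate in candidates:
        by_cluster[candidate[2]].append(candidate)

    selected = []
    for members in by_cluster.values():
        distribution = Counter(label for _, label, _ in members)
        top_label, top_count = distribution.most_common(1)[0]
        if top_count / len(members) >= min_share:
            # Candidates whose label disagrees with the cluster majority are treated as noise.
            selected.extend((text, label) for text, label, _ in members if label == top_label)
    return selected
```

The intuition is that candidates with similar contexts should share an annotation class, so a label that is rare within its cluster is more likely to be an annotation error than a genuine example.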
