TERM SYNONYM ACQUISITION METHOD AND TERM SYNONYM ACQUISITION APPARATUS
    1.
    Patent application
    Status: Under examination (published)

    Publication No.: US20150006157A1

    Publication date: 2015-01-01

    Application No.: US14376517

    Filing date: 2012-03-14

    IPC classification: G06F17/27

    Abstract: A term synonym acquisition apparatus includes: a first generating unit which generates a context vector of an input term in an original language and a context vector of each synonym candidate in the original language; a second generating unit which generates a context vector of an auxiliary term in an auxiliary language that is different from the original language, where the auxiliary term specifies a sense of the input term; a combining unit which generates a combined context vector based on the context vector of the input term and the context vector of the auxiliary term; and a ranking unit which compares the combined context vector with the context vector of each synonym candidate to generate ranked synonym candidates in the original language.

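    The combining and ranking steps named in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two context vectors are combined or compared, so the weighted sum in `combine`, the cosine measure, the `weight` parameter, and the toy data are all assumptions made for illustration. The sketch also assumes the auxiliary-language vector has already been mapped into the same feature space as the original-language vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine(input_vec, aux_vec, weight=0.5):
    """Weighted sum of the two context vectors (an assumed combination rule)."""
    keys = set(input_vec) | set(aux_vec)
    return {
        k: (1 - weight) * input_vec.get(k, 0.0) + weight * aux_vec.get(k, 0.0)
        for k in keys
    }


def rank_candidates(input_vec, aux_vec, candidates, weight=0.5):
    """Rank synonym candidates by similarity to the combined context vector."""
    combined = combine(input_vec, aux_vec, weight)
    scored = [(name, cosine(combined, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: the input term "bank" is ambiguous, and the auxiliary
# term (already mapped into the original-language features) selects the
# riverside sense.
input_vec = {"money": 3.0, "river": 3.0}
aux_vec = {"river": 4.0, "water": 2.0}
candidates = {
    "lender": {"money": 5.0, "loan": 3.0},
    "shore": {"river": 5.0, "water": 3.0},
}

ranked = rank_candidates(input_vec, aux_vec, candidates)
print([name for name, _ in ranked])
```

    With these toy vectors the auxiliary term pulls the combined vector toward the riverside sense, so "shore" ranks ahead of "lender".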

    DEVICE, METHOD AND PROGRAM FOR ASSESSING SYNONYMOUS EXPRESSIONS
    3.
    Patent application
    Status: In force

    Publication No.: US20140343922A1

    Publication date: 2014-11-20

    Application No.: US14117297

    Filing date: 2012-05-09

    IPC classification: G06F17/27

    CPC classification: G06F17/2795

    Abstract: A synonymous expression assessment device includes: synonymy assessment means for receiving input of binary relations each of which includes a nominal and a predicate, and assessing whether or not the input binary relations are synonymous using a similarity between input nominals and a similarity between input predicates; and inter-predicate similarity computation means for, when computing the similarity between the input predicates based on a distribution of occurrence frequencies of nominals that are in binary relations to the input predicate in a document set, performing the computation using a distribution of only nominals that are used in the same type of concept as the input nominal.

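    The inter-predicate similarity computation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a similarity measure, so cosine similarity over co-occurrence counts is assumed, and the `relations` list, the `concept_of` mapping, and the function name are hypothetical.

```python
import math
from collections import Counter


def predicate_similarity(pred_a, pred_b, relations, concept_of, input_nominal):
    """Compare two predicates using only nominals of the input nominal's concept type.

    relations  -- (nominal, predicate) pairs observed in a document set
    concept_of -- hypothetical mapping from nominal to concept type
    """
    target_type = concept_of[input_nominal]

    def distribution(pred):
        # Occurrence counts of nominals tied to `pred`, restricted by concept type.
        return Counter(
            nominal
            for nominal, p in relations
            if p == pred and concept_of.get(nominal) == target_type
        )

    dist_a, dist_b = distribution(pred_a), distribution(pred_b)
    dot = sum(count * dist_b[nominal] for nominal, count in dist_a.items())
    norm_a = math.sqrt(sum(c * c for c in dist_a.values()))
    norm_b = math.sqrt(sum(c * c for c in dist_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical document-set statistics.
relations = [
    ("program", "run"), ("app", "run"), ("athlete", "run"),
    ("program", "execute"), ("app", "execute"), ("prisoner", "execute"),
    ("athlete", "jog"),
]
concept_of = {
    "program": "software", "app": "software",
    "athlete": "person", "prisoner": "person",
}

sim_software = predicate_similarity("run", "execute", relations, concept_of, "program")
sim_person = predicate_similarity("run", "execute", relations, concept_of, "athlete")
sim_jog = predicate_similarity("run", "jog", relations, concept_of, "athlete")
print(sim_software, sim_person, sim_jog)
```

    Restricting the distribution to the input nominal's concept type lets "run" match "execute" when the nominal is a piece of software, and "jog" when the nominal is a person.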

    Text mining system for analysis target data, a text mining method for analysis target data and a recording medium for recording analysis target data
    4.
    Granted patent
    Status: In force

    Publication No.: US08805853B2

    Publication date: 2014-08-12

    Application No.: US13518573

    Filing date: 2010-12-15

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06Q10/10

    Abstract: A text mining system including an analysis target search unit which judges whether a commonality in expressions among text data exists, an analysis viewpoint generation unit which generates an analysis viewpoint to extract an expression from the target data, a positive example set identification unit which identifies a positive example set including an expression matching the generated analysis viewpoint in the target data, a characteristic quantity calculation unit which calculates a characteristic quantity showing a degree of characterizing the positive example set of expressions in the target data, and a characteristic expression ranking unit which extracts expressions having the calculated characteristic quantity equal to or greater than a predetermined threshold as characteristic expressions and ranks the extracted characteristic expressions, and the target search unit extracts the analysis viewpoint among which a difference in ranks provided for the characteristic expressions is equal to or greater than a predetermined threshold.

    Textual entailment recognition apparatus, textual entailment recognition method, and computer-readable recording medium
    5.
    Granted patent
    Status: In force

    Publication No.: US08762132B2

    Publication date: 2014-06-24

    Application No.: US13823546

    Filing date: 2012-10-04

    IPC classification: G06F17/27

    Abstract: A textual entailment recognition apparatus (2) includes a vector generation unit (21) that generates, for each of first and second texts, a vector for each predicate-argument structure by using a word other than a word indicating a type of argument of a predicate in the predicate-argument structure; a combination identification unit (22) that compares the vector generated for each predicate-argument structure for the first text and the vector generated for each predicate-argument structure for the second text, and identifies combinations of the predicate-argument structures of the first text and the predicate-argument structure of the second text based on a result of the comparison; and an entailment determination unit (23) that obtains a feature amount for each of the identified combinations, and determines whether the first text entails the second text based on the obtained feature amounts.

    Text mining apparatus, text mining method, and computer-readable recording medium
    6.
    Granted patent
    Status: In force

    Publication No.: US08380741B2

    Publication date: 2013-02-19

    Application No.: US13060587

    Filing date: 2009-08-28

    IPC classification: G06F17/30

    CPC classification: G06F17/277 G10L15/26

    Abstract: A text mining apparatus, a text mining method, and a program are provided that enable the influence that computer processing errors have on mining results to be reduced during text mining performed on a plurality of text data pieces including a text data piece generated by computer processing. A text mining apparatus 1 to be used includes an inherent portion extraction unit 6 that, for each of a plurality of text data pieces including a text data piece generated by computer processing, extracts an inherent portion of the text data piece relative to another of the text data pieces, an inherent confidence setting unit 7 that, for each inherent portion of each of the text data pieces, sets inherent confidence indicating confidence of the inherent portion, using the confidence that has been set for each of the text data pieces, and a mining processing unit 8 that performs text mining on each inherent portion of each of the text data pieces, using the inherent confidence.

    TEXT PROCESSING APPARATUS, TEXT PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
    7.
    Patent application
    Status: In force

    Publication No.: US20110282653A1

    Publication date: 2011-11-17

    Application No.: US13142302

    Filing date: 2009-12-21

    IPC classification: G06F17/27

    CPC classification: G06F17/2827 G06F17/2775

    Abstract: A text processing apparatus is provided with a segment determination unit 36 and a descriptive content determination unit 33. The segment determination unit 36 determines, with respect to a homogeneous segment that is similar to segments constituting a first text which is set as an analysis target (analysis target text) and that is included in another first text, whether the content thereof is included in a second text. The descriptive content determination unit 33 determines whether each segment constituting the analysis target text should be described in a corresponding second text, based on the determination result.

    TEXT MINING APPARATUS, TEXT MINING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
    8.
    Patent application
    Status: In force

    Publication No.: US20110161367A1

    Publication date: 2011-06-30

    Application No.: US13060587

    Filing date: 2009-08-28

    IPC classification: G06F17/30

    CPC classification: G06F17/277 G10L15/26

    Abstract: A text mining apparatus, a text mining method, and a program are provided that enable the influence that computer processing errors have on mining results to be reduced during text mining performed on a plurality of text data pieces including a text data piece generated by computer processing. A text mining apparatus 1 to be used includes an inherent portion extraction unit 6 that, for each of a plurality of text data pieces including a text data piece generated by computer processing, extracts an inherent portion of the text data piece relative to another of the text data pieces, an inherent confidence setting unit 7 that, for each inherent portion of each of the text data pieces, sets inherent confidence indicating confidence of the inherent portion, using the confidence that has been set for each of the text data pieces, and a mining processing unit 8 that performs text mining on each inherent portion of each of the text data pieces, using the inherent confidence.

    TEXT MINING DEVICE, TEXT MINING METHOD, TEXT MINING PROGRAM, AND RECORDING MEDIUM
    9.
    Patent application
    Status: In force

    Publication No.: US20110010373A1

    Publication date: 2011-01-13

    Application No.: US12919463

    Filing date: 2009-03-06

    IPC classification: G06F17/30

    CPC classification: G06F17/2211 G06F17/2795

    Abstract: Provided is a text mining device that performs an analysis properly with respect to a difference between plural related document data. The device is equipped with an element extracting section 140 that extracts language elements from each of two or more related document data; a differential processing section 150 that extracts a difference between the document data by comparing the elements extracted by the element extracting section 140; and a statistical processing section 170 that performs statistical processing on the difference extracted by the differential processing section 150. The differential processing section 150 has an element associating section 151 that compares the elements extracted by the element extracting section 140 across the document data and associates elements that are in an identical, similar, synonymous, or analogous relation; and a differential element extracting section 152 that extracts each element left without a counterpart in the association made by the element associating section 151.

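    The associate-then-subtract flow of this abstract can be sketched in a few lines. This is a simplified illustration, not the patented procedure: element association is reduced to exact matches after a synonym lookup, the statistical processing is reduced to frequency counts, and the `synonyms` table, the document pairs, and all function names are hypothetical.

```python
from collections import Counter


def normalize(element, synonyms):
    """Map an element to a canonical form so that synonyms are associated."""
    return synonyms.get(element, element)


def differential_elements(doc_a, doc_b, synonyms):
    """Return the elements of each document that have no counterpart in the other."""
    keys_a = {normalize(e, synonyms) for e in doc_a}
    keys_b = {normalize(e, synonyms) for e in doc_b}
    only_a = [e for e in doc_a if normalize(e, synonyms) not in keys_b]
    only_b = [e for e in doc_b if normalize(e, synonyms) not in keys_a]
    return only_a, only_b


def count_differences(pairs, synonyms):
    """Tally unmatched elements over many related document pairs."""
    counts = Counter()
    for doc_a, doc_b in pairs:
        only_a, only_b = differential_elements(doc_a, doc_b, synonyms)
        counts.update(normalize(e, synonyms) for e in only_a + only_b)
    return counts


# Hypothetical related documents, each already split into language elements.
synonyms = {"paper jam": "jam", "display": "screen"}
pairs = [
    (["printer", "jam", "tray"], ["printer", "paper jam"]),
    (["screen", "flicker", "cable"], ["display", "flicker"]),
    (["printer", "tray"], ["printer"]),
]

diff_counts = count_differences(pairs, synonyms)
print(diff_counts.most_common())
```

    Here "jam" and "paper jam" are associated and therefore drop out of the difference, while "tray" is left unmatched in two pairs and tops the tally.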

    Text mining apparatus, text mining method, and computer-readable recording medium
    10.
    Granted patent
    Status: In force

    Publication No.: US08751531B2

    Publication date: 2014-06-10

    Application No.: US13060608

    Filing date: 2009-08-28

    IPC classification: G06F17/30 G06F17/27

    CPC classification: G06F17/277 G10L15/26

    Abstract: A text mining apparatus, a text mining method, and a program are provided that accurately discriminate inherent portions of each of a plurality of text data pieces including a text data piece generated by computer processing. A text mining apparatus 1 to be used performs text mining using, as targets, a plurality of text data pieces including a text data piece generated by computer processing. Confidence is set for each of the text data pieces. The text mining apparatus 1 includes an inherent portion extraction unit 6 that extracts an inherent portion of each text data piece relative to another of the text data pieces, using the confidence set for each of the text data pieces.
